Bayesian Inference in Quantum Chemistry

As an example, methanol, with six atoms, has $6 \times 3 = 18$ input variables, one for each Cartesian coordinate. We can feed each single input $x_j \in \mathbf{R}^{18}$ through Q-Chem and receive the response variable $V(x_j)$. We can also probe the chemistry using more specific inputs, such as a C-O stretch or an internal rotation, each of which has its own set of inputs and responses.

For this minimal example, we construct a table of these three data entries (Cartesian coordinates in ångströms, energies $V$ in hartree):
| $C_x$ | $C_y$ | $C_z$ | $H^1_x$ | $H^1_y$ | $H^1_z$ | $H^2_x$ | $H^2_y$ | $H^2_z$ | $H^3_x$ | $H^3_y$ | $H^3_z$ | $O_x$ | $O_y$ | $O_z$ | $H^4_x$ | $H^4_y$ | $H^4_z$ | $V$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 16.2364 | 4.55882 | 8.60984 | 17.2446 | 4.35514 | 9.00149 | 15.5804 | 4.89513 | 9.42765 | 15.8392 | 3.63217 | 8.18469 | 16.2648 | 5.5148 | 7.5602 | 16.7107 | 6.32543 | 7.83257 | -115.631 |
| 16.2337 | 4.46458 | 9.10497 | 17.2418 | 4.26089 | 9.10497 | 15.5776 | 4.80088 | 9.53114 | 15.8364 | 3.53793 | 8.28817 | 16.2675 | 5.60905 | 7.72909 | 16.7135 | 6.41968 | 7.83257 | -115.606 |
| 16.2365 | 4.55882 | 8.60984 | 17.2446 | 4.35514 | 9.00149 | 15.5804 | 4.89513 | 9.42765 | 15.8392 | 3.63217 | 8.18469 | 16.2648 | 5.5148 | 7.5602 | 15.5917 | 5.32897 | 6.89489 | -115.629 |
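This table maps directly onto a design matrix $X \in \mathbf{R}^{3 \times 18}$ and a response vector $V$. A minimal NumPy sketch, assuming ångström coordinates and hartree energies (consistent with the values above):

```python
import numpy as np

# Design matrix: one row per geometry, 18 Cartesian coordinates
# ordered C, H1, H2, H3, O, H4 (x, y, z for each atom).
X = np.array([
    [16.2364, 4.55882, 8.60984, 17.2446, 4.35514, 9.00149,
     15.5804, 4.89513, 9.42765, 15.8392, 3.63217, 8.18469,
     16.2648, 5.5148,  7.5602,  16.7107, 6.32543, 7.83257],
    [16.2337, 4.46458, 9.10497, 17.2418, 4.26089, 9.10497,
     15.5776, 4.80088, 9.53114, 15.8364, 3.53793, 8.28817,
     16.2675, 5.60905, 7.72909, 16.7135, 6.41968, 7.83257],
    [16.2365, 4.55882, 8.60984, 17.2446, 4.35514, 9.00149,
     15.5804, 4.89513, 9.42765, 15.8392, 3.63217, 8.18469,
     16.2648, 5.5148,  7.5602,  15.5917, 5.32897, 6.89489],
])
# Response vector: the Q-Chem energy V(x_j) for each row.
V = np.array([-115.631, -115.606, -115.629])
```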

We could try every possible input $x$ and compute the expected values by brute force… but that is far too expensive to be an option.

One technique we can use is dimensionality reduction, so that we need fewer samples to obtain a result with high confidence. The most obvious way to do this is a coordinate transformation, which can be thought of as feature engineering the input data into a more informative form. For example, each data point can be referenced to the center of mass and aligned to orthogonal axes that minimize the Euclidean distance to each point. Additionally, relationships among columns of the dataset can be computed with trigonometry, so that interatomic distances and angles become the relevant inputs. I won’t go into all the complicated linear algebra involved in this non-linear transformation. Suffice it to say that after all is said and done, these methods alone reduce the dimensionality by six (removing overall translation and rotation), and we’re left with more informative input data $x'$: interatomic distances, bond angles, and dihedral angles of internal rotation.
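As a sketch of that feature engineering, the standard vector formulas for distances, angles, and dihedrals look like this (atom indexing assumes the C, H1–H3, O, H4 ordering of the table above):

```python
import numpy as np

def bond_length(a, b):
    """Distance between two atoms given their Cartesian coordinates."""
    return np.linalg.norm(b - a)

def bond_angle(a, b, c):
    """Angle a-b-c in radians, with atom b at the vertex."""
    u, v = a - b, c - b
    cos_t = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos_t, -1.0, 1.0))

def dihedral(a, b, c, d):
    """Signed dihedral angle a-b-c-d in radians (rotation about b-c)."""
    b1, b2, b3 = b - a, c - b, d - c
    n1, n2 = np.cross(b1, b2), np.cross(b2, b3)
    m = np.cross(n1, b2 / np.linalg.norm(b2))
    return np.arctan2(m @ n2, n1 @ n2)

# Example with the first table row (X from the sketch above):
geom = X[0].reshape(6, 3)                                # C, H1, H2, H3, O, H4
r_CO = bond_length(geom[0], geom[4])                     # ~1.42 angstrom
theta_COH = bond_angle(geom[0], geom[4], geom[5])        # C-O-H bond angle
phi_HCOH = dihedral(geom[1], geom[0], geom[4], geom[5])  # internal rotation
```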

For larger molecules, a six-dimensional reduction isn’t enough, and we’re left with a high-dimensional problem in which solving the PDE with matrix methods exceeds the memory limits of modern computers. One method to further reduce the dimensionality is to model each variable within $x'$ as statistically independent. In statistical terms, the probability distribution of the full-dimensional dataset can then be represented as a product of each variable’s marginal distribution.

Prior probabilities on intramolecular motion

$$P(x' = X_1, X_2, \dots, X_N) \,=\, P(X_1)\, P(X_2) \cdots P(X_N)$$

This implies that $V(x')$ can be modeled through a linear response model, so that each component can be evaluated independently:

$$V(x') \,=\, v_1(X_1) \,+\, v_2(X_2) \,+\, \dots \,+\, v_N(X_N)$$

Then we only need to fit 1-D functions of each component using basis-function regression techniques, and independently solve each 1-D Schrödinger equation using eigendecomposition methods.
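For the fitting step, a minimal sketch in a simple polynomial basis (the scan arrays in the usage comment are hypothetical placeholders for 1-D cuts through the data table):

```python
import numpy as np

def fit_component(xk, vk_samples, degree=6):
    """Least-squares fit of one component v_k(X_k) in a polynomial
    basis; returns the fitted function."""
    coeffs = np.polynomial.polynomial.polyfit(xk, vk_samples, degree)
    return lambda x: np.polynomial.polynomial.polyval(x, coeffs)

def V_model(xprime, components):
    """Additive model: V(x') = v_1(X_1) + ... + v_N(X_N)."""
    return sum(v_k(xk) for v_k, xk in zip(components, xprime))

# Hypothetical usage: fit the C-O stretch component from a 1-D scan.
# v_CO = fit_component(r_CO_scan, scan_energies)
```

Each fitted $v_k$ then defines its own 1-D Schrödinger equation: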

$$-\frac{1}{2} \nabla^2 \Phi_i(X_k) \,+\, v_k(X_k)\,\Phi_i(X_k) \,=\, E_i\, \Phi_i(X_k)$$
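Here is a minimal sketch of the eigendecomposition step, discretizing the 1-D Hamiltonian with central finite differences. The harmonic potential at the end is a stand-in for a fitted $v_k$, and atomic units with unit mass are assumed:

```python
import numpy as np

def solve_1d_schrodinger(v_k, x, mass=1.0):
    """Eigendecomposition of the finite-difference Hamiltonian for
    -1/(2m) Phi'' + v_k(x) Phi = E Phi on a uniform grid."""
    dx = x[1] - x[0]
    n = x.size
    # Central-difference Laplacian with Dirichlet boundaries.
    lap = (np.diag(np.ones(n - 1), -1) - 2.0 * np.eye(n)
           + np.diag(np.ones(n - 1), 1)) / dx**2
    H = -lap / (2.0 * mass) + np.diag(v_k(x))
    E, Phi = np.linalg.eigh(H)   # eigenvalues in ascending order
    Phi /= np.sqrt(dx)           # grid normalization: sum |Phi|^2 dx = 1
    return E, Phi                # Phi[:, i] is the i-th eigenstate

# Sanity check with a harmonic stand-in for a fitted v_k:
x = np.linspace(-5.0, 5.0, 400)
E, Phi = solve_1d_schrodinger(lambda q: 0.5 * q**2, x)
# E[:3] should be close to [0.5, 1.5, 2.5]
```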

We can compute each marginal probability density function $P(X_k)$ using the squared probability amplitude, which is the pdf for $X_k$ conditioned on the energy $\mathscr{E}$:

$$P(X_k = x \,|\, \mathscr{E} = E_i) \,=\, \left| \Phi_i^*(x)\, \Phi_i(x) \right|$$

$$P(X_k) \,=\, \sum_i \, P(X_k \,|\, \mathscr{E} = E_i) \, P(\mathscr{E} = E_i)$$
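Putting the last two formulas into code, a sketch of the marginal pdf on the solver’s grid. The formulas above leave $P(\mathscr{E} = E_i)$ open, so the Boltzmann form and the room-temperature value below are assumptions:

```python
import numpy as np

# kT for ~300 K in hartree; both the Boltzmann form of P(E = E_i)
# and the temperature are assumptions, not part of the derivation.
KT_300K = 9.5e-4

def marginal_pdf(E, Phi, dx, kT=KT_300K):
    """P(X_k) on the grid: mixture of |Phi_i|^2 weighted by P(E = E_i)."""
    w = np.exp(-(E - E[0]) / kT)
    w /= w.sum()                    # P(E = E_i), Boltzmann prior
    pdf = (np.abs(Phi) ** 2) @ w    # sum_i |Phi_i(x)|^2 P(E = E_i)
    return pdf / (pdf.sum() * dx)   # renormalize on the grid

# pdf_k = marginal_pdf(E, Phi, x[1] - x[0])  # using the solver output above
```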

The motivation for doing this is that we can now be much more intelligent about how we construct our data table. Since we only want to expend computational power on a feasible number of data points, we can directly sample the expected distribution of $X$ to increase our confidence that the expected values computed from the data are representative of the “population” means.
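With $P(X_k)$ in hand, drawing new inputs where the density is high is straightforward; a sketch using inverse-CDF sampling on the grid:

```python
import numpy as np

def sample_marginal(x, pdf, n_samples, seed=0):
    """Draw values of X_k from P(X_k) by inverse-CDF sampling, so new
    Q-Chem inputs concentrate where the marginal density is high."""
    rng = np.random.default_rng(seed)
    dx = x[1] - x[0]
    cdf = np.cumsum(pdf) * dx
    cdf /= cdf[-1]                  # guard against discretization error
    return np.interp(rng.random(n_samples), cdf, x)

# new_r_CO = sample_marginal(x, pdf_k, 10)  # ten new C-O stretch inputs
```

Each sampled component feeds back into new input geometries, so the data table grows precisely in the regions that dominate the expected values.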