Bayesian inference in quantum chemistry


Small introduction

In the field of computational quantum chemistry, scientists model molecules using foundational physical models like density functional theory. These methods are electronic, which, according to our good pal of recent popularity, Oppenheimer, is only half the story. Nuclear contributions to the molecular wavefunction include vibrations of the atoms within the molecule. The solution to foundationally accurate molecular simulations is as simple as computing the electronic energies of all possible atomic configurations and assessing the likelihood of finding the molecule in each configuration. The problem is that there's no computationally tractable way to visit every single configuration and run a computation. If this sounds like the setup to a Bayesian problem, you're paying attention.

Take a look at methanol: six atoms, and $6 \times 3 = 18$ input variables, one for each Cartesian coordinate. Each configuration vector $x_j \in \mathbf{R}^{18}$ is a single input to the ab initio predictor, $V(x_j)$. $x$ is realistically constrained to certain spatial values – molecules stay together, after all. These constraints can be represented as stretches, bends, and rotations within the molecule, like this:

Represented numerically, we have a table of three data entries:
| $C_x$ | $C_y$ | $C_z$ | $H^1_x$ | $H^1_y$ | $H^1_z$ | $H^2_x$ | $H^2_y$ | $H^2_z$ | $H^3_x$ | $H^3_y$ | $H^3_z$ | $O_x$ | $O_y$ | $O_z$ | $H^4_x$ | $H^4_y$ | $H^4_z$ | $V$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 16.2364 | 4.55882 | 8.60984 | 17.2446 | 4.35514 | 9.00149 | 15.5804 | 4.89513 | 9.42765 | 15.8392 | 3.63217 | 8.18469 | 16.2648 | 5.5148 | 7.5602 | 16.7107 | 6.32543 | 7.83257 | -115.631 |
| 16.2337 | 4.46458 | 9.10497 | 17.2418 | 4.26089 | 9.10497 | 15.5776 | 4.80088 | 9.53114 | 15.8364 | 3.53793 | 8.28817 | 16.2675 | 5.60905 | 7.72909 | 16.7135 | 6.41968 | 7.83257 | -115.606 |
| 16.2365 | 4.55882 | 8.60984 | 17.2446 | 4.35514 | 9.00149 | 15.5804 | 4.89513 | 9.42765 | 15.8392 | 3.63217 | 8.18469 | 16.2648 | 5.5148 | 7.5602 | 15.5917 | 5.32897 | 6.89489 | -115.629 |
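For concreteness, the three entries in the table above can be held as a small array, with the 18 coordinates as features and the energy as the target. A minimal sketch (NumPy assumed):

```python
import numpy as np

# Three sampled methanol configurations: 18 Cartesian coordinates
# (C, H1, H2, H3, O, H4; x, y, z each) plus the ab initio energy V.
data = np.array([
    [16.2364, 4.55882, 8.60984, 17.2446, 4.35514, 9.00149,
     15.5804, 4.89513, 9.42765, 15.8392, 3.63217, 8.18469,
     16.2648, 5.5148, 7.5602, 16.7107, 6.32543, 7.83257, -115.631],
    [16.2337, 4.46458, 9.10497, 17.2418, 4.26089, 9.10497,
     15.5776, 4.80088, 9.53114, 15.8364, 3.53793, 8.28817,
     16.2675, 5.60905, 7.72909, 16.7135, 6.41968, 7.83257, -115.606],
    [16.2365, 4.55882, 8.60984, 17.2446, 4.35514, 9.00149,
     15.5804, 4.89513, 9.42765, 15.8392, 3.63217, 8.18469,
     16.2648, 5.5148, 7.5602, 15.5917, 5.32897, 6.89489, -115.629],
])

X = data[:, :18]   # configuration vectors x_j in R^18
V = data[:, 18]    # potential energies V(x_j)
print(X.shape, V.shape)  # (3, 18) (3,)
```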

Even with constraints on the internal “relative” positions of the atoms w.r.t. each other (i.e. feature engineering a.k.a. coordinate transformation a.k.a. internal coordinates), visiting every possible input $x$ and computing the energy is prohibitively expensive, even when parallelized.

Don’t forget, in silico quantum mechanics simulation requires constructing and diagonalizing a matrix in a finite basis. Even for small molecules like methane, this is severely limited by memory and computation cost.

Further complicating things, classical simulations such as molecular dynamics (glorified Newtonian mechanics) are not actually foundational, because molecules are quantum mechanical. The consequence is an inaccurate model (validated against data from laboratory measurements/observations).

Many assumptions simplify the model at the cost of some degree of technical accuracy. One assumption is non-interacting motions, meaning mathematically that a molecule’s potential energy $V$ is separable into a sum of one-dimensional terms.

$$V(x) = \sum_{i=1}^{3N} V_i(x_i)$$

Then one needs to train just $3N$ models, one for each axis, and add their results.

The choice of model for ViV_i is critical. A good place to start is the margins. Collect data for each motion independently, fit the appropriate model, and diagonalize its Hamiltonian matrix (how-to is detailed in previous posts). The matrices are now much smaller under this approximation. This is tractable.
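The first half of that recipe (scan each motion, fit a model) can be sketched as follows. The synthetic harmonic scan data stands in for real ab initio calculations, and the quartic polynomial model is an assumption:

```python
import numpy as np

# One 1-D scan per motion: displace a single coordinate, record the energy.
# Synthetic harmonic data stands in for real ab initio scans here.
n_modes = 3
scans = []
for i in range(n_modes):
    xi = np.linspace(-0.5, 0.5, 21)      # displacement along motion i
    Vi = 0.5 * (i + 1) * xi**2           # stand-in for computed V_i(x_i)
    scans.append((xi, Vi))

# Fit one polynomial model per motion (degree 4 is an assumption).
models = [np.polynomial.Polynomial.fit(xi, Vi, 4) for xi, Vi in scans]

# Separable prediction: V(x) = sum_i V_i(x_i)
def V_total(x):
    return sum(m(xi) for m, xi in zip(models, x))

print(V_total([0.1, 0.1, 0.1]))
```

Each 1-D fit sees only its own scan, which is exactly what the non-interacting-motions assumption permits.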

$$\begin{aligned} \hat{H} &= -\frac{1}{2} \sum_{i=1}^{3N} \nabla_i^2 + \sum_{i=1}^{3N} V_i \\ &= \sum_{i=1}^{3N} \hat{H}_i \end{aligned}$$

$$\hat{H}_i \ket{n_i} = E_{i,n_i} \ket{n_i}$$
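Each per-motion eigenproblem can be solved by representing $\hat{H}_i$ in a small finite basis, here a position grid with a finite-difference kinetic term. The grid, units, and harmonic test potential are assumptions (the how-to in the previous posts may differ):

```python
import numpy as np

# Grid representation of one motion, H_i = -1/2 d^2/dx^2 + V_i (atomic units).
n, L = 201, 10.0
x = np.linspace(-L / 2, L / 2, n)
dx = x[1] - x[0]

Vi = 0.5 * x**2                          # harmonic test potential, omega = 1

# Kinetic term via a central finite-difference second derivative
D2 = (np.diag(np.full(n, -2.0)) +
      np.diag(np.ones(n - 1), 1) +
      np.diag(np.ones(n - 1), -1)) / dx**2

H = -0.5 * D2 + np.diag(Vi)
E, states = np.linalg.eigh(H)
print(E[:3])                             # approaches 0.5, 1.5, 2.5
```

A 201×201 matrix per motion is trivial to diagonalize, which is the whole point of the separability approximation.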

The a priori probability under this approximation is therefore separable. The normalization constant is called the partition function in chemistry.

$$\begin{aligned} Z_{\text{approx}}^{qu} &= \sum_{n_1} \cdots \sum_{n_{3N}} e^{-\beta (E_{1,n_1} + \cdots + E_{3N,n_{3N}})} \\ &= \prod_{i=1}^{3N} \left(\sum_{n_i} e^{-\beta E_{i,n_i}} \right) \\ &= \prod_{i=1}^{3N} Z_i^{qu} \end{aligned}$$

$$\begin{aligned} P(n_1, n_2, \dots, n_{3N}) &\approx \frac{e^{-\beta (E_{1,n_1} + E_{2,n_2} + \cdots + E_{3N,n_{3N}})}}{Z_{\text{approx}}^{qu}} \\ &= \prod_{i=1}^{3N} \frac{e^{-\beta E_{i,n_i}}}{Z_i^{qu}} \\ &= \prod_{i=1}^{3N} P(n_i) \end{aligned}$$

Probabilistically, for a hypothetical molecule with only three motions, the probability of the quantum state $\ket{0, 1, 0}$ is the product of three per-motion probabilities.

$$P(0, 1, 0) = \left( \frac{e^{-\beta E_{1,0}}}{Z_1^{qu}} \right)\left( \frac{e^{-\beta E_{2,1}}}{Z_2^{qu}} \right)\left( \frac{e^{-\beta E_{3,0}}}{Z_3^{qu}} \right)$$
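Numerically, this product is a few lines. The inverse temperature, the harmonic level spacings for the three motions, and the truncation of each infinite sum are all assumptions made up for illustration:

```python
import numpy as np

beta = 1.0                       # 1/kT in matching units, an assumption
omegas = [1.0, 1.5, 2.0]         # hypothetical level spacings, one per motion
n_levels = 50                    # truncation of each infinite sum

def Z_i(omega):
    """Per-motion partition function over harmonic levels E_n = omega*(n + 1/2)."""
    n = np.arange(n_levels)
    return np.sum(np.exp(-beta * omega * (n + 0.5)))

def P(ns):
    """Joint probability of state |n1, n2, n3> as a product of per-motion factors."""
    return np.prod([np.exp(-beta * w * (n + 0.5)) / Z_i(w)
                    for n, w in zip(ns, omegas)])

print(P([0, 1, 0]))
```

Summing `P` over all states recovers 1 up to the truncation error, which is a useful sanity check on the factorization.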

But we can improve. Although we cannot compute the joint probability directly, we can compute the a posteriori probability by updating our priors with new data from the joint distribution.

So, for each motion in $x$ and quantum number $n$ pair:

$$P(n \vert x) \propto P(x \vert n)\, P(n)$$

$$P(x \vert n) = \int \mathrm{d}x\, \bra{x} \hat{\alpha}_n \ket{x}$$

$$\hat{\alpha} = \alpha(x)$$

The prior on $n$ is what we just defined, so what’s left to obtain the posterior distribution is the likelihood of a given outcome ($n$). This is easy, considering all energies are distributed in the canonical ensemble with an exponential likelihood. Therefore, the likelihood of an observation $\mathscr{E} = E$ is

$$p(\mathscr{E} = E \vert n) = e^{-\beta (E - E_n)}$$
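Combining this likelihood with the Boltzmann prior gives a posterior over $n$ for a single motion. In the sketch below the harmonic levels, the observed energy, and the restriction to $E \ge E_n$ (which keeps the update well-defined) are all added assumptions:

```python
import numpy as np

beta, omega = 1.0, 1.0
n = np.arange(30)
E_n = omega * (n + 0.5)            # hypothetical harmonic levels of one motion

prior = np.exp(-beta * E_n)
prior /= prior.sum()               # Boltzmann prior P(n)

E_obs = 2.2                        # an observed energy, made up for illustration
# Exponential likelihood p(E_obs | n); zero where E_obs < E_n is an
# added assumption, since the likelihood above is unnormalized there.
lik = np.where(E_obs >= E_n, np.exp(-beta * (E_obs - E_n)), 0.0)

posterior = lik * prior
posterior /= posterior.sum()       # normalized posterior P(n | E_obs)
print(posterior[:4])
```

Note that the $e^{\beta E_n}$ in this likelihood exactly cancels the Boltzmann factor in the prior, so the posterior comes out uniform over the levels with $E_n \le E$; any sharper structure has to come from the configuration factor $P(x \vert n)$.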

$$P(E) = \sum_{n_1, \dots, n_N} \int \cdots \int P(E \vert x_1, \dots, x_N)\, P(x_1, \dots, x_N \vert n_1, \dots, n_N)\, P(n_1, \dots, n_N)\, \mathrm{d}x_1 \cdots \mathrm{d}x_N$$