Small introduction

In computational quantum chemistry, scientists model molecules using foundational physical models like density functional theory. These methods are electronic, which, according to our recently popular pal Oppenheimer (of the Born-Oppenheimer approximation), is only half the story. Nuclear contributions to the molecular wavefunction include vibrations of the atoms within the molecule. In principle, a foundationally accurate molecular simulation is as simple as computing the electronic energy of every possible atomic configuration and assessing the likelihood of finding the molecule in each one. The problem is that there’s no computationally tractable way to visit every single configuration and run a computation. If this sounds like the setup to a Bayesian problem, you’re paying attention.

Take a look at methanol: six atoms, and 6 $\times$ 3 = 18 input variables, one for each Cartesian coordinate. Each configuration vector $x_j \in \mathbf{R}^{18}$ is a single input to the ab initio predictor, $V(x_j)$. Realistically, $x$ is constrained to certain spatial values; molecules stay together, after all. These constraints can be represented as stretches, bends, and rotations within the molecule.

Represented numerically, we have a table of three data entries:

$C_x$	$C_y$	$C_z$	$H^1_x$	$H^1_y$	$H^1_z$	$H^2_x$	$H^2_y$	$H^2_z$	$H^3_x$	$H^3_y$	$H^3_z$	$O_x$	$O_y$	$O_z$	$H^4_x$	$H^4_y$	$H^4_z$	$V$
16.2364	4.55882	8.60984	17.2446	4.35514	9.00149	15.5804	4.89513	9.42765	15.8392	3.63217	8.18469	16.2648	5.5148	7.5602	16.7107	6.32543	7.83257	-115.631
16.2337	4.46458	9.10497	17.2418	4.26089	9.10497	15.5776	4.80088	9.53114	15.8364	3.53793	8.28817	16.2675	5.60905	7.72909	16.7135	6.41968	7.83257	-115.606
16.2365	4.55882	8.60984	17.2446	4.35514	9.00149	15.5804	4.89513	9.42765	15.8392	3.63217	8.18469	16.2648	5.5148	7.5602	15.5917	5.32897	6.89489	-115.629

Even with constraints on the internal “relative” positions of the atoms w.r.t. each other (i.e. feature engineering a.k.a. coordinate transformation a.k.a. internal coordinates), visiting every possible input $x$ and computing the energy is prohibitively expensive, even when parallelized.
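As a rough illustration of the coordinate transformation mentioned above, the sketch below reshapes the three table rows into per-atom positions and computes two internal coordinates, the C-O and O-H bond lengths. The atom order is taken from the table header; the units (angstroms and hartrees) and the helper name `bond_length` are assumptions made for this example, not something stated in the post.

```python
import numpy as np

# The three table rows: 18 Cartesian coordinates followed by the energy V.
# Atom order follows the header: C, H1, H2, H3, O, H4.
# Assumption: coordinates in angstroms, energies in hartrees (not stated above).
rows = np.array([
    [16.2364, 4.55882, 8.60984, 17.2446, 4.35514, 9.00149,
     15.5804, 4.89513, 9.42765, 15.8392, 3.63217, 8.18469,
     16.2648, 5.5148, 7.5602, 16.7107, 6.32543, 7.83257, -115.631],
    [16.2337, 4.46458, 9.10497, 17.2418, 4.26089, 9.10497,
     15.5776, 4.80088, 9.53114, 15.8364, 3.53793, 8.28817,
     16.2675, 5.60905, 7.72909, 16.7135, 6.41968, 7.83257, -115.606],
    [16.2365, 4.55882, 8.60984, 17.2446, 4.35514, 9.00149,
     15.5804, 4.89513, 9.42765, 15.8392, 3.63217, 8.18469,
     16.2648, 5.5148, 7.5602, 15.5917, 5.32897, 6.89489, -115.629],
])

X = rows[:, :18].reshape(-1, 6, 3)  # (configuration, atom, xyz)
V = rows[:, 18]                     # one energy per configuration

C, O, H4 = 0, 4, 5  # atom indices within each configuration


def bond_length(X, a, b):
    """Distance between atoms a and b in every configuration (hypothetical helper)."""
    return np.linalg.norm(X[:, a] - X[:, b], axis=1)


r_co = bond_length(X, C, O)
r_oh = bond_length(X, O, H4)
print("C-O:", r_co)
print("O-H:", r_oh)
```

For these rows, the O-H distance is essentially the same in rows 1 and 3 even though the hydroxyl hydrogen’s Cartesian coordinates differ, which is what a bend or rotation (rather than a stretch) looks like in internal coordinates.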

Don’t forget, in silico quantum mechanics simulation requires constructing and diagonalizing a matrix in a finite basis. Even for small molecules like methane, this is severely limited by memory and computation cost.

Further complicating things, classical simulations such as molecular dynamics (glorified Newtonian mechanics) are not actually foundational, because molecules are quantum mechanical. The consequence is a model that turns out inaccurate when validated against laboratory measurements and observations.

Many assumptions simplify the model at the cost of some degree of technical accuracy. One assumption is non-interacting motions, meaning that, mathematically, a molecule’s potential energy $V$ is additively separable: a sum of one-dimensional terms.

V(\mathbf{x}) \,=\, \sum_{i=1}^{3N} V_i(x_i)

Then one needs to train just $3N$ models, one for each coordinate, and add their results.
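A minimal sketch of this idea: build one 1-D model per coordinate and sum their outputs. The harmonic form and the force constants below are placeholders chosen for illustration; in practice each $V_i$ would be fitted to ab initio data.

```python
import numpy as np


def make_harmonic(k, x0=0.0):
    """Hypothetical 1-D model V_i(x_i) = 0.5 * k * (x_i - x0)**2."""
    return lambda x: 0.5 * k * (x - x0) ** 2


def separable_potential(models):
    """Return V(x) = sum_i V_i(x_i) for a list of 1-D models."""
    def V(x):
        x = np.asarray(x, dtype=float)
        return sum(model(xi) for model, xi in zip(models, x))
    return V


# Three made-up coordinates with force constants 1, 2, and 3.
models = [make_harmonic(k) for k in (1.0, 2.0, 3.0)]
V = separable_potential(models)

total = V([1.0, 1.0, 1.0])  # 0.5 + 1.0 + 1.5
print(total)
```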

The choice of model for $V_i$ is critical. A good place to start is the margins. Collect data for each motion independently, fit the appropriate model, and diagonalize its Hamiltonian matrix (how-to is detailed in previous posts). The matrices are now much smaller under this approximation. This is tractable.

\begin{aligned} \hat{H} &= -\frac{1}{2} \sum_{i=1}^{3N} \nabla_i^2 + \sum_{i=1}^{3N} V_i \\ &= \sum_{i=1}^{3N} \hat{H}_i \\ \end{aligned}

\hat{H}_i \ket{n_i} = E_{i,n_i} \ket{n_i}
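To make the one-dimensional eigenproblem concrete, here is a sketch that diagonalizes a single $\hat{H}_i$ on a finite-difference grid. This is one simple choice of finite basis, not necessarily the one used in the previous posts; the function name `solve_1d`, the grid parameters, and the harmonic test potential are all assumptions for illustration (atomic units, unit mass).

```python
import numpy as np


def solve_1d(potential, x_min=-10.0, x_max=10.0, n_points=501, n_states=4):
    """Lowest eigenpairs of H = -1/2 d^2/dx^2 + V(x) on a uniform grid."""
    x = np.linspace(x_min, x_max, n_points)
    dx = x[1] - x[0]
    # Central-difference kinetic energy: +1/dx^2 on the diagonal,
    # -1/(2 dx^2) on the first off-diagonals.
    diagonal = 1.0 / dx**2 + potential(x)
    off_diagonal = np.full(n_points - 1, -0.5 / dx**2)
    H = np.diag(diagonal) + np.diag(off_diagonal, 1) + np.diag(off_diagonal, -1)
    energies, states = np.linalg.eigh(H)  # H is real and symmetric
    return energies[:n_states], states[:, :n_states]


# Harmonic test potential, whose exact levels are n + 1/2.
energies, states = solve_1d(lambda x: 0.5 * x**2)
print(energies)
```

Each matrix here is only `n_points` by `n_points`, which is the practical payoff of the separable approximation: $3N$ small diagonalizations instead of one over a product grid.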

The a priori probability under this approximation is therefore separable. Its normalization constant is called the partition function in chemistry, and $\beta = 1/(k_B T)$ is the inverse temperature.

\begin{aligned} Z_{\text{approx}}^{qu} &= \sum_{n_1} \cdots \sum_{n_{3N}} e^{-\beta (E_{1,n_1} + \cdots + E_{3N,n_{3N}})} \\ &= \prod_{i=1}^{3N} \left(\sum_{n_i} e^{-\beta E_{i,n_i}} \right) \\ &= \prod_{i=1}^{3N} Z_i^{qu} \end{aligned}

\begin{aligned} P(n_1, n_2, \dots, n_{3N}) &\approx \frac{e^{-\beta (E_{1,n_1} + E_{2,n_2} + \dots + E_{3N,n_{3N}})}}{Z_{\text{approx}}^{qu}} \\ &= \prod_{i=1}^{3N} \frac{e^{-\beta E_{i,n_i}}}{Z_i^{qu}} \\ &= \prod_{i=1}^{3N} P(n_i) \end{aligned}

For example, the probability of finding a hypothetical molecule with only three motions in the quantum state $\ket{0, 1, 0}$ is the product of three probabilities.

P(0, 1, 0) = \left( \frac{e^{-\beta E_{1,0}}}{Z_1^{qu}} \right)\left( \frac{e^{-\beta E_{2,1}}}{Z_2^{qu}} \right)\left( \frac{e^{-\beta E_{3,0}}}{Z_3^{qu}} \right)
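The factorization is easy to check numerically. The sketch below uses three made-up modes with truncated harmonic level ladders (the frequencies, the truncation at 20 levels, and $\beta = 1$ are all assumptions), compares the product of per-mode partition functions against a brute-force sum over every joint state, and evaluates $P(0, 1, 0)$ both ways.

```python
import itertools
import numpy as np

beta = 1.0     # assumed inverse temperature
n_levels = 20  # assumed truncation of each mode's ladder

# Hypothetical energy levels E_{i,n} = w_i * (n + 1/2) for three modes.
levels = [w * (np.arange(n_levels) + 0.5) for w in (1.0, 1.5, 2.0)]


def mode_probabilities(E, beta):
    """Boltzmann probabilities P(n_i) and partition function Z_i for one mode."""
    weights = np.exp(-beta * E)
    Z = weights.sum()
    return weights / Z, Z


probs, Zs = zip(*(mode_probabilities(E, beta) for E in levels))

# Partition function as a product of per-mode factors ...
Z_product = np.prod(Zs)
# ... and as a brute-force sum over all joint states (n_1, n_2, n_3).
Z_brute = sum(
    np.exp(-beta * (levels[0][a] + levels[1][b] + levels[2][c]))
    for a, b, c in itertools.product(range(n_levels), repeat=3)
)

# P(0, 1, 0) as a product of marginals and directly from the joint weight.
p_010_product = probs[0][0] * probs[1][1] * probs[2][0]
p_010_direct = np.exp(-beta * (levels[0][0] + levels[1][1] + levels[2][0])) / Z_brute

print(Z_product, Z_brute)
print(p_010_product, p_010_direct)
```

The brute-force sum already needs 8,000 terms for three modes with 20 levels each, while the product form needs only 60, which is why the separable prior is the tractable starting point.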

But we can improve. Although we cannot compute the joint probability directly, we can compute the a posteriori probability by updating our priors with new data from the joint distribution.

So, for each pair of a motion $x$ and a quantum number $n$:

P(n \vert x) = \frac{P(x \vert n) P(n)}{P(x)}

P(x \vert n) = \int \mathrm{d}{x} \bra{x}{\hat{\alpha}_n}\ket{x}

\hat{\alpha} = \alpha(x)
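For a discrete quantum number, the update itself is a few lines: multiply the prior by the likelihood and normalize by the evidence. In the sketch below the prior is a Boltzmann distribution over four hypothetical levels, and the likelihood values standing in for $P(x \vert n)$ are invented purely for illustration; they are not computed from the operator $\hat{\alpha}_n$ above.

```python
import numpy as np


def posterior(prior, likelihood):
    """Bayes update for a discrete variable: P(n|x) = P(x|n) P(n) / P(x)."""
    unnormalized = np.asarray(likelihood, dtype=float) * np.asarray(prior, dtype=float)
    evidence = unnormalized.sum()  # P(x) = sum_n P(x|n) P(n)
    return unnormalized / evidence


beta = 1.0
E = np.arange(4) + 0.5  # hypothetical levels for one motion

# Separable Boltzmann prior P(n) for this motion.
prior = np.exp(-beta * E)
prior /= prior.sum()

# Made-up likelihoods P(x|n) for one observed x (illustrative only).
likelihood = np.array([0.10, 0.60, 0.25, 0.05])

post = posterior(prior, likelihood)
print(prior)
print(post)
```

With these numbers the posterior shifts weight from the ground state toward $n = 1$, since that is where the invented likelihood is largest.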

The prior on $n$ is what we just defined, so what’s left to obtain the posterior distribution is the likelihood of a given outcome ($n$). This is easy, considering all energies are distributed in the canonical ensemble with an exponential likelihood. Therefore, the likelihood of an observation $\mathscr{E}=E$ is

p(\mathscr{E}=E \vert n) = e^{-\beta (E-E_n)}

P(E) = \int \cdots \int P(E \vert x_1, \dots, x_N) P(x_1, \dots, x_N \vert n_1, \dots, n_N) P(n_1, \dots, n_N)

Lance A. Bettinson

Bayesian inference in quantum chemistry
