Coupled cluster (CC) theory is a powerful quantum mechanical approach widely used in computational chemistry and physics to describe the electronic structure of atoms, molecules, and materials.
It offers a systematic and rigorous framework for accurate predictions of molecular properties and reactions by accounting for electron correlation effects beyond the mean-field approximation.
Among the various variants of the CC method, the CCSD(T) method stands as the gold standard of quantum chemistry.
CCSD(T), which incorporates singles, doubles, and a perturbative treatment of triples, has demonstrated exceptional accuracy and reliability, making it one of the preferred choices for benchmark calculations and highly accurate predictions.
The CCSD(T) method has found successful applications in a diverse range of areas, including spectroscopy,\cite{villa_2011,watson_2016,vilarrubias_2020} reaction kinetics,\cite{dontgen_2015,castaneda_2012} and materials design,\cite{zhang_2019} and has played a pivotal role in advancing our understanding of complex chemical phenomena.
In the context of CC theory, perturbative triples represent an important contribution to the accuracy of electronic structure calculations.\cite{stanton_1997}
However, the computational cost associated with their calculation can be prohibitively high, especially for large molecular systems.
The CCSD(T) method, which includes the perturbative treatment of triples, is known to have a computational scaling of $\order{N^7}$, where $N$ represents the system size.
This scaling can rapidly become impractical, posing significant challenges in terms of computational resources and time requirements.
To address this computational bottleneck, our goal is to develop a novel semi-stochastic algorithm that brings back the computational time to a level comparable to that of the CCSD method, which has a scaling of $\order{N^6}$, while ensuring well-controlled approximations.
By incorporating stochastic sampling techniques, our approach provides an alternative avenue for approximating perturbative triples, relieving the computational burden inherent in traditional deterministic methods. This not only reduces the computational time to a more favorable level but also preserves the parallelism capabilities of CC calculations, ensuring efficient utilization of computational resources.
In the following sections of this paper, we will provide a brief introduction to the computation of perturbative triples in coupled cluster theory. We will explain the principles of our semi-stochastic algorithm, outlining its key features and advantages. Additionally, we will present implementation details, discussing the technical aspects and considerations involved in the algorithm's practical realization. To demonstrate the effectiveness and applicability of our approach, we finally present illustrative examples that showcase the algorithm's performance and compare it with the conventional algorithm.
\caption{\label{fig:benzene} Convergence of the energy of benzene as a function of the execution time of the program. The top curve corresponds to the cc-pVTZ basis set and the bottom curve to cc-pVQZ. The blue line represents the exact CCSD(T) energy.}
\caption{\label{fig:benzene_err} Convergence of the statistical error of the perturbative triples contribution in benzene as a function of the percentage of computed contributions, in the cc-pVTZ and cc-pVQZ basis sets.}
\end{figure}
Figure~\ref{fig:benzene} shows the convergence of the CCSD(T) energy as a function of the execution time of the program with both basis sets. We observe that the exact CCSD(T) energy is alwasy within $2\sigma$, showing that the statistical error is reliable.
Figure~\ref{fig:benzene_err} displays the statistical error as a function of the percentage of computed contributions.
In both figures, the discontinuities in the curves are due to changes in the splitting between deterministic and stochastic components, leading to a change in the estimated value, and a reduction of the statistical error.
For the both basis sets, the chemical accuracy (\SI{1.6}{\milli\hartree}) is reached using less than 1\% of the contributions.
A precision of (\SI{0.1}{\milli\hartree}) is obtained using respectively 32\% and 15\% for the cc-pVTZ and cc-pVQZ basis sets.
The faster convergence for the largest basis set was expected: using a larger basis set increases the number of tiny contributions, and keeps the number of large contributions rather constant. Hence, this curve shows that the proposed algorithm is better adapted to fewer electrons in large basis sets than many electrons in small basis sets.
Our methodology proves especially advantageous for scenarios requiring the aggregation of numerous CCSD(T) energies, such as neural network training or the exploration of potential energy surfaces.
In this section, we discuss the application of our novel algorithm within the context of computing vibrational frequencies, specifically through the example of copper chloride (\ce{CuCl}).
A demonstrative application presented here involves the determination of the equilibrium bond length and the computation of the vibrational frequency of \ce{CuCl} using the CCSD(T)/cc-pVQZ level of theory.
The procedure involves determining the CCSD(T) potential energy curve for \ce{CuCl}, followed by its analytical representation through a Morse potential fitting:
where $E(r)$ represents the energy at a bond length $r$, $D_e$ the depth of the potential well, $r_e$ the equilibrium bond length, $a$ the parameter defining the potential well's width, and $E_0$ the energy at the equilibrium bond length. The vibrational frequency, $\nu$, is derived as follows:
with $\mu$ denoting the reduced mass of the \ce{CuCl} molecule, and $c$ the speed of light.
\begin{figure}
\includegraphics[width=\columnwidth]{cucl.pdf}
\caption{\label{fig:cucl} CCSD(T) energies of CuCl obtained with the exact CCSD(T) algorithm (dots), the stochastic algorithm using only 1\% of the contributions (error bars), and the Morse potential fitting the points obtained with the stochastic algorithm.}
\end{figure}
The initial step involved the precise calculation of the CCSD(T) energy across various points along the potential curve.
We froze the six lowest molecular orbitals, specifically the $1s$ orbital of \ce{Cl} and the $1s$, $2s$, and $2p$ orbitals of \ce{Cu}, and correlated 34 electrons within 157 molecular orbitals.
The fitted Morse potential revealed a vibrational frequency of $\nu=\SI{414.7}{\per\centi\meter}$ and an equilibrium bond length of $r_e =\SI{3.92}{\bohr}$, aligning remarkably well with experimental values $\nu=\SI{414}{\per\centi\meter}$ and $r_e =\SI{3.88}{\bohr}$.
Subsequently, we applied our semi-stochastic algorithm to estimate the perturbative triples correction, utilizing merely 1\% of the total contributions.
This approach yielded a hundredfold acceleration in computational efficiency, achieving statistical uncertainty within the range of \SI{1.3} to \SI{2.5}{\milli\hartree}.
The vibrational frequency and equilibrium distance estimated using this data, $\nu=\SI{415.0}{\per\centi\meter}$ and $r_e =\SI{3.92}{\bohr}$, demonstrated comparable precision to the full computational results.
Figure \ref{fig:cucl} illustrates the potential energy surface of \ce{CuCl}, displaying both the exact CCSD(T) energies and those estimated via the semi-stochastic method.