Algorithm

Anthony Scemama 2024-04-16 14:16:19 +02:00
parent 3986bf4062
commit a83cdf77d1
4 changed files with 78 additions and 28 deletions

@@ -1,20 +1,16 @@
# 1
# 2
# 4
# 8
#16
#32
#64
# TZ : AMD EPYC 7402 24-Core Processor
#1
#2
#4
#6
#12
#24
#32 123.718727827072
#48 121.613038063049
# DZ: AMD EPYC
1 266. 0.1015625
2 133.202792882919 0.203125
4 68.3963158130646 0.40625
8 35.3168480396271 0.8125
16 17.4276471138000 1.625
24 11.5433599948883
28 10.3698871135712
32 9.43897294998169 3.25 12.1150507926941
40 8.34011387825012
48 7.40271902084351
56 6.72331714630127
64 6.40971302986145 6.5
# DZ: ARM Q80

@@ -260,3 +260,16 @@ @article{watson_2016,
publisher = {North-Holland},
doi = {10.1016/0009-2614(91)87003-T}
}
@article{garniron_2017,
author = {Garniron, Yann and Scemama, Anthony and Loos, Pierre-Fran{\c{c}}ois and Caffarel, Michel},
title = {{Hybrid stochastic-deterministic calculation of the second-order perturbative contribution of multireference perturbation theory}},
journal = {J. Chem. Phys.},
volume = {147},
number = {3},
year = {2017},
month = jul,
issn = {0021-9606},
publisher = {AIP Publishing},
doi = {10.1063/1.4992127}
}

@@ -121,13 +121,20 @@ that were previously computationally prohibitive.
\section{Introduction}
\label{sec:introduction}
Coupled cluster (CC) theory is a powerful quantum mechanical approach widely used in computational chemistry and physics to describe the electronic structure of atoms, molecules, and materials.
Coupled cluster (CC) theory is an accurate quantum mechanical approach widely used in computational chemistry and physics to describe the electronic structure of atoms, molecules, and materials.
It offers a systematic and rigorous framework for accurate predictions of molecular properties and reactions by accounting for electron correlation effects beyond the mean-field approximation.
Among the various variants of the CC method, the Coupled Cluster Singles and Doubles with perturbative Triples method, CCSD(T), stands as the gold standard of quantum chemistry.
CC theory starts with a parametrized wave function, typically referred to as the CC wave function, which is expressed as an exponential series of excitation operators acting on a reference:
\begin{equation}
\ket{\Psi_{\text{CC}}} = e^{\hat{T}} \ket{\Phi}
\end{equation}
where $\ket{\Phi}$ is the reference determinant, and $\hat{T}$ is the cluster operator representing single, double, triple, and higher excitations from the reference wave function.
Coupled Cluster with Singles and Doubles (CCSD) includes single and double excitations and represents the most commonly used variant of CC theory due to its favorable balance between accuracy and computational cost.
Coupled Cluster with Singles, Doubles, and perturbative Triples (CCSD(T)) incorporates a perturbative correction to the CCSD energy to account for some higher-order correlation effects, and stands as the gold standard of quantum chemistry.
CCSD(T) has demonstrated exceptional accuracy and reliability, making it one of the preferred choices for benchmark calculations and highly accurate predictions.
It has found successful applications in a diverse range of areas, including spectroscopy,\cite{villa_2011,watson_2016,vilarrubias_2020} reaction kinetics,\cite{dontgen_2015,castaneda_2012} and materials design,\cite{zhang_2019} and has played a pivotal role in advancing our understanding of complex chemical phenomena.
In the context of CC theory, perturbative triples represent an important contribution to the accuracy of electronic structure calculations.\cite{stanton_1997}
In the context of CC theory, the perturbative triples correction represents an important contribution to the accuracy of electronic structure calculations.\cite{stanton_1997}
However, the computational cost associated with the calculation of this correction can be prohibitively high, especially for large systems.
The inclusion of the perturbative triples in the CCSD(T) method leads to a computational scaling of $\order{N^7}$, where $N$ is proportional to the number of molecular orbitals.
This scaling can rapidly become impractical, posing significant challenges in terms of computational resources and time requirements.
@@ -196,13 +203,48 @@ In the algorithm proposed by Rendell\cite{rendell_1991}, for each given triplet
\subsection{Stochastic formulation}
\subsection{Test code}
\label{subsec:test_code}
% Include the test code here, if applicable.
We propose an algorithm influenced by the semi-stochastic approach originally developed for computing the Epstein-Nesbet second-order perturbation correction to the energy.\cite{garniron_2017}
The perturbative triples correction is expressed as a sum of corrections, each indexed solely by virtual orbitals:
\begin{equation}
E_{(T)} = \sum_{abc} E^{abc} \text{, where }
E^{abc} = \sum_{ijk} E_{ijk}^{abc}.
\end{equation}
Monte Carlo sampling is employed by drawing samples $E^{abc}$.
The principal advantage of this formulation is that the number of triplet combinations $(a,b,c)$, given by $N_\text{v}^3$, is small enough for all contributions $E^{abc}$ to be stored in memory.
The first time a triplet $(a,b,c)$ is drawn, its contribution $E^{abc}$ is computed and stored.
Subsequent drawings of the same triplet retrieve the value from memory; we refer to this technique as \emph{memoization}.
Thus, the computational expense of evaluating a sample, which scales as $N_\text{o}^3 \times N_\text{v}$, is incurred only once, and all subsequent accesses are computationally trivial.
Consequently, using a number of Monte Carlo samples large enough that each contribution is drawn at least once results in a total computational cost only negligibly higher than that of the exact computation.
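As an illustration of the memoization strategy (and not the actual \textsc{Quantum Package} implementation), a minimal Python sketch using uniform sampling could read as follows, where the routine \texttt{compute\_Eabc} is a hypothetical stand-in for the $\order{N_\text{o}^3 N_\text{v}}$ kernel:
\begin{verbatim}
import random

def sample_triples_uniform(n_virt, n_samples, compute_Eabc):
    """Plain Monte Carlo estimate of E_(T) = sum_{abc} E^{abc} over the
    n_virt**3 virtual-orbital triplets, with memoization."""
    cache = {}                        # (a, b, c) -> E^{abc}, computed once
    acc = 0.0
    for _ in range(n_samples):
        abc = (random.randrange(n_virt),
               random.randrange(n_virt),
               random.randrange(n_virt))
        if abc not in cache:          # expensive O(No^3 Nv) step, paid once
            cache[abc] = compute_Eabc(*abc)
        acc += cache[abc]             # repeated draws are memory look-ups
    return n_virt**3 * acc / n_samples   # unbiased estimate of E_(T)

# Toy usage with a dummy kernel (illustration only):
# est = sample_triples_uniform(10, 5000, lambda a, b, c: 1.0 / (1 + a + b + c))
\end{verbatim}
With uniform sampling the estimate is simply the mean contribution multiplied by the number of triplets; the importance sampling described below replaces the uniform draw and adds the corresponding reweighting.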
To reduce the variance, the samples are drawn according to the probability
\begin{equation}
P(a,b,c) = \frac{1}{\mathcal{N}} \frac{1}{\bar{\epsilon}_{ijk} - \epsilon_a - \epsilon_b - \epsilon_c}
\end{equation}
where $\mathcal{N}$ is the normalization factor ensuring that $\sum_{abc} P(a,b,c) = 1$, and $\bar{\epsilon}_{ijk}$ is the average value of $\epsilon_i + \epsilon_j + \epsilon_k$ over the occupied orbitals:
\begin{equation}
\bar{\epsilon}_{ijk} = \frac{3}{N_\text{o}} \sum_{i=1}^{N_\text{o}} \epsilon_i.
\end{equation}
The perturbative contribution is then computed by
\begin{equation}
E_{(T)} = \mathcal{N} \sum_{abc} P(a,b,c) \, E^{abc} \,
(\bar{\epsilon}_{ijk} - \epsilon_a - \epsilon_b - \epsilon_c).
\end{equation}
This approach reduces the statistical error bars by approximately a factor of two for the same computational expense, for two main reasons: (i) the estimator exhibits reduced fluctuations, and (ii) some triplet combinations are more likely to be selected than others, which enhances the efficiency of memoization.
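A corresponding Python sketch of this importance sampling is given below; the arrays \texttt{eps\_occ} and \texttt{eps\_virt} (orbital energies) and the routine \texttt{compute\_Eabc} are placeholders, and the denominators are written as $\epsilon_a + \epsilon_b + \epsilon_c - \bar{\epsilon}_{ijk}$, assumed positive, which defines the same distribution up to an overall sign convention:
\begin{verbatim}
import itertools, random

def weighted_triples_estimate(eps_occ, eps_virt, n_samples, compute_Eabc):
    """Importance-sampled, memoized estimate of E_(T)."""
    eps_bar = 3.0 * sum(eps_occ) / len(eps_occ)   # mean of eps_i+eps_j+eps_k
    n_virt = len(eps_virt)
    triplets = list(itertools.product(range(n_virt), repeat=3))
    denom = [eps_virt[a] + eps_virt[b] + eps_virt[c] - eps_bar
             for (a, b, c) in triplets]
    weights = [1.0 / d for d in denom]            # P(a,b,c) = weights / norm
    norm = sum(weights)                           # normalization factor
    cache = {}
    acc = 0.0
    for idx in random.choices(range(len(triplets)),
                              weights=weights, k=n_samples):
        abc = triplets[idx]
        if abc not in cache:
            cache[abc] = compute_Eabc(*abc)       # computed once, then reused
        acc += cache[abc] * denom[idx]            # E^{abc} / P, up to norm
    return norm * acc / n_samples
\end{verbatim}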
We employ the inverse transform sampling technique to select samples, storing an array of pairs $\qty(P(a,b,c), (a,b,c))$.
To further reduce the variance, this array is sorted in descending order of $P(a,b,c)$ and partitioned into buckets $B$, such that the sum $\sum_{(a,b,c) \in B} P(a,b,c)$ is as uniform as possible across buckets.
As each bucket is then equiprobable, a sample is defined as a combination of triplets, one drawn from each bucket.
When the values of $E^{abc}$ are skewed, this refinement significantly reduces the variance.
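A schematic construction of these buckets, and of a stratified sample drawing one triplet per bucket, might look as follows in Python (again an illustrative sketch rather than the production code; the greedy partition only approximately equalizes the bucket weights):
\begin{verbatim}
import random

def build_buckets(pairs, n_buckets):
    """Partition (P(a,b,c), (a,b,c)) pairs, sorted by decreasing probability,
    into n_buckets buckets of approximately equal total probability."""
    pairs = sorted(pairs, key=lambda pw: pw[0], reverse=True)
    target = sum(p for p, _ in pairs) / n_buckets
    buckets, current, acc = [], [], 0.0
    for p, abc in pairs:
        current.append((p, abc))
        acc += p
        if acc >= target and len(buckets) < n_buckets - 1:
            buckets.append(current)
            current, acc = [], 0.0
    buckets.append(current)            # remaining pairs go to the last bucket
    return buckets

def draw_sample(buckets):
    """One sample = one triplet per (equiprobable) bucket, each triplet drawn
    within its bucket proportionally to P(a,b,c)."""
    return [random.choices(bucket, weights=[p for p, _ in bucket], k=1)[0][1]
            for bucket in buckets]
\end{verbatim}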
%=================================================================%
\section{Implementation Details}
\subsection{Implementation Details}
\label{sec:implementation}
The algorithm presented as Algorithm~\ref{alg:stoch} was implemented in the \textsc{Quantum Package} software.
@@ -410,7 +452,7 @@ These multiplications exhibit an arithmetic intensity of
I = \frac{2\, {N_\text{o}}^3\, N_\text{v}}{8\, \qty({N_\text{o}}^3 + {N_\text{o}}^2 N_\text{v} + {N_\text{o}} N_\text{v})}
\end{equation}
which is bounded above by approximately $N_\text{o} / 4$ flops/byte, a value that is usually relatively low.
For instance, in the case of benzene with a triple-zeta basis set, the arithmetic intensity is calculated to be 3.52 flops/byte, falling short of the threshold required to attain peak performance on any of the CPUs.
For instance, in the case of benzene with a triple-zeta basis set, the arithmetic intensity is calculated to be 3.33 flops/byte, falling short of the threshold required to attain peak performance on any of the CPUs.
By combining the memory bandwidth with the peak double-precision throughput, we determined the critical arithmetic intensity necessary to achieve peak performance. On the Xeon and ARM CPUs, this critical value stands at approximately 8.4 and 8.8 flops/byte, respectively, while the EPYC CPU exhibits a value of 6.5 flops/byte thanks to its superior memory bandwidth.
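As a rough check of these numbers, the intensity can be evaluated directly from the expression above; the orbital counts in the short Python sketch below (about 15 correlated occupied and 243 virtual orbitals for benzene with a triple-zeta basis) are illustrative assumptions rather than values quoted from our calculations:
\begin{verbatim}
def arithmetic_intensity(n_occ, n_virt):
    """Flops/byte of the O(No^3 x Nv) matrix products: 2 No^3 Nv flops over
    8 bytes per double precision word for the three arrays involved."""
    flops = 2.0 * n_occ**3 * n_virt
    bytes_moved = 8.0 * (n_occ**3 + n_occ**2 * n_virt + n_occ * n_virt)
    return flops / bytes_moved

# Illustrative orbital counts (assumed): ~3.3 flops/byte, below the
# critical 6.5--8.8 flops/byte of the CPUs considered here.
print(arithmetic_intensity(15, 243))
\end{verbatim}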

@@ -59,7 +59,7 @@ fetched from memory. In this way, the $N_o^3 \times N_v$ cost of
computing the sample is paid only once, and all other evaluations are
negligible.
Hence, using an number of Monte Carlo samples large enough such that each
Hence, using a number of Monte Carlo samples large enough such that each
contribution has been drawn at least once will have a computational
cost only negligibly larger than that of the exact computation.
@@ -81,10 +81,9 @@ contributions of the occupied orbitals:
The perturbative contribution is computed as
\[
E_{(T)} = \mathcal{N} \sum_{abc} P(a,b,c) \, E_{abc} \,
(\epsilon_{\text{occ}} - \epsilon_a - \epsilon_b - \epsilon_c)
\]
This modification reduces the statistical error bars by a factor of two
for the same computational cost for two reasons: