Following the recent work of Eriksen \textit{et al.}~[\href{https://arxiv.org/abs/2008.02678}{arXiv:2008.02678 [physics.chem-ph]}], we report the performance of the \textit{Configuration Interaction using a Perturbative Selection made Iteratively} (CIPSI) method on the non-relativistic frozen-core correlation energy of the benzene molecule in the cc-pVDZ basis. Following our usual protocol, we obtain a correlation energy of $-863.4(5)$ m$E_h$ which agrees with the theoretical estimate of $-863$ m$E_h$ proposed by Eriksen \textit{et al.}~using an extensive array of highly-accurate new electronic structure methods.
Although sometimes decried, benchmark sets and their corresponding reference data are undeniably useful to the electronic structure community.
They are indeed essential to validate existing theoretical models, and to bring to light, and subsequently understand, their strengths and, more importantly, their weaknesses.
In that regard, the previous benchmark datasets provided by the \textit{Simons Collaboration on the Many-Electron Problem} have been extremely valuable. \cite{Leblanc_2015,Motta_2017,Williams_2020}
The same comment applies to the excited-state benchmark set of Thiel and coworkers. \cite{Sauer_2009,Schreiber_2008,Silva-Junior_2010a,Silva-Junior_2010b,Silva-Junior_2010c}
With a similar goal in mind, we have recently proposed a large set of highly-accurate vertical transition energies for various types of excited states, thanks to the renaissance of selected configuration interaction (SCI) methods, \cite{Bender_1969,Huron_1973,Buenker_1974} which can now routinely produce excitation energies of near full configuration interaction (FCI) quality for small- and medium-sized organic molecules. \cite{Loos_2018a,Loos_2019,Loos_2020a,Loos_2020b,Loos_2020c}
In a recent preprint, \cite{Eriksen_2020} Eriksen \textit{et al.}~have proposed a blind test for a particular electronic structure problem, inviting several groups around the world to contribute to this endeavour.
In addition to coupled cluster theory with singles, doubles, triples, and quadruples (CCSDTQ), \cite{Oliphant_1991,Kucharski_1992} a large panel of highly-accurate, emerging electronic structure methods were considered:
(ii) three SCI methods including a second-order perturbative correction (ASCI, \cite{Tubman_2016,Tubman_2018,Tubman_2020} iCI, \cite{Liu_2016} and SHCI \cite{Holmes_2016,Holmes_2017,Sharma_2017}),
(iii) a selected coupled-cluster theory method which also includes a second-order perturbative correction (FCCR), \cite{Xu_2018}
(iv) the density-matrix renormalization group approach (DMRG), \cite{White_1992} and
(v) two flavors of FCI quantum Monte Carlo (FCIQMC), \cite{Booth_2009,Cleland_2010} namely AS-FCIQMC \cite{Ghanem_2019} and CAD-FCIQMC. \cite{Deustua_2018}
We refer the interested reader to Ref.~\onlinecite{Eriksen_2020} and its supporting information for additional details on each method and the complete list of references.
Soon after, Lee \textit{et al.}~reported phaseless auxiliary-field quantum Monte Carlo \cite{Motta_2018} (ph-AFQMC) correlation energies for the very same problem. \cite{Lee_2020}
The geometry of benzene was computed at the MP2/6-31G* level and can be found in the supporting information of Ref.~\onlinecite{Eriksen_2020}, alongside its nuclear repulsion and Hartree-Fock energies.
Within the frozen-core approximation, the cc-pVDZ basis set corresponds to an active space of 30 electrons and 108 orbitals, \ie, the Hilbert space of benzene contains of the order of $10^{35}$ Slater determinants.
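As a quick check of these dimensions, the short Python sketch below recovers the 108 active orbitals from the standard cc-pVDZ contraction sizes (14 spherical functions per carbon, 5 per hydrogen) minus the six frozen carbon 1s orbitals, and counts the determinants as $\binom{108}{15}^2$, \ie, all ways of placing 15 $\alpha$ and 15 $\beta$ electrons in 108 spatial orbitals. Spatial and spin symmetries are deliberately ignored in this rough count.

```python
from math import comb, log10

# Standard cc-pVDZ contraction sizes with spherical d functions
N_BF_CARBON = 14    # [3s2p1d]
N_BF_HYDROGEN = 5   # [2s1p]

n_basis = 6 * N_BF_CARBON + 6 * N_BF_HYDROGEN   # C6H6 -> 114 basis functions
n_frozen = 6                                    # one 1s core orbital per carbon
n_orb = n_basis - n_frozen                      # active orbitals

n_elec = 6 * 6 + 6 * 1 - 2 * n_frozen           # 42 electrons minus 12 core electrons
n_alpha = n_beta = n_elec // 2                  # closed-shell singlet

# Number of determinants = (alpha strings) x (beta strings), no symmetry used
n_det = comb(n_orb, n_alpha) * comb(n_orb, n_beta)

print(n_orb, n_elec)                    # 108 30
print(f"N_det ~ 10^{log10(n_det):.1f}")  # N_det ~ 10^35.9
```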
The correlation energies reported in Ref.~\onlinecite{Eriksen_2020} are gathered in Table \ref{tab:energy} alongside the best ph-AFQMC estimate from Ref.~\onlinecite{Lee_2020} based on a CAS(6,6) trial wave function.
The outcome of this work is nicely summarized in the abstract of Ref.~\onlinecite{Eriksen_2020}:
\textit{``In our assessment, the evaluated high-level methods are all found to qualitatively agree on a final correlation energy, with most methods yielding an estimate of the FCI value around $-863$ m$E_h$. However, we find the root-mean-square deviation of the energies from the studied methods to be considerable ($1.3$ m$E_h$), which in light of the acclaimed performance of each of the methods for smaller molecular systems clearly displays the challenges faced in extending reliable, near-exact correlation methods to larger systems.''}
For the sake of completeness and our very own curiosity, we report in this Note the frozen-core correlation energy obtained with a fourth flavor of SCI known as \textit{Configuration Interaction using a Perturbative Selection made Iteratively} (CIPSI), \cite{Huron_1973} which also includes a second-order perturbative (PT2) correction.
The idea behind such methods is to avoid the exponential growth of the CI expansion by retaining only the most energetically relevant determinants, which are selected perturbatively from the FCI space using a second-order energetic criterion.
However, SCI calculations rapidly become extremely tedious as the system size increases, since one eventually hits the exponential wall inherent to these methods.
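The selection step can be illustrated on a toy problem. The Python sketch below assumes a small random, diagonally dominant symmetric matrix standing in for the FCI Hamiltonian (a hypothetical model, not benzene) and uses the Epstein-Nesbet second-order energy contribution $e_\alpha$ of each external determinant $\alpha$ as the selection criterion. At each iteration, the variational space is diagonalized, $e_\alpha$ is evaluated for every external determinant, and the determinants with the largest $\abs{e_\alpha}$ are added. This is a didactic sketch of the generic CIPSI loop, not the {\QP} implementation.

```python
import numpy as np


def toy_hamiltonian(n, coupling=0.05, seed=0):
    """Hypothetical stand-in for an FCI Hamiltonian: increasing diagonal
    (index 0 plays the role of the Hartree-Fock determinant) plus weak
    random symmetric couplings."""
    rng = np.random.default_rng(seed)
    off = coupling * rng.standard_normal((n, n))
    h = 0.5 * (off + off.T)
    np.fill_diagonal(h, np.linspace(0.0, 5.0, n))
    return h


def cipsi(h, n_iter=6, n_add=8):
    """Iterative selection with an Epstein-Nesbet PT2 criterion.
    Returns one (N_det, E_var, E_PT2) tuple per iteration."""
    diag = np.diag(h)
    var = [0]  # start from the reference determinant
    history = []
    for _ in range(n_iter):
        # Variational step: lowest eigenpair in the selected space
        evals, evecs = np.linalg.eigh(h[np.ix_(var, var)])
        e_var, c = evals[0], evecs[:, 0]
        # Perturbative step: contribution of every external determinant
        ext = np.setdiff1d(np.arange(h.shape[0]), var)
        if ext.size == 0:
            history.append((len(var), e_var, 0.0))
            break
        v = h[np.ix_(ext, var)] @ c                  # <alpha|H|Psi_var>
        e_alpha = v**2 / (e_var - diag[ext])         # all <= 0 here
        history.append((len(var), e_var, e_alpha.sum()))
        # Selection: keep the n_add most negative (largest |e_alpha|) terms
        var.extend(ext[np.argsort(e_alpha)[:n_add]].tolist())
    return history


h = toy_hamiltonian(200)
e_exact = np.linalg.eigvalsh(h)[0]
for n_det, e_var, e_pt2 in cipsi(h):
    print(f"{n_det:4d}  E_var={e_var:.6f}  E_var+E_PT2={e_var + e_pt2:.6f}")
print(f"exact lowest eigenvalue: {e_exact:.6f}")
```

Because the variational spaces are nested, $E_\text{var.}$ can only decrease from one iteration to the next and always stays above the exact lowest eigenvalue.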
Recently, the determinant-driven CIPSI algorithm has been efficiently implemented \cite{Giner_2013,Giner_2015} by our group in the open-source programming environment {\QP}, enabling massively parallel computations. \cite{Garniron_2017,Garniron_2018,Garniron_2019}
In particular, we were able to perform highly-accurate calculations of ground- and excited-state energies for small- and medium-sized molecules (including benzene). \cite{Loos_2018a,Loos_2019,Loos_2020a,Loos_2020b,Loos_2020c}
CIPSI is also frequently used to provide accurate trial wave functions for QMC calculations. \cite{Caffarel_2014,Caffarel_2016a,Caffarel_2016b,Giner_2013,Giner_2015,Scemama_2015,Scemama_2016,Scemama_2018,Scemama_2018b,Scemama_2019,Dash_2018,Dash_2019}
The particularity of the current implementation is that the selection step and the PT2 correction are computed \textit{simultaneously} via a hybrid semistochastic algorithm \cite{Garniron_2017,Garniron_2019} (which explains the statistical error associated with the PT2 correction in the following).
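The principle of a hybrid deterministic/stochastic evaluation of a large sum such as $E_\text{PT2}$ can be sketched as follows, under simplifying assumptions: the terms are ranked by hypothetical non-negative weights (such as $\abs{c_i}^2$ of the generating determinants), the largest-weight terms are summed exactly, and the remainder is estimated by importance sampling with probabilities proportional to the weights, which yields an unbiased estimate together with a statistical error bar. The actual algorithm of Refs.~\onlinecite{Garniron_2017,Garniron_2019} differs in its details, and the function names below are illustrative only.

```python
import numpy as np


def semistochastic_sum(weights, term, n_exact, n_samples, seed=0):
    """Estimate sum_i term(i) with an exact head and an importance-sampled tail.

    weights   : non-negative importance weights (hypothetical, e.g. |c_i|^2)
    term      : callable returning the (expensive) i-th contribution
    n_exact   : number of largest-weight terms treated deterministically
    n_samples : number of Monte Carlo draws (>= 2) for the remaining terms
    Returns (estimate, standard error of the stochastic part)."""
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(weights)[::-1]              # decreasing weight
    head, tail = order[:n_exact], order[n_exact:]
    e_exact = sum(term(i) for i in head)           # deterministic part
    if tail.size == 0:
        return e_exact, 0.0
    p = weights[tail] / weights[tail].sum()        # sampling probabilities
    rng = np.random.default_rng(seed)
    picks = rng.choice(tail.size, size=n_samples, p=p)
    # Each draw term(i) / p_i is an unbiased estimate of the tail sum
    draws = np.array([term(tail[k]) / p[k] for k in picks])
    return e_exact + draws.mean(), draws.std(ddof=1) / np.sqrt(n_samples)


# Usage on synthetic data: a few large weights and many small ones
rng = np.random.default_rng(1)
w = rng.random(1000) ** 4
noise = rng.random(1000)


def toy_term(i):
    """Hypothetical negative contributions, roughly proportional to w."""
    return -w[i] * (1.0 + 0.5 * noise[i])


est, err = semistochastic_sum(w, toy_term, n_exact=50, n_samples=2000)
exact = sum(toy_term(i) for i in range(1000))
print(f"estimate {est:.5f} +/- {err:.5f}, exact {exact:.5f}")
```

Sampling proportionally to the weights keeps the variance small when the contributions roughly follow those weights; in the limit of exactly proportional contributions, every draw returns the same value and the error bar vanishes.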
Moreover, a renormalized version of the PT2 correction (dubbed rPT2 below) has been recently implemented for a more efficient extrapolation to the FCI limit. \cite{Garniron_2019}
We refer the interested reader to Ref.~\onlinecite{Garniron_2019} where one can find all the details regarding the implementation of the CIPSI algorithm.
% Computational details
Being late to the party, we obviously cannot report our CIPSI results blindly.
However, following the philosophy of Eriksen \textit{et al.}\cite{Eriksen_2020} and Lee \textit{et al.}, \cite{Lee_2020} we report our results in as neutral a tone as possible, leaving readers free to make up their own minds.
We follow our usual ``protocol'' \cite{Scemama_2018,Scemama_2018b,Scemama_2019,Loos_2018a,Loos_2019,Loos_2020a,Loos_2020b,Loos_2020c} by first performing a preliminary SCI calculation with Hartree-Fock orbitals in order to generate an SCI wave function with at least $10^7$ determinants, from which natural orbitals are then computed.
The total SCI energy is defined as the sum of the variational energy $E_\text{var.}$ (computed via diagonalization of the CI matrix in the reference space) and a second-order perturbative correction $E_\text{(r)PT2}$ which takes into account the external determinants, \ie, the determinants which do not belong to the variational space but are linked to the reference space via a nonzero matrix element. The magnitude of $E_\text{(r)PT2}$ provides a qualitative idea of the ``distance'' to the FCI limit.
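For reference, assuming the Epstein-Nesbet partitioning (a common choice in CIPSI), the unrenormalized second-order correction reads
\begin{equation}
  E_\text{PT2} = \sum_{\alpha} \frac{\abs{\mel{\alpha}{\Hat{H}}{\Psi_\text{var.}}}^2}{E_\text{var.} - \mel{\alpha}{\Hat{H}}{\alpha}},
\end{equation}
where $\Psi_\text{var.}$ is the normalized variational wave function and the sum runs over the external determinants $\alpha$ defined above. The renormalized variant $E_\text{rPT2}$ is defined in Ref.~\onlinecite{Garniron_2019}.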
We then linearly extrapolate the total SCI energy to $E_\text{(r)PT2}=0$ (which effectively corresponds to the FCI limit).
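A minimal version of such a linear extrapolation is an ordinary least-squares fit whose intercept at $E_\text{(r)PT2}=0$ gives the extrapolated energy. The Python sketch below assumes that the fitting error is the standard error of that intercept; the definition behind the numbers quoted in parentheses in this Note may differ. The four data points in the usage example are synthetic and purely illustrative.

```python
from math import sqrt


def linear_extrapolation(pt2, e_tot):
    """Least-squares fit e_tot = a + b * pt2.

    Returns the intercept a (the value extrapolated to pt2 = 0),
    the slope b, and the standard error of a."""
    n = len(pt2)
    if n < 3:
        raise ValueError("need at least three points to estimate a fitting error")
    x_mean = sum(pt2) / n
    y_mean = sum(e_tot) / n
    sxx = sum((x - x_mean) ** 2 for x in pt2)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(pt2, e_tot))
    b = sxy / sxx
    a = y_mean - b * x_mean
    # Residual variance with n - 2 degrees of freedom (two fitted parameters)
    s2 = sum((y - (a + b * x)) ** 2 for x, y in zip(pt2, e_tot)) / (n - 2)
    se_a = sqrt(s2 * (1.0 / n + x_mean ** 2 / sxx))
    return a, b, se_a


# Synthetic four-point example (arbitrary units), lying exactly on a line
pt2 = [-4.0, -3.0, -2.0, -1.0]             # E_(r)PT2 values
e_tot = [-101.2, -100.9, -100.6, -100.3]   # E_var + E_(r)PT2 values
a, b, se_a = linear_extrapolation(pt2, e_tot)
print(f"extrapolated energy: {a:.3f} +/- {se_a:.3f}")
```

Fitting $E_\text{var.}$ instead of $E_\text{var.}+E_\text{(r)PT2}$ against $E_\text{(r)PT2}$ gives the same intercept and the same standard error, since the two data sets differ exactly by the abscissa; only the slope changes, by one.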
Note that, unlike excited-state calculations where it is important to enforce that the wave functions are eigenfunctions of the $\Hat{S}^2$ spin operator, \cite{Applencourt_2018} the present wave functions do not fulfil this property as we aim for the lowest possible energy of a single state. We have found that $\expval*{\Hat{S}^2}$ is, nonetheless, very close to zero ($\sim5\times10^{-3}$ a.u.).
Starting from the same natural orbitals, a Boys-Foster localization procedure \cite{Boys_1960} was performed in several orbital windows: i) core, ii) valence $\sigma$, iii) valence $\pi$, iv) valence $\pi^*$, v) valence $\sigma^*$, vi) the higher-lying $\sigma$ orbitals, and vii) the higher-lying $\pi$ orbitals.
\footnote{MO indices for Boys-Foster localization procedure:
Like the Pipek-Mezey localization, \cite{Pipek_1989} this choice of orbital windows preserves a strict $\sigma$-$\pi$ separation in planar systems such as benzene.
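The role of the orbital windows can be illustrated with a textbook Jacobi-sweep implementation of the Boys-Foster criterion, which maximizes $\sum_i \abs{\mel{i}{\mathbf{r}}{i}}^2$ through successive $2\times2$ orbital rotations. In the Python sketch below, rotations are only allowed between orbitals of the same window, so orbitals belonging to different windows (for instance $\sigma$ and $\pi$) can never mix. The dipole matrices and window indices are hypothetical inputs, and this is not necessarily how the localization is carried out in {\QP}.

```python
import numpy as np


def boys_functional(r):
    """Boys-Foster objective sum_i |<i|r|i>|^2 for dipole matrices r of shape (3, n, n)."""
    return float(np.sum(np.diagonal(r, axis1=1, axis2=2) ** 2))


def boys_localize(r, windows, max_sweeps=100, tol=1e-10):
    """Jacobi-sweep Boys-Foster localization restricted to orbital windows.

    r       : real symmetric dipole matrices (3, n, n) in the starting MO basis
    windows : lists of MO indices; orbitals are only mixed within a window
    Returns (U, r_loc): the orthogonal rotation and the rotated dipole matrices."""
    r = np.array(r, dtype=float)
    n = r.shape[1]
    u = np.eye(n)
    for _ in range(max_sweeps):
        largest_angle = 0.0
        for win in windows:
            for a, i in enumerate(win):
                for j in win[a + 1:]:
                    rij = r[:, i, j]
                    dij = r[:, i, i] - r[:, j, j]
                    A = np.sum(rij**2 - 0.25 * dij**2)
                    B = np.sum(rij * dij)
                    if np.hypot(A, B) < tol:  # objective flat for this pair
                        continue
                    # Optimal pair rotation: cos(4t) ~ -A and sin(4t) ~ B
                    t = 0.25 * np.arctan2(B, -A)
                    if abs(t) < tol:
                        continue
                    g = np.eye(n)
                    g[i, i] = g[j, j] = np.cos(t)
                    g[j, i], g[i, j] = np.sin(t), -np.sin(t)
                    u = u @ g            # new orbital i = cos(t) i + sin(t) j
                    r = g.T @ r @ g      # rotate all three dipole components
                    largest_angle = max(largest_angle, abs(t))
        if largest_angle < tol:
            break
    return u, r


# Usage with hypothetical symmetric "dipole" matrices and two windows
rng = np.random.default_rng(0)
m = rng.standard_normal((3, 6, 6))
r0 = 0.5 * (m + m.transpose(0, 2, 1))
windows = [[0, 1, 2], [3, 4, 5]]   # e.g. a "sigma" and a "pi" window
u, r_loc = boys_localize(r0, windows)
print(boys_functional(r0), "->", boys_functional(r_loc))
```

Each pair rotation maximizes the pair contribution to the objective, so the functional never decreases, and the rotation matrix stays block-diagonal with respect to the windows.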
As one can see from the energies of Table \ref{tab:NOvsLO}, for a given value of $N_\text{det}$, the variational energy as well as the PT2-corrected energies are much lower with localized orbitals than with natural orbitals. We, therefore, consider these energies more trustworthy, and we will base our best estimate of the correlation energy of benzene on these calculations.
The convergence of the CIPSI correlation energy using localized orbitals is illustrated in Fig.~\ref{fig:CIPSI}, where one can see the behavior of the correlation energy, $\Delta E_\text{var.}$ and $\Delta E_\text{var.}+ E_\text{(r)PT2}$, as a function of $N_\text{det}$ (left panel).
The right panel of Fig.~\ref{fig:CIPSI} is more instructive as it shows $\Delta E_\text{var.}$ as a function of $E_\text{(r)PT2}$, and their corresponding four-point linear extrapolation curves that we have used to get our final estimate of the correlation energy.
From this figure, one clearly sees that the rPT2-based correction behaves more linearly than its corresponding PT2 version, and is thus systematically employed in the following.
Our final numbers are gathered in Table \ref{tab:extrap_dist_table}, where, following the notations of Ref.~\onlinecite{Eriksen_2020}, we report, in addition to the final variational energies $\Delta E_{\text{var.}}$, the extrapolation distances, $\Delta E_{\text{dist}}$, defined as the difference between the final computed energy, $\Delta E_{\text{final}}$, and the extrapolated energy, $\Delta E_{\text{extrap.}}$, associated with ASCI, iCI, SHCI, DMRG, and CIPSI.
The three SCI flavors of Ref.~\onlinecite{Eriksen_2020} yield correlation energies ranging from $-860.0$ m$E_h$ (ASCI) to $-864.2$ m$E_h$ (SHCI), while the other non-SCI methods yield correlation energies ranging from $-863.7$ to $-862.8$ m$E_h$ (see Table \ref{tab:energy}). Our final CIPSI estimate (obtained with localized orbitals and the rPT2 correction via a four-point linear extrapolation) is $-863.4(5)$ m$E_h$, where the error reported in parentheses is the fitting error only (not the extrapolation error, for which a theoretically sound estimate is much harder to provide).
Convergence of the CIPSI correlation energy of benzene using localized orbitals.
Left: $\Delta E_\text{var.}$, $\Delta E_\text{var.}+ E_\text{PT2}$, and $\Delta E_\text{var.}+ E_\text{rPT2}$ (in m$E_h$) as functions of the number of determinants in the variational space $N_\text{det}$.
Right: $\Delta E_\text{var.}$ (in m$E_h$) as a function of $E_\text{PT2}$ or $E_\text{rPT2}$.
\caption{Variational energy $E_\text{var.}$, second-order perturbative correction $E_\text{PT2}$ and its renormalized version $E_\text{rPT2}$ (in $E_h$) as a function of the number of determinants $N_\text{det}$ for the ground-state of the benzene molecule computed in the cc-pVDZ basis set.
\caption{Extrapolation distances, $\Delta E_{\text{dist}}$, defined as the difference between the final computed energy, $\Delta E_{\text{final}}$, and the extrapolated energy, $\Delta E_{\text{extrap.}}$ associated with ASCI, iCI, SHCI, DMRG, and CIPSI.
The present calculations have been performed on the AMD partition of GENCI's Irene supercomputer.
Each AMD node of Irene has two AMD Rome (Epyc) CPUs@2.60 GHz, with 64 physical cores per socket, and 256 GiB of RAM.
These nodes are connected via Infiniband HDR100.
The first step of the calculation, \ie, performing a CIPSI calculation up to $N_\text{det}\sim10^7$ with Hartree-Fock orbitals in order to produce natural orbitals, takes roughly 24 hours on a single node, and reaching the same number of determinants with natural orbitals or localized orbitals takes roughly the same amount of time.
A second 24-hour run on 10 distributed nodes was performed to push the selection to $8\times10^7$ determinants, and a third distributed run using 40 nodes was used to reach $16\times10^7$ determinants.
In total, the present calculation required 150k core hours, most of which were spent in the last stage of the computation.
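The core-hour bookkeeping above follows from simple arithmetic, sketched below in Python with the node layout given earlier (two sockets of 64 physical cores, \ie, 128 cores per node). Only the stages whose node count and wall time are stated in the text are evaluated; counting three single-node stages (Hartree-Fock, natural, and localized orbitals) is an assumption, and the wall time of the final 40-node run, which is not given, is not computed.

```python
CORES_PER_NODE = 2 * 64  # dual-socket AMD Rome node, 64 physical cores per socket


def core_hours(n_nodes, wall_hours):
    """Core hours consumed by a run using n_nodes full nodes for wall_hours."""
    return n_nodes * CORES_PER_NODE * wall_hours


single_node_stage = core_hours(1, 24)  # CIPSI up to ~1e7 determinants on one node
ten_node_stage = core_hours(10, 24)    # 24-hour run on 10 nodes up to 8e7 determinants
# Assumption: three single-node stages (Hartree-Fock, natural, and localized orbitals)
stated = 3 * single_node_stage + ten_node_stage
print(single_node_stage, ten_node_stage, stated)  # 3072 30720 39936
print(f"left for the other stages (mostly the 40-node run): ~{150_000 - stated:,} core hours")
```

Under this assumption, the explicitly described stages account for roughly 40k of the 150k core hours, consistent with most of the budget going to the last stage.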