We report a universal density-based basis set incompleteness correction that can be applied to any wave function method.
The present correction relies on short-range correlation density functionals (with multi-determinant reference) from range-separated density-functional theory (RS-DFT) to estimate the basis set incompleteness error and appropriately vanishes in the complete basis set (CBS) limit.
Contrary to conventional RS-DFT schemes which require an \textit{ad hoc} range separation \textit{parameter}$\mu$, the key ingredient here is a range separation \textit{function}$\mu(\bf{r})$ which automatically adapts to the spatial non-homogeneity of the basis set incompleteness error.
As illustrative examples, we show how this density-based correction allows us to obtain CCSD(T) atomization energies near the CBS limit for the G2-1 set of molecules with compact Gaussian basis sets.
For example, while CCSD(T)/cc-pVTZ yields a mean absolute deviation (MAD) of 7.79 kcal/mol compared to CCSD(T)/CBS atomization energies, the CCSD(T)+LDA and CCSD(T)+PBE corrected methods return MAD of 2.89 and 2.46 kcal/mol (respectively) with the same basis.
These values drop below 1 {\kcal} with the cc-pVQZ basis set.
Contemporary quantum chemistry has developed in two directions --- wave function theory (WFT) \cite{Pop-RMP-99} and density-functional theory (DFT). \cite{Koh-RMP-99}
WFT is attractive as it exists a well-defined path for systematic improvement as well as powerful tools, such as perturbation theory, to guide the development of new attractive WFT \textit{ans\"atze}.
The coupled cluster (CC) family of methods is a typical example of the WFT philosophy and is well regarded as the gold standard of quantum chemistry for weakly correlated systems.
By increasing the excitation degree of the CC expansion, one can systematically converge, for a given basis set, to the exact, full configuration interaction (FCI) limit, although the computational cost associated with such improvement is usually pricey.
One of the most fundamental drawback of conventional WFT methods is the slow convergence of energies and properties with respect to the size of the one-electron basis set.
To palliate this, following Hylleraas' footsteps, \cite{Hyl-ZP-29} Kutzelnigg proposed to introduce explicitly the interelectronic distance $r_{12}=\abs{\br{1}-\br{2}}$ to properly describe the electronic wave function around the coalescence of two electrons. \cite{Kut-TCA-85, KutKlo-JCP-91, NogKut-JCP-94}
The resulting F12 methods yields a prominent improvement of the energy convergence, and achieve chemical accuracy for small organic molecules with relatively small Gaussian basis sets. \cite{Ten-TCA-12, TenNog-WIREs-12, HatKloKohTew-CR-12, KonBisVal-CR-12}
For example, at the CCSD(T) level, it is advertised that one can obtain quintuple-$\zeta$ quality correlation energies with a triple-$\zeta$ basis, \cite{TewKloNeiHat-PCCP-07} although computational overheads are introduced by the large auxiliary basis used to resolve three- and four-electron integrals. \cite{BarLoo-JCP-17}
%\trashPFL{Except for these computational considerations, a possible drawback of F12 theory is its quite complicated formulation which requires a deep knowledge in this field in order to adapt F12 theory to a new WFT model.}
To reduce further the computational cost and/or ease the transferability of the F12 correction, approximated and/or universal schemes have recently emerged. \cite{TorVal-JCP-09, KonVal-JCP-10, KonVal-JCP-11, BooCleAlaTew-JCP-2012, IrmHumGru-arXiv-2019, IrmGru-arXiv-2019}
Present-day DFT calculations are almost exclusively done within the so-called Kohn-Sham (KS) formalism, which corresponds to an exact dressed one-electron theory. \cite{KohSha-PR-65}
DFT's attractiveness originates from its very favorable cost/efficiency ratio as it can provide accurate energies and properties at a relatively low computational cost.
Thanks to this, KS-DFT \cite{HohKoh-PR-64, KohSha-PR-65} has become the workhorse of electronic structure calculations for atoms, molecules and solids. \cite{ParYan-BOOK-89}
In the present context, one of the interesting feature of density-based methods is their much faster convergence with respect to the size of the basis set. \cite{FraMusLupTou-JCP-15}
%especially in the range-separated (RS) context where the WFT method is relieved from describing the short-range part of the correlation hole. \cite{TouColSav-PRA-04, FraMusLupTou-JCP-15}
%To obtain accurate results within DFT, one must develop the art of selecting the adequate exchange-correlation functional, which can be classified in various families depending on their physical input quantities. \cite{Bec-JCP-14}
Although there is no clear way on how to systematically improve density-functional approximations, \cite{Bec-JCP-14} climbing the Jacob's ladder of DFT is potentially the most satisfactory way forward. \cite{PerSch-AIPCP-01, PerRuzTaoStaScuCso-JCP-05}
%The local-density approximation (LDA) sits on the first rung of the Jacob's ladder and only uses as input the electron density $n$. \cite{Dir-PCPRS-30, VosWilNus-CJP-80}
%The generalized-gradient approximation (GGA) corresponds to the second rung and adds the gradient of the electron density $\nabla n$ as an extra ingredient.\cite{Bec-PRA-88, PerWan-PRA-91, PerBurErn-PRL-96}
Progress toward unifying WFT and DFT are on-going.
In particular, range-separated DFT (RS-DFT) (see Ref.~\onlinecite{TouColSav-PRA-04} and references therein) rigorously combines these two approaches via a decomposition of the electron-electron (e-e) interaction into a smooth long-range part and a (complementary) short-range part treated with WFT and DFT, respectively.
As the WFT method is relieved from describing the short-range part of the correlation hole around the e-e coalescence points, the convergence with respect to the one-electron basis set is greatly improved. \cite{FraMusLupTou-JCP-15}
Therefore, a number of approximate RS-DFT schemes have been developed using either single-reference \cite{AngGerSavTou-PRA-05, GolWerSto-PCCP-05, TouGerJanSavAng-PRL-09,JanHenScu-JCP-09} or multi-reference \cite{LeiStoWerSav-CPL-97, FroTouJen-JCP-07, FroCimJen-PRA-10, HedKneKieJenRei-JCP-15, FerGinTou-JCP-18} WFT approaches.
%Therefore, a number of approximate RS-DFT schemes have been developed using either single-reference WFT approaches (such as M{\o}ller-Plesset perturbation theory\cite{AngGerSavTou-PRA-05}, coupled cluster\cite{GolWerSto-PCCP-05}, random-phase approximations\cite{TouGerJanSavAng-PRL-09,JanHenScu-JCP-09}) or multi-reference WFT approaches (such as multi-reference CI\cite{LeiStoWerSav-CPL-97}, multiconfiguration self-consistent field\cite{FroTouJen-JCP-07}, multi-reference perturbation theory\cite{FroCimJen-PRA-10}, density-matrix renormalization group\cite{HedKneKieJenRei-JCP-15}, selected CI\cite{FerGinTou-JCP-18}).
The present work proposes an extension of a recently proposed basis set correction scheme based on RS-DFT \cite{GinPraFerAssSavTou-JCP-18} together with the first numerical tests on molecular systems.
The present basis set correction relies on the RS-DFT formalism to capture the missing part of the short-range correlation effects, a consequence of the incompleteness of the one-electron basis set.
Let us assume we have both the energy $\E{\modY}{\Bas}$ and density $\n{\modZ}{\Bas}$ of a $\Ne$-electron system described by two methods $\modY$ and $\modZ$ (potentially identical) in an incomplete basis set $\Bas$.
According to Eq.~(15) of Ref.~\onlinecite{GinPraFerAssSavTou-JCP-18}, assuming that $\E{\modY}{\Bas}$ and $\n{\modZ}{\Bas}$ are reasonable approximations of the \alert{FCI} energy and density within $\Bas$, the exact ground state energy $\E{}{}$ may be written as
is the basis-dependent complementary density functional, $\hT$ is the kinetic operator and $\hWee{}=\sum_{i<j} r_{ij}^{-1}$ is the interelectronic repulsion operator.
In Eq.~\eqref{eq:E_funcbasis}, $\wf{}{\Bas}$ and $\wf{}{}$ are two general $\Ne$-electron wave functions belonging to the Hilbert space spanned by $\Bas$ and the complete basis, respectively.
Importantly, in the limit of a complete basis set (which we refer to as $\Bas\to\infty$), we have, for any density $\n{}{}$, $\lim_{\Bas\to\infty}\bE{}{\Bas}[\n{}{}]=0$, which implies that
where $\E{\modY}{}$ is the energy associated with the method $\modY$ in the complete basis set.
In the case $\modY=\FCI$, we have a strict equality as $\E{\FCI}{}=\E{}{}$.
Provided that the functional $\bE{}{\Bas}[\n{}{}]$ is known exactly, the only source of error at this stage lies in the potential approximate nature of the methods $\modY$ and $\modZ$ for the \titou{FCI} energy and density within $\Bas$, respectively.
Because the e-e cusp originates from the divergence of the Coulomb operator at $r_{12}=0$, a cuspless wave function could equivalently originate from a Hamiltonian with a non-divergent Coulomb interaction at $r_{12}=0$.
Therefore, as we shall do later on, it feels natural to approximate $\bE{}{\Bas}[\n{}{}]$ with short-range density functionals which deal with a smooth long-range electron interaction.
Contrary to the conventional RS-DFT scheme which requires a range-separated \textit{parameter}$\rsmu{}{}$, here we use a range-separated \textit{function}$\rsmu{\Bas}{}(\br{})$ which automatically adapts to quantify the incompleteness of $\Bas$ in $\mathbb{R}^3$.
The first step of the basis set correction consists in obtaining an effective two-electron interaction $\W{\Bas}{}(\br{1},\br{2})$ which represents the effect of the projection of the Coulomb operator in an incomplete basis set $\Bas$.
The present definition ensures that $\W{\Bas}{}(\br{1},\br{2})$ is finite at the e-e coalescence point as long as an incomplete basis set is used, and tends to the genuine, unbounded $r_{12}^{-1}$ Coulomb interaction as $\Bas\to\infty$.
In a second step, we shall link $\W{\Bas}{}(\br{1},\br{2})$ to $\rsmu{\Bas}{}(\br{})$.
and $\Gam{pq}{rs}=\mel*{\wf{}{\Bas}}{\aic{r}\aic{s}\ai{p}\ai{q}}{\wf{}{\Bas}}$ are the opposite-spin pair density and density tensor (respectively) associated with $\wf{}{\Bas}$, $\SO{p}{}$ is a spinorbital,
Note that the divergence condition of $\W{\Bas}{}(\br{1},\br{2})$ in Eq.~\eqref{eq:def_weebasis} ensures that one-electron systems do not have any basis set correction.
As already discussed in Ref.~\onlinecite{GinPraFerAssSavTou-JCP-18}, $\W{\Bas}{}(\br{1},\br{2})$ is symmetric, \textit{a priori} non translational, nor rotational invariant if $\Bas$ does not have such symmetries.
Of course, there exists \textit{a priori} an infinite set of functions in $\mathbb{R}^6$ satisfying \eqref{eq:int_eq_wee}, but thanks to its definition one can show that (see Appendix B of Ref.~\onlinecite{GinPraFerAssSavTou-JCP-18})
%An important point here is that, with the present definition of $\W{\Bas}{}(\br{1},\br{2})$, one can quantify the effect of the incompleteness of $\Bas$ on the Coulomb operator itself as a removal of the divergence of the two-electron interaction near the electron coalescence.
%As shown in Ref.~\onlinecite{GinPraFerAssSavTou-JCP-18}, choosing a HF wave function as $\wf{}{\Bas}$ to define the effective interaction $\W{\Bas}{}(\br{1},\br{2})$ already provides a quantitative representation of the incompleteness of $\Bas$ for weakly correlated systems.
\alert{Because the Coulomb operator within a basis set $\Bas$ is a non divergent two-electron interaction, we can straightforwardly link the present theory with the RS-DFT which uses the so-called long-range interaction which are smooth bounded two-electron operators.}
Although this choice is not unique, we choose here the range-separation function
\PFL{This expression looks like a cheap spherical average.
What about $\rsmu{\Bas}{}(\br{1},\br{2})=\sqrt{\rsmu{\Bas}{}(\br{1})\rsmu{\Bas}{}(\br{2})}$ and a proper spherical average to get $\rsmu{\Bas}{}(r_{12})$?}
As in Ref.~\onlinecite{GinPraFerAssSavTou-JCP-18}, we consider here a specific class of short-range correlation functionals known as ECMD whose general definition reads \cite{TouGorSav-TCA-05}
The choice of ECMD in the present scheme is motivated by the analogy between the definition of $\bE{}{\Bas}[\n{}{}]$ [Eq.~\eqref{eq:E_funcbasis}] and that of the ECMD functionals [Eq.~\eqref{eq:ec_md_mu}].
Indeed, provided that $\w{}{\lr,\rsmu{\Bas}{}}(\br{1},\br{2})\approx\W{\Bas}{}(\br{1},\br{2})$, then the wave function $\wf{}{\rsmu{\Bas}{}}$ coincides with $\wf{}{\Bas}$.
%The ECMD functionals differ from the standard RS-DFT correlation functional by the fact that the reference is not the KS Slater determinant but a multi-determinantal wave function.
%This makes them particularly well adapted to the present context where one aims at correcting a general WFT method.
Following Ref.~\onlinecite{GinPraFerAssSavTou-JCP-18}, we approximate $\bE{}{\Bas}[\n{}{}]$ by the ECMD functionals evaluated with the range separation function $\rsmu{\Bas}{}(\br{})$.
Therefore, we define the LDA version of $\bE{}{\Bas}[\n{}{}]$ as
where $\be{\LDA}{\sr}(\n{}{},\rsmu{}{})$ is the short-range ECMD per particle of the uniform electron gas (UEG) \cite{LooGil-WIRES-16} for which a parametrization can be found in Ref.~\onlinecite{PazMorGorBac-PRB-06}.
The short-range LDA correlation functional defined in Eq.~\eqref{eq:def_lda_tot} relies on the transferability of the physics of the UEG which is certainly valid for large $\mu$ but is known to over correlate for small $\mu$.
In order to correct such a defect, we propose here a new ECMD functional inspired by the recent functional proposed by some of the present authors \cite{FerGinTou-JCP-18} which interpolates between the usual PBE correlation functional $\e{\PBE}{}(\n{}{},\nabla\n{}{})$ for $\rsmu{}{}=0$ and the exact large-$\rsmu{}{}$ behavior, \cite{TouColSav-PRA-04, GoriSav-PRA-06, PazMorGorBac-PRB-06} yielding
The difference between the ECMD PBE functional defined in Ref.~\onlinecite{FerGinTou-JCP-18} and the present expression \eqref{eq:epsilon_cmdpbe} is that we approximate here the \textit{exact} ground-state on-top pair density of the system $\n{2}{}(\br{})$ by its UEG version, i.e.~$\n{2}{}(\br{})\approx\n{2}{\UEG}(\br{})=\n{}{}(\br{})^2 g_0(\n{}{}(\br{}))$, where $g_0(\n{}{})$ is the UEG correlation factor whose parametrization can be found in Eq.~(46) of Ref.~\onlinecite{GorSav-PRA-06}.
This represents a major computational saving without loss of performance as we eschew the computation of $\n{2}{}(\br{})$.
Depending on the functional choice, the complementary functional $\bE{}{\Bas}[\n{\modZ}{}]$ is then equal to $\bE{\LDA}{\sr}[\n{\modZ}{}(\br{}),\rsmu{\Bas}{}(\br{})]$ or $\bE{\PBE}{\sr}[\n{\modZ}{}(\br{}),\rsmu{\Bas}{}(\br{})]$ where $\rsmu{\Bas}{}(\br{})$ is given by Eq.~\eqref{eq:mu_of_r}.
As most WFT calculations are performed within the frozen-core (FC) approximation, it is important to define an effective interaction within a general subset of molecular orbitals.
%We then naturally split the basis set as $\Bas = \Cor \bigcup \Val$, where $\Cor$ and $\Val$ are its core and valence parts, respectively.% and $\Cor \bigcap \Val = \O$.
Therefore, we define the valence-only effective interaction
Defining $\n{\modZ}{\Val}$ as the valence one-electron density obtained with the model $\modZ$, the valence part of the complementary functional $\bE{}{\Val}[\n{\modZ}{\Val}]$ is then evaluated as $\bE{\LDA}{\sr}[\n{\modZ}{\Val}(\br{}),\rsmu{\Bas}{\Val}(\br{})]$ or $\bE{\PBE}{\sr}[\n{\modZ}{\Val}(\br{}),\rsmu{\Bas}{\Val}(\br{})]$.
which scales as $\Nb^2\times N_{elec}^2\times\Ng$ and is embarassingly parallel. Within the present formulation, the bottleneck is the four-index transformation to obtain the two-electron integrals on the MO basis which appear in \eqref{eq:fcoal}. Nevertheless, this step has in general to be performed before a correlated WFT calculations and therefore it represent a minor limitation.
When the four-index transformation become prohibitive, by performing successive matrix multiplications, one could rewrite the equations directly in the AO basis where it scales formally as $\order{\Ng\Nb^4}$ but where one can take advantage of the sparsity atomic-orbital-based algorithms to significantly speed up the calculations.
To conclude this theory session, it is important to notice that the basis set correction proposed here has the folowing properties whatever the approximations made in the DFT part: i) it can be applied to any WFT model that provides an energy and a density, ii) it vanishes for one-electron systems,
Deviation (in \kcal) from CCSD(T)/CBS reference atomization energies obtained with various methods with the cc-pVDZ (top), cc-pVTZ (center) and cc-pVQZ (bottom) basis sets.
The green region corresponds to chemical accuracy (i.e.~error below 1 {\kcal}).
We begin our investigation of the performance of the basis set correction by computing the atomization energies of \ce{C2}, \ce{N2}, \ce{O2} and \ce{F2} obtained with Dunning's cc-pVXZ basis sets (X $=$ D, T, Q and 5).
In the case of \ce{C2} and \ce{N2}, we also perform calculations with the cc-pCVXZ family.
\ce{N2}, \ce{O2} and \ce{F2} are weakly correlated systems and belong to the G2 set, whereas \ce{C2} already contains a non-negligible amount of strong correlation. \cite{BooCleThoAla-JCP-11}
In a second time, we compute the entire atomization energies of the G2 set \cite{CurRagTruPop-JCP-91} composed by 55 molecules with the cc-pVXZ family of basis sets.
This molecular set has been exhausively studied in the last 20 years (see, for example, Refs.~\onlinecite{FelPetDix-JCP-08,Gro-JCP-09,FelPet-JCP-09,NemTowNee-JCP-10,FelPetHil-JCP-11,PetTouUmr-JCP-12,FelPet-JCP-13,KesSylKohTewMar-JCP-18}).
%The reference values for the atomization energies are extracted from Ref.~\onlinecite{HauKlo-JCP-12} and corresponds to frozen-core non-relativistic atomization energies obtained at the CCSD(T)(F12)/cc-pVQZ-F12 level of theory corrected for higher-excitation contributions ($E_\text{CCSDT(Q)/cc-pV(D+d)Z} - E_\text{CCSD(T)/cc-pV(D+d)Z})$.
We refer the interested reader to Refs.~\onlinecite{HolUmrSha-JCP-17, SceGarCafLoo-JCTC-18, LooSceBloGarCafJac-JCTC-18, SceBenJacCafLoo-JCP-18, LooBogSceCafJAc-JCTC-19} for more details.
In the case of the CCSD(T) calculations, we have $\modZ=\HF$ as we use the restricted open-shell Hartree-Fock (ROHF) one-electron density to compute the complementary energy.
For the definition of the interaction, we use a single Slater determinant built in the ROHF basis for the CCSD(T) calculations, and built with the natural orbitals of the converged variational wave function for the exFCI calculations.
Except for the carbon dimer where we have taken the experimental equilibrium bond length (\InAA{1.2425}), all geometries have been extracted from Ref.~\onlinecite{HauJanScu-JCP-09} and have been obtained at the B3LYP/6-31G(2df,p) level of theory.
Frozen core calculations are defined as such: an \ce{He} core is frozen from \ce{Li} to \ce{Ne}, while a \ce{Ne} core is frozen from \ce{Na} to \ce{Ar}.
In the context of the basis set correction, the set of valence spinorbitals $\Val$ involved in the definition of the effective interaction refers to the non-frozen spinorbitals.
In order to estimate the complete basis set (CBS) limit for each model, we employed the two-point extrapolation proposed in Ref.~\onlinecite{HalHelJorKloKocOlsWil-CPL-98} for the correlation energies.
They will be our references for \ce{C2}, \ce{N2}, \ce{O2} and \ce{F2} in a given basis, and the results for these diatomics are reported in Table \ref{tab:diatomics}.
As one can see, the convergence of the exFCI atomization energies is, as expected, slow with respect to the basis set: chemical accuracy (error below 1 {\kcal}) is barely reached for \ce{C2}, \ce{O2} and \ce{F2} even with a cc-pV5Z basis set.
Also, the atomization energies are consistently underestimated, reflecting that, in a given basis, the atom is always better described than the molecule due to the larger number of interacting electron pairs in the molecular system.
A similar trend holds for CCSD(T).
%, and one can notice that the atomization energies of the CCSD(T) are always slightly underestimated with respect to the CIPSI ones, showing that the CCSD(T) ansatz is better suited for the atoms than for the molecule.
Regarding the effect of the basis set correction, several general observations can be made for both exFCI and CCSD(T).
First, in a given basis set, the basis set correction systematically improves the result (both at the LDA and PBE level).
A small overestimation can occur compared to the CBS values by a few tenths of a {\kcal} (the largest deviation being 0.6 {\kcal} for \ce{N2} at the CCSD(T)+PBE/cc-pV5Z level).
Nevertheless, the deviation observed for the largest basis set is typically within the extrapolation error of the CBS atomization energies, which is highly satisfactory knowing the marginal computation cost of the present correction.
%Also, the values obtained with the largest basis sets tends to converge toward a value close to the estimated CBS values.
Importantly, the sensitivity with respect to the SR-DFT functional is quite large for the double- and triple-$\zeta$ basis sets, where clearly the PBE functional performs better.
However, from the quadruple-$\zeta$ basis, the LDA and PBE functionals agree within a few tenths of a {\kcal}.