diff --git a/Manuscript/QUEST_WIREs.tex b/Manuscript/QUEST_WIREs.tex index 16f280d..3152004 100644 --- a/Manuscript/QUEST_WIREs.tex +++ b/Manuscript/QUEST_WIREs.tex @@ -15,8 +15,7 @@ % \documentclass[blind,alpha-refs]{wiley-article} % Add additional packages here if required -\usepackage{siunitx} -\usepackage{mhchem} +\usepackage{graphicx,dcolumn,bm,xcolor,microtype,multirow,amscd,amsmath,amssymb,amsfonts,physics,longtable,mhchem,siunitx} % macros \newcommand{\ra}{\rightarrow} @@ -192,17 +191,18 @@ All excited-state calculations are performed, except when explicitly mentioned, All the SCI calculations are performed within the FC approximation using QUANTUM PACKAGE \cite{Garniron_2019} where the CIPSI algorithm \cite{Huron_1973} is implemented. Details regarding this specific CIPSI implementation can be found in Refs.~\cite{Garniron_2019} and \cite{Scemama_2019}. A state-averaged formalism is employed, i.e., the ground and excited states are described with the same set of determinants, but different CI coefficients. -Our usual protocol \cite{Scemama_2018,Scemama_2018b,Scemama_2019,Loos_2018a,Loos_2019,Loos_2020a,Loos_2020b,Loos_2020c} consists of performing a preliminary SCI calculation using Hartree-Fock orbitals in order to generate a SCI wave function with at least $10^7$ determinants. -Natural orbitals are then computed based on this wave function, and a new, larger SCI calculation is performed with this new set of orbitals. +Our usual protocol \cite{Scemama_2018,Scemama_2018b,Scemama_2019,Loos_2018a,Loos_2019,Loos_2020a,Loos_2020b,Loos_2020c} consists of performing a preliminary CIPSI calculation using Hartree-Fock orbitals in order to generate a CIPSI wave function with at least $10^7$ determinants. +Natural orbitals are then computed based on this wave function, and a new, larger CIPSI calculation is performed with this new set of orbitals. This has the advantage to produce a smoother and faster convergence of the SCI energy toward the FCI limit. -The SCI energy is defined as the sum of the variational energy (computed via diagonalization of the CI matrix in the reference space) and a PT2 correction which estimates the contribution of the determinants not included in the CI space \cite{Garniron_2017b}. -By linearly extrapolating this second-order correction to zero, one can efficiently estimate the FCI limit for the total energies, and hence, compute the corresponding transition energies. +The CIPSI energy $E_\text{CIPSI}$ is defined as the sum of the variational energy $E_\text{var}$ (computed via diagonalization of the CI matrix in the reference space) and a PT2 correction $E_\text{PT2}$ which estimates the contribution of the determinants not included in the CI space \cite{Garniron_2017b}. +By linearly extrapolating this second-order correction to zero, one can efficiently estimate the FCI limit for the total energies. +These extrapolated total energies (simply labeled as $E_\text{FCI}$ in the remainder of the paper) are then used to compute vertical excitation energies. Depending on the set, we estimated the extrapolation error via different techniques. For example, in Ref.~\cite{Loos_2020b}, we estimated the extrapolation error by the difference between the transition energies obtained with the largest SCI wave function and the FCI extrapolated value. This definitely cannot be viewed as a true error bar, but it provides a rough idea of the quality of the FCI extrapolation and estimate. Below, we provide a much cleaner way of estimating the extrapolation error in SCI methods, and we adopt this scheme for the five- and six-membered rings. The particularity of the current implementation is that the selection step and the PT2 correction are computed \textit{simultaneously} via a hybrid semistochastic algorithm \cite{Garniron_2017,Garniron_2019}. -Moreover, a renormalized version of the PT2 correction has been recently implemented for a more efficient extrapolation to the FCI limit \cite{Garniron_2019}. +Moreover, a renormalized version of the PT2 correction (dubbed rPT2) has been recently implemented for a more efficient extrapolation to the FCI limit \cite{Garniron_2019}. We refer the interested reader to Ref.~\cite{Garniron_2019} where one can find all the details regarding the implementation of the CIPSI algorithm. Note that, all our SCI wave functions are eigenfunctions of the $\Hat{S}^2$ spin operator which is, unlike ground-state calculations, paramount in the case of excited states \cite{Applencourt_2018}. @@ -241,8 +241,81 @@ The definition of the active space considered for each system as well as the num \subsubsection{Estimating the extrapolation error} %------------------------------------------------ -\alert{Here comes Anthony's part on error bars in SCI methods.} +For the $m$th excited states (where $m = 0$ corresponds to the ground state), we usually estimate its FCI energy by performing a linear extrapolation of its variational energy $E_\text{var}^{(m)}$ as a function of its rPT2 correction $E_{\text{rPT2}}^{(m)}$ as follows +\begin{equation} + E_\text{var}^{(m)} = E_{\text{FCI}}^{(m)} - \alpha^{(m)} E_{\text{rPT2}}^{(m)} +\end{equation} +$E_\text{var}^{(m)}$ varies almost linearly as a function of $E_{\text{rPT2}}^{(m)}$, but with a coefficient $\alpha^{(m)}$ which deviates slightly from unity in well-behaved cases. +This implies that at any iteration of the CIPSI algorithm, the estimated error on the CIPSI energy is +\begin{equation} + E_{\text{CIPSI}}^{(m)} - E_{\text{FCI}}^{(m)} + = \qty(E_\text{var}^{(m)}+E_{\text{rPT2}}^{(m)}) - E_{\text{FCI}}^{(m)} + = \qty(1-\alpha^{(m)}) E_{\text{rPT2}}^{(m)} +\end{equation} +For the large systems considered here, $\abs{E_{\text{rPT2}}} > 2$ eV. +Therefore, the accuracy of the excitation energy estimates will strongly depend on our ability to compensate the errors in the calculations. +Because our selection procedure ensures that the rPT2 values of both states match as well as possible (a trick known as PT2 matching \cite{Dash_2018,Dash_2019}), i.e., $E_{\text{rPT2}} = E_{\text{rPT2}}^{(0)} \approx E_{\text{rPT2}}^{(m)}$, the extrapolated excitation energy associated with the $m$th excited state can be estimated as +\begin{equation} +\begin{split} + \Delta E^{(m)} + & = E^{(m)}_{\text{CIPSI}} - E^{(0)}_{\text{CIPSI}} + \\ + & = \qty[ E^{(m)} + E_{\text{rPT2}} + \qty(\alpha^{(n)}-1) E_{\text{rPT2}} ] + - \qty[ E^{(0)} + E_{\text{rPT2}} + \qty(\alpha^{(0)}-1) E_{\text{rPT2}} ] + + \order{E_{\text{rPT2}}^2 } +\end{split} +\end{equation} +which evidences that the error on $\Delta E^{(m)}$ can be expressed as $\qty(\alpha^{(m)}-\alpha^{(0)}) E_{\text{rPT2}} + \order{E_{\text{rPT2}}^2}$. +Moreover, using a common set of state-averaged natural orbitals for the ground and excited states tends to make the values of $\alpha^{(0)}$ and $\alpha^{(m)}$ very close to each other, such that the error on the energy difference is practically of the order of $E_{\text{rPT2}}^2$. + +At the $n$th CIPSI iteration, we have access to the variational energies of both states, $E^{(0)}(n)$ and $E^{(m)}(n)$, as well as their the rPT2 corrections, $E_{\text{rPT2}}^{(0)}(n)$ and $E_{\text{rPT2}}^{(m)}(n)$. +The $m$th excitation energy at iteration $n$ is then modeled as a Gaussian random variable with mean and variance +\begin{gather} + \Delta E^{(m)}(n) = \qty[ E^{(m)}(n) + E_{\text{rPT2}}^{(m)}(n) ] - \qty[ E^{(0)}(n) + E_{\text{rPT2}}^{(0)}(n) ] + \\ + \sigma^2(n) \propto \qty[E_{\text{rPT2}}^{(m)}(n)]^2 + \qty[E_{\text{rPT2}}^{(0)}(n)]^2 +\end{gather} +and we treat all CIPSI iterations as samples coming from the same Gaussian process with weights $w(n) = 1/\sqrt{\sigma^2(n)}$. + + The confidence interval is chosen to be equivalent to what one + would obtain using $\pm 1$ standard deviation with Gaussian-distributed + variables ($\mathcal{G}$). In other words, we will search for an interval $\mathcal{I}$ + such that the probability $P( \Delta E_{\text{FCI}} \in \mathcal{I})$ + that the true value of the excitation energy lies within the interval is + equal to + $P( \Delta E_{\text{FCI}} \in [ \Delta E \pm \sigma ] \; | \; \mathcal{G}) = 0.6827$. + The probability that the FCI excitation energy is in an interval + $\mathcal{I}$ is + +\begin{equation} + P( \Delta E_{\text{FCI}} \in \mathcal{I} ) = P( E_{\text{FCI}} \in I | \mathcal{G}) \times P(\mathcal{G}) +\end{equation} + where the probability $P(\mathcal{G})$ that the random variables are + normally distributed can be deduced from the Jarque-Bera test $J$ as + +\begin{equation} + P(\mathcal{G}) = 1 - \chi^2_{\text{CDF}}(J,2) +\end{equation} + where $\chi^2_{\text{CDF}}(x,k)$ is the cumulative distribution function of the + $\chi^2$ distribution with $k$ degrees of freedom. + As the number of samples is usually small, we use Student's $t$ distribution to + estimate the statistical error. The inverse of the cumulative + distribution function of the $t$ distribution will allow us to find how + to scale the interval with a parameter $\beta$ such that + $P( \Delta E_{\text{FCI}} \in [ \Delta E \pm \beta \sigma ] ) = p$. + +\begin{equation} + %\beta = t_{\text{CDF}}^{-1} \left[ + %\frac{1}{2} \left( 1 + \frac{P( \Delta E_{\text{FCI}} \in [ \Delta E \pm \sigma ] \; | \; \mathcal{G}) }{P(\mathcal{G})}\right), n \right] + \beta = t_{\text{CDF}}^{-1} \left[ + \frac{1}{2} \left( 1 + \frac{0.6827}{P(\mathcal{G})}\right), n \right] +\end{equation} + Only the last $M>2$ computed energy differences are considered. $M$ is chosen + such that $P(\mathcal{G})>0.8$ and such that the error bar is minimal. + If all the values of $P(\mathcal{G})$ are below $0.8$, $M$ is chosen such that + $P(\mathcal{G})$ is maximal. + %%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{The QUEST database} \label{sec:QUEST}