Modifs Mimi

This commit is contained in:
Anthony Scemama 2020-11-29 01:00:03 +01:00
parent f28d600231
commit d7cd8332a6
8 changed files with 427 additions and 242 deletions

View File

@ -287,61 +287,121 @@ The definition of the active space considered for each system as well as the num
%------------------------------------------------
In this section, we present our scheme to estimate the extrapolation error in SCI calculations.
This new protocol is then applied to five- and six-membered ring molecules for which SCI calculations are particularly challenging even for small basis sets.
Note that the present method does only apply to ``state-averaged'' SCI calculations where ground- and excited-state energies are produced during the same calculation with the same set of molecular orbitals, not to ``state-specific'' calculations where one computes solely the energy of a single state (like conventional ground-state calculations).
Note that the present method does only apply to \emph{state-averaged} SCI calculations where ground- and excited-state energies are produced during the same calculation with the same set of molecular orbitals, not to \emph{state-specific} calculations where one computes solely the energy of a single state (like conventional ground-state calculations).
For the $m$th excited state (where $m = 0$ corresponds to the ground state), we usually estimate its FCI energy $E_{\text{FCI}}^{(m)}$ by performing a linear extrapolation of its variational energy $E_\text{var}^{(m)}$ as a function of its rPT2 correction $E_{\text{rPT2}}^{(m)}$ as follows
For the $m$th excited state (where $m = 0$ corresponds to the ground state), we usually estimate its FCI energy $E_{\text{FCI}}^{(m)}$ by performing a linear extrapolation of its variational energy $E_\text{var}^{(m)}$ as a function of its rPT2 correction $E_{\text{rPT2}}^{(m)}$ \cite{Holmes_2017, Garniron_2019} using
\begin{equation}
E_\text{FCI}^{(m)} = E_{\text{var}}^{(m)} + \alpha^{(m)} E_{\text{rPT2}}^{(m)}
E_{\text{var}}^{(m)} \approx E_\text{FCI}^{(m)} - \alpha^{(m)} E_{\text{rPT2}}^{(m)},
\label{eqx}
\end{equation}
$E_\text{var}^{(m)}$ varies almost linearly as a function of $E_{\text{rPT2}}^{(m)}$, but with a coefficient $\alpha^{(m)}$ which deviates slightly from unity in well-behaved cases.
This implies that, at any iteration of the CIPSI algorithm, the estimated error on the CIPSI energy is
where $E_{\text{var}}^{(m)}$ and $E_{\text{rPT2}}^{(m)}$ are calculated with CIPSI and $E_\text{FCI}^{(m)}$ is the FCI energy
to be extrapolated. This relation is valid in the regime of a sufficiently large number of determinants where the second-order perturbational
correction largely dominates.
However, in practice, due to the residual higher-order terms, the coefficient $\alpha^{(m)}$ deviates slightly from unity.
Using Eq.(\ref{eqx}) the estimated error on the CIPSI energy is calculated as
\begin{equation}
E_{\text{CIPSI}}^{(m)} - E_{\text{FCI}}^{(m)}
= \qty(E_\text{var}^{(m)}+E_{\text{rPT2}}^{(m)}) - E_{\text{FCI}}^{(m)}
= \qty(1-\alpha^{(m)}) E_{\text{rPT2}}^{(m)}
= \qty(1-\alpha^{(m)}) E_{\text{rPT2}}^{(m)},
\end{equation}
For the large systems considered here, $\abs{E_{\text{rPT2}}} > 2$ eV.
Therefore, the accuracy of the excitation energy estimates will strongly depend on our ability to compensate the errors in the calculations.
Because our selection procedure ensures that the rPT2 values of both states match as well as possible (a trick known as PT2 matching \cite{Dash_2018,Dash_2019}), i.e., $E_{\text{rPT2}} = E_{\text{rPT2}}^{(0)} \approx E_{\text{rPT2}}^{(m)}$, the extrapolated excitation energy associated with the $m$th excited state can be estimated as
and thus the extrapolated excitation energy associated with the $m$th
state is given by
\begin{equation}
\Delta E_{\text{FCI}}^{(m)}
= \qty[ E_\text{var}^{(m)} + E_{\text{rPT2}} + \qty(\alpha^{(m)}-1) E_{\text{rPT2}} ]
- \qty[ E_\text{var}^{(0)} + E_{\text{rPT2}} + \qty(\alpha^{(0)}-1) E_{\text{rPT2}} ]
+ \order{E_{\text{rPT2}}^2 }
+ O\qty[{E_{\text{rPT2}}^2 }]
\end{equation}
which evidences that the error in $\Delta E_{\text{FCI}}^{(m)}$ can be expressed as $\qty(\alpha^{(m)}-\alpha^{(0)}) E_{\text{rPT2}} + \order{E_{\text{rPT2}}^2}$.
Moreover, using a common set of state-averaged natural orbitals for the ground and excited states tends to make the values of $\alpha^{(0)}$ and $\alpha^{(m)}$ very close to each other, such that the error on the energy difference is practically of the order of $E_{\text{rPT2}}^2$.
which evidences that the error in $\Delta E_{\text{FCI}}^{(m)}$ can be expressed as $\qty(\alpha^{(m)}-\alpha^{(0)}) E_{\text{rPT2}} + O\qty[{E_{\text{rPT2}}^2}]$.
At the $n$th CIPSI iteration, we have access to the variational energies of both states, $E_\text{var}^{(0)}(n)$ and $E_\text{var}^{(m)}(n)$, as well as their rPT2 corrections, $E_{\text{rPT2}}^{(0)}(n)$ and $E_{\text{rPT2}}^{(m)}(n)$.
The $m$th excitation energy at iteration $n$ is then assumed to be a Gaussian random variable with mean
Now, for the largest systems considered here, $\qty|{E_{\text{rPT2}}}|$ can be as large as 2~eV and, thus,
the accuracy of the excitation energy estimates strongly depends on our ability to compensate the errors in the calculations.
Here, we greatly enhance the compensation of errors by making use of
our selection procedure ensuring that the PT2 values of both states
match as well as possible (a trick known as PT2 matching
\cite{Dash_2018,Dash_2019}), i.e. $E_{\text{rPT2}} =
E_{\text{rPT2}}^{(0)} \approx E_{\text{rPT2}}^{(m)}$, and
by using a common set of state-averaged natural orbitals with equal weights for the ground and excited states.
This last feature tends to make the values of $\alpha^{(0)}$ and $\alpha^{(m)}$ very close to each other, such that the error on the energy difference
is decreased.
In the ideal case where we would be able to fully correlate the CIPSI calculations for the ground- and excited-states, the fluctuations of
$\Delta E_\text{CIPSI}^{(m)}(n)$ as a function of $n$ would completely vanish and the exact excitation energy would be obtained from the first CIPSI iterations.
Quite remarkably, in practice, numerical experience shows that the fluctuations with respect to the extrapolated value $\Delta E_\text{FCI}^{(m)}$ are small,
zero-centered, almost independent of $n$ when not too close iteration
numbers are considered, and display a Gaussian-like distribution.
In addition, the fluctuations are found to be (very weakly) dependent on the iteration number $n$ (see, Fig.\ref{fig2}), so
this dependence will not significantly alter our results and will not be considered here.
We thus introduce the following random variable
\begin{equation}
\Delta E_\text{CIPSI}^{(m)}(n) = \qty[ E_\text{var}^{(m)}(n) + E_{\text{rPT2}}^{(m)}(n) ] - \qty[ E_\text{var}^{(0)}(n) + E_{\text{rPT2}}^{(0)}(n) ]
X^{(m)}= \frac{\Delta E_\text{CIPSI}^{(m)}(n)- \Delta E_\text{FCI}^{(m)}}{\sigma(n)}
\end{equation}
and variance
where
\begin{equation}
\sigma^2(n) \propto \qty[E_{\text{rPT2}}^{(m)}(n)]^2 + \qty[E_{\text{rPT2}}^{(0)}(n)]^2
\Delta E_\text{CIPSI}^{(m)}(n) = \qty[ E_\text{var}^{(m)}(n) +
E_{\text{rPT2}}^{(m)}(n) ]
- \qty[ E_\text{var}^{(0)}(n) + E_{\text{rPT2}}^{(0)}(n) ],
\end{equation}
and we treat all CIPSI iterations as a set of Gaussian-distributed variables ($\mathcal{G}$) with weights $w(n) = 1/\sqrt{\sigma^2(n)}$.
This choice ensures that the statistical uncertainty vanishes at the FCI limit.
We then search for a confidence interval $\mathcal{I}$ such that the true value of the excitation energy $\Delta E_{\text{FCI}}^{(m)}$ lies within one standard deviation of $\Delta E_\text{CIPSI}^{(m)}$, i.e., $P( \Delta E_{\text{FCI}}^{(m)} \in [ \Delta E_\text{CIPSI}^{(m)} \pm \sigma ] \; | \; \mathcal{G}) = 0.6827$.
The probability that $\Delta E_{\text{FCI}}^{(m)}$ is in an interval $\mathcal{I}$ is
and
${\sigma(n)}$ is a quantity proportional to the average fluctuations of $\Delta E_\text{CIPSI}^{(m)}$.
A natural choice for $\sigma^2(n)$, playing here the role of a variance, is
\begin{equation}
P\qty( \Delta E_{\text{FCI}}^{(m)} \in \mathcal{I} ) = P\qty( \Delta E_{\text{FCI}}^{(m)} \in I \Big| \mathcal{G}) \times P(\mathcal{G})
\sigma^2(n) \propto \qty[E_{\text{rPT2}}^{(m)}(n)]^2 + \qty[E_{\text{rPT2}}^{(0)}(n)]^2,
\end{equation}
where the probability $P(\mathcal{G})$ that the random variables are normally distributed can be deduced from the Jarque-Bera test $J$ as
which vanishes in the large-$n$ limit as it should be.
%%% FIGURE 2 %%%
\begin{figure}
\centering
\includegraphics[width=0.9\linewidth]{fig2/fig2}
\caption{Histogram of the random variable $X^{(m)}$ (see, text). About 200 values of the transition energies
for the 13 five- and six-membered ring molecules, both for the singlet and triplet transitions and for a number of CIPSI iterations, are used.
The number $M$ of iterations kept is chosen according to the statistical test presented in the text.}
\label{fig2}
\end{figure}
The histogram of $X^{(m)}$ resulting from the excitation energies
obtained at different values of the CIPSI iterations $n$
and for the 13 five- and six-membered ring molecules, both for the singlet and triplet transitions,
is shown in Fig.\ref{fig2}. To avoid transient effects, only excitation energies at sufficiently large $n$ are retained in the data set.
The criterion used to decide from which precise value of $n$ the data should be kept will be presented below. In our application, the total number
of values employed to make the histogram is about 200. The dashed line of Fig.\ref{fig2} represents the best Gaussian fit
(in the sense of least-squares) reproducing the data.
As seen, the distribution can be described by the Gaussian probability
\begin{equation}
P(\mathcal{G}) = 1 - \chi^2_{\text{CDF}}(J,2)
P\qty[X^{(m)}] \propto e^{-\frac{{X^{(m)}}^2} {2{\sigma^{*}}^2}}
\end{equation}
where $\sigma^{*2}$ is some "universal" variance depending only
on the way the correlated selection of both states is done, not on the molecule considered in our set.
An estimate of $\Delta E_{\text{FCI}}^{(m)}$ as the average excitation energy of $\Delta E_\text{CIPSI}^{(m)}$ is thus
$$\Delta E_\text{FCI}^{(m)} = \frac{ \sum_{n=1}^M \frac{\Delta E_\text{CIPSI}^{(m)}(n)} {\sigma(n)} }
{ \sum_{n=1}^M \frac{1}{\sigma(n)} },
$$
where $M$ is the number of data kept.
Now, regarding the estimate of the error on $\Delta E_\text{FCI}^{(m)}$ some caution is required since, although the distribution is globally Gaussian-like
(see Fig.\ref{fig2}) there exists
some significant departure from it and we need to take this feature into account.
More precisely, we search for a confidence interval $\mathcal{I}$ such that the true value of the excitation energy $\Delta E_{\text{FCI}}^{(m)}$ lies within one standard deviation of $\Delta E_\text{CIPSI}^{(m)}$, i.e., $P\qty( \Delta E_{\text{FCI}}^{(m)} \in \qty[ \Delta E_\text{CIPSI}^{(m)} \pm \sigma ] \; \Big| \; \mathcal{G}) = 0.6827$.
In a Bayesian framework, the probability that $\Delta E_{\text{FCI}}^{(m)}$ is in an interval $\mathcal{I}$ is
\begin{equation}
P\qty( \Delta E_{\text{FCI}}^{(m)} \in \mathcal{I} ) = P\qty( \Delta E_{\text{FCI}}^{(m)} \in I \Big| \mathcal{G}) \times P\qty(\mathcal{G})
\end{equation}
where $P\qty(\mathcal{G})$ is the probability that the random variables considered in the latest CIPSI iterations are normally distributed.
A common test in statistics of the normality of a distribution is the Jarque-Bera test $J$ and we have
\begin{equation}
P\qty(\mathcal{G}) = 1 - \chi^2_{\text{CDF}}(J,2)
\end{equation}
where $\chi^2_{\text{CDF}}(x,k)$ is the cumulative distribution function (CDF) of the $\chi^2$-distribution with $k$ degrees of freedom.
As the number of samples is usually small, we use Student's $t$-distribution to estimate the statistical error.
As the number of samples $M$ is usually small, we use Student's $t$-distribution to estimate the statistical error.
The inverse of the cumulative distribution function of the $t$-distribution, $t_{\text{CDF}}^{-1}$, allows us to find how to scale the interval by a parameter
\begin{equation}
\beta = t_{\text{CDF}}^{-1} \qty[
\frac{1}{2} \qty( 1 + \frac{0.6827}{P(\mathcal{G})}), M ]
\end{equation}
such that $P\qty( \Delta E_{\text{FCI}}^{(m)} \in \qty[ \Delta E_{\text{CIPSI}}^{(m)} \pm \beta \sigma ] ) = p = 0.6827$.
Only the last $M>2$ computed energy differences are considered. $M$ is chosen such that $P(\mathcal{G})>0.8$ and such that the error bar is minimal.
Only the last $M>2$ computed transition energies are considered. $M$ is chosen such that $P(\mathcal{G})>0.8$ and such that the error bar is minimal.
If all the values of $P(\mathcal{G})$ are below $0.8$, $M$ is chosen such that $P(\mathcal{G})$ is maximal.
A Python code associated with this procedure is provided in the {\SupInf}.
@ -409,10 +469,10 @@ The error bars reported in parenthesis correspond to one standard deviation.
\end{threeparttable}
\end{table}
%%% FIGURE 2 %%%
%%% FIGURE 3 %%%
\begin{figure}
\centering
\includegraphics[width=\linewidth]{fig2}
\includegraphics[width=\linewidth]{fig3}
\caption{Deviation from the CCSDT excitation energies for the lowest singlet and triplet excitation energies (in eV) of five- and six-membered rings obtained at the CIPSI/6-31+G(d) level of theory. Red dots: excitation energies and error bars estimated via the present method (see Sec.~\ref{sec:error}). Blue dots: excitation energies obtained via a three-point linear fit using the three largest CIPSI wave functions, and error bars estimated via the extrapolation distance, \ie, the difference in excitation energies obtained with the three-point linear extrapolation and the largest CIPSI wave function.}
\label{fig:errors}
\end{figure}
@ -430,10 +490,10 @@ from diatomics to molecules as large as naphthalene (see Fig.~\ref{fig:molecules
pure hydrocarbons and various heteroatomic structures, etc. Each of the five subsets making up the QUEST dataset is detailed below. Throughout the present review, we report several statistical indicators: the mean signed
error (MSE), mean absolute error (MAE), root-mean square error (RMSE), and standard deviation of the errors (SDE), as well as the maximum positive [Max(+)] and maximum negative [Max($-$)] errors.
%%% FIGURE 3 %%%
%%% FIGURE 4 %%%
\begin{figure}
\centering
\includegraphics[width=\linewidth]{fig3}
\includegraphics[width=\linewidth]{fig4}
\caption{Molecules from each of the five subsets making up the present QUEST dataset of highly-accurate vertical excitation energies:
QUEST\#1 (red), QUEST\#2 (magenta and/or underlined), QUEST\#3 (black), QUEST\#4 (green), and QUEST\#5 (blue).}
\label{fig:molecules}

Binary file not shown.

View File

@ -0,0 +1,33 @@
-5.8181818181818182E-002 1.0503392263823726E-013
-5.4545454545454543E-002 3.1979374437806884E-012
-5.0909090909090911E-002 7.8108457347063320E-011
-4.7272727272727272E-002 1.5304314242122915E-009
-4.3636363636363640E-002 2.4055667144690598E-008
-4.0000000000000001E-002 3.0332531323628614E-007
-3.6363636363636362E-002 3.0682279452286258E-006
-3.2727272727272730E-002 2.4897417674801438E-005
-2.9090909090909087E-002 1.6207226680756907E-004
-2.5454545454545448E-002 8.4635161104256395E-004
-2.1818181818181816E-002 3.5455258945109665E-003
-1.8181818181818177E-002 1.1915114370157230E-002
-1.4545454545454540E-002 3.2122067227954430E-002
-1.0909090909090908E-002 6.9469868291055614E-002
-7.2727272727272693E-003 0.12052501154967402
-3.6363636363636368E-003 0.16774346335684942
2.1684043449710089E-018 0.18728446158700213
3.6363636363636411E-003 0.16774346335685103
7.2727272727272797E-003 0.12052501154967629
1.0909090909090912E-002 6.9469868291057640E-002
1.4545454545454544E-002 3.2122067227955672E-002
1.8181818181818191E-002 1.1915114370157782E-002
2.1818181818181823E-002 3.5455258945111712E-003
2.5454545454545455E-002 8.4635161104262119E-004
2.9090909090909101E-002 1.6207226680758102E-004
3.2727272727272730E-002 2.4897417674803691E-005
3.6363636363636362E-002 3.0682279452289367E-006
3.9999999999999994E-002 3.0332531323632113E-007
4.3636363636363640E-002 2.4055667144693550E-008
4.7272727272727272E-002 1.5304314242124981E-009
5.0909090909090904E-002 7.8108457347074978E-011
5.4545454545454550E-002 3.1979374437811545E-012
5.8181818181818182E-002 1.0503392263825442E-013

View File

@ -0,0 +1,33 @@
-5.8283990969853983E-002 0.0000000000000000
-5.4692045827783142E-002 0.0000000000000000
-5.1100100685712302E-002 5.2356020942408380E-003
-4.7508155543641448E-002 0.0000000000000000
-4.3916210401570607E-002 5.2356020942408380E-003
-4.0324265259499767E-002 1.0471204188481676E-002
-3.6732320117428927E-002 0.0000000000000000
-3.3140374975358072E-002 0.0000000000000000
-2.9548429833287228E-002 5.2356020942408380E-003
-2.5956484691216385E-002 5.2356020942408380E-003
-2.2364539549145530E-002 5.2356020942408380E-003
-1.8772594407074690E-002 1.0471204188481676E-002
-1.5180649265003851E-002 3.6649214659685861E-002
-1.1588704122933013E-002 4.1884816753926704E-002
-7.9967589808621585E-003 9.4240837696335081E-002
-4.4048138387913181E-003 0.13612565445026178
-8.1286869672047755E-004 0.21465968586387435
2.7790764453503769E-003 0.10471204188481675
6.3710215874212169E-003 0.12565445026178010
9.9629667294920572E-003 7.3298429319371722E-002
1.3554911871562898E-002 2.0942408376963352E-002
1.7146857013633738E-002 4.1884816753926704E-002
2.0738802155704582E-002 5.2356020942408380E-003
2.4330747297775450E-002 1.0471204188481676E-002
2.7922692439846290E-002 1.0471204188481676E-002
3.1514637581917131E-002 5.2356020942408380E-003
3.5106582723987964E-002 1.5706806282722512E-002
3.8698527866058804E-002 5.2356020942408380E-003
4.2290473008129645E-002 0.0000000000000000
4.5882418150200485E-002 0.0000000000000000
4.9474363292271353E-002 0.0000000000000000
5.3066308434342194E-002 5.2356020942408380E-003
5.6658253576413034E-002 0.0000000000000000

59
Manuscript/fig2/fig2.org Normal file
View File

@ -0,0 +1,59 @@
** Initialize R packages
#+begin_src R :results output :session *R* :exports code
library(ggplot2)
library(latex2exp)
library(extrafont)
library(RColorBrewer)
loadfonts()
#+end_src
#+RESULTS:
:
: Registering fonts with R
** Read data
#+begin_src R :results output :session *R* :exports both
df <- read.table("data_histogram_paper");
df$x <- df$V1
df$y <- df$V2
df2 <- read.table("data_gaussian_histogram_paper");
spline.d <- as.data.frame(spline(df2$V1, df2$V2))
summary(spline.d)
#+end_src
#+RESULTS:
:
: x y
: Min. :-0.05818 Min. :0.000e+00
: 1st Qu.:-0.02909 1st Qu.:2.000e-08
: Median : 0.00000 Median :1.213e-04
: Mean : 0.00000 Mean :3.093e-02
: 3rd Qu.: 0.02909 3rd Qu.:3.011e-02
: Max. : 0.05818 Max. :1.873e-01
#+begin_src R :results output graphics :file (org-babel-temp-file "figure" ".png") :exports both :width 600 :height 400 :session *R*
p <- ggplot(data=df, aes(x, y)) +
geom_bar(stat="identity", fill="steelblue")
p <- p+ geom_line(data=spline.d, lwd=1, linetype="dashed")
p <- p + scale_x_continuous(name=TeX("$X^{(m)}$"))
p <- p + scale_y_continuous(name=TeX("Frequency"))
p <- p + theme(text = element_text(size = 20, family="Times"),
legend.position = c(.20, .20),
legend.title = element_blank())
p
#+end_src
#+RESULTS:
[[file:/tmp/babel-nBBwmV/figureJJu58N.png]]
* Export to pdf
#+begin_src R :results output :session *R* :exports code
pdf("fig2.pdf", family="Times", width=8, height=5)
p
dev.off()
#+end_src
#+RESULTS:
:
: png
: 2

BIN
Manuscript/fig2/fig2.pdf Normal file

Binary file not shown.

Binary file not shown.

BIN
Manuscript/fig4.pdf Normal file

Binary file not shown.