toto part

This commit is contained in:
Pierre-Francois Loos 2020-09-08 15:10:52 +02:00
parent a507931e79
commit 830a893fe2
4 changed files with 37 additions and 50 deletions


@ -241,12 +241,12 @@ The definition of the active space considered for each system as well as the num
\subsubsection{Estimating the extrapolation error}
%------------------------------------------------
For the $m$th excited states (where $m = 0$ corresponds to the ground state), we usually estimate its FCI energy by performing a linear extrapolation of its variational energy $E_\text{var}^{(m)}$ as a function of its rPT2 correction $E_{\text{rPT2}}^{(m)}$ as follows
For the $m$th excited state (where $m = 0$ corresponds to the ground state), we usually estimate its FCI energy $E_{\text{FCI}}^{(m)}$ by performing a linear extrapolation of its variational energy $E_\text{var}^{(m)}$ as a function of its rPT2 correction $E_{\text{rPT2}}^{(m)}$ as follows
\begin{equation}
E_\text{var}^{(m)} = E_{\text{FCI}}^{(m)} - \alpha^{(m)} E_{\text{rPT2}}^{(m)}
\end{equation}
$E_\text{var}^{(m)}$ varies almost linearly as a function of $E_{\text{rPT2}}^{(m)}$, but with a coefficient $\alpha^{(m)}$ which deviates slightly from unity in well-behaved cases.
This implies that at any iteration of the CIPSI algorithm, the estimated error on the CIPSI energy is
This implies that, at any iteration of the CIPSI algorithm, the estimated error on the CIPSI energy is
\begin{equation}
E_{\text{CIPSI}}^{(m)} - E_{\text{FCI}}^{(m)}
= \qty(E_\text{var}^{(m)}+E_{\text{rPT2}}^{(m)}) - E_{\text{FCI}}^{(m)}
@ -257,64 +257,43 @@ Therefore, the accuracy of the excitation energy estimates will strongly depend
Because our selection procedure ensures that the rPT2 values of both states match as well as possible (a trick known as PT2 matching \cite{Dash_2018,Dash_2019}), i.e., $E_{\text{rPT2}} = E_{\text{rPT2}}^{(0)} \approx E_{\text{rPT2}}^{(m)}$, the extrapolated excitation energy associated with the $m$th excited state can be estimated as
\begin{equation}
\begin{split}
\Delta E^{(m)}
& = E^{(m)}_{\text{CIPSI}} - E^{(0)}_{\text{CIPSI}}
\\
& = \qty[ E^{(m)} + E_{\text{rPT2}} + \qty(\alpha^{(n)}-1) E_{\text{rPT2}} ]
- \qty[ E^{(0)} + E_{\text{rPT2}} + \qty(\alpha^{(0)}-1) E_{\text{rPT2}} ]
\Delta E_{\text{FCI}}^{(m)}
= \qty[ E_\text{var}^{(m)} + E_{\text{rPT2}} + \qty(\alpha^{(m)}-1) E_{\text{rPT2}} ]
- \qty[ E_\text{var}^{(0)} + E_{\text{rPT2}} + \qty(\alpha^{(0)}-1) E_{\text{rPT2}} ]
+ \order{E_{\text{rPT2}}^2 }
\end{split}
\end{equation}
which evidences that the error on $\Delta E^{(m)}$ can be expressed as $\qty(\alpha^{(m)}-\alpha^{(0)}) E_{\text{rPT2}} + \order{E_{\text{rPT2}}^2}$.
which evidences that the error in $\Delta E_{\text{FCI}}^{(m)}$ can be expressed as $\qty(\alpha^{(m)}-\alpha^{(0)}) E_{\text{rPT2}} + \order{E_{\text{rPT2}}^2}$.
Moreover, using a common set of state-averaged natural orbitals for the ground and excited states tends to make the values of $\alpha^{(0)}$ and $\alpha^{(m)}$ very close to each other, such that the error on the energy difference is practically of the order of $E_{\text{rPT2}}^2$.
At the $n$th CIPSI iteration, we have access to the variational energies of both states, $E^{(0)}(n)$ and $E^{(m)}(n)$, as well as their the rPT2 corrections, $E_{\text{rPT2}}^{(0)}(n)$ and $E_{\text{rPT2}}^{(m)}(n)$.
The $m$th excitation energy at iteration $n$ is then modeled as a Gaussian random variable with mean and variance
At the $n$th CIPSI iteration, we have access to the variational energies of both states, $E_\text{var}^{(0)}(n)$ and $E_\text{var}^{(m)}(n)$, as well as their rPT2 corrections, $E_{\text{rPT2}}^{(0)}(n)$ and $E_{\text{rPT2}}^{(m)}(n)$.
The $m$th excitation energy at iteration $n$ is then assumed to be a Gaussian random variable with mean and variance
\begin{gather}
\Delta E^{(m)}(n) = \qty[ E^{(m)}(n) + E_{\text{rPT2}}^{(m)}(n) ] - \qty[ E^{(0)}(n) + E_{\text{rPT2}}^{(0)}(n) ]
\Delta E_\text{CIPSI}^{(m)}(n) = \qty[ E_\text{var}^{(m)}(n) + E_{\text{rPT2}}^{(m)}(n) ] - \qty[ E_\text{var}^{(0)}(n) + E_{\text{rPT2}}^{(0)}(n) ]
\\
\sigma^2(n) \propto \qty[E_{\text{rPT2}}^{(m)}(n)]^2 + \qty[E_{\text{rPT2}}^{(0)}(n)]^2
\end{gather}
and we treat all CIPSI iterations as samples coming from the same Gaussian process with weights $w(n) = 1/\sqrt{\sigma^2(n)}$.
The confidence interval is chosen to be equivalent to what one
would obtain using $\pm 1$ standard deviation with Gaussian-distributed
variables ($\mathcal{G}$). In other words, we will search for an interval $\mathcal{I}$
such that the probability $P( \Delta E_{\text{FCI}} \in \mathcal{I})$
that the true value of the excitation energy lies within the interval is
equal to
$P( \Delta E_{\text{FCI}} \in [ \Delta E \pm \sigma ] \; | \; \mathcal{G}) = 0.6827$.
The probability that the FCI excitation energy is in an interval
$\mathcal{I}$ is
and we treat all CIPSI iterations as a set of Gaussian-distributed variables ($\mathcal{G}$) with weights $w(n) = 1/\sqrt{\sigma^2(n)}$.
We then search for a confidence interval $\mathcal{I}$ such that the true value of the excitation energy $\Delta E_{\text{FCI}}^{(m)}$ lies within one standard deviation of $\Delta E_\text{CIPSI}^{(m)}$, i.e., $P( \Delta E_{\text{FCI}}^{(m)} \in [ \Delta E_\text{CIPSI}^{(m)} \pm \sigma ] \; | \; \mathcal{G}) = 0.6827$.
The probability that $\Delta E_{\text{FCI}}^{(m)}$ is in an interval $\mathcal{I}$ is
\begin{equation}
P( \Delta E_{\text{FCI}} \in \mathcal{I} ) = P( E_{\text{FCI}} \in I | \mathcal{G}) \times P(\mathcal{G})
P( \Delta E_{\text{FCI}}^{(m)} \in \mathcal{I} ) = P( \Delta E_{\text{FCI}}^{(m)} \in \mathcal{I} | \mathcal{G}) \times P(\mathcal{G})
\end{equation}
where the probability $P(\mathcal{G})$ that the random variables are
normally distributed can be deduced from the Jarque-Bera test $J$ as
where the probability $P(\mathcal{G})$ that the random variables are normally distributed can be deduced from the Jarque-Bera test $J$ as
\begin{equation}
P(\mathcal{G}) = 1 - \chi^2_{\text{CDF}}(J,2)
\end{equation}
where $\chi^2_{\text{CDF}}(x,k)$ is the cumulative distribution function of the
$\chi^2$ distribution with $k$ degrees of freedom.
As the number of samples is usually small, we use Student's $t$ distribution to
estimate the statistical error. The inverse of the cumulative
distribution function of the $t$ distribution will allow us to find how
to scale the interval with a parameter $\beta$ such that
$P( \Delta E_{\text{FCI}} \in [ \Delta E \pm \beta \sigma ] ) = p$.
where $\chi^2_{\text{CDF}}(x,k)$ is the cumulative distribution function of the $\chi^2$-distribution with $k$ degrees of freedom.
As the number of samples is usually small, we use Student's $t$-distribution to estimate the statistical error.
The inverse of the cumulative distribution function of the $t$-distribution allows us to find how to scale the interval by a parameter
\begin{equation}
%\beta = t_{\text{CDF}}^{-1} \left[
%\frac{1}{2} \left( 1 + \frac{P( \Delta E_{\text{FCI}} \in [ \Delta E \pm \sigma ] \; | \; \mathcal{G}) }{P(\mathcal{G})}\right), n \right]
\beta = t_{\text{CDF}}^{-1} \left[
\frac{1}{2} \left( 1 + \frac{0.6827}{P(\mathcal{G})}\right), n \right]
\end{equation}
Only the last $M>2$ computed energy differences are considered. $M$ is chosen
such that $P(\mathcal{G})>0.8$ and such that the error bar is minimal.
If all the values of $P(\mathcal{G})$ are below $0.8$, $M$ is chosen such that
$P(\mathcal{G})$ is maximal.
such that $P( \Delta E_{\text{FCI}}^{(m)} \in [ \Delta E_{\text{CIPSI}}^{(m)} \pm \beta \sigma ] ) = p$.
Only the last $M>2$ computed energy differences are considered. $M$ is chosen such that $P(\mathcal{G})>0.8$ and such that the error bar is minimal.
If all the values of $P(\mathcal{G})$ are below $0.8$, $M$ is chosen such that $P(\mathcal{G})$ is maximal.
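
The ingredients of this error estimate can be illustrated with a minimal Python sketch (standard library only). It is not the authors' implementation: the function names are hypothetical, the use of a weighted mean with $w(n)$ and of an unweighted Jarque-Bera statistic are assumptions, and the Student's $t$ scaling factor $\beta$ and the selection of $M$ are deliberately left out. The only closed form relied upon is that the $\chi^2$ cumulative distribution function with two degrees of freedom equals $1 - e^{-x/2}$, so that $P(\mathcal{G}) = e^{-J/2}$.

```python
import math


def extrapolate_fci(e_var, e_rpt2):
    """Least-squares fit of E_var = E_FCI - alpha * E_rPT2.

    Returns (E_FCI, alpha), i.e. the intercept at E_rPT2 = 0 and minus the slope.
    """
    n = len(e_var)
    mean_x = sum(e_rpt2) / n
    mean_y = sum(e_var) / n
    sxx = sum((x - mean_x) ** 2 for x in e_rpt2)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(e_rpt2, e_var))
    slope = sxy / sxx
    return mean_y - slope * mean_x, -slope


def weighted_mean(delta_e, e_rpt2_m, e_rpt2_0):
    """Weighted mean of the CIPSI excitation energies (an assumed estimator).

    sigma^2(n) is taken proportional to E_rPT2^(m)(n)^2 + E_rPT2^(0)(n)^2,
    and the weights are w(n) = 1 / sqrt(sigma^2(n)).
    """
    weights = [1.0 / math.sqrt(a * a + b * b) for a, b in zip(e_rpt2_m, e_rpt2_0)]
    return sum(w * d for w, d in zip(weights, delta_e)) / sum(weights)


def jarque_bera(samples):
    """Jarque-Bera statistic J = n/6 * (S^2 + K^2/4), K being the excess kurtosis."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)


def p_gaussian(samples):
    """P(G) = 1 - chi2_CDF(J, 2); with 2 degrees of freedom this is exp(-J/2)."""
    return math.exp(-jarque_bera(samples) / 2.0)
```

In such a sketch, `p_gaussian` would be evaluated on the last $M$ energy differences for each candidate $M>2$, and the resulting values compared against the $0.8$ threshold mentioned above.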
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{The QUEST database}
@ -324,17 +303,25 @@ and we treat all CIPSI iterations as samples coming from the same Gaussian proce
%=======================
\subsection{Overview}
%=======================
The QUEST database gathers more than \alert{470} highly-accurate excitation energies of various natures (valence, Rydberg, $n \ra \pis$, $\pi \ra \pis$, singlet, soublet, triplet, and double excitations) for molecules ranging from diatomics to molecules as large as naphthalene.
The QUEST database gathers more than \alert{470} highly-accurate excitation energies of various natures (valence, Rydberg, $n \ra \pis$, $\pi \ra \pis$, singlet, doublet, triplet, and double excitations) for molecules ranging from diatomics to molecules as large as naphthalene.
Each of the five subsets making up the QUEST dataset is detailed below.
Throughout the present article, we report several statistical indicators: the mean signed error (MSE), mean absolute error (MAE), root-mean-square error (RMSE), and standard deviation of the errors (SDE).
%%% FIGURE 1 %%%
\begin{figure}[bt]
\begin{figure}[ht]
\centering
\includegraphics[width=0.5\linewidth]{fig1/fig1}
\caption{Composition of each of the five subsets making up the present QUEST dataset of highly-accurate vertical excitation energies.}
\end{figure}
%%% FIGURE 2 %%%
\begin{figure}[ht]
\centering
\includegraphics[width=0.8\linewidth]{fig2}
\caption{Molecules of each of the five subsets making up the present QUEST dataset of highly-accurate vertical excitation energies:
QUEST\#1 (red), QUEST\#2 (magenta and/or underlined), QUEST\#3 (black), QUEST\#4 (green), and QUEST\#5 (blue).}
\end{figure}
%=======================
\subsection{QUEST\#1}
%=======================

Binary file not shown.


@ -22,12 +22,12 @@ decoration={snake,
\begin{tikzpicture}
\begin{scope}[very thick,
node distance=2cm,on grid,>=stealth',
QUEST0/.style={circle,draw,fill=green!45},
QUEST1/.style={rectangle,draw,fill=yellow!45},
QUEST2/.style={rectangle,draw,fill=orange!45},
QUEST3/.style={rectangle,draw,fill=red!45},
QUEST4/.style={rectangle,draw,fill=violet!45},
QUEST5/.style={rectangle,draw,fill=black!45}]
QUEST0/.style={circle,draw,fill=orange!45},
QUEST1/.style={rectangle,draw,fill=red!45},
QUEST2/.style={rectangle,draw,fill=magenta!45},
QUEST3/.style={rectangle,draw,fill=black!45},
QUEST4/.style={rectangle,draw,fill=green!45},
QUEST5/.style={rectangle,draw,fill=blue!45}]
\node [QUEST0, align=center] (Q) at (4*0, 4*0) {QUEST \\ \tiny 470 highly-accurate \\ \tiny excitations };
\node [QUEST1, align=center] (Q1) at (4*0.587785, 4*0.809017) {QUEST\#1 \\ \tiny small-sized molecules \\ \tiny \bf \red{JCTC 14 (2018) 4360}};
\node [QUEST2, align=center] (Q2) at (4*0.951057, -4*0.309017) {QUEST\#2 \\ \tiny double excitations \\ \tiny \bf \red{JCTC 15 (2019) 1939}};

BIN
Manuscript/fig2.pdf Normal file

Binary file not shown.