toto part

2020-09-08 15:10:52 +02:00
commit 830a893fe2
parent a507931e79
4 changed files with 37 additions and 50 deletions
--- a/Manuscript/QUEST_WIREs.tex
+++ b/Manuscript/QUEST_WIREs.tex
@@ -241,12 +241,12 @@ The definition of the active space considered for each system as well as the num
 \subsubsection{Estimating the extrapolation error}
 %------------------------------------------------
-For the $m$th excited states (where $m = 0$ corresponds to the ground state), we usually estimate its FCI energy by performing a linear extrapolation of its variational energy $E_\text{var}^{(m)}$ as a function of its rPT2 correction $E_{\text{rPT2}}^{(m)}$ as follows
+For the $m$th excited state (where $m = 0$ corresponds to the ground state), we usually estimate its FCI energy $E_{\text{FCI}}^{(m)}$ by performing a linear extrapolation of its variational energy $E_\text{var}^{(m)}$ as a function of its rPT2 correction $E_{\text{rPT2}}^{(m)}$ as follows
 \begin{equation}
  E_\text{var}^{(m)} = E_{\text{FCI}}^{(m)} - \alpha^{(m)} E_{\text{rPT2}}^{(m)}
 \end{equation}
 $E_\text{var}^{(m)}$ varies almost linearly as a function of $E_{\text{rPT2}}^{(m)}$, but with a coefficient $\alpha^{(m)}$ which deviates slightly from unity in well-behaved cases. 
-This implies that at any iteration of the CIPSI algorithm, the estimated error on the CIPSI energy is 
+This implies that, at any iteration of the CIPSI algorithm, the estimated error on the CIPSI energy is 
 \begin{equation}
  E_{\text{CIPSI}}^{(m)} - E_{\text{FCI}}^{(m)} 
  = \qty(E_\text{var}^{(m)}+E_{\text{rPT2}}^{(m)}) - E_{\text{FCI}}^{(m)} 
@@ -257,64 +257,43 @@ Therefore, the accuracy of the excitation energy estimates will strongly depend
 Because our selection procedure ensures that the rPT2 values of both states match as well as possible (a trick known as PT2 matching \cite{Dash_2018,Dash_2019}), i.e., $E_{\text{rPT2}} = E_{\text{rPT2}}^{(0)} \approx E_{\text{rPT2}}^{(m)}$, the extrapolated excitation energy associated with the $m$th excited state can be estimated as
 \begin{equation}
-\begin{split}
+  \Delta E_{\text{FCI}}^{(m)} 
-  \Delta E^{(m)} 
+  = \qty[ E_\text{var}^{(m)} + E_{\text{rPT2}} + \qty(\alpha^{(m)}-1) E_{\text{rPT2}} ]   
-  & = E^{(m)}_{\text{CIPSI}} - E^{(0)}_{\text{CIPSI}}  
+  - \qty[ E_\text{var}^{(0)} + E_{\text{rPT2}} + \qty(\alpha^{(0)}-1) E_{\text{rPT2}} ]  
- \\
-  & = \qty[ E^{(m)} + E_{\text{rPT2}} + \qty(\alpha^{(n)}-1) E_{\text{rPT2}} ]   
-    - \qty[ E^{(0)} + E_{\text{rPT2}} + \qty(\alpha^{(0)}-1) E_{\text{rPT2}} ]  
  + \order{E_{\text{rPT2}}^2 }
-\end{split}
 \end{equation}
-which evidences that the error on $\Delta E^{(m)}$ can be expressed as $\qty(\alpha^{(m)}-\alpha^{(0)}) E_{\text{rPT2}} + \order{E_{\text{rPT2}}^2}$.
+which evidences that the error in $\Delta E_{\text{FCI}}^{(m)}$ can be expressed as $\qty(\alpha^{(m)}-\alpha^{(0)}) E_{\text{rPT2}} + \order{E_{\text{rPT2}}^2}$.
 Moreover, using a common set of state-averaged natural orbitals for the ground and excited states tends to make the values of $\alpha^{(0)}$ and $\alpha^{(m)}$ very close to each other, such that the error on the energy difference is practically of the order of $E_{\text{rPT2}}^2$. 
-At the $n$th CIPSI iteration, we have access to the variational energies of both states, $E^{(0)}(n)$ and $E^{(m)}(n)$, as well as their the rPT2 corrections, $E_{\text{rPT2}}^{(0)}(n)$ and $E_{\text{rPT2}}^{(m)}(n)$.
+At the $n$th CIPSI iteration, we have access to the variational energies of both states, $E_\text{var}^{(0)}(n)$ and $E_\text{var}^{(m)}(n)$, as well as their rPT2 corrections, $E_{\text{rPT2}}^{(0)}(n)$ and $E_{\text{rPT2}}^{(m)}(n)$.
-The $m$th excitation energy at iteration $n$ is then modeled as a Gaussian random variable with mean and variance 
+The $m$th excitation energy at iteration $n$ is then assumed to be a Gaussian random variable with mean and variance 
 \begin{gather}
-  \Delta E^{(m)}(n) = \qty[ E^{(m)}(n) + E_{\text{rPT2}}^{(m)}(n) ] - \qty[ E^{(0)}(n) + E_{\text{rPT2}}^{(0)}(n) ]
+  \Delta E_\text{CIPSI}^{(m)}(n) = \qty[ E_\text{var}^{(m)}(n) + E_{\text{rPT2}}^{(m)}(n) ] - \qty[ E_\text{var}^{(0)}(n) + E_{\text{rPT2}}^{(0)}(n) ]
  \\
  \sigma^2(n) \propto \qty[E_{\text{rPT2}}^{(m)}(n)]^2 + \qty[E_{\text{rPT2}}^{(0)}(n)]^2
 \end{gather}
-and we treat all CIPSI iterations as samples coming from the same Gaussian process with weights $w(n) = 1/\sqrt{\sigma^2(n)}$.
+and we treat all CIPSI iterations as a set of Gaussian-distributed variables ($\mathcal{G}$) with weights $w(n) = 1/\sqrt{\sigma^2(n)}$.
-
+We then search for a confidence interval $\mathcal{I}$ such that the true value of the excitation energy $\Delta E_{\text{FCI}}^{(m)}$ lies within one standard deviation of $\Delta E_\text{CIPSI}^{(m)}$, i.e., $P( \Delta E_{\text{FCI}} \in [ \Delta E_\text{CIPSI}^{(m)} \pm \sigma ] \; | \; \mathcal{G}) = 0.6827$. 
-   The confidence interval is chosen to be equivalent to what one
+The probability that $\Delta E_{\text{FCI}}^{(m)}$ is in an interval $\mathcal{I}$ is
-   would obtain using $\pm 1$ standard deviation with Gaussian-distributed
-   variables ($\mathcal{G}$). In other words, we will search for an interval $\mathcal{I}$
-   such that the probability $P( \Delta E_{\text{FCI}} \in \mathcal{I})$
-   that the true value of the excitation energy lies within the interval is
-   equal to
-   $P( \Delta E_{\text{FCI}} \in [ \Delta E \pm \sigma ] \; | \; \mathcal{G}) = 0.6827$.
-   The probability that the FCI excitation energy is in an interval
-   $\mathcal{I}$ is
 \begin{equation}
-   P( \Delta E_{\text{FCI}} \in \mathcal{I} ) = P( E_{\text{FCI}} \in I | \mathcal{G}) \times P(\mathcal{G}) 
+   P( \Delta E_{\text{FCI}}^{(m)} \in \mathcal{I} ) = P( \Delta E_{\text{FCI}}^{(m)} \in I | \mathcal{G}) \times P(\mathcal{G}) 
 \end{equation}
-   where the probability $P(\mathcal{G})$ that the random variables are
+where the probability $P(\mathcal{G})$ that the random variables are normally distributed can be deduced from the Jarque-Bera test $J$ as
-   normally distributed can be deduced from the Jarque-Bera test $J$ as
 \begin{equation}
   P(\mathcal{G}) = 1 - \chi^2_{\text{CDF}}(J,2)
 \end{equation}
-   where $\chi^2_{\text{CDF}}(x,k)$ is the cumulative distribution function of the
+where $\chi^2_{\text{CDF}}(x,k)$ is the cumulative distribution function of the $\chi^2$-distribution with $k$ degrees of freedom.
-   $\chi^2$ distribution with $k$ degrees of freedom.
+As the number of samples is usually small, we use Student's $t$-distribution to estimate the statistical error. 
-   As the number of samples is usually small, we use Student's $t$ distribution to
+The inverse of the cumulative distribution function of the $t$-distribution allows us to find how to scale the interval by a parameter 
-   estimate the statistical error. The inverse of the cumulative
-   distribution function of the $t$ distribution will allow us to find how
-   to scale the interval with a parameter $\beta$ such that
-   $P( \Delta E_{\text{FCI}} \in [ \Delta E \pm \beta \sigma ] ) = p$.
 \begin{equation}
   %\beta = t_{\text{CDF}}^{-1} \left[ 
   %\frac{1}{2} \left( 1 + \frac{P( \Delta E_{\text{FCI}} \in [ \Delta E \pm \sigma ] \; | \; \mathcal{G}) }{P(\mathcal{G})}\right), n \right]
   \beta = t_{\text{CDF}}^{-1} \left[ 
   \frac{1}{2} \left( 1 + \frac{0.6827}{P(\mathcal{G})}\right), n \right]
 \end{equation}
-   Only the last $M>2$ computed energy differences are considered. $M$ is chosen
+such that $P( \Delta E_{\text{FCI}}^{(m)} \in [ \Delta E_{\text{CIPSI}}^{(m)} \pm \beta \sigma ] ) = p$.
-   such that $P(\mathcal{G})>0.8$ and such that the error bar is minimal.
+Only the last $M>2$ computed energy differences are considered. $M$ is chosen such that $P(\mathcal{G})>0.8$ and such that the error bar is minimal.
-   If all the values of $P(\mathcal{G})$ are below $0.8$, $M$ is chosen such that
+If all the values of $P(\mathcal{G})$ are below $0.8$, $M$ is chosen such that $P(\mathcal{G})$ is maximal.
-   $P(\mathcal{G})$ is maximal.
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \section{The QUEST database}
@@ -324,17 +303,25 @@ and we treat all CIPSI iterations as samples coming from the same Gaussian proce
 %=======================
 \subsection{Overview}
 %=======================
-The QUEST database gathers more than \alert{470} highly-accurate excitation energies of various natures (valence, Rydberg, $n \ra \pis$, $\pi \ra \pis$, singlet, soublet, triplet, and double excitations) for molecules ranging from diatomics to molecules as large as naphthalene.
+The QUEST database gathers more than \alert{470} highly-accurate excitation energies of various natures (valence, Rydberg, $n \ra \pis$, $\pi \ra \pis$, singlet, doublet, triplet, and double excitations) for molecules ranging from diatomics to molecules as large as naphthalene.
 Each of the five subsets making up the QUEST dataset is detailed below.
 Throughout the present article, we report several statistical indicators: the mean signed error (MSE), mean absolute error (MAE), root-mean square error (RMSE), and standard deviation of the errors (SDE).
 %%% FIGURE 1 %%%
-\begin{figure}[bt]
+\begin{figure}[ht]
 	\centering
 	\includegraphics[width=0.5\linewidth]{fig1/fig1}
 	\caption{Composition of each of the five subsets making up the present QUEST dataset of highly-accurate vertical excitation energies.}
 \end{figure}
 %%% FIGURE 2 %%%
 \begin{figure}[ht]
 	\centering
 	\includegraphics[width=0.8\linewidth]{fig2}
 	\caption{Molecules each of the five subsets making up the present QUEST dataset of highly-accurate vertical excitation energies: 
 	QUEST\#1 (red), QUEST\#2 (magenta and/or underlined), QUEST\#3 (black), QUEST\#4 (green), and QUEST\#5 (blue).}
 \end{figure}
 %=======================
 \subsection{QUEST\#1}
 %=======================
--- a/Manuscript/fig1/fig1.pdf
+++ b/Manuscript/fig1/fig1.pdf
--- a/Manuscript/fig1/fig1.tex
+++ b/Manuscript/fig1/fig1.tex
@@ -22,12 +22,12 @@ decoration={snake,
 \begin{tikzpicture}
 	\begin{scope}[very thick,
 		node distance=2cm,on grid,>=stealth',
-		QUEST0/.style={circle,draw,fill=green!45},
+		QUEST0/.style={circle,draw,fill=orange!45},
-		QUEST1/.style={rectangle,draw,fill=yellow!45},
+		QUEST1/.style={rectangle,draw,fill=red!45},
-		QUEST2/.style={rectangle,draw,fill=orange!45},
+		QUEST2/.style={rectangle,draw,fill=magenta!45},
-		QUEST3/.style={rectangle,draw,fill=red!45},
+		QUEST3/.style={rectangle,draw,fill=black!45},
-		QUEST4/.style={rectangle,draw,fill=violet!45},
+		QUEST4/.style={rectangle,draw,fill=green!45},
-		QUEST5/.style={rectangle,draw,fill=black!45}]
+		QUEST5/.style={rectangle,draw,fill=blue!45}]
 		\node [QUEST0, align=center]		(Q)		at (4*0, 		 4*0)			{QUEST \\ \tiny 470 highly-accurate \\ \tiny excitations };
 		\node [QUEST1, align=center]		(Q1)	at (4*0.587785,  4*0.809017)	{QUEST\#1 \\ \tiny small-sized molecules \\ \tiny \bf \red{JCTC 14 (2018) 4360}};
 		\node [QUEST2, align=center] 		(Q2)	at (4*0.951057, -4*0.309017)	{QUEST\#2 \\ \tiny double excitations \\ \tiny \bf \red{JCTC 15 (2019) 1939}};
--- a/Manuscript/fig2.pdf
+++ b/Manuscript/fig2.pdf
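
Note on the normality check described in the Manuscript/QUEST_WIREs.tex hunk: the following Python sketch is not code from this repository. It only illustrates, under stated assumptions, how the Jarque-Bera statistic $J$ and the probability $P(\mathcal{G}) = 1 - \chi^2_{\text{CDF}}(J,2)$ could be evaluated for a set of excitation energies. The function names (`jarque_bera`, `prob_gaussian`, `pt2_weight`) are hypothetical. The statistic is computed here from biased (population) moments, and the proportionality constant in $\sigma^2(n)$ is taken as 1; both choices are assumptions, since the manuscript does not specify them. For $k = 2$ degrees of freedom the $\chi^2$ cumulative distribution function is $1 - e^{-x/2}$, so $P(\mathcal{G})$ reduces to $e^{-J/2}$.

```python
import math


def jarque_bera(samples):
    """Jarque-Bera statistic J = n/6 * (S^2 + (K - 3)^2 / 4).

    S and K are the sample skewness and kurtosis, computed from
    biased (population) central moments (an assumption of this sketch).
    """
    n = len(samples)
    if n < 3:
        raise ValueError("need at least 3 samples")
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    if m2 == 0.0:
        raise ValueError("samples must not all be identical")
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


def prob_gaussian(samples):
    """P(G) = 1 - chi2_CDF(J, 2), which equals exp(-J / 2) for k = 2."""
    return math.exp(-jarque_bera(samples) / 2.0)


def pt2_weight(e_pt2_excited, e_pt2_ground):
    """Weight w(n) = 1 / sqrt(sigma^2(n)), with the proportionality
    constant of sigma^2(n) set to 1 (an assumption of this sketch)."""
    return 1.0 / math.sqrt(e_pt2_excited ** 2 + e_pt2_ground ** 2)
```

In the procedure described in the manuscript, a check of this kind would be applied to the last $M > 2$ energy differences, keeping values of $M$ for which $P(\mathcal{G}) > 0.8$. The scaling factor $\beta$ additionally requires the inverse cumulative distribution function of Student's $t$-distribution, which is not part of the Python standard library and is therefore left out of this sketch.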