Mimi's edits

Anthony Scemama 2020-11-29 01:00:03 +01:00
parent f28d600231
commit d7cd8332a6
8 changed files with 427 additions and 242 deletions


@ -1,9 +1,9 @@
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% A template for Wiley article submissions.
% Developed by Overleaf.
%
% Please note that whilst this template provides a
% preview of the typeset manuscript for submission, it
% will not necessarily be the final publication layout.
%
% Usage notes:
@ -83,14 +83,14 @@
\maketitle
\begin{abstract}
We describe our efforts of the past few years to create a large set of more than 500 highly-accurate vertical excitation energies of various natures ($\pi \to \pis$, $n \to \pis$, double excitation,
Rydberg, singlet, doublet, triplet, etc.) in small- and medium-sized molecules. These values have been obtained using an incremental strategy which consists in combining high-order coupled
cluster and selected configuration interaction calculations using increasingly large diffuse basis sets in order to reach high accuracy. One of the key aspects of the so-called QUEST database
of vertical excitations is that it does not rely on any experimental values, avoiding potential biases inherently linked to experiments and facilitating theoretical cross comparisons. Following this
composite protocol, we have been able to produce theoretical best estimates (TBEs) with the aug-cc-pVTZ basis set for each of these transitions, as well as basis set corrected TBEs (i.e., near
the complete basis set limit) for some of them. The TBEs/aug-cc-pVTZ have been employed to benchmark a large number of (lower-order) wave function methods such as CIS(D), ADC(2), CC2,
STEOM-CCSD, CCSD, CCSDR(3), CCSDT-3, ADC(3), CC3, NEVPT2, and others (including spin-scaled variants). In order to gather the huge amount of data produced during the QUEST
project, we have created a website [\url{https://lcpq.github.io/QUESTDB_website}] where one can easily test and compare the accuracy of a given method with respect to various variables
such as the molecule size or its family, the nature of the excited states, the type of basis set, etc.
%Add website address here
We hope that the present review will provide a useful summary of our effort so far and foster new developments around excited-state methods.
@ -102,57 +102,57 @@ We hope that the present review will provide a useful summary of our effort so f
\section{Introduction}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Nowadays, there exist a very large number of electronic structure computational approaches, more or less expensive depending on their overall accuracy, able to quantitatively predict the
absolute and/or relative energies of electronic states in molecular systems \cite{SzaboBook,JensenBook,CramerBook,HelgakerBook}. One important aspect of some of these theoretical
methods is their ability to access the energies of electronic excited states, i.e., states that have higher total energies than the so-called ground (that is, lowest-energy) state
\cite{Roos_1996,Piecuch_2002,Dreuw_2005,Krylov_2006,Sneskov_2012,Gonzales_2012,Laurent_2013,Adamo_2013,Ghosh_2018,Blase_2020,Loos_2020a}.
The faithful description of excited states is particularly challenging from a theoretical point of view but is key to a deeper understanding of photochemical and photophysical processes
like absorption, fluorescence, phosphorescence, chemoluminescence, and others \cite{Bernardi_1996,Olivucci_2010,Robb_2007,Navizet_2011,Crespo_2018,Robb_2018,Mai_2020}.
For a given level of theory, ground-state methods are usually more accurate than their excited-state analogs.
The reasons behind this are (at least) threefold:
i) accurately modeling the electronic structure of excited states usually requires larger one-electron basis sets (including diffuse functions most of the time) than their ground-state counterparts,
ii) excited states can be governed by different amounts of dynamic/static correlations, present very different physical natures ($\pi \to \pis$, $n \to \pis$, charge transfer, double excitation, valence, Rydberg, singlet, doublet, triplet, etc.), yet be very close in energy to one another, and
iii) one usually has to rely on response theory formalisms \cite{Monkhorst_1977,Helgaker_1989,Koch_1990,Koch_1990b,Christiansen_1995b,Christiansen_1998b,Hattig_2003,Kallay_2004,Hattig_2005c}, which inherently introduce a ground-state ``bias''.
Hence, designing excited-state methods able to tackle simultaneously and on an equal footing all these types of excited states at an affordable cost remains an open challenge in theoretical computational chemistry as evidenced by the large number of review
articles on this particular subject \cite{Roos_1996,Piecuch_2002,Dreuw_2005,Krylov_2006,Sneskov_2012,Gonzales_2012,Laurent_2013,Adamo_2013,Dreuw_2015,Ghosh_2018,Blase_2020,Loos_2020a}.
When designing a new theoretical model, the first feature that one might want to test is its overall accuracy, i.e., its ability to reproduce reference (or benchmark) values for a given system with a well-defined
setup (same geometry, basis set, etc.). These values can be absolute and/or relative energies, geometrical parameters, physical or chemical spectroscopic properties extracted from experiments,
high-level theoretical calculations, or any combination of these. To this end, the electronic structure community has designed benchmark sets over the years, i.e., sets of molecules for which one
can (very) accurately compute theoretical estimates and/or access solid experimental data for given properties. Regarding ground-state properties, two of the oldest and most employed sets are
probably the Gaussian-1 and Gaussian-2 benchmark sets \cite{Pople_1989,Curtiss_1991,Curtiss_1997} developed by the group of Pople in the 1990s. For example, the Gaussian-2 set gathers atomization
energies, ionization energies, electron affinities, proton affinities, bond dissociation energies, and reaction barriers. This set was subsequently extended and refined \cite{Curtiss_1998,Curtiss_2007}.
Another very useful set for the design of methods able to capture dispersion effects \cite{Angyan_2020} is the S22 benchmark set \cite{Jureka_2006} (and its extended S66 version \cite{Rezac_2011})
of Hobza and collaborators, which provides benchmark interaction energies for weakly-interacting (non-covalent) systems. One could also mention the $GW$100 set \cite{vanSetten_2015,Krause_2015,Maggio_2016}
(and its $GW$5000 extension \cite{Stuke_2020}) of ionization energies, which has enormously helped the community to compare the implementation of $GW$-type methods for molecular
systems \cite{vanSetten_2013,Bruneval_2016,Caruso_2016,Govoni_2018}. The extrapolated ab initio thermochemistry (HEAT) set designed to achieve high accuracy for enthalpies of formation
of atoms and small molecules (without experimental data) is yet another successful example of a benchmark set \cite{Tajti_2004,Bomble_2006,Harding_2008}. More recently, let us mention the benchmark datasets
of the \textit{Simons Collaboration on the Many-Electron Problem} providing, for example, highly-accurate ground-state energies for
hydrogen chains \cite{Motta_2017} as well as transition metal atoms and their ions and monoxides \cite{Williams_2020}. Let us also mention the set of Zhao and Truhlar for small transition metal complexes
employed to compare the accuracy of density-functional methods \cite{ParrBook} for $3d$ transition-metal chemistry \cite{Zhao_2006}, and finally the popular GMTKN24 \cite{Goerigk_2010},
GMTKN30 \cite{Goerigk_2011a,Goerigk_2011b} and GMTKN55 \cite{Goerigk_2017} databases for general main group thermochemistry, kinetics, and non-covalent interactions developed by Goerigk, Grimme and
their coworkers.
The examples of benchmark sets presented above are all designed for ground-state properties, and there exist specific protocols tailored to accurately model excited-state energies and properties as well.
Indeed, benchmark datasets of excited-state energies and/or properties are less numerous than their ground-state counterparts but their number has been growing at a consistent pace in the past few years.
Below, we provide a short description for some of them. One of the most characteristic examples is the benchmark set of vertical excitation energies proposed by Thiel and coworkers
\cite{Schreiber_2008,Silva-Junior_2008,Silva-Junior_2010,Silva-Junior_2010b,Silva-Junior_2010c}. The so-called Thiel (or M\"ulheim) set of excitation energies gathers a large number of excitation energies
determined in 28 medium-sized organic CNOH molecules with a total of 223 valence excited states (152 singlet and 71 triplet states) for which theoretical best estimates (TBEs) were defined.
In their first study, Thiel and collaborators performed CC2 \cite{Christiansen_1995a,Hattig_2000}, CCSD \cite{Rowe_1968,Koch_1990,Stanton_1993,Koch_1994}, CC3 \cite{Christiansen_1995b,Koch_1997}, and
CASPT2 \cite{Andersson_1990,Andersson_1992,Roos,Roos_1996} calculations (with the TZVP basis) on MP2/6-31G(d) geometries in order to provide (based on additional high-quality literature data) TBEs for these
transitions. These TBEs were quickly refined with the larger aug-cc-pVTZ basis set \cite{Silva-Junior_2010b,Silva-Junior_2010c}. In the same spirit, it is also worth mentioning Gordon's set of vertical transitions
(based on experimental values) \cite{Leang_2012} used to benchmark the performance of time-dependent density-functional theory (TD-DFT) \cite{Runge_1984,Casida_1995,Casida_2012,Ulrich_2012}, as well
as its extended version by Goerigk and coworkers who decided to replace the experimental reference values by CC3 excitation energies \cite{Schwabe_2017,Casanova-Paez_2019,Casanova_Paes_2020}.
For comparisons with experimental values, there also exist various sets of measured 0-0 energies used in various benchmarks, notably by the Furche \cite{Furche_2002,Send_2011a}, H\"attig \cite{Winter_2013}
and our \cite{Loos_2018,Loos_2019a,Loos_2019b} groups for gas-phase compounds and by Grimme \cite{Dierksen_2004,Goerigk_2010a} and one of us \cite{Jacquemin_2012,Jacquemin_2015b} for solvated dyes.
Let us also mention the new benchmark set of charge-transfer excited states recently introduced by Szalay and coworkers [based on equation-of-motion coupled cluster (EOM-CC) methods] \cite{Kozma_2020}
as well as the Gagliardi-Truhlar set employed to compare the accuracy of multiconfiguration pair-density functional theory \cite{Ghosh_2018} against the well-established CASPT2 method \cite{Hoyer_2016}.
Following a similar philosophy and striving for chemical accuracy, we have recently reported in several studies highly-accurate vertical excitations for small- and medium-sized molecules
\cite{Loos_2020a,Loos_2018a,Loos_2019,Loos_2020b,Loos_2020c}. The so-called QUEST dataset of vertical excitations, which we will describe in detail in the present review article, is composed of five
subsets (see Fig.~\ref{fig:scheme}): i) a subset of excitations in small molecules containing from 1 to 3 non-hydrogen atoms known as QUEST\#1, ii) a subset of double excitations in molecules of small and
medium sizes known as QUEST\#2, iii) a subset of excitation energies for medium-sized molecules containing from 4 to 6 non-hydrogen atoms known as QUEST\#3, iv) a subset composed of more ``exotic''
molecules and radicals labeled as QUEST\#4, and v) a subset known as QUEST\#5, specifically designed for the present article, gathering excitation energies in larger molecules as well as additional smaller molecules.
One of the key aspects of the QUEST dataset is that it does not rely on any experimental values, avoiding potential biases inherently linked to experiments and facilitating in the process theoretical comparisons.
Moreover, our protocol has been designed to be as uniform as possible, which means that we have designed a very systematic procedure for all excited states in order to make cross-comparison as straightforward as possible.
@ -168,27 +168,27 @@ review of the generic benchmark studies devoted to adiabatic and 0-0 energies pe
\label{fig:scheme}
\end{figure}
The QUEST dataset has the particularity to be based to a large extent on selected configuration interaction (SCI) reference excitation energies as well as high-order linear-response (LR) CC methods such as LR-CCSDT and
LR-CCSDTQ \cite{Noga_1987,Koch_1990,Kucharski_1991,Christiansen_1998b,Kucharski_2001,Kowalski_2001,Kallay_2003,Kallay_2004,Hirata_2000,Hirata_2004}. Recently, SCI methods have been a force to reckon with for
the computation of highly-accurate energies in small- and medium-sized molecules as they yield near full configuration interaction (FCI) quality energies for only a very tiny fraction of the computational cost of a genuine FCI calculation \cite{Booth_2009,Booth_2010,Cleland_2010,Booth_2011,Daday_2012,Blunt_2015,Ghanem_2019,Deustua_2017,Deustua_2018,Holmes_2017,Chien_2018,Li_2018,Yao_2020,Li_2020,Eriksen_2017,Eriksen_2018,Eriksen_2019a,Eriksen_2019b,Xu_2018,Xu_2020,Loos_2018a,Loos_2019,Loos_2020b,Loos_2020c,Loos_2020a,Loos_2020e,Eriksen_2021}.
Due to the fairly natural idea underlying these methods, the SCI family is composed of numerous members \cite{Bender_1969,Whitten_1969,Huron_1973,Abrams_2005,Bunge_2006,Bytautas_2009,Giner_2013,Caffarel_2014,Giner_2015,Garniron_2017b,Caffarel_2016a,Caffarel_2016b,Holmes_2016,Sharma_2017,Holmes_2017,Chien_2018,Scemama_2018,Scemama_2018b,Garniron_2018,Evangelista_2014,Tubman_2016,Tubman_2020,Schriber_2016,Schriber_2017,Liu_2016,Per_2017,Ohtsuka_2017,Zimmerman_2017,Li_2018,Coe_2018,Loos_2019}.
Their fundamental philosophy consists, roughly speaking, in retaining only the most relevant determinants of the FCI space following a given criterion to slow down the exponential increase of the size of the CI expansion.
Originally developed in the late 1960's by Bender and Davidson \cite{Bender_1969} as well as Whitten and Hackmeyer \cite{Whitten_1969}, new efficient SCI algorithms have resurfaced recently.
Three examples are iCI \cite{Liu_2014,Liu_2016,Lei_2017,Zhang_2020}, semistochastic heat-bath CI (SHCI) \cite{Holmes_2016,Holmes_2017,Sharma_2017,Li_2018,Li_2020,Yao_2020}, and \textit{Configuration Interaction using a Perturbative Selection made Iteratively} (CIPSI) \cite{Huron_1973,Giner_2013,Giner_2015,Garniron_2019}.
These flavors of SCI include a second-order perturbative (PT2) correction which is key to estimate the ``distance'' to the FCI solution (see below).
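As a minimal sketch of what such a correction looks like (the notation here is introduced for illustration only, and the Epstein-Nesbet partitioning is assumed, as is common in SCI methods), let $\Psi^{(0)}$ denote the variational wave function built in the selected space, with energy $E_{\mathrm{var}}$, and let $\alpha$ run over the determinants lying outside this space. The PT2 correction can then be written as
\begin{equation}
	E_{\mathrm{PT2}} = \sum_{\alpha} \frac{\left| \langle \alpha | \hat{H} | \Psi^{(0)} \rangle \right|^2}{E_{\mathrm{var}} - \langle \alpha | \hat{H} | \alpha \rangle},
\end{equation}
so that $E_{\mathrm{var}} + E_{\mathrm{PT2}}$ approximates the FCI energy, and $E_{\mathrm{PT2}} \to 0$ as the selected space approaches the complete FCI space.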
The SCI calculations performed for the QUEST set of excitation energies rely on the CIPSI algorithm, which is, from a historical point of view, one of the oldest SCI algorithms.
It was developed in 1973 by Huron, Rancurel, and Malrieu \cite{Huron_1973} (see also Refs.~\cite{Evangelisti_1983,Cimiraglia_1985,Cimiraglia_1987,Illas_1988,Povill_1992}).
Recently, the determinant-driven CIPSI algorithm has been efficiently implemented \cite{Garniron_2019} in the open-source programming environment QUANTUM PACKAGE by the Toulouse group, enabling massively
parallel computations \cite{Garniron_2017,Garniron_2018,Garniron_2019,Loos_2020e}. CIPSI is also frequently employed to provide accurate trial wave functions for quantum Monte Carlo calculations in molecules \cite{Caffarel_2014,Caffarel_2016a,Caffarel_2016b,Giner_2013,Giner_2015,Scemama_2015,Scemama_2016,Scemama_2018,Scemama_2018b,Scemama_2019,Dash_2018,Dash_2019,Scemama_2020} and more recently
for periodic solids \cite{Benali_2020}. We refer the interested reader to Ref.~\cite{Garniron_2019} where one can find additional details regarding the implementation of the CIPSI algorithm.
The present article is organized as follows. In Sec.~\ref{sec:tools}, we detail the specificities of our protocol by providing computational details regarding geometries, basis sets, (reference and benchmarked)
computational methods, and a new way of estimating rigorously the extrapolation error in SCI calculations which is tested by computing additional FCI values for five- and six-membered rings.
We then describe in Sec.~\ref{sec:QUEST} the content of our five QUEST subsets providing for each of them the number of reference excitation energies, the nature and size of the molecules, the list of
benchmarked methods, as well as other specificities. A special emphasis is placed on our latest (previously unpublished) add-on, QUEST\#5, specifically designed for the present manuscript where we have considered, in particular
but not only, larger molecules. Section \ref{sec:TBE} discusses the generation of the TBEs, while Sec.~\ref{sec:bench} proposes a comprehensive benchmark of various methods on the entire QUEST set which is
composed of more than 400 excitations with, in addition, a specific analysis for each type of excited state. Section \ref{sec:website} describes the features of the website that we have specifically designed to gather the
entire data generated during these last few years. Thanks to this website, one can easily test and compare the accuracy of a given method with respect to various variables such as the molecule size or its family, the nature
of the excited states, the size of the basis set, etc. Finally, we draw our conclusions in Sec.~\ref{sec:ccl} where we discuss, in particular, future projects aiming at expanding and improving the usability and accuracy of the QUEST database.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@ -200,7 +200,7 @@ of the excited states, the size of the basis set, etc. Finally, we draw our conc
\subsection{Geometries}
%=======================
The ground-state structures of the molecules included in the QUEST dataset have been systematically optimized at the CC3/aug-cc-pVTZ level of theory, except for a very few cases.
As shown in Refs.~\cite{Hattig_2005c,Budzak_2017}, CC3 provides extremely accurate ground- and excited-state geometries. These optimizations have been performed using DALTON 2017
\cite{dalton} and CFOUR 2.1 \cite{cfour} applying default parameters. For the open-shell derivatives belonging to QUEST\#4 \cite{Loos_2020c}, the geometries are optimized at the UCCSD(T)/aug-cc-pVTZ level using the GAUSSIAN16 program \cite{Gaussian16} and applying the ``tight'' convergence threshold. For the purpose of the present review article, we have gathered all the geometries in the {\SupInf}.
%=======================
@ -223,24 +223,24 @@ Obviously, the smaller the molecule, the larger the basis we can afford.
For larger systems (\ie, 4--6 non-hydrogen atoms), one cannot afford SCI calculations anymore except on a few special occasions, and we then rely on LR-CC theory (LR-CCSDT and LR-CCSDTQ typically \cite{Kucharski_1991,Kallay_2003,Kallay_2004,Hirata_2000,Hirata_2004}) to obtain accurate transition energies.
In the following, we will omit the prefix LR for the sake of clarity, as equivalent values would be obtained with the equation-of-motion (EOM) formalism \cite{Rowe_1968,Stanton_1993}.
The CC calculations are performed with several codes.
For closed-shell molecules, CC3 \cite{Christiansen_1995b,Koch_1997} calculations are performed with DALTON \cite{dalton} and CFOUR \cite{cfour}.
CCSDT and CCSDTQ calculations are performed with CFOUR \cite{cfour} and MRCC 2017 \cite{Rolik_2013,mrcc}, the latter code being also used for CCSDTQP.
%Note that all our excited-state CC calculations are performed within the equation-of-motion (EOM) or linear-response (LR) formalism that yield the same excited-state energies.
The reported oscillator strengths have been computed in the LR-CC3 formalism only.
For open-shell molecules, the CCSDT, CCSDTQ, and CCSDTQP calculations performed with MRCC \cite{Rolik_2013,mrcc} do consider an unrestricted Hartree-Fock wave function as reference but for a few exceptions.
All excited-state calculations are performed, except when explicitly mentioned, in the frozen-core (FC) approximation using large cores for the third-row atoms.
All the SCI calculations are performed within the frozen-core approximation using QUANTUM PACKAGE \cite{Garniron_2019} where the CIPSI algorithm \cite{Huron_1973} is implemented. Details regarding this specific CIPSI implementation can be found in Refs.~\cite{Garniron_2019} and \cite{Scemama_2019}.
A state-averaged formalism is employed, i.e., the ground and excited states are described with the same set of determinants and orbitals, but different CI coefficients.
Our usual protocol \cite{Scemama_2018,Scemama_2018b,Scemama_2019,Loos_2018a,Loos_2019,Loos_2020a,Loos_2020b,Loos_2020c} consists of performing a preliminary CIPSI calculation using Hartree-Fock orbitals in order to generate a CIPSI wave function with at least $10^7$ determinants.
Natural orbitals are then computed based on this wave function, and a new, larger CIPSI calculation is performed with this new set of orbitals.
This has the advantage of producing a smoother and faster convergence of the SCI energy toward the FCI limit.
The CIPSI energy $E_\text{CIPSI}$ is defined as the sum of the variational energy $E_\text{var}$ (computed via diagonalization of the CI matrix in the reference space) and a PT2 correction $E_\text{PT2}$ which estimates the contribution of the determinants not included in the CI space \cite{Garniron_2017b}.
By linearly extrapolating this second-order correction to zero, one can efficiently estimate the FCI limit for the total energies.
These extrapolated total energies (simply labeled as $E_\text{FCI}$ in the remainder of the paper) are then used to compute vertical excitation energies.
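As a minimal illustration of this extrapolation (a two-point sketch only: in practice the fit involves several CIPSI iterations, and the labels 1 and 2 are introduced here purely for illustration), assume that the linear model $E_\text{var} \approx E_\text{FCI} - \alpha E_\text{PT2}$ holds for two wave functions with variational energies $E_{\text{var},1}$ and $E_{\text{var},2}$ and corrections $E_{\text{PT2},1}$ and $E_{\text{PT2},2}$.
One then gets
\begin{equation}
	\alpha = - \frac{E_{\text{var},1} - E_{\text{var},2}}{E_{\text{PT2},1} - E_{\text{PT2},2}},
	\qquad
	E_\text{FCI} \approx \frac{E_{\text{var},2} E_{\text{PT2},1} - E_{\text{var},1} E_{\text{PT2},2}}{E_{\text{PT2},1} - E_{\text{PT2},2}},
\end{equation}
\ie, $E_\text{FCI}$ is the intercept of the straight line at $E_\text{PT2} = 0$.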
Depending on the set, we estimated the extrapolation error via different techniques.
For example, in Ref.~\cite{Loos_2020b}, we estimated the extrapolation error by the difference between the transition energies obtained with the largest SCI wave function and the FCI extrapolated value.
This definitely cannot be viewed as a true error bar, but it provides an idea of the quality of the FCI extrapolation and estimate.
Below, we provide a much cleaner way of estimating the extrapolation error in SCI methods, and we adopt this scheme for the five- and six-membered rings considered in the QUEST\#3 subset.
The particularity of the current implementation is that the selection step and the PT2 correction are computed \textit{simultaneously} via a hybrid semistochastic algorithm \cite{Garniron_2017,Garniron_2019}.
@ -253,32 +253,32 @@ Note that all our SCI wave functions are eigenfunctions of the $\Hat{S}^2$ spin
%------------------------------------------------
Using a large variety of codes, our benchmark effort consists in evaluating the accuracy of vertical transition energies obtained at lower levels of theory.
For example, we rely on GAUSSIAN \cite{Gaussian16} and TURBOMOLE 7.3 \cite{Turbomole} for CIS(D) \cite{Head-Gordon_1994,Head-Gordon_1995};
Q-CHEM 5.2 \cite{Krylov_2013} for EOM-MP2 [CCSD(2)] \cite{Stanton_1995c} and ADC(3) \cite{Trofimov_2002,Harbach_2014,Dreuw_2015};
Q-CHEM \cite{Krylov_2013} and TURBOMOLE \cite{Turbomole} for ADC(2) \cite{Trofimov_1997,Dreuw_2015};
DALTON \cite{dalton} and TURBOMOLE \cite{Turbomole} for CC2 \cite{Christiansen_1995a,Hattig_2000};
DALTON \cite{dalton} and GAUSSIAN \cite{Gaussian16} for CCSD \cite{Koch_1990,Stanton_1993,Koch_1994};
DALTON \cite{dalton} for CCSDR(3) \cite{Christiansen_1996b};
CFOUR \cite{cfour} for CCSDT-3 \cite{Watts_1996b,Prochnow_2010};
and ORCA \cite{Neese_2012} for similarity-transformed EOM-CCSD (STEOM-CCSD) \cite{Nooijen_1997,Dutta_2018}.
In addition, we evaluate the spin-opposite scaling (SOS) variants of ADC(2), SOS-ADC(2), as implemented in both Q-CHEM \cite{Krauter_2013} and TURBOMOLE \cite{Hellweg_2008}.
Note that these two codes have distinct SOS implementations, as explained in Ref.~\cite{Krauter_2013}.
We also test the SOS and spin-component scaled (SCS) versions of CC2, as implemented in TURBOMOLE \cite{Hellweg_2008,Turbomole}.
Discussion of various spin-scaling schemes can be found elsewhere \cite{Goerigk_2010a}.
%When available, we take advantage of the resolution-of-the-identity (RI) approximation in TURBOMOLE and Q-CHEM.
For the STEOM-CCSD calculations, it was checked that the active character percentage was, at least, $98\%$.
%When comparisons between various codes/implementations were possible, we could not detect variations in the transition energies larger than $0.01$ eV.
For radicals, we applied both the U (unrestricted) and RO (restricted open-shell) versions of CCSD and CC3 as implemented in the PSI4 code \cite{Psi4} to perform our benchmarks.
Finally, the composite approach, ADC(2.5), which follows the spirit of Grimme's and Hobza's MP2.5 approach \cite{Pitonak_2009} by averaging the ADC(2) and ADC(3) excitation energies, is also tested in the following \cite{Loos_2020d}.
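Written out explicitly (a sketch of the averaging just described, assuming equal weights as in MP2.5, with $\Delta E$ denoting a vertical excitation energy and the superscript notation introduced here for illustration), the ADC(2.5) estimate reads
\begin{equation}
	\Delta E^{\text{ADC(2.5)}} = \frac{1}{2} \qty[ \Delta E^{\text{ADC(2)}} + \Delta E^{\text{ADC(3)}} ],
\end{equation}
so that no calculation beyond the ADC(2) and ADC(3) ones is required.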
For the double excitations composing the QUEST database, we have performed additional calculations using various multiconfigurational methods.
In particular, state-averaged (SA) CASSCF and CASPT2 \cite{Roos,Andersson_1990} have been performed with MOLPRO (RS2 contraction level) \cite{molpro}.
Concerning the NEVPT2 calculations (which are also performed with MOLPRO), the partially-contracted (PC) and strongly-contracted (SC) variants have been tested \cite{Angeli_2001a,Angeli_2001b,Angeli_2002}.
From a strict theoretical point of view, we point out that PC-NEVPT2 is supposed to be more accurate than SC-NEVPT2 given that it has a larger number of perturbers and greater flexibility.
PC-NEVPT2 calculations were also systematically performed for the QUEST\#3 subset.
In the case of double excitations \cite{Loos_2019}, we have also performed calculations with multi-state (MS) CASPT2 (MS-MR formalism) \cite{Finley_1998} and its extended variant (XMS-CASPT2) \cite{Shiozaki_2011} when there is a strong mixing between states with the same spin and spatial symmetries.
The CASPT2 calculations have been performed with level shift and IPEA parameters set to the standard values of $0.3$ and $0.25$ a.u., respectively.
Large active spaces carefully chosen and tailored for the desired transitions have been selected.
The definition of the active space considered for each system, as well as the number of states in the state-averaged calculation, is provided in the corresponding publication.
%------------------------------------------------
@ -287,61 +287,121 @@ The definition of the active space considered for each system as well as the num
%------------------------------------------------
In this section, we present our scheme to estimate the extrapolation error in SCI calculations.
This new protocol is then applied to five- and six-membered ring molecules for which SCI calculations are particularly challenging even for small basis sets.
Note that the present method only applies to \emph{state-averaged} SCI calculations, where ground- and excited-state energies are produced during the same calculation with the same set of molecular orbitals, and not to \emph{state-specific} calculations, where one computes solely the energy of a single state (like conventional ground-state calculations).
For the $m$th excited state (where $m = 0$ corresponds to the ground state), we usually estimate its FCI energy $E_{\text{FCI}}^{(m)}$ by performing a linear extrapolation of its variational energy $E_\text{var}^{(m)}$ as a function of its rPT2 correction $E_{\text{rPT2}}^{(m)}$ \cite{Holmes_2017, Garniron_2019} using
\begin{equation}
E_{\text{var}}^{(m)} \approx E_\text{FCI}^{(m)} - \alpha^{(m)} E_{\text{rPT2}}^{(m)},
\label{eqx}
\end{equation}
where $E_{\text{var}}^{(m)}$ and $E_{\text{rPT2}}^{(m)}$ are calculated with CIPSI and $E_\text{FCI}^{(m)}$ is the FCI energy
to be extrapolated. This relation is valid in the regime of a sufficiently large number of determinants where the second-order perturbational
correction largely dominates.
However, in practice, due to the residual higher-order terms, the coefficient $\alpha^{(m)}$ deviates slightly from unity.
Using Eq.~(\ref{eqx}), the estimated error on the CIPSI energy is calculated as
\begin{equation}
E_{\text{CIPSI}}^{(m)} - E_{\text{FCI}}^{(m)}
= \qty(E_\text{var}^{(m)}+E_{\text{rPT2}}^{(m)}) - E_{\text{FCI}}^{(m)}
= \qty(1-\alpha^{(m)}) E_{\text{rPT2}}^{(m)},
\end{equation}
and thus the extrapolated excitation energy associated with the $m$th
state is given by
\begin{equation}
\Delta E_{\text{FCI}}^{(m)}
= \qty[ E_\text{var}^{(m)} + E_{\text{rPT2}} + \qty(\alpha^{(m)}-1) E_{\text{rPT2}} ]
- \qty[ E_\text{var}^{(0)} + E_{\text{rPT2}} + \qty(\alpha^{(0)}-1) E_{\text{rPT2}} ]
+ O\qty[{E_{\text{rPT2}}^2 }]
\end{equation}
which shows that the error in $\Delta E_{\text{FCI}}^{(m)}$ can be expressed as $\qty(\alpha^{(m)}-\alpha^{(0)}) E_{\text{rPT2}} + O\qty[{E_{\text{rPT2}}^2}]$.
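To make this last step explicit (a short rearrangement, assuming matched rPT2 values, $E_{\text{rPT2}}^{(0)} \approx E_{\text{rPT2}}^{(m)} = E_{\text{rPT2}}$, and neglecting the quadratic terms), note that the rPT2 corrections then cancel in the CIPSI excitation energy, $\Delta E_\text{CIPSI}^{(m)} = E_\text{var}^{(m)} - E_\text{var}^{(0)}$, so that
\begin{equation}
	\Delta E_{\text{FCI}}^{(m)} - \Delta E_\text{CIPSI}^{(m)}
	\approx \qty(\alpha^{(m)} - \alpha^{(0)}) E_{\text{rPT2}},
\end{equation}
which vanishes when $\alpha^{(m)} = \alpha^{(0)}$, even if each coefficient individually differs from unity.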
Now, for the largest systems considered here, $\qty|{E_{\text{rPT2}}}|$ can be as large as 2~eV and, thus,
the accuracy of the excitation energy estimates strongly depends on our ability to compensate for the errors in the calculations.
Here, we greatly enhance the compensation of errors by making use of
our selection procedure ensuring that the PT2 values of both states
match as well as possible (a trick known as PT2 matching
\cite{Dash_2018,Dash_2019}), i.e., $E_{\text{rPT2}} =
E_{\text{rPT2}}^{(0)} \approx E_{\text{rPT2}}^{(m)}$, and
by using a common set of state-averaged natural orbitals with equal weights for the ground and excited states.
This last feature tends to make the values of $\alpha^{(0)}$ and $\alpha^{(m)}$ very close to each other, such that the error on the energy difference
is decreased.
In the ideal case where we would be able to fully correlate the CIPSI calculations for the ground and excited states, the fluctuations of
$\Delta E_\text{CIPSI}^{(m)}(n)$ as a function of $n$ would completely vanish and the exact excitation energy would be obtained from the first CIPSI iterations.
Quite remarkably, in practice, numerical experience shows that the fluctuations with respect to the extrapolated value $\Delta E_\text{FCI}^{(m)}$ are small,
zero-centered, almost independent of $n$ when iteration numbers that are not too close to each other
are considered, and display a Gaussian-like distribution.
In addition, the fluctuations are found to depend only very weakly on the iteration number $n$ (see Fig.~\ref{fig2}), so
this dependence does not significantly alter our results and is not considered here.
We thus introduce the following random variable
\begin{equation}
X^{(m)}= \frac{\Delta E_\text{CIPSI}^{(m)}(n)- \Delta E_\text{FCI}^{(m)}}{\sigma(n)}
\end{equation}
where
\begin{equation}
\Delta E_\text{CIPSI}^{(m)}(n) = \qty[ E_\text{var}^{(m)}(n) +
E_{\text{rPT2}}^{(m)}(n) ]
- \qty[ E_\text{var}^{(0)}(n) + E_{\text{rPT2}}^{(0)}(n) ],
\end{equation}
and
${\sigma(n)}$ is a quantity proportional to the average fluctuations of $\Delta E_\text{CIPSI}^{(m)}$.
A natural choice for $\sigma^2(n)$, playing here the role of a variance, is
\begin{equation}
\sigma^2(n) \propto \qty[E_{\text{rPT2}}^{(m)}(n)]^2 + \qty[E_{\text{rPT2}}^{(0)}(n)]^2,
\end{equation}
which vanishes in the large-$n$ limit, as it should.
%%% FIGURE 2 %%%
\begin{figure}
\centering
\includegraphics[width=0.9\linewidth]{fig2/fig2}
	\caption{Histogram of the random variable $X^{(m)}$ (see text). About 200 values of the transition energies
for the 13 five- and six-membered ring molecules, both for the singlet and triplet transitions and for a number of CIPSI iterations, are used.
The number $M$ of iterations kept is chosen according to the statistical test presented in the text.}
\label{fig2}
\end{figure}
The histogram of $X^{(m)}$ resulting from the excitation energies
obtained at different CIPSI iterations $n$
for the 13 five- and six-membered ring molecules, both for the singlet and triplet transitions,
is shown in Fig.~\ref{fig2}. To avoid transient effects, only excitation energies at sufficiently large $n$ are retained in the data set.
The criterion used to decide from which value of $n$ the data should be kept is presented below. In our application, the total number
of values employed to build the histogram is about 200. The dashed line in Fig.~\ref{fig2} represents the best Gaussian fit
(in the least-squares sense) to the data.
As seen, the distribution can be described by the Gaussian probability
\begin{equation}
P\qty[X^{(m)}] \propto e^{-\frac{{X^{(m)}}^2} {2{\sigma^{*}}^2}}
\end{equation}
where $\sigma^{*2}$ is some ``universal'' variance depending only
on the way the correlated selection of both states is done, not on the molecule considered in our set.
An estimate of $\Delta E_{\text{FCI}}^{(m)}$ as the weighted average of the $\Delta E_\text{CIPSI}^{(m)}(n)$ values is thus
\begin{equation}
	\Delta E_\text{FCI}^{(m)} = \frac{ \sum_{n=1}^M \frac{\Delta E_\text{CIPSI}^{(m)}(n)} {\sigma(n)} }
	                    { \sum_{n=1}^M \frac{1}{\sigma(n)} },
\end{equation}
where $M$ is the number of data points kept.
Regarding the estimate of the error on $\Delta E_\text{FCI}^{(m)}$, some caution is required: although the distribution is globally Gaussian-like
(see Fig.~\ref{fig2}), there are
some significant departures from it, and we need to take this feature into account.
More precisely, we search for a confidence interval $\mathcal{I}$ such that the true value of the excitation energy $\Delta E_{\text{FCI}}^{(m)}$ lies within one standard deviation of $\Delta E_\text{CIPSI}^{(m)}$, i.e., $P\qty( \Delta E_{\text{FCI}}^{(m)} \in \qty[ \Delta E_\text{CIPSI}^{(m)} \pm \sigma ] \; \Big| \; \mathcal{G}) = 0.6827$.
In a Bayesian framework, the probability that $\Delta E_{\text{FCI}}^{(m)}$ is in an interval $\mathcal{I}$ is
\begin{equation}
   P\qty( \Delta E_{\text{FCI}}^{(m)} \in \mathcal{I} ) = P\qty( \Delta E_{\text{FCI}}^{(m)} \in \mathcal{I} \Big| \mathcal{G}) \times P\qty(\mathcal{G}),
\end{equation}
where $P\qty(\mathcal{G})$ is the probability that the random variables considered in the latest CIPSI iterations are normally distributed.
A common statistical test of the normality of a distribution is the Jarque-Bera test and, denoting its test statistic by $J$, we have
\begin{equation}
P\qty(\mathcal{G}) = 1 - \chi^2_{\text{CDF}}(J,2)
\end{equation}
where $\chi^2_{\text{CDF}}(x,k)$ is the cumulative distribution function (CDF) of the $\chi^2$-distribution with $k$ degrees of freedom.
As the number of samples $M$ is usually small, we use Student's $t$-distribution to estimate the statistical error.
The inverse of the cumulative distribution function of the $t$-distribution, $t_{\text{CDF}}^{-1}$, allows us to find how to scale the interval by a parameter
\begin{equation}
\beta = t_{\text{CDF}}^{-1} \qty[
\frac{1}{2} \qty( 1 + \frac{0.6827}{P(\mathcal{G})}), M ]
\end{equation}
such that $P\qty( \Delta E_{\text{FCI}}^{(m)} \in \qty[ \Delta E_{\text{CIPSI}}^{(m)} \pm \beta \sigma ] ) = p = 0.6827$.
Only the last $M>2$ computed transition energies are considered. $M$ is chosen such that $P(\mathcal{G})>0.8$ and such that the error bar is minimal.
If all the values of $P(\mathcal{G})$ are below $0.8$, $M$ is chosen such that $P(\mathcal{G})$ is maximal.
A Python script implementing this procedure is provided in the {\SupInf}.
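As a quick consistency check of the scaling factor $\beta$ (a sketch relying only on standard properties of the normal and Student distributions, not on additional data), consider the limiting case where the normality test is fully satisfied, $P(\mathcal{G}) = 1$.
Then
\begin{equation}
	\beta = t_{\text{CDF}}^{-1} \qty[ \frac{1}{2} \qty( 1 + 0.6827 ), M ]
	= t_{\text{CDF}}^{-1} \qty( 0.84135, M )
	\xrightarrow[M \to \infty]{} 1,
\end{equation}
since the $t$-distribution tends to the normal distribution, for which the quantile at $0.84135$ corresponds to one standard deviation.
For a finite number of samples $M$, or for $P(\mathcal{G}) < 1$, one has $\beta > 1$, \ie, the error bar is enlarged.
Note also that the argument of $t_{\text{CDF}}^{-1}$ remains smaller than one only if $P(\mathcal{G}) > 0.6827$.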
@ -354,11 +414,11 @@ This strategy has been considered in some of our previous works \cite{Loos_2020b
The deviations from the CCSDT excitation energies for the same set of excitations are depicted in Fig.~\ref{fig:errors}, where the red dots correspond to the excitation energies and error bars estimated via the present method, and the blue dots correspond to the excitation energies obtained via a three-point linear fit and error bars estimated via the extrapolation distance.
These results contain a good balance between well-behaved and ill-behaved cases.
For example, cyclopentadiene and furan correspond to well-behaved scenarios where the two flavors of extrapolations yield nearly identical estimates and the error bars associated with these two methods nicely overlap.
In these cases, one can observe that our method based on Gaussian random variables provides almost systematically smaller error bars.
Even in less ideal situations (like in imidazole, pyrrole, and thiophene), the results are very satisfactory and stable.
The six-membered rings represent much more challenging cases for SCI methods, and even for these systems the newly-developed method provides realistic error bars and makes it easy to detect problematic cases (like pyridine for instance).
The present scheme has also been tested on smaller systems for which one can tightly converge the CIPSI calculations.
In such cases, the agreement is nearly perfect in every scenario that we have encountered.
A selection of these results can be found in the {\SupInf}.
%%% TABLE I %%%
@ -376,7 +436,7 @@ Cyclopentadiene & $^1 B_2 (\pi \ra \pis)$ & 5.79 & 5.80 & 5.80(2) & 5.79(2)
Furan & $^1A_2(\pi \ra 3s)$ & 6.26 & 6.28 & 6.31(5) & 6.37(1) \\
& $^3B_2(\pi \ra \pis)$ & 4.28 & 4.28 & 4.26(4) & 4.22(7) \\
Imidazole & $^1A''(\pi \ra 3s)$ & 5.77 & 5.77 & 5.78(5) & 5.96(14) \\
& $^3A'(\pi \ra \pis)$ & 4.83 & 4.81 & 4.82(7) & 4.65(22) \\
Pyrrole & $^1A_2(\pi \ra 3s)$ & 5.25 & 5.25 & 5.23(7) & 5.31(1) \\
& $^3B_2(\pi \ra \pis)$ & 4.59 & 4.58 & 4.54(7) & 4.37(23) \\
Thiophene & $^1A_1(\pi \ra \pis)$ & 5.79 & 5.77 & 5.75(8) & 5.73(9) \\
@ -396,23 +456,23 @@ Pyridine & $^1B_1(n \ra \pis)$ & 5.12 & 5.10 & 5.15(12)& 4.90(24) \\
& $^3A_1(\pi \ra \pis)$ & 4.33 & 4.31 & 4.42(85)& 3.68(105) \\
Pyrimidine & $^1B_1(n \ra \pis)$ & 4.58 & 4.57 & 4.64(11)& 2.54(5) \\
& $^3B_1(n \ra \pis)$ & 4.20 & 4.20 & 4.55(37)& 2.18(27) \\
Triazine & $^1A_1''(n \ra \pis)$ & 4.85 & 4.84 & 4.77(13)& 5.12(51) \\
& $^3A_2''(n \ra \pis)$ & 4.40 & 4.40 & 4.45(39)& 4.73(6) \\
%\hiderowcolors
\hline % Please only put a hline at the end of the table
\end{tabular}
\begin{tablenotes}
\item $^a$ Excitation energies and error bars estimated via the novel statistical method based on Gaussian random variables (see Sec.~\ref{sec:error}).
The error bars reported in parentheses correspond to one standard deviation.
\item $^b$ Excitation energies obtained via a three-point linear fit using the three largest CIPSI variational wave functions, and error bars estimated via the extrapolation distance, \ie, the difference in excitation energies obtained with the three-point linear extrapolation and the largest CIPSI wave function.
\end{tablenotes}
\end{threeparttable}
\end{table}
%%% FIGURE 3 %%%
\begin{figure}
\centering
\includegraphics[width=\linewidth]{fig3}
\caption{Deviation from the CCSDT excitation energies for the lowest singlet and triplet excitation energies (in eV) of five- and six-membered rings obtained at the CIPSI/6-31+G(d) level of theory. Red dots: excitation energies and error bars estimated via the present method (see Sec.~\ref{sec:error}). Blue dots: excitation energies obtained via a three-point linear fit using the three largest CIPSI wave functions, and error bars estimated via the extrapolation distance, \ie, the difference in excitation energies obtained with the three-point linear extrapolation and the largest CIPSI wave function.}
\label{fig:errors}
\end{figure}
@ -425,16 +485,16 @@ The error bars reported in parenthesis correspond to one standard deviation.
%=======================
\subsection{Overview}
%=======================
The QUEST database gathers more than 500 highly-accurate excitation energies of various natures (valence, Rydberg, $n \ra \pis$, $\pi \ra \pis$, singlet, doublet, triplet, and double excitations) for molecules ranging
from diatomics to molecules as large as naphthalene (see Fig.~\ref{fig:molecules}). This set is also chemically diverse, with organic and inorganic systems, open- and closed-shell compounds, acyclic and cyclic systems,
pure hydrocarbons and various heteroatomic structures, etc. Each of the five subsets making up the QUEST dataset is detailed below. Throughout the present review, we report several statistical indicators: the mean signed
error (MSE), mean absolute error (MAE), root-mean-square error (RMSE), and standard deviation of the errors (SDE), as well as the maximum positive [Max(+)] and maximum negative [Max($-$)] errors.
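For reference, these indicators can be written as follows (a sketch assuming the usual definitions, with $x_i$ the signed error of a given method with respect to the TBE for the $i$th of $N$ transitions; the symbols $x_i$ and $N$ are introduced here for illustration, and the $1/N$ normalization of the SDE is an assumption, as $1/(N-1)$ is also commonly used):
\begin{equation}
\begin{gathered}
	\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} x_i,
	\qquad
	\text{MAE} = \frac{1}{N} \sum_{i=1}^{N} \abs{x_i},
	\\
	\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} x_i^2},
	\qquad
	\text{SDE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \qty(x_i - \text{MSE})^2}.
\end{gathered}
\end{equation}
With this normalization, $\text{RMSE}^2 = \text{MSE}^2 + \text{SDE}^2$, while Max(+) and Max($-$) are simply the largest positive and most negative values of $x_i$.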
%%% FIGURE 4 %%%
\begin{figure}
\centering
\includegraphics[width=\linewidth]{fig4}
\caption{Molecules from each of the five subsets making up the present QUEST dataset of highly-accurate vertical excitation energies:
QUEST\#1 (red), QUEST\#2 (magenta and/or underlined), QUEST\#3 (black), QUEST\#4 (green), and QUEST\#5 (blue).}
\label{fig:molecules}
\end{figure}
@ -442,65 +502,65 @@ error (MSE), mean absolute error (MAE), root-mean square error (RMSE), and stand
%=======================
\subsection{QUEST\#1}
%=======================
The QUEST\#1 benchmark set \cite{Loos_2018a} consists of 110 vertical excitation energies (as well as oscillator strengths) from 18 molecules with sizes ranging from one to three non-hydrogen atoms
(water, hydrogen sulfide, ammonia, hydrogen chloride, dinitrogen, carbon monoxide, acetylene, ethylene, formaldehyde, methanimine, thioformaldehyde, acetaldehyde, cyclopropene, diazomethane,
formamide, ketene, nitrosomethane, and the smallest streptocyanine). For this set, we provided two sets of TBEs: i) one obtained within the frozen-core approximation and the aug-cc-pVTZ basis set, and ii)
another one including further corrections for basis set incompleteness and ``all electron'' effects. For the former set, we systematically employed FCI/aug-cc-pVTZ values to define our TBEs, except for a few cases.
For the latter set, both the ``all electron'' correlation and the basis set corrections were systematically obtained at the CC3 level of theory and with the d-aug-cc-pV5Z basis for the nine smallest molecules, and
slightly more compact basis sets for the larger compounds. Our TBE/aug-cc-pVTZ reference excitation energies were employed to benchmark a series of popular excited-state wave function methods partially
or fully accounting for double and triple excitations, namely CIS(D), CC2, CCSD, STEOM-CCSD, CCSDR(3), CCSDT-3, CC3, ADC(2), and ADC(3). Our main conclusions were that i) ADC(2) and CC2 show
strong similarities in terms of accuracy, ii) STEOM-CCSD is, on average, as accurate as CCSD, the latter overestimating transition energies, iii) CC3 is extremely accurate (with a mean absolute error of only
$\sim 0.03$ eV) and CCSDT-3, although slightly less accurate than CC3, could be used as a reliable reference for benchmark studies, and iv) ADC(3) was found to be significantly less accurate than CC3
by overcorrecting ADC(2) excitation energies.
%=======================
\subsection{QUEST\#2}
%=======================
The QUEST\#2 benchmark set \cite{Loos_2019} reports reference energies for double excitations. This set gathers 20 vertical transitions from 14 small- and medium-sized molecules (acrolein, benzene, beryllium atom,
butadiene, carbon dimer and trimer, ethylene, formaldehyde, glyoxal, hexatriene, nitrosomethane, nitroxyl, pyrazine, and tetrazine). The TBEs of the QUEST\#2 set are obtained with SCI and/or multiconfigurational
[CASSCF, CASPT2, (X)MS-CASPT2, and NEVPT2] calculations depending on the size of the molecules and the level of theory that we could afford. An important addition to this second study was also the inclusion of
various flavors of multiconfigurational methods (CASSCF, CASPT2, and NEVPT2) in addition to high-order CC methods including, at least, perturbative triples (CC3, CCSDT, CCSDTQ, etc.).
Our results demonstrated that the error of CC methods is intimately linked to the amount of double-excitation character in the vertical transition. For ``pure'' double excitations (i.e., for transitions which do not mix with
single excitations), the error in CC3 and CCSDT can easily reach $1$ and $0.5$ eV, respectively, while it goes down to a few tenths of an eV for more common transitions involving a significant amount of single excitations
(such as the well-known $A_g$ transition in butadiene or the $E_{2g}$ excitation in benzene). The quality of the excitation energies obtained with CASPT2 and NEVPT2 was harder to predict as the overall accuracy of
these methods is highly dependent on both the system and the selected active space. Nevertheless, these two methods were found to be more accurate for transitions with a very small percentage of single excitations
(error usually below $0.1$ eV) than for excitations dominated by single excitations where the error is closer to $0.1$--$0.2$ eV.
%=======================
\subsection{QUEST\#3}
%=======================
The QUEST\#3 benchmark set \cite{Loos_2020b} is, by far, our largest set, and consists of highly accurate vertical transition energies and oscillator strengths obtained for 27 molecules encompassing 4, 5, and
6 non-hydrogen atoms (acetone, acrolein, benzene, butadiene, cyanoacetylene, cyanoformaldehyde, cyanogen, cyclopentadiene, cyclopropenone, cyclopropenethione, diacetylene, furan, glyoxal, imidazole, isobutene,
methylenecyclopropene, propynal, pyrazine, pyridazine, pyridine, pyrimidine, pyrrole, tetrazine, thioacetone, thiophene, thiopropynal, and triazine) for a total of 238 vertical transition energies and 90 oscillator strengths
with a reasonably good balance between singlet, triplet, valence, and Rydberg excited states. For these 238 transitions, we have estimated that 224 are chemically accurate for the aug-cc-pVTZ basis and for the
considered geometry. To define the TBEs of the QUEST\#3 set, we employed CC methods up to the highest technically possible order (CC3, CCSDT, and CCSDTQ), and, when affordable, SCI calculations with very
large reference spaces (up to a hundred million determinants in certain cases), as well as one of the most reliable multiconfigurational methods, NEVPT2, for double excitations. Most of our TBEs are based on CCSDTQ
(4 non-hydrogen atoms) or CCSDT (5 and 6 non-hydrogen atoms) excitation energies. For all the transitions of the QUEST\#3 set, we reported at least CCSDT/aug-cc-pVTZ (sometimes with basis set extrapolation)
and CC3/aug-cc-pVQZ transition energies as well as CC3/aug-cc-pVTZ oscillator strengths for each dipole-allowed transition. Pursuing our previous benchmarking efforts, we confirmed that CC3 almost systematically
delivers transition energies in agreement with higher-level theoretical models ($\pm0.04$ eV) except for transitions presenting a dominant double-excitation character where multiconfigurational methods like NEVPT2
logically have the edge. This settles the debate, at least for now, by demonstrating the superiority of CC3 (in terms of accuracy) compared to methods like CCSDT-3 or ADC(3). For the latter model, this was further
demonstrated in a recent study by two of the present authors \cite{Loos_2020d}.
%=======================
\subsection{QUEST\#4}
%=======================
The QUEST\#4 benchmark set \cite{Loos_2020c} consists of two subsets of excitations and oscillator strengths. An ``exotic'' subset of 30 excited states for closed-shell molecules containing F, Cl, P, and Si atoms
(carbonyl fluoride, \ce{CCl2}, \ce{CClF}, \ce{CF2}, difluorodiazirine, formyl fluoride, \ce{HCCl}, \ce{HCF}, \ce{HCP}, \ce{HPO}, \ce{HPS}, \ce{HSiF}, \ce{SiCl2}, and silylidene) and a ``radical'' subset of 51 doublet-doublet
transitions in 24 small radicals (allyl, \ce{BeF}, \ce{BeH}, \ce{BH2}, \ce{CH}, \ce{CH3}, \ce{CN}, \ce{CNO}, \ce{CON}, \ce{CO+}, \ce{F2BO}, \ce{F2BS}, \ce{H2BO}, \ce{HCO}, \ce{HOC}, \ce{H2PO}, \ce{H2PS}, \ce{NCO},
\ce{NH2}, nitromethyl, \ce{NO}, \ce{OH}, \ce{PH2}, and vinyl) characterized by open-shell electronic configurations and an unpaired electron. This represents a total of 81 high-quality TBEs, the vast majority being obtained
at the FCI level with at least the aug-cc-pVTZ basis set. We additionally performed high-order CC calculations to confirm these estimates. For the exotic set, these TBEs have been used to assess the performance of
15 ``lower-order'' wave function approaches, including several CC and ADC variants. Consistent with our previous works, we found that CC3 is very accurate, whereas the trends for the other methods are similar to those
obtained on more standard CNOSH organic compounds. In contrast, for the radical set, even the refined ROCC3 method yields a comparatively large MAE of $0.05$ eV. Likewise, the excitation energies obtained with CCSD
are much less satisfying for open-shell derivatives (MAE of $0.20$ eV with UCCSD and $0.15$ eV with ROCCSD) than for closed-shell systems of similar size (MAE of $0.07$ eV).
%=======================
\subsection{QUEST\#5}
%=======================
The QUEST\#5 subset is composed of additional accurate excitation energies that we have produced for the present article. This new set gathers 13 new systems, ranging from small to larger molecules
(see blue molecules in Fig.~\ref{fig:molecules}): aza-naphthalene, benzoquinone, cyclopentadienone, cyclopentadienethione, diazirine, hexatriene, maleimide, naphthalene, nitroxyl, octatetraene, streptocyanine-C3, streptocyanine-C5,
and thioacrolein. For these new transitions, we again report high-quality vertical transition energies, the vast majority being of CCSDT quality, and we consider that, out of these 80 new transitions, 55 can be labeled
as ``safe'', \ie, considered as chemically accurate or within 0.05 eV of the FCI limit for the given geometry and basis set. We refer the interested reader to the {\SupInf} for a detailed discussion of each molecule for which comparisons
are made with literature data.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@ -508,27 +568,27 @@ are made with literature data.
\label{sec:TBE}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
We discuss in this section the generation of the TBEs obtained with the aug-cc-pVTZ basis.
For the closed-shell compounds, the exhaustive list of TBEs can be found in Table \ref{tab:TBE} alongside various specifications: the molecule's name, the excitation, its nature (valence, Rydberg, or charge transfer), its oscillator strength (when symmetry- and spin-allowed),
and its percentage of single excitations $\%T_1$ (computed at the LR-CC3 level). All these quantities are computed with the same aug-cc-pVTZ basis.
Importantly, we also report the composite approach considered to compute the TBEs (see column ``Method'').
Following an ONIOM-like strategy \cite{Svensson_1996a,Svensson_1996b}, the TBEs are computed as ``A/SB + [B/TB - B/SB]'', where A/SB is the excitation energy computed with a method A in a smaller basis (SB), and B/SB and B/TB are excitation energies computed with a method B in the small basis and target basis TB, respectively.
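Written as an equation, and using a notation introduced here purely for illustration ($\Delta E_{\mathrm{M}}^{\mathrm{X}}$ denotes the vertical excitation energy obtained with method M in basis X), this composite scheme can be sketched as
\begin{equation}
	\Delta E_{\mathrm{TBE}}^{\mathrm{TB}} \approx \Delta E_{\mathrm{A}}^{\mathrm{SB}} + \left[ \Delta E_{\mathrm{B}}^{\mathrm{TB}} - \Delta E_{\mathrm{B}}^{\mathrm{SB}} \right],
\end{equation}
where the bracketed term acts as a basis set correction evaluated with the cheaper method B.
For instance, the protocol ``FCI/6-31+G(d) + [CCSDT/aug-cc-pVTZ - CCSDT/6-31+G(d)]'' corresponds to A = FCI, B = CCSDT, SB = 6-31+G(d), and TB = aug-cc-pVTZ.
The underlying assumption of such an additive scheme is that the difference between methods A and B depends only weakly on the basis set.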
Table \ref{tab:rad} reports the TBEs for the open-shell molecules belonging to the QUEST\#4 subset.
Overall, the QUEST database is composed of 551 excitation energies, including 302 singlet, 197 triplet, 51 doublet, 412 valence, and 176 Rydberg excited states.
Amongst the valence transitions in closed-shell compounds, 135 transitions correspond to $n \ra \pis$ excitations, 200 to $\pi \ra \pis$ excitations, and 23 are doubly-excited states. In terms of molecular size, 146 excitations are obtained
in molecules having between 1 and 3 non-hydrogen atoms, 97 excitations from 4 non-hydrogen atom compounds, 177 from molecules composed of 5 and 6 non-hydrogen atoms, and, finally, 68 excitations are obtained from systems with 7 to 10 non-hydrogen atoms.
In addition, QUEST includes 24 open-shell molecules with a single unpaired electron.
Amongst these excited states, 485 are considered ``safe'', \ie, chemically accurate for the considered basis set and geometry.
Besides this energetic criterion, we consider as ``safe'' the transitions that are either i) computed with FCI or CCSDTQ, or ii) for which the difference between CC3 and CCSDT excitation energies is small (\ie, around $0.03$--$0.04$ eV) and the $\%T_1$ value is large.
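For reference, the chemical accuracy threshold underlying this criterion can be converted to eV.
Assuming the usual definition of chemical accuracy as 1 kcal/mol (the value also quoted in Sec.~\ref{sec:bench}), and using 1 kcal $=$ 4.184 kJ and 1 eV $\approx$ 96.485 kJ/mol, one obtains
\begin{equation}
	1~\mathrm{kcal/mol} = \frac{4.184~\mathrm{kJ/mol}}{96.485~\mathrm{kJ\,mol^{-1}\,eV^{-1}}} \approx 0.043~\mathrm{eV}.
\end{equation}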
\begin{ThreePartTable}
\scriptsize
\centering
\begin{longtable}{clccccclc}
\caption{Theoretical best estimates (TBEs, in eV), oscillator strengths $f$, and percentage of single excitations $\%T_1$ involved in the transition (computed at the CC3 level) for the full set of closed-shell compounds of the QUEST database.
``Method'' provides the protocol employed to compute the TBEs.
The nature of the excitation is also provided: V, R, and CT stand for valence, Rydberg, and charge transfer, respectively.
[F] indicates a fluorescence transition, \ie, a vertical transition energy computed from an excited-state geometry.
AVXZ stands for aug-cc-pVXZ.
\label{tab:TBE}}
\\
@ -1052,13 +1112,13 @@ AVXZ stands for aug-cc-pVXZ.
\begin{table}[htp]
\centering
\scriptsize
\caption{Theoretical best estimates (TBEs, in eV) for the doublet-doublet transitions of the open-shell molecules belonging to QUEST\#4.
These TBEs are obtained with the aug-cc-pVTZ basis set, and ``Method'' indicates the protocol employed to compute them.}
\label{tab:rad}
\begin{threeparttable}
\begin{tabular}{cllcl}
\headrow
\thead{\#} & \thead{Molecule} & \thead{Transition} & \thead{TBE/aug-cc-pVTZ} & \thead{Method} \\
1 & Allyl &$^2B_1$ &3.39 & FCI/6-31+G(d) + [CCSDT/aug-cc-pVTZ - CCSDT/6-31+G(d)] \\
2 & &$^2A_1$ &4.99 & FCI/6-31+G(d) + [CCSDT/aug-cc-pVTZ - CCSDT/6-31+G(d)] \\
3 & \ce{BeF} &$^2\Pi$ &4.14 & FCI/aug-cc-pVTZ \\
@ -1110,20 +1170,20 @@ These TBEs are obtained with the aug-cc-pVTZ basis set, and ``Method'' indicates
49 & &$^2A''$ &4.69 & FCI/aug-cc-pVTZ \\
50 & &$^2A'$ &5.60 & FCI/aug-cc-pVTZ \\
51 & &$^2A'$ &6.20 & FCI/6-31+G(d) + [CCSDT/aug-cc-pVTZ - CCSDT/6-31+G(d)] \\
\hline
\end{tabular}
\end{threeparttable}
\end{table}
%%% %%% %%% %%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Benchmarks}
\label{sec:bench}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
In this section, we report a comprehensive benchmark of various lower-order methods on the entire set of closed-shell compounds belonging to the QUEST database.
Statistical quantities are reported in Table \ref{tab:stat} (the entire set of data can be found in the {\SupInf}).
We also provide a specific analysis for each type of excited state.
Hence, for the MSE and MAE, the statistical values are reported for various types of excited states and molecular sizes.
The distribution of the errors in vertical excitation energies (with respect to the TBE/aug-cc-pVTZ reference values) is represented in Fig.~\ref{fig:QUEST_stat} for all the ``safe'' excitations having a dominant single excitation character (\ie, the double excitations are discarded).
Similar graphs are reported in the {\SupInf} for specific sets of transitions and molecules.
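For clarity, the statistical indicators can be written explicitly.
The expressions below assume the conventional definitions; the notation is introduced here for illustration, and the normalization chosen for the SDE in particular is an assumption.
Denoting by $x_i = \Delta E_i^{\mathrm{method}} - \Delta E_i^{\mathrm{TBE}}$ the error on the $i$th of $N$ transitions,
\begin{equation}
	\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} x_i,
	\qquad
	\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| x_i \right|,
	\qquad
	\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} x_i^2},
	\qquad
	\mathrm{SDE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( x_i - \mathrm{MSE} \right)^2}.
\end{equation}
With this sign convention, a positive MSE indicates that a method overestimates the excitation energies on average.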
@ -1134,13 +1194,13 @@ Similar graphs are reported in the {\SupInf} for specific sets of transitions an
\caption{Mean signed error (MSE), mean absolute error (MAE), root-mean-square error (RMSE), standard deviation of the errors (SDE), as well as the maximum positive error [Max(+)] and negative error [Max($-$)] with respect to the TBE/aug-cc-pVTZ values for the entire QUEST database.
Only the ``safe'' TBEs are considered (see Table \ref{tab:TBE}).
For the MSE and MAE, the statistical values are reported for various types of excited states and molecular sizes.
All quantities are given in eV.
``Count'' refers to the number of transitions considered for each method.}
\label{tab:stat}
\begin{threeparttable}
\begin{tabular}{llccccccccccccccc}
\headrow
& & \thead{CIS(D)} & \thead{CC2} & \thead{EOM-MP2} & \thead{STEOM-CCSD} & \thead{CCSD} & \thead{CCSDR(3)} & \thead{CCSDT-3} & \thead{CC3}
& \thead{SOS-ADC(2)$^a$} & \thead{SOS-CC2$^a$} & \thead{SCS-CC2$^a$} & \thead{SOS-ADC(2)$^b$} & \thead{ADC(2)} & \thead{ADC(3)} & \thead{ADC(2.5)} \\
Count & & 429 & 431 & 427 & 360 & 431 & 259 & 251 & 431 & 430 & 430 & 430 & 430 & 426 & 423 & 423 \\
Max(+) & & 1.06 & 0.63 & 0.80 & 0.59 & 0.80 & 0.43 & 0.26 & 0.19 & 0.87 & 0.84 & 0.76 & 0.73 & 0.64 & 0.60 & 0.24 \\
@ -1156,20 +1216,20 @@ MSE & & 0.13 & 0.02 & 0.18 & -0.01 & 0.10 & 0.04 & 0.04 & 0.00 & 0.18 & 0.2
& 4 non-H & 0.13 & 0.04 & 0.12 & 0.00 & 0.09 & 0.03 & 0.04 & 0.00 & 0.19 & 0.26 & 0.19 & 0.03 & -0.04 & -0.10 & -0.07 \\
& 5--6 non-H & 0.17 & 0.02 & 0.30 & -0.01 & 0.11 & 0.05 & 0.05 & 0.00 & 0.21 & 0.20 & 0.14 & 0.03 & 0.03 & -0.10 & -0.04 \\
& 7--10 non-H & 0.15 & -0.03 & 0.42 & -0.05 & 0.22 & 0.10 & 0.08 & -0.01 & 0.26 & 0.29 & 0.19 & 0.05 & -0.06 & -0.02 & -0.04 \\
SDE & & 0.24 & 0.20 & 0.21 & 0.13 & 0.12 & 0.05 & 0.04 & 0.02 & 0.17 & 0.16 & 0.16 & 0.15 & 0.20 & 0.22 & 0.08 \\
RMSE & & 0.29 & 0.22 & 0.28 & 0.15 & 0.16 & 0.07 & 0.06 & 0.03 & 0.25 & 0.26 & 0.22 & 0.17 & 0.21 & 0.26 & 0.10 \\
MAE & & 0.22 & 0.16 & 0.22 & 0.11 & 0.12 & 0.05 & 0.04 & 0.02 & 0.20 & 0.22 & 0.18 & 0.13 & 0.15 & 0.21 & 0.08 \\
& singlet & 0.22 & 0.16 & 0.25 & 0.10 & 0.14 & 0.05 & 0.04 & 0.02 & 0.21 & 0.22 & 0.17 & 0.14 & 0.16 & 0.20 & 0.09 \\
& triplet & 0.23 & 0.15 & 0.18 & 0.12 & 0.08 & & & 0.01 & 0.20 & 0.23 & 0.19 & 0.11 & 0.15 & 0.22 & 0.08 \\
& valence & 0.22 & 0.14 & 0.24 & 0.12 & 0.13 & 0.06 & 0.05 & 0.02 & 0.21 & 0.25 & 0.20 & 0.12 & 0.13 & 0.22 & 0.08 \\
& Rydberg & 0.22 & 0.21 & 0.19 & 0.10 & 0.08 & 0.03 & 0.03 & 0.02 & 0.20 & 0.15 & 0.13 & 0.14 & 0.21 & 0.18 & 0.09 \\
& $n \ra \pis$ & 0.18 & 0.08 & 0.28 & 0.08 & 0.17 & 0.07 & 0.07 & 0.01 & 0.26 & 0.32 & 0.22 & 0.11 & 0.10 & 0.14 & 0.07 \\
& $\pi \ra \pis$& 0.27 & 0.19 & 0.21 & 0.14 & 0.11 & 0.06 & 0.04 & 0.02 & 0.18 & 0.21 & 0.20 & 0.12 & 0.16 & 0.28 & 0.09 \\
& 1--3 non-H & 0.23 & 0.19 & 0.13 & 0.10 & 0.07 & 0.03 & 0.03 & 0.02 & 0.18 & 0.20 & 0.19 & 0.14 & 0.19 & 0.24 & 0.10 \\
& 4 non-H & 0.22 & 0.19 & 0.15 & 0.11 & 0.11 & 0.03 & 0.04 & 0.02 & 0.19 & 0.26 & 0.22 & 0.13 & 0.18 & 0.23 & 0.08 \\
& 5--6 non-H & 0.21 & 0.12 & 0.30 & 0.12 & 0.13 & 0.06 & 0.05 & 0.01 & 0.22 & 0.21 & 0.15 & 0.11 & 0.11 & 0.19 & 0.07 \\
& 7--10 non-H & 0.24 & 0.11 & 0.42 & 0.12 & 0.23 & 0.10 & 0.08 & 0.02 & 0.27 & 0.29 & 0.19 & 0.12 & 0.14 & 0.16 & 0.07 \\
\hline
\end{tabular}
\begin{tablenotes}
\item $^a$ Excitation energies computed with TURBOMOLE.
@ -1191,22 +1251,22 @@ MAE & & 0.22 & 0.16 & 0.22 & 0.11 & 0.12 & 0.05 & 0.04 & 0.02 & 0.20 & 0.22
The most striking feature from the statistical indicators gathered in Table \ref{tab:stat} is the overall accuracy of CC3 with MAEs and MSEs systematically below the chemical accuracy threshold (errors $<$ 0.043 eV or 1 kcal/mol), irrespective of the nature of the transition and the size of the molecule.
CCSDR(3) and CCSDT-3 can also be regarded as excellent performers with overall MAEs of $0.05$ eV or less, though one would notice a slight degradation of their performances for the $n \ra \pis$ excitations and the largest molecules of the database.
The other third-order method, ADC(3), which enjoys a lower computational cost, is significantly less accurate and does not really improve upon its second-order analog, even for the largest systems considered here, an observation in line with a previous analysis by some of the authors \cite{Loos_2020d}.
Nonetheless, ADC(3)'s accuracy improves in larger compounds, with a MAE of 0.24 eV (0.16 eV) for the subsets of the most compact (extended) compounds considered herein. The ADC(2.5) composite method introduced in Ref.~\cite{Loos_2020d}, which amounts to simply averaging the ADC(2) and ADC(3)
values, yields an appreciable accuracy improvement, as shown in Fig.~\ref{fig:QUEST_stat}. Indeed, we note that the MAE of 0.07 eV obtained for ``large'' compounds is comparable to the one obtained with CCSDR(3) and CCSDT-3 for these molecules. All these third-order methods
are rather equally efficient for valence and Rydberg transitions.
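The ADC(2.5) estimate discussed above can be written compactly.
Assuming equal weights for the two contributions (the notation is introduced here for illustration only), one has
\begin{equation}
	\Delta E^{\mathrm{ADC(2.5)}} = \frac{1}{2} \left[ \Delta E^{\mathrm{ADC(2)}} + \Delta E^{\mathrm{ADC(3)}} \right].
\end{equation}
Because the MSE is linear in the errors, the ADC(2.5) MSE is then expected to be close to the mean of the ADC(2) and ADC(3) MSEs whenever nearly the same transitions enter the statistics.
For the subset of molecules with 4 non-hydrogen atoms in Table \ref{tab:stat}, this gives $(-0.04 - 0.10)/2 = -0.07$ eV, which coincides with the tabulated ADC(2.5) value.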
Concerning the second-order methods (which have the indisputable advantage of being applicable to larger molecules than the ones considered here), we have the following ranking in terms of MAEs, from least to most accurate: EOM-MP2 $\approx$ CIS(D) $<$ CC2 $\approx$ ADC(2) $<$ CCSD $\approx$ STEOM-CCSD, which fits our previous conclusions on the specific subsets \cite{Loos_2018a,Loos_2019,Loos_2020b,Loos_2020c,Loos_2020d}.
A very similar ranking is obtained when one looks at the MSEs.
It is noteworthy that the performances of EOM-MP2 and CCSD deteriorate notably as the system size increases, while CIS(D) and STEOM-CCSD behave very stably with respect to system size.
Indeed, the EOM-MP2 MAE attains 0.42 eV for molecules containing between 7 and 10 non-hydrogen atoms, whereas the CCSD tendency to overshoot the transition energies yields a MSE of 0.22 eV for the same set (a rather large error).
For CCSD, this conclusion fits benchmark studies published by other groups \cite{Schreiber_2008,Caricato_2010,Watson_2013,Kannar_2014,Kannar_2017,Dutta_2018}.
For example, K\'ann\'ar and Szalay obtained a MAE of 0.18 eV on Thiel's set for the states exhibiting a dominant single excitation character.
The CCSD degradation with system size might partially explain the similar (though less pronounced) trend obtained for CCSDR(3).
Regarding the apparently better performances of STEOM-CCSD as compared to CCSD, we recall that several challenging states have been naturally removed from the STEOM-CCSD statistics because the active character percentage was lower than $98\%$ (see above).
In contrast to EOM-MP2 and CCSD, the overall accuracy of CC2 and ADC(2) does significantly improve for larger molecules, the performances of the two methods being, as expected, similar \cite{Harbach_2014}.
Let us note that these two methods show similar accuracies for singlet and triplet transitions, but are significantly less accurate for Rydberg transitions, as already pointed out previously \cite{Kannar_2017}.
Therefore, both CC2 and ADC(2) offer an appealing cost-to-accuracy ratio for large compounds, which explains their popularity in realistic chemical scenarios \cite{Hattig_2005c,Goerigk_2010a,Send_2011a,Winter_2013,Jacquemin_2015b,Oruganti_2016}.
For the scaled methods [SOS-ADC(2), SOS-CC2, and SCS-CC2], the TURBOMOLE scaling factors do not seem to improve things upon the unscaled versions, while the Q-CHEM scaling factors for ADC(2) provide a small, yet significant improvement for this set of molecules.
Of course, one of the remaining open questions regarding all these methods is their accuracy for even larger systems.
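For reference, and assuming the usual definitions of these indicators (the notation $x_i$, $N$, $E_i^{\mathrm{method}}$, and $E_i^{\mathrm{TBE}}$ is introduced here for illustration only), the mean signed error (MSE) and mean absolute error (MAE) for a set of $N$ transitions read
\begin{equation}
	\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} x_i,
	\qquad
	\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |x_i|,
	\qquad
	x_i = E_i^{\mathrm{method}} - E_i^{\mathrm{TBE}}.
\end{equation}
With this sign convention, a positive MSE indicates that a method overestimates the transition energies on average, as observed for CCSD, and $|\mathrm{MSE}| \le \mathrm{MAE}$ always holds, the equality being reached only when all errors share the same sign.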
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@ -1215,7 +1275,7 @@ Of course, one of the remaining open questions regarding all these methods is th
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Quite a large number of calculations were required for each of the
QUEST articles \cite{Loos_2018a,Loos_2019,Loos_2020b,Loos_2020c,Loos_2020d}.
Up to now, all the curated data was shared as
supplementary information presented as a file in portable document
format (pdf). This way of sharing data does not require too much
@ -1230,8 +1290,8 @@ The application also gives the possibility to the user to import
external data files, in order to compare the performance of methods
that are not in our database.
Both the web application and the data are hosted in a single GitHub
repository (\url{https://github.com/LCPQ/QUESTDB_website})
and available at the following address: \url{https://lcpq.github.io/QUESTDB_website}.
In this way, extending the database is as simple as adding new data files to the
repository, together with the corresponding bibliographic references,
and we strongly encourage users to help enlarge this database
@ -1241,21 +1301,21 @@ via GitHub pull requests.
\section{Concluding remarks}
\label{sec:ccl}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
In the present review article, we have presented and extended the QUEST database of highly-accurate excitation energies for molecular systems \cite{Loos_2020a,Loos_2018a,Loos_2019,Loos_2020b,Loos_2020c} that we started building
in 2018 and that is now composed of more than 500 vertical excitations, many of which can be reasonably considered as within 1 kcal/mol (or less) of the FCI limit for the considered CC3/aug-cc-pVTZ geometry and basis set (\emph{aug}-cc-pVTZ).
In particular, we have detailed the specificities of our protocol by providing computational details regarding geometries, basis sets, as well as reference and benchmarked computational methods. The content of our five QUEST subsets has
been presented in detail, and for each of them, we have provided the number of reference excitation energies, the nature and size of the molecules, the list of benchmarked methods, as well as other useful specificities.
Importantly, we have proposed a new statistical method that produces much safer estimates of the extrapolation error in SCI calculations. This new method based on Gaussian random variables has been tested by computing additional FCI values for five- and six-membered rings.
After having discussed the generation of our TBEs, we have reported a comprehensive benchmark for a significant number of methods on the entire QUEST set with, in addition, a specific analysis for each type of excited state.
Finally, the main features of the website specifically designed to gather the entire data generated during these past few years have been presented and discussed.
Paraphrasing Thiel's conclusions \cite{Schreiber_2008}, we hope not only that the QUEST database will be used for further benchmarking and testing, but also that other research groups will improve it, providing not only corrections
(inevitable in such a large data set), but more importantly extensions with improved estimates for some compounds and states, or new molecules.
In this framework, we provide in the {\SupInf} a file with all our benchmark data.
Regarding future improvements and extensions, we would like to mention that although our present goal is to produce chemically accurate vertical excitation energies, we are currently devoting great efforts to obtain highly-accurate excited-state properties \cite{Hodecker_2019,Eriksen_2020b} such as dipoles and oscillator strengths for molecules of small and medium sizes \cite{Chrayteh_2021,Sarkar_2021}, so as to complete previous efforts aiming at determining accurate excited-state geometries \cite{Budzak_2017,Jacquemin_2018}.
Reference ground-state properties (such as correlation energies and atomization energies) are also being currently produced \cite{Scemama_2020,Loos_2020e}.
Besides this, because computing 500 (or so) excitation energies can be a costly exercise even with cheap computational methods, we are planning on developing a ``diet set'' (\ie, a much smaller set of excitation energies which can reproduce key results of the full QUEST database, including ranking of approximations) following the philosophy of the ``diet GMTKN55'' set proposed recently by Gould \cite{Gould_2018b}.
We hope to report on this in the near future.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@ -1263,7 +1323,7 @@ We hope to report on this in the near future.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%
This work was performed using HPC resources from GENCI-TGCC (Grand Challenge 2019-gch0418) and from CALMIP (Toulouse) under allocation 2020-18005.
AS, MC, and PFL thank the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (Grant agreement No.~863481) for financial support.
Funding from the \textit{``Centre National de la Recherche Scientifique''} is also acknowledged.
DJ acknowledges the \textit{R\'egion des Pays de la Loire} for financial support and the CCIPL computational center for ultra-generous allocation of computational time.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@ -1280,14 +1340,14 @@ Cartesian coordinates of each molecule (in bohr), Python code associated with th
\bibliography{QUESTDB}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newpage
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{biography}[MVeril]{M.~V\'eril}
was born in Toulouse in 1993.
He received his B.Sc.~in Molecular Chemistry from the Universit\'e Paul Sabatier (Toulouse, France) in 2015 and his M.Sc.~in Computational and Theoretical Chemistry and Modeling from the same university in 2018.
Since 2018, he has been a Ph.D.~student in the group of Dr.~Pierre-Fran\c{c}ois Loos at the Laboratoire de Chimie et Physique Quantiques in Toulouse.
He is currently developing QUANTUM PACKAGE and the web application linked to the QUEST project.
\end{biography}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@ -1301,7 +1361,7 @@ In 2006, he obtained a Research Engineer position from the \textit{``Centre Nati
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{biography}[MCaffarel]{M.~Caffarel}
received his Ph.D. in Theoretical Physics and Chemistry from the Universit\'e Pierre et Marie Curie (Paris, France) in 1987, before moving to the University of Illinois at Urbana-Champaign for a two-year postdoctoral stay in the group of Prof.~David Ceperley.
He is currently working as a senior scientist at the ``Centre National de la Recherche Scientifique (CNRS)'' at the Laboratoire de Chimie et Physique Quantiques in Toulouse (France).
His research is mainly focused on the development and application of quantum Monte Carlo methods for theoretical chemistry and condensed-matter physics.
\end{biography}
@ -1315,31 +1375,31 @@ got his Ph.D. in Chemistry from Scuola Normale Superiore, Pisa in 2013. He worke
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{biography}[MBoggioPasqua]{M.~Boggio-Pasqua}
received his PhD in Physical Chemistry from the Universit\'e Bordeaux 1 in 1999.
He then worked as a post-doctoral research associate at King's College London (2000-2003) and at Imperial College London (2004-2007) with M.~Robb and M.~Bearpark.
He was then appointed as a CNRS researcher at the Laboratoire de Chimie et Physique Quantiques at the Universit\'e Paul Sabatier (Toulouse).
His main research interests are focused on the theoretical studies of photochemical processes in complex molecular systems including the description of excited-state reaction mechanisms based on static explorations of potential energy surfaces and simulations of nonadiabatic dynamics.
\end{biography}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{biography}[DJacquemin]{D.~Jacquemin}
received his PhD in Chemistry from the University of Namur in 1998, before moving to the University of Florida for his postdoctoral stay. He is currently a full Professor at the University of Nantes (France).
His research is focused on modeling electronically excited-state processes in organic and inorganic dyes as well as photochromes using a large panel of \emph{ab initio} approaches. His group collaborates with many experimental and theoretical groups.
He is the author of more than 500 scientific papers. He has been an ERC grantee (2011--2016) and a member of the Institut Universitaire de France (2012--2017), and received the WATOC's Dirac Medal (2014).
\end{biography}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{biography}[PFLoos]{P.-F.~Loos}
received his Ph.D.~in Computational and Theoretical Chemistry from the Universit\'e Henri Poincar\'e (Nancy, France) in 2008.
From 2009 to 2013, he undertook postdoctoral research with Peter M.W.~Gill at the Australian National University (ANU).
From 2013 to 2017, he was a \textit{``Discovery Early Career Researcher Award''} recipient and, then, a senior lecturer at the ANU.
Since 2017, he has held a researcher position from the \textit{``Centre National de la Recherche Scientifique''} (CNRS) at the \textit{Laboratoire de Chimie et Physique Quantiques} in Toulouse (France), and was awarded, in 2019, an ERC consolidator grant for the development of new excited-state methodologies.
\end{biography}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newpage
\graphicalabstract{TOC}{QUEST: a dataset of highly-accurate excitation energies.}

Binary file not shown.


@ -0,0 +1,33 @@
-5.8181818181818182E-002 1.0503392263823726E-013
-5.4545454545454543E-002 3.1979374437806884E-012
-5.0909090909090911E-002 7.8108457347063320E-011
-4.7272727272727272E-002 1.5304314242122915E-009
-4.3636363636363640E-002 2.4055667144690598E-008
-4.0000000000000001E-002 3.0332531323628614E-007
-3.6363636363636362E-002 3.0682279452286258E-006
-3.2727272727272730E-002 2.4897417674801438E-005
-2.9090909090909087E-002 1.6207226680756907E-004
-2.5454545454545448E-002 8.4635161104256395E-004
-2.1818181818181816E-002 3.5455258945109665E-003
-1.8181818181818177E-002 1.1915114370157230E-002
-1.4545454545454540E-002 3.2122067227954430E-002
-1.0909090909090908E-002 6.9469868291055614E-002
-7.2727272727272693E-003 0.12052501154967402
-3.6363636363636368E-003 0.16774346335684942
2.1684043449710089E-018 0.18728446158700213
3.6363636363636411E-003 0.16774346335685103
7.2727272727272797E-003 0.12052501154967629
1.0909090909090912E-002 6.9469868291057640E-002
1.4545454545454544E-002 3.2122067227955672E-002
1.8181818181818191E-002 1.1915114370157782E-002
2.1818181818181823E-002 3.5455258945111712E-003
2.5454545454545455E-002 8.4635161104262119E-004
2.9090909090909101E-002 1.6207226680758102E-004
3.2727272727272730E-002 2.4897417674803691E-005
3.6363636363636362E-002 3.0682279452289367E-006
3.9999999999999994E-002 3.0332531323632113E-007
4.3636363636363640E-002 2.4055667144693550E-008
4.7272727272727272E-002 1.5304314242124981E-009
5.0909090909090904E-002 7.8108457347074978E-011
5.4545454545454550E-002 3.1979374437811545E-012
5.8181818181818182E-002 1.0503392263825442E-013


@ -0,0 +1,33 @@
-5.8283990969853983E-002 0.0000000000000000
-5.4692045827783142E-002 0.0000000000000000
-5.1100100685712302E-002 5.2356020942408380E-003
-4.7508155543641448E-002 0.0000000000000000
-4.3916210401570607E-002 5.2356020942408380E-003
-4.0324265259499767E-002 1.0471204188481676E-002
-3.6732320117428927E-002 0.0000000000000000
-3.3140374975358072E-002 0.0000000000000000
-2.9548429833287228E-002 5.2356020942408380E-003
-2.5956484691216385E-002 5.2356020942408380E-003
-2.2364539549145530E-002 5.2356020942408380E-003
-1.8772594407074690E-002 1.0471204188481676E-002
-1.5180649265003851E-002 3.6649214659685861E-002
-1.1588704122933013E-002 4.1884816753926704E-002
-7.9967589808621585E-003 9.4240837696335081E-002
-4.4048138387913181E-003 0.13612565445026178
-8.1286869672047755E-004 0.21465968586387435
2.7790764453503769E-003 0.10471204188481675
6.3710215874212169E-003 0.12565445026178010
9.9629667294920572E-003 7.3298429319371722E-002
1.3554911871562898E-002 2.0942408376963352E-002
1.7146857013633738E-002 4.1884816753926704E-002
2.0738802155704582E-002 5.2356020942408380E-003
2.4330747297775450E-002 1.0471204188481676E-002
2.7922692439846290E-002 1.0471204188481676E-002
3.1514637581917131E-002 5.2356020942408380E-003
3.5106582723987964E-002 1.5706806282722512E-002
3.8698527866058804E-002 5.2356020942408380E-003
4.2290473008129645E-002 0.0000000000000000
4.5882418150200485E-002 0.0000000000000000
4.9474363292271353E-002 0.0000000000000000
5.3066308434342194E-002 5.2356020942408380E-003
5.6658253576413034E-002 0.0000000000000000

59
Manuscript/fig2/fig2.org Normal file

@ -0,0 +1,59 @@
** Initialize R packages
#+begin_src R :results output :session *R* :exports code
library(ggplot2)
library(latex2exp)
library(extrafont)
library(RColorBrewer)
loadfonts()
#+end_src
#+RESULTS:
:
: Registering fonts with R
** Read data
#+begin_src R :results output :session *R* :exports both
df <- read.table("data_histogram_paper");
df$x <- df$V1
df$y <- df$V2
df2 <- read.table("data_gaussian_histogram_paper");
spline.d <- as.data.frame(spline(df2$V1, df2$V2))
summary(spline.d)
#+end_src
#+RESULTS:
:
: x y
: Min. :-0.05818 Min. :0.000e+00
: 1st Qu.:-0.02909 1st Qu.:2.000e-08
: Median : 0.00000 Median :1.213e-04
: Mean : 0.00000 Mean :3.093e-02
: 3rd Qu.: 0.02909 3rd Qu.:3.011e-02
: Max. : 0.05818 Max. :1.873e-01
#+begin_src R :results output graphics :file (org-babel-temp-file "figure" ".png") :exports both :width 600 :height 400 :session *R*
p <- ggplot(data=df, aes(x, y)) +
geom_bar(stat="identity", fill="steelblue")
p <- p+ geom_line(data=spline.d, lwd=1, linetype="dashed")
p <- p + scale_x_continuous(name=TeX("$X^{(m)}$"))
p <- p + scale_y_continuous(name=TeX("Frequency"))
p <- p + theme(text = element_text(size = 20, family="Times"),
legend.position = c(.20, .20),
legend.title = element_blank())
p
#+end_src
#+RESULTS:
[[file:/tmp/babel-nBBwmV/figureJJu58N.png]]
** Export to pdf
#+begin_src R :results output :session *R* :exports code
pdf("fig2.pdf", family="Times", width=8, height=5)
p
dev.off()
#+end_src
#+RESULTS:
:
: png
: 2

BIN
Manuscript/fig2/fig2.pdf Normal file

Binary file not shown.

Binary file not shown.

BIN
Manuscript/fig4.pdf Normal file

Binary file not shown.