update website section

Pierre-Francois Loos 2020-11-26 23:01:32 +01:00
parent dbca76a9c7
commit 58b5dac6d8


@ -91,7 +91,7 @@ of vertical excitations is that it does not rely on any experimental values, avo
composite protocol, we have been able to produce theoretical best estimates (TBEs) with the aug-cc-pVTZ basis set for each of these transitions, as well as basis set corrected TBEs (i.e., near
the complete basis set limit) for some of them. The TBEs/aug-cc-pVTZ have been employed to benchmark a large number of (lower-order) wave function methods such as CIS(D), ADC(2), CC2,
STEOM-CCSD, CCSD, CCSDR(3), CCSDT-3, ADC(3), CC3, NEVPT2, and others (including spin-scaled variants). In order to gather the huge amount of data produced during the QUEST
project, we have created a website [\url{https://github.com/mveril/QUESTDB_website}] where one can easily test and compare the accuracy of a given method with respect to various variables
project, we have created a website [\url{https://github.com/LCPQ/QUESTDB_website}] where one can easily test and compare the accuracy of a given method with respect to various variables
such as the molecule size or its family, the nature of the excited states, the type of basis set, etc.
%Add website address here
We hope that the present review will provide a useful summary of our effort so far and foster new developments around excited-state methods.
@ -174,9 +174,9 @@ The QUEST dataset has the particularity to be based in a large proportion on sel
LR-CCSDTQ \cite{Noga_1987,Koch_1990,Kucharski_1991,Christiansen_1998b,Kucharski_2001,Kowalski_2001,Kallay_2003,Kallay_2004,Hirata_2000,Hirata_2004}. Recently, SCI methods have been a force to reckon with for
the computation of highly-accurate energies in small- and medium-sized molecules as they yield near full configuration interaction (FCI) quality energies for only a fraction of the computational cost of a genuine FCI calculation \cite{Booth_2009,Booth_2010,Cleland_2010,Booth_2011,Daday_2012,Blunt_2015,Ghanem_2019,Deustua_2017,Deustua_2018,Holmes_2017,Chien_2018,Li_2018,Yao_2020,Li_2020,Eriksen_2017,Eriksen_2018,Eriksen_2019a,Eriksen_2019b,Xu_2018,Xu_2020,Loos_2018a,Loos_2019,Loos_2020b,Loos_2020c,Loos_2020a,Loos_2020e,Eriksen_2021}.
Due to the fairly natural idea underlying these methods, the SCI family is composed of numerous members \cite{Bender_1969,Whitten_1969,Huron_1973,Abrams_2005,Bunge_2006,Bytautas_2009,Giner_2013,Caffarel_2014,Giner_2015,Garniron_2017b,Caffarel_2016a,Caffarel_2016b,Holmes_2016,Sharma_2017,Holmes_2017,Chien_2018,Scemama_2018,Scemama_2018b,Garniron_2018,Evangelista_2014,Tubman_2016,Tubman_2020,Schriber_2016,Schriber_2017,Liu_2016,Per_2017,Ohtsuka_2017,Zimmerman_2017,Li_2018,Ohtsuka_2017,Coe_2018,Loos_2019}.
Their fundamental philosophy consists, roughly speaking, in retaining only the most \alert{\textst{energetically}} relevant determinants of the FCI space following a given criterion to slow down the exponential increase of the size of the CI expansion.
Their fundamental philosophy consists, roughly speaking, in retaining only the most relevant determinants of the FCI space following a given criterion to slow down the exponential increase of the size of the CI expansion.
Originally developed in the late 1960's by Bender and Davidson \cite{Bender_1969} as well as Whitten and Hackmeyer \cite{Whitten_1969}, new efficient SCI algorithms have resurfaced recently.
Three examples are \alert{\textst{adaptive sampling CI (ASCI)}, }iCI \cite{Liu_2014,Liu_2016,Lei_2017,Zhang_2020}, semistochastic heat-bath CI (SHCI) \cite{Holmes_2016,Holmes_2017,Sharma_2017,Li_2018,Li_2020,Yao_2020}, and \textit{Configuration Interaction using a Perturbative Selection made Iteratively} (CIPSI) \cite{Huron_1973,Giner_2013,Giner_2015,Garniron_2019}.
Three examples are iCI \cite{Liu_2014,Liu_2016,Lei_2017,Zhang_2020}, semistochastic heat-bath CI (SHCI) \cite{Holmes_2016,Holmes_2017,Sharma_2017,Li_2018,Li_2020,Yao_2020}, and \textit{Configuration Interaction using a Perturbative Selection made Iteratively} (CIPSI) \cite{Huron_1973,Giner_2013,Giner_2015,Garniron_2019}.
These flavors of SCI include a second-order perturbative (PT2) correction which is key to estimating the ``distance'' to the FCI solution (see below).
The SCI calculations performed for the QUEST set of excitation energies rely on the CIPSI algorithm, which is, from a historical point of view, one of the oldest SCI algorithms.
It was developed in 1973 by Huron, Rancurel, and Malrieu \cite{Huron_1973} (see also Refs.~\cite{Evangelisti_1983,Cimiraglia_1985,Cimiraglia_1987,Illas_1988,Povill_1992}).
@ -1267,128 +1267,28 @@ Of course, one of the remaining open questions regarding all these methods is th
\label{sec:website}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%DJ: Have not proofread this
Quite a large number of calculations were required for each of the
QUEST articles \cite{Loos_2018a,Loos_2019,Loos_2020b,Loos_2020c,Loos_2020d}.
Up to now, all the curated data was shared as
supplementary information presented as a file in portable document
format (pdf). This way of sharing data does not require too much
effort for the authors, but it is obviously not optimal from the
user's point of view.
We have now addressed this problem by creating a database which
contains all the vertical and fluorescence transition energies as well
as the corresponding molecular geometries. This data can be manipulated via
a web application which allows one to plot the statistical indicators
computed on selected subsets of molecules, methods, and basis sets.
The application also allows the user to import
external data files, in order to compare the performance of methods
that are not in our database.
Both the web application and the data are hosted in a single GitHub
repository (\url{https://github.com/LCPQ/QUESTDB_website}). In this way,
extending the database is as simple as adding new data files to the
repository, together with the corresponding bibliographic references,
and we strongly encourage users to help enlarge this database
via GitHub pull requests.
{
\newcommand{\meth}{\text{meth}}
\newcommand{\err}{e}
\newcommand{\nEx}{X}
\newcommand{\nExnn}{\mathcal{X}}
%=======================
\subsection{Introduction}
\label{sec:websiteIntro}
%=======================
The previous QUEST publications \cite{Loos_2018a,Loos_2019,Loos_2020b,Loos_2020c,Loos_2020d} report vertical excitation data, along with statistics computed for the parameters deemed most relevant.
However, depending on the specific interests of a given quantum chemist, this choice of parameters may not be relevant to their study.
Furthermore, to determine the accuracy of a new method, one must compare it with reference data, such as those of the QUEST project.
This requires computing the same type of statistics for the new method. The QUESTDB website was created precisely to address these issues.
%=======================
\subsection{Specification}
%=======================
The website specifications are the following:
\begin{itemize}
\item Display the QUEST excitation energies as a table
\item Allow the user to import local files from their computer
\item Allow the user to filter the data according to various parameters
\item Compute statistics for the data selected with these parameters
\item Display a box plot to easily visualize the accuracy of the methods
\end{itemize}
These features address the issues described in Sec.~\ref{sec:websiteIntro}.
%=======================
\subsection{Architecture}
%=======================
The website architecture is designed to be simple and to facilitate the integration of new data.
It is composed of two parts:
\begin{itemize}
\item A static website used to display the data and statistics.
\item A series of Python tools used to generate data readable by the website.
\end{itemize}
%------------------------------------------------
\subsubsection{The static website}
%------------------------------------------------
The static website is the main part. All the statistical calculations are performed locally (client-side) on the dataset page.
The server is only responsible for serving the pages and data of the QUEST project to the client.
To work with the QUEST data, the user must go to the dataset page.
First, the website offers the user the possibility to import new data (see Sec.~\ref{sec:gentools});
these data are added temporarily to the current session (and removed after leaving the page).
There are four multi-selection lists, each of which depends on the previous ones.
These lists allow the user to select the sets (see Fig.~\ref{fig:scheme}),
molecules (see Fig.~\ref{fig:molecules}), methods, and basis sets (see Sec.~\ref{sec:methods}).
Next, several filters allow the user to choose the properties of the included excitations.
It is also possible to filter by molecule size or by active character percentage.
A reference method to compare with must then be chosen among the already selected methods (TBE by default).
Finally, a flag allows the user to discard all the values declared unsafe, i.e., the values whose
uncertainty is too large.
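The filtering step described above can be illustrated with a minimal Python sketch. It is not the code of the website, which performs these operations in the browser; the \texttt{Excitation} record, its fields, and the \texttt{select} function are hypothetical names introduced here for illustration, and the data are invented.

```python
from dataclasses import dataclass


@dataclass
class Excitation:
    # Hypothetical record; the actual QUESTDB data model may differ.
    molecule: str
    size: int             # assumed size measure, e.g. number of non-hydrogen atoms
    energy: float         # excitation energy in eV
    unsafe: bool = False  # True when the value is flagged as not safe


def select(excitations, max_size=None, drop_unsafe=True):
    """Keep the excitations that pass the size filter and the unsafe flag."""
    kept = []
    for exc in excitations:
        if max_size is not None and exc.size > max_size:
            continue  # molecule larger than requested
        if drop_unsafe and exc.unsafe:
            continue  # value flagged as unsafe
        kept.append(exc)
    return kept


# Invented data, for illustration only.
data = [
    Excitation("A", 3, 5.0),
    Excitation("B", 8, 4.2),
    Excitation("C", 2, 6.1, unsafe=True),
]
small_safe = select(data, max_size=5)
```

Here \texttt{small\_safe} contains only the excitation of molecule A: B is rejected by the size filter and C by the unsafe flag.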
\paragraph{Statistics calculations}
We want to quantify the accuracy of each method/basis pair with respect to the reference (usually the TBEs).
Let $\meth$ denote a method/basis pair, $\nEx$ the number of vertical transitions selected by the user,
and $E^x_\meth$ the energy of the $x$th vertical excitation computed with $\meth$.
For each method, we define the vector containing all these energies,
\begin{equation}
\vec{E}_\meth = \qty{E^1_\meth, \ldots , E^\nEx_\meth},
\end{equation}
and the error of $\meth$ with respect to the reference $\text{ref}$,
\begin{equation}
\err^x_\meth = E^x_\text{ref} - E^x_\meth,
\end{equation}
which is defined only when the vertical excitation $x$ is available for both the method $\meth$ and the method $\text{ref}$.
Denoting by $\nExnn$ the size of the resulting error vector $\vec{\err}_\meth$, we compute
\begin{gather}
\text{MSE}_\meth = \overline{\vec{\err}_\meth} = \frac{1}{\nExnn}\sum_{x=1}^\nExnn\err_\meth^x, \\
\text{MAE}_\meth = \overline{\abs{\vec{\err}_\meth}} = \frac{1}{\nExnn}\sum_{x=1}^\nExnn\abs{\err_\meth^x}, \\
\text{RMSE}_\meth = \sqrt{\overline{\vec{\err}_\meth^{\,2}}} = \sqrt{\frac{1}{\nExnn}\sum_{x=1}^\nExnn\qty(\err_\meth^x)^2}.
\end{gather}
These statistical indicators measure the accuracy of the methods with respect to the reference, while
\begin{gather}
\text{SDE}_\meth = \sqrt{\frac{1}{\nExnn}\sum_{x=1}^\nExnn\qty(\err_\meth^x-\text{MSE}_\meth)^2}
\end{gather}
measures the precision of the methods with respect to the reference.
On the website, these statistics are reported in a table and in a box plot.
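As an illustration, the indicators defined above can be computed with a few lines of Python. This is only a sketch, not the code of the website: the function name \texttt{error\_statistics}, the dictionary-based data layout, and the numerical values are assumptions made for the example. The error follows the sign convention of the text (reference minus method), and the SDE is taken here as the population standard deviation about the mean signed error, which is also an assumption.

```python
import math


def error_statistics(reference, method):
    """Return MSE, MAE, RMSE and SDE of `method` with respect to `reference`.

    Both arguments map an excitation label to an energy (in eV).
    Only the excitations defined in both mappings are taken into account.
    """
    common = [x for x in reference if x in method]
    if not common:
        raise ValueError("no excitation is defined for both data sets")
    # Sign convention of the text: error = E_ref - E_meth
    errors = [reference[x] - method[x] for x in common]
    n = len(errors)
    mse = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Assumption: deviation measured about the mean signed error (MSE)
    sde = math.sqrt(sum((e - mse) ** 2 for e in errors) / n)
    return {"n": n, "MSE": mse, "MAE": mae, "RMSE": rmse, "SDE": sde}


# Invented values, for illustration only.
tbe = {"state 1": 7.0, "state 2": 9.0, "state 3": 6.5}
new_method = {"state 1": 7.2, "state 2": 8.9}
stats = error_statistics(tbe, new_method)
```

In this example, the third state is ignored because it is not available for the new method, so the statistics are computed over two errors only. With these definitions, $\text{RMSE}^2 = \text{MSE}^2 + \text{SDE}^2$, which provides a simple consistency check.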
%------------------------------------------------
\subsubsection{Data generation tools}
\label{sec:gentools}
%------------------------------------------------
We used several tools to generate the data.
These tools can also be used by the user (see Scenario \ref{scenar:new}).
The main tool is \texttt{datafileBuilder}, which generates data files from a {\LaTeX} \texttt{tabular}.
The \texttt{tabular} is accompanied by some options and {\LaTeX} \texttt{\textbackslash newcommand} definitions, which are parsed by the main script, and the \texttt{tabular} environment is converted to a two-dimensional \texttt{NumPy} array.
The options, the {\LaTeX} \texttt{\textbackslash newcommand} definitions to apply, and the two-dimensional array representing the \texttt{tabular} environment are then passed to the appropriate table parser module, which is selected via the \texttt{\textbackslash formatName} option in the input file.
Each module is responsible for parsing the \texttt{tabular} and returning all the corresponding data files as objects.
The main script then writes these objects to the corresponding files. These files can be used on the website,
either by importing them temporarily or by making a pull request to add the new data.
The modular design of this tool gives us enough flexibility to easily convert many types of {\LaTeX} \texttt{tabular} to a standardized file format.
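To make the first conversion step more concrete, here is a minimal Python sketch of how a {\LaTeX} \texttt{tabular} can be turned into a two-dimensional \texttt{NumPy} array of strings. It is not the actual \texttt{datafileBuilder} code: the function name \texttt{tabular\_to\_array} is hypothetical, the handling of options and \texttt{\textbackslash newcommand} definitions is left out, and column specifications containing nested braces are not supported. The table content is invented.

```python
import re

import numpy as np


def tabular_to_array(latex):
    """Convert the first LaTeX tabular found in `latex` to a 2D array of strings."""
    match = re.search(
        r"\\begin\{tabular\}\{[^}]*\}(.*?)\\end\{tabular\}", latex, re.S
    )
    if match is None:
        raise ValueError("no tabular environment found")
    body = match.group(1)
    # Horizontal rules carry no data and are dropped.
    body = re.sub(r"\\(hline|toprule|midrule|bottomrule)", "", body)
    rows = []
    for line in body.split(r"\\"):  # rows are separated by a double backslash
        if line.strip():
            rows.append([cell.strip() for cell in line.split("&")])
    if not rows:
        raise ValueError("the tabular environment is empty")
    # Pad short rows so that the array is rectangular.
    width = max(len(row) for row in rows)
    rows = [row + [""] * (width - len(row)) for row in rows]
    return np.array(rows, dtype=str)


# Invented table, for illustration only.
example = r"""
\begin{tabular}{lrr}
\hline
State & M1 & REF \\
\hline
A & 1.00 & 1.10 \\
B & 2.00 & 1.90 \\
\hline
\end{tabular}
"""
table = tabular_to_array(example)
energies = table[1:, 1:].astype(float)
```

Keeping the cells as strings at this stage leaves the interpretation of each column to a format-specific parser, in the spirit of the modular design described above.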
%=======================
\subsection{Usage}
%=======================
\subsubsection{Manipulation}
First, the user can add their own absorption and fluorescence data if they want to analyze a custom dataset.
In the dataset tab, the user can select the data of interest by choosing the sets, molecules, methods, and basis sets.
The user can then customize which excitations are taken into account.
\subsubsection{Scenarios}
We built the website mainly to meet two use cases.
\theoremstyle{break}
\theorembodyfont{\normalfont}
\newtheorem{scenar}{Scenario}{}
\begin{scenar}
\label{scenar:choose}
The user wants to choose a method for a calculation or a series of calculations.
Naturally, they seek a compromise between the accuracy and the cost of the method.
In this case, they want to compare the accuracy of each method on a subset of excitation data corresponding to their target.
They can adjust the filters to match this target (molecular size, molecule, or excitation type).
If the target molecule is available in the QUEST data, they can even select this molecule alone.
\end{scenar}
\begin{scenar}
\label{scenar:new}
The user has developed a new method and wants to compare its accuracy with that of the methods of the QUEST project.
First, they have to create an input file for the Python tools (see Sec.~\ref{sec:gentools}) by formatting the calculated results as a {\LaTeX} \texttt{tabular}.
After generating the data with the same Python tools that we used to import the QUEST data, they must import the new absorption and fluorescence data files using the button on the website,
so that the new data are used in the same way as the reference data to generate statistics.
They can then use the website to compute the statistics in order to compare the methods.
\end{scenar}
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Concluding remarks}
\label{sec:ccl}