diff --git a/Manuscript/QUEST_WIREs.tex b/Manuscript/QUEST_WIREs.tex
index 666c6dc..07dc02d 100644
--- a/Manuscript/QUEST_WIREs.tex
+++ b/Manuscript/QUEST_WIREs.tex
@@ -37,6 +37,8 @@
 \newcommand{\tabc}[1]{\multicolumn{1}{c}{#1}}
 \newcommand{\QP}{\textsc{quantum package}}
 \newcommand{\SupInf}{supporting information}
+%Vector
+\renewcommand{\vec}[1]{\bm{#1}}
 
 % Update article type if known
 \papertype{Review Article}
@@ -189,7 +191,7 @@ These basis sets have been downloaded from the \href{https://www.basissetexchang
 %==================================
 \subsection{Computational methods}
 %==================================
-
+\label{sec:methods}
 %------------------------------------------------
 \subsubsection{Reference computational methods}
 %------------------------------------------------
@@ -660,10 +662,95 @@ MAE & & 0.22 & 0.16 & 0.22 & 0.11 & 0.12 & 0.05 & 0.04 & 0.02 & 0.20 & 0.22
 \label{sec:website}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
-\alert{Here comes the description of Mika's website.}
-Here we describe the feature of the website that we have specifically designed to gather the entire data generated during these last few years.
-Thanks to this website, one can easily test and compare the accuracy of a given method with respect to various variables such as the molecule size or its family, the nature of the excited states, the size of the basis set, etc.
+{
+\newcommand{\meth}{\text{meth}}
+\newcommand{\err}{\mathcal{E}}
+\newcommand{\nEx}{X}
+\newcommand{\nExnn}{\mathcal{X}}
+%=======================
+\subsection{Introduction}
+\label{sec:websiteIntro}
+%=======================
+The previous QUEST publications \cite{Loos_2018a,Loos_2019,Loos_2020b,Loos_2020c,Loos_2020d} reported vertical excitation energies together with statistics computed for the parameters deemed most relevant.
+However, depending on the specific interests of a given quantum chemist, this choice of parameters may not be suited to their study.
+Furthermore, assessing the accuracy of a new method requires comparing it with reference data, such as those of the QUEST project,
+and computing the same statistics for this new method. The QUESTDB website was created precisely to address these issues.
+%=======================
+\subsection{Specification}
+%=======================
+The website was designed to:
+\begin{itemize}
+ \item display the QUEST vertical excitation energies as tables;
+ \item import local data files from the user's computer;
+ \item filter the data according to various parameters;
+ \item compute statistics on the filtered data;
+ \item display box plots that make the accuracy of each method easy to visualize.
+\end{itemize}
+These features address the issues raised in Sec.~\ref{sec:websiteIntro}.
+%=======================
+\subsection{Project}
+%=======================
+The project consists of two parts: the website itself and a set of data generation tools.
+%------------------------------------------------
+\subsubsection{Website}
+%------------------------------------------------
+This is the main part of the project. All calculations are performed locally on the dataset page.
+First, the website lets the user import new data files, such as those produced by the tools of Sec.~\ref{sec:tools}.
+These data are only added to the current session and are discarded once the user leaves the page.
+Four multiple-selection lists are then available, each one depending on the choices made in the previous ones.
+They select, in turn, the data sets (see Fig.~\ref{fig:scheme}), the molecules (see Fig.~\ref{fig:molecules}),
+and the methods and basis sets (see Sec.~\ref{sec:methods}).
+Several filters then select the properties of the excitations to be included.
+The excitations can also be filtered by molecule size or by their percentage of active character.
+A reference method must then be chosen for the comparison (the TBEs by default).
+Finally, a flag allows the user to exclude all values labeled as unsafe,
+that is, values whose uncertainty is deemed too large.
+\paragraph{Statistics calculations}
+We aim to quantify the accuracy of each method/basis pair with respect to the reference (usually the TBEs).
+For each method/basis pair $\meth$, the energies of the $\nEx$ vertical transitions selected by the user are gathered in a vector $\vec{E}_\meth$,
+$E^x_\meth$ being the energy of the $x$th excitation computed with $\meth$,
+and the components of the error vector $\vec{\err}_\meth$ with respect to the reference method $\text{ref}$ are defined as
+\begin{equation}
+ \vec{E}_\meth = \qty(E^1_\meth, \ldots, E^\nEx_\meth),
+\end{equation}
+\begin{equation}
+ \err^x_\meth = E^x_\text{ref} - E^x_\meth.
+\end{equation}
+The error $\err^x_\meth$ is defined only when excitation $x$ is available for both $\meth$ and the reference.
+Denoting by $\nExnn \le \nEx$ the number of components of $\vec{\err}_\meth$, relabeled from $1$ to $\nExnn$, and by an overline the average over these components, we compute
+\begin{gather}
+ \text{MSE}_\meth = \overline{\vec{\err}_\meth} = \frac{1}{\nExnn}\sum_{x=1}^\nExnn\err^x_\meth, \\
+ \text{MAE}_\meth = \overline{\abs{\vec{\err}_\meth}} = \frac{1}{\nExnn}\sum_{x=1}^\nExnn\abs{\err^x_\meth}, \\
+ \text{RMSE}_\meth = \sqrt{\overline{\vec{\err}_\meth^2}} = \sqrt{\frac{1}{\nExnn}\sum_{x=1}^\nExnn\qty(\err^x_\meth)^2}, \\
+ \text{SDE}_\meth = \sqrt{\text{RMSE}_\meth^2 - \text{MSE}_\meth^2} = \sqrt{\frac{1}{\nExnn}\sum_{x=1}^\nExnn\qty(\err^x_\meth - \text{MSE}_\meth)^2}.
+\end{gather}
+These statistics allow the user to assess the accuracy of each method/basis pair.
+On the website, they are reported both in a table and as box plots.
+
+%------------------------------------------------
+\subsubsection{Data generation tools}
+\label{sec:tools}
+%------------------------------------------------
+Two main tools are currently available to generate data files: \texttt{datafileBuilder} and \texttt{ADC25generator}.
+\paragraph{datafileBuilder}
+The \texttt{datafileBuilder} tool builds data files from {\LaTeX} \texttt{tabular} environments.
+Each \texttt{tabular} comes with a set of options and {\LaTeX} \texttt{\textbackslash newcommand} definitions, which are parsed by the main script, while the \texttt{tabular} environment itself is converted into a two-dimensional \texttt{NumPy} array.
+The options, the {\LaTeX} \texttt{\textbackslash newcommand} definitions to apply, and the 2D array representing the \texttt{tabular} environment are then passed to the appropriate table parser module, which is selected through the \texttt{\textbackslash formatName} option of the input file.
+Each module parses the \texttt{tabular} and returns all the corresponding data files as objects.
+The main script then writes these objects to the corresponding files.
+These files can then be imported temporarily into the website or submitted in a pull request to add the new data.
+Thanks to its modular design, this tool can easily convert many types of {\LaTeX} \texttt{tabular} environments into a standardized file format.
+\paragraph{ADC25generator}
+The \texttt{ADC25generator} tool merges the metadata of the ADC(2) and ADC(3) data files and computes the ADC(2.5) energy as the mean of the ADC(2) and ADC(3) energies:
+\begin{equation}
+ E_\text{ADC(2.5)} = \frac{E_\text{ADC(2)}+E_\text{ADC(3)}}{2}.
+\end{equation}
+An ADC(2.5) value is labeled as unsafe as soon as at least one of the two underlying ADC(2) and ADC(3) values is unsafe:
+\begin{equation}
+ \mathrm{unsafe}_\text{ADC(2.5)} = \mathrm{unsafe}_\text{ADC(2)} \lor \mathrm{unsafe}_\text{ADC(3)}.
+\end{equation}
+}
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \section{Concluding remarks}
 \label{sec:ccl}
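The MSE, MAE, RMSE, and SDE defined in the "Statistics calculations" paragraph of this patch can be illustrated with the minimal Python sketch below. It is not the QUESTDB website's code: the function name `error_statistics`, the dictionaries keyed by hypothetical excitation identifiers, and the `unsafe` argument (mimicking the flag that excludes unsafe values) are assumptions made for illustration. Errors follow the convention of the text, `E_ref - E_meth`, and only excitations defined for both the method and the reference are kept.

```python
import math
from typing import Dict, Iterable, Optional


def error_statistics(
    method: Dict[str, Optional[float]],
    reference: Dict[str, Optional[float]],
    unsafe: Iterable[str] = (),
) -> Dict[str, float]:
    """Hypothetical sketch: MSE, MAE, RMSE and SDE of one method/basis pair.

    Both dictionaries map an excitation identifier to an energy (None if missing).
    Errors follow the convention of the text: E_ref - E_meth.
    """
    skip = set(unsafe)  # excitations excluded by the "unsafe" flag
    errors = [
        reference[x] - e
        for x, e in method.items()
        if x not in skip and e is not None and reference.get(x) is not None
    ]
    n = len(errors)  # number of components of the error vector
    if n == 0:
        raise ValueError("no excitation is defined for both method and reference")
    mse = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Population standard deviation of the signed errors; mathematically equal
    # to sqrt(RMSE**2 - MSE**2) but free of cancellation between the two squares.
    sde = math.sqrt(sum((e - mse) ** 2 for e in errors) / n)
    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "SDE": sde}
```

Computing the SDE from the deviations to the mean, rather than as `sqrt(RMSE**2 - MSE**2)`, gives the same quantity while avoiding a small negative argument to the square root when both squares are nearly equal.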
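The first step performed by `datafileBuilder`, turning a `tabular` body into a two-dimensional NumPy array, might look like the following sketch. It is hypothetical: `tabular_to_array` is not the tool's actual function, it only splits rows on `\\` and cells on `&`, and it ignores `\multicolumn`, escaped `\&`, and the handling of options and `\newcommand` definitions described in the patch.

```python
import re

import numpy as np


def tabular_to_array(body: str) -> np.ndarray:
    """Hypothetical sketch: split a LaTeX tabular body into a 2D NumPy array of cells.

    Rows are separated by \\ and cells by &; \\multicolumn and escaped \\& are not handled.
    """
    # Remove horizontal rules, which carry no data
    body = re.sub(r"\\(?:hline|toprule|midrule|bottomrule)", "", body)
    # Split on the LaTeX row separator (two backslashes) and drop empty rows
    rows = [row.strip() for row in re.split(r"\\\\", body) if row.strip()]
    cells = [[cell.strip() for cell in row.split("&")] for row in rows]
    width = max(len(row) for row in cells)
    # Pad short rows with empty cells so that the array is rectangular
    padded = [row + [""] * (width - len(row)) for row in cells]
    return np.array(padded, dtype=object)
```

An object array keeps each cell as a plain Python string, leaving any numeric conversion to the format-specific parser modules.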
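The two rules applied by `ADC25generator` (the ADC(2.5) energy is the mean of the ADC(2) and ADC(3) energies, and the unsafe flag is the logical OR of the two parent flags) can be sketched in a few lines of Python. The `Value` record and the `adc25` function are illustrative assumptions, as is the choice to combine only excitations present in both data files; the merging of the remaining metadata is omitted.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Value:
    """Hypothetical record: one vertical excitation energy and its safety flag."""
    energy: float
    unsafe: bool = False


def adc25(adc2: Dict[str, Value], adc3: Dict[str, Value]) -> Dict[str, Value]:
    """Combine ADC(2) and ADC(3) values, keyed by excitation, into ADC(2.5) values."""
    merged: Dict[str, Value] = {}
    # Assumption: only excitations present in both data files are combined
    for key in sorted(adc2.keys() & adc3.keys()):
        v2, v3 = adc2[key], adc3[key]
        merged[key] = Value(
            energy=(v2.energy + v3.energy) / 2,  # E_ADC(2.5) = (E_ADC(2) + E_ADC(3)) / 2
            unsafe=v2.unsafe or v3.unsafe,       # unsafe if either parent value is unsafe
        )
    return merged
```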