Merge pull request 'Website' (#2) from mveril/QUESTDB:website into master

Reviewed-on: loos/QUESTDB#2
Pierre-Francois Loos 2020-11-02 17:42:19 +01:00
commit 3d700aa8cf


@ -37,6 +37,8 @@
\newcommand{\tabc}[1]{\multicolumn{1}{c}{#1}}
\newcommand{\QP}{\textsc{quantum package}}
\newcommand{\SupInf}{supporting information}
%Vector
\renewcommand{\vec}[1]{\bm{#1}}
% Update article type if known
\papertype{Review Article}
@ -189,7 +191,7 @@ These basis sets have been downloaded from the \href{https://www.basissetexchang
%==================================
\subsection{Computational methods}
%==================================
\label{sec:methods}
%------------------------------------------------
\subsubsection{Reference computational methods}
%------------------------------------------------
@ -660,10 +662,95 @@ MAE & & 0.22 & 0.16 & 0.22 & 0.11 & 0.12 & 0.05 & 0.04 & 0.02 & 0.20 & 0.22
\label{sec:website} \label{sec:website}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\alert{Here comes the description of Mika's website.}
Here we describe the features of the website that we have specifically designed to gather all the data generated during these last few years.
Thanks to this website, one can easily test and compare the accuracy of a given method with respect to various variables such as the molecule size or its family, the nature of the excited states, the size of the basis set, etc.
{
\newcommand{\meth}{\text{meth}}
\newcommand{\err}{\mathcal{E}}
\newcommand{\nEx}{X}
\newcommand{\nExnn}{\mathcal{X}}
%=======================
\subsection{Introduction}
\label{sec:websiteIntro}
%=======================
The previous QUEST publications \cite{Loos_2018a,Loos_2019,Loos_2020b,Loos_2020c,Loos_2020d} report vertical excitation energies, together with statistics computed for the most relevant parameters.
However, depending on the specific interests of a given quantum chemist, this choice of parameters may not be relevant to their study.
Furthermore, to assess the accuracy of a new method, it must be compared with reference data, such as those of the QUEST project.
This requires computing the same type of statistics for the new method. The QUESTDB website was created precisely to address these issues.
%=======================
\subsection{Specification}
%=======================
The website specifications are the following:
\begin{itemize}
	\item Display the QUEST excitation energies as a table.
	\item Allow the user to import local files from their computer.
	\item Allow the user to filter the data according to various parameters.
	\item Compute statistics for the data selected by these parameters.
	\item Display a box plot that shows the accuracy of the methods at a glance.
\end{itemize}
These features address the issues described in Section~\ref{sec:websiteIntro}.
%=======================
\subsection{Project}
%=======================
The project consists of two parts: the website itself and a set of data generation tools.
%------------------------------------------------
\subsubsection{Website}
%------------------------------------------------
This is the main part of the project. All the calculations are performed locally on the dataset page.
First, the website lets the user import new data (see Section~\ref{sec:tools}).
These data are added to the current session and are discarded when the user leaves the page.
Four multi-selection lists follow, each of which depends on the previous ones.
These lists are used to select the sets (Figure~\ref{fig:scheme}), the molecules (Figure~\ref{fig:molecules}), and the methods and basis sets (Section~\ref{sec:methods}).
Next, several filters determine the properties of the excitations to include.
It is also possible to filter by molecule size or by active character percentage.
The user then chooses the reference method against which the others are compared (TBE by default).
Finally, a flag allows the user to exclude all the values flagged as unsafe, that is, the values whose uncertainty is too large.
\paragraph{Statistics calculations}
We want to evaluate the accuracy of each method/basis pair with respect to the reference (usually the TBEs).
For each method, we define a vector containing the energies of all the vertical transitions selected by the user.
Let $\meth$ denote a method/basis pair, $E^x_{\meth}$ the energy of the vertical excitation $x$ computed with $\meth$, and $\nEx$ the number of selected excitations, so that
\begin{equation}
\vec{E}_{\meth} = \qty{E^1_{\meth}, \ldots , E^{\nEx}_{\meth}}.
\end{equation}
The error of $\meth$ with respect to the reference $\text{ref}$ for the excitation $x$ is
\begin{equation}
\err^x_{\meth} = E^x_{\text{ref}} - E^x_{\meth},
\end{equation}
which is defined only when the vertical excitation $x$ is available for both the method $\meth$ and the reference $\text{ref}$.
Denoting by $\nExnn$ the size of the resulting error vector $\vec{\err}_{\meth}$, the mean signed error (MSE), the mean absolute error (MAE), the root-mean-square error (RMSE), and the standard deviation of the errors (SDE) are
\begin{gather}
\text{MSE}_{\meth} = \overline{\vec{\err}_{\meth}} = \frac{1}{\nExnn}\sum_{x=1}^{\nExnn}\err^x_{\meth} \\
\text{MAE}_{\meth} = \overline{\abs{\vec{\err}_{\meth}}} = \frac{1}{\nExnn}\sum_{x=1}^{\nExnn}\abs{\err^x_{\meth}} \\
\text{RMSE}_{\meth} = \sqrt{\frac{1}{\nExnn}\sum_{x=1}^{\nExnn}(\err^x_{\meth})^2} \\
\text{SDE}_{\meth} = \sqrt{\frac{1}{\nExnn}\sum_{x=1}^{\nExnn}(\err^x_{\meth})^2-\text{MSE}_{\meth}^2}
\end{gather}
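As a purely illustrative check of these definitions (the numbers below are hypothetical and are not taken from the QUEST database), consider a method $\meth$ for which $\nExnn = 4$ errors are available, namely $\{0.1, -0.1, 0.3, 0.1\}$ eV.
Taking the SDE as the standard deviation of the errors about their mean (the MSE), one obtains
\begin{gather*}
\text{MSE}_{\meth} = \frac{0.1 - 0.1 + 0.3 + 0.1}{4}\,\text{eV} = 0.10\,\text{eV} \\
\text{MAE}_{\meth} = \frac{0.1 + 0.1 + 0.3 + 0.1}{4}\,\text{eV} = 0.15\,\text{eV} \\
\text{RMSE}_{\meth} = \sqrt{\frac{0.01 + 0.01 + 0.09 + 0.01}{4}}\,\text{eV} = \sqrt{0.03}\,\text{eV} \approx 0.17\,\text{eV} \\
\text{SDE}_{\meth} = \sqrt{0.03 - 0.10^2}\,\text{eV} = \sqrt{0.02}\,\text{eV} \approx 0.14\,\text{eV}
\end{gather*}
In this example, since the error is defined as the reference value minus the value of the method, the positive MSE indicates that the method underestimates the reference on average, while the MAE and the RMSE measure the typical magnitude of the errors.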
These statistics allow the user to assess the accuracy of each method/basis pair.
On the website, the statistics are reported in a table and displayed as a box plot.
%------------------------------------------------
\subsubsection{Data generation tools}
\label{sec:tools}
%------------------------------------------------
There are currently two main tools to generate data: \texttt{datafileBuilder} and \texttt{ADC25generator}.
\paragraph{datafileBuilder}
The \texttt{datafileBuilder} tool builds data files from a {\LaTeX} \texttt{tabular} environment.
The \texttt{tabular} is accompanied by some options and {\LaTeX} \texttt{\textbackslash newcommand} definitions, which are parsed by the main script, and the \texttt{tabular} environment itself is converted to a two-dimensional \texttt{NumPy} array.
The options, the {\LaTeX} \texttt{\textbackslash newcommand} definitions to apply, and the two-dimensional array representing the \texttt{tabular} environment are then passed to the appropriate table parser module, which is selected through the \texttt{\textbackslash formatName} option in the input file.
Each module is responsible for parsing the \texttt{tabular} and returning all the corresponding data files as objects.
The main script then writes these objects to the corresponding files. These files can be used on the website,
either by importing them temporarily or by submitting a pull request to add the new data.
The modular design of this tool gives us enough flexibility to easily convert many types of {\LaTeX} \texttt{tabular} to a standardized file format.
\paragraph{ADC25generator}
The \texttt{ADC25generator} tool merges the ADC(2) and ADC(3) metadata and computes the ADC(2.5) energy from the ADC(2) and ADC(3) data files as
\begin{equation}
E_\text{ADC(2.5)} = \frac{E_\text{ADC(2)}+E_\text{ADC(3)}}{2}
\end{equation}
and the resulting value is flagged as unsafe when at least one of the two underlying values is unsafe:
\begin{equation}
\mathrm{unsafe}_\text{ADC(2.5)} = \mathrm{unsafe}_\text{ADC(2)} \lor \mathrm{unsafe}_\text{ADC(3)}
\end{equation}
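For illustration, with hypothetical values (not taken from the database) of $E_\text{ADC(2)} = 4.90$ eV and $E_\text{ADC(3)} = 5.30$ eV for a given excitation, the average reads
\begin{equation*}
E_\text{ADC(2.5)} = \frac{4.90 + 5.30}{2}\,\text{eV} = 5.10\,\text{eV},
\end{equation*}
and, if the ADC(2) value is safe while the ADC(3) value is flagged as unsafe, the resulting ADC(2.5) value is flagged as unsafe as well.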
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Concluding remarks}
\label{sec:ccl}