Complete the information on the website with details on the statistical calculations

Mickaël Véril 2020-10-22 13:24:20 +02:00
parent 308be26c44
commit a76ace515b


@ -30,6 +30,8 @@
\newcommand{\tabc}[1]{\multicolumn{1}{c}{#1}}
\newcommand{\QP}{\textsc{quantum package}}
\newcommand{\SupInf}{supporting information}
%Vector
\renewcommand{\vec}[1]{\bm{#1}}
% Update article type if known
\papertype{Review Article}
@ -181,7 +183,7 @@ Doubly- and triply-augmented basis sets are usually employed for Rydberg states
%==================================
\subsection{Computational methods}
%==================================
\label{sec:methods}
%------------------------------------------------
\subsubsection{Reference computational methods}
%------------------------------------------------
@ -594,7 +596,7 @@ Thanks to this website, one can easily test and compare the accuracy of a given
\label{sec:websiteIntro}
%=======================
The previous QUEST publications \cite{Loos_2018a,Loos_2019,Loos_2020b,Loos_2020c,Loos_2020d} report vertical excitation energies, together with some statistics computed for the most relevant parameters.
However, depending on the specific interests of a quantum chemist, this choice of parameters may be irrelevant for their study.
Furthermore, to determine the accuracy of a new method, it must be compared with reference data such as those of the QUEST project.
To do so, the same type of statistics has to be computed for the new method. The QUESTDB website was created precisely for this purpose.
%=======================
@ -603,11 +605,11 @@ For this we have to calculate the same type of statistics for the new method. Th
Consider
The website specifications are the following:
\begin{itemize}
\item Display the QUEST excitation energies as a table
\item Allow the import of local files from the user's computer
\item Allow the data to be filtered according to various parameters
\item Calculate statistics from these parameters
\item Display a box plot to easily show the accuracy of the methods
\end{itemize}
This solves the issues described in Sec.~\ref{sec:websiteIntro}.
@ -617,12 +619,42 @@ The project containing two parts
%------------------------------------------------
\subsubsection{Website}
%------------------------------------------------
The website is the main part of the project. All the calculations are performed locally, on the dataset page.
The website gives access to all the data of the QUEST project and allows one to compute various statistics from them. First, the user can import new data (see Sec.~\ref{sec:tools}).
These data are added to the current session and are discarded when the page is closed.
There are four multi-selection dropdown lists, each one depending on the previous one.
These lists allow one to select the sets (Fig.~\ref{fig:scheme}), the molecules (Fig.~\ref{fig:molecules}), the methods, and the basis sets (Sec.~\ref{sec:methods}).
Then, several filters allow one to choose the properties of the excitations to include.
It is also possible to filter by molecule size.
Next, a reference method to compare with must be defined (TBE by default).
We also provide a flag to discard all the values declared unsafe. A value is declared unsafe when its uncertainty is too large.
\paragraph{Statistics calculations}
We want to quantify the accuracy of each method/basis pair with respect to the reference (usually the TBEs).
Let $\text{meth}$ denote a method/basis pair and $E^x_\text{meth}$ the energy of the vertical excitation $x$ computed with $\text{meth}$.
For each pair, we define the vector gathering the energies of all the vertical transitions selected by the user,
\begin{equation}
\vec{E}_\text{meth} = \qty{E^1_\text{meth}, \ldots , E^X_\text{meth}},
\end{equation}
and the error vector $\vec{\mathcal{E}}_\text{meth}$ of $\text{meth}$ with respect to the reference $\text{ref}$, whose components are
\begin{equation}
\mathcal{E}^x_\text{meth} = E^x_\text{ref} - E^x_\text{meth}.
\end{equation}
Only the vertical excitations $x$ that are defined for both $\text{meth}$ and $\text{ref}$ are taken into account.
Denoting by $X$ the size of the error vector $\vec{\mathcal{E}}_\text{meth}$, the mean signed error (MSE), mean absolute error (MAE), root-mean-square error (RMSE), and standard deviation of the errors (SDE) read
\begin{gather}
\text{MSE}_\text{meth} = \frac{1}{X}\sum_{x=1}^X \mathcal{E}^x_\text{meth} \\
\text{MAE}_\text{meth} = \frac{1}{X}\sum_{x=1}^X \abs{\mathcal{E}^x_\text{meth}} \\
\text{RMSE}_\text{meth} = \sqrt{\frac{1}{X}\sum_{x=1}^X \qty(\mathcal{E}^x_\text{meth})^2} \\
\text{SDE}_\text{meth} = \sqrt{\frac{1}{X}\sum_{x=1}^X \qty(\mathcal{E}^x_\text{meth})^2 - \text{MSE}_\text{meth}^2}
\end{gather}
These statistics allow the user to assess the accuracy of each method/basis pair.
On the website, the statistics are reported in a table and in a box plot.
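As an illustration, consider a purely hypothetical set of $X = 4$ excitations (the values below are made up for this example and are not taken from the QUEST database) with reference energies $\qty{4.00, 5.00, 6.00, 7.00}$~eV and computed energies $\qty{3.90, 5.10, 5.70, 6.90}$~eV.
The error vector is then $\qty{0.10, -0.10, 0.30, 0.10}$~eV and, taking the SDE as the standard deviation of the errors, \ie, $\sqrt{\text{RMSE}^2 - \text{MSE}^2}$, one gets
\begin{align*}
\text{MSE} & = \frac{0.10 - 0.10 + 0.30 + 0.10}{4} = 0.10~\text{eV}, \\
\text{MAE} & = \frac{0.10 + 0.10 + 0.30 + 0.10}{4} = 0.15~\text{eV}, \\
\text{RMSE} & = \sqrt{\frac{0.01 + 0.01 + 0.09 + 0.01}{4}} = \sqrt{0.03} \approx 0.17~\text{eV}, \\
\text{SDE} & = \sqrt{0.03 - 0.10^2} = \sqrt{0.02} \approx 0.14~\text{eV}.
\end{align*}
In this sketch, the MAE is larger than the MSE because errors of opposite signs partially cancel in the latter, and the positive MSE indicates that, with the sign convention used here (reference minus method), the method underestimates the reference on average.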
%------------------------------------------------
\subsubsection{Data generation tools}
\label{sec:tools}
%------------------------------------------------
There are currently two main tools to generate data: \texttt{datafileBuilder} and \texttt{ADC25generator}.
\paragraph{datafileBuilder}