10
0
mirror of https://github.com/LCPQ/QUESTDB_website.git synced 2025-01-12 05:58:23 +01:00
QUESTDB_website/docs/datafileBuilder.md

261 lines
8.7 KiB
Markdown
Raw Normal View History

# DatafileBuilder
DatafileBuilder.py is a script to read a $\mathrm{\LaTeX}$ `tabular` environment to data file for the website.
## Requirement
To run the script you must have this two elements.
- [Python](https://www.python.org/)≥3
- [TexSoup](https://github.com/alvinwan/TexSoup)
## Command line usage
```
usage: datafileBuilder.py [-h] [--file FILE] [--defaultType {ABS,FLUO}]
[--format {LINE,COLUMN,TBE}] [--debug]
optional arguments:
-h, --help show this help message and exit
--file FILE
--defaultType {ABS,FLUO}
--format {LINE,COLUMN,TBE}
--debug Debug mode
```
The default type is `ABS` (for absorbtion).
The default format is `LINE ` described [below](#the-line-format)
## Disclaimer
There is **absolutly no guarantee** of success.
If the program crach of if the result is not correct please:
- Check if the input file respect the selected [format](#formats)
- **Simplify** the $\mathrm{\LaTeX}$ code of the input file as much as possible
## Input
### Input skeleton
```latex
% \newcommand area
\newcommand{}{}{}
% ther cusom commands definition with or without arguments
\newcommand{}{}
\begin{tabular}
% Tabular in one of the format supported by the script
\end{tabular}
```
### Example of input
```latex
\newcommand{\TDDFT}{TD-DFT}
\newcommand{\CASSCF}{CASSCF}
\newcommand{\CASPT}{CASPT2}
\newcommand{\ADC}[1]{ADC(#1)}
\newcommand{\CC}[1]{CC#1}
\newcommand{\CCSD}{CCSD}
\newcommand{\EOMCCSD}{EOM-CCSD}
\newcommand{\CCSDT}{CCSDT}
\newcommand{\CCSDTQ}{CCSDTQ}
\newcommand{\CCSDTQP}{CCSDTQP}
\newcommand{\CI}{CI}
\newcommand{\sCI}{sCI}
\newcommand{\exCI}{exFCI}
\newcommand{\FCI}{FCI}
\newcommand{\AVDZ}{aug-cc-pVDZ}
\newcommand{\AVTZ}{aug-cc-pVTZ}
\newcommand{\DAVTZ}{d-aug-cc-pVTZ}
\newcommand{\AVQZ}{aug-cc-pVQZ}
\newcommand{\DAVQZ}{d-aug-cc-pVQZ}
\newcommand{\TAVQZ}{t-aug-cc-pVQZ}
\newcommand{\AVPZ}{aug-cc-pV5Z}
\newcommand{\DAVPZ}{d-aug-cc-pV5Z}
\newcommand{\PopleDZ}{6-31+G(d)}
\newcommand{\pis}{\pi^\star}
\newcommand{\Ryd}{\mathrm{R}}
\begin{tabular}{l|p{.6cm}p{1.1cm}p{1.4cm}p{1.7cm}p{.9cm}|p{.6cm}p{1.1cm}p{1.4cm}p{.9cm}|p{.6cm}p{1.1cm}p{.9cm}|p{.7cm}p{.7cm}p{.7cm}}
\multicolumn{16}{c}{Water}\\
& \multicolumn{5}{c}{\AVDZ} & \multicolumn{4}{c}{\AVTZ}& \multicolumn{3}{c}{\AVQZ} & \multicolumn{3}{c}{Litt.}\\
State & {\CC{3}} & {\CCSDT} & {\CCSDTQ} & {\CCSDTQP} & {\exCI} & {\CC{3}} & {\CCSDT} & {\CCSDTQ} & {\exCI}& {\CC{3}} & {\CCSDT} & {\exCI} & Exp.$^a$ & Th.$^b$ & Th.$^c$\\
$^1B_1 (n \rightarrow 3s)$ &7.51&7.50&7.53&7.53&7.53 &7.60&7.59&7.62&7.62 &7.65 &7.64 &7.68 &7.41 &7.81&7.57\\
$^1A_2 (n \rightarrow 3p)$ &9.29&9.28&9.31&9.32&9.32 &9.38&9.37&9.40&9.41 &9.43 &9.41 &9.46 &9.20 &9.30&9.33\\
$^1A_1 (n \rightarrow 3s)$ &9.92&9.90&9.94&9.94&9.94 &9.97&9.95&9.98&9.99 &10.00 &9.98 &10.02 &9.67 &9.91&9.91\\
$^3B_1 (n \rightarrow 3s)$ &7.13&7.11&7.14&7.14&7.14 &7.23&7.22&7.24&7.25 &7.28 &7.26 &7.30 &7.20 &7.42&7.21\\
$^3A_2 (n \rightarrow 3p)$ &9.12&9.11&9.14&9.14&9.14 &9.22&9.20&9.23&9.24 &9.26 &9.25 &9.28 &8.90 &9.42&9.19\\
$^3A_1 (n \rightarrow 3s)$ &9.47&9.45&9.48&9.49&9.49 &9.52&9.50&9.53&9.54 &9.56 &9.54 &9.58 &9.46 &9.78&9.50\\
\end{tabular}
```
All '\newcommand' are applied to the cell of the tabular and the tabular is parsed to extract data.
### General rules
The general rules to extract data correctly are:
- A `$` must not follow another `$` put space between them.
- The column number must be the same on each row of the `tabular`
- Please respect the format of each tabular.
- Use standard $\mathrm{\LaTeX}$ for the `\multicolumn` command and not a wrapper.
- In general use standard $\mathrm{\LaTeX}$ instead of dirty form for example.
```latex
$A''$ % Bad
$A^"$ % Bad
$A^{\prime\prime}$ %Good
```
- D'ont put comment at the end of `tabular ` row (this cause a TexSoup bug).
- Only `tabular` environment is supported please convert `longtable` and other table format to `tabular .
- Only `\newcommand` are supported please convert `\def` and `\NewDocumentCommand`.
- After executing all commands the basis and methods name must be $\mathrm{\LaTeX}$ free (only plan text).
### Unsafe values
Unsafe value (value that must not included in the statistics table and graph) must be in emphasis or with $\sim$ symbol like
> *42*
> $\sim 42$
```latex
\emph{42} % unsafe=true
$\sim$ 42 % unsafe=true
42 % unsafe=false
```
that set the unsafe boolean value to `true ` in the output data file
#### Formats
##### Generality
###### Transition format
```latex
$^m s[\mathrm{F}](T)$
```
Where `m` is the multiplicity `s` is the symetry and `\mathrm{F}` if it is present specifies that the vertical transition is fluorescence
T is transition type and must be in the format
```latex
initial \rightarrow final
```
All the $\mathrm{\LaTeX}$ code in this format must be standard latex except of the command define on the `\newcommand` section
##### The line format
```latex
\begin{tabular}
& \multicolumn{n}{c}{Molecule} \\
& basis#1 & basis#2 & basis#n \\ % You can also use the LaTeX standard \multiculumn command
State & method#1 & method#2 method#n \\ % You can also use the LaTeX standard \multiculumn command
$Transition#1$ & value11&value#12 & ... value#1n\\
$Transition#2$ & value21&value#22 & ... value#2n\\
% All the other transition
$Transition#m$ & value#m1&value#m2 & ... value#mn\\
\end{tabular}
```
##### The column format
```latex
\begin{tabular}
& & basis#1 & basis#2 & basis#n \\ % You can also use the LaTeX standard \multiculumn command
Molecule &State & method#1 & method#2 method#n \\ % You can also use the LaTeX standard \multiculumn command
molecule#1 &$Transition#11$ &value#111&value#112 ... &value#11n \\
&$Transition#12$ &value#121&value#122 &value#12n \\
% Other transition on the molecule#1
&$Transition#1m$ &value#1m1&value#1m1 &value#1nm \\
% Other molecules
molecule#k &$Transition#k1$ &value#k11&value#k12 ... &value#k1n \\
% Other transition on the molecule#k
&$Transition#km$ &value#km1&value#km2 &value#kmn \\
\end{tabular}
```
This format is very powerfull because it can be used with multiple molecules.
##### The TBE format
The `TBE` format is a variant of the `COLUMN` format but made for theoretical best estimate tabular.
> Warning:
>
> The basis is not extract from the TBE format
```latex
\begin{tabular}
& & & & TBE(FC)& \multicolumn{3}{c}{Corrected TBE} \\
& State & $f$ & \%$T_1$ & basis & Method & Corr. & Value \\
molecule#1 &$transition#11$ & fvalue#11 &\%T_1value#11& fceval#11 & not used value & not used value & eval#11 \\
&$transition#12$ & fvalue#12 &\%T_1value#12& fceval#12 & not used value & not used value & eval#12 \\
% Other transition on the same molecule
&$transition#1n$ & fvalue#12 &\%T_1value#12& fceval#1n & not used value & not used value & eval#12 \\
molecule#m &$transition#m1$ & fvalue#m1 &\%T_1value#m1& fceval#m1 & not used value & not used value & eval#k1 \\
&$transition#m2$ & fvalue#m2 &\%T_1value#m2& fceval#m2 & not used value & not used value & eval#m2 \\
% Other transition on the same molecule
&$transition#mn$ & fvalue#mn &\%T_1value#mn& fceval#mn & not used value & not used value & eval#mn \\
\end{tabular}
```
## Output
### Directory strucure
```
data
├── abs
│   ├── molecule#1_method#1_basis#1.dat
│ ├── ...
│   ├── molecule#n_basis#m_method#k.dat
│   └── molecule#n_basis#m_method#k.dat
└── fluo
   ├── molecule#1_method#1_basis#1.dat
├── ...
   ├── molecule#n_basis#m_method#k.dat
   └── molecule#n_basis#m_method#k.dat
```
When the debug flag is used instead of `data/` the root of output directory is `data/test/`
### Output file
```
# Molecule : moleculename
# Comment :
# code : codename,[version]
# method : method,[basis]
# geom : method,[basis]
# DOI : DOI,[isSupporting]
# Initial state Final state Transition Energies (eV) %T1 Oscilator forces unsafe
####################### ####################### ######################################## ############# ####### ################### ##############
# Number Spin Symm Number Spin Symm type E_abs %T1 f is unsafe
n s symm n s symm (excitationType) value %T1val forceval isUnsafe
```
When each value are number spin value are integer symmetry and excitation type are standard LaTeX
isSupporting and isUnsafe are boolean corrresponded to `JavaScript` boolean values `true` or `false`