
TREX Configuration file


This page contains information about the general structure of the TREXIO library. The source code of the library can be automatically generated based on the contents of the trex.json configuration file, which itself is compiled from different sections (groups) presented below.

For more information about the automatic generation of the source code or regarding possible modifications, please contact the TREXIO developers.

All quantities are saved in TREXIO files in atomic units. The dimensions of the arrays in the tables below are given in column-major order (as in Fortran), and the ordering of the dimensions is reversed in the produced trex.json configuration file because the library is written in C.
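To make the dimension reversal concrete, here is a minimal Python sketch; the helper names colmajor_offset and rowmajor_offset are hypothetical and not part of the TREXIO API. It checks that element (i, j) of an array listed as (3, nucleus.num) in the tables sits at the same offset of a flat buffer as element [j][i] of the C array of shape (nucleus.num, 3).

```python
from itertools import product
from typing import Sequence


def colmajor_offset(index: Sequence[int], dims: Sequence[int]) -> int:
    """Flat offset of `index` for dims given in the column-major order of the tables."""
    offset, stride = 0, 1
    for i, n in zip(index, dims):
        if not 0 <= i < n:
            raise IndexError(f"index {i} out of range for dimension {n}")
        offset += i * stride  # the first index runs fastest
        stride *= n
    return offset


def rowmajor_offset(index: Sequence[int], dims: Sequence[int]) -> int:
    """Flat offset of `index` for dims given in C (row-major) order."""
    offset = 0
    for i, n in zip(index, dims):
        if not 0 <= i < n:
            raise IndexError(f"index {i} out of range for dimension {n}")
        offset = offset * n + i  # the last index runs fastest
    return offset


nucleus_num = 4
fortran_dims = (3, nucleus_num)         # as listed for nucleus.coord
c_dims = tuple(reversed(fortran_dims))  # as seen from C: (nucleus.num, 3)

for xyz, atom in product(range(3), range(nucleus_num)):
    # The same element is addressed with reversed indices on the C side
    assert colmajor_offset((xyz, atom), fortran_dims) == rowmajor_offset((atom, xyz), c_dims)

print(colmajor_offset((2, 1), fortran_dims))  # z coordinate of the second nucleus: 5
```

Both layouts describe the same bytes; only the order in which the indices are written changes.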

TREXIO currently supports int, float and str types for both single attributes and arrays. Note that some attributes have the dim type (e.g. num of the nucleus group). This type is treated exactly like int, with the only difference that dim variables cannot be negative. This additional constraint is required because dim attributes are used internally to allocate memory and to check array boundaries in the memory-safe API. Most of the time, the names of dim variables contain the num suffix.

In Fortran, arrays are 1-based, while in most other languages arrays are 0-based. Hence, we introduce the index type, which is a 1-based int in the Fortran interface and 0-based otherwise.

For sparse data structures such as electron repulsion integrals, the data can be too large to fit in memory, so it has to be fetched with multiple function calls, each performing I/O on a buffer.
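The chunked access pattern can be sketched in Python as follows. This is an illustration under stated assumptions: read_chunk is a hypothetical callback standing in for a back-end read function that returns, for a given offset and buffer size, a flat list of 4 indices per element together with the corresponding values. It is not the actual TREXIO signature. Only one buffer is held in memory at a time.

```python
from typing import Callable, Iterator, List, Tuple

Indices = Tuple[int, int, int, int]
# Hypothetical read callback: (offset, buffer_size) -> (flat indices, values)
ReadChunk = Callable[[int, int], Tuple[List[int], List[float]]]


def iter_sparse(read_chunk: ReadChunk, size: int, buffer_size: int) -> Iterator[Tuple[Indices, float]]:
    """Yield ((i, j, k, l), value) pairs, reading at most buffer_size elements per call."""
    offset = 0
    while offset < size:
        indices, values = read_chunk(offset, min(buffer_size, size - offset))
        if not values:  # nothing more could be read: stop instead of looping forever
            break
        for m, value in enumerate(values):
            i, j, k, l = indices[4 * m: 4 * m + 4]
            yield (i, j, k, l), value
        offset += len(values)


# In-memory stand-in for a file, holding dummy values (not real integrals)
stored_indices = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
stored_values = [0.5, 0.25, 0.5]


def fake_read_chunk(offset: int, n: int) -> Tuple[List[int], List[float]]:
    return stored_indices[4 * offset: 4 * (offset + n)], stored_values[offset: offset + n]


total = sum(v for _, v in iter_sparse(fake_read_chunk, len(stored_values), buffer_size=2))
print(total)  # 1.25
```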

1 Metadata (metadata group)

As we expect our files to be archived in open-data repositories, users need the possibility to store metadata inside the files. We propose to store the list of names of the codes that participated in the creation of the file, a list of authors of the file, and a textual description.

Variable Type Dimensions (for arrays) Description
code_num dim   Number of codes used to produce the file
code str (metadata.code_num) Names of the codes used
author_num dim   Number of authors of the file
author str (metadata.author_num) Names of the authors of the file
package_version str   TREXIO version used to produce the file
description str   Text describing the content of the file

2 Electron (electron group)

We consider wave functions expressed in the spin-free formalism, where the number of ↑ and ↓ electrons is fixed.

Variable Type Dimensions Description
up_num int   Number of ↑-spin electrons
dn_num int   Number of ↓-spin electrons

3 Nucleus (nucleus group)

The nuclei are considered as fixed point charges. Coordinates are given in Cartesian \((x,y,z)\) format.

Variable Type Dimensions Description
num dim   Number of nuclei
charge float (nucleus.num) Charges of the nuclei
coord float (3,nucleus.num) Coordinates of the atoms
label str (nucleus.num) Atom labels
point_group str   Symmetry point group
repulsion float   Nuclear repulsion energy
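As an illustration of the units and of the coord layout, the repulsion energy of point charges is \(E_{\text{nn}} = \sum_{A<B} Z_A Z_B / \vert \mathbf{R}_A - \mathbf{R}_B \vert\). The Python sketch below assumes that the stored repulsion is this classical Coulomb energy and that coord is seen from C or Python as nucleus.num rows of (x, y, z) in bohr, i.e. the reversed order of the (3, nucleus.num) dimensions listed above. The function name is hypothetical.

```python
import math
from typing import Sequence


def nuclear_repulsion(charge: Sequence[float], coord: Sequence[Sequence[float]]) -> float:
    """Sum over pairs A < B of Z_A Z_B / |R_A - R_B| (hartree, coordinates in bohr)."""
    energy = 0.0
    for a in range(len(charge)):
        for b in range(a):
            energy += charge[a] * charge[b] / math.dist(coord[a], coord[b])
    return energy


# H2 with an illustrative bond length of 1.4 bohr
charge = [1.0, 1.0]
coord = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)]
print(nuclear_repulsion(charge, coord))  # 1/1.4, about 0.714
```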

4 Effective core potentials (ecp group)

An effective core potential (ECP) \(V_A^{\text{ECP}}\) replacing the core electrons of atom \(A\) can be expressed as \[ V_A^{\text{ECP}} = V_{A \ell_{\max}+1} + \sum_{\ell=0}^{\ell_{\max}} \sum_{m=-\ell}^{\ell} | Y_{\ell m} \rangle \left[ V_{A \ell} - V_{A \ell_{\max}+1} \right] \langle Y_{\ell m} | \]

The first term in the equation above is sometimes attributed to the local channel, while the remaining terms correspond to the non-local channel projections.

The functions \(V_{A\ell}\) are parameterized as: \[ V_{A \ell}(\mathbf{r}) = \sum_{q=1}^{N_{q \ell}} \beta_{A q \ell}\, |\mathbf{r}-\mathbf{R}_{A}|^{n_{A q \ell}}\, e^{-\alpha_{A q \ell} |\mathbf{r}-\mathbf{R}_{A}|^2 } \]
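A direct transcription of this parameterization in Python, as a sketch. The function name ecp_channel is hypothetical, r stands for \(|\mathbf{r}-\mathbf{R}_A|\) in bohr, and the three sequences hold the \(\beta_{Aq\ell}\), \(n_{Aq\ell}\) and \(\alpha_{Aq\ell}\) of one channel. Negative powers such as \(n=-1\) diverge at \(r=0\), so the sketch rejects \(r \le 0\).

```python
import math
from typing import Sequence


def ecp_channel(r: float, beta: Sequence[float], n: Sequence[int], alpha: Sequence[float]) -> float:
    """V_{A l}(r) = sum_q beta_q * r**n_q * exp(-alpha_q * r**2), with r = |r - R_A| in bohr."""
    if r <= 0.0:
        raise ValueError("r must be positive: negative powers diverge at r = 0")
    return sum(b * r**p * math.exp(-a * r * r) for b, p, a in zip(beta, n, alpha))


# One term with beta = 1, n = -1, alpha = 1, evaluated at r = 2 bohr
print(ecp_channel(2.0, [1.0], [-1], [1.0]))  # 0.5 * exp(-4)
```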

See http://dx.doi.org/10.1063/1.4984046 or https://doi.org/10.1063/1.5121006 for more info.

Variable Type Dimensions Description
max_ang_mom_plus_1 int (nucleus.num) \(\ell_{\max}+1\), one higher than the max angular momentum in the removed core orbitals
z_core int (nucleus.num) Number of core electrons to remove per atom
num dim   Total number of ECP functions for all atoms and all values of \(\ell\)
ang_mom int (ecp.num) One-to-one correspondence between ECP items and the angular momentum \(\ell\)
nucleus_index index (ecp.num) One-to-one correspondence between ECP items and the atom index
exponent float (ecp.num) \(\alpha_{A q \ell}\) all ECP exponents
coefficient float (ecp.num) \(\beta_{A q \ell}\) all ECP coefficients
power int (ecp.num) \(n_{A q \ell}\) all ECP powers

There might be some confusion about the meaning of \(\ell_{\max}\). It can refer to the maximum angular momentum occupied in the core orbitals, which are removed by the ECP. Alternatively, it can refer to the maximum angular momentum of the ECP that replaces the core electrons. Note that the latter \(\ell_{\max}\) is always higher by 1 than the former.

Note for developers: avoid variable names in which one name is a prefix of another. The HDF5 back end might cause issues due to the way the find_dataset function works. For example, in the ECP group we use max_ang_mom and not ang_mom_max: the latter begins with ang_mom and causes issues when it is written before ang_mom in the TREXIO file.

4.1 Example

For example, consider the H2 molecule with the following effective core potential (in GAMESS input format for the H atom):

H-ccECP GEN 0 1
3
1.00000000000000    1 21.24359508259891 
21.24359508259891   3 21.24359508259891 
-10.85192405303825  2 21.77696655044365 
1
0.00000000000000    2 1.000000000000000

In TREXIO representation this would be:

num = 8

# lmax+1 per atom
max_ang_mom_plus_1 = [ 1, 1 ]

# number of core electrons to remove per atom
z_core = [ 0, 0 ]

# first 4 ECP elements correspond to the first H atom; the remaining 4 elements are for the second H atom
nucleus_index = [
  0, 0, 0, 0,
  1, 1, 1, 1
  ]

# the first 3 ECP elements correspond to the potential of the P channel (l=1), which is the local channel here since max_ang_mom_plus_1 = 1; then 1 element for the S channel (l=0); similar for the second H atom
ang_mom = [
  1, 1, 1, 0,
  1, 1, 1, 0
  ]

# ECP quantities that can be attributed to atoms and/or angular momenta based on the aforementioned nucleus_index and ang_mom arrays
coefficient = [
  1.00000000000000, 21.24359508259891, -10.85192405303825, 0.00000000000000,
  1.00000000000000, 21.24359508259891, -10.85192405303825, 0.00000000000000
  ]

exponent = [ 
  21.24359508259891, 21.24359508259891, 21.77696655044365, 1.000000000000000,
  21.24359508259891, 21.24359508259891, 21.77696655044365, 1.000000000000000
  ]

power = [ 
  -1, 1, 0, 0, 
  -1, 1, 0, 0
  ]
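The flat arrays above can be regrouped per atom and per angular momentum. The following Python sketch (group_ecp is a hypothetical helper) does this for the H2 example and labels as local the channel with \(\ell = \ell_{\max}+1\), i.e. the one whose ang_mom equals the max_ang_mom_plus_1 of its atom.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Term = Tuple[float, int, float]  # (beta, n, alpha)


def group_ecp(nucleus_index: Sequence[int], ang_mom: Sequence[int], coefficient: Sequence[float],
              power: Sequence[int], exponent: Sequence[float]) -> Dict[Tuple[int, int], List[Term]]:
    """Map (atom, l) to the list of (beta, n, alpha) terms stored for that atom and l."""
    channels: Dict[Tuple[int, int], List[Term]] = defaultdict(list)
    for atom, l, b, n, a in zip(nucleus_index, ang_mom, coefficient, power, exponent):
        channels[(atom, l)].append((b, n, a))
    return dict(channels)


# Arrays of the H2 example
max_ang_mom_plus_1 = [1, 1]
nucleus_index = [0, 0, 0, 0, 1, 1, 1, 1]
ang_mom = [1, 1, 1, 0, 1, 1, 1, 0]
coefficient = [1.00000000000000, 21.24359508259891, -10.85192405303825, 0.00000000000000] * 2
exponent = [21.24359508259891, 21.24359508259891, 21.77696655044365, 1.000000000000000] * 2
power = [-1, 1, 0, 0] * 2

channels = group_ecp(nucleus_index, ang_mom, coefficient, power, exponent)
for (atom, l), terms in sorted(channels.items()):
    kind = "local" if l == max_ang_mom_plus_1[atom] else "non-local"
    print(f"atom {atom}, l = {l}: {kind}, {len(terms)} term(s)")
```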

5 Basis set (basis group)

We consider here basis functions centered on nuclei. Hence, to place basis functions at arbitrary positions, dummy atoms can be defined.

The atomic basis set is defined as a list of shells. Each shell \(s\) is located on a center \(A\), has a given angular momentum \(l\) and a radial function \(R_s\). The radial function is a linear combination of \(N_{\text{prim}}\) primitive functions that can be of type Slater (\(p=1\)) or Gaussian (\(p=2\)), parameterized by exponents \(\gamma_{ks}\) and coefficients \(a_{ks}\): \[ R_s(\mathbf{r}) = \mathcal{N}_s \vert\mathbf{r}-\mathbf{R}_A\vert^{n_s} \sum_{k=1}^{N_{\text{prim}}} a_{ks}\, f_{ks}(\gamma_{ks},p)\, \exp \left( - \gamma_{ks} \vert \mathbf{r}-\mathbf{R}_A \vert ^p \right). \]

In the case of Gaussian functions, \(n_s\) is always zero.

Different codes normalize functions at different levels. Computing normalization factors requires the ability to compute overlap integrals, so the normalization factors should be written in the file to ensure that the file is self-contained and does not need the client program to have the ability to compute such integrals.

Some codes assume that the contraction coefficients are for a linear combination of normalized primitives. This implies that a normalization constant for the primitive \(ks\) needs to be computed and stored. If this normalization factor is not required, \(f_{ks}=1\).
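For Gaussian primitives, a common choice for \(f_{ks}\) is the factor that normalizes the Cartesian function \(x^{l} e^{-\gamma r^2}\): \(f = (2\gamma/\pi)^{3/4} \sqrt{(4\gamma)^{l}/(2l-1)!!}\). This is an assumption about the convention, not something TREXIO imposes, but it reproduces the prim_factor values of the H2 example in the Basis set section (e.g. 10.00625... for the s primitive with exponent 33.87). A Python sketch with hypothetical function names:

```python
import math


def double_factorial(n: int) -> int:
    """n!! with the convention (-1)!! = 0!! = 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def gaussian_prim_factor(gamma: float, l: int) -> float:
    """Factor normalizing x**l * exp(-gamma * r**2) (assumed convention for f_ks)."""
    return (2.0 * gamma / math.pi) ** 0.75 * math.sqrt((4.0 * gamma) ** l / double_factorial(2 * l - 1))


# s, p and d primitives of the H2 basis set example
for gamma, l in [(33.87, 0), (1.407, 1), (1.057, 2)]:
    print(l, gamma, gaussian_prim_factor(gamma, l))
```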

Some codes assume that the basis functions are normalized. This implies the computation of an extra normalization factor, \(\mathcal{N}_s\). If the basis function is not considered normalized, \(\mathcal{N}_s=1\).
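Given the primitive factors, \(\mathcal{N}_s\) follows from the overlap of the contracted radial function with itself. The Python sketch below is one way to compute it for Gaussian shells, assuming the shell is normalized through its \(x^l\) Cartesian component and using \(\int x^{2l} e^{-p r^2} d\mathbf{r} = (2l-1)!!\,(2p)^{-l} (\pi/p)^{3/2}\). The function names are hypothetical.

```python
import math
from typing import Sequence


def double_factorial(n: int) -> int:
    """n!! with the convention (-1)!! = 0!! = 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def gaussian_shell_factor(l: int, exponents: Sequence[float], coefficients: Sequence[float],
                          prim_factors: Sequence[float]) -> float:
    """N_s making N_s * sum_k a_k f_k x**l exp(-gamma_k r**2) normalized to 1."""
    norm2 = 0.0
    for g1, a1, f1 in zip(exponents, coefficients, prim_factors):
        for g2, a2, f2 in zip(exponents, coefficients, prim_factors):
            p = g1 + g2  # exponent of the product of the two primitives
            overlap = double_factorial(2 * l - 1) / (2.0 * p) ** l * (math.pi / p) ** 1.5
            norm2 += a1 * f1 * a2 * f2 * overlap
    return 1.0 / math.sqrt(norm2)


# First S shell of the H2 basis set example (normalized primitives)
exponents = [33.87, 5.095, 1.159, 0.3258, 0.1027]
coefficients = [0.006068, 0.045308, 0.202822, 0.503903, 0.383421]
prim_factors = [1.0006253235944540e+01, 2.4169531573445120e+00, 7.9610924849766440e-01,
                3.0734305383061117e-01, 1.2929684417481876e-01]
print(gaussian_shell_factor(0, exponents, coefficients, prim_factors))  # close to 1
```

For this shell the result is close to 1, which is consistent with shell_factor = 1 in that example.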

All the basis set parameters are stored in one-dimensional arrays:

Variable Type Dimensions Description
type str   Type of basis set: "Gaussian" or "Slater"
prim_num dim   Total number of primitives
shell_num dim   Total number of shells
nucleus_index index (basis.shell_num) One-to-one correspondence between shells and atomic indices
shell_ang_mom int (basis.shell_num) One-to-one correspondence between shells and angular momenta
shell_factor float (basis.shell_num) Normalization factor of each shell (\(\mathcal{N}_s\))
shell_index index (basis.prim_num) One-to-one correspondence between primitives and shell index
exponent float (basis.prim_num) Exponents of the primitives (\(\gamma_{ks}\))
coefficient float (basis.prim_num) Coefficients of the primitives (\(a_{ks}\))
prim_factor float (basis.prim_num) Normalization coefficients for the primitives (\(f_{ks}\))

5.1 Example

For example, consider H2 with the following basis set (in GAMESS format), where both the AOs and primitives are considered normalized:

HYDROGEN
S   5
1         3.387000E+01           6.068000E-03
2         5.095000E+00           4.530800E-02
3         1.159000E+00           2.028220E-01
4         3.258000E-01           5.039030E-01
5         1.027000E-01           3.834210E-01
S   1
1         3.258000E-01           1.000000E+00
S   1
1         1.027000E-01           1.000000E+00
P   1
1         1.407000E+00           1.000000E+00
P   1
1         3.880000E-01           1.000000E+00
D   1
1         1.057000E+00           1.000000E+00

In TREXIO representation we have:

type  = "Gaussian"
prim_num   = 20
shell_num   = 12

# 6 shells per H atom
nucleus_index = 
[ 0, 0, 0, 0, 0, 0,  
  1, 1, 1, 1, 1, 1 ] 

# 3 shells in S (l=0), 2 in P (l=1), 1 in D (l=2)
shell_ang_mom =
[ 0, 0, 0, 1, 1, 2, 
  0, 0, 0, 1, 1, 2 ]

# no need to renormalize shells 
shell_factor = 
[ 1., 1., 1., 1., 1., 1.,  
  1., 1., 1., 1., 1., 1. ] 

# 5 primitives for the first S shell and then 1 primitive per remaining shell in each H atom
shell_index = 
[ 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 
  6, 6, 6, 6, 6, 7, 8, 9, 10, 11 ]

# parameters of the primitives (10 per H atom)
exponent =
[ 33.87, 5.095, 1.159, 0.3258, 0.1027, 0.3258, 0.1027, 1.407, 0.388, 1.057, 
  33.87, 5.095, 1.159, 0.3258, 0.1027, 0.3258, 0.1027, 1.407, 0.388, 1.057 ]

coefficient =
[ 0.006068, 0.045308, 0.202822, 0.503903, 0.383421, 1.0, 1.0, 1.0, 1.0, 1.0, 
  0.006068, 0.045308, 0.202822, 0.503903, 0.383421, 1.0, 1.0, 1.0, 1.0, 1.0 ]

prim_factor =
[ 1.0006253235944540e+01, 2.4169531573445120e+00, 7.9610924849766440e-01,
  3.0734305383061117e-01, 1.2929684417481876e-01, 3.0734305383061117e-01,
  1.2929684417481876e-01, 2.1842769845268308e+00, 4.3649547399719840e-01,
  1.8135965626177861e+00, 1.0006253235944540e+01, 2.4169531573445120e+00,
  7.9610924849766440e-01, 3.0734305383061117e-01, 1.2929684417481876e-01,
  3.0734305383061117e-01, 1.2929684417481876e-01, 2.1842769845268308e+00,
  4.3649547399719840e-01, 1.8135965626177861e+00 ]
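The consistency of these arrays can be checked mechanically. In the Python sketch below, check_basis is a hypothetical helper: it verifies the array lengths against shell_num and prim_num, checks that every shell owns at least one primitive, and counts the primitives of each shell.

```python
from collections import Counter
from typing import Sequence


def check_basis(shell_num: int, prim_num: int, nucleus_index: Sequence[int],
                shell_ang_mom: Sequence[int], shell_index: Sequence[int]) -> Counter:
    """Check array lengths and the primitive-to-shell mapping; return primitives per shell."""
    if not (len(nucleus_index) == len(shell_ang_mom) == shell_num):
        raise ValueError("per-shell arrays must have basis.shell_num elements")
    if len(shell_index) != prim_num:
        raise ValueError("shell_index must have basis.prim_num elements")
    if set(shell_index) != set(range(shell_num)):
        raise ValueError("every shell needs at least one primitive, and indices must be < shell_num")
    return Counter(shell_index)


prims_per_shell = check_basis(
    shell_num=12, prim_num=20,
    nucleus_index=[0] * 6 + [1] * 6,
    shell_ang_mom=[0, 0, 0, 1, 1, 2] * 2,
    shell_index=[0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 7, 8, 9, 10, 11],
)
print(prims_per_shell[0], prims_per_shell[1])  # 5 1
```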

6 Atomic orbitals (ao group)

Going from the atomic basis set to AOs implies a systematic construction of all the angular functions of each shell. We consider two cases for the angular functions: the real-valued spherical harmonics and the polynomials in Cartesian coordinates. In the case of spherical harmonics, the AOs are ordered by increasing magnetic quantum number (\(-l \le m \le l\)), and in the case of polynomials we impose the canonical ordering of the Libint2 library, i.e.

\begin{eqnarray} p & : & p_x, p_y, p_z \nonumber \\ d & : & d_{xx}, d_{xy}, d_{xz}, d_{yy}, d_{yz}, d_{zz} \nonumber \\ f & : & f_{xxx}, f_{xxy}, f_{xxz}, f_{xyy}, f_{xyz}, f_{xzz}, f_{yyy}, f_{yyz}, f_{yzz}, f_{zzz} \nonumber \\ {\rm etc.} \nonumber \end{eqnarray}
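The Libint2 ordering lists the x exponent in decreasing order, then the y exponent in decreasing order, with the z exponent fixed by \(a+b+c=l\). A Python sketch that generates it (the function names are hypothetical), together with the \(m\) ordering used for spherical harmonics:

```python
from typing import List, Tuple


def cartesian_exponents(l: int) -> List[Tuple[int, int, int]]:
    """(a, b, c) of x^a y^b z^c in Libint2 canonical order: a descending, then b descending."""
    return [(a, b, l - a - b) for a in range(l, -1, -1) for b in range(l - a, -1, -1)]


def cartesian_label(abc: Tuple[int, int, int]) -> str:
    a, b, c = abc
    return "x" * a + "y" * b + "z" * c


def spherical_m(l: int) -> List[int]:
    """Magnetic quantum numbers in increasing order, -l..l."""
    return list(range(-l, l + 1))


print([cartesian_label(t) for t in cartesian_exponents(2)])
# ['xx', 'xy', 'xz', 'yy', 'yz', 'zz']
print(spherical_m(2))
# [-2, -1, 0, 1, 2]
```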

AOs are defined as

\[ \chi_i (\mathbf{r}) = \mathcal{N}_i\, P_{\eta(i)}(\mathbf{r})\, R_{\theta(i)} (\mathbf{r}) \]

where \(i\) is the atomic orbital index, \(P\) denotes either the polynomials or the spherical harmonics, \(\theta(i)\) returns the shell on which the AO is expanded, and \(\eta(i)\) denotes which angular function is chosen. \(\mathcal{N}_i\) is a normalization factor that allows different normalization coefficients within a shell, as in the GAMESS convention where \(\mathcal{N}_{x^2} \ne \mathcal{N}_{xy}\) because \[ \left[ \iiint \left[ \left(x-X_A \right)^2 R_{\theta(i)} (\mathbf{r}) \right]^2 dx\, dy\, dz \right]^{-1/2} \ne \left[ \iiint \left[ \left( x-X_A \right) \left( y-Y_A \right) R_{\theta(i)} (\mathbf{r}) \right]^2 dx\, dy\, dz \right]^{-1/2}. \]

In such a case, one should set the normalization of the shell (in the Basis set section) to \(\mathcal{N}_{z^2}\), which is the normalization factor of the atomic orbitals in spherical coordinates. The normalization factor of the \(xy\) function to be stored here is then \(\frac{\mathcal{N}_{xy}}{\mathcal{N}_{z^2}}\).
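Because \(R_{\theta(i)}\) depends only on \(|\mathbf{r}-\mathbf{R}_A|\), the ratio \(\mathcal{N}_{x^a y^b z^c}/\mathcal{N}_{z^l}\) depends only on the exponents. From the angular integrals of the monomials, it equals \(\sqrt{(2l-1)!!/[(2a-1)!!(2b-1)!!(2c-1)!!]}\). The Python sketch below (hypothetical function names) computes it. For a d shell in the ordering above it gives \(1, \sqrt{3}, \sqrt{3}, 1, \sqrt{3}, 1\).

```python
import math


def double_factorial(n: int) -> int:
    """n!! with the convention (-1)!! = 0!! = 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def cartesian_norm_ratio(a: int, b: int, c: int) -> float:
    """N_{x^a y^b z^c} / N_{z^l} for l = a + b + c, for any spherically symmetric radial part."""
    l = a + b + c
    return math.sqrt(double_factorial(2 * l - 1)
                     / (double_factorial(2 * a - 1) * double_factorial(2 * b - 1) * double_factorial(2 * c - 1)))


# d shell in Libint2 order: xx, xy, xz, yy, yz, zz
d_shell = [(2, 0, 0), (1, 1, 0), (1, 0, 1), (0, 2, 0), (0, 1, 1), (0, 0, 2)]
print([round(cartesian_norm_ratio(*abc), 6) for abc in d_shell])
# [1.0, 1.732051, 1.732051, 1.0, 1.732051, 1.0]
```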

Variable Type Dimensions Description
cartesian int   1: Cartesian polynomials, 0: spherical harmonics
num dim   Total number of atomic orbitals
shell index (ao.num) basis set shell for each AO
normalization float (ao.num) Normalization factors

6.1 One-electron integrals (ao_1e_int group)

  • \[ \hat{V}_{\text{ne}} = \sum_{A=1}^{N_\text{nucl}} \sum_{i=1}^{N_\text{elec}} \frac{-Z_A }{\vert \mathbf{R}_A - \mathbf{r}_i \vert} \] : electron-nucleus attractive potential,
  • \[ \hat{T}_{\text{e}} = \sum_{i=1}^{N_\text{elec}} -\frac{1}{2}\hat{\Delta}_i \] : electronic kinetic energy
  • \(\hat{h} = \hat{T}_\text{e} + \hat{V}_\text{ne} + \hat{V}_\text{ecp,l} + \hat{V}_\text{ecp,nl}\) : core electronic Hamiltonian

The one-electron integrals for a one-electron operator \(\hat{O}\) are \[ \langle p \vert \hat{O} \vert q \rangle \], returned as a matrix over atomic orbitals.

Variable Type Dimensions Description
overlap float (ao.num, ao.num) \(\langle p \vert q \rangle\)
kinetic float (ao.num, ao.num) \(\langle p \vert \hat{T}_e \vert q \rangle\)
potential_n_e float (ao.num, ao.num) \(\langle p \vert \hat{V}_{\text{ne}} \vert q \rangle\)
ecp_local float (ao.num, ao.num) \(\langle p \vert \hat{V}_{\text{ecp,l}} \vert q \rangle\)
ecp_non_local float (ao.num, ao.num) \(\langle p \vert \hat{V}_{\text{ecp,nl}} \vert q \rangle\)
core_hamiltonian float (ao.num, ao.num) \(\langle p \vert \hat{h} \vert q \rangle\)

6.2 Two-electron integrals (ao_2e_int group)

The two-electron integrals for a two-electron operator \(\hat{O}\) are \[ \langle p q \vert \hat{O} \vert r s \rangle \] in physicists' notation or \[ ( pr \vert \hat{O} \vert qs ) \] in chemists' notation, where \(p,q,r,s\) are indices over atomic orbitals.

Functions are provided to get the indices in physicists' or chemists' notation.
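The correspondence \(\langle pq \vert rs \rangle = (pr \vert qs)\) only permutes index positions. A Python sketch with hypothetical helper names; the last function additionally assumes real orbitals, for which the integrals have an 8-fold permutational symmetry, so each symmetry class can be represented by a single canonical index quadruple.

```python
from typing import Tuple

Quad = Tuple[int, int, int, int]


def physicist_to_chemist(p: int, q: int, r: int, s: int) -> Quad:
    """Indices of <pq|rs> rewritten in chemists' order (pr|qs)."""
    return (p, r, q, s)


def chemist_to_physicist(p: int, r: int, q: int, s: int) -> Quad:
    """Indices of (pr|qs) rewritten in physicists' order <pq|rs>."""
    return (p, q, r, s)


def canonical_chemist(p: int, r: int, q: int, s: int) -> Quad:
    """Representative of the 8-fold symmetry class of (pr|qs); valid for real orbitals only."""
    pr = (p, r) if p >= r else (r, p)
    qs = (q, s) if q >= s else (s, q)
    return pr + qs if pr >= qs else qs + pr


print(physicist_to_chemist(0, 1, 2, 3))  # (0, 2, 1, 3)
```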

  • \[ \hat{W}_{\text{ee}} = \sum_{i=2}^{N_\text{elec}} \sum_{j=1}^{i-1} \frac{1}{\vert \mathbf{r}_i - \mathbf{r}_j \vert} \] : electron-electron repulsive potential operator.
  • \[ \hat{W}^{lr}_{\text{ee}} = \sum_{i=2}^{N_\text{elec}} \sum_{j=1}^{i-1} \frac{\text{erf}(\vert \mathbf{r}_i - \mathbf{r}_j \vert)}{\vert \mathbf{r}_i - \mathbf{r}_j \vert} \] : electron-electron long-range potential operator.

Variable Type Dimensions Description
eri float sparse (ao.num, ao.num, ao.num, ao.num) Electron repulsion integrals
eri_lr float sparse (ao.num, ao.num, ao.num, ao.num) Long-range electron repulsion integrals

7 Molecular orbitals (mo group)

Variable Type Dimensions Description
type str   Free text to identify the set of MOs (HF, Natural, Local, CASSCF, etc.)
num dim   Number of MOs
coefficient float (ao.num, mo.num) MO coefficients
class str (mo.num) Choose among: Core, Inactive, Active, Virtual, Deleted
symmetry str (mo.num) Symmetry in the point group
occupation float (mo.num) Occupation number
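Since coefficient has dimensions (ao.num, mo.num) in column-major order, the AO index runs fastest in the flat buffer, so the coefficients of MO \(j\) form a contiguous slice. The Python sketch below (hypothetical function name) evaluates \(\phi_j(\mathbf{r}) = \sum_i C_{ij}\, \chi_i(\mathbf{r})\) at one point from the AO values at that point.

```python
from typing import List, Sequence


def mo_values(coefficient: Sequence[float], ao_values: Sequence[float], mo_num: int) -> List[float]:
    """phi_j = sum_i C_ij chi_i, with C stored flat in (ao.num, mo.num) column-major order."""
    ao_num = len(ao_values)
    if len(coefficient) != ao_num * mo_num:
        raise ValueError("coefficient must contain ao.num * mo.num values")
    values = []
    for j in range(mo_num):
        column = coefficient[j * ao_num:(j + 1) * ao_num]  # coefficients of MO j
        values.append(sum(c * chi for c, chi in zip(column, ao_values)))
    return values


# Two AOs combined into a bonding and an antibonding MO (unnormalized, illustrative)
print(mo_values([1.0, 1.0, 1.0, -1.0], [0.5, 0.25], mo_num=2))  # [0.75, 0.25]
```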

7.1 One-electron integrals (mo_1e_int group)

The operators are the same as those defined in the AO one-electron integrals section. Here, the integrals are given in the basis of molecular orbitals.

Variable Type Dimensions Description
overlap float (mo.num, mo.num) \(\langle i \vert j \rangle\)
kinetic float (mo.num, mo.num) \(\langle i \vert \hat{T}_e \vert j \rangle\)
potential_n_e float (mo.num, mo.num) \(\langle i \vert \hat{V}_{\text{ne}} \vert j \rangle\)
ecp_local float (mo.num, mo.num) \(\langle i \vert \hat{V}_{\text{ecp,l}} \vert j \rangle\)
ecp_non_local float (mo.num, mo.num) \(\langle i \vert \hat{V}_{\text{ecp,nl}} \vert j \rangle\)
core_hamiltonian float (mo.num, mo.num) \(\langle i \vert \hat{h} \vert j \rangle\)
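The MO integrals relate to the AO integrals through the MO coefficients: for real coefficients, \(h^{\text{MO}} = C^{T} h^{\text{AO}} C\). A Python sketch of this two-step transformation, with a hypothetical function name and c[p][j] the coefficient of AO p in MO j:

```python
from typing import List

Matrix = List[List[float]]


def ao_to_mo(h_ao: Matrix, c: Matrix) -> Matrix:
    """h_MO = C^T h_AO C for real coefficients, with c[p][j] the coefficient of AO p in MO j."""
    ao_num, mo_num = len(c), len(c[0])
    # First half transformation: t[p][j] = sum_q h_ao[p][q] c[q][j]
    t = [[sum(h_ao[p][q] * c[q][j] for q in range(ao_num)) for j in range(mo_num)]
         for p in range(ao_num)]
    # Second half: h_mo[i][j] = sum_p c[p][i] t[p][j]
    return [[sum(c[p][i] * t[p][j] for p in range(ao_num)) for j in range(mo_num)]
            for i in range(mo_num)]


# Swapping two AOs permutes the matrix elements accordingly
print(ao_to_mo([[1.0, 2.0], [2.0, 3.0]], [[0.0, 1.0], [1.0, 0.0]]))  # [[3.0, 2.0], [2.0, 1.0]]
```

Splitting the product into two half transformations keeps the cost at \(O(N_{\text{AO}}^2 N_{\text{MO}})\) instead of a naive quadruple loop.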

7.2 Two-electron integrals (mo_2e_int group)

The operators are the same as those defined in the AO two-electron integrals section. Here, the integrals are given in the basis of molecular orbitals.

Variable Type Dimensions Description
eri float sparse (mo.num, mo.num, mo.num, mo.num) Electron repulsion integrals
eri_lr float sparse (mo.num, mo.num, mo.num, mo.num) Long-range electron repulsion integrals

8 Slater determinants (not yet documented)

9 Reduced density matrices (rdm group)

Variable Type Dimensions Description
1e float (mo.num, mo.num) One-body density matrix
1e_up float (mo.num, mo.num) ↑-spin component of the one-body density matrix
1e_dn float (mo.num, mo.num) ↓-spin component of the one-body density matrix
2e float sparse (mo.num, mo.num, mo.num, mo.num) Two-body reduced density matrix (spin trace)
2e_upup float sparse (mo.num, mo.num, mo.num, mo.num) ↑↑ component of the two-body reduced density matrix
2e_dndn float sparse (mo.num, mo.num, mo.num, mo.num) ↓↓ component of the two-body reduced density matrix
2e_updn float sparse (mo.num, mo.num, mo.num, mo.num) ↑↓ component of the two-body reduced density matrix
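Two simple consistency checks relate the spin components. The spin-summed 1e matrix is the sum of 1e_up and 1e_dn. In addition, assuming the common normalization in which the trace of a spin component equals the number of electrons of that spin, the traces match electron.up_num and electron.dn_num. A Python sketch with a hypothetical helper:

```python
from typing import List

Matrix = List[List[float]]


def check_one_rdm(rdm_1e: Matrix, rdm_1e_up: Matrix, rdm_1e_dn: Matrix,
                  up_num: int, dn_num: int, tol: float = 1e-8) -> bool:
    """Check 1e = 1e_up + 1e_dn and, under the assumed normalization, the traces."""
    n = len(rdm_1e)
    for p in range(n):
        for q in range(n):
            if abs(rdm_1e[p][q] - rdm_1e_up[p][q] - rdm_1e_dn[p][q]) > tol:
                return False
    trace_up = sum(rdm_1e_up[p][p] for p in range(n))
    trace_dn = sum(rdm_1e_dn[p][p] for p in range(n))
    return abs(trace_up - up_num) <= tol and abs(trace_dn - dn_num) <= tol


# Closed-shell two-electron case with the lowest MO doubly occupied
up = [[1.0, 0.0], [0.0, 0.0]]
dn = [[1.0, 0.0], [0.0, 0.0]]
total = [[2.0, 0.0], [0.0, 0.0]]
print(check_one_rdm(total, up, dn, up_num=1, dn_num=1))  # True
```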

Author: TREX-CoE

Created: 2021-12-17 Fri 16:14
