TREX: an innovative view of HPC usage applied to Quantum Monte Carlo simulations
#+LaTeX_CLASS_OPTIONS:[aspectratio=169]
Quantum chemistry
- Describing matter with quantum mechanics (Schrödinger's equation)
- Users: theoretical chemists and physicists
- Health: drug design
- Electronics: nano- and micro-electronics
- Materials: carbon nanotubes, graphene, …
- Catalysis: enzymatic reactions, petroleum
The TREX CoE
- CHAMP
- QMC=Chem
- TurboRVB
- NECI
- Quantum Package
- GammCor
TREX: Targeting REal chemical accuracy at the EXascale
How: Instead of re-writing codes, provide libraries
- A library for exchanging information between codes (TREXIO) $\Longrightarrow$ Enables HTC
- A library of high-performance kernels (QMCkl) $\Longrightarrow$ Enables HPC
- Highly accurate
- Massively parallelisable (multiple QMC trajectories)
- CPU intensive
I/O library (TREXIO)
digraph G {
QP [label="Quantum Package"];
QMCCHEM [label="QMC=Chem"];
Turbo [label="TurboRVB"];
QP -> NECI;
NECI -> GammCor [style="dotted"];
NECI -> QMCCHEM [style="dotted"] ;
QP -> QMCCHEM;
QP -> CHAMP;
QP -> GammCor [style="dotted"];
QP -> Turbo [style="dotted"];
NECI -> Turbo [style="dotted"];
NECI -> CHAMP [style="dotted"];
QMCCHEM -> GammCor [style="dotted"];
CHAMP -> GammCor [style="dotted"];
Turbo -> GammCor [style="dotted"];
}
digraph G {
layout=circo;
External [label="External codes"];
QP [label="Quantum Package"];
QMCCHEM [label="QMC=Chem"];
Turbo [label="TurboRVB"];
TREX [label="TREXIO File", shape="box"];
CHAMP -> TREX;
GammCor -> TREX;
NECI -> TREX;
QMCCHEM -> TREX;
QP -> TREX;
Turbo -> TREX;
External -> TREX;
TREX -> CHAMP;
TREX -> GammCor;
TREX -> NECI;
TREX -> QMCCHEM;
TREX -> QP;
TREX -> Turbo;
TREX -> External;
}
(BSD license)
https://github.com/trex-coe/trexio
I/O library (TREXIO)
- Definition of an API to read/write wave functions
- C-compatible API: easy bindings in other languages
- File is self-contained: no external knowledge needed to compute $\Psi(r_1,\dots,r_n)$ (normalization factors, basis set parameters, etc.)
- Strong conventions (atomic units, ordering of Cartesian orbitals, etc.)
- HDF5: Efficient I/O
- Text: debugging, fallback when HDF5 can't be installed
Source code generated from a config file.
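As an illustration of generating an API from a configuration, here is a minimal sketch. The schema, the `demo_` prefix and the generated signatures are all hypothetical and are not the actual TREXIO configuration or API; the point is only that one declarative description can drive every read/write function.

```python
# Hypothetical configuration: group -> attribute -> (C type, dimensions).
# Group and attribute names are illustrative, not the real TREXIO schema.
CONFIG = {
    "nucleus": {
        "num": ("int32_t", []),
        "charge": ("double", ["nucleus.num"]),
    },
    "electron": {
        "up_num": ("int32_t", []),
    },
}


def generate_prototypes(config, prefix="demo"):
    """Return one write and one read C prototype per attribute in the config."""
    lines = []
    for group, attributes in sorted(config.items()):
        for name, (ctype, dims) in sorted(attributes.items()):
            # Scalars are written by value; arrays are written through a pointer.
            is_array = len(dims) > 0
            if is_array:
                write_arg = f"const {ctype}* {name}"
            else:
                write_arg = f"const {ctype} {name}"
            # Reads always go through a pointer supplied by the caller.
            read_arg = f"{ctype}* const {name}"
            handle = f"{prefix}_file* f"
            lines.append(f"int {prefix}_write_{group}_{name}({handle}, {write_arg});")
            lines.append(f"int {prefix}_read_{group}_{name}({handle}, {read_arg});")
    return lines


if __name__ == "__main__":
    print("\n".join(generate_prototypes(CONFIG)))
```

Adding an attribute to the configuration then yields its prototypes without any hand-written code, which is the property the slide refers to.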
Quantum Monte Carlo (QMC)
\alert{Problem}: Stochastic resolution of the Schr\"odinger equation for $N$ electrons
\begin{eqnarray}
E &= &\frac{\int \dcoord \Phi(\coord) {\cal H} \Phi(\coord)}
{\int \dcoord \Phi(\coord) \Phi(\coord)} \nonumber \\
&\sim & \sum \frac{ {\cal H}\Psi(\coord )}{\Psi(\coord )}
\text{, sampled with } (\Psi \times \Phi)
\nonumber
\end{eqnarray}
\begin{columns}
\begin{column}{.5\textwidth}
\begin{itemize}
\item[$\cal H $: ] Hamiltonian operator
\item[$E$: ] Energy
\end{itemize}
\end{column}
\begin{column}{.4\textwidth}
\begin{itemize}
\item[$\coord $: ] Electron coordinates
\item[$\Phi $: ] Almost exact wave function
\item[$\Psi $: ] Trial wave function
\end{itemize}
\end{column}
\end{columns}
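To make the estimator concrete, here is a minimal sketch for the simplest special case: $\Phi = \Psi$ (variational Monte Carlo), one electron in a hydrogen atom, and the trial function $\Psi(r) = e^{-a r}$. These choices, the function names and the Metropolis parameters are assumptions made for illustration. For this $\Psi$ the local energy is ${\cal H}\Psi/\Psi = -a^2/2 + (a-1)/r$ in atomic units.

```python
import math
import random


def local_energy(dist, a):
    """(H Psi)/Psi for Psi = exp(-a r) and the hydrogen-atom Hamiltonian."""
    return -0.5 * a * a + (a - 1.0) / dist


def norm(r):
    """Euclidean length of a 3-vector."""
    return math.sqrt(sum(x * x for x in r))


def vmc_energy(a, n_steps=20000, step=1.0, seed=1):
    """Average the local energy along a Metropolis walk that samples Psi^2."""
    rng = random.Random(seed)
    r = [0.5, 0.5, 0.5]
    dist = norm(r)
    total = 0.0
    for _ in range(n_steps):
        trial = [x + rng.uniform(-step, step) for x in r]
        trial_dist = norm(trial)
        # Acceptance ratio Psi(trial)^2 / Psi(r)^2 for Psi = exp(-a r).
        if rng.random() < math.exp(-2.0 * a * (trial_dist - dist)):
            r, dist = trial, trial_dist
        total += local_energy(dist, a)
    return total / n_steps


if __name__ == "__main__":
    for a in (0.8, 1.0):
        print(a, vmc_energy(a))
```

With $a = 1$ the trial function is the exact ground state, so the local energy equals $-0.5$ at every step and the estimate has zero variance. With any other $a$ the average fluctuates around $a^2/2 - a$.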
Quantum Monte Carlo (QMC)
- Very low memory requirements (no integrals)
- Distribute walkers on different cores or compute nodes
- No blocking communication: near-ideal scaling
- Difficulty: parallelize within a QMC trajectory
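The walker-level parallelism can be sketched as follows. Each trajectory owns its random generator and the averages are only combined at the end, so nothing is exchanged while they run. To keep the example short, a trajectory is replaced by direct sampling of $\Psi^2$ for a hydrogen-like trial function $e^{-a r}$; this stand-in, the function names and the thread pool are illustrative assumptions. A production code would use processes or MPI ranks instead.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def run_trajectory(seed, n_steps=5000, a=0.9):
    """Stand-in for one QMC trajectory: returns its average local energy."""
    rng = random.Random(seed)  # private generator, no state shared between trajectories
    total = 0.0
    for _ in range(n_steps):
        # Radial density r^2 exp(-2 a r) is a Gamma law (shape 3, scale 1/(2a)).
        r = rng.gammavariate(3.0, 1.0 / (2.0 * a))
        total += -0.5 * a * a + (a - 1.0) / r
    return total / n_steps


def combine(averages):
    """Mean of the trajectory averages and its standard error."""
    n = len(averages)
    mean = sum(averages) / n
    variance = sum((x - mean) ** 2 for x in averages) / (n - 1)
    return mean, math.sqrt(variance / n)


def run_all(seeds, workers=4):
    """Run independent trajectories; the only communication is the final gather."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return combine(list(pool.map(run_trajectory, seeds)))


if __name__ == "__main__":
    energy, error = run_all(range(8))
    print(f"E = {energy:.4f} +/- {error:.4f}")
```

Because the trajectories are statistically independent, doubling their number halves the variance of the mean at no extra communication cost, which is why scaling is near-ideal.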
QMC kernel library (QMCkl)
Computational kernels
- QMCkl will contain the main kernels of QMC methods
- Written together by QMC experts and HPC experts
- Multiple high-performance implementations of the kernels, tuned for different
  - architectures
  - problem sizes
  - requested accuracy (reduced precision)
QMC kernel library (QMCkl)
Two implementations
- Documentation: easy to read and understand, not necessarily efficient
- High performance: efficient, but not necessarily readable by physicists/chemists
- Both Documentation and High performance have the same API.
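The idea of two interchangeable implementations can be sketched with a toy kernel, the electron-nucleus distance matrix. The function names below are hypothetical and are not QMCkl functions. What matters is that both versions share one signature, so the readable version can serve as the reference against which the tuned one is checked.

```python
import math


def distances_doc(elec, nucl):
    """Readable reference: explicit loops that follow the formula term by term."""
    result = []
    for e in elec:
        row = []
        for n in nucl:
            d2 = 0.0
            for k in range(3):
                d2 += (e[k] - n[k]) ** 2
            row.append(math.sqrt(d2))
        result.append(row)
    return result


def distances_hpc(elec, nucl):
    """Tuned variant with the same signature (here simply a library call)."""
    return [[math.dist(e, n) for n in nucl] for e in elec]


def agree(impl_a, impl_b, elec, nucl, tol=1e-12):
    """Check that two implementations of the same API return the same matrix."""
    a, b = impl_a(elec, nucl), impl_b(elec, nucl)
    return all(
        abs(x - y) <= tol for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)
    )


if __name__ == "__main__":
    electrons = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    nuclei = [(0.0, 0.0, 1.0), (3.0, 4.0, 0.0)]
    print(agree(distances_doc, distances_hpc, electrons, nuclei))
```

A caller written against `distances_doc` can switch to `distances_hpc` without any change to its own code.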
Advantages
- The code stays easy to understand for physicists/chemists: performance-related aspects are delegated to the library
- Scientists can use their preferred language
- Scientists don't lose control of their codes
- Codes don't die when the architecture changes
- Scientific code development does not break the performance
- Better re-use of the optimization effort among the community
HPC library
- Same API as the documentation library
- Optimization is guided by analysis with MAQAO\footnote{https://maqao.org}.
- Propose performance-critical choices in the API design (data structures, memory management, etc)
- Both CPU and GPU versions of the kernels
- Task parallelism with StarPU\footnote{C. Augonnet et al, doi:10.1002/cpe.1631} to schedule kernels on CPU and GPU and handle asynchronous CPU-GPU transfers
Efficiently guiding the developer
Extensive/automatic testing of different configurations
First application: 3-body Jastrow factor
\[ \Jeen (\br,\bR) = \sum_{\alpha=1}^{\Nat} \sum_{i=1}^{\Nel} \sum_{j=1}^{i-1} \sum_{p=2}^{\Nord} \sum_{k=0}^{p-1} \sum_{l=0}^{\lmax} c_{lkp\alpha} \left( {r}_{ij} \right)^k \left[ \left( {R}_{i\alpha} \right)^l + \left( {R}_{j\alpha} \right)^l \right] \left( {R}_{i\,\alpha} \, {R}_{j\alpha} \right)^{(p-k-l)/2} \]
[[./speedup.pdf]]
- Gradient and Laplacian are also required
- Up to $20\times$ faster than in the original code
- $\sim 80\%$ of the AVX-512 peak is reached
- Expressed with a DGEMM kernel $\Longrightarrow$ also efficient on GPU
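One way to see why the sum maps onto a matrix product: write $m = (p-k-l)/2$ and use $r_{ij} = r_{ji}$. For fixed $k$, $l$, $m$ the sum over pairs $j<i$ then equals $\sum_{i,\alpha} R_{i\alpha}^{l+m} P_{i\alpha}$ with $P = A\,B$, where $A_{ij} = r_{ij}^k$ (zero on the diagonal) and $B_{j\alpha} = R_{j\alpha}^m$. The sketch below checks this identity numerically in pure Python. The function names are hypothetical, $m$ is assumed to be an integer, and the coefficient $c_{lkp\alpha}$ is set to 1; it is not the QMCkl implementation.

```python
def matmul(a, b):
    """Naive dense product C = A B, the operation a DGEMM call performs."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[t] * b[t][j] for t in range(inner)) for j in range(cols)]
        for row in a
    ]


def pair_sum(r, R, k, l, m):
    """Direct evaluation: explicit loops over nuclei and electron pairs j < i."""
    total = 0.0
    for alpha in range(len(R[0])):
        for i in range(len(R)):
            for j in range(i):
                total += (
                    r[i][j] ** k
                    * (R[i][alpha] ** l + R[j][alpha] ** l)
                    * (R[i][alpha] * R[j][alpha]) ** m
                )
    return total


def gemm_sum(r, R, k, l, m):
    """Same quantity with the sum over j done as one matrix product."""
    n_el, n_at = len(R), len(R[0])
    # The diagonal is zeroed explicitly so that i == j never contributes,
    # including for k == 0.
    A = [[r[i][j] ** k if i != j else 0.0 for j in range(n_el)] for i in range(n_el)]
    B = [[R[j][alpha] ** m for alpha in range(n_at)] for j in range(n_el)]
    P = matmul(A, B)
    return sum(
        R[i][alpha] ** (l + m) * P[i][alpha]
        for i in range(n_el)
        for alpha in range(n_at)
    )


if __name__ == "__main__":
    r = [[0.0, 1.5, 2.0], [1.5, 0.0, 0.7], [2.0, 0.7, 0.0]]
    R = [[1.0, 2.0], [0.5, 1.5], [3.0, 0.25]]
    print(pair_sum(r, R, 2, 1, 1), gemm_sum(r, R, 2, 1, 1))
```

The matrix product concentrates most of the arithmetic in a single dense kernel for which vendor-tuned routines exist on both CPU and GPU.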
Verificarlo CI
- Track precision of kernels over commits
- Shows significant digits $s$, standard deviation $\sigma$, variable distribution
- Focus in depth on one particular run
- Compare multiple implementations of the same kernel
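The number of significant digits can be estimated from repeated, randomly perturbed runs of a kernel with the usual Monte Carlo arithmetic estimator $s = -\log_{10}|\sigma/\mu|$, where $\mu$ is the sample mean. The sketch below assumes this estimator; the function name is hypothetical and the code does not call Verificarlo itself.

```python
import math
import statistics


def significant_digits(samples):
    """Estimate s = -log10(|sigma / mu|) from repeated runs of one kernel."""
    mu = statistics.mean(samples)
    sigma = statistics.stdev(samples)
    if sigma == 0.0:
        return math.inf  # all runs agree exactly
    if mu == 0.0:
        return 0.0  # relative precision is undefined around zero
    return -math.log10(sigma / abs(mu))


if __name__ == "__main__":
    runs = [0.4951203, 0.4951198, 0.4951210, 0.4951201]
    print(f"s = {significant_digits(runs):.1f} significant digits")
```

Tracking this value for each kernel and each commit makes a loss of precision visible as soon as it is introduced.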
Useful links
| TREX web site       | https://trex-coe.eu                        |
| TREXIO              | https://github.com/trex-coe/trexio         |
| QMCkl               | https://github.com/trex-coe/qmckl          |
| QMCkl documentation | https://trex-coe.github.io/qmckl           |
| MAQAO               | http://www.maqao.org                       |
| Verificarlo         | https://github.com/verificarlo/verificarlo |