Anthony Scemama c57746aabf final version

2021-07-01 16:04:42 +02:00

TREX : an innovative view of HPC usage applied to Quantum Monte Carlo simulations

#+LaTeX_CLASS_OPTIONS:[aspectratio=169]

Quantum chemistry

Describing matter with quantum mechanics (Schrödinger's equation)
Users: theoretical chemists and physicists

/scemama/pres_intel/media/commit/c57746aabfe2965a8af342ea136ebf00b9b3ac6b/Water.png

/scemama/pres_intel/media/commit/c57746aabfe2965a8af342ea136ebf00b9b3ac6b/casula.png

- Health: Drug design
- Electronics: Nano- and micro-electronics
- Materials: Carbon nanotubes, graphene, …
- Catalysis: Enzymatic reactions, petroleum

The TREX CoE

/scemama/pres_intel/media/commit/c57746aabfe2965a8af342ea136ebf00b9b3ac6b/TREX2.png

CHAMP
QMC=Chem
TurboRVB
NECI
Quantum Package
GammCor

TREX: Targeting REal chemical accuracy at the EXascale

/scemama/pres_intel/media/commit/c57746aabfe2965a8af342ea136ebf00b9b3ac6b/Curve.png

How: Instead of re-writing codes, provide libraries

A library for exchanging information between codes (TREXIO) $\Longrightarrow$ Enables HTC
A library of high-performance kernels (QMCkl) $\Longrightarrow$ Enables HPC

Highly accurate
Massively parallelisable (multiple QMC trajectories)
CPU intensive

I/O library (TREXIO)

digraph G {
QP [label="Quantum Package"];
QMCCHEM [label="QMC=Chem"];
Turbo   [label="TurboRVB"];
QP -> NECI;
NECI -> GammCor [style="dotted"];
NECI -> QMCCHEM [style="dotted"] ;
QP -> QMCCHEM;
QP -> CHAMP;
QP -> GammCor [style="dotted"];
QP -> Turbo [style="dotted"];
NECI -> Turbo [style="dotted"];
NECI -> CHAMP [style="dotted"];
QMCCHEM -> GammCor [style="dotted"];
CHAMP -> GammCor [style="dotted"];
Turbo -> GammCor [style="dotted"];
}

/scemama/pres_intel/media/commit/c57746aabfe2965a8af342ea136ebf00b9b3ac6b/interfaces.png

digraph G {
layout=circo;
External [label="External codes"];
QP [label="Quantum Package"];
QMCCHEM [label="QMC=Chem"];
Turbo   [label="TurboRVB"];
TREX [label="TREXIO File", shape="box"];
CHAMP -> TREX;
GammCor -> TREX;
NECI -> TREX;
QMCCHEM -> TREX;
QP -> TREX;
Turbo -> TREX;
External -> TREX;

TREX -> CHAMP;
TREX -> GammCor;
TREX -> NECI;
TREX -> QMCCHEM;
TREX -> QP;
TREX -> Turbo;
TREX -> External;
}

/scemama/pres_intel/media/commit/c57746aabfe2965a8af342ea136ebf00b9b3ac6b/interfaces2.png

(BSD license)
https://github.com/trex-coe/trexio
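
The two diagrams above contrast direct code-to-code interfaces with a single shared file. As a rough illustration of the scaling argument, the sketch below counts one-way converters in the worst case where every code would need to talk to every other code (the first diagram needs fewer links than this upper bound). The function names are illustrative only.

```python
def pairwise_converters(n_codes):
    """One-way converters needed if every code talks directly to every other."""
    return n_codes * (n_codes - 1)


def hub_converters(n_codes):
    """One reader plus one writer per code when all codes share a common file."""
    return 2 * n_codes


codes = ["CHAMP", "QMC=Chem", "TurboRVB", "NECI", "Quantum Package", "GammCor"]
n = len(codes)
print(pairwise_converters(n), hub_converters(n))  # prints: 30 12
```

Adding a seventh code costs twelve more converters in the pairwise picture, but only one reader and one writer with a shared file.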

I/O library (TREXIO)

Definition of an API to read/write wave functions
C-compatible API: Easy bindings in other languages

File is self-contained: no external knowledge needed to compute $\Psi(r_1,\dots,r_n)$ (normalization factors, basis set parameters, etc.)
Strong conventions (atomic units, ordering of Cartesian orbitals, etc.)

/scemama/pres_intel/media/commit/c57746aabfe2965a8af342ea136ebf00b9b3ac6b/api.png

HDF5: Efficient I/O
Text: debugging, fallback when HDF5 can't be installed

Source code generated from a config file.
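
To illustrate what generating source code from a configuration can look like, here is a minimal sketch that expands a small description of groups and attributes into C prototypes for readers and writers. The configuration layout, the `demo` prefix, the `demo_file` type and the generated names are all hypothetical. They are not the actual TREXIO configuration format or API.

```python
# Hypothetical configuration: group -> attribute -> C type.
CONFIG = {
    "nucleus": {"num": "int64_t", "charge": "double*"},
    "electron": {"up_num": "int64_t"},
}


def generate_prototypes(config, prefix="demo"):
    """Return one read and one write C prototype per attribute."""
    lines = []
    for group, attributes in config.items():
        for name, ctype in attributes.items():
            for action in ("read", "write"):
                if action == "write":
                    # Writers take the value (or the array) as read-only input.
                    arg = f"const {ctype} {name}"
                elif ctype.endswith("*"):
                    # Array readers fill a buffer provided by the caller.
                    arg = f"{ctype} {name}"
                else:
                    # Scalar readers return the value through a pointer.
                    arg = f"{ctype}* {name}"
                lines.append(
                    f"int {prefix}_{action}_{group}_{name}"
                    f"({prefix}_file* file, {arg});"
                )
    return lines


for line in generate_prototypes(CONFIG):
    print(line)
```

With this approach, adding a new quantity to the file format only requires a new entry in the configuration, and the interface stays uniform by construction.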

Quantum Monte Carlo (QMC)

\alert{Problem}: Stochastic solution of the Schr\"odinger equation for $N$ electrons
\begin{eqnarray}
E &= &\frac{\int \dcoord \Phi(\coord) {\cal H} \Phi(\coord)}
                         {\int \dcoord \Phi(\coord) \Phi(\coord)} \nonumber \\
                  &\sim & \sum \frac{ {\cal H}\Psi(\coord )}{\Psi(\coord )}
                    \text{, sampled with } (\Psi \times \Phi)
\nonumber
\end{eqnarray}
\begin{columns}
\begin{column}{.5\textwidth}
\begin{itemize}
\item[$\cal H $: ] Hamiltonian operator
\item[$E$: ] Energy
\end{itemize}
\end{column}
\begin{column}{.4\textwidth}
\begin{itemize}
\item[$\coord $: ] Electron coordinates
\item[$\Phi $: ] Almost exact wave function
\item[$\Psi $: ] Trial wave function
\end{itemize}
\end{column}
\end{columns}
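
In the variational special case $\Phi = \Psi$, the estimator above reduces to averaging the local energy ${\cal H}\Psi/\Psi$ over points sampled from $\Psi^2$. The sketch below does this for a hydrogen atom in atomic units, assuming the trial function $\Psi = e^{-a r}$. The choice of system, the trial function and all function names are illustrative assumptions, not taken from the slides.

```python
import math
import random


def local_energy(r, a):
    """(H Psi)/Psi for Psi = exp(-a r) with H = -1/2 Laplacian - 1/r."""
    return -0.5 * a * a + (a - 1.0) / r


def vmc_energy(a, n_steps=20000, step=1.0, seed=1):
    """Average the local energy over a Metropolis random walk sampling Psi^2."""
    rng = random.Random(seed)
    pos = [1.0, 0.0, 0.0]
    r = 1.0
    total = 0.0
    for _ in range(n_steps):
        trial = [x + rng.uniform(-step, step) for x in pos]
        r_new = math.sqrt(sum(x * x for x in trial))
        # Acceptance ratio Psi(new)^2 / Psi(old)^2 = exp(-2 a (r_new - r)).
        if rng.random() < math.exp(-2.0 * a * (r_new - r)):
            pos, r = trial, r_new
        total += local_energy(r, a)
    return total / n_steps


print(vmc_energy(1.0))  # exact trial function: constant local energy
print(vmc_energy(0.9))  # approximate trial function: fluctuating estimate
```

With `a = 1.0` the trial function is the exact ground state, so the local energy is the constant $-0.5$ and the estimate has zero variance. With `a = 0.9` the average fluctuates around the variational value $a^2/2 - a = -0.495$.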

Quantum Monte Carlo (QMC)

Very low memory requirements (no integrals)
Distribute walkers on different cores or compute nodes
No blocking communication: near-ideal scaling
Difficulty: parallelize within a QMC trajectory
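
The points above can be sketched with a toy model: each trajectory is an independent random walk with its own seed, and the only shared step is the final average. The example assumes a one-dimensional harmonic oscillator with trial function $\Psi = e^{-a x^2/2}$. The model and the function names are illustrative, and the trajectories run sequentially here although each call could be dispatched to a different core or node.

```python
import math
import random
import statistics


def run_trajectory(seed, a=0.8, n_steps=5000, step=1.5):
    """One independent Metropolis trajectory; returns its mean local energy."""
    rng = random.Random(seed)
    x = 0.0
    total = 0.0
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)
        # log of |Psi(x_new)|^2 / |Psi(x)|^2 for Psi = exp(-a x^2 / 2)
        log_ratio = -a * (x_new * x_new - x * x)
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            x = x_new
        # Local energy for H = -1/2 d^2/dx^2 + x^2/2
        total += 0.5 * a + 0.5 * (1.0 - a * a) * x * x
    return total / n_steps


def combine(seeds):
    """Average independent trajectories; no communication before this point."""
    energies = [run_trajectory(seed) for seed in seeds]
    mean = statistics.fmean(energies)
    error = statistics.stdev(energies) / math.sqrt(len(energies))
    return mean, error


mean, error = combine(range(8))
print(f"E = {mean:.4f} +/- {error:.4f}")
```

For this toy model the expected value is $a/4 + 1/(4a) = 0.5125$ at $a = 0.8$. Adding trajectories reduces the statistical error roughly as the inverse square root of their number.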

/scemama/pres_intel/media/commit/c57746aabfe2965a8af342ea136ebf00b9b3ac6b/Qmc.png

QMC kernel library (QMCkl)

Computational kernels

QMCkl will contain the main kernels of QMC methods
Written together by QMC experts and HPC experts
Multiple high performance implementations of the kernels, tuned for different
- architectures
- problem sizes
- requested accuracy (reduced precision)

QMC kernel library (QMCkl)

Two implementations

Documentation: easy to read and understand, not necessarily efficient
High performance: efficient, but not necessarily readable by physicists/chemists
Both implementations expose the same API.
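
As an illustration of two implementations behind one API, the sketch below computes an electron-electron distance matrix twice: once with a transparent double loop, and once exploiting symmetry. The kernel choice and the function names are hypothetical and are not QMCkl routines. The point is only that both functions share a signature and can be checked against each other.

```python
import math


def ee_distances_doc(coords):
    """Readable reference: every pair (i, j) is computed explicitly."""
    n = len(coords)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dx = coords[i][0] - coords[j][0]
            dy = coords[i][1] - coords[j][1]
            dz = coords[i][2] - coords[j][2]
            dist[i][j] = math.sqrt(dx * dx + dy * dy + dz * dz)
    return dist


def ee_distances_fast(coords):
    """Same signature: computes only j < i and mirrors the result."""
    n = len(coords)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(1, n):
        ci = coords[i]
        row = dist[i]
        for j in range(i):
            d = math.dist(ci, coords[j])
            row[j] = d
            dist[j][i] = d
    return dist


electrons = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 1.0)]
reference = ee_distances_doc(electrons)
optimized = ee_distances_fast(electrons)
print(reference[1][0], optimized[1][0])
```

Because the signatures match, a calling code can switch between the two without modification, and the readable version serves as a test oracle for the optimized one.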

Advantages

The code can stay easy to understand by the physicists/chemists
Performance-related aspects are delegated to the library
Scientists can use their preferred language
Scientists don't lose control of their codes
Codes don't die when the architecture changes
Scientific code development does not break the performance
Better re-use of the optimization effort among the community

HPC library

Same API as the documentation library
Optimization is guided by analysis with MAQAO\footnote{https://maqao.org}.
Propose performance-critical choices in the API design (data structures, memory management, etc)
Both CPU and GPU versions of the kernels
Task parallelism with StarPU\footnote{C. Augonnet et al, doi:10.1002/cpe.1631} to schedule kernels on CPU and GPU and handle asynchronous CPU-GPU transfers

Efficiently guiding the developer

/scemama/pres_intel/media/commit/c57746aabfe2965a8af342ea136ebf00b9b3ac6b/maqao1.png

Extensive/automatic testing of different configurations

/scemama/pres_intel/media/commit/c57746aabfe2965a8af342ea136ebf00b9b3ac6b/maqao2.png

First application: 3-body Jastrow factor

\[ \Jeen (\br,\bR) = \sum_{\alpha=1}^{\Nat} \sum_{i=1}^{\Nel} \sum_{j=1}^{i-1} \sum_{p=2}^{\Nord} \sum_{k=0}^{p-1} \sum_{l=0}^{\lmax} c_{lkp\alpha} \left( {r}_{ij} \right)^k \left[ \left( {R}_{i\alpha} \right)^l + \left( {R}_{j\alpha} \right)^l \right] \left( {R}_{i\alpha} \, {R}_{j\alpha} \right)^{(p-k-l)/2} \]

/scemama/pres_intel/src/commit/c57746aabfe2965a8af342ea136ebf00b9b3ac6b/speedup.pdf

Gradient and Laplacian are also required
Up to $20\times$ faster than in the original code
$\sim 80\%$ of the AVX-512 peak is reached
Expressed with a DGEMM kernel $\Longrightarrow$ also efficient on GPU
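
For reference, the six nested sums of the formula above can be written as a direct, unoptimized loop nest. This is a readable sketch, not the DGEMM formulation mentioned above. It assumes that the distances $r_{ij}$ and $R_{i\alpha}$ are supplied as precomputed positive numbers, that `l_max` is a fixed integer (the slide does not define $l_{\max}$), and that the coefficients are stored as `coef[l][k][p][alpha]`. The function and argument names are illustrative.

```python
import math


def jastrow_een(r_ee, r_en, coef, n_ord, l_max):
    """Direct evaluation of the electron-electron-nucleus sum, term by term."""
    n_el = len(r_en)
    n_at = len(r_en[0])
    total = 0.0
    for a in range(n_at):
        for i in range(n_el):
            for j in range(i):  # pairs with j < i only
                ri, rj, rij = r_en[i][a], r_en[j][a], r_ee[i][j]
                for p in range(2, n_ord + 1):
                    for k in range(p):
                        for l in range(l_max + 1):
                            total += (
                                coef[l][k][p][a]
                                * rij ** k
                                * (ri ** l + rj ** l)
                                * (ri * rj) ** ((p - k - l) / 2)
                            )
    return total


# Toy data: two electrons, one nucleus, n_ord = 2, l_max = 0, unit coefficients.
r_ee = [[0.0, 3.0], [3.0, 0.0]]
r_en = [[4.0], [1.0]]
coef = [[[[0.0], [0.0], [1.0]], [[0.0], [0.0], [1.0]]]]
print(jastrow_een(r_ee, r_en, coef, n_ord=2, l_max=0))  # prints: 20.0
```

In the toy case only $p = 2$ contributes: the $k = 0$ term gives $2 \times 4 = 8$ and the $k = 1$ term gives $3 \times 2 \times 2 = 12$, hence 20. Each term is symmetric under exchange of $i$ and $j$, which provides a simple consistency check.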

Verificarlo CI

/scemama/pres_intel/media/commit/c57746aabfe2965a8af342ea136ebf00b9b3ac6b/img/cmp-runs.png

Track precision of kernels over commits
Shows significant digits $s$, standard deviation $\sigma$, variable distribution
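
To make the quantities $s$ and $\sigma$ concrete, the sketch below estimates the number of significant decimal digits of a result from repeated, slightly perturbed evaluations, using the common estimate $s = -\log_{10}(\sigma / |\mu|)$. This formula is an assumption about how such a metric can be computed, not a description of Verificarlo's internals. Reordering a floating-point sum stands in for the perturbation of arithmetic operations, and the function names are illustrative.

```python
import math
import random
import statistics


def significant_digits(samples):
    """Estimate decimal significant digits from the spread of the samples."""
    mu = statistics.fmean(samples)
    sigma = statistics.stdev(samples)
    if sigma == 0.0:
        return math.inf  # all runs agree to the last bit
    return -math.log10(sigma / abs(mu))


def shuffled_sums(values, n_runs, seed=0):
    """Sum the same values in random orders to expose rounding differences."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_runs):
        order = list(values)
        rng.shuffle(order)
        total = 0.0
        for v in order:
            total += v
        results.append(total)
    return results


values = [0.1 * k for k in range(1, 1001)]
runs = shuffled_sums(values, n_runs=20)
print(significant_digits(runs))
```

Tracking such a number for each kernel over successive commits makes a sudden loss of precision visible as a drop in $s$.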

/scemama/pres_intel/media/commit/c57746aabfe2965a8af342ea136ebf00b9b3ac6b/img/inspect-runs.png

Focus in depth on one particular run
Compare multiple implementations of the same kernel

Useful links

TREX web site: https://trex-coe.eu
TREXIO: https://github.com/trex-coe/trexio
QMCkl: https://github.com/trex-coe/qmckl
QMCkl documentation: https://trex-coe.github.io/qmckl
MAQAO: http://www.maqao.org
Verificarlo: https://github.com/verificarlo/verificarlo
