Initial commit

Anthony Scemama 2021-06-28 14:28:23 +02:00
commit f2f59cc736
6 changed files with 564 additions and 0 deletions

35
Abstract Normal file

@ -0,0 +1,35 @@
TREX : an innovative view of HPC usage applied to Quantum Monte Carlo simulations
The TREX[1] European Center of Excellence focuses on high accuracy quantum
mechanical methods, essential in many different fields of application such as
new material design or photochemistry. Among these methods, Quantum Monte Carlo
(QMC) approaches are particularly well adapted to exascale architectures.
Our ambition is to help the community take advantage of exascale machines
through the use of our HPC software.
In this presentation, we will review progress along the following three axes:
* TREXIO[2]: A common I/O library and file format for easily exchanging data between
applications, facilitating high-throughput computing workflows,
* QMCkl[3]: A library of computational kernels specific to QMC applications
written together by QMC and HPC experts, taking advantage of both CPUs and GPUs,
* An integrated workflow including performance analysis (MAQAO[4]) and numerical
accuracy measurements (Verificarlo[5]) to be used for the development of QMC
kernels and, more generally, for improving the applications. In particular, we
plan to identify the best-performing usage of QMCkl and to balance
performance against numerical precision requirements.
------------------------------------
[1] https://trex-coe.eu
[2] https://github.com/trex-coe/trexio
[3] https://trex-coe.github.io/qmckl
[4] https://www.maqao.org
[5] https://github.com/verificarlo
------------------------------------

BIN Curve.png (binary file not shown, 241 KiB)
BIN casula.png (binary file not shown, 211 KiB)
BIN dirac2.png (binary file not shown, 114 KiB)
BIN dirac_4.jpg (binary file not shown, 105 KiB)

529
scemama.tex Normal file

@ -0,0 +1,529 @@
% Created 2021-06-28 Mon 14:25
% Intended LaTeX compiler: pdflatex
\documentclass[aspectratio=169]{beamer}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{graphicx}
\usepackage{grffile}
\usepackage{longtable}
\usepackage{wrapfig}
\usepackage{rotating}
\usepackage[normalem]{ulem}
\usepackage{amsmath}
\usepackage{textcomp}
\usepackage{amssymb}
\usepackage{capt-of}
\usepackage{hyperref}
\institute{Lab. Chimie et Physique Quantiques, IRSAMC, UPS/CNRS, Toulouse (France)}
\usepackage{minted}
\usemintedstyle{emacs}
\newminted{f90}{fontsize=\footnotesize}
\usepackage{mathtools}
\usepackage{physics}
\definecolor{darkgreen}{rgb}{0.,0.6,0.}
\definecolor{darkblue}{rgb}{0.,0.2,0.7}
\definecolor{darkred}{rgb}{0.6,0.1,0.1}
\definecolor{darkpink}{rgb}{0.7,0.0,0.7}
\usetheme{trex}
\author{Anthony Scemama}
\date{12/03/2021}
\title{Library development within TREX}
\hypersetup{
pdfauthor={Anthony Scemama},
pdftitle={Library development within TREX},
pdfkeywords={},
pdfsubject={},
pdfcreator={Emacs 26.3 (Org mode 9.4)},
pdflang={English}}
\begin{document}
\maketitle
\begin{frame}[label={sec:org58acecb}]{Quantum chemistry}
\begin{columns}
\begin{column}{0.25\textwidth}
\begin{center}
\includegraphics[width=\textwidth]{./dirac_4.jpg}
\end{center}
\end{column}
\begin{column}{0.75\textwidth}
\begin{center}
\includegraphics[width=\textwidth]{./dirac2.png}
\end{center}
\end{column}
\end{columns}
\end{frame}
\begin{frame}[label={sec:org07dc1a1}]{Quantum chemistry}
\begin{columns}
\begin{column}{0.6\textwidth}
\begin{exampleblock}{}
\begin{itemize}
\item Describing matter with quantum mechanics (Schrödinger's equation)
\item Users: theoretical chemists and physicists
\end{itemize}
\end{exampleblock}
\end{column}
\begin{column}{0.4\textwidth}
\begin{center}
\includegraphics[width=\textwidth]{./Water.png}
\end{center}
\end{column}
\end{columns}
\begin{columns}
\begin{column}{0.4\textwidth}
\begin{center}
\includegraphics[width=\textwidth]{./casula.png}
\end{center}
\end{column}
\begin{column}{0.6\textwidth}
\begin{exampleblock}{Implications for society}
\begin{center}
\begin{tabular}{ll}
- Health & Drug design\\
- Electronics & Nano- and micro-electronics\\
- Materials & Carbon nanotubes, graphene, \dots{}\\
- Catalysis & Enzymatic reactions, petrol\\
\end{tabular}
\end{center}
\end{exampleblock}
\end{column}
\end{columns}
\end{frame}
\begin{frame}[label={sec:orgf844e6b}]{TREX: Targeting REal chemical accuracy at the EXascale}
\begin{columns}
\begin{column}{0.4\textwidth}
\begin{center}
\includegraphics[width=\textwidth]{./Curve.png}
\end{center}
\end{column}
\begin{column}{0.6\textwidth}
\begin{exampleblock}{QMC: Quantum Monte Carlo methods}
\begin{itemize}
\item Massively parallelisable (multiple QMC trajectories)
\item Difficulty: take advantage of parallelism within a trajectory
\end{itemize}
\end{exampleblock}
\begin{exampleblock}{Objective: Make codes ready for exascale}
How: Instead of re-writing codes, provide libraries
\begin{itemize}
\item One library for exchanging information between codes (\alert{TREXIO})
\item One library for high-performance (\alert{QMCkl})
\end{itemize}
\end{exampleblock}
\end{column}
\end{columns}
\end{frame}
\begin{frame}[label={sec:org9bd59c0}]{Presentation of TREX}
\begin{itemize}
\item TREX CoE: Targeting REal chemical accuracy at the eXascale
\item Started in Oct. 2020
\end{itemize}
\end{frame}
\begin{frame}[label={sec:orgfb14f88}]{QMC kernel library (QMCkl)}
\begin{itemize}
\item Codesign: Written together by QMC experts and HPC experts
\item Contains all major kernels of QMC methods
\item Multiple high performance implementations of the kernels, tuned
for different
\begin{itemize}
\item architectures
\item problem sizes
\item requested accuracy (reduced precision)
\end{itemize}
\item Reference implementation : \emph{Documentation} : easy to read and
understand, not necessarily efficient
\item Final implementations : \emph{High performance} : efficient, but not
necessarily readable by physicists/chemists
\end{itemize}
\end{frame}
\begin{frame}[label={sec:org929f042}]{Advantages}
\begin{itemize}
\item The code can stay easy to understand for physicists/chemists:
performance-related aspects are delegated to the library
\item Changing architecture requires only linking with another
version of the library
\item Scientific code development does not break the performance
\item Better re-use of the optimization effort among the community
\end{itemize}
\end{frame}
\begin{frame}[label={sec:org8ded82f}]{Literate programming}
\begin{quote}
Literate programming is a programming paradigm introduced by Donald
Knuth in which a computer program is given an explanation of its
logic in a natural language, such as English, interspersed with
snippets of macros and traditional source code, from which
compilable source code can be generated. (Wikipedia)
\end{quote}
\end{frame}
\begin{frame}[label={sec:org01ef9da}]{Documentation library}
Literate programming with org-mode:
\begin{itemize}
\item Here, comments are more important than code
\item Can add graphics, \LaTeX{} formulas, tables, etc.
\item Documentation always synchronized with the code
\item Some routines can be generated by embedded scripts
\item Most of the first report was generated from the documentation
\item Kernels are implemented in Fortran for readability
\item The API is C-compatible: QMCkl appears like a C library
\(\Longrightarrow\) can be used in all other languages
\end{itemize}
\end{frame}
\begin{frame}[label={sec:orge55dfb7}]{Codesign strategy}
\begin{enumerate}
\item Kernel extraction: QMC specialists agree on the
mathematical expression of the problem
\item A mini-application is written to find the best data layout
with HPC experts from real-size examples
\item The kernel is written in the documentation library
\item HPC experts provide an HPC version of the kernel
\item The library is linked in the QMC codes of the CoE
\end{enumerate}
\end{frame}
\begin{frame}[label={sec:orgfdcab81}]{Our first application : 3-body Jastrow factor}
\newcommand{\Jeen}{J_{\text{een}}}
\newcommand{\Nel}{N_{\text{elec}}}
\newcommand{\Nat}{N_{\text{nucl}}}
\newcommand{\Nord}{N_{\text{nord}}}
\newcommand{\lmax}{p-k-2\delta_{k,0}}
\newcommand{\br}{\mathbf{r}}
\newcommand{\bR}{\mathbf{R}}
\newcommand{\ttr}{\, \bar{\mathtt{r}}}
\newcommand{\tR}{\, \bar{\mathtt{R}}}
\newcommand{\tP}{\, \bar{\mathtt{P}}}
\[
\Jeen (\br,\bR) = \sum_{\alpha=1}^{\Nat} \sum_{i=1}^{\Nel} \sum_{j=1}^{i-1}
\sum_{p=2}^{\Nord} \sum_{k=0}^{p-1}
\sum_{l=0}^{\lmax} c_{lkp\alpha}
\left( {r}_{ij} \right)^k
\left[ \left( {R}_{i\alpha} \right)^l + \left( {R}_{j\alpha} \right)^l \right]
\left( {R}_{i\,\alpha} \, {R}_{j\alpha} \right)^{(p-k-l)/2}
\]
can be rewritten as
\[
\Jeen(\br,\bR) =
\sum_{p=2}^{\Nord}\sum_{k=0}^{p-1}
\sum_{l=0}^{\lmax}
\sum_{\alpha=1}^{\Nat}
c_{lkp\alpha}
\sum_{i=1}^{\Nel}
{\tR}_{i,\alpha,(p-k-l)/2}\,
{\tP}_{i,\alpha,k,(p-k+l)/2} \; \text{\scriptsize \alert{($\downarrow$ complexity)}}
\]
with
\[
{\tP}_{i, \alpha, k, l} = \sum_{j=1}^{\Nel}
{\ttr}_{i,j,k}\,{\tR}_{j,\alpha,l}. \; \text{\alert{\scriptsize (GEMM)}}
\]
\end{frame}
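\begin{frame}{Our first application : sketch of the factorization}
The rewriting above can be checked on a single $(k,l)$ term. The C sketch on
the next slide is an illustration, not QMCkl code. It writes $m=(p-k-l)/2$ and
assumes that the barred quantities are plain powers, $r_{ij}^k$ with a zero
diagonal and $R_{i\alpha}^l$. The names \texttt{ipow}, \texttt{term\_direct}
and \texttt{term\_factored} are hypothetical.
\begin{itemize}
\item \texttt{term\_direct} follows the first formula (sum over $\alpha$, $i$ and $j<i$)
\item \texttt{term\_factored} first builds $P$ as a matrix product, then contracts it with $R_{i\alpha}^m$
\end{itemize}
In a tuned kernel, the loop over $j$ would presumably be a BLAS GEMM call, and
$P$ would be built once per $k$ and reused for all $(p,l)$ pairs.
\end{frame}
\begin{frame}[fragile]{Our first application : sketch of the factorization}
\begin{minted}[fontsize=\tiny]{c}
#include <stddef.h>

/* x^n for a small non-negative integer n (x^0 = 1). */
static double ipow(double x, int n) {
  double y = 1.0;
  while (n-- > 0) y *= x;
  return y;
}

/* Direct form of one (k,l,m) term: sum over alpha, i and j < i.
   r is nelec x nelec (symmetric), R is nelec x nnucl, both row-major. */
double term_direct(size_t nelec, size_t nnucl, const double *r,
                   const double *R, const double *c, int k, int l, int m) {
  double s = 0.0;
  for (size_t a = 0; a < nnucl; ++a)
    for (size_t i = 0; i < nelec; ++i)
      for (size_t j = 0; j < i; ++j) {
        const double Ri = R[i*nnucl + a], Rj = R[j*nnucl + a];
        s += c[a] * ipow(r[i*nelec + j], k)
           * (ipow(Ri, l) + ipow(Rj, l)) * ipow(Ri * Rj, m);
      }
  return s;
}

/* Factored form. P (nelec x nnucl) is workspace provided by the caller. */
double term_factored(size_t nelec, size_t nnucl, const double *r,
                     const double *R, const double *c, int k, int l, int m,
                     double *P) {
  /* P[i][a] = sum over j != i of r_ij^k * R_ja^(m+l): a matrix product. */
  for (size_t i = 0; i < nelec; ++i)
    for (size_t a = 0; a < nnucl; ++a) {
      double p = 0.0;
      for (size_t j = 0; j < nelec; ++j)
        if (j != i) p += ipow(r[i*nelec + j], k) * ipow(R[j*nnucl + a], m + l);
      P[i*nnucl + a] = p;
    }
  /* Contraction of P with R_ia^m and the coefficients c_a. */
  double s = 0.0;
  for (size_t i = 0; i < nelec; ++i)
    for (size_t a = 0; a < nnucl; ++a)
      s += c[a] * ipow(R[i*nnucl + a], m) * P[i*nnucl + a];
  return s;
}
\end{minted}
\end{frame}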
\begin{frame}[label={sec:org3e25bfe}]{Our first application : Gradient and Laplacian}
\newcommand{\Jeen}{J_{\text{een}}}
\newcommand{\Nel}{N_{\text{elec}}}
\newcommand{\Nat}{N_{\text{nucl}}}
\newcommand{\Nord}{N_{\text{nord}}}
\newcommand{\lmax}{p-k-2\delta_{k,0}}
\newcommand{\br}{\mathbf{r}}
\newcommand{\bR}{\mathbf{R}}
\newcommand{\ttr}{\, \bar{\mathtt{r}}}
\newcommand{\tR}{\, \bar{\mathtt{R}}}
\newcommand{\tP}{\, \bar{\mathtt{P}}}
\newcommand{\tg}{\, \bar{\mathtt{g}}}
\newcommand{\tG}{\, \bar{\mathtt{G}}}
\newcommand{\tQ}{\, \bar{\mathtt{Q}}}
\begin{eqnarray*}
\nabla_{im} \Jeen(\br,\bR) & = &
\sum_{p=2}^{\Nord}\sum_{k=0}^{p-1}
\sum_{l=0}^{\lmax}
\sum_{\alpha=1}^{\Nat}
c_{lkp\alpha}
\sum_{i=1}^{\Nel}
{\tG}_{i,m,\alpha,(p-k-l)/2} {\tP}_{i,\alpha,k,(p-k+l)/2} + \\
&& {\tG}_{i,m,\alpha,(p-k+l)/2} {\tP}_{i,\alpha,k,(p-k-l)/2} +
{\tR}_{i,\alpha,(p-k-l)/2} {\tQ}_{i,m,\alpha,k,(p-k+l)/2} + \\
&& {\tR}_{i,\alpha,(p-k+l)/2} {\tQ}_{i,m,\alpha,k,(p-k-l)/2} +
\delta_{m,4} \big( \\
&& {\tG}_{i,1,\alpha,(p-k+l)/2} {\tQ}_{i,1,\alpha,k,(p-k-l)/2} +
{\tG}_{i,2,\alpha,(p-k+l)/2} {\tQ}_{i,2,\alpha,k,(p-k-l)/2} + \\
&& {\tG}_{i,3,\alpha,(p-k+l)/2} {\tQ}_{i,3,\alpha,k,(p-k-l)/2} +
{\tG}_{i,1,\alpha,(p-k-l)/2} {\tQ}_{i,1,\alpha,k,(p-k+l)/2} + \\
&& {\tG}_{i,2,\alpha,(p-k-l)/2} {\tQ}_{i,2,\alpha,k,(p-k+l)/2} +
{\tG}_{i,3,\alpha,(p-k-l)/2} {\tQ}_{i,3,\alpha,k,(p-k+l)/2} \big)
\end{eqnarray*}
with
\[
{\tG}_{i, m, \alpha, l} = \frac{\partial \left( {R}_{i\alpha} \right)^l}
{\partial r_i}, \phantom{ space }
{\tg}_{i, m, j, k} = \frac{\partial \left( {r}_{ij} \right)^k}
{\partial r_i}, \phantom{ space }
\text{ and }
{\tQ}_{i, m, \alpha, k, l} = \sum_{j=1}^{\Nel}
{\tg}_{i,m,j,k}\,{\tR}_{j,\alpha,l}
\]
\end{frame}
\begin{frame}[label={sec:org0200bcf}]{Speed up}
\begin{center}
\includegraphics[height=0.8\textheight]{./speedup.pdf}
\end{center}
\(\sim 80\%\) of the AVX-512 peak is reached on a Skylake CPU.
\end{frame}
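\begin{frame}{Speed up: reading the 80\% figure}
As a rough guide to what the peak means, assume a Skylake server core with two
AVX-512 FMA units. This is an assumption: the exact CPU model is not stated
here, and some Skylake models have a single unit. The double-precision peak per
core and per cycle is then
\[
2 \;\text{FMA units} \times 8 \;\text{doubles per 512-bit vector} \times
2 \;\text{flops per FMA} = 32 \;\text{flops/cycle},
\]
so \(\sim 80\%\) of the peak would correspond to about
\(0.8 \times 32 \approx 26\) flops per cycle and per core.
\end{frame}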
\begin{frame}[label={sec:org4f9d92a}]{Links}
\begin{itemize}
\item TREX web site : \url{https://trex-coe.eu}
\item QMCkl documentation : \url{https://trex-coe.github.io/qmckl}
\item QMCkl repository : \url{https://github.com/trex-coe/qmckl}
\end{itemize}
\end{frame}
\begin{frame}[label={sec:orgd018cbe},fragile]{CoEs Co-Design Workshop}
March 12 9:00-12:00 and 14:00-17:00
The goal of this workshop is to first get an overview of what CoEs think of co-design and what they do in that context and then to build a shared and common view on this important issue.
The workshop consists of two round tables where each panellist will make a short presentation that will be followed by a discussion among the panellists and with all participants. The workshop is open to all CoE members, so please circulate this announcement widely.
\begin{block}{Session 1 - 9:00-12:00}
Different levels of co-design, where do CoEs come in ?
Panelists: Fabio Affinito, Guillaume Houzeaux, Jesus Labarta, Soline
Laforet, Anthony Scemama
Supercomputers are rather complex systems built using innovative technologies, on both the hardware and software sides. Therefore, co-design can be applied at different levels: chip, network, low-level software, programming models and environment, libraries or applications.
In its design, a computer can also be more general purpose or tuned for a specific class of application.
This round table will discuss all these issues and how CoEs can best contribute.
Zoom link :
\url{https://zoom.us/j/96850634356?pwd=Q0ViUlVYL0tSWXlEeGVVTTJYcWpBdz09}
\begin{enumerate}
\item Short presentation of yourself and of your background.
\begin{itemize}
\item Guillaume Houzeaux: BSC. Sticks : Comp. mechanics
\item Jesus Labarta: BSC
\item Soline Laforet: Atos, Earth Science
\item Fabio Affinito: CINECA, Coordinator of support team
\end{itemize}
\end{enumerate}
\begin{enumerate}
\item Questions :
\end{enumerate}
• How would you define co-design ? (Introduction)
Each of you can make a short speech to present your global view on co-design.
\begin{itemize}
\item JS
\begin{itemize}
\item Codesign implies design. We design applications
\item Codimensioning is not codesign
\item Holistic: every layer in the architecture takes design
decisions on the same project
\end{itemize}
\item SL:
\begin{itemize}
\item HPC is not the main field of activity of hardware providers
\item Software environment is important
\end{itemize}
\item FA: \ldots{}
\item GH: Application developers rely on the lower levels
\end{itemize}
\begin{verbatim}
People from different communities work on the same project, each
bringing their domain-specific expertise. Better than the sum of its parts.
\end{verbatim}
• What are the new challenges for co-design related to the future exascale system?
\begin{verbatim}
The software stack is becoming increasingly complex. In HPC, we
need to use programming languages close to the hardware, with lots
of dangerous constructs. Writing correct software in these
languages becomes increasingly difficult.
Scientists can't understand their codes any more.
\end{verbatim}
\begin{itemize}
\item JL:
\begin{itemize}
\item Applications are more important than showing off "shiny" Flops/s numbers
\item People program to their own mental model of the machine
\item General purpose vs specific : specific is suicide
\end{itemize}
\end{itemize}
• How does your CoE address the co-design activity? (SL)
\begin{verbatim}
- Move the critical performance outside of the codes in libraries
- QMCkl
- Design an API : collaboration between quantum chemists and computational
scientists. Documentation library where computational kernels are understandable
by HPC experts. The library is re-implemented by HPC experts for targeted
architectures.
- Kernel extraction: we go back to latex formulas, and write a mini-application.
Figure out the data structures for the kernel with HPC experts, and tune the
mini-application. Then, re-implement the kernel in the library.
\end{verbatim}
• What are the co-design activities carried out in your CoE, and to
what extent do you think that these activities will actually
influence the HPC hardware design?
\begin{verbatim}
- Move the critical performance outside of the codes in libraries
- QMCkl
- Design an API : collaboration between quantum chemists and computational
scientists. Documentation library where computational kernels are understandable
by HPC experts. The library is re-implemented by HPC experts for targeted
architectures.
- Kernel extraction: we go back to latex formulas, and write a mini-application.
Figure out the data structures for the kernel with HPC experts, and tune the
mini-application. Then, re-implement the kernel in the library.
\end{verbatim}
• How general purpose/special purpose should designs be?
\begin{verbatim}
Don't re-invent the wheel. But to get the wheel, you often
need to buy the bus and take away the wheel.
Lots of systems prevent you from extracting the wheel
because the bus should be used without the wheel.
\end{verbatim}
• Which levels of co-design are relevant?
• What are the differences between developing a library and a program?
\begin{verbatim}
- In an application code, you can trust that the parameters are valid.
In a library, many checks are needed before a routine can do its job.
- Error handling: the library should not decide to crash the program
but return the error and its description to the calling code.
\end{verbatim}
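The error-handling point can be sketched in C as follows. This is a minimal
illustration and not the QMCkl API: the type \texttt{lib\_status} and the
functions \texttt{lib\_status\_string} and \texttt{lib\_sum} are hypothetical
names. The routine validates its arguments and reports a failure through its
return value, so the calling code decides what to do with the error.
\begin{minted}[fontsize=\footnotesize]{c}
#include <stddef.h>

/* Hypothetical status codes, for illustration only (not the QMCkl API). */
typedef enum { LIB_SUCCESS = 0, LIB_NULL_POINTER = 1, LIB_BAD_SIZE = 2 } lib_status;

/* Human-readable description that the calling code can print or log. */
const char *lib_status_string(lib_status s) {
  switch (s) {
    case LIB_SUCCESS:      return "success";
    case LIB_NULL_POINTER: return "null pointer argument";
    case LIB_BAD_SIZE:     return "size must be positive";
  }
  return "unknown error";
}

/* Sum of n values. The arguments are checked first, and problems are
   reported through the return value instead of aborting the program. */
lib_status lib_sum(const double *x, long n, double *result) {
  if (x == NULL || result == NULL) return LIB_NULL_POINTER;
  if (n <= 0) return LIB_BAD_SIZE;
  double s = 0.0;
  for (long i = 0; i < n; ++i) s += x[i];
  *result = s;
  return LIB_SUCCESS;
}
\end{minted}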
• How holistic is holistic? 
• Can we measure the speed/rate/success of co-design? 
\begin{verbatim}
- In the 1st 6 months, we have rewritten a kernel in co-design.
It gives the same result as the naive formulation, but with a 20x speedup,
reaching 80% of the peak of a CPU.
\end{verbatim}
• What is the importance of the role of CoEs in helping scientific
communities to follow the evolution of the next HPC architectures?
\begin{verbatim}
- Isolated users are afraid of architecture changes. A CoE builds a community,
and it is a driving force that helps the users to change and adapt their codes.
\end{verbatim}
• Separation of concerns: how well does this concept work when
co-designing applications?
\begin{verbatim}
- Our libraries are good examples.
\end{verbatim}
\end{block}
\begin{block}{Questions / comments}
\begin{itemize}
\item JSC: We need different supercomputing centers specialized
for different application profiles.
\item Miguel Vasquez: Compromises. Need workload managers
\item Gavin Pringle (Excellerat): No CoE will ever convince a hardware vendor.
Access to hardware prototypes.
\item Mariano Vazquez (CompBioMed): Centralize Codesign and
dissemination plan
\item Pasqua d'Ambra (CNR): appreciates vision of library design
\item Jesus Labarta: EPI
\begin{itemize}
\item ARM-based, SVE cores.
\item RISCV core
\item High bandwidth per core
\item Large vector operations
\item Emulator is available
\item How to handle locality
\item Sparse matrix vector is a kernel to optimize
\item \url{https://ssh.hca.bsc.es/epi/ftp/}
\url{https://ssh.hca.bsc.es/epi/ftp/doc/}
\item Pasqua d'Ambra (CNR), EoCoE : BLAS is a success of the codesign strategy.
Same for graph operations
\end{itemize}
\end{itemize}
Pasqua d'Ambra: sparse linear algebra kernels
Jesus Labarta: EPI
Karim H: MdlS
Miguel Vasquez: containers are not a solution
Peter V Coveney: Exascale Linpack is nonsense
\end{block}
\begin{block}{Session 2 14:00-17:00}
\url{https://zoom.us/j/94418315362?pwd=VWRESW95dzkySWZyU1NiSkhzK3JQdz09}
Co-Design for new usage
Panelists : Peter Coveney, Marta Garcia, Berk Hess, Leopold Talirz, Bruno Raffin 
Exascale computers are very likely to run more complex workloads than present supercomputers. This evolution is mainly driven by the development of data analytics and the need to couple “standard” HPC computation with sophisticated data processing. The new workloads will require (co-)designing hardware and software tools to manage complex workflows, code coupling, large ensemble runs, (in situ) data analytics, …
Zoom link :
\url{https://zoom.us/j/94418315362?pwd=VWRESW95dzkySWZyU1NiSkhzK3JQdz09}
\begin{enumerate}
\item Short presentation of yourself and of your background.
\item Questions :
\end{enumerate}
• How do you think HPC workloads will evolve with exascale computers? How does this influence the design of the system / applications?
• What makes ensemble simulation and complex workflows prime candidates for exascale computing/computers?
 
• What are the possibilities offered by new hardware, especially hardware enabling data-intensive workloads (GPU, NVRAM, flash, …)?
• Data: Should HPC centers archive/host large scientific data sets?
• Should HPC centers be designed to host science portals?
• Should HPC centers host workflow-management-type workloads? How should workflows be properly integrated in HPC centers?
• What is your center of excellence doing to promote new usages of HPC or adapt to them?
• What spread of maturity do we expect in codes concerning parallelization and acceleration?
• Which levels of co-design are relevant?
• Should we target more than software/software co-design?
• Which are the "must-have" or key things to succeed in the co-design task ?
• Which are the pitfalls or dangers along the road that can make co-design fail?
Idea: leave Linpack to the Americans, and propose a better benchmark.
\end{block}
\end{frame}
\end{document}