commit f2f59cc7365764e649cb861a3f4ad1ea05ad0a9e Author: Anthony Scemama Date: Mon Jun 28 14:28:23 2021 +0200 Initial commit diff --git a/Abstract b/Abstract new file mode 100644 index 0000000..8f28554 --- /dev/null +++ b/Abstract @@ -0,0 +1,35 @@ + TREX : an innovative view of HPC usage applied to Quantum Monte Carlo simulations + + + The TREX[1] European Center of Excellence focuses on high-accuracy quantum + mechanical methods, essential in many different fields of application such as + new material design or photochemistry. Among these methods, Quantum Monte Carlo + (QMC) approaches are particularly well adapted to exascale architectures. + Our ambition is to help the community take advantage of exascale machines + through the use of our HPC software. + + In the presentation, we will review progress along the following three axes: + + * TREXIO[2]: A common I/O library and file format for easily exchanging data between + applications, facilitating high-throughput computing workflows, + + * QMCkl[3]: A library of computational kernels specific to QMC applications + written together by QMC and HPC experts, taking advantage of both CPUs and GPUs, + + * An integrated workflow including performance analysis (MAQAO[4]) and numerical + accuracy measurements (Verificarlo[5]) to be used for the development of QMC + kernels and more generally for improving the applications. In particular, we + plan to identify the best-performing usage of QMCkl and also to adjust the + performance to the numerical precision requirements. 
+ + ------------------------------------ + + [1] https://trex-coe.eu + [2] https://github.com/trex-coe/trexio + [3] https://trex-coe.github.io/qmckl + [4] https://www.maqao.org + [5] https://github.com/verificarlo + + ------------------------------------ + + diff --git a/Curve.png b/Curve.png new file mode 100644 index 0000000..d6dcaf2 Binary files /dev/null and b/Curve.png differ diff --git a/casula.png b/casula.png new file mode 100644 index 0000000..8106a7f Binary files /dev/null and b/casula.png differ diff --git a/dirac2.png b/dirac2.png new file mode 100644 index 0000000..3684f21 Binary files /dev/null and b/dirac2.png differ diff --git a/dirac_4.jpg b/dirac_4.jpg new file mode 100644 index 0000000..e173247 Binary files /dev/null and b/dirac_4.jpg differ diff --git a/scemama.tex b/scemama.tex new file mode 100644 index 0000000..45cd7d9 --- /dev/null +++ b/scemama.tex @@ -0,0 +1,529 @@ +% Created 2021-06-28 Mon 14:25 +% Intended LaTeX compiler: pdflatex +\documentclass[aspectratio=169]{beamer} +\usepackage[utf8]{inputenc} +\usepackage[T1]{fontenc} +\usepackage{graphicx} +\usepackage{grffile} +\usepackage{longtable} +\usepackage{wrapfig} +\usepackage{rotating} +\usepackage[normalem]{ulem} +\usepackage{amsmath} +\usepackage{textcomp} +\usepackage{amssymb} +\usepackage{capt-of} +\usepackage{hyperref} +\institute{Lab. 
Chimie et Physique Quantiques, IRSAMC, UPS/CNRS, Toulouse (France)} +\usepackage{minted} +\usemintedstyle{emacs} +\newminted{f90}{fontsize=\footnotesize} +\usepackage[utf8]{inputenc} +\usepackage[T1]{fontenc} +\usepackage{hyperref} +\usepackage{mathtools} +\usepackage{physics} +\definecolor{darkgreen}{rgb}{0.,0.6,0.} +\definecolor{darkblue}{rgb}{0.,0.2,0.7} +\definecolor{darkred}{rgb}{0.6,0.1,0.1} +\definecolor{darkpink}{rgb}{0.7,0.0,0.7} +\usetheme{trex} +\author{Anthony Scemama} +\date{12/03/2021} +\title{Library development within TREX} +\hypersetup{ + pdfauthor={Anthony Scemama}, + pdftitle={Library development within TREX}, + pdfkeywords={}, + pdfsubject={}, + pdfcreator={Emacs 26.3 (Org mode 9.4)}, + pdflang={English}} +\begin{document} + +\maketitle + +\begin{frame}[label={sec:org58acecb}]{Quantum chemistry} +\begin{columns} +\begin{column}{0.25\textwidth} +\begin{center} +\includegraphics[width=\textwidth]{./dirac_4.jpg} +\end{center} +\end{column} +\begin{column}{0.75\textwidth} +\begin{center} +\includegraphics[width=\textwidth]{./dirac2.png} +\end{center} +\end{column} +\end{columns} +\end{frame} + +\begin{frame}[label={sec:org07dc1a1}]{Quantum chemistry} +\begin{columns} +\begin{column}{0.6\textwidth} +\begin{exampleblock}{} +\begin{itemize} +\item Describing matter with quantum mechanics (Schrödinger's equation) +\item Users: theoretical chemists and physicists +\end{itemize} +\end{exampleblock} +\end{column} +\begin{column}{0.4\textwidth} +\begin{center} +\includegraphics[width=\textwidth]{./Water.png} +\end{center} +\end{column} +\end{columns} + +\begin{columns} +\begin{column}{0.4\textwidth} +\begin{center} +\includegraphics[width=\textwidth]{./casula.png} +\end{center} +\end{column} +\begin{column}{0.6\textwidth} +\begin{exampleblock}{Implications for society} +\begin{center} +\begin{tabular}{ll} +- Health & Drug design\\ +- Electronics & Nano- and micro-electronics\\ +- Materials & Carbon nanotubes, graphene, \dots{}\\ +- Catalysis & Enzymatic 
reactions, petrol\\ +\end{tabular} +\end{center} +\end{exampleblock} +\end{column} +\end{columns} +\end{frame} + +\begin{frame}[label={sec:orgf844e6b}]{TREX: Targeting REal chemical accuracy at the EXascale} +\begin{columns} +\begin{column}{0.4\textwidth} +\begin{center} +\includegraphics[width=\textwidth]{./Curve.png} +\end{center} + +\end{column} +\begin{column}{0.6\textwidth} +\begin{exampleblock}{QMC: Quantum Monte Carlo methods} +\begin{itemize} +\item Massively parallelisable (multiple QMC trajectories) +\item Difficulty: take advantage of parallelism within a trajectory +\end{itemize} +\end{exampleblock} +\begin{exampleblock}{Objective: Make codes ready for exascale} +How: Instead of re-writing codes, provide libraries +\begin{itemize} +\item One library for exchanging information between codes (\alert{TREXIO}) +\item One library for high-performance (\alert{QMCkl}) +\end{itemize} +\end{exampleblock} +\end{column} +\end{columns} +\end{frame} + +\begin{frame}[label={sec:org9bd59c0}]{Presentation of TREX} +\begin{itemize} +\item TREX CoE: Targeting REal chemical accuracy at the eXascale +\item Started in Oct. 
2020 +\end{itemize} +\end{frame} + +\begin{frame}[label={sec:orgfb14f88}]{QMC kernel library (QMCkl)} +\begin{itemize} +\item Codesign: Written together by QMC experts and HPC experts +\item Contains all major kernels of QMC methods +\item Multiple high-performance implementations of the kernels, tuned +for different +\begin{itemize} +\item architectures +\item problem sizes +\item requested accuracy (reduced precision) +\end{itemize} +\item Reference implementation: \emph{Documentation}: easy to read and +understand, not necessarily efficient +\item Final implementations: \emph{High performance}: efficient, but not +necessarily readable by physicists/chemists +\end{itemize} +\end{frame} + +\begin{frame}[label={sec:org929f042}]{Advantages} +\begin{itemize} +\item The code stays easy for physicists/chemists to understand: +performance-related aspects are delegated to the library +\item Changing architecture requires only linking with another +version of the library +\item Scientific code development does not break the performance +\item Better re-use of the optimization effort among the community +\end{itemize} +\end{frame} + +\begin{frame}[label={sec:org8ded82f}]{Literate programming} +\begin{quote} +Literate programming is a programming paradigm introduced by Donald +Knuth in which a computer program is given an explanation of its +logic in a natural language, such as English, interspersed with +snippets of macros and traditional source code, from which +compilable source code can be generated. 
(Wikipedia) +\end{quote} +\end{frame} + + +\begin{frame}[label={sec:org01ef9da}]{Documentation library} +Literate programming with org-mode: +\begin{itemize} +\item Here, comments are more important than code +\item Can add graphics, \LaTeX{} formulas, tables, etc. +\item Documentation always synchronized with the code +\item Some routines can be generated by embedded scripts +\item Most of the first report was generated from the documentation +\item Kernels are implemented in Fortran for readability +\item The API is C-compatible: QMCkl looks like a C library +\(\Longrightarrow\) can be used in all other languages +\end{itemize} +\end{frame} + +\begin{frame}[label={sec:orge55dfb7}]{Codesign strategy} +\begin{enumerate} +\item Kernel extraction: QMC specialists agree on the +mathematical expression of the problem +\item A mini-application is written with HPC experts to find the best +data layout, using real-size examples +\item The kernel is written in the documentation library +\item HPC experts provide an HPC version of the kernel +\item The library is linked in the QMC codes of the CoE +\end{enumerate} +\end{frame} + +\begin{frame}[label={sec:orgfdcab81}]{Our first application : 3-body Jastrow factor} +\newcommand{\Jeen}{J_{\text{een}}} +\newcommand{\Nel}{N_{\text{elec}}} +\newcommand{\Nat}{N_{\text{nucl}}} +\newcommand{\Nord}{N_{\text{nord}}} +\newcommand{\lmax}{p-k-2\delta_{k,0}} +\newcommand{\br}{\mathbf{r}} +\newcommand{\bR}{\mathbf{R}} +\newcommand{\ttr}{\, \bar{\mathtt{r}}} +\newcommand{\tR}{\, \bar{\mathtt{R}}} +\newcommand{\tP}{\, \bar{\mathtt{P}}} + +\[ + \Jeen (\br,\bR) = \sum_{\alpha=1}^{\Nat} \sum_{i=1}^{\Nel} \sum_{j=1}^{i-1} +\sum_{p=2}^{\Nord} \sum_{k=0}^{p-1} +\sum_{l=0}^{\lmax} c_{lkp\alpha} +\left( {r}_{ij} \right)^k +\left[ \left( {R}_{i\alpha} \right)^l + \left( {R}_{j\alpha} \right)^l \right] +\left( {R}_{i\,\alpha} \, {R}_{j\alpha} \right)^{(p-k-l)/2} +\] + +can be rewritten as + +\[ + \Jeen(\br,\bR) = + \sum_{p=2}^{\Nord}\sum_{k=0}^{p-1} 
+ \sum_{l=0}^{\lmax} + \sum_{\alpha=1}^{\Nat} + c_{lkp\alpha} + \sum_{i=1}^{\Nel} + {\tR}_{i,\alpha,(p-k-l)/2}\, + {\tP}_{i,\alpha,k,(p-k+l)/2} \; \text{\scriptsize \alert{($\downarrow$ complexity)}} + \] +with + \[ + {\tP}_{i, \alpha, k, l} = \sum_{j=1}^{\Nel} + {\ttr}_{i,j,k}\,{\tR}_{j,\alpha,l}. \; \text{\alert{\scriptsize (GEMM)}} + \] +\end{frame} + +\begin{frame}[label={sec:org3e25bfe}]{Our first application : Gradient and Laplacian} +\newcommand{\Jeen}{J_{\text{een}}} +\newcommand{\Nel}{N_{\text{elec}}} +\newcommand{\Nat}{N_{\text{nucl}}} +\newcommand{\Nord}{N_{\text{nord}}} +\newcommand{\lmax}{p-k-2\delta_{k,0}} +\newcommand{\br}{\mathbf{r}} +\newcommand{\bR}{\mathbf{R}} +\newcommand{\ttr}{\, \bar{\mathtt{r}}} +\newcommand{\tR}{\, \bar{\mathtt{R}}} +\newcommand{\tP}{\, \bar{\mathtt{P}}} +\newcommand{\tg}{\, \bar{\mathtt{g}}} +\newcommand{\tG}{\, \bar{\mathtt{G}}} +\newcommand{\tQ}{\, \bar{\mathtt{Q}}} + +\begin{eqnarray*} + \nabla_{im} \Jeen(\br,\bR) & = & + \sum_{p=2}^{\Nord}\sum_{k=0}^{p-1} + \sum_{l=0}^{\lmax} + \sum_{\alpha=1}^{\Nat} + c_{lkp\alpha} + \sum_{i=1}^{\Nel} + {\tG}_{i,m,\alpha,(p-k-l)/2} {\tP}_{i,\alpha,k,(p-k+l)/2} + \\ +&& {\tG}_{i,m,\alpha,(p-k+l)/2} {\tP}_{i,\alpha,k,(p-k-l)/2} + + {\tR}_{i,\alpha,(p-k-l)/2} {\tQ}_{i,m,\alpha,k,(p-k+l)/2} + \\ +&& {\tR}_{i,\alpha,(p-k+l)/2} {\tQ}_{i,m,\alpha,k,(p-k-l)/2} + + \delta_{m,4} \big( \\ +&& {\tG}_{i,1,\alpha,(p-k+l)/2} {\tQ}_{i,1,\alpha,k,(p-k-l)/2} + + {\tG}_{i,2,\alpha,(p-k+l)/2} {\tQ}_{i,2,\alpha,k,(p-k-l)/2} + \\ +&& {\tG}_{i,3,\alpha,(p-k+l)/2} {\tQ}_{i,3,\alpha,k,(p-k-l)/2} + + {\tG}_{i,1,\alpha,(p-k-l)/2} {\tQ}_{i,1,\alpha,k,(p-k+l)/2} + \\ +&& {\tG}_{i,2,\alpha,(p-k-l)/2} {\tQ}_{i,2,\alpha,k,(p-k+l)/2} + + {\tG}_{i,3,\alpha,(p-k-l)/2} {\tQ}_{i,3,\alpha,k,(p-k+l)/2} \big) +\end{eqnarray*} + +with + +\[ + {\tG}_{i, m, \alpha, l} = \frac{\partial \left( {R}_{i\alpha} \right)^l} + {\partial r_i}, \phantom{ space } + {\tg}_{i, m, j, k} = \frac{\partial \left( {r}_{ij} \right)^k} + {\partial 
r_i}, \phantom{ space } + \text{ and } + {\tQ}_{i, m, \alpha, k, l} = \sum_{j=1}^{\Nel} + {\tg}_{i,m,j,k}\,{\tR}_{j,\alpha,l} +\] +\end{frame} + +\begin{frame}[label={sec:org0200bcf}]{Speed up} +\begin{center} +\includegraphics[height=0.8\textheight]{./speedup.pdf} +\end{center} + +\(\sim 80\%\) of the AVX-512 peak is reached on a Skylake CPU. +\end{frame} + + + +\begin{frame}[label={sec:org4f9d92a}]{Links} +\begin{itemize} +\item TREX web site : \url{https://trex-coe.eu} +\item QMCkl documentation : \url{https://trex-coe.github.io/qmckl} +\item QMCkl repository : \url{https://github.com/trex-coe/qmckl} +\end{itemize} +\end{frame} + +\begin{frame}[label={sec:orgd018cbe},fragile]{CoEs Co-Design Workshop} + March 12 – 9:00-12:00 and 14:00-17:00 + +The goal of this workshop is first to get an overview of what CoEs think of co-design and what they do in that context, and then to build a shared view on this important issue. +The workshop consists of two round tables where each panelist will make a short presentation, followed by a discussion among the panelists and with all participants. The workshop is open to all CoE members, so please share it widely. + +\begin{block}{Session 1 - 9:00 – 12:00} +Different levels of co-design: where do CoEs come in? + +Panelists: Fabio Affinito, Guillaume Houzeaux, Jesus Labarta, Soline +Laforet, Anthony Scemama + +Supercomputers are rather complex systems built using innovative technologies, on both the hardware and software sides. Therefore, co-design can be applied at different levels: chip, network, low-level software, programming models and environment, libraries or applications. +In its design, a computer can also be more general-purpose or tuned for a specific class of applications. +This round table will discuss all these issues and how CoEs can best contribute. 
+ +Zoom link : +\url{https://zoom.us/j/96850634356?pwd=Q0ViUlVYL0tSWXlEeGVVTTJYcWpBdz09} + +\begin{enumerate} +\item Short presentation of yourself and of your background. +\begin{itemize} +\item Guillaume Houzeaux: BSC. Sticks : Comp. mechanics +\item Jesus Labarta: BSC +\item Soline Laforet: Atos, Earth Science +\item Fabio Affinito: CINECA, Coordinator of support team +\end{itemize} +\end{enumerate} + + +\begin{enumerate} +\item Questions : +\end{enumerate} +• How would you define co-design? (Introduction) + Each of you can make a short speech to present your global view on co-design. + +\begin{itemize} +\item JS +\begin{itemize} +\item Codesign implies design. We design applications +\item Codimensioning is not codesign +\item Holistic: every layer in the architecture takes design +decisions on the same project +\end{itemize} +\item SL: +\begin{itemize} +\item HPC is not the main field of activity of hardware providers +\item Software environment is important +\end{itemize} +\item FA: \ldots{} +\item GH: Application developers rely on the lower levels +\end{itemize} + +\begin{verbatim} + People from different communities work on the same project, each + bringing their domain-specific expertise. Better than the sum of its parts. +\end{verbatim} + +• What are the new challenges for co-design related to the future exascale systems? + +\begin{verbatim} + The software stack is becoming increasingly complex. In HPC, we + need to use programming languages close to the hardware, with lots + of dangerous constructs. Writing correct software in these + languages becomes increasingly difficult. + + Scientists can't understand their codes any more. 
+\end{verbatim} + +\begin{itemize} +\item JL: +\begin{itemize} +\item Applications are more important than showing "shiny" Flops/s +\item People program to their own mental model of the machine +\item General purpose vs specific : specific is suicide +\end{itemize} +\end{itemize} + +• How does your CoE address the co-design activity? (SL) + +\begin{verbatim} + - Move the performance-critical parts out of the codes and into libraries + - QMCkl + - Design an API : collaboration between quantum chemists and computational + scientists. Documentation library where computational kernels are understandable + by HPC experts. The library is re-implemented by HPC experts for targeted + architectures. + - Kernel extraction: we go back to LaTeX formulas, and write a mini-application. + Figure out the data structures for the kernel with HPC experts, and tune the + mini-application. Then, re-implement the kernel in the library. +\end{verbatim} + +• What are the co-design activities operated in your CoE and to + what extent do you think that these activities will actually + influence the HPC hardware design? + +\begin{verbatim} + - Move the performance-critical parts out of the codes and into libraries + - QMCkl + - Design an API : collaboration between quantum chemists and computational + scientists. Documentation library where computational kernels are understandable + by HPC experts. The library is re-implemented by HPC experts for targeted + architectures. + - Kernel extraction: we go back to LaTeX formulas, and write a mini-application. + Figure out the data structures for the kernel with HPC experts, and tune the + mini-application. Then, re-implement the kernel in the library. +\end{verbatim} + +• How general-purpose/special-purpose should designs be? + +\begin{verbatim} + Don't re-invent the wheel. But to get the wheel, you often + need to buy the bus and take away the wheel. + Lots of systems prevent you from extracting the wheel + because the bus should be used without the wheel. 
+\end{verbatim} + +• Which levels of co-design are relevant? + +• What are the differences between developing a library and a program? + +\begin{verbatim} + - In a code, you can trust that the parameters are valid. + In a library, many checks are needed before the routine can do its job. + - Error handling: the library should not decide to crash the program + but return the error and its description to the calling code. +\end{verbatim} + +• How holistic is holistic? +• Can we measure the speed/rate/success of co-design? + +\begin{verbatim} + - In the first 6 months, we have rewritten a kernel in co-design. + It gives the same result as the naive formulation, but with a 20x speedup, + reaching 80% of the peak of a CPU. +\end{verbatim} + +• What is the importance of the role of CoEs in helping scientific + communities to follow the evolution of the next HPC architectures? + +\begin{verbatim} + - Isolated users are afraid of architecture changes. A CoE builds a community, + and it is a driving force that helps the users to change and adapt their codes. +\end{verbatim} + +• Separation of concerns: how well does this concept work when + co-designing applications? + +\begin{verbatim} + - Our libraries are good examples. +\end{verbatim} +\end{block} + +\begin{block}{Questions / comments} +\begin{itemize} +\item JSC: We need different supercomputing centers specialized +for different application profiles. +\item Miguel Vasquez: Compromises. Need workload managers +\item Gavin Pringle (Excellerat): No CoE will ever convince a hardware vendor. +Access to hardware prototypes. +\item Mariano Vazquez (CompBioMed): Centralize Codesign and +dissemination plan +\item Pasqua d'Ambra (CNR): appreciates vision of library design +\item Jesus Labarta: EPI +\begin{itemize} +\item ARM-based, SVE cores. 
+\item RISC-V core +\item High bandwidth per core +\item Large vector operations +\item Emulator is available +\item How to handle locality +\item Sparse matrix-vector product is a kernel to optimize +\item \url{https://ssh.hca.bsc.es/epi/ftp/} +\url{https://ssh.hca.bsc.es/epi/ftp/doc/} +\item Pasqua d'Ambra (CNR), EoCoE : BLAS is a success of the codesign strategy. +Same for graph operations +\end{itemize} +\end{itemize} + +Pasqua d'Ambra: sparse linear algebra kernels +Jesus Labarta: EPI +Karim H: MdlS +Miguel Vasquez: containers are not a solution +Peter V Coveney: Exascale Linpack is nonsense +\end{block} + + +\begin{block}{Session 2 – 14:00-17:00} + \url{https://zoom.us/j/94418315362?pwd=VWRESW95dzkySWZyU1NiSkhzK3JQdz09} +Co-Design for new usage + +Panelists : Peter Coveney, Marta Garcia, Berk Hess, Leopold Talirz, Bruno Raffin + +Exascale computers are very likely to run more complex workloads than present supercomputers. This evolution is mainly driven by the development of data analytics and the need to couple “standard” HPC computation with sophisticated data treatments. The new workloads will require (co-)designing hardware and software tools to manage complex workflows, code coupling, large ensemble runs, (in situ) data analytics, … + + +Zoom link : +\url{https://zoom.us/j/94418315362?pwd=VWRESW95dzkySWZyU1NiSkhzK3JQdz09} + +\begin{enumerate} +\item Short presentation of yourself and of your background. + +\item Questions : +\end{enumerate} +• How do you think HPC workloads will evolve with exascale computers? How does this influence the design of the system / applications? + + • What makes ensemble simulation and complex workflows prime candidates for exascale computing/computers? + + • What are the possibilities offered by new hardware, especially those allowing data-intensive workloads (GPU, NVRAM, flash, …)? + +• Data : Should HPC centers archive/host large scientific data sets? +• Should HPC centers be designed to host science portals? 
+• Should HPC centers host workflow-management-type workloads? How should workflows be properly integrated in HPC centers? + +• What is your center of excellence doing to promote new usages of HPC or adapt to them? + +• What spread of maturity do we expect in codes concerning parallelization and acceleration? +• Which levels of co-design are relevant? +• Should we target more than software/software co-design? + +• What are the "must-have" or key things to succeed in the co-design task? +• What are the pitfalls or dangers along the road that can make co-design fail? + +Idea: leave Linpack to the Americans, and propose a better benchmark. +\end{block} +\end{frame} +\end{document} \ No newline at end of file