diff --git a/Amdahl.pdf b/Amdahl.pdf
deleted file mode 100644
index ed091c0..0000000
Binary files a/Amdahl.pdf and /dev/null differ
diff --git a/Amdahl2.pdf b/Amdahl2.pdf
deleted file mode 100644
index f8424ac..0000000
Binary files a/Amdahl2.pdf and /dev/null differ
diff --git a/FranceGrilles.png b/FranceGrilles.png
deleted file mode 100644
index 5c2fa3f..0000000
Binary files a/FranceGrilles.png and /dev/null differ
diff --git a/Meso.png b/Meso.png
deleted file mode 100644
index f116e94..0000000
Binary files a/Meso.png and /dev/null differ
diff --git a/PyramideNewLook2012-2.png b/PyramideNewLook2012-2.png
deleted file mode 100644
index f96fbc6..0000000
Binary files a/PyramideNewLook2012-2.png and /dev/null differ
diff --git a/blas1.png b/blas1.png
deleted file mode 100644
index e175276..0000000
Binary files a/blas1.png and /dev/null differ
diff --git a/blas2.png b/blas2.png
deleted file mode 100644
index c26c295..0000000
Binary files a/blas2.png and /dev/null differ
diff --git a/blas3.png b/blas3.png
deleted file mode 100644
index bd3ad46..0000000
Binary files a/blas3.png and /dev/null differ
diff --git a/arriere_chassis.jpg b/chassis_back.jpg
similarity index 100%
rename from arriere_chassis.jpg
rename to chassis_back.jpg
diff --git a/genci.png b/genci.png
deleted file mode 100644
index 01130e8..0000000
Binary files a/genci.png and /dev/null differ
diff --git a/hierarchy.pdf b/hierarchy.pdf
new file mode 100644
index 0000000..2ac6ec8
Binary files /dev/null and b/hierarchy.pdf differ
diff --git a/parallelism_scemama.org b/parallelism_scemama.org
index dd56744..a71fcb2 100644
--- a/parallelism_scemama.org
+++ b/parallelism_scemama.org
@@ -23,10 +23,202 @@ #+startup: beamer
 #+options: H:2
 
+* Program :noexport:
+
+** Tuesday afternoon
+
+   I can have them:
+   - parallelize a matrix product with OpenMP in Fortran/C
+   - compute pi with a Monte Carlo algorithm and MPI (master-worker), first in
+     Python and then in Fortran/C
+   - compute an integral over R^3 on a grid of points with MPI in Fortran/C
+
+   This will only give them the basics with OMP DO and MPI_Reduce, and we will
+   not be able to go much further. But they will still be able to use all 112
+   cores of the cluster. (Rough C sketches for the first two exercises are
+   collected at the end of this section.)
+
+   Keep in mind that they may never have used a cluster and probably do not
+   know what one is. So I will have to include a bit of hardware in my
+   presentation on parallelism, and explain that they have to run sbatch to
+   submit a calculation.
+
+   SLURM is the batch manager, right?
+
+** Wednesday
+
+   For IRPF90, I can give a fairly general presentation.
+
+   I already have a tutorial for writing a molecular dynamics program with a
+   Lennard-Jones potential; I think this will be easier since there is little
+   time for the hands-on sessions.
+
+   If they move fast enough, we can then switch to the OpenMP parallelization
+   of the loops in the code, and to running several trajectories with MPI,
+   reusing the pi parallelization model from the previous day.
+
+   For QP, I think it would be good to present it in 15 minutes once they have
+   done the IRPF90 hands-on session. Then we can demo how to implement an SCF
+   in 10 minutes, but I do not think we will have time to let them do the work
+   in QP themselves, and this avoids compilation problems on the machines. We
+   can also give access to the account where QP is installed to those who move
+   very fast and want to try playing with it.
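+
+** Code sketches for the hands-on sessions
+
+   Minimal C sketches of the two Tuesday exercises referenced above, kept here
+   as notes rather than as the final hand-outs. Matrix sizes, sample counts
+   and seeds are placeholders, and the pi example uses a flat reduction rather
+   than the full master-worker scheme.
+
+   A matrix product where OpenMP distributes the rows of the result over the
+   threads (the C counterpart of OMP DO is ~#pragma omp parallel for~):
+
+#+BEGIN_SRC C
+#include <stdio.h>
+#include <omp.h>
+
+#define N 512   /* placeholder size */
+
+int main(void) {
+  static double a[N][N], b[N][N], c[N][N];
+
+  /* Fill a and b with something deterministic */
+  for (int i = 0; i < N; i++)
+    for (int j = 0; j < N; j++) {
+      a[i][j] = (double)(i + j);
+      b[i][j] = (double)(i - j);
+      c[i][j] = 0.0;
+    }
+
+  /* The outer loop over the rows of c is shared among the threads;
+     j and k are private because they are declared inside the loop. */
+  #pragma omp parallel for
+  for (int i = 0; i < N; i++)
+    for (int j = 0; j < N; j++)
+      for (int k = 0; k < N; k++)
+        c[i][j] += a[i][k] * b[k][j];
+
+  printf("c[0][0] = %f, max threads = %d\n", c[0][0], omp_get_max_threads());
+  return 0;
+}
+#+END_SRC
+
+   A Monte Carlo estimate of pi, where each MPI process draws its own samples
+   and rank 0 collects the counts with MPI_Reduce:
+
+#+BEGIN_SRC C
+#include <stdio.h>
+#include <stdlib.h>
+#include <mpi.h>
+
+int main(int argc, char **argv) {
+  const long n = 10000000;            /* samples per process (placeholder) */
+  int rank, size;
+
+  MPI_Init(&argc, &argv);
+  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
+  MPI_Comm_size(MPI_COMM_WORLD, &size);
+
+  srand(1234 + rank);                 /* a different stream on each rank */
+  long inside = 0;
+  for (long i = 0; i < n; i++) {
+    double x = (double)rand() / RAND_MAX;
+    double y = (double)rand() / RAND_MAX;
+    if (x * x + y * y <= 1.0) inside++;
+  }
+
+  /* Sum the per-process counts on rank 0 */
+  long total = 0;
+  MPI_Reduce(&inside, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
+
+  if (rank == 0)
+    printf("pi ~ %f with %d processes\n",
+           4.0 * (double)total / ((double)n * size), size);
+
+  MPI_Finalize();
+  return 0;
+}
+#+END_SRC
+
+   Compile with =gcc -fopenmp= and =mpicc= respectively, and submit the MPI
+   run through sbatch on the cluster.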
+
 * Supercomputers
+** Computers
+
+   #+LATEX: \begin{columns}
+   #+LATEX: \begin{column}{0.7\textwidth}
+   #+LATEX: \begin{exampleblock}{Today (order of magnitude)}
+
+   - 1 *socket* (x86 CPU @ 2.2-3.3 GHz, *4 cores*, hyperthreading)
+   - \sim 4-16 GB RAM
+   - \sim 500 GB SSD
+   - Graphics card: ATI Radeon, Nvidia GeForce
+   - Gigabit Ethernet
+   - USB, webcam, sound card, etc.
+   - \sim 500 euros
+   #+LATEX: \end{exampleblock}
+   #+LATEX: \end{column}
+   #+LATEX: \begin{column}{0.3\textwidth}
+   #+ATTR_LATEX: :width \textwidth
+
+   [[./desktop-inspiron-MT-3650-pdp-module-1.jpg]]
+   #+LATEX: \end{column}
+   #+LATEX: \end{columns}
+
+** Computer designed for computation
+
+   #+LATEX: \begin{columns}
+   #+LATEX: \begin{column}{0.6\textwidth}
+   #+LATEX: \begin{exampleblock}{Today (order of magnitude)}
+
+   - 2 sockets (x86 CPU @ 2.2 GHz, 32 cores/socket, hyperthreading)
+   - 64-128 GB RAM
+   - Multiple SSDs (RAID 0)
+   - Gigabit Ethernet
+   - Possibly an accelerator (Nvidia Volta/Ampere)
+   - \sim 5k euros
+
+   #+LATEX: \end{exampleblock}
+   #+LATEX: \end{column}
+   #+LATEX: \begin{column}{0.4\textwidth}
+   #+ATTR_LATEX: :width \textwidth
+
+   [[./z840_gallery_img4_tcm245_2164103_tcm245_1871309_tcm245-2164103.jpg]]
+   #+LATEX: \end{column}
+   #+LATEX: \end{columns}
+
+** Cluster
+
+   #+LATEX: \begin{columns}
+   #+LATEX: \begin{column}{0.6\textwidth}
+   #+LATEX: \begin{exampleblock}{}
+   - Many computers designed for computation
+   - Compact (1-2U of rack space per machine)
+   - Network switches
+   - Login server
+   - Batch queuing system (SLURM / PBS / SGE / LSF)
+   - Cheap cooling system
+   - Requires a lot of electrical power (\sim 10 kW/rack)
+   - Possibly a low-latency / high-bandwidth network (InfiniBand or 10 Gb Ethernet)
+   - >50k euros
+   #+LATEX: \end{exampleblock}
+   #+LATEX: \end{column}
+   #+LATEX: \begin{column}{0.4\textwidth}
+   #+ATTR_LATEX: :width \textwidth
+
+   [[./img_20160510_152246_resize.jpg]]
+   #+LATEX: \end{column}
+   #+LATEX: \end{columns}
+
+** Supercomputer
+
+   #+LATEX: \begin{columns}
+   #+LATEX: \begin{column}{0.6\textwidth}
+   #+LATEX: \begin{exampleblock}{}
+   - Many computers designed for computation
+   - Very compact (<1U of rack space per machine)
+   - Low-latency / high-bandwidth network (InfiniBand or 10 Gb Ethernet)
+   - Network switches
+   - Parallel filesystem for scratch space (Lustre / BeeGFS / GPFS)
+   - Multiple login servers
+   - Batch queuing system (SLURM / PBS / SGE / LSF)
+   - Highly efficient cooling system (water)
+   - Requires a lot of electrical power (>100 kW)
+   #+LATEX: \end{exampleblock}
+   #+LATEX: \end{column}
+   #+LATEX: \begin{column}{0.4\textwidth}
+   #+ATTR_LATEX: :width \textwidth
+
+   [[./Eos.png]]
+   #+LATEX: \end{column}
+   #+LATEX: \end{columns}
+
+** Definitions
+   - Top500 :: Ranking of the 500 fastest supercomputers
+   - Flop :: Floating-point operation
+   - Flops :: Flop/s, number of floating-point operations per second
+   - RPeak :: Peak performance, maximum possible number of Flops
+   - RMax :: Measured performance on the Linpack benchmark (dense linear system solve)
+   - SP :: Single precision (32-bit floats)
+   - DP :: Double precision (64-bit floats)
+   - FPU :: Floating Point Unit
+   - FMA :: Fused multiply-add ($a\times x+b$ in 1 instruction)
+
+** Quantifying performance
+
+   #+LATEX: \begin{exampleblock}{Example}
+   *RPeak* of the Intel Xeon Gold 6140 processor:
+   - 18 cores
+   - 2.3 GHz
+   - 2 FPUs
+   - 8 FMA (DP)/FPU/cycle
+
+   $18 \times 2.3\times 10^9 \times 2 \times 8 \times 2 = 1.3$ TFlops (DP)
+   #+LATEX: \end{exampleblock}
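+
+   The same arithmetic written as a general formula (the notation below is
+   ours; the final factor of 2 counts each FMA as two floating-point
+   operations):
+
+   \[ R_{\mathrm{Peak}} = N_{\mathrm{cores}} \times f_{\mathrm{clock}} \times N_{\mathrm{FPU}} \times N_{\mathrm{FMA/cycle}} \times 2 \]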
+
+   - Number of hours :: 730/month, 8760/year
+   - Units :: Kilo (K), Mega (M), Giga (G), Tera (T), Peta (P), Exa (E), ...
+** Top500 (1996)
+   #+ATTR_LATEX: :height 0.9\textheight
+   [[./top500_95.png]]
+
+** Top500 (2021)
+   #+ATTR_LATEX: :height 0.9\textheight
+   [[./top500_21.png]]
+
+   https://www.top500.org/lists/top500/2021/11/
+** Curie thin nodes (TGCC, France)
+   Ranked 9th in 2012, 77 184 cores, 1.7 PFlops, 2.1 MW
+   #+ATTR_LATEX: :height 0.8\textheight
+   [[./tgcc.jpg]]
+
+** Mare Nostrum (BSC, Spain)
+   Ranked 13th in 2017, 153 216 cores, 6.5 PFlops, 1.6 MW
+   #+ATTR_LATEX: :height 0.8\textheight
+   [[./marenostrum.jpg]]
+
+** Architecture
+   #+ATTR_LATEX: :height 0.9\textheight
+   [[./hierarchy.pdf]]
+** Chassis (Front)
+   #+ATTR_LATEX: :height 0.9\textheight
+   [[./chassis.jpg]]
+** Chassis (Back)
+   #+ATTR_LATEX: :height 0.9\textheight
+   [[./chassis_back.jpg]]
+** Compute Node
+   #+ATTR_LATEX: :height 0.9\textheight
+   [[./blade.jpg]]
+** Socket
+   #+ATTR_LATEX: :height 0.9\textheight
+   [[./socket.jpg]]
+** Core
+   #+ATTR_LATEX: :height 0.9\textheight
+   [[./Nehalem.jpg]]
 
 * Fundamentals of parallelization
@@ -54,7 +246,6 @@ but every problem is not parallelizable at all levels.
 
 ** Data movement
 
-
 * OpenMP
 
 * Message Passing Interface (MPI)
@@ -83,7 +274,6 @@ but every problem is not parallelizable at all levels.
 
 : /home/scemama/MEGA/TEX/Cours/TCCM/TCCM2022/Parallelism/parallelism_scemama.pdf
 
-
 * Figures :noexport:
 
 #+BEGIN_SRC dot :output file :file interfaces.png
diff --git a/prace.png b/prace.png
deleted file mode 100644
index b4dd5b9..0000000
Binary files a/prace.png and /dev/null differ
diff --git a/top500.png b/top500.png
deleted file mode 100644
index e1a30af..0000000
Binary files a/top500.png and /dev/null differ
diff --git a/top500_95.png b/top500_95.png
index 22456ec..a95e3f5 100644
Binary files a/top500_95.png and b/top500_95.png differ
diff --git a/z840_gallery_img4_tcm245_2164103_tcm245_1871309_tcm245-2164103.jpg b/z840_gallery_img4_tcm245_2164103_tcm245_1871309_tcm245-2164103.jpg
index 704d67a..ea5355b 100644
Binary files a/z840_gallery_img4_tcm245_2164103_tcm245_1871309_tcm245-2164103.jpg and b/z840_gallery_img4_tcm245_2164103_tcm245_1871309_tcm245-2164103.jpg differ