#+startup: beamer
#+options: H:2
* Program :noexport:

** Tuesday afternoon

I can have them:

- parallelize a matrix product with OpenMP in Fortran/C
- compute pi with a Monte Carlo algorithm using MPI (master-worker), first in Python and then in Fortran/C
- compute an integral in R^3 on a grid of points with MPI in Fortran/C
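The serial building block of the pi exercise can be sketched as below (the function name is illustrative); in the MPI version, each worker would run this with a different seed and the master would combine the hit counts with MPI_Reduce.

```python
import random

# Monte Carlo estimate of pi: the fraction of uniform points in the unit
# square that fall inside the quarter disk tends to pi/4.
def pi_monte_carlo(n_samples, seed=0):
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_samples)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * hits / n_samples

print(pi_monte_carlo(1_000_000))  # ≈ 3.14
```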

This will only give them the basics with OMP DO and MPI_Reduce, and we will
not be able to go much further. But they will still be able to use all 112
cores of the cluster.

Caveat: we must also anticipate that they may never have used a cluster, and
they probably do not know what one is. So I will have to include a bit of
hardware in my presentation on parallelism and explain that they need to run
sbatch to launch a computation.

SLURM is the batch manager, right?
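A minimal SLURM submission script could be shown for this; the module and executable names are placeholders, and 2 nodes x 56 tasks matches the 112 cores mentioned above.

```shell
#!/bin/bash
# Minimal SLURM job script (module and executable names are placeholders).
#SBATCH --job-name=pi_mc
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=56    # 2 nodes x 56 tasks = 112 cores
#SBATCH --time=00:10:00

module load openmpi             # hypothetical module name
srun ./pi_mc                    # run the MPI executable on all tasks
```

Submitted with `sbatch job.sh`; `squeue` then shows the job in the queue.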

** Wednesday

For IRPF90, I can give a fairly general presentation.

I already have a tutorial for writing a molecular dynamics program with a
Lennard-Jones potential; I think that will be easier since there is not much
time for the hands-on sessions.

If they move fast enough, we can then switch to the OpenMP parallelization of
the loops in the code, and to running multiple trajectories with MPI, reusing
the pi parallelization model from the previous day.

For QP, I think it would be good to present it in 15 minutes once they have
done the IRPF90 tutorial. Then we can demo how to implement an SCF in 10
minutes, but I do not think we will have time to let them do the work in QP
themselves, and this avoids compilation problems on the machines. We can also
give access to the account where QP is installed to those who move very fast
and want to try playing with it.

* Supercomputers

** Computers

#+LATEX: \begin{columns}
#+LATEX: \begin{column}{0.7\textwidth}
#+LATEX: \begin{exampleblock}{Today (order of magnitude)}
- 1 *socket* (x86 CPU @ 2.2-3.3 GHz, *4 cores*, hyperthreading)
- \sim 4-16 GB RAM
- \sim 500 GB SSD
- Graphics card: AMD Radeon, Nvidia GeForce
- Gigabit Ethernet
- USB, webcam, sound card, etc.
- \sim 500 euros
#+LATEX: \end{exampleblock}
#+LATEX: \end{column}
#+LATEX: \begin{column}{0.3\textwidth}
#+ATTR_LATEX: :width \textwidth
[[./desktop-inspiron-MT-3650-pdp-module-1.jpg]]
#+LATEX: \end{column}
#+LATEX: \end{columns}

** Computer designed for computation

#+LATEX: \begin{columns}
#+LATEX: \begin{column}{0.6\textwidth}
#+LATEX: \begin{exampleblock}{Today (order of magnitude)}
- 2 sockets (x86 CPU @ 2.2 GHz, 32 cores/socket, hyperthreading)
- 64-128 GB RAM
- Multiple SSDs (RAID0)
- Gigabit Ethernet
- Possibly an accelerator (Nvidia Volta/Ampere)
- \sim 5k euros
#+LATEX: \end{exampleblock}
#+LATEX: \end{column}
#+LATEX: \begin{column}{0.4\textwidth}
#+ATTR_LATEX: :width \textwidth
[[./z840_gallery_img4_tcm245_2164103_tcm245_1871309_tcm245-2164103.jpg]]
#+LATEX: \end{column}
#+LATEX: \end{columns}

** Cluster

#+LATEX: \begin{columns}
#+LATEX: \begin{column}{0.6\textwidth}
#+LATEX: \begin{exampleblock}{}
- Many computers designed for computation
- Compact (1-2U of rack space) per machine
- Network switches
- Login server
- Batch queuing system (SLURM / PBS / SGE / LSF)
- Cheap cooling system
- Requires a lot of electrical power (\sim 10 kW/rack)
- Possibly a low-latency / high-bandwidth network (InfiniBand or 10 Gb Ethernet)
- >50k euros
#+LATEX: \end{exampleblock}
#+LATEX: \end{column}
#+LATEX: \begin{column}{0.4\textwidth}
#+ATTR_LATEX: :width \textwidth
[[./img_20160510_152246_resize.jpg]]
#+LATEX: \end{column}
#+LATEX: \end{columns}

** Supercomputer

#+LATEX: \begin{columns}
#+LATEX: \begin{column}{0.6\textwidth}
#+LATEX: \begin{exampleblock}{}
- Many computers designed for computation
- Very compact (<1U of rack space) per machine
- Low-latency / high-bandwidth network (InfiniBand or 10 Gb Ethernet)
- Network switches
- Parallel filesystem for scratch space (Lustre / BeeGFS / GPFS)
- Multiple login servers
- Batch queuing system (SLURM / PBS / SGE / LSF)
- Highly efficient cooling system (water)
- Requires a lot of electrical power (>100 kW)
#+LATEX: \end{exampleblock}
#+LATEX: \end{column}
#+LATEX: \begin{column}{0.4\textwidth}
#+ATTR_LATEX: :width \textwidth
[[./Eos.png]]
#+LATEX: \end{column}
#+LATEX: \end{columns}

** Definitions

- Top500 :: Ranking of the 500 fastest supercomputers
- Flop :: Floating-point operation
- Flops :: Flop/s, the number of floating-point operations per second
- RPeak :: Peak performance, the maximum possible number of Flops
- RMax :: Measured performance on the Linpack benchmark (dense linear system solve)
- SP :: Single precision (32-bit floats)
- DP :: Double precision (64-bit floats)
- FPU :: Floating-Point Unit
- FMA :: Fused multiply-add ($a \times x + b$ in one instruction)

** Quantifying performance

#+LATEX: \begin{exampleblock}{Example}
*RPeak* of the Intel Xeon Gold 6140 processor:
- 18 cores
- 2.3 GHz
- 2 FPUs
- 8 FMA (DP)/FPU/cycle (1 FMA = 2 flops)

$18 \times 2.3 \times 10^9 \times 2 \times 8 \times 2 = 1.3$ TFlops (DP)
#+LATEX: \end{exampleblock}

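The RPeak arithmetic above can be checked with a short script (the function name is illustrative); the final factor of 2 counts the multiply and the add of each FMA.

```python
# Peak DP performance: cores x clock x FPUs x FMA/cycle x 2 flops per FMA.
def rpeak_dp(cores, clock_ghz, fpus, fma_per_cycle):
    return cores * clock_ghz * 1e9 * fpus * fma_per_cycle * 2  # flops/s

# Intel Xeon Gold 6140, values from the example above
peak = rpeak_dp(cores=18, clock_ghz=2.3, fpus=2, fma_per_cycle=8)
print(f"{peak / 1e12:.2f} TFlops (DP)")  # → 1.32 TFlops (DP)
```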
- Number of hours :: 730/month, 8760/year
- Units :: Kilo (K), Mega (M), Giga (G), Tera (T), Peta (P), Exa (E), ...

** Top500 (1996)

#+ATTR_LATEX: :height 0.9\textheight
[[./top500_95.png]]

** Top500 (2021)

#+ATTR_LATEX: :height 0.9\textheight
[[./top500_21.png]]

https://www.top500.org/lists/top500/2021/11/

** Curie thin nodes (TGCC, France)

Ranked 9th in 2012, 77 184 cores, 1.7 PFlops, 2.1 MW

#+ATTR_LATEX: :height 0.8\textheight
[[./tgcc.jpg]]

** Mare Nostrum (BSC, Spain)

Ranked 13th in 2017, 153 216 cores, 6.5 PFlops, 1.6 MW

#+ATTR_LATEX: :height 0.8\textheight
[[./marenostrum.jpg]]

** Architecture

#+ATTR_LATEX: :height 0.9\textheight
[[./hierarchy.pdf]]

** Chassis (Front)

#+ATTR_LATEX: :height 0.9\textheight
[[./chassis.jpg]]

** Chassis (Back)

#+ATTR_LATEX: :height 0.9\textheight
[[./chassis_back.jpg]]

** Compute Node

#+ATTR_LATEX: :height 0.9\textheight
[[./blade.jpg]]

** Socket

#+ATTR_LATEX: :height 0.9\textheight
[[./socket.jpg]]

** Core

#+ATTR_LATEX: :height 0.9\textheight
[[./Nehalem.jpg]]

* Fundamentals of parallelization

** Data movement

* OpenMP

* Message Passing Interface (MPI)

: /home/scemama/MEGA/TEX/Cours/TCCM/TCCM2022/Parallelism/parallelism_scemama.pdf

* Figures :noexport:

#+BEGIN_SRC dot :output file :file interfaces.png