TCCM2021/parallelism_scemama.org
2021-11-21 17:29:19 +01:00

Fundamentals of parallelization

#+LaTeX_CLASS_OPTIONS:[aspectratio=169]

Multi-threading

Processes vs threads

Process

  • Has its own memory address space
  • Context switching between processes is slow
  • Processes interact only through system-provided communication mechanisms
  • Fork: creates a copy of the current process
  • Exec: switches to running another binary executable
  • Spawn: Fork, then exec the child
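
The points above can be illustrated with a minimal sketch, assuming a POSIX system where `os.fork` is available. The names `value` and `fork_demo` are hypothetical and chosen only for this illustration. The child modifies its own copy of `value` and sends it back through a pipe, which is one of the system-provided communication mechanisms.

```python
import os

value = 0  # lives in the address space of the current process

def fork_demo():
    """Fork a child that modifies `value`; return (child's value, parent's value)."""
    global value
    read_fd, write_fd = os.pipe()       # system-provided communication channel
    pid = os.fork()                     # the child starts as a copy of the parent
    if pid == 0:                        # child process
        os.close(read_fd)
        value += 1                      # changes only the child's copy
        os.write(write_fd, str(value).encode())
        os.close(write_fd)
        os._exit(0)                     # exit without running the parent's code
    os.close(write_fd)                  # parent process
    child_value = int(os.read(read_fd, 16).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)                  # wait for the child to terminate
    return child_value, value

if __name__ == "__main__":
    print(fork_demo())
```

The call returns `(1, 0)`: the increment done by the child is not visible in the parent, because each process has its own memory address space.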

Thread

  • Exists as a subset of a process
  • Context switching between threads is fast
  • Shares the memory address space of its process: threads interact via shared memory
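
As a minimal sketch of interaction via shared memory, the following assumes a hypothetical `Counter` class updated by several threads. All threads see the same `value`, so the update is protected with a `threading.Lock` to keep it consistent.

```python
import threading

class Counter:
    """Hypothetical shared object: every thread updates the same `value`."""
    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()

    def add(self, n):
        for _ in range(n):
            with self.lock:             # only one thread at a time updates value
                self.value += 1

def run_counter(n_threads=4, n_increments=10_000):
    counter = Counter()
    threads = [threading.Thread(target=counter.add, args=(n_increments,))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                        # wait until every thread has finished
    return counter.value

if __name__ == "__main__":
    print(run_counter())
```

With the default arguments the result is 40000, the sum of the increments made by the four threads on the same object.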

Threads

[Figure: smp.png]

  • Concurrent programming
  • Graphical user interfaces (progress bars, …)
  • Asynchronous I/O
  • Standard library: POSIX threads (pthreads)
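
The asynchronous I/O use case can be sketched as follows, using the `concurrent.futures.ThreadPoolExecutor` class of the Python standard library. The function `fake_read` is a hypothetical stand-in for a blocking I/O call: it only sleeps. The point of the sketch is that the waiting times overlap when each call runs in its own thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_read(delay):
    """Hypothetical stand-in for a blocking I/O call: sleeps, then returns."""
    time.sleep(delay)
    return delay

def overlapped_reads(delays):
    """Run all the blocking calls in separate threads; return (results, elapsed time)."""
    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        results = list(pool.map(fake_read, delays))   # results keep the input order
    return results, time.perf_counter() - t0

if __name__ == "__main__":
    results, elapsed = overlapped_reads([0.2, 0.2, 0.2, 0.2])
    print(results, "in %.2f seconds" % elapsed)
```

Four calls of 0.2 seconds each would take 0.8 seconds one after the other. Run in threads, the total should stay close to the duration of a single call.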

Communication time

  • Latency of a low-latency network: 1.2 microseconds
  • Random memory access: 0.1 microsecond
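
To put these two numbers in perspective, here is a small worked calculation. The constants are the values quoted above, expressed in nanoseconds. The function names and the count of one million exchanges are assumptions made for the illustration.

```python
NETWORK_LATENCY_NS = 1200   # 1.2 microseconds, value quoted above
MEMORY_ACCESS_NS = 100      # 0.1 microsecond, value quoted above

def latency_ratio():
    """How many random memory accesses fit in one network latency."""
    return NETWORK_LATENCY_NS / MEMORY_ACCESS_NS

def total_time_s(n_exchanges, latency_ns):
    """Time in seconds spent only on latency for n_exchanges exchanges."""
    return n_exchanges * latency_ns * 1e-9

if __name__ == "__main__":
    print("ratio:", latency_ratio())
    print("network:", total_time_s(1_000_000, NETWORK_LATENCY_NS), "s")
    print("memory: ", total_time_s(1_000_000, MEMORY_ACCESS_NS), "s")
```

One network exchange costs as much as 12 random memory accesses: a million small messages spend about 1.2 seconds in latency, against about 0.1 second for a million memory accesses.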

Threads example (Python)

#!/usr/bin/env python
import threading
import time

class test:
 def __init__(self, Nthreads):
     self.Nthreads = Nthreads
     self.data = [ i for i in range(Nthreads) ]

 def run_thread(self, j):
     self.data[j] = 0
     time.sleep(j)
     self.data[j] = j

Threads example (Python)

 def run(self):
     thread = [ None ] * self.Nthreads
     t0 = time.time()
     print(self.data)
     for i in range(self.Nthreads):
         thread[i] = threading.Thread( target=self.run_thread, args=(i,)  )
         thread[i].start()
     for i in range(self.Nthreads):
         thread[i].join()
         print(time.time()-t0, "seconds. ", self.data)

if __name__ == '__main__':
    t = test(4)
    t.run()

$ python thread_python.py 
[0, 1, 2, 3]
0.0009775161743164062 seconds.  [0, 0, 0, 0]
1.0018701553344727 seconds.  [0, 1, 0, 0]
2.003377676010132 seconds.  [0, 1, 2, 0]
3.004056930541992 seconds.  [0, 1, 2, 3]

Computation of π with threads in Python

#!/usr/bin/env python
import threading
from random import random
from math import sqrt

NMAX = 10000000            # Number of MC steps per thread
error_threshold = 1.0e-4   # Stopping criterion

class pi_calculator:
 def __init__(self, Nthreads):
     self.Nthreads= Nthreads
     self.results = []
     self.lock = threading.Lock()

 def compute_pi(self):
     result = 0.
     for i in range(NMAX):         # Loop NMAX times
         x,y = random(), random()   # Draw 2 random numbers x and y
         if x*x + y*y <= 1.:        # Check if (x,y) is in the circle
             result += 1
     with self.lock:
         self.results.append(4.* float(result)/float(NMAX))

Computation of π with threads in Python

 def run(self):
    thread = [None] * self.Nthreads
    for i in range(self.Nthreads):
       thread[i] = threading.Thread( target=self.compute_pi, args=()  )
       thread[i].start()
    print("All threads started")
    
    while True:
       for i in range(self.Nthreads):
          thread[i].join()
       N = len(self.results)
       average = sum(self.results)/N             # Compute average
       if N > 2:                         # Compute variance
          l = [ (x-average)*(x-average) for x in self.results ]
          variance = sum(l)/(N-1.)
       else:
          variance = 0.
       error = sqrt(variance)/sqrt(N)    # Compute error
       print("%f +/- %f %d"%(average, error, N))

Computation of π with threads in Python

       if N > 2 and error < error_threshold:  # Stopping condition
          return

       for i in range(self.Nthreads):
          thread[i] = threading.Thread( target=self.compute_pi, args=()  )
          thread[i].start()

if __name__ == '__main__':
    calc = pi_calculator(4)
    calc.run()

Note: This is inefficient in CPython because the Global Interpreter Lock (GIL) lets only one thread execute Python bytecode at a time, but you get the idea.
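
One way to sidestep the GIL is to use processes instead of threads. The following is a minimal sketch with the `multiprocessing` module, assuming a POSIX system where the `fork` start method is available. The names `count_hits`, `worker` and `parallel_pi`, the per-process seeds and the sample count are assumptions made for this illustration; this is not the implementation used in the course.

```python
import multiprocessing as mp
from random import Random

def count_hits(seed, n):
    """Count how many of n random points of the unit square fall in the quarter circle."""
    rng = Random(seed)                  # one independent generator per process
    hits = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x*x + y*y <= 1.0:
            hits += 1
    return hits

def worker(seed, n, queue):
    queue.put(count_hits(seed, n))      # send the result back to the parent

def parallel_pi(n_procs=4, n_per_proc=100_000):
    ctx = mp.get_context("fork")        # assumption: POSIX, fork start method
    queue = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(seed, n_per_proc, queue))
             for seed in range(n_procs)]
    for p in procs:
        p.start()
    hits = sum(queue.get() for _ in procs)   # collect one result per process
    for p in procs:
        p.join()
    return 4.0 * hits / (n_procs * n_per_proc)

if __name__ == "__main__":
    print(parallel_pi())
```

Each process has its own interpreter and its own GIL, so the loops really run at the same time. The price is that results have to be sent back explicitly, here through a queue, instead of being appended to a shared list.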

OpenMP

Exercises

Monte Carlo

  1. Write a Fortran function, double precision function compute_pi(M), that computes $\pi$ with the Monte Carlo algorithm using $M$ samples
  2. Call it like this:

    program pi_mc
    implicit none
    integer          :: M
    logical          :: iterate
    double precision :: sample
    double precision, external   :: compute_pi
    
    call random_seed()  ! Initialize the random number generator
    read (*,*) M        ! Read the number of samples used by each call to compute_pi
    
    iterate = .True.
    do while (iterate)  ! Compute pi with M samples until the user enters .False.
      sample = compute_pi(M)
      write(*,*) sample
      read (*,*) iterate
    end do
    end program pi_mc

Monte Carlo (solution)

double precision function compute_pi(M)
implicit none
integer, intent(in) :: M
double precision    :: x, y, n_in
integer             :: i

n_in = 0.d0
do i=1, M
  call random_number(x)
  call random_number(y)
  if (x*x + y*y <= 1.d0) then
     n_in = n_in+1.d0
  end if
end do
compute_pi = 4.d0*n_in/dble(M)

end function compute_pi
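
The reasoning behind this solution can be written as a short derivation, assuming that $x$ and $y$ are independent and uniformly distributed in $[0,1)$, as returned by random_number. Here $N_{\text{in}}$ stands for the variable n_in, and $\sigma$ is the expected statistical error of a single call to compute_pi.

```latex
\[
P\left(x^2 + y^2 \le 1\right) = \frac{\pi}{4}
\quad\Longrightarrow\quad
\pi \approx \frac{4\, N_{\text{in}}}{M}
\]
\[
\sigma = 4\sqrt{\frac{p\,(1-p)}{M}}
       = \sqrt{\frac{\pi\,(4-\pi)}{M}}
       \approx \frac{1.64}{\sqrt{M}},
\qquad p = \frac{\pi}{4}
\]
```

The error decreases only as $1/\sqrt{M}$: gaining one more digit of $\pi$ requires about 100 times more samples.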