Fundamentals of parallelization

Inter-process communication

Processes vs threads

Process

  • Has its own memory address space
  • Context switching between processes is slow
  • Processes interact only through system-provided communication mechanisms
  • Fork: creates a copy of the current process
  • Exec: switches to running another binary executable
  • Spawn: fork, then exec in the child
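The fork/exec/spawn sequence above can be sketched in Python with the standard `os` module. The helper name `spawn` is hypothetical; it forks, replaces the child with another executable, and lets the parent wait for the exit code. This is only a minimal sketch for Unix-like systems.

```python
import os
import sys

def spawn(argv):
    """Fork, exec argv in the child, return the child's exit code (hypothetical helper)."""
    pid = os.fork()                 # Fork: the child is a copy of this process
    if pid == 0:                    # Child process
        try:
            os.execvp(argv[0], argv)   # Exec: replace the child by another binary
        finally:
            os._exit(127)           # Only reached if exec failed
    _, status = os.waitpid(pid, 0)  # Parent: wait for the child to terminate
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1

if __name__ == "__main__":
    # Run another Python interpreter that exits with code 3
    code = spawn([sys.executable, "-c", "import sys; sys.exit(3)"])
    print("Child exited with code %d" % code)
```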

Thread

  • Exists as a subset of a process
  • Context switching between threads is fast
  • Threads share the same memory address space: they interact via shared memory
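As an illustration of interaction via shared memory, here is a minimal sketch with Python's `threading` module: all threads update the same counter object, and a lock protects the update. The function name `count_with_threads` and its parameters are chosen only for this example.

```python
import threading

def count_with_threads(n_threads=4, n_increments=10000):
    """All threads increment the same counter, which lives in shared memory."""
    counter = {"value": 0}          # Visible to every thread of the process
    lock = threading.Lock()         # Protects the read-modify-write update

    def worker():
        for _ in range(n_increments):
            with lock:
                counter["value"] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]

if __name__ == "__main__":
    print(count_with_threads())
```

No data is copied or sent: each thread reads and writes the same object directly, which is why a lock is needed.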

Inter-process communication

  • Processes exchange data via read/write operations on file descriptors
  • These file descriptors can point to:

    • Files on disk: the simplest choice
    • Named pipes: the program is the same as with files
    • Pipes: same behavior as files
    • Network sockets
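The sketch below illustrates that the same read/write calls work whatever the file descriptor points to. The helper names (`roundtrip`, `via_file`, `via_pipe`, `via_socket`) are hypothetical, and for brevity both ends live in a single process, whereas in real use the two ends belong to different processes.

```python
import os
import socket
import tempfile

def roundtrip(write_fd, read_fd, message=b"hello"):
    """Write on one descriptor and read back from the other."""
    os.write(write_fd, message)
    return os.read(read_fd, len(message))

def via_file():
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "data")
        w = os.open(path, os.O_WRONLY | os.O_CREAT)
        r = os.open(path, os.O_RDONLY)
        try:
            return roundtrip(w, r)
        finally:
            os.close(w)
            os.close(r)

def via_pipe():
    r, w = os.pipe()
    try:
        return roundtrip(w, r)
    finally:
        os.close(r)
        os.close(w)

def via_socket():
    a, b = socket.socketpair()      # Two connected local sockets
    try:
        return roundtrip(a.fileno(), b.fileno())
    finally:
        a.close()
        b.close()

if __name__ == "__main__":
    print(via_file(), via_pipe(), via_socket())
```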

Named pipes

A named pipe is a virtual file which is read by a process and written by other processes. It allows processes to communicate using standard I/O operations:

/scemama/TCCM2021/media/commit/69ca8512da61f6fa1ddef637c17321ddd1763cfd/pipe.png
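A minimal Python sketch of a named pipe, assuming a Unix-like system: `os.mkfifo` creates the pipe in a temporary directory, and it is then used with the ordinary `open`/`read`/`write` calls. To keep the example self-contained, a thread stands in for the second process; the name `fifo_roundtrip` is hypothetical.

```python
import os
import tempfile
import threading

def fifo_roundtrip(message="Hello!"):
    """Send a message through a named pipe and return what the reader gets."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "pipe")
        os.mkfifo(path)                   # Create the named pipe

        def writer():                     # Stands in for another process
            with open(path, "w") as f:    # Blocks until a reader opens the pipe
                f.write(message)

        t = threading.Thread(target=writer)
        t.start()
        with open(path, "r") as f:        # Blocks until a writer opens the pipe
            received = f.read()           # Returns when the writer closes
        t.join()
        return received

if __name__ == "__main__":
    print(fifo_roundtrip())
```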

Example

/scemama/TCCM2021/media/commit/69ca8512da61f6fa1ddef637c17321ddd1763cfd/pipe_example.png

  • Unzip input.gz
  • Sort the unzipped file
  • Zip the result into output.gz

Same as

$ gunzip --to-stdout input.gz \
 | sort \
 | gzip > output.gz
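The shell builds such a pipeline by connecting the standard output of each command to the standard input of the next one. A rough Python equivalent, using `subprocess.Popen`, is sketched below. The helper `run_pipeline` is hypothetical and only suitable for small inputs, since all the input is written before any output is read.

```python
import subprocess
import sys

def run_pipeline(commands, input_bytes=b""):
    """Run commands as `c1 | c2 | ...`, feed input_bytes to c1, return the output of the last one."""
    procs = []
    for i, cmd in enumerate(commands):
        stdin = subprocess.PIPE if i == 0 else procs[-1].stdout
        p = subprocess.Popen(cmd, stdin=stdin, stdout=subprocess.PIPE)
        if i > 0:
            procs[-1].stdout.close()   # The parent drops its copy of the pipe
        procs.append(p)
    procs[0].stdin.write(input_bytes)
    procs[0].stdin.close()             # Signals end-of-file to the first command
    output = procs[-1].stdout.read()
    for p in procs:
        p.wait()
    return output

if __name__ == "__main__":
    # Two toy stages written in Python: upper-case, then reverse
    upper = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    rev = [sys.executable, "-c",
           "import sys; sys.stdout.write(sys.stdin.read()[::-1])"]
    print(run_pipeline([upper, rev], b"abc"))
```

With the commands of the example, the stages would be `["gunzip", "--to-stdout", "input.gz"]`, `["sort"]` and `["gzip"]`, with the returned bytes written to `output.gz`.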

Example: Program 1

#!/bin/bash

# Create two pipes using the mkfifo command
mkfifo /tmp/pipe /tmp/pipe2

# Unzip the input file and write the result
# in the 1st pipe
echo "Run gunzip"
gunzip --to-stdout input.gz > /tmp/pipe

# Zip what comes from the second pipe
echo "Run gzip"
gzip < /tmp/pipe2 > output.gz

# Remove the pipes from the file system
rm /tmp/pipe /tmp/pipe2

Example: Program 2

#!/bin/bash

# Read the 1st pipe, sort the result and write
# in the 2nd pipe
echo "Run sort"
sort < /tmp/pipe > /tmp/pipe2

Example: Execution

$ ./p1.sh &
Run gunzip
$ ./p2.sh
Run sort
Run gzip
[1]+  Done                    ./p1.sh

Pipe

Fork: copy the current process

Pipes don't have an entry on the file system, and are opened/closed in the programs.

  1. Create the pipe
  2. Fork into parent and child processes
  3. Exchange data between parent and child

/scemama/TCCM2021/media/commit/69ca8512da61f6fa1ddef637c17321ddd1763cfd/fork.png

Pipe

#!/usr/bin/env python
import sys,os

def main():
    print("Process ID: %d" % (os.getpid()))
    r, w = os.pipe()
    new_pid = os.fork()

    if new_pid != 0:  # This is the parent process
        print("I am the parent, my PID is %d"%(os.getpid()))
        print("and the PID of my child is %d"%(new_pid))

        # Close write and open read file descriptors
        os.close(w)
        r = os.fdopen(r,'r')

        # Read data from the child
        print("Reading from the child")
        s = r.read()
        r.close()
        print("Read '%s' from the child"%(s))

/scemama/TCCM2021/media/commit/69ca8512da61f6fa1ddef637c17321ddd1763cfd/fork.png

Pipe

    else:   # This is the child process
        print("  I am the child, my PID is %d"%(os.getpid()))

        # Close read and open write file descriptors
        os.close(r)
        w = os.fdopen(w,'w')

        # Send 'Hello' to the parent
        print("  Sending 'Hello' to the parent")
        w.write( "Hello!" )
        w.close()

        print("  Sent 'Hello'")

if __name__ == "__main__":
    main()

/scemama/TCCM2021/media/commit/69ca8512da61f6fa1ddef637c17321ddd1763cfd/fork.png

Computation of $\pi$

/scemama/TCCM2021/media/commit/69ca8512da61f6fa1ddef637c17321ddd1763cfd/pi.png

  • The area of a disk is $\pi r^2$ $\Longrightarrow$ for the unit disk, the area is $\pi$
  • The curve in the red square is $y = \sqrt{1-x^2}$ (the circle is $\sqrt{x^2 + y^2} = 1$)
  • The area in grey corresponds to \[ \int_0^1 \sqrt{1-x^2}\, dx = \frac{\pi}{4} \]
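The integral can be checked numerically. The sketch below uses a simple midpoint rule; the function name `quarter_disk_area` and the number of intervals are arbitrary choices for this illustration, not part of the Monte Carlo method used in the rest of this section.

```python
from math import sqrt, pi

def quarter_disk_area(n=100000):
    """Midpoint-rule estimate of the integral of sqrt(1-x^2) over [0,1]."""
    h = 1.0 / n                     # Width of each sub-interval
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h           # Midpoint of the i-th sub-interval
        total += sqrt(1.0 - x * x)
    return h * total

if __name__ == "__main__":
    estimate = 4.0 * quarter_disk_area()
    print("4 * integral =", estimate, " error =", abs(estimate - pi))
```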

Monte Carlo computation of $\pi$

/scemama/TCCM2021/media/commit/69ca8512da61f6fa1ddef637c17321ddd1763cfd/pi_mc.png

  • Points $(x,y)$ are drawn randomly in the unit square
  • Count how many points fall inside the circle: \[ \frac{N_{\text{in}}}{N_{\text{in}}+N_{\text{out}}} \approx \frac{\pi}{4} \]

Optimal algorithm

  • Each core $1 \le i \le M$ computes its own average $X_i$
  • All $M$ results are independent, and each $X_i$ is an average over many samples $\Longrightarrow$ the $X_i$ are Gaussian-distributed random variables (central-limit theorem)
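Since the $X_i$ are independent and Gaussian-distributed, they can be combined with the usual estimators for the mean and its statistical error. The symbols $\bar{X}$, $\sigma^2$ and $\delta$ are notation introduced here for illustration:

```latex
\[
\bar{X} = \frac{1}{M} \sum_{i=1}^{M} X_i, \qquad
\sigma^2 = \frac{1}{M-1} \sum_{i=1}^{M} \left( X_i - \bar{X} \right)^2, \qquad
\delta = \frac{\sigma}{\sqrt{M}}
\]
\[
\pi \approx \bar{X} \pm \delta
\]
```

The error $\delta$ decreases as $1/\sqrt{M}$, so more independent results give a better estimate.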

/scemama/TCCM2021/media/commit/69ca8512da61f6fa1ddef637c17321ddd1763cfd/pi_convergence.png

Computation of $\pi$ with pipes in Python

import os, sys
from random import random, seed
from math import sqrt

NMAX = 10000000            # Nb of MC steps/process
error_threshold = 1.0e-4   # Stopping criterion
NPROC=4                    # Use 4 processes

def compute_pi():
    """Local Monte Carlo calculation of pi"""
    seed(None)  # Initialize random number generator

    result = 0.
    for i in range(NMAX):          # Loop NMAX times
        x,y = random(), random()   # Draw 2 random numbers x and y
        if x*x + y*y <= 1.:        # Check if (x,y) is in the circle
            result += 1
    return 4.* float(result)/float(NMAX) # Estimate of pi

Computation of $\pi$ with pipes in Python

def main():
    r = [None]*NPROC     # Read ends of the pipes
    pid = [None]*NPROC   # Running processes

    for i in range(NPROC):
        r[i], w = os.pipe()    # Create the pipe
        pid[i] = os.fork()     # Fork and save the PIDs

        if pid[i] != 0:        # This is the parent process
            os.close(w)
            r[i] = os.fdopen(r[i],'r')
        else:                  # This is the child process
            os.close(r[i])
            w = os.fdopen(w,'w')
            while True:        # Compute pi on this process
                X = compute_pi()
                try:
                    w.write("%f\n"%(X))  # Write the result in the pipe
                    w.flush()
                except IOError:          # Child process exits here
                    sys.exit(0)

Computation of $\pi$ with pipes in Python

    data = []
    while True:
        for i in range(NPROC): # Read in the pipe of each process
            data.append( float(r[i].readline()) )
            N = len(data)
            average = sum(data)/N             # Compute average
            if N > 2:                         # Compute variance
                l = [ (x-average)*(x-average) for x in data ]
                variance = sum(l)/(N-1.)
            else:
                variance = 0.
            error = sqrt(variance)/sqrt(N)    # Compute error
            print(f"{average} +/- {error}  {N}")

            if N > 2 and error < error_threshold:  # Stopping condition
                for j in range(NPROC):             # Kill all children
                    try: os.kill(pid[j], 9)        # 9 is SIGKILL
                    except OSError: pass
                sys.exit(0)

if __name__ == '__main__':
    main()

Message Passing Interface (MPI)

OpenMP

Exercises

Monte Carlo

  1. Write a Fortran function
    double precision function compute_pi(M)
    that computes $\pi$ with the Monte Carlo algorithm using $M$ samples
  2. Call it like this:

    program pi_mc
      implicit none
      integer          :: M
      logical          :: iterate
      double precision :: sample
      double precision, external   :: compute_pi

      call random_seed()  ! Initialize random number generator
      read (*,*) M        ! Read number of samples in compute_pi

      iterate = .True.
      do while (iterate)  ! Compute pi over M samples until 'iterate=.False.'
        sample = compute_pi(M)
        write(*,*) sample
        read (*,*) iterate
      end do
    end program pi_mc

Monte Carlo (solution)

double precision function compute_pi(M)
  implicit none
  integer, intent(in) :: M
  double precision    :: x, y, n_in
  integer             :: i

  n_in = 0.d0
  do i=1, M
    call random_number(x)
    call random_number(y)
    if (x*x + y*y <= 1.d0) then
      n_in = n_in+1.d0
    end if
  end do
  compute_pi = 4.d0*n_in/dble(M)

end function compute_pi