
Fundamentals of parallelization


Message Passing Interface (MPI)

Message Passing Interface

  • Application Programming Interface for inter-process communication
  • Takes advantage of HPC hardware:

    • TCP/IP: 50 $\mu \text{s}$ latency
    • Remote Direct Memory Access (RDMA): <2 $\mu \text{s}$ (low-latency network)
  • Portable
  • Each vendor has its own implementation adapted to the hardware
  • Standard in HPC
  • Initially designed for a fixed number of processes:

    • Peer discovery is not a problem
    • Fast collective communications
  • Single Program Multiple Data (SPMD) paradigm

Communicators

  • Group of processes that can communicate together
  • Each process has an ID in the communicator: no need for IP addresses and port numbers
  • MPI_COMM_WORLD: Global communicator, default
  • size: number of processes in the communicator
  • rank: ID of the process in the communicator

Point-to-point communication

Python

  • Send: comm.send(data, dest, tag)
  • Receive: comm.recv(source, tag)

Fortran

  • Send: MPI_SEND(buffer, count, datatype, destination, tag, communicator, ierror)
  • Receive: MPI_RECV(buffer, count, datatype, source, tag, communicator, status, ierror)

Point-to-point communication (Python)

from mpi4py import MPI

def main():
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    if rank == 0:
        data = 42
        print("Before: Rank: %d    Size: %d    Data: %d"%(rank, size, data))
        comm.send(data, dest=1, tag=11)
        print("After : Rank: %d    Size: %d    Data: %d"%(rank, size, data))
    elif rank == 1:
        data = 0
        print("Before: Rank: %d    Size: %d    Data: %d"%(rank, size, data))
        data = comm.recv(source=0, tag=11)
        print("After : Rank: %d    Size: %d    Data: %d"%(rank, size, data))

if __name__ == "__main__": main()

Point-to-point communication (Python)

$ mpiexec -n 4 python mpi_rank.py 
Before: Rank: 0    Size: 4    Data: 42
Before: Rank: 1    Size: 4    Data: 0
After : Rank: 0    Size: 4    Data: 42
After : Rank: 1    Size: 4    Data: 42

In Fortran, compile using mpif90 and execute using mpiexec (or mpirun).
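
For example, assuming the Fortran program shown next is saved as mpi_rank.f90 (the file name is illustrative):

$ mpif90 mpi_rank.f90 -o mpi_rank
$ mpiexec -n 4 ./mpi_rank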

Point-to-point communication (Fortran)

program test_rank
   use mpi
   implicit none
   integer :: rank, size, data, ierr, status(mpi_status_size)

   call MPI_INIT(ierr)      ! Initialize library (required)
   if (ierr /= MPI_SUCCESS) then
      call MPI_ABORT(MPI_COMM_WORLD, 1, ierr)
   end if
   
   call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierr)
   if (ierr /= MPI_SUCCESS) then
      call MPI_ABORT(MPI_COMM_WORLD, 2, ierr)
   end if

   call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
   if (ierr /= MPI_SUCCESS) then
      call MPI_ABORT(MPI_COMM_WORLD, 3, ierr)
   end if

Point-to-point communication (Fortran)

   if (rank == 0) then
      data = 42
      print *, "Before: Rank:", rank, "Size:", size, "Data: ", data
      call MPI_SEND(data, 1, MPI_INTEGER, 1, 11, MPI_COMM_WORLD, ierr)
      print *, "After : Rank:", rank, "Size:", size, "Data: ", data

   else if (rank == 1) then
      data = 0
      print *, "Before: Rank:", rank, "Size:", size, "Data: ", data
      call MPI_RECV(data, 1, MPI_INTEGER, 0, 11, MPI_COMM_WORLD, &
                    status, ierr)
      print *, "After : Rank:", rank, "Size:", size, "Data: ", data

   end if
   call MPI_FINALIZE(ierr)      ! De-initialize library (required)
end program

Collective communications

One-to-all

  • Broadcast: send same data to all
  • Scatter: distribute an array

All-to-one

  • Reduction: Sum/product/… of data coming from all ranks
  • Gather: collect a distributed array

All-to-all

  • Allreduce: reduction followed by a broadcast, so every rank gets the result
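
These operations can be sketched in Fortran; a minimal example (the broadcast value and variable names are illustrative) where rank 0 broadcasts a value to every rank, and the sum of all contributions is then collected back on rank 0:

program test_collective
   use mpi
   implicit none
   integer          :: rank, size, ierr
   double precision :: x, total

   call MPI_INIT(ierr)
   call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierr)
   call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

   ! One-to-all: rank 0 broadcasts x to every rank of the communicator
   if (rank == 0) then
      x = 1.d0
   else
      x = 0.d0
   end if
   call MPI_BCAST(x, 1, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)

   ! All-to-one: sum the contributions of all ranks on rank 0
   call MPI_REDUCE(x, total, 1, MPI_DOUBLE_PRECISION, MPI_SUM, 0, &
                   MPI_COMM_WORLD, ierr)
   if (rank == 0) print *, 'size =', size, '  sum =', total

   call MPI_FINALIZE(ierr)
end program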

Deadlocks
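
With blocking point-to-point calls, a classic deadlock occurs when both ranks send before receiving. A minimal sketch (the array size is illustrative; small messages may still go through internal buffering):

program deadlock
   use mpi
   implicit none
   integer, parameter            :: n = 10000000
   integer                       :: rank, ierr, status(MPI_STATUS_SIZE)
   double precision, allocatable :: a(:), b(:)

   call MPI_INIT(ierr)
   call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
   allocate(a(n), b(n))
   a = dble(rank)

   ! Both ranks block in MPI_SEND, each waiting for a receive that is
   ! never posted: the program hangs.
   if (rank == 0) then
      call MPI_SEND(a, n, MPI_DOUBLE_PRECISION, 1, 0, MPI_COMM_WORLD, ierr)
      call MPI_RECV(b, n, MPI_DOUBLE_PRECISION, 1, 0, MPI_COMM_WORLD, status, ierr)
   else if (rank == 1) then
      call MPI_SEND(a, n, MPI_DOUBLE_PRECISION, 0, 0, MPI_COMM_WORLD, ierr)
      call MPI_RECV(b, n, MPI_DOUBLE_PRECISION, 0, 0, MPI_COMM_WORLD, status, ierr)
   end if

   call MPI_FINALIZE(ierr)
end program

Swapping the send/receive order on one of the ranks, or using MPI_SENDRECV or the non-blocking MPI_ISEND/MPI_IRECV with MPI_WAIT, avoids the deadlock.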

OpenMP
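
As a minimal sketch of the OpenMP shared-memory model (the loop body is illustrative): the iterations of a loop are shared among threads, and a reduction clause combines the per-thread partial sums. Compile with, e.g., gfortran -fopenmp.

program omp_sum
   use omp_lib
   implicit none
   integer          :: i
   double precision :: total

   total = 0.d0
   ! Iterations are distributed over the threads; each thread keeps a
   ! private partial sum, combined into 'total' at the end of the loop.
   !$omp parallel do reduction(+:total)
   do i = 1, 1000000
      total = total + 1.d0/dble(i)**2
   end do
   !$omp end parallel do

   print *, 'Threads:', omp_get_max_threads(), '  Sum:', total
end program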

Exercises

Monte Carlo

  1. Write a Fortran function
    double precision function compute_pi(M)
    that computes $\pi$ with the Monte Carlo algorithm using $M$ samples:
    draw $M$ points uniformly in the unit square and count the fraction
    $n_{\text{in}}/M$ falling inside the quarter circle, so that
    $\pi \approx 4\, n_{\text{in}}/M$
  2. Call it like this:

    program pi_mc
    implicit none
    integer          :: M
    logical          :: iterate
    double precision :: sample
    double precision, external   :: compute_pi
    
    call random_seed()  ! Initialize random number generator
    read (*,*) M        ! Read number of samples in compute_pi
    
    iterate = .True.
    do while (iterate)  ! Compute pi over M samples until 'iterate=.False.'
       sample = compute_pi(M)
       write(*,*) sample
       read (*,*) iterate
    end do
    end program pi_mc

Monte Carlo (solution)

double precision function compute_pi(M)
implicit none
integer, intent(in) :: M
double precision    :: x, y, n_in
integer             :: i

n_in = 0.d0
do i=1, M
  call random_number(x)
  call random_number(y)
  if (x*x + y*y <= 1.d0) then
     n_in = n_in+1.d0
  end if
end do
compute_pi = 4.d0*n_in/dble(M)

end function compute_pi
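
As a possible follow-up (a sketch, not part of the exercise statement): each MPI rank calls compute_pi independently, and rank 0 averages the estimates with a reduction.

program pi_mpi
   use mpi
   implicit none
   integer          :: rank, size, ierr, M
   double precision :: sample, total
   double precision, external :: compute_pi

   call MPI_INIT(ierr)
   call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierr)
   call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
   call random_seed()     ! in practice, each rank should use a distinct seed

   M = 1000000            ! number of samples per rank (illustrative)
   sample = compute_pi(M) ! every rank computes its own estimate

   ! Sum the estimates on rank 0 and average them
   call MPI_REDUCE(sample, total, 1, MPI_DOUBLE_PRECISION, MPI_SUM, 0, &
                   MPI_COMM_WORLD, ierr)
   if (rank == 0) print *, 'pi ~', total/dble(size)

   call MPI_FINALIZE(ierr)
end program pi_mpi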