mirror of
https://github.com/LCPQ/quantum_package
synced 2025-01-08 20:33:26 +01:00
Parallelism.md
parent
87f84156d2
commit
0b21534a36
================
Task parallelism
================


A task scheduler is implemented in OCaml and runs in the `qp_run` program.
The IRPF90 programs communicate with this scheduler to fetch new tasks to do.
The flexibility of IRPF90 makes it possible to build micro-services that help
a running program. For example, if the computation of the atomic orbitals
(AOs) takes too much time, additional tasks can be started on multiple compute
nodes to accelerate the running program.

The typical scheme is the following:

1) The program (IRPF90) asks `qp_run` to create a new queue for a state of
   the calculation
2) The program adds multiple tasks to the queue
3) The program starts a **collector** thread that waits for the results
   computed by the workers
4) The program starts multiple **worker** threads that fetch tasks from the
   queue, compute them, and send the results directly to the collector.
   The queue is then informed that the task has been done
5) When the queue is empty and all workers have sent their results, the
   last worker receives a *control* integer from `qp_run` and sends it to
   the collector thread
6) The collector thread checks that the control integer is correct: this can
   be, for instance, a comparison between the number of AOs to compute and
   the number of AOs actually computed.
7) The parallel section is terminated
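
The scheme above can be sketched in a single process with Python threads.
This is only an illustration and not the IRPF90 or `qp_run` code: the function
`run_parallel_section`, the in-memory queues that stand in for the scheduler,
and the choice of the task count as control integer are all assumptions.

```python
import queue
import threading


def run_parallel_section(tasks, compute, n_workers=4):
    """Apply `compute` to every task using worker threads and one collector."""
    tasks = list(tasks)
    if not tasks:
        return []

    task_queue = queue.Queue()    # stands in for the queue held by qp_run
    result_queue = queue.Queue()  # channel from the workers to the collector
    for task in tasks:            # step 2: add the tasks to the queue
        task_queue.put(task)

    n_tasks = len(tasks)          # assumed control integer: number of tasks
    results = []
    check = {"ok": False}
    done = {"count": 0}
    done_lock = threading.Lock()

    def collector():
        # Step 3: gather results until the control integer arrives.
        while True:
            kind, payload = result_queue.get()
            if kind == "control":
                # Step 6: the control integer must match what was received.
                check["ok"] = payload == len(results)
                return
            results.append(payload)

    def worker():
        # Step 4: fetch a task, compute it, send the result to the collector.
        while True:
            try:
                task = task_queue.get_nowait()
            except queue.Empty:
                return
            result_queue.put(("result", compute(task)))
            with done_lock:       # tell the "queue" that the task is done
                done["count"] += 1
                is_last = done["count"] == n_tasks
            if is_last:
                # Step 5: the last worker forwards the control integer.
                result_queue.put(("control", n_tasks))

    threads = [threading.Thread(target=collector)]
    threads += [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()             # step 7: end of the parallel section

    if not check["ok"]:
        raise RuntimeError("control integer does not match the results")
    return results
```

Results arrive in completion order, so callers that need a fixed order have
to sort them. Error handling is left out: if `compute` raises, the sketch
never sends the control integer and the collector keeps waiting.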


The task scheduler
------------------

The task scheduler (implemented in the `TaskServer.ml` file) understands text
messages and transforms them into typed messages. The list of understood
messages can be found in the `of_string` function of the `Message.ml` file.

When `qp_run` starts, it tries to open a ZeroMQ REP socket on a default port,
and tries again on different ports until a suitable port range is found.
The port is bound using both the TCP protocol and the inproc protocol to
enable both multi-threaded and distributed support.
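
The search for a port range can be sketched as follows. This is not the
`qp_run` implementation: it probes plain TCP sockets from the Python standard
library instead of ZeroMQ sockets, and the names `port_is_free` and
`find_port_range` as well as the `width`, `step` and `max_tries` parameters
are assumptions made for the example.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def find_port_range(first_port, width, is_free=port_is_free,
                    step=100, max_tries=50):
    """Return the first port of a block of `width` consecutive free ports.

    Candidate blocks start at first_port, first_port + step, and so on.
    """
    for attempt in range(max_tries):
        base = first_port + attempt * step
        if all(is_free(base + offset) for offset in range(width)):
            return base
    raise RuntimeError("no free port range found")
```

The `is_free` argument makes the search testable without touching the
network. A port that is free when probed can still be taken before it is
actually bound, so a real server has to handle a failed bind as well.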

To initiate a parallel job, a `Newjob` message has to be sent to the scheduler,
with a job name (`state`) that will be checked at every connection to the
scheduler. This will create a new `Queuing_system` instance. The
`Queuing_system` contains:

* A list of tasks
* A list of connected clients (empty at initialization)
* The subset of tasks still queued
* The subset of tasks currently running, and on which client they run
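
The four components listed above map naturally onto a small data structure.
The Python sketch below only illustrates that layout: the class
`QueuingSystem`, its method names and the integer identifiers are assumptions
and are not taken from the OCaml `Queuing_system` module.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class QueuingSystem:
    tasks: dict = field(default_factory=dict)     # task id -> description
    clients: set = field(default_factory=set)     # ids of connected clients
    queued: deque = field(default_factory=deque)  # ids of tasks still queued
    running: dict = field(default_factory=dict)   # task id -> client id
    next_task_id: int = 1
    next_client_id: int = 1

    def add_task(self, description):
        """Register a task and put it at the end of the queue."""
        task_id = self.next_task_id
        self.next_task_id += 1
        self.tasks[task_id] = description
        self.queued.append(task_id)
        return task_id

    def add_client(self):
        """Register a new client and return its id."""
        client_id = self.next_client_id
        self.next_client_id += 1
        self.clients.add(client_id)
        return client_id

    def pop_task(self, client_id):
        """Give the oldest queued task to a client, or None if none is left."""
        if client_id not in self.clients:
            raise KeyError(f"unknown client {client_id}")
        if not self.queued:
            return None
        task_id = self.queued.popleft()
        self.running[task_id] = client_id
        return task_id, self.tasks[task_id]

    def end_task(self, task_id, client_id):
        """Mark a running task as done; only its own client may do so."""
        if self.running.get(task_id) != client_id:
            raise ValueError(f"task {task_id} is not running on {client_id}")
        del self.running[task_id]
        del self.tasks[task_id]
```

Keeping `running` as a map from task to client is what lets the scheduler
know on which client each task runs, as described in the last bullet.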

To add tasks to the system, the `AddTask` message is sent. It can take one of
the following forms:

* `add_task <state> range <i> <j>` : Builds a list of tasks (l) where
  i<l<j.
* `add_task <state> triangle <i>` : Builds a list of tasks (l,i) where
  1<l<i.
* `add_task <state> <msg>` : Builds a list of tasks (msg)
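
The three forms can be illustrated with a small parser. This is a sketch
written for this document and not a transcription of `Message.ml`: the
function name `parse_add_task`, the representation of tasks as strings, and
the state name used in the usage lines are assumptions. The strict
inequalities are taken literally from the list above; the real scheduler may
treat the bounds differently.

```python
def parse_add_task(message):
    """Return (state, tasks) for an `add_task` text message."""
    parts = message.split(None, 2)
    if len(parts) < 3 or parts[0] != "add_task":
        raise ValueError(f"not an add_task message: {message!r}")
    state, body = parts[1], parts[2].strip()
    fields = body.split()

    if fields[0] == "range" and len(fields) == 3:
        # add_task <state> range <i> <j>: one task per l with i < l < j
        i, j = int(fields[1]), int(fields[2])
        tasks = [str(l) for l in range(i + 1, j)]
    elif fields[0] == "triangle" and len(fields) == 2:
        # add_task <state> triangle <i>: one task (l, i) per l with 1 < l < i
        i = int(fields[1])
        tasks = [f"{l} {i}" for l in range(2, i)]
    else:
        # add_task <state> <msg>: a single task holding the message text
        tasks = [body]
    return state, tasks


# Hypothetical state name, used only for illustration.
state, tasks = parse_add_task("add_task ao_integrals range 1 5")
```

Non-numeric arguments after `range` or `triangle` make `int` raise a
`ValueError`, which a caller would have to catch.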