diff --git a/Parallelism.md b/Parallelism.md index ced5f4a..304f6c3 100644 --- a/Parallelism.md +++ b/Parallelism.md @@ -40,19 +40,68 @@ enable both multi-threaded and distributed support. To initiate a parallel job, a `Newjob` message has to be sent to the scheduler, with a job name (`state`) that will be checked at every connection to the -scheduler. This will create a new `Queuing_system` instance. The -`Queuing_system` contains +scheduler. + +`new_job ` + +The collector thread needs to opens a `PULL` socket bound both using the TCP +and the inproc protocols, and those endpoints need to be given together with +the `Newjob` message. + +Now, a new `Queuing_system` instance is created. The `Queuing_system` contains * A list of tasks * A list of connected clients (empty at the initialization) * The subset of tasks still queued * The subset of tasks currently running, and on which client they run -To add tasks to the system, the `AddTask` message is sent. It can be +The format of tasks is up to the user : it is just a string. +To add new tasks to the system, the `AddTask` message is sent: +* `add_task ` : Add a unique task to the `Queuing_system` * `add_task range ` : Builds a list of tasks (j) where i triangle ` : Builds a list of tasks (l,i) where 1 ` : Builds a list of tasks (msg) + + +When workers connect to the `Queuing_system` with a `Connect` message, they +obtain as a reply the address of the `PULL` socket of the collector, +as well as a new Client ID : + +`connect (tcp|inproc)` + +Now the Workers connect a `PUSH` socket to the `PULL` endpoint of the +collector (TCP or inproc). +Workers are now ready to fetch new tasks using `GetTask` messages: + +`get_task ` + +and the `Queuing_system` now knows which tasks runs on which client. +The reply is the task, as a string, and the corresponding Task ID. + +When the task is done, the worker pushes the results to the collector, +and sends to `qp_run` a `TaskDone` message with the corresponding Task ID, +and its contribution to the control integer which will be accumulated in the +`Queuing_system` instance: + +`task_done ` + + +If the queue is empty, the reply to the GetTask` message is a `Terminate` message +which informs the workers to terminate. + +When a worker terminates, it sends a `Disconnect` message to the scheduler: + +`disconnect ` + +If there are remaining running clients, the reply is `0`. For the last client, +the reply contains the control integer, and this allows the worker to inform the +collector that all tasks are done and all workers are disconnected. + +Once the collector thread has finished to pull all the data, it can terminate. + +Now, the main thread can send an `End_job` message to the scheduler to inform +it that the parallel task is done. +