Skip to main content

Python multiprocessing

Most likely the computer you are reading this has multiple processor cores. So far the python programs we have written, runs on a single processor. Often part of our code can be distributed in parallel to multiple processor and could be performed simultaneously. Here we start with an example. First, with usual serial processing:

import time

def sleep(sec):
time.sleep(sec)
return sec

t0 = time.perf_counter()

for ii in range(5):
print(sleep(ii))

t_final = time.perf_counter()
print("The program took", t_final - t0, "second(s).")

With multiprocessing (I am allocating two cores for below examples):

import concurrent.futures

t0 = time.perf_counter()

with concurrent.futures.ProcessPoolExecutor() as executor:
results = [executor.submit(sleep, ii) for ii in range(5)]

for f in concurrent.futures.as_completed(results):
print(f.result())

t_final = time.perf_counter()
print("The program took", t_final - t0, "second(s).")

We can see the performance is almost doubled while using two cores in parallel. We could collect the output in a list, or convert to an array as well:

with concurrent.futures.ProcessPoolExecutor() as executor:
results = [executor.submit(sleep, ii) for ii in range(5)]

result_list = []
for f in concurrent.futures.as_completed(results):
result_list.append(f.result())

# convert to array
result_array = np.array(result_list)
# or
result_array = np.vstack(result_list)

There is another alternative way of doing the multiprocessing:

import multiprocessing

def sleep(sec, return_dict):
time.sleep(sec)
return_dict[sec] = sec

t0 = time.perf_counter()

if __name__ == "__main__":
manager = multiprocessing.Manager()
return_dict = manager.dict()
jobs = []
for i in range(5):
p = multiprocessing.Process(target=sleep, args=(i, return_dict))
jobs.append(p)
p.start()

for proc in jobs:
proc.join()

for key in range(5):
print(return_dict[key])

t_final = time.perf_counter()
print("The program took", t_final - t0, "second(s).")
info

We must have unique keys, if we want to collect the results in order. In some cases, that might not be necessary, e.g., batch processing of images.

MPI for Python

Installation:

pip install mpi4py

A python code can be run by:

mpirun -np 4 python3 code.py

MPI Examples

import sys
from mpi4py import MPI

comm = MPI.COMM_WORLD
id = comm.Get_rank()
p = comm.Get_size()

if (id == 0):
print("There are ", p, "MPI processes running.")

Resources