Parallel backends
Two options are available for making use of multiple CPU cores. The first runs multiple trials in parallel with joblib. The second uses MPI to split each trial across multiple cores, which reduces the runtime of that trial.
Joblib
This is the default backend and will execute multiple trials at the same time, with each trial running on a separate core in “embarrassingly parallel” execution. Note that with only 1 trial, there will be no parallelism.
Dependencies:
$ pip install joblib
Usage:
from hnn_core import JoblibBackend, simulate_dipole

# net is a Network object created beforehand
# set n_jobs to the number of trials to run in parallel with Joblib (up to number of cores on system)
with JoblibBackend(n_jobs=2):
    dpls = simulate_dipole(net, n_trials=2)
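The comment in the usage example caps n_jobs at the number of cores, and the note above points out that a single trial gains nothing from extra jobs. As a minimal sketch of one way to pick a value under those two constraints, the helper below uses only the standard library. The name pick_n_jobs is hypothetical and is not part of hnn_core.

```python
import os


def pick_n_jobs(n_trials, n_cores=None):
    """Hypothetical helper: number of parallel jobs worth requesting.

    More jobs than trials leaves workers idle, and more jobs than cores
    oversubscribes the CPU, so take the smaller of the two (at least 1).
    """
    if n_trials < 1:
        raise ValueError("n_trials must be at least 1")
    if n_cores is None:
        # os.cpu_count() may return None if the count cannot be determined
        n_cores = os.cpu_count() or 1
    return max(1, min(n_trials, n_cores))
```

The result could then be passed along as JoblibBackend(n_jobs=pick_n_jobs(n_trials)).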
MPI
This backend will use MPI (Message Passing Interface) on the system to split neurons across CPU cores (processors) and reduce the simulation time as more cores are used.
Linux Dependencies:
$ sudo apt-get install libopenmpi-dev openmpi-bin
$ pip install mpi4py psutil
macOS Dependencies:
$ conda install -y openmpi mpi4py
$ pip install psutil
macOS Environment:
$ export LD_LIBRARY_PATH=${CONDA_PREFIX}/lib
Alternatively, running the commands below avoids the need to run the export command every time a new shell is opened:
$ cd ${CONDA_PREFIX}
$ mkdir -p etc/conda/activate.d etc/conda/deactivate.d
$ echo "export OLD_LD_LIBRARY_PATH=\$LD_LIBRARY_PATH" >> etc/conda/activate.d/env_vars.sh
$ echo "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:\${CONDA_PREFIX}/lib" >> etc/conda/activate.d/env_vars.sh
$ echo "export LD_LIBRARY_PATH=\$OLD_LD_LIBRARY_PATH" >> etc/conda/deactivate.d/env_vars.sh
$ echo "unset OLD_LD_LIBRARY_PATH" >> etc/conda/deactivate.d/env_vars.sh
Test MPI:
$ mpiexec -np 2 nrniv -mpi -python -c 'from neuron import h; from mpi4py import MPI; \
print("Hello from proc %d" % MPI.COMM_WORLD.Get_rank()); \
h.quit()'
numprocs=2
NEURON -- VERSION 7.7.2 7.7 (2b7985ba) 2019-06-20
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2018
See http://neuron.yale.edu/neuron/credits
Hello from proc 0
Hello from proc 1
This output verifies that MPI, NEURON, and Python are all working together.
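Before running the MPI test, it can help to confirm that the two executables it relies on are reachable at all. The sketch below only checks PATH with the standard library. It does not replace the mpiexec test above, and the function name missing_executables is made up for illustration.

```python
import shutil


def missing_executables(names=("mpiexec", "nrniv")):
    """Return the names from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_executables()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("mpiexec and nrniv are both on PATH")
```

An empty list means both commands were found. Whether they work together still has to be checked with the mpiexec command.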
Usage:
from hnn_core import MPIBackend, simulate_dipole

# net is a Network object created beforehand
# set n_procs to the number of processors MPI can use (up to number of cores on system)
with MPIBackend(n_procs=2):
    dpls = simulate_dipole(net, n_trials=1)
Notes for contributors:
MPI parallelization with NEURON requires that the simulation be launched with the nrniv binary from the command line. The mpiexec command is used to launch multiple nrniv processes which communicate via MPI. MPIBackend.simulate() does this with subprocess.Popen(), launching parallel child processes (MPISimulation) that carry out the simulation. The communication sequence between MPIBackend and MPISimulation is outlined below.
1. In order to pass the parameters from MPIBackend, the child MPISimulation processes’ stdin is used. Parameters are pickled and base64 encoded before being written to the processes’ stdin. Closing stdin (from the MPIBackend side) signals to the child processes that it is done sending parameters and the parallel simulation should begin.

2. Output from the simulation (either to stdout or stderr) is communicated back to MPIBackend, where it will be printed to the console. Typical output at this point would be simulation progress messages as well as any MPI warnings/errors during the simulation.

3. Once the simulation has completed, the child process with rank 0 (in MPISimulation.run()) sends a signal to MPIBackend that the simulation has completed and simulation data will be written to stderr. The data is pickled and base64 encoded before it is written to stderr in MPISimulation._write_data_stderr(). No other output (e.g. raised exceptions) can go to stderr during this step.

4. At this point, the child process with rank 0 (the only rank with complete simulation results) will send another signal that includes the expected length of the pickled and encoded data (in bytes) to stderr following the data written in the previous step. MPIBackend will use this signal to know that data transfer has completed and it will verify the length of data it receives, printing a UserWarning if the lengths don’t match.
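The encoding and length check described in this sequence can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in hnn_core: the function names encode_payload and decode_payload and the example data are made up, and the real signal format is not shown here.

```python
import base64
import pickle
import warnings


def encode_payload(obj):
    """Pickle obj and base64 encode it so it is safe to send as text."""
    return base64.b64encode(pickle.dumps(obj))


def decode_payload(data, expected_len):
    """Decode a payload, warning if its length differs from expected_len."""
    if len(data) != expected_len:
        warnings.warn(
            "Expected %d bytes but received %d" % (expected_len, len(data)),
            UserWarning,
        )
    return pickle.loads(base64.b64decode(data))


# Sender side (standing in for rank 0): encode, then report the length.
results = {"dipole": [0.0, 1.5, -0.3], "n_trials": 1}  # made-up example data
payload = encode_payload(results)
reported_len = len(payload)

# Receiver side: verify the length and recover the original object.
assert decode_payload(payload, reported_len) == results
```

Base64 keeps the pickled bytes within printable ASCII, so they can travel over a text stream alongside ordinary log output, and the reported length gives the receiver a cheap check that nothing was truncated.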
It is important that MPISimulation uses the flush() method after each signal to ensure that the signal will immediately be available for reading by MPIBackend and not buffered with other output.
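The stdin handshake and the flush after a signal can be demonstrated with a small parent and child pair. This is a sketch only: the child is an ordinary Python process rather than nrniv under mpiexec, and the marker string @demo_sim_complete@ is a made-up placeholder, not the signal hnn_core uses.

```python
import subprocess
import sys

# Child program: read stdin until EOF (the parent closing the pipe is the
# "start" signal), then emit a made-up completion marker and flush at once.
CHILD_CODE = r"""
import sys
params = sys.stdin.read()          # returns only once stdin is closed
sys.stdout.write("received %d characters\n" % len(params))
sys.stdout.write("@demo_sim_complete@\n")   # hypothetical signal string
sys.stdout.flush()                 # make the signal readable right away
"""

proc = subprocess.Popen(
    [sys.executable, "-c", CHILD_CODE],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)
# communicate() writes the input, closes stdin, and collects the output.
out, _ = proc.communicate("encoded-parameters")
lines = out.splitlines()
```

In this toy child the process exits right after writing, so its buffers would be flushed anyway. The explicit flush matters when the child keeps running after it emits the signal, which is the situation the paragraph above describes.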
Tests for parallel backends utilize a special @pytest.mark.incremental decorator (defined in conftest.py) that causes a test failure to skip subsequent tests in the incremental block. For example, if a test running a simple MPI simulation fails, subsequent tests that compare simulation output between different backends will be skipped. These types of failures will be marked as a failure in CI.