Parallel job execution¶
-
class
parallel.
DefaultContext
(tmp_path, job_nr)¶ Does nothing special.
-
class
parallel.
ProfilingContext
(tmp_path, job_nr)¶ Profiles the running jobs and stores the profiles in the temporary job folder in the subdirectory “profiling”.
Useful for debugging. Do not use in production.
-
class
parallel.
SGE
(tmp_directory: Optional[str] = None, memory: str = '3G', time_h: int = 100, python_executable_path: Optional[str] = None, sge_error_file: Optional[str] = None, sge_output_file: Optional[str] = None, parallel_environment=None, name='map', ram_key='h_vmem', queue=None, priority=None, num_threads: int = 1, execution_context=<class 'parallel.execution_contexts.DefaultContext'>, chunk_size=1)¶ Map a function to be executed on an SGE cluster environment. Reads a config file (if it exists) in the home directory which should look as the default in parallel.config.
The mapper reads commonly used parameters from a configuration file stord in
~/.parallel
. An example configuration file could look as follows:#~/.parallel [DIRECTORIES] TMP=/tmp [BROKER] TYPE=REDIS # can be SQLITE or REDIS [SGE] QUEUE=p.openmp PARALLEL_ENVIRONMENT=openmp PRIORITY=-500 [REDIS] HOST=127.0.0.1
- Parameters
tmp_directory (str or None) – Directory where temporary job pickle files are stored. If set to None a tmp directory is read from the ‘’~/.parallel’’ configuration file. It this file does not exist a tmp directory within the user home directory is created.
memory (str) – RAM requested by each job, e.g. “10G”.
time_h (int) – Job run time in hours.
python_executable_path (str or None) – The python interpreter which executes the jobs. If set to None, the currently executing interpreter is used as returned by
sys.executable
.sge_error_file (str or None) – File to which stderr messages from workers are stored. If set to None, a file within the tmp_directory is used.
sge_output_file (str or None) – File to which stdout messages from workers are stored. If set to None, a file within the tmp_directory is used.
parallel_environment (str) – The SGE environment. This is what is passed to the -pe option in the qsub script.
name (str) – A name for the job.
queue (str) – The SGE queue.
priority (int) –
SGE job priority. A value between -1000 and 0. Default: -500
Note that a priority of 0 automatically enables the reservation flag.
num_threads (int, default = 1) – Number of threads for each worker. This also sets the environment variable MKL_NUM_THREADS, OMP_NUM_THREADS to the sepcified number to handle jobs which use OpenMP etc. correctly.
execution_context (
DefaultContext
,ProfilingContext
,NamedPrinter
) – Any context manager can be passed here. The__enter__
method is called before evaluating the function on the cluster. The__exit__
method directly after the function run finished.chunk_size (int, default=1) –
Number of tasks executed within one job.
Warning
If
chunk_size
is larger than 1, this can have bad side effects as all the jobs within one chunk are executed within the python process.
- Returns
sge – Configured SGE mapper.
- Return type
-
map
(function, array)¶ Does what map(function, array) would do, but does it via an array job on the SGE by pickling objects, storing them in a temporary folder, submitting them to the SGE and then reading and returning the results.
- Parameters
function (callable) – The function to be mapped.
array (iterable) – The values to which the function is applied
- Returns
result_list – List of results of function application. This list can also contain
Exception
objects.- Return type
list
-
class
parallel.
Slurm
(tmp_directory: Optional[str] = None, memory: str = '3G', time_h: int = 100, name='map', python_executable_path: Optional[str] = None, slurm_error_file: Optional[str] = None, slurm_output_file: Optional[str] = None, partition=None, niceness=None, num_threads: int = 1, execution_context=<class 'parallel.execution_contexts.DefaultContext'>, chunk_size=1)¶ Map a function to be executed with Slurm workload manager. Reads a config file (if it exists) in the home directory which should look as the default in parallel.config.
The mapper reads commonly used parameters from a configuration file stord in
~/.parallel
. An example configuration file could look as follows:#~/.parallel [DIRECTORIES] TMP=/tmp [BROKER] TYPE=REDIS # can be SQLITE or REDIS [SLURM] PARTITION=p.gaba NICENESS=0 [REDIS] HOST=127.0.0.1
- Parameters
tmp_directory (str or None) – Directory where temporary job pickle files are stored. If set to None a tmp directory is read from the ‘’~/.parallel’’ configuration file. It this file does not exist a tmp directory within the user home directory is created.
memory (str) – RAM requested by each job, e.g. “10G”.
time_h (int) – Job run time in hours.
python_executable_path (str or None) – The python interpreter which executes the jobs. If set to None, the currently executing interpreter is used as returned by
sys.executable
.slurm_error_file (str or None) – File to which stderr messages from workers are stored. If set to None, a file within the tmp_directory is used.
slurm_output_file (str or None) – File to which stdout messages from workers are stored. If set to None, a file within the tmp_directory is used.
name (str) – A name for the job.
partition (str) – The Slurm partition.
niceness (int) – Slurm job niceness. Default: 0
num_threads (int, default = 1) – Number of threads for each worker. This also sets the environment variable MKL_NUM_THREADS, OMP_NUM_THREADS to the sepcified number to handle jobs which use OpenMP etc. correctly.
execution_context (
DefaultContext
,ProfilingContext
,NamedPrinter
) – Any context manager can be passed here. The__enter__
method is called before evaluating the function on the cluster. The__exit__
method directly after the function run finished.chunk_size (int, default=1) –
Number of tasks executed within one job.
Warning
If
chunk_size
is larger than 1, this can have bad side effects as all the jobs within one chunk are executed within the python process.
- Returns
slurm – Configured Slurm mapper.
- Return type
-
map
(function, array)¶ Does what map(function, array) would do, but does it via an array job on Slurm by pickling objects, storing them in a temporary folder, submitting them to the Slurm and then reading and returning the results.
- Parameters
function (callable) – The function to be mapped.
array (iterable) – The values to which the function is applied
- Returns
result_list – List of results of function application. This list can also contain
Exception
objects.- Return type
list
-
parallel.
sge_available
()¶ Makes a simple heuristic test to check if the SGE is available on the machine. It tries to execute the
qstat
command. In case it is found, it is assumed that the SGE is available.- Returns
available – Whether SGE is available or not.
- Return type
bool
-
parallel.
slurm_available
()¶ Checks if the Slurm workload manager is available.