Slurm
To run array jobs, the script header or sbatch call should specify the size of the array.
The flag to use is -a, --array, with the argument giving the indexes to run as a comma-separated list of individual indexes and index ranges, e.g. --array=1-6,10 runs for array values 1, 2, 3, 4, 5, 6 and 10.
A step size can also be given after a colon, e.g. --array=1-15:4, which runs for 1, 5, 9 and 13.
The maximum number of tasks to run at once can be given after a % symbol, e.g. --array=1-15%4 allows at most 4 tasks to run at a time across the indexes 1 through 15.
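The index spacing produced by a range with a step size can be previewed with seq, which uses the same first/increment/last semantics as Slurm's range:step syntax. This is only an illustrative sketch of which indexes a spec like --array=1-15:4 covers, not a Slurm command:

```shell
# Preview the indexes --array=1-15:4 would generate.
# seq FIRST INCREMENT LAST mirrors the range:step semantics.
indexes=$(seq -s ' ' 1 4 15)
echo "$indexes"   # prints: 1 5 9 13
```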
The array index of the current task can be accessed through the $SLURM_ARRAY_TASK_ID environment variable.
Other array-related variables are:
SLURM_ARRAY_TASK_COUNT: Total number of tasks in a job array.
SLURM_ARRAY_TASK_MAX: Job array's maximum ID (index) number.
SLURM_ARRAY_TASK_MIN: Job array's minimum ID (index) number.
SLURM_ARRAY_TASK_STEP: Job array's index step size.
SLURM_ARRAY_JOB_ID: Job array's master job ID number.
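A minimal sketch of reading these variables inside a job script. The values below are hard-coded stand-ins purely for illustration; inside a real array task Slurm exports them automatically:

```shell
# Stand-in values: Slurm sets these automatically inside an array
# task. They are assigned here only so the snippet is runnable.
SLURM_ARRAY_JOB_ID=12345
SLURM_ARRAY_TASK_ID=3
SLURM_ARRAY_TASK_COUNT=7
SLURM_ARRAY_TASK_MIN=1
SLURM_ARRAY_TASK_MAX=10

# Typical use: log which slice of the array this task is handling.
echo "Task $SLURM_ARRAY_TASK_ID of $SLURM_ARRAY_TASK_COUNT (job $SLURM_ARRAY_JOB_ID)"
echo "Index range: $SLURM_ARRAY_TASK_MIN-$SLURM_ARRAY_TASK_MAX"
```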
The given Slurm script will be run once for each index, the only difference between runs being the value of $SLURM_ARRAY_TASK_ID.
This is useful in situations such as performing an operation on each line of a file in parallel instead of in series.
The following script can be used to perform an action on every line of a file:

#!/usr/bin/env bash
#SBATCH --time=00:30:00
#SBATCH --array=1-10
#SBATCH --ntasks=1

# Build a sed expression such as "5q;d", which prints only line 5:
# q quits after printing the current line, d suppresses all others.
suffix="q;d"
line_to_check=$SLURM_ARRAY_TASK_ID$suffix
line_data=$(sed "$line_to_check" file_to_process.txt)
echo "Got line data $line_data"
python script_to_run.py "$line_data"
In the above script 10 array tasks will be created, each with one task/core and a wall-clock time of 30 minutes.
Each line of the file file_to_process.txt is passed as an argument to the script script_to_run.py.
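The array range has to match the number of lines in the input file. One way to avoid hard-coding it is to count the lines at submission time and pass --array on the command line, which overrides any value in the script header. The sample file contents below are a stand-in for a real file_to_process.txt:

```shell
# Create a sample input file (stand-in for real data).
printf 'alpha\nbeta\ngamma\n' > file_to_process.txt

# Count its lines; the arithmetic expansion strips any padding
# some wc implementations add around the number.
num_lines=$(( $(wc -l < file_to_process.txt) ))

# Command-line flags override #SBATCH header directives, so this
# sizes the array to the file without editing the script.
echo "Would submit: sbatch --array=1-$num_lines array_job.sh"
```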