This document describes the functions and outputs of the Slurm job management database.

Creation of database

Creation is called automatically as part of –; to create a database manually, use the command below.

python <path_to_run_folder> [list of realisations]
python ~/Documents/scratch/test_18p5/ test123 test_realiastion1
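
As a rough sketch of what creation might amount to, the snippet below adds one row per realisation and process, each with the status created. It assumes a SQLite backend, a hypothetical table named state, and hypothetical column names; this document confirms neither the storage engine nor the schema.

```python
import sqlite3
from datetime import datetime

# Process names taken from the usage text in this document.
PROCESSES = ["EMOD3D", "post_EMOD3D", "HF", "BB", "IM_calculation"]


def create_db(conn, realisations):
    """Create a hypothetical `state` table and add one 'created' row per (run, process)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS state ("
        "run_name TEXT, process TEXT, status TEXT, "
        "job_id INTEGER, error TEXT, last_modified TEXT)"
    )
    now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    rows = [(run, proc, "created", now) for run in realisations for proc in PROCESSES]
    conn.executemany(
        "INSERT INTO state (run_name, process, status, last_modified) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
n_rows = create_db(conn, ["test123", "test_realiastion1"])
```

With the two realisations from the example command, this gives ten rows, matching the ten entries shown in the status listing further down.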


Updating entries in database

Pass the same run folder path that was used to create the database, not the absolute path of the database file.

A status can only progress forward; that is, it must move through the states in a linear fashion. If a step fails, its status should advance to failed and a new entry should be created for the retry.
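
The forward-only rule can be illustrated with a small check. This is a sketch under assumptions: the status names are those that appear in the example output in this document, but their ordering, the treatment of completed and failed as terminal, and the function name can_advance are all hypothetical.

```python
# Status names as they appear in this document's example output;
# the ordering below is an assumption.
ORDER = ["created", "in-queue", "running", "completed"]


def can_advance(current, new):
    """Return True if moving from `current` to `new` is a forward step."""
    if current in ("completed", "failed"):
        return False  # terminal states: a retry needs a new entry
    if new == "failed":
        return True  # any unfinished step may fail
    return ORDER.index(new) > ORDER.index(current)
```

For example, can_advance("in-queue", "running") is allowed, while can_advance("running", "in-queue") is not. The sketch accepts any later state as "forward"; whether the real script permits skipping a state is not stated here.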

usage: [-h] [-r RUN_NAME] [-j JOB] [-e ERROR]
                         run_folder {EMOD3D,post_EMOD3D,HF,BB,IM_calculation}

positional arguments:
  run_folder            folder to the collection of runs on Kupe

optional arguments:
  -h, --help            show this help message and exit
  -r RUN_NAME, --run_name RUN_NAME
                        name of run to be updated
  -j JOB, --job JOB     job number on supercomputer
  -e ERROR, --error ERROR
                        text notes about why the run failed

python ~/Documents/scratch/test_18p5/ HF in-queue --j 3 --run_name test123
python ~/Documents/scratch/test_18p5/ HF running --j 3
python ~/Documents/scratch/test_18p5/ HF failed --j 3 --error 'Hit wall clock limit 5000'

Querying status of database

Prints the status of the collection of runs.

usage: [-h] [--error] run_folder [run_name]

positional arguments:
  run_folder   folder to the collection of runs on Kupe
  run_name     name of run to be queried
optional arguments:
  -h, --help   show this help message and exit
  --error, -e  Optionally add an error string to the database


slurm_gm_workflow/scripts/management$ python ~/Documents/scratch/test_18p5/
                 run_name |         process |     status |        last_modified
                  test123 |              BB |   in-queue |  2018-05-16 03:53:55
                  test123 |  IM_calculation |   in-queue |  2018-05-16 03:53:55
                  test123 |     post_EMOD3D |    running |  2018-05-16 04:30:01
                  test123 |          EMOD3D |  completed |  2018-05-16 03:58:15
                  test123 |              HF |     failed |  2018-05-16 22:56:41
        test_realiastion1 |          EMOD3D |    created |  2018-05-16 03:34:26
        test_realiastion1 |     post_EMOD3D |    created |  2018-05-16 03:34:26
        test_realiastion1 |              HF |    created |  2018-05-16 03:34:26
        test_realiastion1 |              BB |    created |  2018-05-16 03:34:26
        test_realiastion1 |  IM_calculation |    created |  2018-05-16 03:34:26
slurm_gm_workflow/scripts/management$ python ~/Documents/scratch/test_18p5/ test123
                 run_name |         process |     status |        last_modified
                  test123 |              BB |   in-queue |  2018-05-16 03:53:55
                  test123 |  IM_calculation |   in-queue |  2018-05-16 03:53:55
                  test123 |     post_EMOD3D |    running |  2018-05-16 04:30:01
                  test123 |          EMOD3D |  completed |  2018-05-16 03:58:15
                  test123 |              HF |     failed |  2018-05-16 22:56:41


slurm_gm_workflow/scripts/management$ python ~/Documents/scratch/test_18p5/ --error

 Run_name: test123

 Process: EMOD3D

 Status: completed

 Last_Modified: 2018-05-16 03:58:15

 Error: Demo error


 Run_name: test123

 Process: HF

 Status: failed

 Last_Modified: 2018-05-16 22:56:41

 Error: hit wall clock limit 5000


 Run_name: Kelly_HYP02-03_S1264

 Process: EMOD3D

 Status: failed

 Last_Modified: 2018-05-18 02:30:03

 Error: Task removed from squeue without completion
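
The two query modes above can be sketched as follows. The column widths are inferred from the sample listing, and the rule that --error shows only entries with a non-empty error string is an assumption drawn from the sample output. The function names and the list-of-dicts input are hypothetical stand-ins for the real database query.

```python
# Right-aligned column widths inferred from the sample status listing.
ROW_FMT = "{:>25} | {:>15} | {:>10} | {:>20}"


def format_status(rows, run_name=None):
    """Return the status table as a list of lines, optionally for one run only."""
    lines = [ROW_FMT.format("run_name", "process", "status", "last_modified")]
    for row in rows:
        if run_name is None or row["run_name"] == run_name:
            lines.append(
                ROW_FMT.format(
                    row["run_name"], row["process"], row["status"], row["last_modified"]
                )
            )
    return lines


def format_errors(rows):
    """Return one text block per row that has a non-empty error string."""
    template = (
        " Run_name: {run_name}\n Process: {process}\n Status: {status}\n"
        " Last_Modified: {last_modified}\n Error: {error}\n"
    )
    return [template.format(**row) for row in rows if row.get("error")]


rows = [
    {"run_name": "test123", "process": "BB", "status": "in-queue",
     "last_modified": "2018-05-16 03:53:55", "error": None},
    {"run_name": "test123", "process": "HF", "status": "failed",
     "last_modified": "2018-05-16 22:56:41", "error": "hit wall clock limit 5000"},
    {"run_name": "test_realiastion1", "process": "HF", "status": "created",
     "last_modified": "2018-05-16 03:34:26", "error": None},
]
table = format_status(rows, run_name="test123")  # header plus the two test123 rows
errors = format_errors(rows)  # only the failed HF entry carries an error
```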


Inserting new tasks into database

Inserts a new entry into the database, with the status created, for the given run_name and process.

python ~/Documents/scratch/test_18p5/ run_name {EMOD3D,post_EMOD3D,HF,BB,IM_calculation}
python ~/Documents/scratch/test_18p5/ test123 EMOD3D

Querying Slurm

Checks squeue to see the progress of each task and updates the database status to match.

python run_folder [poll-interval]

python ~/Documents/scratch/test_18p5/
not updating status (running) of 'post_EMOD3D' on 'test123'
not updating status (in-queue) of 'BB' on 'test123'
updating 'IM_calculation' on 'test123' to the status of 'running' from 'in-queue'
Task 'EMOD3D' on 'test_realiastion1' not found on squeue; changing status to 'failed'

python ~/Documents/scratch/test_18p5/
not updating status (running) of 'post_EMOD3D' on 'test123' (2183326)
not updating status (in-queue) of 'BB' on 'test123' (2183255)
not updating status (running) of 'IM_calculation' on 'test123' (2183303)
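
The messages above suggest a reconciliation step between the database and squeue, which can be sketched as below. This is a minimal illustration, not the script itself: the function name reconcile, the dict-based task records, and the squeue_states mapping are hypothetical. The caller is assumed to have already translated squeue job states (for example PENDING to in-queue and RUNNING to running) and to pass only jobs that squeue currently lists.

```python
def reconcile(tasks, squeue_states):
    """Compare active tasks against squeue and update their status in place.

    tasks: list of dicts with run_name, process, status and job_id keys.
    squeue_states: dict mapping job_id to 'in-queue' or 'running'.
    Returns the list of log messages describing what was done.
    """
    messages = []
    for task in tasks:
        if task["status"] not in ("in-queue", "running"):
            continue  # only submitted, unfinished tasks are checked
        seen = squeue_states.get(task["job_id"])
        if seen is None:
            messages.append(
                "Task '{process}' on '{run_name}' not found on squeue; "
                "changing status to 'failed'".format(**task)
            )
            task["status"] = "failed"
            task["error"] = "Task removed from squeue without completion"
        elif seen == task["status"]:
            messages.append(
                "not updating status ({status}) of '{process}' on "
                "'{run_name}' ({job_id})".format(**task)
            )
        else:
            messages.append(
                "updating '{process}' on '{run_name}' to the status of "
                "'{new}' from '{status}'".format(new=seen, **task)
            )
            task["status"] = seen
    return messages


tasks = [
    {"run_name": "test123", "process": "post_EMOD3D", "status": "running", "job_id": 2183326},
    {"run_name": "test123", "process": "IM_calculation", "status": "in-queue", "job_id": 2183303},
    {"run_name": "test_realiastion1", "process": "EMOD3D", "status": "running", "job_id": 1},
]
# Job 1 is a made-up id that is deliberately absent from the squeue mapping.
messages = reconcile(tasks, {2183326: "running", 2183303: "running"})
```

A polling loop would call such a function once per poll interval; how the real script parses squeue output is not shown in this document.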