This document describes the functions and outputs of the Slurm job management database.
create_mgmt_db.py is called automatically as part of install.sh. To create a db manually, run:
python create_mgmt_db.py <path_to_run_folder> [list of realisations]
e.g.
python create_mgmt_db.py ~/Documents/scratch/test_18p5/ test123 test_realiastion1
update_mgmt_db.py takes the same path name that was used to create the db, rather than the absolute path of the db.
A status can only progress, i.e. it must move through the states in a linear fashion. If a step fails, its status should advance to failed and a new entry should be created.
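The forward-only rule can be sketched as follows. This is a minimal illustration, not the code in update_mgmt_db.py: it assumes the states are ordered as they are listed in the usage text (created, in-queue, running, completed, failed), and the names STATUS_ORDER and can_advance are hypothetical.

```python
# Hypothetical sketch of the "status can only move forward" rule.
# Assumption: the states are ordered as listed in the usage text.
STATUS_ORDER = ["created", "in-queue", "running", "completed", "failed"]


def can_advance(current, new):
    """Return True if moving from `current` to `new` is a forward step."""
    return STATUS_ORDER.index(new) > STATUS_ORDER.index(current)


if __name__ == "__main__":
    print(can_advance("created", "in-queue"))  # forward step
    print(can_advance("running", "in-queue"))  # backward step
    print(can_advance("running", "failed"))    # a running step may fail
```

Under this ordering a completed entry could also move to failed. Whether the real script permits that is not stated in this document.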
usage: update_mgmt_db.py [-h] [-r RUN_NAME] [-j JOB] [-e ERROR]
                         run_folder {EMOD3D,post_EMOD3D,HF,BB,IM_calculation}
                         {created,in-queue,running,completed,failed}

positional arguments:
  run_folder            folder to the collection of runs on Kupe
  {EMOD3D,post_EMOD3D,HF,BB,IM_calculation}
  {created,in-queue,running,completed,failed}

optional arguments:
  -h, --help            show this help message and exit
  -r RUN_NAME, --run_name RUN_NAME
                        name of run to be updated
  -j JOB, --job JOB     Job number on supercomputer
  -e ERROR, --error ERROR
                        text notes about why the run failed
e.g.
python update_mgmt_db.py ~/Documents/scratch/test_18p5/ HF in-queue --j 3 --run_name test123
python update_mgmt_db.py ~/Documents/scratch/test_18p5/ HF running --j 3
python update_mgmt_db.py ~/Documents/scratch/test_18p5/ HF failed --j 3 --error 'Hit wall clock limit 5000'
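For readers who want to see how such a command line could be declared, the sketch below builds an argparse parser with the same arguments as the usage text. It is an assumption about the interface only, not the actual source of update_mgmt_db.py; the function name build_parser and the destination names process and status are hypothetical.

```python
import argparse

# Values copied from the usage text above.
PROCESSES = ["EMOD3D", "post_EMOD3D", "HF", "BB", "IM_calculation"]
STATUSES = ["created", "in-queue", "running", "completed", "failed"]


def build_parser():
    """Build a parser mirroring the usage text (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(prog="update_mgmt_db.py")
    parser.add_argument("run_folder", help="folder to the collection of runs on Kupe")
    parser.add_argument("process", choices=PROCESSES)
    parser.add_argument("status", choices=STATUSES)
    parser.add_argument("-r", "--run_name", help="name of run to be updated")
    parser.add_argument("-j", "--job", help="Job number on supercomputer")
    parser.add_argument("-e", "--error", help="text notes about why the run failed")
    return parser


if __name__ == "__main__":
    # Same arguments as the third example above.
    args = build_parser().parse_args(
        ["~/Documents/scratch/test_18p5/", "HF", "failed",
         "--j", "3", "--error", "Hit wall clock limit 5000"]
    )
    print(args.process, args.status, args.job, args.error)
```

By default argparse accepts any unambiguous prefix of a long option, which is why --j in the examples resolves to --job.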
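query_mgmt_db.py prints the status of every run in the run folder, of a single run when a run name is given, or the recorded error notes when --error is given, e.g.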
slurm_gm_workflow/scripts/management$ python query_mgmt_db.py ~/Documents/scratch/test_18p5/
run_name          | process        | status    | last_modified
_______________________________________________________________________________
test123           | BB             | in-queue  | 2018-05-16 03:53:55
test123           | IM_calculation | in-queue  | 2018-05-16 03:53:55
test123           | post_EMOD3D    | running   | 2018-05-16 04:30:01
test123           | EMOD3D         | completed | 2018-05-16 03:58:15
test123           | HF             | failed    | 2018-05-16 22:56:41
test_realiastion1 | EMOD3D         | created   | 2018-05-16 03:34:26
test_realiastion1 | post_EMOD3D    | created   | 2018-05-16 03:34:26
test_realiastion1 | HF             | created   | 2018-05-16 03:34:26
test_realiastion1 | BB             | created   | 2018-05-16 03:34:26
test_realiastion1 | IM_calculation | created   | 2018-05-16 03:34:26

slurm_gm_workflow/scripts/management$ python query_mgmt_db.py ~/Documents/scratch/test_18p5/ test123
run_name          | process        | status    | last_modified
_______________________________________________________________________________
test123           | BB             | in-queue  | 2018-05-16 03:53:55
test123           | IM_calculation | in-queue  | 2018-05-16 03:53:55
test123           | post_EMOD3D    | running   | 2018-05-16 04:30:01
test123           | EMOD3D         | completed | 2018-05-16 03:58:15
test123           | HF             | failed    | 2018-05-16 22:56:41

slurm_gm_workflow/scripts/management$ python query_mgmt_db.py ~/Documents/scratch/test_18p5/ --error
Run_name: test123
Process: EMOD3D
Status: completed
Last_Modified: 2018-05-16 03:58:15
Error: Demo error
Run_name: test123
Process: HF
Status: failed
Last_Modified: 2018-05-16 22:56:41
Error: hit wall clock limit 5000
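A query of this kind can be sketched as below. The sketch is illustrative only: it assumes the db is a SQLite database, and the table name state, its column layout, and the functions query_runs and format_rows are all hypothetical (only the column names run_name, process, status and last_modified come from the output above).

```python
import sqlite3


def query_runs(conn, run_name=None):
    """Return (run_name, process, status, last_modified) rows, optionally for one run."""
    sql = "SELECT run_name, process, status, last_modified FROM state"
    params = ()
    if run_name is not None:
        sql += " WHERE run_name = ?"
        params = (run_name,)
    return conn.execute(sql, params).fetchall()


def format_rows(rows):
    """Format rows as a pipe-separated table similar to the output above."""
    header = ("run_name", "process", "status", "last_modified")
    table = [header] + [tuple(str(v) for v in row) for row in rows]
    # Pad every column to the width of its longest value.
    widths = [max(len(r[i]) for r in table) for i in range(len(header))]
    lines = [" | ".join(v.ljust(w) for v, w in zip(r, widths)) for r in table]
    lines.insert(1, "_" * len(lines[0]))
    return "\n".join(lines)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # throwaway database for the demo
    # Hypothetical schema; the real db layout is not documented here.
    conn.execute(
        "CREATE TABLE state (run_name TEXT, process TEXT, status TEXT,"
        " last_modified TEXT, error TEXT)"
    )
    conn.executemany(
        "INSERT INTO state VALUES (?, ?, ?, ?, ?)",
        [
            ("test123", "EMOD3D", "completed", "2018-05-16 03:58:15", None),
            ("test123", "HF", "failed", "2018-05-16 22:56:41", "hit wall clock limit 5000"),
            ("test_realiastion1", "EMOD3D", "created", "2018-05-16 03:34:26", None),
        ],
    )
    print(format_rows(query_runs(conn, "test123")))
```

Passing the run name as a bound parameter (the ? placeholder) rather than formatting it into the SQL string keeps the query safe for arbitrary run names.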
insert_mgmt_db.py inserts a new entry into the database, with the status created, for the given run_name and process.
python insert_mgmt_db.py ~/Documents/scratch/test_18p5/ run_name {EMOD3D,post_EMOD3D,HF,BB,IM_calculation}
e.g.
python insert_mgmt_db.py ~/Documents/scratch/test_18p5/ test123 EMOD3D
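The effect of such an insert can be sketched as below. As before, this is only an assumption about how the db might look: it supposes a SQLite database with a hypothetical table state, and insert_entry is a hypothetical helper, not a function from insert_mgmt_db.py.

```python
import sqlite3
from datetime import datetime

# Process names copied from the usage text above.
PROCESSES = ["EMOD3D", "post_EMOD3D", "HF", "BB", "IM_calculation"]


def insert_entry(conn, run_name, process):
    """Add a row with status 'created' for the given run and process."""
    if process not in PROCESSES:
        raise ValueError("unknown process: {}".format(process))
    # Timestamp in the same format as the query output (YYYY-MM-DD HH:MM:SS).
    now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    conn.execute(
        "INSERT INTO state (run_name, process, status, last_modified)"
        " VALUES (?, ?, 'created', ?)",
        (run_name, process, now),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # throwaway database for the demo
    # Hypothetical schema; the real db layout is not documented here.
    conn.execute(
        "CREATE TABLE state (run_name TEXT, process TEXT, status TEXT,"
        " last_modified TEXT, error TEXT)"
    )
    insert_entry(conn, "test123", "EMOD3D")
    print(conn.execute("SELECT run_name, process, status FROM state").fetchall())
```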