...
This is called automatically as part of install.sh. To create a database manually, use:
...
python create_mgmt_db.py <path_to_run_folder> [list of realisations]
e.g.
python create_mgmt_db.py ~/Documents/scratch/test_18p5/ test123 test_realiastion1
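For illustration only, here is a minimal sketch of what creating the database might involve. It assumes the database is an SQLite file holding one row per realisation and process. The `state` table, its columns and the `create_mgmt_db` function are hypothetical and are not taken from the actual script.

```python
import sqlite3
from datetime import datetime, timezone

# Assumption: the process names are the choices accepted by update_mgmt_db.py.
PROCESSES = ["EMOD3D", "post_EMOD3D", "HF", "BB", "IM_calculation"]


def create_mgmt_db(db_path, realisations):
    """Hypothetical: create the task table and add a 'created' row per (realisation, process)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS state (
               run_name TEXT NOT NULL,
               process TEXT NOT NULL,
               status TEXT NOT NULL,
               job_id INTEGER,
               error TEXT,
               last_modified TEXT NOT NULL
           )"""
    )
    # Assumption: timestamps are stored as UTC strings in the query output's format.
    now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    conn.executemany(
        "INSERT INTO state (run_name, process, status, last_modified) "
        "VALUES (?, ?, 'created', ?)",
        [(run, proc, now) for run in realisations for proc in PROCESSES],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = create_mgmt_db(":memory:", ["test123", "test_realiastion1"])
    for row in conn.execute("SELECT run_name, process, status FROM state"):
        print(row)
```

Every task starts out as created, which matches the rows query_mgmt_db.py shows for test_realiastion1.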
Updating entries in database
...
The status can only progress forward, i.e. it must move linearly through created, in-queue, running and completed. If a step fails, set its status to failed and create a new entry for the retry (see insert_mgmt_db.py below).
usage: update_mgmt_db.py [-h] [-r RUN_NAME] [-j JOB] [-e ERROR]
                         run_folder
                         {EMOD3D,post_EMOD3D,HF,BB,IM_calculation}
                         {created,in-queue,running,completed,failed}

positional arguments:
  run_folder            folder to the collection of runs on Kupe
  {EMOD3D,post_EMOD3D,HF,BB,IM_calculation}
  {created,in-queue,running,completed,failed}

optional arguments:
  -h, --help            show this help message and exit
  -r RUN_NAME, --run_name RUN_NAME
                        name of run to be updated
  -j JOB, --job JOB     Job number on supercomputer
  -e ERROR, --error ERROR
                        text notes about why the run failed

e.g.
python update_mgmt_db.py ~/Documents/scratch/test_18p5/ HF in-queue --j 3 --run_name test123
python update_mgmt_db.py ~/Documents/scratch/test_18p5/ HF running --j 3
python update_mgmt_db.py ~/Documents/scratch/test_18p5/ HF failed --j 3 --error 'Hit wall clock limit 5000'
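The forward-only rule can be expressed as a small check. This is a hypothetical helper and is not part of update_mgmt_db.py. It assumes the order created, in-queue, running, completed. It also assumes any unfinished task may move to failed, and that skipping forward (e.g. created to running) is allowed.

```python
# Assumption: linear order of the non-failure statuses accepted by update_mgmt_db.py.
ORDER = ["created", "in-queue", "running", "completed"]


def can_transition(current, new):
    """Hypothetical: return True if moving from `current` to `new` only progresses the task."""
    if current in ("completed", "failed"):
        # Terminal states: a failed task gets a new entry rather than being reset.
        return False
    if new == "failed":
        # Any unfinished step may fail.
        return True
    # Forward moves only; equal or backward moves are rejected.
    return ORDER.index(new) > ORDER.index(current)
```

A stricter reading of "linear" would require `ORDER.index(new) == ORDER.index(current) + 1`, which forbids skipping a step.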
Querying status of database
Prints the status of the collection of runs.
query_mgmt_db.py [-h] [--error] run_folder [run_name]
positional arguments:
run_folder folder to the collection of runs on Kupe
run_name name of run to be queried
optional arguments:
-h, --help show this help message and exit
--error, -e Optionally add an error string to the database
e.g.
slurm_gm_workflow/scripts/management$ python query_mgmt_db.py ~/Documents/scratch/test_18p5/
run_name | process | status | last_modified
_______________________________________________________________________________
test123 | BB | in-queue | 2018-05-16 03:53:55
test123 | IM_calculation | in-queue | 2018-05-16 03:53:55
test123 | post_EMOD3D | running | 2018-05-16 04:30:01
test123 | EMOD3D | completed | 2018-05-16 03:58:15
test123 | HF | failed | 2018-05-16 22:56:41
test_realiastion1 | EMOD3D | created | 2018-05-16 03:34:26
test_realiastion1 | post_EMOD3D | created | 2018-05-16 03:34:26
test_realiastion1 | HF | created | 2018-05-16 03:34:26
test_realiastion1 | BB | created | 2018-05-16 03:34:26
test_realiastion1 | IM_calculation | created | 2018-05-16 03:34:26
...
Error: hit wall clock limit 5000
Run_name: Kelly_HYP02-03_S1264
Process: EMOD3D
Status: failed
Last_Modified: 2018-05-18 02:30:03
Error: Task removed from squeue without completion
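A rough sketch of how a query could produce the table above. It assumes the hypothetical SQLite `state` table with an `error` column. It also assumes the `--error` flag appends each task's stored error text. The `query_status` function is illustrative and is not the script's actual code.

```python
import sqlite3


def query_status(conn, run_name=None, show_errors=False):
    """Hypothetical: return printable lines for all tasks, or for one run_name."""
    sql = "SELECT run_name, process, status, last_modified, error FROM state"
    params = ()
    if run_name is not None:
        sql += " WHERE run_name = ?"
        params = (run_name,)
    sql += " ORDER BY run_name, last_modified"
    lines = ["run_name | process | status | last_modified", "_" * 79]
    for run, proc, status, modified, error in conn.execute(sql, params):
        lines.append(f"{run} | {proc} | {status} | {modified}")
        # Assumed behaviour of --error: show stored error text where there is any.
        if show_errors and error:
            lines.append(f"Error: {error}")
    return lines


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE state (run_name TEXT, process TEXT, status TEXT, "
                 "last_modified TEXT, error TEXT)")
    conn.executemany("INSERT INTO state VALUES (?, ?, ?, ?, ?)", [
        ("test123", "EMOD3D", "completed", "2018-05-16 03:58:15", None),
        ("test123", "HF", "failed", "2018-05-16 22:56:41", "Hit wall clock limit 5000"),
    ])
    print("\n".join(query_status(conn, show_errors=True)))
```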
Inserting new tasks into database
Inserts a new entry into the database with the status created for the given run_name and process.
python insert_mgmt_db.py ~/Documents/scratch/test_18p5/ run_name {EMOD3D,post_EMOD3D,HF,BB,IM_calculation}
e.g.
python insert_mgmt_db.py ~/Documents/scratch/test_18p5/ test123 EMOD3D
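A minimal sketch of the retry pattern, in which a failed entry is kept and a fresh created entry is added beside it. It assumes the same hypothetical SQLite `state` table. `insert_task` is an illustrative name and is not the script's API.

```python
import sqlite3
from datetime import datetime, timezone

# Assumption: the process names accepted by insert_mgmt_db.py.
PROCESSES = ("EMOD3D", "post_EMOD3D", "HF", "BB", "IM_calculation")


def insert_task(conn, run_name, process):
    """Hypothetical: add a new 'created' entry; existing rows for the task are kept."""
    if process not in PROCESSES:
        raise ValueError(f"unknown process: {process}")
    now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    conn.execute(
        "INSERT INTO state (run_name, process, status, last_modified) "
        "VALUES (?, ?, 'created', ?)",
        (run_name, process, now),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE state (run_name TEXT, process TEXT, status TEXT, last_modified TEXT)")
    # The failed HF attempt stays in the table as a record of what went wrong...
    conn.execute("INSERT INTO state VALUES ('test123', 'HF', 'failed', '2018-05-16 22:56:41')")
    # ...and the retry gets its own fresh entry.
    insert_task(conn, "test123", "HF")
    print(conn.execute("SELECT status FROM state WHERE process = 'HF' ORDER BY rowid").fetchall())
```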
Querying Slurm
Checks squeue to see the progress of each task and updates its status in the database where it has changed.
python slurm_query_status.py run_folder [poll-interval]
e.g.
python slurm_query_status.py ~/Documents/scratch/test_18p5/
not updating status (running) of 'post_EMOD3D' on 'test123'
not updating status (in-queue) of 'BB' on 'test123'
updating 'IM_calculation' on 'test123' to the status of 'running' from 'in-queue'
Task 'EMOD3D' on 'test_realiastion1' not found on squeue; changing status to 'failed'

python slurm_query_status.py ~/Documents/scratch/test_18p5/
not updating status (running) of 'post_EMOD3D' on 'test123' (2183326)
not updating status (in-queue) of 'BB' on 'test123' (2183255)
not updating status (running) of 'IM_calculation' on 'test123' (2183303)
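The decisions reported above could be made roughly as in the following sketch. It assumes the squeue text is fetched with `squeue -h -o "%i %t"`, which prints the job id and compact state (PD is pending, R is running). It also assumes only tasks already in-queue or running are checked. `parse_squeue` and `new_status` are hypothetical names.

```python
# Assumption: map compact squeue states onto the database statuses.
SQUEUE_TO_STATUS = {"PD": "in-queue", "R": "running"}


def parse_squeue(text):
    """Hypothetical: map job id (as a string) -> compact state from `squeue -h -o "%i %t"`."""
    jobs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            jobs[fields[0]] = fields[1]
    return jobs


def new_status(db_status, job_id, jobs):
    """Hypothetical: return the status a task should move to, or None to leave it unchanged."""
    if db_status not in ("in-queue", "running"):
        return None  # only tasks submitted to Slurm are tracked
    state = jobs.get(str(job_id))
    if state is None:
        # Gone from squeue while still in-queue/running: it never marked itself completed.
        return "failed"
    if db_status == "in-queue" and SQUEUE_TO_STATUS.get(state) == "running":
        return "running"
    return None
```

A job that finishes normally also leaves squeue, so this sketch relies on the task having marked itself completed before exiting. Any task still recorded as in-queue or running is treated as failed, in line with the "Task removed from squeue without completion" error shown earlier. The [poll-interval] argument suggests the real script can repeat the check, and a loop with `time.sleep` would be one way to do that.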
...