
This is called automatically as part of install.sh. To create a database manually, use the command below:

 


Code Block
python create_mgmt_db.py <path_to_run_folder> [list of realisations]



e.g.



python create_mgmt_db.py ~/Documents/scratch/test_18p5/ test123 test_realiastion1

 

Updating entries in database


A status can only be advanced, i.e. it must move through the states in a linear fashion. If a step fails, its status should be set to failed and a new entry created.
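The rule above can be sketched as a small check. This is only an illustration, not the code in update_mgmt_db.py: the names STATUS_ORDER and can_advance are hypothetical, and it assumes that skipping forward (e.g. created to running) is allowed and that completed and failed are terminal.

```python
# Hypothetical ordering of the states; "failed" is handled separately.
STATUS_ORDER = ["created", "in-queue", "running", "completed"]
FAILED = "failed"


def can_advance(current, new):
    """Return True if moving from `current` to `new` is a forward move."""
    if current in (FAILED, "completed"):
        # Assumed terminal: a failed step gets a new entry instead.
        return False
    if new == FAILED:
        # Any unfinished step may fail.
        return True
    # Only moves to a later state in the ordering are accepted.
    return STATUS_ORDER.index(new) > STATUS_ORDER.index(current)
```

For example, can_advance("running", "in-queue") is False, while can_advance("running", "failed") is True.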

Code Block
usage: update_mgmt_db.py [-h] [-r RUN_NAME] [-j JOB] [-e ERROR]
                         run_folder {EMOD3D,post_EMOD3D,HF,BB,IM_calculation}
                         {created,in-queue,running,completed,failed}

positional arguments:
  run_folder            folder to the collection of runs on Kupe
  {EMOD3D,post_EMOD3D,HF,BB,IM_calculation}
  {created,in-queue,running,completed,failed}

optional arguments:
  -h, --help            show this help message and exit
  -r RUN_NAME, --run_name RUN_NAME
                        name of run to be updated
  -j JOB, --job JOB     Job number on supercomputer
  -e ERROR, --error ERROR
                        text notes about why the run failed
e.g.
python update_mgmt_db.py ~/Documents/scratch/test_18p5/ HF in-queue --j 3 --run_name test123



python update_mgmt_db.py ~/Documents/scratch/test_18p5/ HF running --j 3



python update_mgmt_db.py ~/Documents/scratch/test_18p5/ HF failed --j 3 --error 'Hit wall clock limit 5000'
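For reference, an argparse parser along the following lines would accept the commands shown above. It is a sketch reconstructed from the usage text only; the real script may differ. The destination names process and status, and leaving the job number as a string, are assumptions.

```python
import argparse

PROCESSES = ["EMOD3D", "post_EMOD3D", "HF", "BB", "IM_calculation"]
STATES = ["created", "in-queue", "running", "completed", "failed"]


def build_parser():
    """Build a parser matching the documented usage (names are assumed)."""
    parser = argparse.ArgumentParser(prog="update_mgmt_db.py")
    parser.add_argument("run_folder",
                        help="folder to the collection of runs on Kupe")
    parser.add_argument("process", choices=PROCESSES)
    parser.add_argument("status", choices=STATES)
    parser.add_argument("-r", "--run_name", help="name of run to be updated")
    parser.add_argument("-j", "--job", help="Job number on supercomputer")
    parser.add_argument("-e", "--error",
                        help="text notes about why the run failed")
    return parser


# argparse accepts unambiguous prefixes of long options, which is why
# "--j 3" in the examples above resolves to "--job 3".
args = build_parser().parse_args(
    ["~/Documents/scratch/test_18p5/", "HF", "failed",
     "--j", "3", "--error", "Hit wall clock limit 5000"]
)
```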

Querying status of database

Prints the status of the collection of runs.

 e.g.

Code Block
query_mgmt_db.py [-h] [--error] run_folder [run_name]
positional arguments:
  run_folder   folder to the collection of runs on Kupe
  run_name     name of run to be queried
optional arguments:
  -h, --help   show this help message and exit
  --error, -e  Optionally add an error string to the database

e.g.

slurm_gm_workflow/scripts/management$ python query_mgmt_db.py ~/Documents/scratch/test_18p5/
                 run_name |         process |     status |        last_modified
_______________________________________________________________________________
                  test123 |              BB |   in-queue |  2018-05-16 03:53:55
                  test123 |  IM_calculation |   in-queue |  2018-05-16 03:53:55
                  test123 |     post_EMOD3D |    running |  2018-05-16 04:30:01
                  test123 |          EMOD3D |  completed |  2018-05-16 03:58:15
                  test123 |              HF |     failed |  2018-05-16 22:56:41
        test_realiastion1 |          EMOD3D |    created |  2018-05-16 03:34:26
        test_realiastion1 |     post_EMOD3D |    created |  2018-05-16 03:34:26
        test_realiastion1 |              HF |    created |  2018-05-16 03:34:26
        test_realiastion1 |              BB |    created |  2018-05-16 03:34:26
        test_realiastion1 |  IM_calculation |    created |  2018-05-16 03:34:26

...

 Error: hit wall clock limit 5000

 

 Run_name: Kelly_HYP02-03_S1264

 Process: EMOD3D

 Status: failed

 Last_Modified: 2018-05-18 02:30:03

 Error: Task removed from squeue without completion
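The tabular listing printed by query_mgmt_db.py (run_name, process, status, last_modified) could be laid out along these lines. This is a sketch only: format_status_table and the column widths are hypothetical, chosen to resemble the sample output, and the rows are assumed to arrive as plain tuples.

```python
def format_status_table(rows):
    """Right-align each column and join the columns with ' | '."""
    widths = (25, 15, 10, 20)  # assumed widths, read off the sample output
    header = ("run_name", "process", "status", "last_modified")

    def fmt(row):
        return " | ".join(str(v).rjust(w) for v, w in zip(row, widths))

    lines = [fmt(header)]
    lines.append("_" * len(lines[0]))  # rule under the header
    lines.extend(fmt(row) for row in rows)
    return "\n".join(lines)


table = format_status_table([
    ("test123", "BB", "in-queue", "2018-05-16 03:53:55"),
    ("test123", "HF", "failed", "2018-05-16 22:56:41"),
])
print(table)
```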

 

Inserting new tasks into database

Inserts a new entry into the database, with the status created, for the given run_name.

Code Block
python insert_mgmt_db.py ~/Documents/scratch/test_18p5/ run_name {EMOD3D,post_EMOD3D,HF,BB,IM_calculation}



e.g.



python insert_mgmt_db.py ~/Documents/scratch/test_18p5/ test123 EMOD3D
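Conceptually, the insert amounts to adding one row with the status created. The sketch below assumes an SQLite database with a single hypothetical table named state; neither the storage engine nor the schema is stated on this page, so treat both as placeholders.

```python
import sqlite3
from datetime import datetime

PROCESSES = ("EMOD3D", "post_EMOD3D", "HF", "BB", "IM_calculation")


def insert_task(conn, run_name, process):
    """Add a new 'created' entry for run_name/process (schema is assumed)."""
    if process not in PROCESSES:
        raise ValueError("unknown process: {}".format(process))
    now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    conn.execute(
        "INSERT INTO state (run_name, process, status, last_modified) "
        "VALUES (?, ?, 'created', ?)",
        (run_name, process, now),
    )
    conn.commit()


# In-memory database with the hypothetical table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE state "
    "(run_name TEXT, process TEXT, status TEXT, last_modified TEXT)"
)
insert_task(conn, "test123", "EMOD3D")
```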


Querying Slurm

Checks squeue to see the progress of each task and updates its status in the database accordingly.

Code Block
python slurm_query_status.py run_folder [poll-interval]
e.g.

python slurm_query_status.py ~/Documents/scratch/test_18p5/
not updating status (running) of 'post_EMOD3D' on 'test123'
not updating status (in-queue) of 'BB' on 'test123'
updating 'IM_calculation' on 'test123' to the status of 'running' from 'in-queue'
Task 'EMOD3D' on 'test_realiastion1' not found on squeue; changing status to 'failed'

python slurm_query_status.py ~/Documents/scratch/test_18p5/
not updating status (running) of 'post_EMOD3D' on 'test123' (2183326)
not updating status (in-queue) of 'BB' on 'test123' (2183255)
not updating status (running) of 'IM_calculation' on 'test123' (2183303)

 

...