...

IM_calculation jobs are large, and the running time allowed on Kupe is limited, so a job can be interrupted by Slurm before it finishes. We therefore implemented checkpointing, which tracks the progress of an im_calculation job and lets it carry on from where it was interrupted.
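The idea of resuming from a checkpoint can be illustrated with a minimal sketch. It is not the code in the checkpoint branch: the function name `pending_runs`, the `<output_dir>/<run>/<run>.csv` layout, and the `.csv` suffix are all assumptions made for illustration. A run counts as finished if its result file already exists, and only the remaining runs are scheduled again.

```python
from pathlib import Path


def pending_runs(run_names, output_dir, marker_suffix=".csv"):
    """Return the runs that have no result file under output_dir yet.

    Assumed (hypothetical) layout: <output_dir>/<run>/<run><marker_suffix>.
    The real checkpoint branch may detect finished runs differently.
    """
    output_dir = Path(output_dir)
    return [
        name
        for name in run_names
        if not (output_dir / name / f"{name}{marker_suffix}").is_file()
    ]
```

After an interruption, calling `pending_runs(all_runs, output_dir)` would give the list of runs that still need to be calculated, so completed work is not repeated.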

To run checkpointing, first git clone the IM_calculation repo, then check out the checkpoint branch:

Code Block
$ git clone https://github.com/ucgmsim/IM_calculation.git
$ cd IM_calculation
$ git checkout checkpoint

Open im_calc_sl.template and set the IMPATH variable (line 22) to the directory where you cloned the git repository:

Code Block
# open template
~/IM_calculation-[checkpoint]$ vim im_calc_sl.template
# set IMPATH to the path of your own clone, for example:
export IMPATH=/home/melody.zhu/IM_calculation

Note: the checkpointing code relies on the input/output directory structure specified in the im_calc_sl.template in the checkpoint branch. If your directories do not match this structure, the job will fail with a runtime error. A quick fix is to modify the template to suit your own directory structure.

...

Kupe limits the number of lines allowed in a single Slurm script, so a large Slurm script has to be split into several smaller ones.

Still on the checkpoint branch, run the generate_split_sl.py script, which uses both checkpointing and splitting. The -ml argument sets the maximum number of lines of Python calls to calculate_ims.py/calculate_rrups.py. Header and footer lines such as '#SBATCH --time=15:30:00' and 'date' are NOT counted toward this limit.
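How the line limit applies can be sketched as follows. This is an illustration under stated assumptions, not the actual splitting script: `split_commands` is a hypothetical helper, and the command strings are placeholders rather than real calculate_ims.py arguments. Only the command lines are counted against `max_lines`; the header and footer are added to every script on top of that.

```python
def split_commands(commands, max_lines, header=(), footer=()):
    """Split command lines into script texts of at most max_lines commands.

    Header and footer lines are added to every script and are not counted
    against max_lines, mirroring the -ml behaviour described above.
    """
    if max_lines < 1:
        raise ValueError("max_lines must be at least 1")
    scripts = []
    for start in range(0, len(commands), max_lines):
        chunk = commands[start:start + max_lines]
        scripts.append("\n".join([*header, *chunk, *footer]) + "\n")
    return scripts


# Placeholder commands; the real calls take more arguments.
commands = [f"python calculate_ims.py run_{i}" for i in range(250)]
header = ["#!/bin/bash", "#SBATCH --time=15:30:00"]
footer = ["date"]
scripts = split_commands(commands, 100, header, footer)
```

With 250 command lines and a limit of 100, this gives three scripts containing 100, 100 and 50 commands, each with its own header and footer.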

...

Code Block
python generate_split_sl.py -sobs /nesi/nobackup/nesi00213/RunFolder/Cybershake/v18p6_batched/v18p6_1k_under2p0G_ab/~/test_obs -sim runs/Runs -srf /nesi/nobackup/nesi00213/RunFolder/Cybershake/v18p6/Runs/test_srfs/_batched/v18p6_exclude_1k_batch_6/Data/Sources -ll /scale_akl_nobackup/filesets/transit/nesi00213/StationInfo/non_uniform_whole_nz_with_real_stations-hh400_v18p6.ll -np 80 -o ~/test_obs/IMCalcExample/ /nesi/nobackup/nesi00213/RunFolder/Cybershake/v18p6/test_check_point/ rrup_out -ml 100 -e -s 

Output:

To submit the Slurm script:

...