Make sure the submitted job has already finished by checking `llq`.
(The new ll script appends the rup_model name to the job name, so a specific command can test whether a specific job is still in the LoadLeveler queue.)
To show the job names of all jobs belonging to user 'ykh22':
Code Block
llq -l -u ykh22 | grep 'Job Name:'
Pipe the output through a second `grep` to determine whether a specific job has completed.
Let's say we are looking for AlpineF2K_HYP06-21_S1404:
Code Block
llq -l -u ykh22 | grep 'Job Name: postprocess' | grep 'AlpineF2K_HYP06-21_S1404'
The output is empty if the job is not in the queue; otherwise the matching job name is shown on screen:
Code Block
Job Name: postprocess_AlpineF2K_HYP06-21_S1404
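The check above can be wrapped in a small helper so that it can be used in an `if` statement. This is only a minimal sketch, not part of the existing workflow scripts: the function name `job_listed` is hypothetical, and it assumes only that `llq -l` prints one `Job Name:` line per job, as in the output shown above.

```shell
#!/bin/sh
# job_listed PREFIX RUN_NAME   (hypothetical helper, not a workflow script)
# Reads `llq -l` output on stdin. Succeeds (exit status 0) if some
# "Job Name:" line starts with PREFIX and also contains RUN_NAME.
job_listed() {
    # -F treats the arguments as fixed strings, so characters such as
    # '+' or '.' in job names are matched literally.
    grep -F -- "Job Name: $1" | grep -q -F -- "$2"
}

# Example usage on a machine where llq is available:
#   if llq -l -u ykh22 | job_listed postprocess AlpineF2K_HYP06-21_S1404; then
#       echo "still in queue"
#   else
#       echo "not in queue"
#   fi
```

Because the match is a substring match, a run name that is a prefix of another run name would also match the longer one.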
Check the number of completed Vel files using `ls` and `wc`:
Code Block
ls LF/AlpineK2T_HYP10-10_S1514/Vel/ | wc
6657 6657 98076
Then compare it with the station count within the domain:
Code Block
cat fd_rt01-h0.400.ll | wc
8550 25650 271714
In this example, there are 8550 stations and only 2219 stations have finished (6657 / 3).
So it is safe to assume that with more than 4 to 4.5 times the current WCT (wall clock time), the job should finish on the next submission.
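The arithmetic behind that estimate can be sketched as below. The function name `wct_factor` is hypothetical. It assumes three output files per station (the "/ 3" above) and a roughly constant processing rate per station, which may not hold exactly in practice.

```shell
#!/bin/sh
# wct_factor FILE_COUNT STATION_COUNT   (hypothetical helper)
# Prints the ratio of total stations to finished stations, i.e. roughly how
# many times the current WCT would be needed to process every station.
wct_factor() {
    awk -v files="$1" -v stations="$2" 'BEGIN {
        finished = files / 3            # assumed: three files per station
        if (finished <= 0) {
            print "no stations finished"
            exit 1
        }
        printf "%.2f\n", stations / finished
    }'
}

# With the counts from the example above:
#   wct_factor 6657 8550
# With live counts (paths as in the example above):
#   wct_factor "$(ls LF/AlpineK2T_HYP10-10_S1514/Vel/ | wc -l)" \
#              "$(wc -l < fd_rt01-h0.400.ll)"
```

For the example counts this prints 3.85, which is why a factor of 4 to 4.5 leaves some margin.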
Change the WCT in "the templates" (so that all jobs submitted afterwards use the new WCT).
Code Block: original post_emod3d_mpi.ll.template
# @ wall_clock_limit = 0:20:00
to
Code Block
# @ wall_clock_limit = 1:30:00
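If the template is edited often, the change can be scripted. The following is a sketch only: `set_wct` is a hypothetical helper, and it assumes the template contains a line starting with `# @ wall_clock_limit`, as in the example above.

```shell
#!/bin/sh
# set_wct TEMPLATE NEW_LIMIT   (hypothetical helper)
# Rewrites the "# @ wall_clock_limit = ..." line of a LoadLeveler template.
# NEW_LIMIT is expected in the h:mm:ss form used above and must not contain '/'.
set_wct() {
    tmp="$1.tmp.$$"
    # Write to a temporary file first rather than relying on `sed -i`,
    # which is not available in every sed implementation.
    sed "s/^# @ wall_clock_limit *=.*/# @ wall_clock_limit = $2/" "$1" > "$tmp" &&
        cat "$tmp" > "$1" &&
        rm -f "$tmp"
}

# Example:
#   set_wct post_emod3d_mpi.ll.template 1:30:00
```

All other lines of the template are left unchanged.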
Re-submit the job for all srf in that simulation:
Code Block
echo "1" | ./submit_post_emod3d.sh

HF:

...

Make sure the submitted job has already finished by checking `llq`.
Let's say we are looking for AlpineF2K_HYP06-21_S1404:
Code Block
llq -l -u ykh22 | grep 'Job Name: run_hf_mpi' | grep 'AlpineF2K_HYP06-21_S1404'
The output is empty if the job is not in the queue; otherwise the matching job name is shown on screen:
Code Block
Job Name: run_hf_mpi_Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045__AlpineF2K_HYP06-21_S1404
Check the number of completed VelAcc files using `ls` and `wc`:
Code Block
ls LFHF/Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045/AlpineK2T_HYP10-10_S1514/VelAcc/ | wc
6657 6657 98076
Then compare it with the station count within the domain:
Code Block
cat fd_rt01-h0.400.ll | wc
8550 25650 271714
In this example, there are 8550 stations and only 2219 stations have finished (6657 / 3).
So it is safe to assume that with more than 4 to 4.5 times the current WCT, the job should finish on the next submission.
Change the WCT in "the templates" (so that all jobs submitted afterwards use the new WCT).
Code Block
# @ wall_clock_limit = 1:00:00
to
Code Block
# @ wall_clock_limit = 4:30:00
Re-submit the job for all srf in that simulation:
Code Block
echo "1" | ./submit_hf.sh

...

IMPORTANT: before running the batch BB submission, make sure LF and HF are done for all runs in the list_vm.

Code Block

/nesi/projects/nesi00213/RunFolder/Cybershake/workflow/devel/cybershake/submit_cybershake_bb.sh /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Data/list_vma

1.1 If LF and HF are finished only for a specific run and you prefer to run BB for that run only, cd to the simulation folder and run ./submit_bb.sh.

Run test_cybershake_bb.sh to test which runs have finished.

The script takes 2 arguments: 1. the path to the Runs folder, 2. the list of VMs (so that it does not run for unnecessary runs).

Code Block

/nesi/projects/nesi00213/RunFolder/Cybershake/workflow/devel/cybershake/test_cybershake_bb.sh /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Data/list_vma 2>&1 | tee /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/test_bb_vma.log

This prints the test results on screen and also writes them to a log file, namely "/nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/test_bb_vma.log".
(The part of the command after 2>&1 sends both standard output and standard error to the screen and to the file using 'tee'.)
Note: change the file name and location to suit your own requirements.

...

Resuming BB

Make sure the submitted job has already finished by checking `llq`.

Code Block
llq -l -u ykh22 | grep 'Job Name: run_bb_mpi' | grep 'AlpineF2K_HYP06-21_S1404'

Code Block
Job Name: run_bb_mpi_Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045__AlpineF2K_HYP06-21_S1404

Check the number of completed Vel files using `ls` and `wc`:
Code Block
ls HF/Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045/AlpineK2T_HYP10-10_S1514/Vel/ | wc
6657 6657 98076
Then compare it with the station count within the domain:
Code Block
cat fd_rt01-h0.400.ll | wc
8550 25650 271714
In this example, there are 8550 stations and only 2219 stations have finished (6657 / 3).
So it is safe to assume that with more than 4 to 4.5 times the current WCT, the job should finish on the next submission.
Change the WCT in "the templates" (so that all jobs submitted afterwards use the new WCT).
Code Block
# @ wall_clock_limit = 1:00:00
to
Code Block
# @ wall_clock_limit = 4:30:00
Re-submit the job for all srf in that simulation:
Code Block
echo "1" | ./submit_bb.sh

TODO:

Add a script to automatically test all simulations and submit the next step
- (currently the test script has to be run and the next step submitted manually)
A script to adjust WCT for HF
- currently HF has a hard-coded/static WCT
- (multiple re-submissions of HF are needed if the boundary is large; more than 7 times for Alpine simulations)
A script to check if a job is still running (or in the queue)
- currently the user needs to check that manually
- a script to bulk check may help with automation
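As a starting point for the bulk check mentioned in the last item, a rough sketch might look like the following. Everything here is an assumption rather than an existing script: the function name `queued_runs` is hypothetical, the run list is assumed to be a plain text file with one run name per line, and the `llq -l` output is assumed to have been saved to a file beforehand.

```shell
#!/bin/sh
# queued_runs LLQ_OUTPUT_FILE RUN_LIST_FILE   (hypothetical helper)
# Prints every run name from RUN_LIST_FILE that still appears in a
# "Job Name:" line of the saved `llq -l` output.
queued_runs() {
    while IFS= read -r run; do
        [ -n "$run" ] || continue          # skip empty lines
        if grep -F 'Job Name:' "$1" | grep -q -F -- "$run"; then
            printf '%s\n' "$run"
        fi
    done < "$2"
}

# Example (file names are placeholders):
#   llq -l -u ykh22 > llq.out
#   queued_runs llq.out runs.txt
```

Runs that are not printed have no job left in the queue. Whether they actually completed still has to be verified with the file counts described above.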