...
make sure the job submitted has already finished by using llq.
the new ll script appends the rup_model name to the job name, so a specific command can test whether a specific job is still on the LoadLeveler queue or not.
to show the job names of all jobs that belong to user 'ykh22':
Code Block llq -l -u ykh22 | grep 'Job Name:'
pipe it to grep to determine whether a specific job has completed.
let's say we are looking for AlpineF2K_HYP06-21_S1404:
Code Block llq -l -u ykh22 | grep 'Job Name: postprocess' | grep 'AlpineF2K_HYP06-21_S1404'
the output will be empty if the job is not in the queue; otherwise the job name is shown on screen:
Code Block Job Name: postprocess_AlpineF2K_HYP06-21_S1404
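The check above can be wrapped in a small helper. This is only a sketch: `job_in_queue` is a hypothetical name, not part of the workflow scripts, and it reads `llq -l` output on stdin so it can be tried without LoadLeveler.

```shell
#!/usr/bin/env bash
# Sketch: succeed (exit 0) if a job for the given step and run is listed.
# job_in_queue is a hypothetical helper; it expects `llq -l` output on stdin.
job_in_queue() {
    local step="$1"   # e.g. postprocess, run_hf_mpi, run_bb_mpi
    local run="$2"    # e.g. AlpineF2K_HYP06-21_S1404
    # -F matches fixed strings, so '+' and '.' in job names are taken literally.
    grep -F "Job Name: ${step}" | grep -qF "${run}"
}

# Intended use on the cluster:
#   if llq -l -u ykh22 | job_in_queue postprocess AlpineF2K_HYP06-21_S1404; then
#       echo "still in queue"
#   else
#       echo "not in queue"
#   fi
```

The same helper should work for the HF and BB steps by passing `run_hf_mpi` or `run_bb_mpi` as the first argument.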
check the count of completed Vel files by using `ls` and `wc`
Code Block ls LF/AlpineK2T_HYP10-10_S1514/Vel/ | wc
Code Block 6657 6657 98076
then compare it with the station count within the domain
Code Block cat fd_rt01-h0.400.ll | wc
Code Block 8550 25650 271714
for this example, we have 8550 stations (the first number from wc) and only 2219 stations have finished (6657 / 3, i.e. three files per station).
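The two counts can be scripted as follows. This is a minimal sketch, assuming three output files per station (as the division by 3 above implies) and one station per line in the station file; `stations_done` and `stations_total` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: count finished stations and total stations.
# Assumption: each finished station leaves exactly 3 files in the output dir.
stations_done() {
    local out_dir="$1"
    local n_files
    n_files=$(ls "$out_dir" | wc -l)
    echo $(( n_files / 3 ))
}

# Assumption: the station list has one station per line, newline-terminated.
stations_total() {
    wc -l < "$1" | tr -d ' '
}

# Intended use:
#   stations_done LF/AlpineK2T_HYP10-10_S1514/Vel/
#   stations_total fd_rt01-h0.400.ll
```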
so it is safe to assume that if we give it more than 4~4.5 times the current WCT (wall clock time), it should finish with the next submission (8550 / 2219 is about 3.85).
change the WCT in "the templates", so that all jobs submitted afterwards will use the new WCT. change
original post_emod3d_mpi.ll.template:
Code Block # @ wall_clock_limit = 0:20:00
to
Code Block # @ wall_clock_limit = 1:30:00
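The arithmetic behind the new limit can be sketched like this. `wct_factor` and `scale_wct` are hypothetical helpers; they assume the finished station count is greater than zero and that the limit is written as H:MM:SS.

```shell
#!/usr/bin/env bash
# Sketch: how much more wall clock time is needed, and the scaled limit.
# Assumption: finished > 0 (no guard against division by zero here).
wct_factor() {
    # total stations / finished stations, printed to two decimals
    awk -v total="$1" -v finished="$2" 'BEGIN { printf "%.2f\n", total / finished }'
}

scale_wct() {
    # multiply an H:MM:SS limit by a factor and print it in the same form
    awk -v wct="$1" -v factor="$2" 'BEGIN {
        split(wct, t, ":")
        secs = int((t[1] * 3600 + t[2] * 60 + t[3]) * factor + 0.5)
        printf "%d:%02d:%02d\n", int(secs / 3600), int((secs % 3600) / 60), secs % 60
    }'
}

wct_factor 8550 2219      # ratio of all stations to finished stations
scale_wct 0:20:00 4.5     # the 0:20:00 limit with a 4.5x multiplier
```

The ratio comes out a little under 4, which is why a multiplier of 4 to 4.5 leaves some margin.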
re-submit job for all srf in that simulation
Code Block echo "1" | ./submit_post_emod3d.sh
HF:
...
make sure the job submitted has already finished by looking at llq.
let's say we are looking for AlpineF2K_HYP06-21_S1404:
Code Block llq -l -u ykh22 | grep 'Job Name: run_hf_mpi' | grep 'AlpineF2K_HYP06-21_S1404'
the output will be empty if the job is not in the queue; otherwise the job name is shown on screen:
Code Block Job Name: run_hf_mpi_Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045__AlpineF2K_HYP06-21_S1404
check the completed count of Vel Acc files by using `ls` and `wc`
Code Block ls LFHF/Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045/AlpineK2T_HYP10-10_S1514/VelAcc/ | wc
Code Block 6657 6657 98076
then compare it with the station count within the domain
Code Block cat fd_rt01-h0.400.ll | wc
Code Block 8550 25650 271714
for this example, we have 8550 stations (the first number from wc) and only 2219 stations have finished (6657 / 3, i.e. three files per station).
so it is safe to assume that if we give it more than 4~4.5 times the current WCT, it should finish with the next submission (8550 / 2219 is about 3.85).
change the WCT in "the templates", so that all jobs submitted afterwards will use the new WCT. change
Code Block # @ wall_clock_limit = 1:00:00
to
Code Block # @ wall_clock_limit = 4:30:00
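The template edit can also be done from the command line. This is a sketch only: `set_wct` is a hypothetical helper, and the template path you pass to it is whichever `.ll.template` file the step uses.

```shell
#!/usr/bin/env bash
# Sketch: rewrite the wall_clock_limit line of a LoadLeveler template in place.
# set_wct is a hypothetical helper, not part of the workflow scripts.
set_wct() {
    local template="$1" new_limit="$2" tmp status
    tmp=$(mktemp) || return 1
    # Replace only the value on the '# @ wall_clock_limit = ...' line.
    # Writing back with cat keeps the template's original permissions.
    sed "s/^\(# @ wall_clock_limit = \).*/\1${new_limit}/" "$template" > "$tmp" &&
        cat "$tmp" > "$template"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Intended use (template path is a placeholder):
#   set_wct path/to/template 4:30:00
```

It is worth keeping a copy of the original template, since every job submitted afterwards picks up the new limit.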
re-submit job for all srf in that simulation
Code Block echo "1" | ./submit_hf.sh
...
IMPORTANT: before running batch bb submission, make sure LF and HF are done for all runs under the list_vm.
Code Block /nesi/projects/nesi00213/RunFolder/Cybershake/workflow/devel/cybershake/submit_cybershake_bb.sh /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Data/list_vma
1.1 If only a specific run's LF and HF are finished and the user prefers to run BB for that run only, cd to the simulation folder and run ./submit_bb.sh.
run test_cybershake_bb.sh to test which runs have finished.
the script takes 2 args: 1. the path to the Runs folder, 2. the list of vms (so that it does not run for all the unnecessary runs)
Code Block /nesi/projects/nesi00213/RunFolder/Cybershake/workflow/devel/cybershake/test_cybershake_bb.sh /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Data/list_vma 2>&1 | tee /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/test_bb_vma.log
this will output the test results on the screen as well as dumping them into a log file, namely "/nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/test_bb_vma.log"
(the last part of the command, starting at 2>&1, merges stderr into stdout and then sends the combined output to both the screen and a file using 'tee')
Note: change the file name and location depending on your own requirements.
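The effect of `2>&1 | tee` can be seen with a small stand-in. In this sketch `fake_test` and its messages are made up for illustration and only take the place of test_cybershake_bb.sh; the log goes to a temporary file.

```shell
#!/usr/bin/env bash
# Stand-in for the real test script: one line on stdout, one on stderr.
# fake_test and both messages are hypothetical.
fake_test() {
    echo "run A: finished"
    echo "run B: missing files" >&2
}

log_file=$(mktemp)
# 2>&1 merges stderr into stdout before the pipe; tee then copies the
# combined stream to the screen and to the log file.
fake_test 2>&1 | tee "$log_file"
```

Without `2>&1`, the stderr line would still appear on screen but would not end up in the log file.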
...
Resuming BB
make sure the job submitted has already finished by looking at llq.
Code Block llq -l -u ykh22 | grep 'Job Name: run_bb_mpi' | grep 'AlpineF2K_HYP06-21_S1404'
the output will be empty if the job is not in the queue; otherwise the job name is shown on screen:
Code Block Job Name: run_bb_mpi_Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045__AlpineF2K_HYP06-21_S1404
check the completed count of Vel files by using `ls` and `wc`
Code Block ls HF/Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045/AlpineK2T_HYP10-10_S1514/Vel/ | wc
Code Block 6657 6657 98076
then compare it with the station count within the domain
Code Block cat fd_rt01-h0.400.ll | wc
Code Block 8550 25650 271714
for this example, we have 8550 stations (the first number from wc) and only 2219 stations have finished (6657 / 3, i.e. three files per station).
so it is safe to assume that if we give it more than 4~4.5 times the current WCT, it should finish with the next submission (8550 / 2219 is about 3.85).
change the WCT in "the templates", so that all jobs submitted afterwards will use the new WCT. change
Code Block # @ wall_clock_limit = 1:00:00
to
Code Block # @ wall_clock_limit = 4:30:00
re-submit job for all srf in that simulation
Code Block echo "1" | ./submit_bb.sh
TODO:
- add script to auto test all simulations and submit the next step
- (currently need to run the test script and submit the next step manually)
- A script to adjust WCT for HF
- currently HF has a hard-coded/static WCT
- (multiple re-submission of HF is needed if the boundary is large, more than 7 times for Alpine simulations)
- A script to check if a job is still running (or in queue)
- currently the user needs to check that manually
- a script to bulk check may help with automation