...
Make sure the submitted job has already finished by using llq.
(The new ll script appends the rup_model name to the job name, so a specific command can test whether a specific job is still in the LoadLeveler queue.)
To show the job names of all jobs belonging to user 'ykh22':
    llq -l -u ykh22 | grep 'Job Name:'
Pipe it to a second grep to determine whether a specific job has completed.
Let's say we are looking for AlpineF2K_HYP06-21_S1404:
    llq -l -u ykh22 | grep 'Job Name: postprocess' | grep 'AlpineF2K_HYP06-21_S1404'
The output will be empty if the job is not in the queue; otherwise it should show on screen:
    Job Name: postprocess_AlpineF2K_HYP06-21_S1404
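As a minimal sketch, the queue check above could be wrapped in a small shell function. The name `job_in_queue` is hypothetical and not part of the workflow scripts; it reads `llq -l` output on standard input, so the filtering can be tried without access to the queue.

```shell
# Hypothetical helper: succeeds (exit 0) if a job whose name contains
# the given string appears in `llq -l` output read from stdin.
job_in_queue() {
    grep 'Job Name:' | grep -q -F -- "$1"
}

# Example usage on the cluster (user and job name taken from the text above):
#   if llq -l -u ykh22 | job_in_queue 'AlpineF2K_HYP06-21_S1404'; then
#       echo "still in queue"
#   else
#       echo "not in queue"
#   fi
```

The exit status makes the check usable in an `if`, instead of inspecting the printed output by eye.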
Check the count of completed Vel files by using `ls` and `wc` (the first number printed by `wc` is the line count):
    ls LF/AlpineK2TAlpineF2K_HYP10-10_S1514/Vel/ | wc
    6657 6657 98076
Then compare it with the station count within the domain:
    cat fd_rt01-h0.400.ll | wc
    8550 25650 271714
For this example, we have 8550 stations and only 2219 stations finished (6657 / 3).
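The comparison above can be sketched as one helper that prints both counts. The function name `station_progress` is hypothetical, and the division by three assumes three output files per station, as implied by "6657 / 3" in the text.

```shell
# Hypothetical helper: print "<finished> <total>" station counts.
#   $1 = directory holding the output files (e.g. the Vel directory)
#   $2 = station list file, one station per line
# Assumes three output files per station, as implied by "6657 / 3".
station_progress() (
    files=$(ls "$1" | wc -l)
    total=$(wc -l < "$2")
    echo "$((files / 3)) $((total))"
)

# Example usage (paths taken from the text above):
#   station_progress LF/AlpineK2TAlpineF2K_HYP10-10_S1514/Vel fd_rt01-h0.400.ll
```

With the counts shown above (6657 files, 8550 stations), this would print `2219 8550`.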
So it is safe to assume that if we give it more than 4 to 4.5 times the current WCT (wall-clock time), it should finish on the next submission (8550 / 2219 ≈ 3.9).
Change the WCT in "the templates", so that all jobs submitted afterwards will use the new WCT.
Original, in post_emod3d_mpi.ll.template:
    # @ wall_clock_limit = 0:20:00
to
    # @ wall_clock_limit = 1:30:00
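If editing the template by hand is inconvenient, the change could be scripted along these lines. This is only a sketch: `set_wall_clock_limit` is a hypothetical name, and it assumes the template line is spelled exactly as shown above (`# @ wall_clock_limit = ...`).

```shell
# Hypothetical helper: set the wall_clock_limit line in a LoadLeveler template.
#   $1 = template file, $2 = new limit (H:MM:SS)
# Assumes the line is spelled exactly "# @ wall_clock_limit = <value>".
set_wall_clock_limit() {
    tmp=$(mktemp) || return 1
    # Write the edited copy to a temp file, then copy it back so the
    # template keeps its original permissions.
    sed "s/^# @ wall_clock_limit = .*/# @ wall_clock_limit = $2/" "$1" > "$tmp" &&
        cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example usage (file name and value taken from the text above):
#   set_wall_clock_limit post_emod3d_mpi.ll.template 1:30:00
```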
Re-submit the job for all srf in that simulation:
    echo "1" | ./submit_post_emod3d.sh
...
Make sure the submitted job has already finished by looking at llq.
Let's say we are looking for AlpineF2K_HYP06-21_S1404:
    llq -l -u ykh22 | grep 'Job Name: run_hf_mpi' | grep 'AlpineF2K_HYP06-21_S1404'
The output will be empty if the job is not in the queue; otherwise it should show on screen:
    Job Name: run_hf_mpi_Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045__AlpineF2K_HYP06-21_S1404
Check the count of completed Acc files by using `ls` and `wc`:
    ls HF/Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045/AlpineK2TAlpineF2K_HYP10-10_S1514/Acc/ | wc
    6657 6657 98076
Then compare it with the station count within the domain:
    cat fd_rt01-h0.400.ll | wc
    8550 25650 271714
For this example, we have 8550 stations and only 2219 stations finished (6657 / 3).
So it is safe to assume that if we give it more than 4 to 4.5 times the current WCT, it should finish on the next submission.
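The reasoning behind the 4 to 4.5 factor can be sketched as a small calculation. The function name `estimate_wct_minutes` is hypothetical, and the estimate assumes run time grows linearly with the number of stations processed, which the text does not state explicitly.

```shell
# Hypothetical helper: estimate the wall-clock minutes needed to finish all stations.
#   $1 = finished stations, $2 = total stations, $3 = current limit in minutes
# Assumes run time grows linearly with the number of stations processed.
estimate_wct_minutes() {
    if [ "$1" -le 0 ]; then
        echo "no stations finished; cannot estimate" >&2
        return 1
    fi
    # Integer ceiling of ($3 * $2 / $1)
    echo $(( ($3 * $2 + $1 - 1) / $1 ))
}

# Example usage with the numbers from the text above (1:00:00 = 60 minutes):
#   estimate_wct_minutes 2219 8550 60
```

For the numbers above this prints `232`, i.e. just under four hours, so a limit of 4.5 times the original leaves some margin.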
Change the WCT in "the templates", so that all jobs submitted afterwards will use the new WCT:
    # @ wall_clock_limit = 1:00:00
to
    # @ wall_clock_limit = 4:30:00
Re-submit the job for all srf in that simulation:
    echo "1" | ./submit_hf.sh
...
Make sure the submitted job has already finished by looking at llq:
    llq -l -u ykh22 | grep 'Job Name: run_bb_mpi' | grep 'AlpineF2K_HYP06-21_S1404'
The output will be empty if the job is not in the queue; otherwise it should show on screen:
    Job Name: run_bb_mpi_Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045__AlpineF2K_HYP06-21_S1404
Check the count of completed Vel files by using `ls` and `wc`:
    ls HF/Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045/AlpineK2TAlpineF2K_HYP10-10_S1514/Vel/ | wc
    6657 6657 98076
Then compare it with the station count within the domain:
    cat fd_rt01-h0.400.ll | wc
    8550 25650 271714
For this example, we have 8550 stations and only 2219 stations finished (6657 / 3).
So it is safe to assume that if we give it more than 4 to 4.5 times the current WCT, it should finish on the next submission.
Change the WCT in "the templates", so that all jobs submitted afterwards will use the new WCT:
    # @ wall_clock_limit = 1:00:00
to
    # @ wall_clock_limit = 4:30:00
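The new limit is the old one multiplied by 4.5. A minimal sketch of that arithmetic, using a hypothetical helper name `scale_wct` and assuming the limit is written as H:MM:SS:

```shell
# Hypothetical helper: multiply an H:MM:SS wall-clock limit by a factor.
#   $1 = limit as H:MM:SS, $2 = factor (may be fractional)
scale_wct() {
    awk -v t="$1" -v f="$2" 'BEGIN {
        split(t, p, ":")
        # Convert to seconds, scale, and round to the nearest second
        s = int((p[1] * 3600 + p[2] * 60 + p[3]) * f + 0.5)
        printf "%d:%02d:%02d\n", int(s / 3600), int((s % 3600) / 60), s % 60
    }'
}

# Example usage:
#   scale_wct 1:00:00 4.5
```

With the values above, `scale_wct 1:00:00 4.5` prints `4:30:00`.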
Re-submit the job for all srf in that simulation:
    echo "1" | ./submit_bb.sh
...