V17p9

workflow:

LF:

  1. create files that each contain a list of VM models to run (one VM currently maps to N SRF files)

    cd /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Data/VMs
    
    ls | split -l 10 - list_vm

    this splits the listing into files of 10 names each; listing them (e.g. with ls -l list_vm*) should show something like this:

    -rw-rw----  1 ykh22 nesi-users     94 Oct  2 03:20 list_vma
    -rw-rw----  1 ykh22 nesi-users     91 Oct  2 03:20 list_vmb
    -rw-rw----  1 ykh22 nesi-users     84 Oct  2 03:20 list_vmc
    -rw-rw----  1 ykh22 nesi-users    108 Oct  2 03:20 list_vmd
    -rw-rw----  1 ykh22 nesi-users     82 Oct  2 03:20 list_vme
    -rw-rw----  1 ykh22 nesi-users     62 Oct  2 03:20 list_vmf
    -rw-rw----  1 ykh22 nesi-users    105 Oct  2 03:20 list_vmg
    -rw-rw----  1 ykh22 nesi-users     96 Oct  2 03:20 list_vmh
    -rw-rw----  1 ykh22 nesi-users     51 Oct  2 03:20 list_vmi
  2. run install_cybershake.sh with the path to one list of VM models and the path to install to

    /nesi/projects/nesi00213/RunFolder/Cybershake/workflow/devel/cybershake/install_cybershake.sh $gmsim/RunFolder/Cybershake/v17p9/Data/list_vma /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9

    this should create a simulation folder for every model named in the given list_vm* file, with output like:

    Albury AlpineF2K AlpineK2T Ashley AwatNEVer AwatNEVerCl AwatereNE AwatereSW Barefell Brothers
    !!!!SIM_DIR:/nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/Albury
    Generation of model params has been skipped.
    Re-directing related params to files under /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Data/VMs/Albury
    /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/Albury
    Permission /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/Albury : 750
    ****************************************************************************************************
    ****************************************************************************************************
    Producing statcords and FD_STATLIST. It may take a minute or two
    /nesi/projects/nesi00213/StationInfo/non_uniform_whole_nz_with_real_stations-hh400_17062017.ll
    From: /nesi/projects/nesi00213/StationInfo/non_uniform_whole_nz_with_real_stations-hh400_17062017.ll
    To:
      /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/Albury/fd_rt01-h0.400.statcords
      /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/Albury/fd_rt01-h0.400.ll
    Done
    !!!!SIM_DIR:/nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/AlpineF2K
    Generation of model params has been skipped.
    Re-directing related params to files under /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Data/VMs/AlpineF2K
    /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/AlpineF2K
    Permission /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/AlpineF2K : 750
    ****************************************************************************************************
    ****************************************************************************************************
    Producing statcords and FD_STATLIST. It may take a minute or two
    /nesi/projects/nesi00213/StationInfo/non_uniform_whole_nz_with_real_stations-hh400_17062017.ll
    From: /nesi/projects/nesi00213/StationInfo/non_uniform_whole_nz_with_real_stations-hh400_17062017.ll
    To:
      /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/AlpineF2K/fd_rt01-h0.400.statcords
      /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/AlpineF2K/fd_rt01-h0.400.ll
    Done
    ...
    ...
    ...
    !!!!SIM_DIR:/nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/Brothers
    Generation of model params has been skipped.
    Re-directing related params to files under /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Data/VMs/Brothers
    /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/Brothers
    Permission /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/Brothers : 750
    ****************************************************************************************************
    ****************************************************************************************************
    Producing statcords and FD_STATLIST. It may take a minute or two
    /nesi/projects/nesi00213/StationInfo/non_uniform_whole_nz_with_real_stations-hh400_17062017.ll
    From: /nesi/projects/nesi00213/StationInfo/non_uniform_whole_nz_with_real_stations-hh400_17062017.ll
    To:
      /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/Brothers/fd_rt01-h0.400.statcords
      /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/Brothers/fd_rt01-h0.400.ll
    Done
  3. run submit_cybershake_emod3d.sh (this submits EMOD3D for all the simulations, using the maximum estimated WCT)

    /nesi/projects/nesi00213/RunFolder/Cybershake/workflow/devel/cybershake/submit_cybershake_emod3d.sh /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Data/list_vma

    IMPORTANT!!: make sure the list_vm is the same as the one used with install_cybershake (or that its simulations have been installed properly by some other means)

    submitting EMOD3D for:
    Albury
    AlpineF2K
    AlpineK2T
    Ashley
    AwatNEVer
    AwatNEVerCl
    AwatereNE
    AwatereSW
    Barefell
    Brothers
    ==============================
    submitting for Albury
    ==============================
    nx=286 ny=272 nz=105 sim_duration=55 num_procs=512
    Maximum: 0:06:07.273212
    Average: 0:00:51.445334
    Minimum: 0:00:00
    Loadleveler script run_emod3d_Albury_HYP01-01_S1244.ll written
    Submitting run_emod3d_Albury_HYP01-01_S1244.ll
    Loadleveler script run_emod3d_Albury_HYP01-01_S1254.ll written
    Submitting run_emod3d_Albury_HYP01-01_S1254.ll
    Loadleveler script run_emod3d_Albury_HYP01-01_S1264.ll written
    Submitting run_emod3d_Albury_HYP01-01_S1264.ll
    ..
    ..
    ..
    ==============================
    submitting for Brothers
    ==============================
    nx=372 ny=356 nz=113 sim_duration=69 num_procs=512
    Maximum: 0:14:04.156170
    Average: 0:01:58.244117
    Minimum: 0:00:00
    Loadleveler script run_emod3d_Brothers_HYP01-02_S1244.ll written
    Submitting run_emod3d_Brothers_HYP01-02_S1244.ll
    Loadleveler script run_emod3d_Brothers_HYP01-02_S1254.ll written
    Submitting run_emod3d_Brothers_HYP01-02_S1254.ll
    Loadleveler script run_emod3d_Brothers_HYP01-02_S1264.ll written
    Submitting run_emod3d_Brothers_HYP01-02_S1264.ll
    Loadleveler script run_emod3d_Brothers_HYP02-02_S1274.ll written
    Submitting run_emod3d_Brothers_HYP02-02_S1274.ll
    Loadleveler script run_emod3d_Brothers_HYP02-02_S1284.ll written
    Submitting run_emod3d_Brothers_HYP02-02_S1284.ll
    Loadleveler script run_emod3d_Brothers_HYP02-02_S1294.ll written
    Submitting run_emod3d_Brothers_HYP02-02_S1294.ll
  4. run test_cybershake_emod3d.sh to determine which simulations have finished their EMOD3D jobs.
    the script takes 2 arguments: 1. the path to the Runs folder, 2. the list of VMs (so it does not test runs that are not needed)

    /nesi/projects/nesi00213/RunFolder/Cybershake/workflow/devel/cybershake/test_cybershake_emod3d.sh /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Data/list_vma 2>&1 | tee /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/test_emod3d_vma.log

    this prints the test results on the screen and also writes them to a log file, here "/nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/test_emod3d_vma.log"
    (2>&1 merges stderr into stdout, and 'tee' then copies that stream to both the screen and the file)
    Note: change the log file name and location to suit your own requirements.

    example output
    running test for Albury
    Albury_HYP01-01_S1244: EMOD3D completed
    Albury_HYP01-01_S1254: EMOD3D completed
    Albury_HYP01-01_S1264: EMOD3D completed
    ====================
    Albury finished
    ====================
  5. after all EMOD3D jobs have finished, run submit_cybershake_post_emod.sh
    IMPORTANT: this submits post_emod3d for every VM listed in list_vm, so if not all EMOD3D jobs have finished it is better to submit post_emod3d for each simulation individually.
    5.1 If only some of the runs have finished and you prefer to submit post_emod3d for specific runs only, cd to the specific simulation folder, execute ./submit_post_emod3d.sh and select auto submit

  6. run test_cybershake_post_emod3d.sh to check which simulations have finished

    the script takes 2 arguments: 1. the path to the Runs folder, 2. the list of VMs (so it does not test runs that are not needed)

    /nesi/projects/nesi00213/RunFolder/Cybershake/workflow/devel/cybershake/test_cybershake_post_emod3d.sh /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Data/list_vma 2>&1 | tee /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/test_post_emod3d_vma.log

    this prints the test results on the screen and also writes them to a log file, here "/nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/test_post_emod3d_vma.log"
    (2>&1 merges stderr into stdout, and 'tee' then copies that stream to both the screen and the file)
    Note: change the log file name and location to suit your own requirements.

    example output
    running test for Ashley
    Ashley_HYP01-03_S1244: post_emod3d finished
    Ashley_HYP01-03_S1254: post_emod3d finished
    Ashley_HYP02-03_S1264: post_emod3d finished
    Ashley_HYP02-03_S1274: post_emod3d finished
    Ashley_HYP03-03_S1284: post_emod3d finished
    Ashley_HYP03-03_S1294: post_emod3d finished
    ====================
    Ashley finished
    ====================
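
    The test steps above are run once per list_vm* file, each with its own log. A minimal sketch of doing that in a loop is shown here. log_name_for and run_test_for_all_chunks are hypothetical helpers (not part of the workflow scripts), and the log naming simply follows the test_emod3d_vma.log pattern used above.

```shell
# Hypothetical helper: derive a log file name such as test_emod3d_vma.log
# from a stage name (emod3d, post_emod3d, ...) and a list file such as .../list_vma.
log_name_for() {
    local stage=$1 list_file=$2
    local chunk
    chunk=$(basename "$list_file")   # e.g. list_vma
    chunk=${chunk#list_}             # e.g. vma
    printf 'test_%s_%s.log\n' "$stage" "$chunk"
}

# Hypothetical helper: run one test script over every list_vm* file,
# keeping one log per list file.
# Arguments: test script, stage name, Runs folder, folder holding list_vm*, log folder.
run_test_for_all_chunks() {
    local test_script=$1 stage=$2 runs_dir=$3 list_dir=$4 log_dir=$5
    local list_file
    for list_file in "$list_dir"/list_vm*; do
        [ -f "$list_file" ] || continue
        # Same redirection as in the steps above: merge stderr, then tee to a log.
        "$test_script" "$runs_dir" "$list_file" 2>&1 \
            | tee "$log_dir/$(log_name_for "$stage" "$list_file")"
    done
}
```

    For example, run_test_for_all_chunks .../test_cybershake_emod3d.sh emod3d .../v17p9/Runs .../v17p9/Data .../v17p9 would write test_emod3d_vma.log, test_emod3d_vmb.log and so on, assuming the list files sit in the Data folder.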

Resuming Post-EMOD3D

    post-emod3d has built-in resume functionality, so if a job failed to finish you can resubmit it and it will start from where it ended.

    To maximize efficiency, it is better to adjust the WCT to a proper length than to check and resubmit multiple times.

  1. make sure the submitted job has already finished by using llq
    (the new ll script appends the rup_model name to the job name, so a specific command can test whether a specific job is still in the LoadLeveler queue or not.)

    To show the job names of all jobs belonging to user 'ykh22':
    llq -l -u ykh22 | grep 'Job Name:'

    pipe the output through grep again to determine whether a specific job has completed.
    let's say we are looking for AlpineF2K_HYP06-21_S1404

    llq -l -u ykh22 | grep 'Job Name: postprocess' | grep 'AlpineF2K_HYP06-21_S1404'

    the output will be empty if the job is no longer in the queue; otherwise it should show on screen

    Job Name: postprocess_AlpineF2K_HYP06-21_S1404
  2. check the completed count of Vel files by using `ls` and `wc`

    ls LF/AlpineF2K_HYP10-10_S1514/Vel/ | wc
          6657    6657   98076  

    then compare it with the station count within the domain

    cat fd_rt01-h0.400.ll | wc
        8550   25650  271714

    for this example, we have 8550 stations and only 2219 stations have finished (6657 / 3, i.e. three Vel files per station).
    so it is safe to assume that with more than 4~4.5 times the WCT (8550 / 2219 ≈ 3.85) it should finish with the next submission.

  3. change the WCT in "the templates" (so that all jobs submitted afterwards will use the new WCT)

    original line in post_emod3d_mpi.ll.template:
    # @ wall_clock_limit     = 0:20:00

    to

    # @ wall_clock_limit     = 1:30:00
  4. re-submit job for all srf in that simulation

    echo "1" | ./submit_post_emod3d.sh
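
    The WCT arithmetic in steps 2 and 3 can be sketched as below. wct_factor and scale_wct are hypothetical helpers, not part of the workflow scripts. They assume three output files per finished station (as in the 6657 / 3 example) and a wall clock limit written as h:mm:ss.

```shell
# Ratio between all stations and the stations already written.
# files_done must be greater than zero; three files per station is assumed.
wct_factor() {
    local files_done=$1 stations_total=$2
    awk -v f="$files_done" -v t="$stations_total" \
        'BEGIN { printf "%.2f\n", t / (f / 3) }'
}

# Multiply an h:mm:ss wall clock limit by a factor, rounding to the nearest second.
scale_wct() {
    local wct=$1 factor=$2
    awk -v w="$wct" -v k="$factor" 'BEGIN {
        split(w, p, ":")
        secs = int((p[1] * 3600 + p[2] * 60 + p[3]) * k + 0.5)
        printf "%d:%02d:%02d\n", secs / 3600, (secs % 3600) / 60, secs % 60
    }'
}

# With the numbers from the example above:
#   wct_factor 6657 8550    prints 3.85
#   scale_wct 0:20:00 4.5   prints 1:30:00
```

    The counts can come straight from the commands in step 2, e.g. wct_factor "$(ls LF/AlpineF2K_HYP10-10_S1514/Vel/ | wc -l)" "$(wc -l < fd_rt01-h0.400.ll)". Rounding the factor up a little (3.85 to 4.5 here) leaves some margin.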

HF:

  1. run install_bb_cybershake.sh to set up the parameters (Mod-1D) for HF and BB runs.

    /nesi/projects/nesi00213/RunFolder/Cybershake/workflow/devel/cybershake/install_bb_cybershake.sh /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Data/list_vma /nesi/projects/nesi00213/RunFolder/Cybershake/workflow/devel/cybershake/install_bb_cybershake_selection.txt

    the script takes 3 arguments: 1. the path to the Runs folder, 2. the list of VMs (the same list used in the previous steps), 3. the input for install_bb.sh (this can be changed if a different Mod-1D is to be chosen)

    installing BB for:
    Albury
    AlpineF2K
    AlpineK2T
    Ashley
    AwatNEVer
    AwatNEVerCl
    AwatereNE
    AwatereSW
    Barefell
    Brothers
    ==============================
    installing BB for Albury
    ==============================
    devel
    Info: Old version of params.py supporting singular kappa and sdrop
    ****************************************************************************************************
                                         EMOD3D HF/BB Preparationi Ver.devel
    ****************************************************************************************************
    ====================================================================================================
    Do you want site-specific computation? (To use a universal 1D profile, Select 'No')
    ====================================================================================================
     1. Yes
     2. No
    Enter the number you wish to select (1-2):====================================================================================================
    Select one of 1D Velocity models (from /nesi/projects/nesi00213/VelocityModel/Mod-1D)
    ====================================================================================================
     1. /nesi/projects/nesi00213/VelocityModel/Mod-1D/Cant1D_v1-midQ.1d
     2. /nesi/projects/nesi00213/VelocityModel/Mod-1D/Cant1D_v1.1d
     3. /nesi/projects/nesi00213/VelocityModel/Mod-1D/Cant1D_v2-midQ.1d
     4. /nesi/projects/nesi00213/VelocityModel/Mod-1D/Cant1D_v2-midQ_leer.1d
     5. /nesi/projects/nesi00213/VelocityModel/Mod-1D/banks.1d
     6. /nesi/projects/nesi00213/VelocityModel/Mod-1D/foothills.1d
     7. /nesi/projects/nesi00213/VelocityModel/Mod-1D/foothills_v2.1d
     8. /nesi/projects/nesi00213/VelocityModel/Mod-1D/plains.1d
    Enter the number you wish to select (1-8):/nesi/projects/nesi00213/VelocityModel/Mod-1D/Cant1D_v2-midQ_leer.1d
    Info: You have specified multiple SRF files.
          A single hf_kappa(=0.045) and hf_sdrop(=50) specified in params.py will be used for all SRF files.
           If you need to specific hf_kappa and hf_sdrop value for each SRF, add hf_kappa_list and hf_sdrop_list to params_base.py
    ====================================================================================================
    - Vel. Model 1D: Cant1D_v2-midQ_leer
    - hf_sim_bin: hb_high_v5.4.5_np2mm+
    - hf_rvfac: 0.8
    - hf_sdrop: 50
    - hf_kappa: 0.045
    - srf file: /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Data/Sources/Albury/Srf/Albury_HYP01-01_S1244.srf
    /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/Albury/LF/Albury_HYP01-01_S1244/params_uncertain.py
    /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/Albury/HF/Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045/Albury_HYP01-01_S1244/params_bb_uncertain.py
    [Errno 17] File exists
    Permission /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/Albury/HF/Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045 : 750
    Permission /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/Albury/BB/Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045 : 750
    ====================================================================================================
    - Vel. Model 1D: Cant1D_v2-midQ_leer
    - hf_sim_bin: hb_high_v5.4.5_np2mm+
    - hf_rvfac: 0.8
    - hf_sdrop: 50
    - hf_kappa: 0.045
    - srf file: /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Data/Sources/Albury/Srf/Albury_HYP01-01_S1254.srf
    /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/Albury/LF/Albury_HYP01-01_S1254/params_uncertain.py
    /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/Albury/HF/Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045/Albury_HYP01-01_S1254/params_bb_uncertain.py
    [Errno 17] File exists
    Permission /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/Albury/HF/Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045 : 750
    Permission /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/Albury/BB/Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045 : 750
    ====================================================================================================
    - Vel. Model 1D: Cant1D_v2-midQ_leer
    - hf_sim_bin: hb_high_v5.4.5_np2mm+
    - hf_rvfac: 0.8
    - hf_sdrop: 50
    - hf_kappa: 0.045
    - srf file: /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Data/Sources/Albury/Srf/Albury_HYP01-01_S1264.srf
    /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/Albury/LF/Albury_HYP01-01_S1264/params_uncertain.py
    /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/Albury/HF/Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045/Albury_HYP01-01_S1264/params_bb_uncertain.py
    [Errno 17] File exists
    Permission /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/Albury/HF/Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045 : 750
    Permission /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/Albury/BB/Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045 : 750
    ...
    ...
    ...
    ...
    ...
  2. run submit_cybershake_hf.sh

    /nesi/projects/nesi00213/RunFolder/Cybershake/workflow/devel/cybershake/submit_cybershake_hf.sh /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Data/list_vma
    example output
    ==============================
    submitting for Brothers
    ==============================
    MPI
    Note: rand_reset is not defined in params_base_bb.py. We assume rand_reset=True
    ['/nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/Brothers/HF/Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045/Brothers_HYP01-02_S1244', '/nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/Brothers/HF/Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045/Brothers_HYP01-02_S1254', '/nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/Brothers/HF/Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045/Brothers_HYP01-02_S1264', '/nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/Brothers/HF/Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045/Brothers_HYP02-02_S1274', '/nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/Brothers/HF/Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045/Brothers_HYP02-02_S1284', '/nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/Brothers/HF/Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045/Brothers_HYP02-02_S1294']
    Also submit the job for you?
     1. Yes
     2. No
    Enter the number you wish to select (1-2):Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045__Brothers_HYP01-02_S1244
    Loadleveler script run_hf_mpi_Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045__Brothers_HYP01-02_S1244_20171003_040524.ll written
    Submitting run_hf_mpi_Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045__Brothers_HYP01-02_S1244_20171003_040524.ll
    Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045__Brothers_HYP01-02_S1254
    Loadleveler script run_hf_mpi_Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045__Brothers_HYP01-02_S1254_20171003_040524.ll written
    Submitting run_hf_mpi_Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045__Brothers_HYP01-02_S1254_20171003_040524.ll
    Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045__Brothers_HYP01-02_S1264
    Loadleveler script run_hf_mpi_Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045__Brothers_HYP01-02_S1264_20171003_040524.ll written
    Submitting run_hf_mpi_Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045__Brothers_HYP01-02_S1264_20171003_040524.ll
    Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045__Brothers_HYP02-02_S1274
    Loadleveler script run_hf_mpi_Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045__Brothers_HYP02-02_S1274_20171003_040524.ll written
    Submitting run_hf_mpi_Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045__Brothers_HYP02-02_S1274_20171003_040524.ll
    Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045__Brothers_HYP02-02_S1284
    Loadleveler script run_hf_mpi_Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045__Brothers_HYP02-02_S1284_20171003_040524.ll written
    Submitting run_hf_mpi_Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045__Brothers_HYP02-02_S1284_20171003_040524.ll
    Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045__Brothers_HYP02-02_S1294
    Loadleveler script run_hf_mpi_Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045__Brothers_HYP02-02_S1294_20171003_040524.ll written
    Submitting run_hf_mpi_Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045__Brothers_HYP02-02_S1294_20171003_040524.ll
    ==============================
  3. run test_cybershake_hf.sh.
    script takes 2 args, 1.path to Runs folder, 2. the list of vms (so it will not run for all the unnecessary runs)

    /nesi/projects/nesi00213/RunFolder/Cybershake/workflow/devel/cybershake/test_cybershake_hf.sh /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Data/list_vma 2>&1 | tee /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/test_hf_vma.log

    this prints the test results on the screen and also writes them to a log file, here "/nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/test_hf_vma.log"
    (2>&1 merges stderr into stdout, and 'tee' then copies that stream to both the screen and the file)
    Note: change the log file name and location to suit your own requirements.

    example output
    running test for Albury
    Albury_HYP01-01_S1244: HF finished
    Albury_HYP01-01_S1254: HF finished
    Albury_HYP01-01_S1264: HF finished
    ====================
    Albury finished
    ====================
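
    Once a test log has been written, the runs that still need attention can be picked out by comparing the list_vm file against the log. A minimal sketch, assuming one name per line in the list file and the "<name> finished" banner shown in the example output. unfinished_runs is a hypothetical helper, not part of the workflow scripts.

```shell
# Print every name in a list_vm file that has no "<name> finished" line in a test log.
# Names are used as regular expressions, so this assumes plain alphanumeric names.
unfinished_runs() {
    local list_file=$1 log_file=$2
    local name
    while IFS= read -r name; do
        [ -n "$name" ] || continue
        # Anchor the match so that AwatNEVer does not match "AwatNEVerCl finished".
        grep -Eq "^[[:space:]]*${name} finished[[:space:]]*$" "$log_file" \
            || printf '%s\n' "$name"
    done < "$list_file"
}
```

    For example, unfinished_runs .../v17p9/Data/list_vma .../v17p9/test_hf_vma.log prints nothing when every run in list_vma has its "finished" banner in the log.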

Resuming HF

  1. make sure the submitted job has already finished by looking at llq.
    let's say we are looking for AlpineF2K_HYP06-21_S1404

    llq -l -u ykh22 | grep 'Job Name: run_hf_mpi' | grep 'AlpineF2K_HYP06-21_S1404'

    the output will be empty if the job is no longer in the queue; otherwise it should show on screen

    Job Name: run_hf_mpi_Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045__AlpineF2K_HYP06-21_S1404
  2. check the completed count of Acc files by using `ls` and `wc`

    ls HF/Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045/AlpineF2K_HYP10-10_S1514/Acc/ | wc
          6657    6657   98076

    then compare it with the station count within the domain

    cat fd_rt01-h0.400.ll | wc
        8550   25650  271714

    for this example, we have 8550 stations and only 2219 stations have finished (6657 / 3, i.e. three Acc files per station).

    so it is safe to assume that with more than 4~4.5 times the WCT (8550 / 2219 ≈ 3.85) it should finish with the next submission.

  3. change the WCT in "the templates" (so that all jobs submitted afterwards will use the new WCT)

    # @ wall_clock_limit     = 1:00:00 

    to

    # @ wall_clock_limit     = 4:30:00
  4. re-submit job for all srf in that simulation

    echo "1" | ./submit_hf.sh
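
    The queue check in step 1 can be wrapped so that it works as a yes/no test in other scripts. A minimal sketch: job_in_queue is a hypothetical helper, not part of the workflow scripts, and it reads the output of llq -l from stdin rather than calling llq itself.

```shell
# Succeed (exit status 0) if the given run name appears in a 'Job Name:' line
# of `llq -l` output read from stdin, and fail otherwise.
job_in_queue() {
    local run_name=$1
    # -F treats the run name as a fixed string, so characters such as '+' are literal.
    grep 'Job Name:' | grep -F -- "$run_name" > /dev/null
}
```

    Usage: llq -l -u ykh22 | job_in_queue AlpineF2K_HYP06-21_S1404 && echo "still in queue"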

BB:

  1. run submit_cybershake_bb.sh. IMPORTANT: before running the batch BB submission, make sure LF and HF are done for all runs in the list_vm.

    /nesi/projects/nesi00213/RunFolder/Cybershake/workflow/devel/cybershake/submit_cybershake_bb.sh /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Data/list_vma


    1.1 If LF and HF have finished only for specific runs and you prefer to run BB for those runs only, cd to the simulation folder and run ./submit_bb.sh.

  2. run test_cybershake_bb.sh to test which runs finished

    script takes 2 args, 1.path to Runs folder, 2. the list of vms (so it will not run for all the unnecessary runs)

    /nesi/projects/nesi00213/RunFolder/Cybershake/workflow/devel/cybershake/test_cybershake_bb.sh /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Data/list_vma 2>&1 | tee /nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/test_bb_vma.log

    this prints the test results on the screen and also writes them to a log file, here "/nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/test_bb_vma.log"
    (2>&1 merges stderr into stdout, and 'tee' then copies that stream to both the screen and the file)
    Note: change the log file name and location to suit your own requirements.

     

Resuming BB

  1. make sure the job submitted has already finished by looking at llq.

    llq -l -u ykh22 | grep 'Job Name: run_bb_mpi' | grep 'AlpineF2K_HYP06-21_S1404'
    Job Name: run_bb_mpi_Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045__AlpineF2K_HYP06-21_S1404
  2. check the completed count of Vel files by using `ls` and `wc`

    ls BB/Cant1D_v2-midQ_leer_hfnp2mm+_rvf0p8_sd50_k0p045/AlpineF2K_HYP10-10_S1514/Vel/ | wc
          6657    6657   98076

    then compare it with the station count within the domain

    cat fd_rt01-h0.400.ll | wc
        8550   25650  271714

    for this example, we have 8550 stations and only 2219 stations have finished (6657 / 3, i.e. three Vel files per station).

    so it is safe to assume that with more than 4~4.5 times the WCT (8550 / 2219 ≈ 3.85) it should finish with the next submission.

     

  3. change the WCT in "the templates" (so that all jobs submitted afterwards will use the new WCT)

    # @ wall_clock_limit     = 1:00:00

    to

    # @ wall_clock_limit     = 4:30:00
  4. re-submit job for all srf in that simulation

    echo "1" | ./submit_bb.sh
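
    The template change in step 3 can also be made from the command line. A minimal sketch: set_wct is a hypothetical helper, not part of the workflow scripts. It assumes the template holds a single line of the form "# @ wall_clock_limit = h:mm:ss" and that you pass in the path of whichever template the job type uses.

```shell
# Rewrite the wall_clock_limit line of a LoadLeveler template in place.
# A temporary file is used instead of `sed -i` so that it behaves the same
# with GNU and BSD sed; `cat > template` keeps the template's permissions.
set_wct() {
    local template=$1 new_wct=$2
    local tmp status=0
    tmp=$(mktemp) || return 1
    sed -E "s/^([[:space:]]*# @ wall_clock_limit[[:space:]]*=[[:space:]]*).*/\\1${new_wct}/" \
        "$template" > "$tmp" && cat "$tmp" > "$template" || status=$?
    rm -f "$tmp"
    return "$status"
}
```

    Usage: set_wct /path/to/template 4:30:00, then check the result with grep wall_clock_limit /path/to/template before re-submitting.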

Useful Commands

  • if you wish to view all jobs you submitted

    llq -u username -f %jn %id %st

    this will show all jobs "username" submitted (with the job name, jobid, and job status)

  • the script below can be used to download files in parallel using rsync.
    !!! the folder tree must first be created using:

    rsync -av -f"+ */" -f"- *" $source_dir $des_dir

    !!! the command below must be modified before use. It uses 'find' to return a list of folders and passes each one to the download script using 'xargs -0 -n1 -P$threadnumber'

    find LF -type d -print0 | xargs -0 -n1 -P12 -I% ~/gm_sim_workflow/devel/cybershake/download_rsyn.sh ykh22@fitzroy.nesi.org.nz:/nesi/projects/nesi00213/RunFolder/Cybershake/v17p9/Runs/AlpineF2K/% /nesi/projects/nesi00213/RunFolder/Cybershake/17p9/backup/AlpineF2K/LF
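
    Before starting many parallel transfers, it can help to print the commands that find and xargs would run. A minimal dry-run sketch: plan_downloads is a hypothetical helper, not part of the workflow scripts, and it only echoes one download_rsyn.sh command per folder instead of running it. It passes only -I% to xargs, since -I already runs one command per input item.

```shell
# Print one download command per folder under $1 instead of running it.
# $2 is the remote prefix and $3 the local destination; both are placeholders.
plan_downloads() {
    local root=$1 remote_prefix=$2 local_dest=$3
    # -print0 / -0 keep folder names with spaces intact; -P4 runs 4 at a time.
    find "$root" -type d -print0 \
        | xargs -0 -P4 -I% echo download_rsyn.sh "$remote_prefix/%" "$local_dest"
}
```

    Run it from the folder that contains LF, e.g. plan_downloads LF user@host:/path/to/Runs/AlpineF2K /path/to/backup/AlpineF2K/LF, and check the printed paths before removing the echo. Lines may appear in any order because the commands run in parallel.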


TODO:

  • add script to auto test all simulations and submit the next step
    • (currently need to run the test script and submit the next step manually)
  • A script to adjust WCT for HF
    • currently HF has a hard-coded/static WCT
    • (multiple re-submission of HF is needed if the boundary is large, more than 7 times for Alpine simulations)
  • A script to check if a job is still running(or in queue)
    • currently user needs to manually check that
    • a script to bulk check may help automating
  • Make a script to automate the parallel download script.