The instructions below describe how to run a ground motion simulation on the NeSI systems Maui and Mahuika.

These instructions use the automated workflow described here: Automated workflow pipelines

In code blocks on this page, variables are given in <> brackets, optional parameters in [], examples in {}, and comments in (). Each script argument is placed on a new line for clarity; in practice the arguments follow the script on the command line.

1) Generate fault selection file

The fault selection file must first be generated.

This file will be a list of all events or faults to be considered for realisation generation.

The format of this file is two columns: the name of the event/fault, followed by the number of realisations for that event/fault. If the realisations are to have uncertainty, the number of realisations should be suffixed with the letter 'r'.
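As a concrete illustration, this file format can be parsed with a few lines of Python. This is a minimal sketch; the helper function and the fault names used here are hypothetical, not part of the workflow.

```python
# Hypothetical helper for parsing fault selection file lines; the fault
# names below are made up for illustration.

def parse_fault_selection_line(line):
    """Return (name, n_realisations, has_uncertainty) for one line."""
    name, count = line.split()
    has_uncertainty = count.endswith("r")  # 'r' suffix requests uncertainty
    return name, int(count.rstrip("r")), has_uncertainty

# A file containing the two lines below would request 10 realisations
# with uncertainty for one fault and a single plain realisation for the other.
for line in ["AlpineF2K 10r", "Hossack 1"]:
    print(parse_fault_selection_line(line))
```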

A script to generate the fault selection file from a given nhm and list of faults is available here: Simulation count calculator in Python3

2) Generate source files

Once the fault selection file has been generated the source files must be created.

The two parts of this step can be performed with a slurm script.

The first step is to generate realisation files from either NHM or GCMT sources. Furthermore, an uncertainty version can be given to generate realisations with uncertainty characteristics.

2a) Source parameter generation

Notably, GCMT subduction interface faults must have the arguments "--common_source_parameter tect_type SUBDUCTION_INTERFACE" added to use the Skarlatoudis magnitude-area scaling relationship.

python $gmsim/Pre-processing/srf_generation/source_parameter_generation/generate_realisations_from_gcmt.py 
	<path to fault selection file>
	<path to gcmt file>
	<type of fault to generate> (Currently 1 or 2 for gcmt sources)
	[--n_processes <number of cores to use> (Default 1, max number of events/faults)]
	[--version <Name of perturbation version to use>]
	[--vel_mod_1d <Location of the 1d velocity model to use for srf generation>]
	[--aggregate_file <Location to place the realisation aggregation file> (Will cause a crash if this file already exists)]
	[--source_parameter <parameter name> <Location of file containing source specific parameters> (Repeatable)]
	[--common_source_parameter <parameter name> <parameter value>]
	[--vs30_median <Location of vs30 median file> (File has two columns, one of station names, the other of vs30 median values)]
	[--vs30_sigma <Location of vs30 sigma file> (Same format as the median file, except with sigma instead of median)]
	[--cybershake_root <Location of simulation root directory>]
	[--checkpointing (Prevents recreation of previously created files)]
python $gmsim/Pre-processing/srf_generation/source_parameter_generation/generate_realisations_from_nhm.py
	<path to fault selection file>
	<path to nhm file>
	<type of fault to generate> (Currently always 4 for nhm)
	[(The same optional arguments as for gcmt shown above)]
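As an example of putting the §2a arguments together, the GCMT invocation for a subduction interface source can be sketched as an argument list in Python. The file paths and core count below are placeholder assumptions; the flags and the tect_type value come from the listing above.

```python
# Sketch of assembling the GCMT realisation-generation command for a
# subduction interface source. Paths and core count are placeholders.
script = ("$gmsim/Pre-processing/srf_generation/"
          "source_parameter_generation/generate_realisations_from_gcmt.py")
cmd = [
    "python", script,
    "path/to/fault_selection.txt",  # placeholder fault selection file
    "path/to/gcmt.csv",             # placeholder GCMT file
    "2",                            # type of fault to generate
    "--n_processes", "4",           # placeholder core count
    "--common_source_parameter", "tect_type", "SUBDUCTION_INTERFACE",
]
print(" ".join(cmd))
```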

2b) Input file generation

Next the realisation files must be used to generate the srf, srfinfo, stoch and sim_params files that will be used to run the simulation.

python $gmsim/Pre-processing/srf_generation/input_file_generation/generate_srf_from_realisations.py 
	[--cybershake_root <Location of simulation root directory> (Default current directory)] 
	[--n_processes <Number of cores to use for file generation> (Default 1)] 
	[--checkpointing (Prevents regeneration of previously generated realisations)]

More information on the srf generation step is available here: Validation source perturbation

3) Generate 3d Velocity Models

The 3d velocity model for each event/fault is generated in 2 steps.


Step 1

The first step converts the realisation .csv file into vm_params.yaml.

python rel2vm_params.py --help
usage: rel2vm_params.py [-h] [-o OUTDIR] [--pgv PGV] [--hh HH] [--dt DT] [--min-vs MIN_VS] [--vm-version VM_VERSION] [--vm-topo {TRUE,BULLDOZED,SQUASHED,SQUASHED_TAPERED}] [--no-optimise] [--deep-rupture]
                        [--target-land-coverage TARGET_LAND_COVERAGE] [--min-rjb MIN_RJB] [--ds-multiplier DS_MULTIPLIER]
                        rel_file

positional arguments:
  rel_file              REL csv file

optional arguments:
  -h, --help            show this help message and exit
  -o OUTDIR, --outdir OUTDIR
                        output directory to place VM files (if not specified, the same location as rel_file is in)
  --pgv PGV             max PGV at velocity model perimeter (estimated, cm/s)
  --hh HH               velocity model grid spacing (km)
  --dt DT               timestep to estimate simulation duration (s) Default: hh/20
  --min-vs MIN_VS       for nzvm gen and flo (km/s)
  --vm-version VM_VERSION
                        velocity model version to generate
  --vm-topo {TRUE,BULLDOZED,SQUASHED,SQUASHED_TAPERED}
                        topo_type parameter for velocity model generation
  --no-optimise         Don't try and optimise the vm if it is off shore. Removes dependency on having GMT coastline data
  --deep-rupture        Continue even if too deep
  --target-land-coverage TARGET_LAND_COVERAGE
                        Land coverage level (%) that triggers optimisation if not met (Default: 99.0)
  --min-rjb MIN_RJB     Specify a minimum horizontal distance (in km) for the VM to span from the fault - invalid VMs will still not be generated
  --ds-multiplier DS_MULTIPLIER
                        Sets the DS multiplier for setting the sim-duration. Validation runs default to 1.2. Cybershake runs should manually set it to 0.75

e.g.

python ~/Pre-processing/VM/rel2vm_params.py Hossack_REL01.csv -o ~/Data/VMs/Hossack/ --pgv 2.0 --hh 0.4 --dt 0.01 --min-vs 0.5 --vm-version 2.03

This produces vm_params.yaml at ~/Data/VMs/Hossack with a PGV threshold of 2.0 cm/s, grid spacing of 0.4 km, dt of 0.01 s, a minimum vs of 0.5 km/s, and VM version 2.03.

Alongside vm_params.yaml, a map is produced, which can be visually inspected to check that the VM is at a sensible location.
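The --dt default stated in the help output above (dt = hh/20 when --dt is not supplied) can be sketched as follows; default_dt is a hypothetical helper, for illustration only.

```python
# The --dt help above states the default timestep is hh/20 when --dt is
# not supplied; default_dt is a hypothetical helper illustrating this.

def default_dt(hh, dt=None):
    """Return dt, defaulting to hh / 20 (hh in km, dt in s)."""
    return hh / 20 if dt is None else dt

print(default_dt(0.4))        # defaulted: 0.02
print(default_dt(0.4, 0.01))  # explicit dt, as in the Hossack example
```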


Step 2

The second step converts vm_params.yaml into NZVM binaries and associated files (e.g. model_params, model_coords, etc.).

python VM/vm_params2vm.py --help
usage: vm_params2vm.py [-h] [-o OUTDIR] [-t VM_THREADS] name vm_params_path

positional arguments:
  name                  Name of the fault
  vm_params_path        path to vm_params.yaml

optional arguments:
  -h, --help            show this help message and exit
  -o OUTDIR, --outdir OUTDIR
                        output directory to place VM files (if not specified, the directory containing vm_params.yaml is used)
  -t VM_THREADS, --vm_threads VM_THREADS, --threads VM_THREADS
                        number of threads for the VM generation

e.g.

python ~/Pre-processing/VM/vm_params2vm.py Hossack ~/Data/VMs/Hossack/vm_params.yaml -t 16

This runs VM generation on 16 threads, places the NZVM binaries and other files in ~/Data/VMs/Hossack, and reports the validation test result.

Any velocity model that is not generated for some reason will have an image generated instead; the logs and the generated image can be used to determine why the velocity model was not generated.

Common reasons include:

  • Fully offshore domain
  • Velocity model would have greater depth than width


4) Install simulation

The simulation must be installed to set up all the shared parameter files and output directories.

python $gmsim/workflow/scripts/cybershake/install_cybershake.py
	<Path to root of simulation directory>
	<Path to fault selection file>
	[<gmsim version to use> (Defaults to 16.1) (Must exist in slurm_gm_workflow/templates/gmsim)]
	[--seed <Seed to use for HF> (Defaults to using a different random one for each realisation)]
	[--stat_file_path <Path to the station list to use>]
	[--extended_period (Adds additional pSA periods to IM_calc, usually used for plotting the pSA)]
	[--log_file <Path to the log file to use> (Defaults to 'cybershake_log.txt')]
	[--keep_dup_station (Keep stations that would be removed for snapping to a grid point previously snapped to by another station)]
	[--vm_perturbations (Use vm perturbation files for all realisations. Incompatible with --ignore_vm_perturbations)]
	[--ignore_vm_perturbations (Explicitly don't use any vm perturbation files. Incompatible with --vm_perturbations)]

e.g.

srun -t 60 -N 1 --cpus-per-task 1 python $gmsim/workflow/scripts/cybershake/install_cybershake.py . list.txt 20.4.1.4 --keep_dup_station --stat_file_path /nesi/project/nesi00213/StationInfo/non_uniform_whole_nz_with_real_stations-hh400_v20p3_land.ll

This runs the installation in the current directory for gmsim version 20.4.1.4, keeping duplicate stations and using the 20p3 non-uniform station list.

5) Estimate corehours

In most cases, core hour estimation should be run to determine the number of core hours required for the simulation.

python $gmsim/workflow/estimation/estimate_cybershake.py
	<Path to the VMs directory>
	<Path to the sources directory>
	[--runs_dir <Path to the runs directory>]
	[--fault_selection <Path to the fault selection file> (Ignored if --runs_dir is given)]
	[--root_yaml <Path to the root_params.yaml file> (Ignored if --runs_dir is given)]
	[--output <Path to save the output> (Does not save, only displays if not given)]
	[--verbose (Show estimated core hours for each fault, not just the whole simulation)] 
	[--models_dir <Path to the estimation models> (Uses a default from the workflow or platform config)]

If the number of core hours required is greater than 1000, then a core hour request should be submitted via Slack.
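This decision rule can be sketched in a few lines; the 1000 core hour threshold comes from the text above, while the helper name is hypothetical.

```python
# Hypothetical helper encoding the rule above: totals over 1000 core
# hours require a core hour request (submitted via Slack) before running.
CORE_HOUR_REQUEST_THRESHOLD = 1000

def needs_core_hour_request(estimated_core_hours):
    return estimated_core_hours > CORE_HOUR_REQUEST_THRESHOLD

print(needs_core_hour_request(1200))  # True
print(needs_core_hour_request(800))   # False
```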

6) Run simulation

Finally, the simulation can be run.

python $gmsim/workflow/scripts/cybershake/run_cybershake.py 
	<Path to root of simulation directory>
	<The user name to be used with the HPC scheduler>
	[<Path to the task configuration file to be used> (Defaults to a file in the slurm_gm_workflow repository which runs EMOD3D, HF, BB, IM_calc and clean_up on all realisations)]
	[--sleep_time <How long each sub script should sleep for before restarting> (Defaults to 5, generally shouldn't need to be changed)]
	[--n_max_retries <How many times an individual task should be attempted before requiring user intervention>]
	[--n_runs <The number of jobs that can run simultaneously> (Takes either 1 value, or a number of values equal to the number of available HPCs)]
	[--log_folder <Location log files should be placed>] 
	[--debug (Print debug log messages to the terminal)]

Details of the configuration file contents are available here: Auto submit wrapper

7) Monitor simulation

The run script will provide an overview of the progress of the simulation.

To watch the database state and your jobs in the Maui and Mahuika scheduler queues, the following command can be used:

watch "python $gmsim/workflow/scripts/management/query_mgmt_db.py <path to simulation root directory> --config <path to config file>; squeue -u $USER -M maui; squeue -u $USER -M mahuika"

If more than 40 tasks are set to be run, then it is advisable to use the argument "--mode count" with query_mgmt_db.py.

python $gmsim/workflow/scripts/management/query_mgmt_db.py
	<Path to simulation root directory>
	[<Name of fault or event to inspect> (All events/faults are shown otherwise)]
	[--config <Path to task configuration file used to call run_cybershake.py>]
	[--mode {error,count,todo,retry_max,detailed_count} [{error,count,todo,retry_max,detailed_count} ...] (Multiple values can be given, some modes work in combination with each other)]
	[--mode-help (Flag that displays information about the different modes)]
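The advice above can be sketched as assembling the query command in Python; the paths, config file name, and task count are placeholders, while the "--mode count" recommendation for runs with more than 40 tasks comes from the text.

```python
# Sketch of assembling the query command; with more than 40 tasks,
# "--mode count" keeps the output manageable. Paths are placeholders.
n_tasks = 120  # placeholder task count
query = [
    "python", "$gmsim/workflow/scripts/management/query_mgmt_db.py",
    "path/to/simulation_root",          # placeholder root directory
    "--config", "path/to/config.yaml",  # placeholder task configuration
]
if n_tasks > 40:
    query += ["--mode", "count"]
print(" ".join(query))
```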