Cybershake data directory structure

The pre-processing and installation steps of a Cybershake run are currently being implemented, so now is the best time to restructure the Data directory, if it is to be changed at all.

The initial proposal was to restructure it to mirror the Runs directory: a directory for each fault containing a VM, plus a directory for each realisation containing the SRF, srfinfo and stoch files.

After discussion, two further ideas were developed; they are recorded here for further discussion.

Each structure can be used with either one VM per fault or one VM per realisation. To show both cases in a single tree, square brackets mark optional directories. Only one of <fault vm files> and <realisation vm files> will be present in a given directory structure.

Current structure

The current structure follows the current pre-processing order: SRF files are generated first, then VMs are generated from them.

Data
- Sources
  - <fault>
    - <fault>.info
    - srf
      - <realisation>.srf
      - <realisation>.info
    - stoch
      - <realisation>.stoch
    - sim_params
      - <realisation>.yaml
- VMs
  - <fault>
    - <vm files>
    - [realisation]
      - <realisation vm files>
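
As a rough illustration of the tree above, the following sketch resolves where the current layout keeps each file for one realisation. The function name, the dictionary keys and the example fault/realisation names are hypothetical and not part of any existing code; only the directory and file naming is taken from the tree.

```python
from pathlib import Path


def current_layout_paths(data_root, fault, realisation):
    """Return where the current layout keeps each file for one realisation."""
    sources = Path(data_root) / "Sources" / fault
    vms = Path(data_root) / "VMs" / fault
    return {
        "fault_info": sources / f"{fault}.info",
        "srf": sources / "srf" / f"{realisation}.srf",
        "srf_info": sources / "srf" / f"{realisation}.info",
        "stoch": sources / "stoch" / f"{realisation}.stoch",
        "sim_params": sources / "sim_params" / f"{realisation}.yaml",
        # VM files live in a separate top-level tree. The per-realisation
        # subdirectory only exists when there is one VM per realisation.
        "fault_vm_dir": vms,
        "realisation_vm_dir": vms / realisation,
    }


# Hypothetical names, used for illustration only.
paths = current_layout_paths("Data", "ExampleFault", "ExampleFault_REL01")
print(paths["srf"])
```

Note that the files for a single realisation end up in four different directories (srf, stoch, sim_params and the VMs tree), which is the main thing the proposals below try to change.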

Pros:

Everything works
No dev hours required to maintain it
Works with existing Data directories

Cons:

Different directory structure to Runs?
Harder to scale up for VM perturbations per realisation

Initially proposed change

This structure is intended to mimic the Runs directory, where everything related to a realisation is kept in a <fault>/<realisation> directory and anything common at the fault level is kept in the <fault> directory.

Data
- <fault>
  - <fault>.info
  - [fault VM]
    - <fault vm files>
  - <realisation>
    - [realisation VM]
      - <realisation vm files>
    - <realisation>.srf
    - <realisation>.info
    - <realisation>.stoch
    - <realisation>.yaml

Pros:

Has the same structure as the Runs directory
All files created by srfgen for a realisation are placed in the same directory

Cons:

Will have to restructure existing Data directories to be usable with the new workflow
Requires dev time to refactor
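
To give a feel for the restructuring cost, here is a minimal sketch of how source files in the current layout could be mapped to the initially proposed one. It only computes the new relative path and moves nothing. The function name is hypothetical, VM files are deliberately left out because the proposal does not fix the name of the VM directory, and the realisation name is assumed to be the file name without its extension.

```python
from pathlib import PurePosixPath

# Subdirectories of Sources/<fault> that hold per-realisation files today.
PER_REALISATION_DIRS = {"srf", "stoch", "sim_params"}


def proposed_location(current_rel_path):
    """Map a path relative to Data from the current to the proposed layout.

    Returns None for paths this sketch does not handle (for example VM files).
    """
    parts = PurePosixPath(current_rel_path).parts
    if len(parts) == 3 and parts[0] == "Sources":
        # Sources/<fault>/<fault>.info -> <fault>/<fault>.info
        _, fault, name = parts
        return PurePosixPath(fault, name)
    if len(parts) == 4 and parts[0] == "Sources" and parts[2] in PER_REALISATION_DIRS:
        # Sources/<fault>/srf/<realisation>.srf
        #   -> <fault>/<realisation>/<realisation>.srf
        _, fault, _, name = parts
        realisation = PurePosixPath(name).stem  # assumption: no dots in the name
        return PurePosixPath(fault, realisation, name)
    return None


# Hypothetical file names, for illustration only.
print(proposed_location("Sources/ExampleFault/srf/ExampleFault_REL01.srf"))
```

A real migration would also need to decide where each fault's VM files go and to cope with partially migrated directories.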

Halfway change

To keep much of the current structure while adopting some changes that reflect the Runs directory, the following structure was proposed.

Data
- Sources
  - <fault>
    - <fault>.info
    - <realisation>
      - <realisation>.srf
      - <realisation>.info
      - <realisation>.stoch
      - <realisation>.yaml
- VM
  - <fault>
    - <fault vm files>
    - [realisation]
      - <realisation vm files>

Pros:

Structure is very similar to the current system
All files created by srfgen for a realisation are placed in the same directory

Cons:

As it's so close to the original structure, is there any point in changing it?
Will have to restructure existing Data directories to be usable with the new workflow
Requires dev time to refactor

Even further change

The final proposal takes advantage of the similarity between the Runs directory and the initially proposed Data directory to merge the two, so that everything related to one realisation is contained within the same directory.

Data
- root_params.yaml
- <fault>
  - fault_params.yaml
  - [fault VM]
    - <fault vm files>
  - median_source
    - <fault>.csv
    - <fault>.srf
  - <realisation>
    - sim_params.yaml
    - [VM]
      - <realisation vm files>
    - Sources
      - <realisation>.srf
      - <realisation>.info
      - <realisation>.stoch
      - <realisation>.yaml
    - LF
    - HF
    - BB
    - IM_calc
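
The per-realisation part of the tree above could be created with something like the following sketch. The function name and its `vm_per_realisation` flag are hypothetical; the subdirectory names are taken from the tree, and the fault-level VM directory is left to the caller because only one of the two VM locations should exist.

```python
from pathlib import Path

# Per-realisation subdirectories, as listed in the tree above.
REALISATION_SUBDIRS = ("Sources", "LF", "HF", "BB", "IM_calc")


def make_realisation_skeleton(data_root, fault, realisation, vm_per_realisation=False):
    """Create the empty directory skeleton for one realisation and return it."""
    rel_dir = Path(data_root) / fault / realisation
    for name in REALISATION_SUBDIRS:
        (rel_dir / name).mkdir(parents=True, exist_ok=True)
    if vm_per_realisation:
        # Optional [VM] directory: only used with one VM per realisation.
        (rel_dir / "VM").mkdir(exist_ok=True)
    return rel_dir
```

Calling it twice for the same realisation is harmless, since existing directories are left untouched.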

Pros:

Everything related to a realisation is in the same directory

Cons:

Harder to save SRFs and VMs for use in another Cybershake run
- Mitigation: a script to extract the sources for use elsewhere would not be too difficult to write
Will have to restructure existing Data directories to be usable with the new workflow
Requires dev time to refactor
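
The extraction script mentioned in the first con could look roughly like the sketch below. It assumes the merged layout shown above (`<fault>/<realisation>/Sources/<file>`); the function name, the default suffix list and the destination layout are illustrative choices, not an agreed design.

```python
import shutil
from pathlib import Path


def extract_sources(data_root, dest_root, suffixes=(".srf", ".stoch", ".info")):
    """Copy realisation source files out of the merged layout.

    Files are copied to <dest_root>/<fault>/<realisation>/ and the list of
    destination paths is returned.
    """
    data_root, dest_root = Path(data_root), Path(dest_root)
    copied = []
    # Matches <fault>/<realisation>/Sources/<file>. median_source has no
    # Sources subdirectory, so it is not matched by this pattern.
    for src in sorted(data_root.glob("*/*/Sources/*")):
        if not src.is_file() or src.suffix not in suffixes:
            continue
        fault, realisation = src.relative_to(data_root).parts[:2]
        dest = dest_root / fault / realisation / src.name
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        copied.append(dest)
    return copied
```

VM files could be collected in the same way with a second pattern, once it is settled whether they live at the fault or the realisation level.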
