As the pre-processing and installation steps of a cybershake run are being implemented, now is the best time for the Data directory to be restructured, if it is to be changed at all.
The initial proposal was to restructure it to match the Runs directory, with a directory for each fault containing a VM, plus a directory for each realisation containing the SRF/srfinfo/stoch files.
After discussion, two further ideas were developed. They are recorded here for review.
To show how each structure would look with one VM per fault versus one VM per realisation, square brackets are used to mark optional directories. Only one of <fault VM files> and <realisation vm files> will be present in a given directory structure.
Current structure
The current structure follows the current pre-processing steps: SRF files are generated first, and VMs are then generated from them.
- Data
  - Sources
    - <fault>
      - <fault>.info
      - srf
        - <realisation>.srf
        - <realisation>.info
      - stoch
        - <realisation>.stoch
      - sim_params
        - <realisation>.yaml
    - <fault>
  - VMs
    - <fault>
      - <vm files>
      - [realisation]
        - <realisation vm files>
    - <fault>
- Sources
Pros:
- Everything works
- No dev hours required to maintain it
- Works with existing Data directories
Cons:
- Different directory structure to Runs?
- Harder to scale up for VM perturbations per realisation
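As a minimal sketch of what any tooling has to do under the current layout, the helper below builds the expected paths for one realisation. The function name `current_layout_paths` is hypothetical, and the nesting (for example, the `.info` file sitting next to the `.srf` in `srf`) is an assumption read from the listing in this section.

```python
from pathlib import Path


def current_layout_paths(data_root, fault, realisation):
    """Return the expected files for one realisation under the current layout.

    Assumes Data/Sources/<fault>/{srf,stoch,sim_params} and Data/VMs/<fault>.
    """
    sources = Path(data_root) / "Sources" / fault
    return {
        "srf": sources / "srf" / f"{realisation}.srf",
        "info": sources / "srf" / f"{realisation}.info",
        "stoch": sources / "stoch" / f"{realisation}.stoch",
        "sim_params": sources / "sim_params" / f"{realisation}.yaml",
        "vm_dir": Path(data_root) / "VMs" / fault,
    }
```

Under this assumption, the files for a single realisation are spread over three source subdirectories plus the fault's VM directory.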
Initially proposed change
This structure is intended to mimic the Runs directory: everything related to a realisation is kept in a <fault>/<realisation> directory, and anything common at the fault level is kept in the <fault> directory.
- Data
  - <fault>
    - <fault>.info
    - [fault VM]
      - <fault vm files>
    - <realisation>
      - [realisation VM]
        - <realisation vm files>
      - <realisation>.srf
      - <realisation>.info
      - <realisation>.stoch
      - <realisation>.yaml
    - [realisation VM]
  - <fault>
Pros:
- Has the same structure as the Runs directory
- All files created by srfgen for a realisation are placed in the same directory
Cons:
- Will have to restructure existing Data directories to be usable with the new workflow
- Requires dev time to refactor
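The restructuring of existing Data directories could be scripted. The following is a rough sketch only, assuming the current layout keeps per-realisation files in `srf`, `stoch` and `sim_params` under `Sources/<fault>`, and that realisation names can be taken from the SRF file names. The function name `migrate_fault` and the `OLD_LOCATIONS` table are hypothetical. VM files are left out because the name of the optional VM directory is not fixed here.

```python
import shutil
from pathlib import Path

# Assumed location of each per-realisation file type in the current layout.
OLD_LOCATIONS = {
    ".srf": "srf",
    ".info": "srf",
    ".stoch": "stoch",
    ".yaml": "sim_params",
}


def migrate_fault(old_data, new_data, fault):
    """Copy one fault from <old>/Sources/<fault>/... to <new>/<fault>/<realisation>/."""
    old_fault = Path(old_data) / "Sources" / fault
    new_fault = Path(new_data) / fault
    new_fault.mkdir(parents=True, exist_ok=True)

    # The fault-level info file stays at the fault level.
    fault_info = old_fault / f"{fault}.info"
    if fault_info.exists():
        shutil.copy2(fault_info, new_fault / fault_info.name)

    # Realisation names are taken from the SRF file names (assumption).
    realisations = sorted(p.stem for p in (old_fault / "srf").glob("*.srf"))
    for realisation in realisations:
        target = new_fault / realisation
        target.mkdir(exist_ok=True)
        for suffix, subdir in OLD_LOCATIONS.items():
            source = old_fault / subdir / f"{realisation}{suffix}"
            if source.exists():
                shutil.copy2(source, target / source.name)
    return realisations
```

The sketch copies rather than moves, so the existing Data directory stays usable until the new one has been checked.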
Halfway change
To keep much of the current structure while bringing it closer to the Runs directory, the following structure was proposed.
- Data
  - Sources
    - <fault>
      - <fault>.info
      - <realisation>
        - <realisation>.srf
        - <realisation>.info
        - <realisation>.stoch
        - <realisation>.yaml
    - <fault>
  - VM
    - <fault>
      - <fault vm files>
      - [realisation]
        - <realisation vm files>
    - <fault>
- Sources
Pros:
- Structure is very similar to the current system
- All files created by srfgen are placed in the same directory
Cons:
- As it's so close to the original structure, is there any point in changing it?
- Will have to restructure existing Data directories to be usable with the new workflow
- Requires dev time to refactor
Even further change
The final proposal takes advantage of the similarities between the Runs directory and the proposed Data directory to merge the two, so that everything related to one realisation is contained in the same directory.
- Data
  - root_params.yaml
  - <fault>
    - fault_params.yaml
    - [fault VM]
      - <fault vm files>
    - median_source
      - <fault>.csv
      - <fault>.srf
    - <realisation>
      - sim_params.yaml
      - [VM]
        - <realisation vm files>
      - Sources
        - <realisation>.srf
        - <realisation>.info
        - <realisation>.stoch
        - <realisation>.yaml
      - LF
      - HF
      - BB
      - IM_calc
Pros:
- Everything related to a realisation is in the same directory
Cons:
- Harder to save SRFs and VMs for reuse in another cybershake run
  - Mitigation: it should not be too difficult to write a script that extracts the sources for use elsewhere
- Will have to restructure existing Data directories to be usable with the new workflow
- Requires dev time to refactor
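The extraction script mentioned under the first con could look roughly like the sketch below. It assumes the merged layout `Data/<fault>/<realisation>/Sources/`. The function name `extract_sources` and the output layout `<out_root>/<fault>/<realisation>/` are hypothetical choices made for illustration, not part of the proposal.

```python
import shutil
from pathlib import Path


def extract_sources(data_root, out_root):
    """Copy every <fault>/<realisation>/Sources directory into out_root.

    Assumes the merged layout Data/<fault>/<realisation>/Sources/.
    Returns the list of directories that were written.
    """
    copied = []
    for sources in sorted(Path(data_root).glob("*/*/Sources")):
        if not sources.is_dir():
            continue
        realisation = sources.parent.name
        fault = sources.parent.parent.name
        target = Path(out_root) / fault / realisation
        # dirs_exist_ok requires Python 3.8 or later.
        shutil.copytree(sources, target, dirs_exist_ok=True)
        copied.append(target)
    return copied
```

Simulation outputs such as LF, HF, BB and IM_calc are never touched, because only directories named `Sources` two levels below the Data root are matched.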