Improvement Ideas for Cybershake

Issue	Area	Why it is a problem	Proposed solutions	Risks &
Version numbers are not intuitive (Sung)	Planning / Data management	It is not easy to tell how multiple CS runs relate to each other, or which resolution and which faults (South Island, North Island?) a given CS run covers	Assign a new version number to every new run. Maintain a sticky wiki page that describes the high-level details of all CS runs. It should state the platform (NeSI vs KISTI) and the virtual environments used, for reproducibility. Clean up the Cybershake-related wiki pages (fix inconsistencies, remove obsolete information).
Version Management (Claudio)	Planning / Data management	CS runs are now split into multiple sub-runs (due to their size), which requires a new approach to versioning to prevent inconsistencies	Versioning is done at the run level, not the CS level. A run is any simulation run for a set of faults with fixed parameters. If needed, a CS version can be added later, consisting of multiple run versions with identical parameters. Each run gets an entry in the run table, which shows the common run parameters (e.g. standard inputs, grid spacing) and links to a run-specific page. That page contains run-specific details, such as the set of faults run, and logs any modifications to the run and any issues encountered. The parameters of a run are never changed; if parameters need to change, create a new version.
Standardised Outputs/Archiving (Claudio)	Data management	Outputs are not consistent between runs, which makes the data difficult to use	Upon completion, data is archived in the same, consistent manner for every run. Archived data must not be modified; if issues are found, fix them in a new version.
Uploaded data may have missing parts (Sung)	Data management		Add an automated sanity check that all the necessary data is present. Automate Dropbox synchronisation once everything is completed. Ideally, upload three separate tarballs per fault: SRF/VM configs, BB, and IMs. Verify the Dropbox upload: per-file checksums are already in place, but also confirm that all files were actually uploaded.	UC IT may retire Dropbox
run_cybershake gets killed every 2-3 hours on KISTI (Sung)	Running	It has to be restarted manually, which lowers throughput	Ask KISTI technical support for a solution. If no alternative is found, use cron jobs to restart it.
If the wall-clock time (WCT) estimate is incorrect, each realization has to be run and have its WCT updated individually	Running / Core hour management	This can burn more core hours than required	Always run a REL_01 test for any run before executing all realizations. Add a system that increases the WCT of the remaining realizations based on REL_01 performance.
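
The "Version Management" row (together with the version-number issue above it) describes run-level versioning in which a run's parameters never change. The following is a minimal Python sketch of that rule. It assumes a hypothetical in-memory run table rather than the wiki table itself. Each run version is a frozen dataclass, so its parameters cannot be edited, and any change produces a new version. The field names (`platform`, `virtual_env`, `grid_spacing_km`, `faults`) are illustrative assumptions, not an existing Cybershake schema.

```python
from dataclasses import dataclass, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class RunVersion:
    """One row of a hypothetical run table; frozen so parameters cannot be edited."""
    version: int
    platform: str            # e.g. "NeSI" or "KISTI"
    virtual_env: str         # virtual environment used, for reproducibility
    grid_spacing_km: float   # one of the common run parameters (illustrative)
    faults: Tuple[str, ...]  # set of faults simulated in this run


class RunTable:
    def __init__(self):
        self._runs: Dict[int, RunVersion] = {}

    def add(self, run: RunVersion) -> RunVersion:
        # A version number, once used, is never reused or overwritten.
        if run.version in self._runs:
            raise ValueError(
                f"run version {run.version} already exists; create a new version instead"
            )
        self._runs[run.version] = run
        return run

    def new_version(self, base_version: int, **changed_params) -> RunVersion:
        """Copy an existing run with some parameters changed, under the next free version."""
        base = self._runs[base_version]
        return self.add(replace(base, version=max(self._runs) + 1, **changed_params))

    def get(self, version: int) -> RunVersion:
        return self._runs[version]
```

For example, changing the grid spacing of version 1 through `table.new_version(1, grid_spacing_km=0.2)` leaves version 1 untouched and records a new version 2.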
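
For the "Standardised Outputs/Archiving" row, one way to enforce the rule that archived data must not be modified is to write a checksum manifest at archive time and re-check it later. The sketch below assumes a hypothetical `MANIFEST.json` stored at the root of each archive directory and uses SHA-256 from Python's standard `hashlib`. It is not part of the existing tooling.

```python
import hashlib
import json
from pathlib import Path

MANIFEST_NAME = "MANIFEST.json"  # hypothetical manifest file name


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large outputs do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(archive_dir):
    """Map every file under archive_dir (except the manifest) to its SHA-256."""
    archive_dir = Path(archive_dir)
    hashes = {}
    for path in sorted(archive_dir.rglob("*")):
        rel = path.relative_to(archive_dir).as_posix()
        if path.is_file() and rel != MANIFEST_NAME:
            hashes[rel] = sha256_of(path)
    return hashes


def write_manifest(archive_dir):
    """Record checksums once, when the run is archived."""
    hashes = hash_tree(archive_dir)
    manifest = Path(archive_dir) / MANIFEST_NAME
    manifest.write_text(json.dumps(hashes, indent=2, sort_keys=True))
    return hashes


def find_modified(archive_dir):
    """Return files that were added, removed or changed since archiving."""
    recorded = json.loads((Path(archive_dir) / MANIFEST_NAME).read_text())
    current = hash_tree(archive_dir)
    return sorted(
        name for name in recorded.keys() | current.keys()
        if recorded.get(name) != current.get(name)
    )
```

An empty result from `find_modified` means the archive still matches what was recorded. Any entry points to a change that, following the row's proposal, belongs in a new version instead.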
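
The "Uploaded data may have missing parts" row asks for an automated completeness check on top of the existing per-file checksums. The following is a minimal sketch. It assumes a hypothetical naming scheme of three tarballs per fault (`<fault>_srf_vm.tar.gz`, `<fault>_bb.tar.gz`, `<fault>_im.tar.gz`). It also assumes that the list of uploaded file names and their checksums has already been obtained from Dropbox by some other means, so no Dropbox API calls are shown.

```python
import tarfile
from pathlib import Path

# Hypothetical naming scheme: three tarballs per fault, matching the proposed
# split into SRF/VM configs, BB, and IMs.
TARBALL_KINDS = ("srf_vm", "bb", "im")


def expected_tarballs(faults):
    """All tarball names that a complete upload should contain."""
    return {f"{fault}_{kind}.tar.gz" for fault in faults for kind in TARBALL_KINDS}


def make_tarball(out_path, sources):
    """Pack the given files/directories into one gzipped tarball."""
    with tarfile.open(out_path, "w:gz") as tar:
        for src in sources:
            tar.add(src, arcname=Path(src).name)


def missing_tarballs(faults, uploaded_names):
    """Tarballs that should exist for these faults but are absent from the upload listing."""
    return sorted(expected_tarballs(faults) - set(uploaded_names))


def checksum_mismatches(local, remote):
    """Files whose remote checksum is missing or differs from the local one.

    Both arguments map file name -> checksum string.
    """
    return sorted(name for name, digest in local.items() if remote.get(name) != digest)
```

A run would be marked as uploaded only when both `missing_tarballs` and `checksum_mismatches` return empty lists.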
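
For the "run_cybershake gets killed" row, the cron-job fallback could be a small watchdog that cron invokes every few minutes. The sketch below is POSIX-only. It assumes run_cybershake is started through a hypothetical command list and that the watchdog tracks it through a PID file it writes itself. The crontab entry in the comment uses a placeholder path.

```python
import os
import subprocess
from pathlib import Path

# Hypothetical crontab entry (placeholder path), checking every 10 minutes:
#   */10 * * * * python3 /path/to/watchdog.py


def is_running(pid):
    """Return True if a process with this PID exists (POSIX)."""
    if pid <= 0:
        return False  # 0 and negative values would address process groups
    try:
        os.kill(pid, 0)  # signal 0 only checks existence; nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def ensure_running(pid_file, command):
    """Restart `command` if the PID recorded in `pid_file` is not alive.

    Returns True if a new process was started, False if it was already running.
    Note: PIDs can be reused, so a stale PID file may occasionally point at an
    unrelated process; a production version would also check the process name.
    """
    pid_file = Path(pid_file)
    if pid_file.exists():
        try:
            pid = int(pid_file.read_text().strip())
        except ValueError:
            pid = None  # corrupt PID file: treat as not running
        if pid is not None and is_running(pid):
            return False
    # Run the child in its own session so signals aimed at the watchdog's
    # process group (e.g. SIGHUP) do not reach it.
    proc = subprocess.Popen(command, start_new_session=True)
    pid_file.write_text(str(proc.pid))
    return True
```

The watchdog script itself would then call something like `ensure_running("run_cybershake.pid", [...])` with whatever command actually launches run_cybershake.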
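
The last row proposes raising the WCT of the remaining realizations based on how long REL_01 actually took. The following is a minimal sketch. It assumes that every realization of a run costs roughly the same as REL_01. It also uses a hypothetical 20% safety margin, rounding up to 5 minutes, and a 24-hour cap. None of these values come from the source.

```python
import math
from datetime import timedelta


def estimate_wct(rel01_runtime, safety_factor=1.2, round_to_minutes=5,
                 max_wct=timedelta(hours=24)):
    """Scale REL_01's measured runtime by a safety margin, round up, and cap it."""
    if safety_factor < 1:
        raise ValueError("safety_factor should be >= 1")
    minutes = rel01_runtime.total_seconds() * safety_factor / 60
    rounded = math.ceil(minutes / round_to_minutes) * round_to_minutes
    return min(timedelta(minutes=rounded), max_wct)


def updated_wcts(requested, rel01_runtime, **kwargs):
    """Raise (never lower) the requested WCT of each remaining realization.

    `requested` maps realization name -> currently requested WCT.
    """
    needed = estimate_wct(rel01_runtime, **kwargs)
    return {rel: max(wct, needed) for rel, wct in requested.items()}
```

With these assumed defaults, a REL_01 that took 47 minutes gives a WCT of 60 minutes. A realization that already requested 90 minutes keeps its request, which matches the row's focus on increasing WCTs that are too short.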
