Issue | Area | Why it is a problem | Proposed solutions | Risks
---|---|---|---|---
Version numbers are not intuitive (Sung) | Planning / Data management | It is hard to tell how multiple CS runs relate to each other, or what resolution and which faults (South Island? North Island?) a given run covers | Assign a new version number to every new run. Maintain a sticky wiki page describing the high-level details of all CS runs, stating the platform (NeSI vs KISTI) and virtual environments for reproducibility. Clean up Cybershake-related wiki pages (fix inconsistencies and obsolete info) | 
Version management (Claudio) | Planning / Data management | CS runs are now split into multiple sub-runs (due to size), requiring a new approach to versioning to prevent inconsistencies | | 
Standardised outputs/archiving (Claudio) | Data management | Outputs are not consistent between runs, making the data difficult to use | | 
Uploaded data may have missing bits (Sung) | Data management | Uploads are not verified for completeness, so missing data can go unnoticed | Add an automated sanity check that all the necessary data is present. Automate Dropbox synchronisation after everything is completed. The Dropbox upload is ideally three separate tarballs per fault: SRF/VM configs, BB, and IMs | UC IT may retire Dropbox
run_cybershake gets killed every 2–3 hours on KISTI (Sung) | Running | Needs manual restarts, causing low throughput | Check with KISTI tech support for a solution. Use cron jobs if no alternative is found | 
If the WCT estimation is incorrect, each realisation has to be run individually and its WCT updated | Running / Core hour management | Can burn more core hours than required | Always run a REL_01 test before executing all realisations. Add an update mechanism that increases the WCT of realisations based on REL_01 performance | 
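
The automated completeness check proposed for the "missing bits" issue could be sketched as below. The tarball naming scheme (`<fault>_SRF_VM.tar.gz`, `<fault>_BB.tar.gz`, `<fault>_IM.tar.gz`) is an illustrative assumption, not the project's actual convention:

```python
# Sketch of an automated completeness check to run before Dropbox upload.
# The expected tarball names are illustrative assumptions.
from pathlib import Path

EXPECTED_SUFFIXES = ["SRF_VM.tar.gz", "BB.tar.gz", "IM.tar.gz"]


def missing_tarballs(fault: str, present: set[str]) -> list[str]:
    """Return the expected tarballs for `fault` that are absent from `present`."""
    expected = [f"{fault}_{suffix}" for suffix in EXPECTED_SUFFIXES]
    return [name for name in expected if name not in present]


def check_upload_dir(upload_dir: Path, faults: list[str]) -> dict[str, list[str]]:
    """Map each fault to its missing tarballs; an empty dict means the upload is complete."""
    present = {p.name for p in upload_dir.iterdir() if p.is_file()}
    return {
        fault: missing
        for fault in faults
        if (missing := missing_tarballs(fault, present))
    }
```

A wrapper could refuse to trigger the Dropbox sync whenever `check_upload_dir` returns a non-empty report.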
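
If KISTI support finds no fix for the periodic kills, the cron workaround could be a small watchdog invoked every few minutes that relaunches `run_cybershake` when it has died. A minimal sketch; the process pattern, the `--resume` flag, and the cron schedule are assumptions:

```python
# Sketch of a cron-driven watchdog for run_cybershake on KISTI.
# Intended to be invoked from crontab, e.g. every 5 minutes:
#   */5 * * * * /usr/bin/python3 watchdog.py
# The pattern and restart command below are illustrative assumptions.
import subprocess

PATTERN = "run_cybershake"
RESTART_CMD = ["run_cybershake", "--resume"]  # hypothetical resume flag


def needs_restart(cmdlines: list[str], pattern: str = PATTERN) -> bool:
    """True when no running command line contains `pattern`."""
    return not any(pattern in cmd for cmd in cmdlines)


def running_cmdlines() -> list[str]:
    """Command lines of all current processes (via `ps`, POSIX systems)."""
    out = subprocess.run(["ps", "-eo", "args"], capture_output=True, text=True)
    return out.stdout.splitlines()[1:]  # drop the "COMMAND" header line


def restart_if_dead() -> bool:
    """Relaunch run_cybershake if it is not running; returns True if restarted."""
    if needs_restart(running_cmdlines()):
        subprocess.Popen(RESTART_CMD)
        return True
    return False
```

Keeping the decision logic in `needs_restart` (a pure function over a list of command lines) makes the watchdog easy to test without touching real processes.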
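
The REL_01-based WCT update from the last row could look like this sketch; the 1.3 safety factor and the HH:MM:SS scheduler format are assumptions, not tuned or confirmed values:

```python
# Sketch: derive an updated wall-clock time (WCT) for the remaining
# realisations from the measured REL_01 runtime. The 1.3 safety factor
# is an illustrative assumption, not a tuned value.
def updated_wct(rel01_runtime_s: float, safety: float = 1.3) -> str:
    """Scale the REL_01 runtime by a safety factor and format as HH:MM:SS."""
    total = int(rel01_runtime_s * safety + 0.5)  # round to the nearest second
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"
```

The resulting string could be fed back into the submission scripts for REL_02 onwards, so only REL_01 risks running with a bad estimate.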