Issue | Area | Why it is a problem | Proposed solutions | Risks &
Version numbers are not intuitive
(Sung)
Planning / Data management

Not easy to tell how multiple CS runs are related to each other

Not easy to tell what resolution and which faults (South Island, North Island?) a CS run is for

A new version number for any new runs

A sticky wiki page that describes all the high-level details of all CS runs

Should state which platform (NeSI vs KISTI) and which virtual environments were used, for reproducibility

Clean up Cybershake-related wiki pages (fix inconsistencies, remove obsolete info)


Version Management
(Claudio)
Planning / Data management

CS runs are now split into multiple sub-runs (due to size), requiring a new approach to versioning to prevent inconsistencies
  • Versioning is done at the run level, not the CS level
    • A run is any simulation run for a set of faults and fixed parameters
    • If needed, we can add a CS version later (which would consist of multiple run versions that have identical parameters)
  • For each run an entry in the run table is made
    • Table shows common run parameters (e.g. standard inputs, grid spacing)
    • Link to extra run-specific page
      • Contains run-specific details, such as the set of faults run
      • Any modifications to the run or issues encountered are logged on this page
  • Parameters of a run are never changed; if parameters need to change, create a new version
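The versioning rules above could be enforced in code rather than by convention. The following is a minimal sketch only: `RunVersion`, `RunTable`, the version labels, the fault names and the `grid_spacing_km` parameter are all hypothetical placeholders, not names from the existing workflow.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class RunVersion:
    """One immutable entry in the run table (all field names are illustrative)."""
    version: str
    faults: tuple
    params: MappingProxyType  # read-only view of the common run parameters


class RunTable:
    """Hypothetical in-memory run table; the real table lives on the wiki."""

    def __init__(self):
        self._runs = {}

    def register(self, version, faults, params):
        # An existing version is never overwritten.
        if version in self._runs:
            raise ValueError(f"{version} already exists; create a new version instead")
        run = RunVersion(version, tuple(faults), MappingProxyType(dict(params)))
        self._runs[version] = run
        return run

    def new_version(self, base_version, new_version, **changed):
        # Changed parameters produce a new entry; the base entry stays untouched.
        base = self._runs[base_version]
        return self.register(new_version, base.faults, {**base.params, **changed})


table = RunTable()
first = table.register("run_v1", ["FaultA", "FaultB"], {"grid_spacing_km": 0.4})
second = table.new_version("run_v1", "run_v2", grid_spacing_km=0.2)
```

Attempting to assign to `first.params` or to re-register `run_v1` raises an error, so a parameter change can only ever appear as a new version.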

Standardised Outputs/Archiving
(Claudio)
Data management

Outputs are not consistent across runs, which makes the data difficult to use
  • Upon completion, data is archived in a consistent manner across runs
  • This data must not be modified
    • If issues exist, fix these in a new version

Uploaded data may have missing bits
(Sung)
Data management

An automated sanity-check step that verifies all the necessary data is present
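Such a check might look like the sketch below. It assumes a hypothetical layout of one directory per fault containing a fixed set of required files; the file names in `REQUIRED` are placeholders and would need to match the real outputs.

```python
from pathlib import Path

# Hypothetical layout: <run_dir>/<fault>/<required file>; real names will differ.
REQUIRED = ("srf_vm_confs.tar", "BB.tar", "IM.tar")


def find_missing(run_dir, faults, required=REQUIRED):
    """Return {fault: [missing or empty files]} for every incomplete fault."""
    missing = {}
    for fault in faults:
        absent = []
        for name in required:
            path = Path(run_dir) / fault / name
            # A zero-byte file is treated as missing: it usually means a failed write.
            if not path.is_file() or path.stat().st_size == 0:
                absent.append(name)
        if absent:
            missing[fault] = absent
    return missing
```

An empty result means every listed fault has all required files, so the upload step can be gated on `not find_missing(...)`.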

Automated Dropbox synchronisation after everything is completed

Dropbox upload is ideally three separate tarballs per fault: Srf/VM confs, BB, and IMs
Verify the Dropbox upload (a per-file checksum is already in place, but confirm that all files are actually uploaded)
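One way to confirm that every file arrived is to compare a local checksum manifest against a listing of what is on the remote side. The sketch below is an assumption-laden illustration: it does not call any Dropbox API, and `remote` is simply a name-to-checksum dictionary that the caller would have to obtain by whatever means the upload tool provides. SHA-256 is chosen here for illustration; the checksum already in use may differ.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Checksum a file in chunks so large tarballs do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths):
    """Map each local file name to its checksum before uploading."""
    return {Path(p).name: sha256_of(p) for p in paths}


def verify_upload(manifest, remote):
    """Return (missing, mismatched) file names, given a remote name -> checksum map."""
    missing = sorted(set(manifest) - set(remote))
    mismatched = sorted(
        name for name in manifest if name in remote and remote[name] != manifest[name]
    )
    return missing, mismatched
```

Both returned lists being empty indicates that every file in the manifest is present remotely with a matching checksum.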

UC IT may retire Dropbox
run_cybershake gets killed every 2 to 3 hours on KISTI
(Sung)
Running

Needs to be restarted manually, causing low throughput

Check with KISTI tech support for a solution

Use of cronjobs (if no alternative is found)
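If the cron route is taken, the job could call a small watchdog that restarts run_cybershake only when it is no longer alive. This is a sketch under several assumptions: a POSIX system, a PID file whose location is chosen by the operator, and a `command` list whose actual arguments are not specified here. It does not guard against PID reuse.

```python
import os
import subprocess
from pathlib import Path


def is_running(pid_file):
    """True if the PID recorded in pid_file belongs to a live process."""
    try:
        pid = int(Path(pid_file).read_text().strip())
        os.kill(pid, 0)  # signal 0 sends nothing; it only checks the process exists
    except (FileNotFoundError, ValueError, ProcessLookupError):
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def ensure_running(command, pid_file):
    """Start command unless it is already alive; return True if it was (re)started."""
    if is_running(pid_file):
        return False
    # A new session keeps the job alive after the cron-invoked script exits.
    proc = subprocess.Popen(command, start_new_session=True)
    Path(pid_file).write_text(str(proc.pid))
    return True
```

A cron entry would then invoke a script calling `ensure_running` at a chosen interval, so a killed run is picked up again without manual intervention.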


If the estimate is incorrect, each realization has to run and update its WCT individually
Running / Core Hour management

Can burn more core hours than required

Always run a REL_01 test for any run before executing all RELs
Include an update mechanism for realizations that need to increase their WCT based on REL_01 performance
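The update step could be as simple as the sketch below, which assumes WCT denotes the wall-clock time limit requested for a job. The function names and the 25% safety margin are illustrative assumptions, not values from the workflow.

```python
import math


def updated_wct(estimated_s, rel01_runtime_s, margin=1.25):
    """Wall-clock limit in seconds for the remaining realizations.

    The limit is only ever raised: if REL_01 finished comfortably within the
    estimate, the original estimate is kept.
    """
    needed = math.ceil(rel01_runtime_s * margin)
    return max(estimated_s, needed)


def to_hms(seconds):
    """Format seconds as HH:MM:SS, a form commonly accepted by job schedulers."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# REL_01 took 4000 s against a 3600 s estimate, so the limit is raised.
print(to_hms(updated_wct(3600, 4000)))  # prints 01:23:20
```

Applying the new limit once, before the remaining realizations are submitted, avoids each realization failing and updating its own WCT separately.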
