GitHub URL:
What is this repo about?
Repo status
README present | Yes |
Is Public? | Yes |
Number of commits | 1271 |
Last time Updated | 04-04-2019 |
Functionalities
- Description: State how this function is used or interacts with other software components.
- Status: (1: not working, 2: unstable, 3: works under specific conditions, 4: works with known issues, 5: perfect)
- Tests: (1: none, 2: broken/outdated, 3: with limited coverage, 4: works with known issues, 5: perfect)
- Doc: (1: none, 2: outdated, 3: with limited coverage, 4: mostly ok, 5: perfect). Give a link.
- Frequency of use: Daily, Weekly, Monthly, Yearly, Never
- Frequency of code/req. change: Daily, Weekly, Monthly, Yearly, Never
- Bus Factor: Number of people that are familiar with the code (1-7)
Functionality | Description | Status | Known issues | Tests? | Doc? | Frequency of use | Frequency of code/req. change | Bus Factor |
---|---|---|---|---|---|---|---|---|
submit slurm scripts | Python scripts that generate Slurm scripts to run simulations. | 4 | Legacy code that has not been used since mid-2018 is still in the repo. Many Bash scripts that were created to remove manual work no longer work due to structural changes to the simulations. | 4 | 2 | Daily | Daily | 3 |
Dashboard | Web-based scripts to show the usage of HPC (IO/Corehours) | 4 | Some functions are still under development. | 3 | 3 Comments + Docstrings | Daily | Weekly | 2 |
e2e_tests | Test scripts that run a full install-to-simulation test to check whether the workflow is broken | 4 | workflow_config.json will not be properly created after branching off master (default value used instead of deployed version) - (the correct workflow_config.json is created upon creation of the environment; from then on it is the environment owner's responsibility to keep it up to date if values require adding/updating) | 1 | 4 ReadMe + Comments + Docstrings | Daily | Weekly | 2 |
estimation WCT | Used to estimate the runtime of a specific simulation step. | 4 | When the data is out of range, the estimate may be far too low (band-aid fix: multiplying the WCT by the retry count) - (this should be fixed by the addition of an SVR model, which can handle out-of-bound data and in general overestimates - DONE) | 1 | 4 Readme + Comments + Docstrings | Daily | Monthly | 1 |
automated simulation workflow | A wrapper that can bulk install simulations and auto-submit jobs. | 4 | Excessive access to the management DB will cause it to be locked on Maui. Legacy parameters have to be removed from the 'example' cybershake_config.json. | 4 | 3 CyberShake Install and Auto-submission | Daily | Daily | 3 |
verification | Scripts that will be used to auto-verify whether a simulation is valid or something is obviously wrong. | 1 | Under development and not yet integrated into the automated workflow | 1 | 1 | Monthly | Monthly | 2 |
Automated Testing | | 4 | Does not cover the whole repo yet. Some Bash scripts that are not in the 'main workflow' are not tested, e.g. scripts that move files around and rely heavily on folder/file structures. | 1 Test script for test script? | 1 | Daily | Weekly | 3 |
Templates | Templates used for simulation install or job submission | 5 | | 1 | 1 | Daily | Weekly | 4 |
HPC Environments | Scripts for creating/activating HPC Environments | 4 | | 1 | 3 Readme + Comments | Daily | Monthly | 1 |
Deploy workflow | Scripts for deploying workflow | 3 | Not overly stable, not frequently used and often requires work for it to work as intended. Get rid of it and just use environments? | 1 | 2 | Weekly | Monthly | 1 |
Metadata | Scripts for logging and aggregating metadata | 5 | | 5? | 3 Comments + Docstrings | Daily | Monthly | 1 |
Shared workflow | Shared functionality for the workflow | 3 | Could probably use some refactoring; most likely contains some unused functions | 3 | 2 Limited Comments + the odd Docstring | Daily | Monthly | 7 |
Scripts + Cybershake scripts | Scripts | 2 | Massive number of scripts, most of them unused; requires a significant tidy-up | 3 | 2 | Most are outdated? | ? | 2 |
Management DB | Lots of scripts for creation and updating of MgmtDB | 4 | Works but messy code and prone to failure IMO, currently being tidied up as part of https://quakecore.atlassian.net/browse/QSW-1057 | 3 | 2 | Daily | Monthly | 7 |
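
The band-aid fix noted in the estimation WCT row (multiplying the WCT by the retry count) can be sketched as follows. This is a minimal illustration, not the repo's code: the function name `wct_for_attempt` is hypothetical, and it assumes the count starts at 1 for the first submission, which the table does not state.

```python
def wct_for_attempt(estimated_wct_hours: float, attempt: int) -> float:
    """Return the wall-clock time to request for a given submission attempt.

    Assumption: attempt is 1 for the first submission, 2 for the first
    retry, and so on, so the first submission uses the raw estimate.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    # Each retry requests proportionally more time than the raw estimate.
    return estimated_wct_hours * attempt


if __name__ == "__main__":
    for attempt in (1, 2, 3):
        print(attempt, wct_for_attempt(1.5, attempt))
```

Scaling linearly with the attempt number only compensates for under-estimates after a job has already timed out, which is why the row treats it as a stopgap.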
Suggested Improvements / New Features
- Description: State how/why this will be useful
- Timeline: Estimate of how many sprints it will take to develop
Functionality | Description | Timeline |
---|---|---|
Update DB script revamp | To attempt to address the Lock issue caused by excessive access | 3~4 Days |
Integrate Pre-processing into automated workflow | One step closer to fully automated workflow. (Including option to only do pre-processing) | 1 Sprint |
Implement automated verification | First guard/test for running huge simulations (e.g. Cybershake) | 1~2 Sprints (depends on how complicated the method will be) |
Get rid of default HPC deployed workflow and purely use environments | Remove the default workflow and create a default (i.e. stable) HPC environment instead. Makes everything consistent, removes requirement to maintain deploy code? | 1 day |
Logging | Add decent logging for the workflow | 1-2 Sprints depending on extent |
Large quantity of dead scripts | Remove unused/outdated scripts | 2 days |
Optimize estimation performance | Prevent constant reloading of models, this should make estimation super fast (i.e. not noticeable). Currently the models are loaded from the files for every estimation, which is obviously slow | 1-2 days |
Visualisation Automation | Add options for enabling plotting/visualisations for All/First Realisation | 1-2 Sprints |
Error handling | Identify more cases which can be automatically checked / corrected rather than requiring manual intervention for runs. | |
Update Cybershake related code to fit recent yaml changes | This will deprecate unnecessary parameters and/or the use of redundant config files. | 1 Sprint |
Change of realisation names | AlpineF2K_HYP01-47_S1244 to AlpineF2K_REL01 | 2 Sprints+ |
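
The "Optimize estimation performance" item above (avoid reloading the models from file for every estimation) could be approached by caching the loaded model in memory. The sketch below is illustrative only: `load_model`, `estimate_wct`, the pickle file format and the dict standing in for a trained model are all assumptions, not the repo's actual API.

```python
import functools
import pickle
import tempfile
from pathlib import Path


@functools.lru_cache(maxsize=None)
def load_model(model_path: str):
    """Load a pickled model once per path; later calls reuse the cached object."""
    with open(model_path, "rb") as f:
        return pickle.load(f)


def estimate_wct(model_path: str, n_cores: int) -> float:
    """Hypothetical estimator: a dict stands in for a trained model."""
    model = load_model(model_path)  # hits the disk only on the first call
    return model["hours_per_core"] * n_cores


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = str(Path(tmp) / "model.pkl")
        with open(path, "wb") as f:
            pickle.dump({"hours_per_core": 0.5}, f)
        for n_cores in (10, 20, 40):
            print(estimate_wct(path, n_cores))
        # Shows one miss (the initial load) followed by cache hits.
        print(load_model.cache_info())
```

A cache keyed on the file path keeps callers unchanged. If model files can be replaced while the workflow is running, `load_model.cache_clear()` would need to be called after an update.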