GitHub URL:  

What is this repo about?


Repo status

README present: Yes
Is public? Yes
Number of commits: 1271
Last updated: 04-04-2019


Functionalities

  • Description: State how this functionality is used or how it interacts with other software components.
  • Status: (1: not working, 2: unstable, 3: works under specific conditions, 4: works with known issues, 5: perfect)
  • Tests: (1: none, 2: broken/outdated, 3: limited coverage, 4: works with known issues, 5: perfect)
  • Doc: (1: none, 2: outdated, 3: limited coverage, 4: mostly OK, 5: perfect). Give a link.
  • Frequency of use: Daily, Weekly, Monthly, Yearly, Never
  • Frequency of code/req. change: Daily, Weekly, Monthly, Yearly, Never
  • Bus Factor: Number of people who are familiar with the code (1-7)
Each functionality is listed below with its description, status, known issues, tests, doc, frequency of use, frequency of code/req. change, and bus factor.

submit slurm scripts
  • Description: Python scripts that generate Slurm scripts to run simulations.
  • Status: 4
  • Known issues: Legacy code that has not been used since mid-2018 is still in the repo. Many Bash scripts that were created to remove manual work no longer work due to structural changes to the simulations.
  • Tests: 4
  • Doc: 2
  • Frequency of use: Daily
  • Frequency of code/req. change: Daily
  • Bus Factor: 3
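
As a rough illustration of what generating a Slurm script involves, the sketch below fills a batch-script template with Python's `string.Template`. The template text, the function name `render_slurm_script`, and all argument values are hypothetical; the repo's own generator and templates may look quite different. Only standard `#SBATCH` options (`--job-name`, `--ntasks`, `--time`, `--account`) are used.

```python
from string import Template

# Hypothetical template; the repo keeps its own templates for job submission.
SLURM_TEMPLATE = Template("""#!/bin/bash
#SBATCH --job-name=$job_name
#SBATCH --ntasks=$n_tasks
#SBATCH --time=$wall_time
#SBATCH --account=$account

srun $command
""")


def render_slurm_script(job_name, n_tasks, wall_time, account, command):
    """Return the text of a Slurm batch script with every placeholder filled in."""
    # substitute() raises KeyError if any placeholder is left without a value.
    return SLURM_TEMPLATE.substitute(
        job_name=job_name,
        n_tasks=n_tasks,
        wall_time=wall_time,
        account=account,
        command=command,
    )


if __name__ == "__main__":
    print(render_slurm_script("example_job", 80, "01:00:00", "my_project", "./run.sh"))
```

Because `substitute` raises `KeyError` for an unfilled placeholder, a missing parameter is caught when the script is generated rather than when the job is submitted.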
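
Dashboard
  • Description: Web-based scripts that show HPC usage (IO/core hours).
  • Status: 4
  • Known issues: Some functions are still under development.
  • Tests: 3
  • Doc: 3 (Comments + Docstrings)
  • Frequency of use: Daily
  • Frequency of code/req. change: Weekly
  • Bus Factor: 2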
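
Core hours are conventionally the number of cores multiplied by the elapsed wall-clock hours. A minimal aggregation sketch, assuming hypothetical job records (`JobRecord` with user, core count, and elapsed seconds) rather than whatever data source the dashboard actually reads:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class JobRecord:
    """Hypothetical job record, e.g. parsed from the scheduler's accounting output."""
    user: str
    n_cores: int
    elapsed_seconds: int


def core_hours_by_user(jobs):
    """Core hours = cores x elapsed wall-clock hours, summed per user."""
    totals = defaultdict(float)
    for job in jobs:
        totals[job.user] += job.n_cores * job.elapsed_seconds / 3600
    return dict(totals)


jobs = [
    JobRecord("alice", 40, 7200),   # 40 cores for 2 h -> 80 core hours
    JobRecord("alice", 80, 1800),   # 80 cores for 0.5 h -> 40 core hours
    JobRecord("bob", 160, 3600),    # 160 cores for 1 h -> 160 core hours
]
print(core_hours_by_user(jobs))  # {'alice': 120.0, 'bob': 160.0}
```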

e2e_tests
  • Description: Testing scripts that run a full install-to-simulation test to check whether the workflow is broken.
  • Status: 4
  • Known issues: workflow_config.json is not created properly after branching off master (the default value is used instead of the deployed version). The correct workflow_config.json is created when the environment is created; from then on, it is the environment owner's responsibility to keep it up to date if values need to be added or updated.
  • Tests: 1
  • Doc: 4 (ReadMe + Comments + Docstrings)
  • Frequency of use: Daily
  • Frequency of code/req. change: Weekly
  • Bus Factor: 2

estimation WCT
  • Description: Used to estimate the wall-clock time (WCT) of a specific simulation step.
  • Status: 4
  • Known issues: When the input data is out of range, the estimate can be far too low (band-aid fix: multiplying the WCT by the retry count). This should be fixed by the addition of an SVR model, which can handle out-of-bounds data and in general overestimates. (DONE)
  • Tests: 1
  • Doc: 4 (Readme + Comments + Docstrings)
  • Frequency of use: Daily
  • Frequency of code/req. change: Monthly
  • Bus Factor: 1
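
A minimal sketch of the retry-count band-aid described above, assuming that the first attempt counts as retry 1 and that the padded estimate is passed to Slurm's `--time` option as `HH:MM:SS`. The function names are hypothetical, and this is not the SVR-based estimator that replaced the band-aid.

```python
import math


def padded_wct_hours(estimated_hours, retry_count):
    """Band-aid: scale the model's estimate by the retry count (first attempt = 1)."""
    return estimated_hours * max(1, retry_count)


def to_slurm_time(hours):
    """Format hours as HH:MM:SS for sbatch --time, rounding up to a whole minute."""
    total_minutes = math.ceil(hours * 60)
    h, m = divmod(total_minutes, 60)
    return f"{h:02d}:{m:02d}:00"


print(to_slurm_time(padded_wct_hours(1.25, 1)))  # 01:15:00
print(to_slurm_time(padded_wct_hours(1.25, 3)))  # 03:45:00
```

The weakness of this approach is visible in the sketch: the first attempt still gets the unpadded estimate, so an out-of-range underestimate costs at least one failed job before the padding takes effect.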

automated simulation workflow
  • Description: A wrapper that can bulk-install simulations and auto-submit jobs.
  • Status: 4
  • Known issues: Excessive access to the management DB causes it to be locked on Maui. Legacy parameters have to be removed from the 'example' cybershake_config.json.
  • Tests: 4
  • Doc: 3 (CyberShake Install and Auto-submission)
  • Frequency of use: Daily
  • Frequency of code/req. change: Daily
  • Bus Factor: 3
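
verification
  • Description: Scripts that will be used to automatically verify whether a simulation is valid or whether something is obviously wrong.
  • Status: 1
  • Known issues: Under development and not yet implemented in the automated workflow.
  • Tests: 1
  • Doc: 1
  • Frequency of use: Monthly
  • Frequency of code/req. change: Monthly
  • Bus Factor: 2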
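
The verification method has not been decided yet. Purely as an illustration of the "obviously wrong" class of checks, here is a sketch that flags a hypothetical output series with the wrong length, non-finite values, or all-zero samples; the function name and the choice of checks are assumptions, not the planned design.

```python
import math


def obvious_problems(values, expected_length=None):
    """Return human-readable problems found in a series; an empty list means none found."""
    problems = []
    if expected_length is not None and len(values) != expected_length:
        problems.append(f"expected {expected_length} samples, got {len(values)}")
    if any(not math.isfinite(v) for v in values):
        problems.append("contains NaN or infinite values")
    if values and all(v == 0 for v in values):
        problems.append("all samples are zero")
    return problems


print(obvious_problems([0.0, 0.1, -0.2], expected_length=3))  # []
print(obvious_problems([0.0, float("nan")], expected_length=3))
# ['expected 3 samples, got 2', 'contains NaN or infinite values']
```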
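
Automated Testing
  • Description: (not recorded)
  • Status: 4
  • Known issues: Does not cover the whole repo yet. Some Bash scripts that are not part of the main workflow are not tested, e.g. scripts that move files around and rely heavily on folder/file structures.
  • Tests: 1 (a test script for the test scripts?)
  • Doc: 1
  • Frequency of use: Daily
  • Frequency of code/req. change: Weekly
  • Bus Factor: 3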
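
Templates
  • Description: Templates used for simulation install or job submission.
  • Status: 5
  • Known issues: (none recorded)
  • Tests: 1
  • Doc: 1
  • Frequency of use: Daily
  • Frequency of code/req. change: Weekly
  • Bus Factor: 4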
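
HPC Environments
  • Description: Scripts for creating/activating HPC environments.
  • Status: 4
  • Known issues: (none recorded)
  • Tests: 1
  • Doc: 3 (Readme + Comments)
  • Frequency of use: Daily
  • Frequency of code/req. change: Monthly
  • Bus Factor: 1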
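
Deploy workflow
  • Description: Scripts for deploying the workflow.
  • Status: 3
  • Known issues: Not overly stable, not frequently used, and often requires work before it works as intended. Get rid of it and just use environments?
  • Tests: 1
  • Doc: 2
  • Frequency of use: Weekly
  • Frequency of code/req. change: Monthly
  • Bus Factor: 1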
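
Metadata
  • Description: Scripts for logging and aggregating metadata.
  • Status: 5
  • Known issues: (none recorded)
  • Tests: 5?
  • Doc: 3 (Comments + Docstrings)
  • Frequency of use: Daily
  • Frequency of code/req. change: Monthly
  • Bus Factor: 1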
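
Shared workflow
  • Description: Shared functionality for the workflow.
  • Status: 3
  • Known issues: Could probably use some refactoring; most likely contains some unused functions.
  • Tests: 3
  • Doc: 2 (limited comments + the odd docstring)
  • Frequency of use: Daily
  • Frequency of code/req. change: Monthly
  • Bus Factor: 7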
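
Scripts + Cybershake scripts
  • Description: Scripts
  • Status: 2
  • Known issues: Massive number of scripts, most of them unused; requires significant tidying up.
  • Tests: 3
  • Doc: 2
  • Frequency of use: Most are outdated?
  • Frequency of code/req. change: ?
  • Bus Factor: 2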
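
Management DB
  • Description: Many scripts for creating and updating the MgmtDB.
  • Status: 4
  • Known issues: Works, but the code is messy and prone to failure IMO; currently being tidied up as part of https://quakecore.atlassian.net/browse/QSW-1057
  • Tests: 3
  • Doc: 2
  • Frequency of use: Daily
  • Frequency of code/req. change: Monthly
  • Bus Factor: 7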

Suggested Improvements / New Features

  • Description: State how/why this will be useful
  • Timeline: Estimate of how many sprints it will take to develop
Each suggested improvement is listed below with its description and timeline.

Update DB script revamp
  • Description: Attempt to address the lock issue caused by excessive access.
  • Timeline: 3-4 days

Integrate pre-processing into the automated workflow
  • Description: One step closer to a fully automated workflow (including an option to only do pre-processing).
  • Timeline: 1 sprint

Implement automated verification
  • Description: First guard/test before running huge simulations (i.e. CyberShake).
  • Timeline: 1-2 sprints (depends on how complicated the method will be)

Get rid of the default HPC-deployed workflow and use only environments
  • Description: Remove the default workflow and create a default (i.e. stable) HPC environment instead. This makes everything consistent and would remove the need to maintain the deploy code?
  • Timeline: 1 day

Logging
  • Description: Add decent logging to the workflow. Advantages:
      • Makes debugging significantly easier
      • Gives an idea of what is slow in the automated workflow
      • A single log file with a decent format, which can then be easily searched/filtered with a third-party or custom log viewer
  • Timeline: 1-2 sprints, depending on extent

Large quantity of dead scripts
  • Description: Remove unused/outdated scripts.
  • Timeline: 2 days

Optimize estimation performance
  • Description: Prevent constant reloading of the models, which should make estimation very fast (i.e. not noticeable). Currently the models are loaded from file for every estimation, which is slow.
  • Timeline: 1-2 days

Visualisation automation
  • Description: Add options for enabling plotting/visualisations for all realisations or only the first realisation.
  • Timeline: 1-2 sprints

Error handling
  • Description: Identify more cases that can be automatically checked/corrected rather than requiring manual intervention for runs.
  • Timeline: (not estimated)

Update CyberShake-related code to fit recent YAML changes
  • Description: This will deprecate unnecessary parameters and/or the use of redundant config files.
  • Timeline: 1 sprint

Change of realisation names
  • Description: Rename realisations from the AlpineF2K_HYP01-47_S1244 format to the AlpineF2K_REL01 format.
  • Timeline: 2+ sprints
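
A sketch of a deterministic renaming, assuming `HYP01-47` means hypocentre 1 of 47 and `S1244` is a random seed (neither is stated on this page), and that each fault's realisations are numbered in (hypocentre, seed) order. The regex, the function name, and the second example name are hypothetical.

```python
import re
from collections import defaultdict

# Assumed old format: <fault>_HYP<hypocentre>-<number of hypocentres>_S<seed>
OLD_NAME = re.compile(r"^(?P<fault>.+)_HYP(?P<hyp>\d+)-(?P<n_hyp>\d+)_S(?P<seed>\d+)$")


def new_realisation_names(old_names):
    """Map each old name to <fault>_RELnn, numbering a fault's realisations
    in (hypocentre, seed) order so the mapping is reproducible."""
    by_fault = defaultdict(list)
    for name in old_names:
        match = OLD_NAME.match(name)
        if match is None:
            raise ValueError(f"unexpected realisation name: {name}")
        by_fault[match["fault"]].append((int(match["hyp"]), int(match["seed"]), name))
    mapping = {}
    for fault, entries in by_fault.items():
        for index, (_, _, name) in enumerate(sorted(entries), start=1):
            mapping[name] = f"{fault}_REL{index:02d}"
    return mapping


print(new_realisation_names(["AlpineF2K_HYP02-47_S1254", "AlpineF2K_HYP01-47_S1244"]))
# {'AlpineF2K_HYP01-47_S1244': 'AlpineF2K_REL01', 'AlpineF2K_HYP02-47_S1254': 'AlpineF2K_REL02'}
```

Keeping the returned mapping (for example, written to a file alongside the simulations) would let old names in existing results or DB entries be traced to their new names.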


