Background
- Create a model to estimate the wall clock time for an LF, HF or BB run
Tasks:
- Collect, combine and transform metadata (done)
- Investigate and test a model -> Neural network (done)
- Streamline training and usage of NN (done)
- Add support for the number of cores parameter (mostly done, requires real data to test)
- Change submit scripts (submit_emod3d.py, ...) to use the pre-trained model
- Change submit scripts to python3
- submit_emod3d (done, currently testing that it still works)
- Created python3 virtual environment for maui and mahuika
- Create a script to estimate the full run time for a folder of srf/vms
- Train the actual model once maui data is available, i.e. after the current cybershake run
- (Uncertainty?)
- (Visualisation?)
Documentation
- This documentation also exists as Readme.md in slurm_gm_workflow/estimation/
Notes
- All of the estimation code is written in python3; the only exception is the write_jsons.py script used for metadata collection
Usage
Estimation is done using the functions inside estimate_WC.py, which load the pre-trained neural network and then run the estimation.
...
def convert_to_wct(core_hours):
    # Converts an estimated core-hour value to a wall clock time (body elided).
    pass

def get_wct(core_hours, overestimate_factor=0.1):
    # Returns the wall clock time, padded by the overestimate factor (body elided).
    pass
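For orientation, a minimal sketch of the conversion these helpers perform. The core count, the padding arithmetic and the HH:MM:SS formatting are illustrative assumptions, not the actual implementation:

# Sketch only: turning an estimated core-hour value into a padded wall clock
# time, assuming the run uses a known, fixed number of cores.
core_hours = 4.2           # hypothetical neural network estimate
n_cores = 160              # hypothetical core count for the run
overestimate_factor = 0.1  # pad the estimate to avoid hitting the time limit

wct_hours = (core_hours / n_cores) * (1 + overestimate_factor)

# Format as HH:MM:SS, e.g. for a slurm --time value
total_seconds = int(wct_hours * 3600)
print("{:02d}:{:02d}:{:02d}".format(
    total_seconds // 3600, (total_seconds % 3600) // 60, total_seconds % 60))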
Creating a pre-trained model
Building a pre-trained model consists of two main steps: collecting and formatting the data, then training the neural network.
Collecting the metadata
1) To create the metadata files for a given Run, use the write_jsons.py script (with the -sj flag), which will create a folder named "jsons" in each fault simulation folder. This jsons folder then contains an "all_sims.json" file, which holds all the metadata for the fault's realisation runs. E.g.
...
3) Steps 1 and 2 can be repeated for as many run folders as wanted. I would suggest putting all the resulting .csv files into a single directory for easier loading when training the neural network model (see the sketch below).
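If all the .csv files are in one directory, loading them back for training can be as simple as the following sketch (the directory name "./metadata_csvs" is hypothetical):

import glob
import os

import pandas as pd

# Load and combine all metadata dataframes saved in the previous step.
# "./metadata_csvs" is a hypothetical directory containing the saved .csv files.
csv_files = glob.glob(os.path.join("./metadata_csvs", "*.csv"))
df = pd.concat((pd.read_csv(f) for f in csv_files), ignore_index=True)
print("Loaded {} rows from {} files".format(len(df), len(csv_files)))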
Training the model
The different models (LF, HF, BB) are trained using the train_model.py script, which takes a config file and the input data (either a directory or individual input files). The input data is the dataframes saved in the previous step.
...
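For orientation only, here is a minimal sketch of the kind of model training that happens in this step. The feature/target column names and the sklearn network used here are illustrative assumptions, not the actual train_model.py configuration; in the real workflow these choices come from the config file passed to the script.

import glob

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Hypothetical feature/target columns; the real ones come from the collected metadata.
feature_cols = ["nx", "ny", "nz", "nt", "n_cores"]
target_col = "core_hours"

# Load the combined dataframes saved during metadata collection.
df = pd.concat(
    (pd.read_csv(f) for f in glob.glob("./metadata_csvs/*.csv")),
    ignore_index=True)

X_train, X_test, y_train, y_test = train_test_split(
    df[feature_cols], df[target_col], test_size=0.2)

# A small fully connected network as a stand-in for the actual estimator.
model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=1000)
model.fit(X_train, y_train)
print("Test R^2: {:.3f}".format(model.score(X_test, y_test)))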