Background

  • Create a model to estimate the wall clock time for an LF, HF or BB run

Tasks:

  • Collect, combine and transform metadata (done)
  • Investigate and test a model -> Neural network (done)
  • Streamline training and usage of NN (done)
  • Add support for number of cores parameter (mostly done, requires real data to test)
  • Change submit scripts (submit_emod3d.py, ...) to use the pre-trained model
    • Change submit scripts to python3
      • submit_emod3d (done, currently testing that it still works)
      • Created python3 virtual environment for maui and mahuika
      • Change all submit script dependencies to be python 2 and 3 compatible (done, currently testing)
  • Create script to estimate full run time for a folder of srf/vms
  • Train the actual model once maui data is available, i.e. after the current Cybershake run
  • (Uncertainty?)
  • (Visualisation?)

Documentation

  • Also exists as Readme.md in slurm_gm_workflow/estimation/


Notes

  • All of the estimation code is written in python3; the only exception is the write_jsons.py script used for metadata collection

Usage

Estimation is done using the functions inside estimate_WC.py, which load the pre-trained neural network and then run the estimation.

...

def convert_to_wct(core_hours):
    # Converts an estimated number of core hours into a wall clock time
    # (presumably taking the number of cores the job runs on into account).
    pass

def get_wct(core_hours, overestimate_factor=0.1):
    # Returns the wall clock time estimate padded by the overestimate
    # factor (10% by default), so that jobs are less likely to time out.
    pass
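
As a rough usage sketch (the import path and the return format of get_wct are assumptions for illustration, not confirmed here), estimating a wall clock request could look like:

# Sketch only: assumes estimate_WC is importable as a module and that
# get_wct returns a slurm-style wall clock string. Values are hypothetical.
import estimate_WC

core_hours = 4.2  # e.g. the output of an LF estimation function
wall_clock = estimate_WC.get_wct(core_hours, overestimate_factor=0.1)
print(wall_clock)  # the 10% buffer corresponds to ~4.62 core hours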

Creating a pre-trained model

Building a pre-trained model consists of two main steps: collecting and formatting the data, and then training the neural network.

Collecting the metadata

1) To create the metadata files for a given Run, use the write_jsons.py script (with the -sj flag), which will create a folder named "jsons" in each fault simulation folder. This jsons folder then contains an "all_sims.json" file, which holds all the metadata for the fault realisation runs. E.g.

...

3) Steps 1 and 2 can be repeated for as many run folders as desired. I would suggest putting all the resulting .csv files into a single directory for easier loading when training the neural network model; a pandas sketch for combining them is shown below.
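
train_model.py (next section) can load such a directory directly, but the saved dataframes can also be stacked with pandas for a quick sanity check. A minimal sketch, where the directory name "metadata_csvs" is hypothetical:

import glob
import pandas as pd

# Load every per-run metadata csv and stack them into a single dataframe.
frames = [pd.read_csv(path) for path in glob.glob("metadata_csvs/*.csv")]
combined = pd.concat(frames, ignore_index=True)
print(combined.shape)  # overall size of the combined metadata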

Training the model

The training of the different models (LF, HF, BB) is done using the train_model.py script, which takes a config file and the input data (either as a directory or as individual input files). The input data consists of the dataframes saved in the previous step.

...
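
Purely as an illustration, an invocation could look like the line below; the flag names here are hypothetical, so check train_model.py itself for the real interface.

# Hypothetical invocation; the actual flags are defined by train_model.py.
python train_model.py --config lf_config.json metadata_csvs/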