This is a preliminary document to outline the changes that we have made to the GM simulation work-flow in order to run it on Kupe (HPC 3).

Rewriting the scripts for Slurm

We have taken the base scripts used on FitzRoy and converted them to Slurm.

So far, the GM simulation work-flow has been converted for manual, interactive submission of a single simulation. We will work to extend this to automated simulations and Cybershake.

There is a very simple installation script for the work-flow, which should be improved in the future.

Testing on Pan

We have run initial tests using the new Slurm-based work-flow on Pan.

Some interesting findings:

Initial runs on Kupe

Installation

The following list records the steps Jonney needed to take to get the work-flow running on Kupe:

Initial test run

Once the above steps have been completed, we can successfully run a sample simulation exactly as a researcher would on FitzRoy (with the limitations noted above).

The first example was the Kelly fault from Cybershake 17p8. So far we have compared the LF results, and they agree perfectly, as expected. Once the HF and BB parts have finished, we will compare them as well.

The Kelly fault run has no execution-time information, so Jonney is re-running a simulation from Cybershake 17p9 for which execution times were recorded.

In terms of core hours, we obtained the following table by running the Brothers simulation:

Job           Kupe (core hours)   FitzRoy (core hours)   Speed-up
Emod3d        14                  34                     2.43
Post-emod3d   0.016666667         0.4                    24
HF            0.5                 4.2                    8.4
BB            0.14                1.4                    10
              14.6                79.6                   5.45
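The speed-up column is FitzRoy core hours divided by Kupe core hours. The short Python sketch below recomputes it from the per-job values in the table as a consistency check; the helper name `speed_up` is illustrative and not part of the work-flow scripts.

```python
# Core hours per job from the Brothers run, copied from the table above:
# job -> (Kupe core hours, FitzRoy core hours)
CORE_HOURS = {
    "Emod3d": (14, 34),
    "Post-emod3d": (0.016666667, 0.4),
    "HF": (0.5, 4.2),
    "BB": (0.14, 1.4),
}


def speed_up(kupe_hours: float, fitz_hours: float) -> float:
    """Speed-up of Kupe over FitzRoy: FitzRoy core hours / Kupe core hours."""
    return fitz_hours / kupe_hours


for job, (kupe, fitz) in CORE_HOURS.items():
    print(f"{job:12s} {speed_up(kupe, fitz):6.2f}")
```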

Further testing

To test scalability, we ran a larger model (the February 2011 model by Hoby), with nx=1400, ny=1200, nz=460.
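For a sense of scale, the model size is the product nx*ny*nz. The sketch below computes that total and, under the illustrative assumption of an even split with one process per physical core (the actual EMOD3D domain decomposition is not described in this document), the number of grid points each process would handle.

```python
# Grid dimensions of the February 2011 model, from the text above.
NX, NY, NZ = 1400, 1200, 460

grid_points = NX * NY * NZ
print(f"{grid_points:,} grid points")  # 772,800,000 grid points

# Illustrative assumption: points divided evenly, one process per physical core.
for cores in (80, 160):
    print(f"{cores} cores: {grid_points // cores:,} points per process")
```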

The following results come from the LF calculation with EMOD3D for this model. The table below shows the mean time per 100 time steps for different numbers of requested cores, physical cores and nodes.

Requested cores   Physical cores   Nodes   Mean time per 100 time steps
80                80               2       90.3
128               80               2       129.6
160               80               2       97.47
160               160              4       46.2
256               160              4       65.5
320               160              4       48.6
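The table can be read as a strong-scaling test. The sketch below picks the fastest configuration for each node count and computes the speed-up and parallel efficiency when going from 2 to 4 nodes, using only the measured values above; the function name `best_run_per_node_count` is hypothetical.

```python
# (requested cores, physical cores, nodes, mean time per 100 time steps), from the table above.
RUNS = [
    (80, 80, 2, 90.3),
    (128, 80, 2, 129.6),
    (160, 80, 2, 97.47),
    (160, 160, 4, 46.2),
    (256, 160, 4, 65.5),
    (320, 160, 4, 48.6),
]

CORES_PER_NODE = 40  # physical cores per Kupe node, as implied by the table (80 on 2 nodes)


def best_run_per_node_count(runs):
    """Return, for each node count, the run with the lowest mean time."""
    best = {}
    for requested, physical, nodes, mean_time in runs:
        if nodes not in best or mean_time < best[nodes][3]:
            best[nodes] = (requested, physical, nodes, mean_time)
    return best


best = best_run_per_node_count(RUNS)
for nodes, (requested, _, _, mean_time) in sorted(best.items()):
    # Is the fastest request exactly one core per physical core?
    print(nodes, requested, requested == CORES_PER_NODE * nodes, mean_time)

# Strong scaling from 2 to 4 nodes, using the fastest run at each node count.
speedup = best[2][3] / best[4][3]
efficiency = speedup / (best[4][1] / best[2][1])
print(f"speed-up {speedup:.2f}, efficiency {efficiency:.0%}")
```

Doubling the physical cores from 80 to 160 nearly halves the time per 100 steps, so the LF part scales well across nodes as long as each node is not oversubscribed.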

Note that Kupe uses hyper-threading by default. When requesting N nodes, the best performance is obtained with 40*N cores (one per physical core); requesting more cores than that increases the execution time.
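In Slurm terms, this rule amounts to requesting 40 tasks per node. The sketch below generates the corresponding `#SBATCH` directives for N nodes. `--nodes`, `--ntasks`, `--ntasks-per-node` and `--hint=nomultithread` are standard sbatch options, but the helper `sbatch_directives` is hypothetical, and we have not checked whether the work-flow's submission scripts use `--hint=nomultithread` or whether Kupe's Slurm configuration honours it.

```python
CORES_PER_NODE = 40  # physical cores per Kupe node, as implied by the scaling table


def sbatch_directives(nodes: int, cores_per_node: int = CORES_PER_NODE) -> list[str]:
    """Hypothetical helper: Slurm directives for one task per physical core on `nodes` nodes."""
    return [
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks={nodes * cores_per_node}",
        f"#SBATCH --ntasks-per-node={cores_per_node}",
        # Ask Slurm not to use the extra hyper-threads of each core.
        "#SBATCH --hint=nomultithread",
    ]


print("\n".join(sbatch_directives(4)))
```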

Based on the table above, we estimated the full run time for the Canterbury 2011 earthquake (with sim_duration=100.0), obtaining:

          CPUs   Time (hh:mm:ss)   Time (s)   Core hours (LF part of the simulation)
Kupe      160    02:30:00          9200       408.9
FitzRoy   512    01:50:00          6600       938.7
Speed-up                                      2.3
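Core hours are the number of cores multiplied by the run time in hours. The sketch below reproduces the core-hour figures and the 2.3 speed-up from the CPUs and seconds columns; `core_hours` is an illustrative helper name.

```python
def core_hours(cores: int, wall_seconds: float) -> float:
    """Core hours consumed: cores times wall-clock seconds, converted to hours."""
    return cores * wall_seconds / 3600


kupe = core_hours(160, 9200)  # ≈ 408.9
fitz = core_hours(512, 6600)  # ≈ 938.7
print(f"Kupe {kupe:.1f}, FitzRoy {fitz:.1f}, speed-up {fitz / kupe:.1f}")
```

Although the Kupe run takes longer in wall-clock time, it uses less than a third of FitzRoy's cores, which is where the saving in core hours comes from.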