Notes:

CBallany Fault outside of DEM bounds

OtaraEast1 - Optimised VM no longer on land

Day 1 (22-5-19)

  • CS Environment created
  • Start source/vm generation

Issues

  • Creation of source selection - 487 faults considered (needs streamlining)
  • Submission of job (NeSI issue)


Day 2

  • SRF Generation Complete
  • VM Generation Done
  • Install Done

Issues

  • Submission of job (lack of automation issue)

Day 3

  • Ready to start!
  • 14 AhuririR LF runs completed

Issues

  • HF is taking more than 5x the expected runtime to complete. Solved: HF DT issue.

Day 6

  • HF Started
  • Up to 'W' for LF
  • 80k/~250k Core hours used

Issues

  • HF is hitting random errors, so HF calculations have been paused. These appear similar to the NeSI issues noted earlier.

Day 7

Core hours (used/total):

  • EMOD3D: 79772.18/106396.26
  • HF: 1805.29/122388.69
  • BB: 264.47/7709.17
  • Total: 81841.93/236494.12 (34.61%)

Number of realisations completed: 485/11317 - 4.29%

10493/11317 realisations of EMOD3D have completed
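
The percentages in this entry follow directly from the figures quoted. A minimal Python sketch of the arithmetic is given here, assuming each pair is core hours used over the total; the numbers are copied from this entry, and the helper name `percent` is only illustrative.

```python
def percent(done, total):
    """Return done/total as a percentage rounded to two decimals."""
    return round(100.0 * done / total, 2)

# (used, total) pairs as recorded for Day 7.
usage = {
    "EMOD3D": (79772.18, 106396.26),
    "HF": (1805.29, 122388.69),
    "BB": (264.47, 7709.17),
}

used_total = sum(used for used, _ in usage.values())
budget_total = sum(total for _, total in usage.values())

print(f"Total: {used_total:.2f}/{budget_total:.2f} - {percent(used_total, budget_total)}%")
print(f"Realisations completed: {percent(485, 11317)}%")
print(f"EMOD3D realisations completed: {percent(10493, 11317)}%")
```

The summed usage differs from the recorded total by 0.01, presumably because the per-stage figures were rounded before being logged.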

Issues

  • Autosubmit crash found and a hack fix implemented; a proper fix is to be done later


Day 10

  • Found the cause of HF simulations crashing: a longer path duration greatly increases the required array size (np2).
  • Re-compiled binaries with an array size of 2^17.
  • Resumed HF simulations.
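
The link between duration and array size can be illustrated with a rough sketch. It assumes np2 is the smallest power of two that holds duration/dt samples, which these notes do not confirm. The function names and the example duration and time step are hypothetical, not taken from the HF code.

```python
import math

def required_np2(duration_s, dt_s):
    """Smallest power of two >= the number of samples (assumed meaning of np2)."""
    # Round first so floating-point noise in the division does not add a sample.
    n_samples = math.ceil(round(duration_s / dt_s, 6))
    return 1 << max(n_samples - 1, 0).bit_length()

def fits(duration_s, dt_s, compiled_limit=2**17):
    """True if the required array size fits the compiled-in limit."""
    return required_np2(duration_s, dt_s) <= compiled_limit

# Hypothetical values: a 100 s record at dt = 0.005 s.
print(required_np2(100.0, 0.005), fits(100.0, 0.005))
```

Under this assumption the requirement grows in power-of-two steps, so a modest increase in duration can double the array size needed.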


Day 13

  • Unexpectedly high usage by HF simulations.
  • Cybershake is fully stopped for investigation.


Day 14

  • Ran 12 variations of simplified HF simulations to investigate the cause.
  • Initial research shows that the path duration has a huge impact on the simulation time.
  • Cybershake will not be restarted until we have established whether running with different path durations is scientifically correct and necessary.


Day 49 (16-July-2019)

  • Added cap to HF simulation (5.4.5.2)
  • Restarted Cybershake

Day 51

  • Abnormal CH usage by HF for a specific fault on several compute nodes (nid[270-275],[400-405])
        - reproducible when submitted to the same node

Day 51

  • The set of compute nodes showing abnormal CH usage for HF seems to have shifted (more nodes now show odd CH usage, while the nodes on the old list behave normally)
        - Logged for future report to NeSI

Day 68 (05-Aug-2019)



