Notes:
CBallany Fault outside of DEM bounds
OtaraEast1 - Optimised VM no longer on land
Day 1 (22-5-19)
- CS Environment created
- Start source/vm generation
Issues
- Creation of source selection - 487 faults considered (needs streamlining)
- Submission of job (NeSI issue)
Day 2
- SRF Generation Complete
- VM Generation Done
- Install Done
Issues
- Submission of job (lack of automation issue)
Day 3
- Ready to start!
- 14 AhuririR LF runs completed
Issues
- HF is taking more than 5x the expected runtime to complete. Solved: HF DT issue
Day 6
- HF Started
- Up to 'W' for LF
- 80k/~250k Core hours used
Issues
- HF has random errors, so HF calculations have been paused. These appear similar to the NeSI issues noted earlier
Day 7
- Core hours used:
  EMOD3D: 79772.18/106396.26
  HF: 1805.29/122388.69
  BB: 264.47/7709.17
  Total: 81841.93/236494.12 (34.61%)
- Realisations completed: 485/11317 (4.29%)
- EMOD3D realisations completed: 10493/11317
Issues
- Autosubmit crashing error found and hackfix implemented - proper fix to be done later
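The Day 7 figures can be cross-checked with a short script. This is a minimal sketch, assuming each pair reads as core hours used over core hours estimated, in line with the Day 6 entry; the variable and function names are illustrative and not part of the Cybershake tooling.

```python
# (used, estimated) per stage, copied from the Day 7 entry.
# Reading the pairs as core hours used/estimated is an assumption.
usage = {
    "EMOD3D": (79772.18, 106396.26),
    "HF": (1805.29, 122388.69),
    "BB": (264.47, 7709.17),
}


def percent(done, total):
    """Return done/total as a percentage rounded to two decimals."""
    return round(100.0 * done / total, 2)


used_total = sum(used for used, _ in usage.values())
est_total = sum(est for _, est in usage.values())

core_hour_pct = percent(used_total, est_total)
realisation_pct = percent(485, 11317)

print(f"Core hours: {used_total:.2f}/{est_total:.2f} ({core_hour_pct}%)")
print(f"Realisations: 485/11317 ({realisation_pct}%)")
```

The per-stage sum of used hours differs from the logged total by 0.01, which is consistent with the stage values having been rounded before logging.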
Day 10
- Found the cause of the HF simulation crashes: a longer path duration greatly increases the required array size (np2).
- Re-compiled binaries with array size of 2^17.
- Resumed HF simulations.
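The link between duration and array size can be sketched as follows. This assumes np2 is the smallest power of two that holds every time sample of the longest trace, which is an assumption about the HF code and not confirmed by these notes. The dt value in the usage lines is hypothetical, and required_np2 and fits are illustrative names.

```python
import math

# Array size the binaries were re-compiled with (from the Day 10 entry).
COMPILED_NP2 = 2 ** 17


def required_np2(duration_s, dt_s):
    """Smallest power of two >= number of samples (assumed meaning of np2)."""
    n_samples = math.ceil(duration_s / dt_s)
    return 1 << max(n_samples - 1, 0).bit_length()


def fits(duration_s, dt_s, limit=COMPILED_NP2):
    """True if a trace of this duration fits in the compiled array size."""
    return required_np2(duration_s, dt_s) <= limit


# Hypothetical dt of 0.005 s, chosen only for illustration.
print(required_np2(600.0, 0.005), fits(600.0, 0.005))
print(required_np2(700.0, 0.005), fits(700.0, 0.005))
```

Under this assumption, the required size doubles each time the sample count crosses a power of two, so a modest increase in duration can push a run past a fixed compiled limit.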
Day 13
- Unexpectedly high usage by HF simulations.
- Cybershake is fully stopped for investigation.
Day 14
- Ran 12 variations of simplified HF simulations to investigate the cause.
- Initial research shows that the path duration has a huge impact on the simulation time.
- Cybershake will not be restarted until we have established that running with different path durations is scientifically correct and needed.
Day 49 (16-July-2019)
- Added Cap to HF simulation (5.4.5.2)
- Restarted Cybershake
Day 51
- Abnormal CH usage of HF with a specific fault on several compute nodes (nid[270-275],[400-405])
- Reproducible when submitted to the same node
Day 51
- The compute nodes with abnormal CH usage of HF seem to have shifted (more nodes now show abnormal CH, while the nodes on the old list are normal)
- Logged for future report to NeSI
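One way to track which nodes look suspect before reporting to NeSI is to compare per-node core hours for comparable HF jobs (for example, the same fault). This is a minimal sketch, not the method used in this run: flag_nodes, the factor of 2, the node names and the sample values are all hypothetical.

```python
from collections import defaultdict
from statistics import median


def flag_nodes(records, factor=2.0):
    """Return sorted node names whose median core hours exceed
    factor times the median over all jobs.

    records: iterable of (node_name, core_hours) for comparable HF jobs.
    """
    per_node = defaultdict(list)
    for node, core_hours in records:
        per_node[node].append(core_hours)
    overall = median(ch for chs in per_node.values() for ch in chs)
    return sorted(
        node for node, chs in per_node.items() if median(chs) > factor * overall
    )


# Illustrative values only, not measurements from this run.
example = [
    ("nodeA", 9.0), ("nodeA", 8.5),
    ("nodeB", 2.0), ("nodeC", 2.1),
    ("nodeD", 1.9), ("nodeE", 2.0),
]
flagged = flag_nodes(example)
print(flagged)
```

Re-running the same check on later batches would show whether the flagged set moves between nodes, as observed in the second Day 51 entry.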