Architecture Overview

Current: Daemon

 
  • The QueueMgmt Daemon runs on Kupe indefinitely. Every 60 seconds, it:
    1. Batch-processes the queue of SQL update commands so that the mgmt DB reflects the correct status of each job. Updates are batched to work around an SQL I/O blocking issue.
    2. Queries the DB and submits a BB job if both LF and HF have completed successfully.
  • Currently, SRFs and VMs are generated by a separate workflow, and these input files need to be placed at the agreed location.
  • Additional downstream calculations, such as IM calc and IM agg, can be easily accommodated.
  • The QueueMgmt Daemon can also run locally and interact with Kupe via SSH + socket, bypassing the two-factor authentication process.
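
A minimal sketch of one 60-second daemon cycle as described above, written in Python with SQLite standing in for the mgmt DB. The table name `job_state`, its columns, the status strings, and the `submit` callback are all hypothetical and chosen for illustration only; the actual daemon's schema and submission mechanism are not described on this page.

```python
import sqlite3
import time
from collections import deque

# Hypothetical schema: one row per (run, process type).
# proc_type is 'LF', 'HF' or 'BB'; status is 'created', 'queued' or 'completed'.
SCHEMA = """
CREATE TABLE job_state (
    run_name  TEXT NOT NULL,
    proc_type TEXT NOT NULL,
    status    TEXT NOT NULL,
    PRIMARY KEY (run_name, proc_type)
)
"""

# Hypothetical shape of a queued SQL update command.
UPDATE_STATUS = "UPDATE job_state SET status = ? WHERE run_name = ? AND proc_type = ?"


def apply_queued_updates(conn, queue):
    """Step 1: apply every queued (sql, params) pair in a single transaction."""
    batch = list(queue)
    with conn:  # commits on success, rolls back if any statement fails
        for sql, params in batch:
            conn.execute(sql, params)
    queue.clear()  # only reached if the whole batch was committed
    return len(batch)


def runs_ready_for_bb(conn):
    """Step 2a: runs whose LF and HF completed and whose BB is not yet submitted."""
    rows = conn.execute(
        """
        SELECT lf.run_name
        FROM job_state AS lf
        JOIN job_state AS hf ON hf.run_name = lf.run_name AND hf.proc_type = 'HF'
        JOIN job_state AS bb ON bb.run_name = lf.run_name AND bb.proc_type = 'BB'
        WHERE lf.proc_type = 'LF'
          AND lf.status = 'completed'
          AND hf.status = 'completed'
          AND bb.status = 'created'
        ORDER BY lf.run_name
        """
    ).fetchall()
    return [row[0] for row in rows]


def daemon_cycle(conn, queue, submit):
    """Run one cycle: flush the update queue, then submit BB where possible."""
    apply_queued_updates(conn, queue)
    ready = runs_ready_for_bb(conn)
    for run_name in ready:
        submit(run_name)  # hypothetical hook, e.g. a wrapper around job submission
        with conn:
            conn.execute(UPDATE_STATUS, ("queued", run_name, "BB"))
    return ready


def run_forever(conn, queue, submit, interval_seconds=60):
    """Repeat the cycle indefinitely, as the daemon does."""
    while True:
        daemon_cycle(conn, queue, submit)
        time.sleep(interval_seconds)
```

Marking the BB row as `queued` straight after submission keeps the next cycle from submitting the same BB job twice.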

Cylc

 
  • Cylc is expected to be available on Maui/Mahuika.
  • The Cylc client will run locally and talk to the HPC via SSH (NeSI needs to configure this and grant us access).
  • Cylc supports Jinja2 and SLURM, and is expected to simplify the automation workflow.
  • For all N ruptures, Cylc can generate N DAGs: LFx & HFx -> BBx.
  • If it can seamlessly interact with SLURM (as promised) and retrieve the status of each job, we no longer need a separate mgmt DB and can use the success triggers from the completion of LFx and HFx to kick off BBx.
  • It is very bulky and has a steep learning curve. Documentation is lacking.
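
To make the per-rupture dependency (LFx & HFx -> BBx) concrete, the sketch below writes out one Cylc-style graph line per rupture. It is a plain Python illustration rather than a Cylc suite definition; the `LF_`/`HF_`/`BB_` task naming scheme and the rupture names are assumptions. Cylc graph strings use `&` for "both must succeed" and `=>` for "triggers", so the arrow is written as `=>` here. In an actual suite, the same expansion would more likely be done with a Jinja2 loop.

```python
def bb_dependency_graph(rupture_names):
    """Return one 'LF & HF => BB' graph line per rupture, joined by newlines."""
    lines = []
    for name in rupture_names:
        # BB for a rupture starts only after both its LF and HF tasks succeed.
        lines.append(f"LF_{name} & HF_{name} => BB_{name}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical rupture names, for illustration only.
    print(bb_dependency_graph(["rupture_1", "rupture_2"]))
```

Each line is an independent DAG, so a failure in one rupture's LF or HF does not block BB for the others.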

 

 

Proposed Steps

  1. Get the current solution to run from the local Linux box. This should be easy.
  2. Get help from Hillary Oliver to get the basic skeleton working, and to receive training.
  3. Incorporate downstream calculations.

 

 
