Unlike NeSI and TACC, which use SLURM, KISTI Nurion uses PBS (Portable Batch System). This means job submission as well as queue (and job) monitoring commands differ from what you may be used to.

A job can only be submitted from /scratch.
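For example, a minimal sketch of copying a run directory to scratch and submitting from there (the /scratch/$USER path and the myrun directory name are illustrative only):

# copy the run directory to scratch and submit from the scratch copy
cp -r $HOME/myrun /scratch/$USER/myrun
cd /scratch/$USER/myrun
qsub hello.sh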

A PBS script usually has the extension .sh and must contain all of the options below.


#PBS -V

Keep the environment variables.

#PBS -N

Set the name of the job.

#PBS -q

Set the queue for the job.

#PBS -l

Set the compute resources, e.g.

select=4:ncpus=32:mpiprocs=32:ompthreads=1

# select (number of nodes), ncpus (number of processes * number of threads per node), mpiprocs (number of processes per node)

NOTE: for the Python multiprocessing module, try a setting like:

select=1:ncpus=64:mpiprocs=4:ompthreads=16

The setting above was tested with VM generation running 4 Python multiprocessing processes, where each process spawns 16 OpenMP threads (see the sketch after this list).

#PBS -A

Add accounting information about the job (for statistical purposes). QuakeCoRE jobs use "inhouse".
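A minimal sketch of a PBS script for the Python multiprocessing case above (the worker script multiproc_worker.py and the walltime are placeholders, not part of the original workflow):

#!/bin/bash
#PBS -N py_multiproc
#PBS -V
#PBS -q normal
#PBS -A inhouse
#PBS -l select=1:ncpus=64:mpiprocs=4:ompthreads=16
#PBS -l walltime=01:00:00

cd $PBS_O_WORKDIR

# PBS sets OMP_NUM_THREADS from ompthreads=16; exported explicitly here for clarity
export OMP_NUM_THREADS=16

# hypothetical script that starts 4 Python multiprocessing worker processes
python multiproc_worker.py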


Example PBS script (file extension .sh)
#!/bin/sh
#PBS -N IntelMPI_job 
#PBS -V 
#PBS -q normal 
#PBS -A inhouse 
#PBS -l select=4:ncpus=32:mpiprocs=32:ompthreads=1
#PBS -l walltime=04:00:00
# note: the maximum walltime for the normal queue is 48h


cd $PBS_O_WORKDIR

module purge
module load craype-mic-knl intel/18.0.3 impi/18.0.3 python/3.7.0

mpirun ./test_mpi

Environment variables

PBS_JOBID : job ID

PBS_JOBNAME : job name assigned by the user

PBS_NODEFILE : contains the list of compute nodes allocated to the job

PBS_O_PATH : value of PATH from the submission environment

PBS_O_WORKDIR : absolute path of the directory where "qsub" was executed

TMPDIR : the job-specific temporary directory for this job; defaults to /tmp/pbs.job_id on the vnodes
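A minimal sketch of how these variables might be used in the body of a job script (the echo lines are for illustration only):

cd $PBS_O_WORKDIR                      # return to the directory where qsub was run
echo "Job $PBS_JOBID ($PBS_JOBNAME)"   # identify the job in its output
NSLOTS=`wc -l < $PBS_NODEFILE`         # the node file has one line per allocated MPI slot
echo "Allocated $NSLOTS MPI slots"
echo "Job-local temporary directory: $TMPDIR"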

Useful Commands

qsub hello.sh : Submit hello.sh

qdel <jobid> : Cancel the job

qsig -s <suspend/resume> <job id> : Suspend or resume the job

showq : Show the queue

pbs_status : Show idle resources per queue

pbs_queue_check : Show the list of queues available to the current account


qstat -u <user> : see the user's own jobs only
qstat -T : see the remaining time of queued jobs
qstat -i : see only jobs in the Q/H state
qstat -f : see job details
qstat -x : see completed jobs
(python3_nurion) [x1746a08@login04 Hossack_HYP01-10_S1244]$ qstat -u x1746a08
pbs:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
3811446.pbs     x1746a08 normal   run_emod3d  54332   3 204     -- 00:33 R 00:00




(python3_nurion) [hpc11a02@login03 v20p4p90]$ qselect -u $USER | xargs qdel   (cancel all of the user's jobs)



Supplying arguments to a PBS script

qsub -v arg1="$var1/path",arg2='$foo',arg3=3 -otherflags script.sh

Then you can access the values of arg1, arg2 and arg3 inside the script as $arg1, $arg2 and $arg3.

Quoting behaves as in bash:

" " expands the variables inside into their actual values.

' ' is taken literally.
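A minimal sketch of a script that reads these arguments (the job name arg_demo and the echo lines are illustrative only; the variable names match the qsub line above):

#!/bin/bash
#PBS -N arg_demo
#PBS -V
#PBS -q normal
#PBS -A inhouse
#PBS -l select=1:ncpus=1:mpiprocs=1:ompthreads=1
#PBS -l walltime=00:05:00

# arg1, arg2 and arg3 were supplied on the qsub command line with -v
echo "arg1=$arg1"   # the expanded value of "$var1/path" at submission time
echo "arg2=$arg2"   # the literal string $foo, because it was single-quoted
echo "arg3=$arg3"   # 3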

Parallel Loop: Embarrassingly parallel execution


Bash for loop

This works, but it has not been verified that it actually launches 4 separate processes. Note the & at the end of the command and the "wait" after the for loop.

#!/bin/bash
# script version: pbs

#PBS -N par_loop
#PBS -V
#PBS -q normal
#PBS -A inhouse
#PBS -l select=1:ncpus=4:mpiprocs=4:ompthreads=1
#PBS -l walltime=00:00:05
#PBS -W sandbox=PRIVATE


for i in `seq 4`;
do
    python $PBS_O_WORKDIR/hello.py $i > $PBS_O_WORKDIR/outfile$i &
done
wait



Job Arrays

qsub -J 1-8 my_job.sh

This runs my_job.sh 8 times with 8 different IDs. Inside the script my_job.sh, the ID is available as $PBS_ARRAY_INDEX.


1-8:2 - a step size can be given after a colon; the submitted jobs will have IDs { 1 3 5 7 }.


#!/bin/bash
# script version: pbs
#PBS -N job_array
#PBS -V
#PBS -q normal
#PBS -A inhouse
#PBS -l select=1:ncpus=1:mpiprocs=1:ompthreads=1
#PBS -l walltime=00:00:05
#PBS -W sandbox=PRIVATE


python $PBS_O_WORKDIR/hello.py ${PBS_ARRAY_INDEX} > $PBS_O_WORKDIR/outfile$PBS_ARRAY_INDEX




Example LF/HF/BB PBS scripts


EMOD3D PBS script
#!/bin/bash
#PBS -N run_emod3d.Hossack_HYP01-10_S1244
#PBS -V
#PBS -q normal
#PBS -A inhouse
#PBS -l select=3:ncpus=64:mpiprocs=64:ompthreads=1
#PBS -l walltime=00:33:00
#PBS -W sandbox=PRIVATE

module purge
module add craype-network-opa intel/18.0.3 craype-mic-knl impi/18.0.3 python/3.7
export gmsim_root=/home01/x1746a08/gmsim
source $gmsim_root/Environments/virt_envs/python3_nurion/bin/activate

SUCCESS_CODE=0

export outfile=$PBS_O_WORKDIR/result_lf
touch $outfile
rm $outfile

export runtime_fmt="%Y%m%d_%H%M%S"

echo `date +$runtime_fmt` >>$outfile
mpirun $gmsim_root/tools/emod3d-mpi_v3.0.4 -args "par=$PBS_O_WORKDIR/LF/e3d.par"

end_time=`date +$runtime_fmt`
echo $end_time >>$outfile

#run test script and update mgmt_db
#test before update
ln -s $PBS_O_WORKDIR/LF/e3d.par $PBS_O_WORKDIR/LF/OutBin/e3d.par
timestamp=`date +$runtime_fmt`
test_cmd="$gmsim/workflow/scripts/test_emod3d.sh $PBS_O_WORKDIR Hossack_HYP01-10_S1244"
res=`$test_cmd`

success=$?

# Below is to work around the caching issue on Maui.
#if [[ $success == $SUCCESS_CODE ]]; then
#    sleep 2
#    echo "Success 1" >> $outfile
#    res=`$test_cmd`
#    success=$?
#fi
if [[ $success == $SUCCESS_CODE ]]; then
    #passed
    echo "Success:" $res >> $outfile
else
   echo "Fail" $res >> $outfile
fi


HF PBS script
#!/bin/bash
#PBS -N sim_hf.Hossack_HYP01-10_S1244
#PBS -V
#PBS -q normal
#PBS -A inhouse
#PBS -l select=4:ncpus=64:mpiprocs=64:ompthreads=1
#PBS -l walltime=00:30:00
#PBS -W sandbox=PRIVATE

module purge
module add craype-network-opa intel/18.0.3 craype-mic-knl impi/18.0.3 python/3.7
export gmsim_root=/home01/x1746a08/gmsim
source $gmsim_root/Environments/virt_envs/python3_nurion/bin/activate

export outfile=$PBS_O_WORKDIR/result_hf
touch $outfile
rm $outfile

runtime_fmt="%Y-%m-%d_%H:%M:%S"
start_time=`date +$runtime_fmt`
echo $start_time >> $outfile
mkdir -p $PBS_O_WORKDIR/HF/Acc
mpirun python $gmsim/workflow/scripts/hf_sim.py $PBS_O_WORKDIR/../fd_rt01-h0.400.ll $PBS_O_WORKDIR/HF/Acc/HF.bin -m $gmsim_root/VelocityModel/Mod-1D/Cant1D_v3-midQ_OneRay.1d --duration 36.42 --dt 0.005 --sim_bin $gmsim_root/tools/hb_high_binmod_v5.4.5 --version 5.4.5 --dt 0.005 --rvfac 0.8 --sdrop 50 --path_dur 1 --kappa 0.045 --seed 34580 --slip $PBS_O_WORKDIR/../../../Data/Sources/Hossack/Stoch/Hossack_HYP01-10_S1244.stoch
end_time=`date +$runtime_fmt`
echo $end_time >> $outfile


timestamp=`date +%Y%m%d_%H%M%S`
#test before update
test_cmd="$gmsim/workflow/scripts/test_hf.sh $PBS_O_WORKDIR"
echo $test_cmd >> $outfile
res=`$test_cmd`
if [[ $? == 0 ]]; then
    #passed
    echo "Success:" $res >> $outfile
else
   echo "Fail" $res >> $outfile
fi



BB PBS script
#!/bin/bash
# BB calculation
#PBS -N sim_bb.Hossack_HYP01-10_S1244
#PBS -V
#PBS -q normal
#PBS -A inhouse
#PBS -l select=4:ncpus=64:mpiprocs=64:ompthreads=1
#PBS -l walltime=00:30:00
#PBS -W sandbox=PRIVATE

module purge
module add craype-network-opa intel/18.0.3 craype-mic-knl impi/18.0.3 python/3.7
export gmsim_root=/home01/x1746a08/gmsim
source $gmsim_root/Environments/virt_envs/python3_nurion/bin/activate

export outfile=$PBS_O_WORKDIR/result_bb
touch $outfile
rm $outfile

runtime_fmt="%Y-%m-%d_%H:%M:%S"
start_time=`date +$runtime_fmt`
echo $start_time >> $outfile
mkdir -p $PBS_O_WORKDIR/HF/Acc

start_time=`date +$runtime_fmt`
echo $start_time >> $outfile


echo "Computing BB"
mkdir -p $PBS_O_WORKDIR/BB/Acc
mpirun  python $gmsim/workflow/scripts/bb_sim.py $PBS_O_WORKDIR/LF/OutBin $PBS_O_WORKDIR/../../../Data/VMs/Hossack $PBS_O_WORKDIR/HF/Acc/HF.bin $gmsim_root/StationInfo/non_uniform_whole_nz_with_real_stations-hh400_v18p6.vs30 $PBS_O_WORKDIR/BB/Acc/BB.bin --flo 0.25 --version 3.0.4 --site_specific False --fmin 0.2 --fmidbot 0.5 --lfvsref 500.0

end_time=`date +$runtime_fmt`
echo $end_time >> $outfile

timestamp=`date +%Y%m%d_%H%M%S`
#test before update
test_cmd="$gmsim/workflow/scripts/test_bb.sh $PBS_O_WORKDIR"
echo $test_cmd >> $outfile
res=`$test_cmd`
if [[ $? == 0 ]]; then
   #passed
    echo "Success:" $res >> $outfile
else
   echo "Fail" $res >> $outfile
fi