
NeSI Maui/Mahuika


| | Maui | Mahuika |
|---|---|---|
| Model | Cray XC50 | Cray CS400 |
| Number of CPUs | 18,650 x 2.4GHz Skylake (1 node = 80 virtual cores) | 8,424 x 2.1GHz Broadwell (1 node = 72 virtual cores) |
| Total Memory | 66.8 TB | 30 TB |
| Scheduler | SLURM | SLURM |

Max num of submission per user

Maui:

| Queue | Wall-clock limit | Nodes | CPU/Node | Max Mem/Node |
|---|---|---|---|---|
| nesi_research | 24 h | 264 | 40 (80) | 80 or 160 GB |

Max CPU request: 240 nodes = 9,600 physical = 19,200 virtual cores

Max Node Hours: 1200 node-hours

e.g. requesting 240 nodes limits the wall clock to 5 hours (1200 node-hours / 240 nodes).
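
The node-hour rule can be checked with a few lines of shell arithmetic. This is a minimal sketch that uses only the limits quoted above (1200 node-hours, 24 h, 240 nodes); the function name max_wall_hours is made up for illustration and is not a NeSI tool.

```shell
#!/usr/bin/env bash
# Limits quoted above for Maui's nesi_research queue.
MAX_NODE_HOURS=1200   # node-hours available to one job
MAX_WALL_HOURS=24     # queue wall-clock limit
MAX_NODES=240         # largest node request

# Hypothetical helper: print the longest wall clock (whole hours)
# that a job on the given number of nodes may request.
max_wall_hours() {
    local nodes=$1
    if (( nodes < 1 || nodes > MAX_NODES )); then
        echo "node count must be between 1 and ${MAX_NODES}" >&2
        return 1
    fi
    local hours=$(( MAX_NODE_HOURS / nodes ))   # integer division rounds down
    if (( hours > MAX_WALL_HOURS )); then
        hours=$MAX_WALL_HOURS
    fi
    echo "$hours"
}

max_wall_hours 240   # 1200 / 240 = 5
max_wall_hours 40    # 1200 / 40 = 30, capped at 24
```

Small jobs are bounded by the 24 h queue limit; the node-hour budget only starts to bite above 50 nodes.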


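Mahuika:

| Queue | Wall-clock limit | Nodes | CPU/Node | Max Mem/CPU | Max Mem/Node |
|---|---|---|---|---|---|
| large | 3 days | 226 | 72 | 1500 MB | 108 GB |
| long | 3 weeks | 69 | 72 | 1500 MB | 108 GB |
| prepost | 3 h | 5 | 72 | 6800 MB | 480 GB |
| bigmem | 7 days | 4 | 72 | 6800 MB | 480 GB |
| hugemem | 7 days | 0.5 | 128 | 30 GB | 4000 GB |
| gpu | 3 days | 4 | 8 | 13500 MB | 108 GB |
| ga_bigmem | 7 days | 1 | 72 | 6800 MB | 480 GB |
| ga_hugemem | 7 days | 1 | 128 | 30 GB | 4000 GB |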

Max CPU request: 576 CPUs (8 full nodes)

Max num of jobs (submit): 1000

Max Core hours per job: 20,000 hrs.
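
As a rough check of the Mahuika caps, the sketch below tests whether a request fits within both 576 CPUs and 20,000 core-hours. It assumes core-hours are simply CPUs multiplied by wall-clock hours; fits_mahuika_limits is a hypothetical name, not a NeSI command.

```shell
#!/usr/bin/env bash
MAX_CPUS=576            # 8 full nodes of 72 CPUs
MAX_CORE_HOURS=20000    # per-job core-hour cap

# Hypothetical helper: succeed if CPUS x HOURS stays within both limits.
# Assumes core-hours = CPUs x wall-clock hours.
fits_mahuika_limits() {
    local cpus=$1 hours=$2
    (( cpus >= 1 && cpus <= MAX_CPUS && cpus * hours <= MAX_CORE_HOURS ))
}

if fits_mahuika_limits 576 72; then      # 576 x 72 = 41,472 core-hours
    echo "576 CPUs for 72 h: ok"
else
    echo "576 CPUs for 72 h: over the limit"
fi
```

Under that assumption, a full 576-CPU request cannot use the whole 3-day wall clock of the large queue; it tops out at 34 hours.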

Dev env.

File system

Gotchas

TACC Stampede2


| | Stampede2 (TACC) |
|---|---|
| Model | Dell PowerEdge C6320P/C6420 |
| Number of CPUs | 367,024 Xeon Phi 7250 68C 1.4GHz |
| Total Memory | 736 TB |
| Scheduler | SLURM |

Max num of submission per user

KNL: 4,200 nodes; 1 node = 68 cores (1 socket) = 272 hyperthreads, but 64-68 MPI tasks per node are advisable; (96 GB + 16 GB) memory per node

SKX: 1,736 nodes; 1 node = 48 cores (2 sockets x 24 cores/socket) = 96 hyperthreads

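| Node type | Queue | Wall-clock limit | Max Nodes/Job | Max active jobs (running+waiting) |
|---|---|---|---|---|
| KNL | development | 2h | 16 (1,088 cores) | 1 |
| KNL | normal | 48h | 256 (17,408 cores) | 50 |
| KNL | large | 48h | 2048 (139,264 cores) | 5 |
| KNL | long | 120h | 32 (2,176 cores) | 2 |
| KNL | flat-quadrant | 48h | 32 (2,176 cores) | 5 |
| SKX | skx-dev | 2h | 4 (192 cores) | 1 |
| SKX | skx-normal | 48h | 128 (6,144 cores) | 25 |
| SKX | skx-large | 48h | 868 (41,664 cores) | 3 |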

SKX is slightly more expensive than KNL

Dev env.

Default compiler: Intel 18.
File system

$HOME: 10 GB (200,000 files)
$WORK: 1 TB (3 million files): not for high I/O, large files. No backup, no purge.

$SCRATCH: unlimited. No backup; files are deleted if not accessed for 10 days.

 /nesi/project/nesi00213 == $HOME/project

/nesi/nobackup/nesi00213 == $HOME/nobackup or $SCRATCH/nobackp



Gotchas
Building
module add fftw3/3.3.8
module add intel/18.0.2
module add impi/18.0.2
module add cmake/3.10.2


MPI_C_LIB_NAMES = mpifort;mpi;mpigi;dl;rt;pthread
MPI_dl_LIBRARY = /usr/lib64/libdl.so
MPI_pthread_LIBRARY = /usr/lib64/libpthread.so
MPI_rt_LIBRARY =  /usr/lib64/librt.so

By default, gcc-6.5 gets picked up and the build uses it instead of icc. Force icc with CC=icc.

I found "make VERBOSE=1" extremely useful for debugging build issues.
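
A configure-and-build sequence along these lines keeps gcc out of the picture. It is a sketch only: the build-icc directory name and the build_with_icc function are assumptions, and the compiler is also passed explicitly through CMake's CMAKE_C_COMPILER variable because CC is only read the first time a build directory is configured.

```shell
#!/usr/bin/env bash
# Force the Intel C compiler. CMake reads CC only on the first configure
# of a build directory, so the helper below starts from a new one.
export CC=icc

# Hypothetical helper: configure and build in a fresh ./build-icc directory.
# Run it from the top of the source tree after loading the modules above.
build_with_icc() {
    mkdir build-icc && cd build-icc || return 1
    cmake -DCMAKE_C_COMPILER="$CC" .. || return 1
    make VERBOSE=1    # prints every compile and link command in full
}
```

Call build_with_icc from the source root; the compiler identification reported at the start of the CMake output should then be Intel rather than GNU.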

Issue

emod3d has a rounding error issue with icc and returns the wrong "ny", failing the post-emod3d test. Rob Graves fixed this by converting float to double in the function get_n1n2() in misc.c. The fix is included in 3.0.6. (On Nurion, however, this fix was found to be insufficient.)

Running

 

Project name must be CamelCase: DesignSafe-Graves

Slurm script needs -N for number of nodes

#SBATCH -N 4
#SBATCH --ntasks=160

MPI programs are launched with "ibrun" instead of "srun".
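
Putting these points together, the snippet below writes a minimal job script to a file so it can be inspected before submission. Only the project name, -N 4, --ntasks=160 and ibrun come from the notes above; the job name, queue, time limit, executable and output path are placeholders chosen for illustration.

```shell
#!/usr/bin/env bash
# Write a minimal Stampede2 job script to a file for inspection.
# Job name, queue, time limit and executable are placeholders.
job_script="${TMPDIR:-/tmp}/stampede2_example.slurm"

cat > "$job_script" <<'EOF'
#!/bin/bash
#SBATCH -J example_job
#SBATCH -A DesignSafe-Graves
#SBATCH -p skx-normal
#SBATCH -N 4
#SBATCH --ntasks=160
#SBATCH -t 01:00:00

# ibrun replaces srun as the MPI launcher on Stampede2.
ibrun ./my_mpi_program
EOF

echo "wrote $job_script"
```

Once the placeholders are filled in, the script is submitted with sbatch "$job_script".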


Workflow

A number of hardcoded bits that assume a NeSI machine need to be updated. Check the "stampede" branches of the workflow and qcore repositories.

https://github.com/ucgmsim/slurm_gm_workflow/tree/stampede

https://github.com/ucgmsim/qcore/tree/stampede

Usage check
(python3_stampede) sungbae@stampede21(1):~$ /usr/local/etc/taccinfo
---------------------- Project balances for user sungbae ----------------------
| Name Avail SUs Expires | |
| DesignSafe-Graves 19974 2020-09-30 | |
------------------------ Disk quotas for user sungbae -------------------------
| Disk Usage (GB) Limit %Used File Usage Limit %Used |
| /home1 0.8 10.0 7.82 1853 200000 0.93 |
| /work 10.0 1024.0 0.97 55539 3000000 1.85 |
| /scratch 11.0 0.0 0.00 4032 0 0.00 |
-------------------------------------------------------------------------------


Available 19974 SUs out of 20000.


KISTI Nurion



| | Nurion (KISTI) |
|---|---|
| Model | Cray CS500 |
| Number of CPUs | 570,020 Xeon Phi 7250 68C 1.4GHz |
| Total Memory | |
| Scheduler | PBS |

Max num of submission per user

KNL: 8,305 nodes; 1 node = 68 cores (1 socket); (96 GB + 16 GB) memory per node

SKL: 132 nodes; 1 node = 40 cores (2 sockets x 20 cores/socket); 192 GB memory per node

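| Node type | Queue | Wall-clock limit | Max Nodes/Job | Max running jobs | Max active jobs (running+waiting) |
|---|---|---|---|---|---|
| KNL | exclusive | unlimited | 2600 (176,800 cores) | 100 | 200 |
| KNL | normal (82 GB) | 48h | 4970 (337,960 cores) | 30 | 40 |
| KNL | long (82 GB) | 120h | 300 | 10 | 20 |
| KNL | flat (102 GB) | 48h | 180 | 10 | 20 |
| KNL | debug (82 GB) | 48h | 2 (20 avail) | 2 | 2 |
| SKL | commercial | 48h | 118 (4,720 cores) | 2 | 6 |
| SKL | norm_skl | 48h | 118 (4,720 cores) | 5 | 10 |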
Dev env.
File system
Gotchas

Building EMOD3D was somewhat tricky. I ended up building my own copies of CMake 3.9 (the existing module has no ccmake, and later versions of CMake are buggy) and fftw3 (the existing module has no fftw3f, so CMake failed to pick it up).


EMOD3D was originally built with the Intel toolchain, but it had rounding error issues and generated incompatible random numbers (different from Maui). For the best (and consistent) results, using the GNU toolchain is highly recommended.


The following modules are used:

  • craype-network-opa
  • gcc
  • craype-mic-knl
  • openmpi


Don't bother with the fftw3 module. fftw3 has to be built from source; only the fftw3f (single-precision) version is needed.

FFTW3

export MPICC='mpicc -fPIC -march=knl'

export CC='gcc -fPIC -march=knl'

./configure --enable-float --enable-sse --enable-threads --host=x86_64-pc-linux --enable-shared --prefix=/home01/hpc11a02/gmsim/Environments/nurion/ROOT/local/gnu

make all install


EMOD3D


mkdir build

cd build

export FFTW_DIR=/home01/hpc11a02/gmsim/Environments/nurion/ROOT/local/gnu

cmake -DFFTW_DIR=$FFTW_DIR -DCMAKE_INSTALL_PREFIX:PATH=/home01/hpc11a02/gmsim/tools ../

make

make install


(cmake -DCMAKE_INSTALL_PREFIX:PATH=.... is the equivalent of ./configure --prefix=....)


GMT

Prerequisites

  • curl
  • sqlite-snapshot-202004061816
  • zlib-1.2.11
  • libpng-1.6.37
  • tiff-4.1.0
  • GraphicsMagick-1.3.35
  • proj-7.0
  • gdal-3.0.1

Except for GDAL, this works:

$ PKG_CONFIG_PATH=/home01/hpc11a02/gmsim/Environments/nurion/ROOT/local/gnu/lib/pkgconfig ./configure --prefix=/home01/hpc11a02/gmsim/Environments/nurion/ROOT/local/gnu

make all install


For GDAL,

CPPFLAGS=-I/home01/hpc11a02/gmsim/Environments/nurion/ROOT/local/gnu/include PKG_CONFIG_PATH=/home01/hpc11a02/gmsim/Environments/nurion/ROOT/local/gnu/lib/pkgconfig ./configure --prefix=/home01/hpc11a02/gmsim/Environments/nurion/ROOT/local/gnu --with-proj=/home01/hpc11a02/gmsim/Environments/nurion/ROOT/local/gnu


For GMT, go to the build directory and run:

cmake -DDCW_PATH:PATH=/home01/hpc11a02/gmsim/Environments/nurion/ROOT/share/dcw-gmt-1.1.4 -DGSHHG_PATH:PATH=/home01/hpc11a02/gmsim/Environments/nurion/ROOT/share/gshhg-gmt-2.3.7 ../


make all install


!WARNING!

"qsub" MUST be executed in $SCRATCH directory.


Usage check

isam

$ lfs quota -h /home01

$ lfs quota -h /scratch

For details of PBS, see PBS page.
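
The warning above about running qsub from $SCRATCH can be backed by a small guard that refuses to submit from anywhere else. This is a sketch under assumptions: it takes /scratch (the path used with lfs quota above) as the scratch root unless SCRATCH_ROOT is set, and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Root of the scratch file system; /scratch is the path used with
# "lfs quota" above. Override SCRATCH_ROOT if your scratch area differs.
SCRATCH_ROOT="${SCRATCH_ROOT:-/scratch}"

# Hypothetical helper: succeed if DIR (default: current directory)
# is SCRATCH_ROOT itself or lies below it.
in_scratch() {
    local dir=${1:-$PWD}
    case "$dir/" in
        "$SCRATCH_ROOT"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: refuse to call qsub outside scratch.
qsub_from_scratch() {
    if ! in_scratch "$PWD"; then
        echo "refusing to submit: $PWD is not under $SCRATCH_ROOT" >&2
        return 1
    fi
    qsub "$@"
}
```

qsub_from_scratch takes the same arguments as qsub and passes them through unchanged once the directory check succeeds.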

