

Stampede2 (TACC)
Model: Dell PowerEdge C6320P/C6420
Number of CPUs: 367,024 (Xeon Phi 7250 68C 1.4GHz)
Total Memory: 736 TB
Scheduler: SLURM
Max number of submissions per user:

KNL: 1 node = 68 cores (1 socket) = 272 hyper-threads, but 64-68 MPI tasks per node are advisable. 4,200 KNL nodes, (96 GB + 16 GB) per node.

SKX: 1 node = 48 cores (2 sockets * 24 cores/socket) = 96 hyper-threads. 1,736 nodes.

Queue           Wall-clock limit   Max Nodes/Job           Max active jobs (running+waiting)
KNL
development     2h                 16 (1,088 cores)        1
normal          48h                256 (17,408 cores)      50
large           48h                2048 (139,264 cores)    5
long            120h               32 (2,176 cores)        2
flat-quadrant   48h                32 (2,176 cores)        5
SKX
skx-dev         2h                 4 (192 cores)           1
skx-normal      48h                128 (6,144 cores)       25
skx-large       48h                868 (41,664 cores)      3

SKX is slightly more expensive than KNL
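
The core counts in the queue limits are simply the node limit multiplied by the cores per node (68 on KNL, 48 on SKX). A minimal sketch for checking a request before submitting; the helper name cores_for is made up for this illustration:

```shell
# Cores per node, taken from the hardware description on this page.
KNL_CORES_PER_NODE=68
SKX_CORES_PER_NODE=48

# Print the total core count for <nodes> nodes with <cores_per_node> cores each.
# cores_for is a hypothetical helper, not a TACC tool.
cores_for() {
    echo $(( $1 * $2 ))
}

cores_for 32 "$KNL_CORES_PER_NODE"     # 32-node KNL limit   -> 2176
cores_for 868 "$SKX_CORES_PER_NODE"    # 868-node SKX limit  -> 41664
```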

Dev env.: Default compiler: Intel 18.

File system

$HOME: 10 GB (200,000 files)
$WORK: 1 TB (3 million files): not for high I/O, large files. No backup, no purge.

$SCRATCH: unlimited. No backup; files are deleted if not accessed for 10 days.

/nesi/project/nesi00213 == $HOME/project

/nesi/nobackup/nesi00213 == $HOME/nobackup or $SCRATCH/nobackup
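
Because $SCRATCH files are deleted after 10 days without access, it can help to list files that are getting close to that limit. The sketch below assumes GNU find and assumes the purge tracks access time the same way find -atime does; stale_files is a hypothetical helper name.

```shell
# List regular files under <dir> whose last access is more than <days>
# whole 24-hour periods ago (find ignores fractional days).
# Usage: stale_files <dir> <days>
stale_files() {
    find "$1" -type f -atime +"$2"
}

# Example: files in $SCRATCH not read for over a week.
# stale_files "$SCRATCH" 7
```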



Gotchas
Building
module add fftw3/3.3.8
module add intel/18.0.2
module add impi/18.0.2
module add cmake/3.10.2


MPI_C_LIB_NAMES = mpifort;mpi;mpigi;dl;rt;pthread
MPI_dl_LIBRARY = /usr/lib64/libdl.so
MPI_pthread_LIBRARY = /usr/lib64/libpthread.so
MPI_rt_LIBRARY = /usr/lib64/librt.so

By default, gcc-6.5 creeps in and the build attempts to use it instead of icc. Force the Intel compiler by setting CC=icc.

I found "make VERBOSE=1" extremely useful for debugging build issues.
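
One way to confirm which compiler a configured build tree will use is to read it back from CMakeCache.txt, where CMake records CMAKE_C_COMPILER. A minimal sketch; cached_c_compiler is a hypothetical helper name, and the commands in the trailing comments assume an empty build directory one level below the source tree:

```shell
# Print the C compiler path recorded in a CMake build directory's cache.
# Usage: cached_c_compiler <build_dir>
cached_c_compiler() {
    sed -n 's/^CMAKE_C_COMPILER:[A-Z]*=//p' "$1/CMakeCache.txt"
}

# Example (commented out because it needs the Intel modules and a source tree):
#   CC=icc cmake ..          # configure from an empty build directory
#   cached_c_compiler .      # should print a path ending in icc, not gcc
```

CMake only reads CC the first time it configures a build tree, so an existing build directory that already picked up gcc needs to be cleared before reconfiguring.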

Issue

emod3d has a rounding-error issue with icc and returns a wrong "ny", which fails the post-emod3d test. Jonney has a fix (RobG adds 0.5 instead of using the round() function). Rob Graves fixed this by converting float to double in the function get_n1n2() in misc.c. The fix is included in 3.0.6. (On Nurion, however, this fix was found to be insufficient.)

Running

 

Project name must be CamelCase: DesignSafe-Graves

Slurm script needs -N for number of nodes

#SBATCH -N 4
#SBATCH --ntasks=160

Use "ibrun" instead of "srun" to launch MPI programs.
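
Putting the points above together, a batch script might look like the sketch below. The job name, queue choice, time limit, and executable name are placeholders chosen for illustration, not values taken from the actual workflow.

```shell
# Write a minimal batch script to run_emod3d.slurm (hypothetical file name).
cat > run_emod3d.slurm <<'EOF'
#!/bin/bash
#SBATCH -J emod3d_test          # job name (placeholder)
#SBATCH -A DesignSafe-Graves    # project name, spelled exactly as allocated
#SBATCH -p normal               # KNL "normal" queue (placeholder choice)
#SBATCH -N 4                    # number of nodes (required)
#SBATCH --ntasks=160            # total MPI tasks across all nodes
#SBATCH -t 01:00:00             # wall-clock limit (placeholder)

# ibrun is used in place of srun; ./my_mpi_program is a placeholder executable
ibrun ./my_mpi_program
EOF

# Submit with: sbatch run_emod3d.slurm
```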


Workflow

Several hardcoded settings that assume a NeSI machine need to be updated. Check the "stampede" branches of the workflow and qcore repositories.

https://github.com/ucgmsim/slurm_gm_workflow/tree/stampede

https://github.com/ucgmsim/qcore/tree/stampede

Usage check
(python3_stampede) sungbae@stampede21(1):~$ /usr/local/etc/taccinfo
---------------------- Project balances for user sungbae ----------------------
| Name                Avail SUs     Expires |                                 |
| DesignSafe-Graves       19974  2020-09-30 |                                 |
------------------------ Disk quotas for user sungbae -------------------------
| Disk         Usage (GB)     Limit    %Used   File Usage       Limit   %Used |
| /home1              0.8      10.0     7.82         1853      200000    0.93 |
| /work              10.0    1024.0     0.97        55539     3000000    1.85 |
| /scratch           11.0       0.0     0.00         4032           0    0.00 |
-------------------------------------------------------------------------------


Available 19974 SUs out of 20000.
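
The remaining balance can also be pulled out of the taccinfo output with a small filter. This is a sketch that assumes the row layout shown above (project name in the second whitespace-separated field, available SUs in the third); avail_sus is a hypothetical helper name, and the 20000 total is the allocation quoted above.

```shell
# Print the available SUs for <project> from taccinfo-style text on stdin.
# Usage: /usr/local/etc/taccinfo | avail_sus <project>
avail_sus() {
    awk -v proj="$1" '$1 == "|" && $2 == proj { print $3 }'
}

# Example using the project row shown above.
avail=$(echo '| DesignSafe-Graves 19974 2020-09-30 | |' | avail_sus DesignSafe-Graves)
echo "used: $(( 20000 - avail )) SUs"    # prints "used: 26 SUs"
```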
