

Running Applications on the Blue Gene/P system

1. Access to the Blue Gene/P

Access to the Blue Gene/P is provided via Secure Shell (SSH) login to the front-end node foster.canterbury.ac.nz. This is the node from which a user compiles and runs interactive jobs on the Blue Gene/P system.
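
For example, from a terminal on your own machine a typical login would look like the following (your_username is a placeholder for the username issued to you when you registered):

ssh your_username@foster.canterbury.ac.nz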

2. Compiling Programmes

a. Introduction

The Blue Gene/P system uses the same XL family of compilers as the IBM Blue Gene/L system but has specific optimisations for the Blue Gene/P architecture. In particular, the XL family of compilers generate code appropriate for the double floating-point unit (FPU) of the Blue Gene/P system.

In addition to the XL family of compilers, the Blue Gene/P system supports a version of the GNU compilers for C, C++, and Fortran. These compilers do not generate highly optimised code for the Blue Gene/P system. In particular, they do not automatically generate code for the double FPUs, and they do not support OpenMP.

The following compiler versions are installed on the Blue Gene/P:

IBM XL compilers
The following IBM XL compilers are supported for developing Blue Gene/P applications:

  • XL C/C++ Advanced Edition V9.0 for Blue Gene/P
  • XL Fortran Advanced Edition V11.1 for Blue Gene/P

GNU Compiler Collection
The standard GNU Compiler Collection V4.1.2 for C, C++, and Fortran is supported on the Blue Gene/P system. The current versions are:

  • gcc - V4.1.2
  • binutils - V2.16
  • glibc - V2.4

b. Compiling MPI programmes for Blue Gene/P

The Blue Gene/P software provides several scripts to compile and link MPI programmes. These scripts make building MPI programmes easier by setting the include paths for the compiler and linking in the MPICH2 libraries and the other libraries required by Blue Gene/P MPI programmes.

The following scripts are provided to compile and link MPI programmes:

  • mpicc - GNU C compiler (4.1.2)
  • mpicxx - GNU C++ compiler (4.1.2)
  • mpif77 - GNU Fortran 77 compiler (4.1.2)
  • mpif90 - GNU Fortran 90 compiler (4.1.2)
  • mpicc-4.3.2 - GNU C compiler (4.3.2)
  • mpicxx-4.3.2 - GNU C++ compiler (4.3.2)
  • mpif77-4.3.2 - GNU Fortran 77 compiler (4.3.2)
  • mpif90-4.3.2 - GNU Fortran 90 compiler (4.3.2)
  • mpixlc - IBM XL C compiler
  • mpixlc_r - Thread-safe version of mpixlc
  • mpixlcxx - IBM XL C++ compiler
  • mpixlcxx_r - Thread-safe version of mpixlcxx
  • mpixlf2003 - IBM XL Fortran 2003 compiler
  • mpixlf2003_r - Thread-safe version of mpixlf2003
  • mpixlf77 - IBM XL Fortran 77 compiler
  • mpixlf77_r - Thread-safe version of mpixlf77
  • mpixlf90 - IBM XL Fortran 90 compiler
  • mpixlf90_r - Thread-safe version of mpixlf90
  • mpixlf95 - IBM XL Fortran 95 compiler
  • mpixlf95_r - Thread-safe version of mpixlf95

Users are strongly encouraged to use the IBM XL compiler scripts whose names end in _r. These scripts call the IBM compilers underneath and ensure that thread-safe code is generated. Note that the 4.3.2 version of the GNU compilers supports OpenMP, whereas 4.1.2 does not.
For example, to compile and produce an optimised parallel Message Passing Interface (MPI) programme written in C, a reasonable starting point using the IBM XL scripts and compilers is as follows:

mpixlc_r -O3 -qarch=450d -qtune=450 myprog.c -o myprog
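
The same options apply to the Fortran scripts listed above; for example, a Fortran 90 MPI programme in a (hypothetical) source file myprog.f90 could be compiled with:

mpixlf90_r -O3 -qarch=450d -qtune=450 myprog.f90 -o myprog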

c. Compiling a hybrid MPI and OpenMP programme for Blue Gene/P

The main difference between compiling a pure MPI programme and a hybrid programme, containing both MPI and OpenMP statements, is the additional compiler option -qsmp. For example, to compile a hybrid C programme called myprog.c, the compile statement would look like:

mpixlc_r -O3 -qarch=450d -qtune=450 -qsmp=omp myprog.c -o myprog
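
For reference, a minimal sketch of the kind of hybrid MPI and OpenMP programme this command would compile is shown below; the file name myprog.c and the printed message are purely illustrative.

#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char *argv[])
{
    int rank, nprocs;

    /* Initialise MPI; one task runs per node in SMP mode, or per core in VN mode */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Each MPI task opens an OpenMP parallel region across its available cores */
    #pragma omp parallel
    {
        printf("MPI task %d of %d: OpenMP thread %d of %d\n",
               rank, nprocs, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}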

3. Running Programmes

a. Introduction

Each compute node on the Blue Gene/P has four CPU cores and four gigabytes of memory. A node can operate in one of three modes to support either a pure MPI programme or a hybrid MPI and OpenMP programme.
A brief description of each mode, the number of MPI tasks per node, and how the node's memory is shared is provided in the following table:

Mode              | MPI Tasks per Node | Threads per MPI Task | Memory per MPI Task
Virtual Node (VN) | 4                  | 1                    | 1 GB
Dual (DUAL)       | 2                  | up to 2 (OpenMP)     | 2 GB
SMP               | 1                  | up to 4 (OpenMP)     | 4 GB

LoadLeveler provides the facility for submitting and monitoring batch jobs on the Blue Gene/P cluster. There are two job classes (queues), called 'bgp' and 'bgp_dev', and jobs in either class may run in any of the three node modes. The 'bgp' class is for production runs, whereas 'bgp_dev' is for code-development runs.

The characteristics of these two classes are shown below:

Class Name (Queue) | Max Number of Compute Nodes | Max No of MPI Tasks (VN Mode) | Max No of MPI Tasks (Dual Mode) | Max No of MPI Tasks (SMP Mode) | Maximum Elapsed Time for a Job
bgp                | 2048                        | 8192                          | 4096                            | 2048                           | 24 hours
bgp_dev            | 64                          | 256                           | 128                             | 64                             | 0.5 hours

Note: The minimum number of compute nodes allocated on the Blue Gene/P per job submission is 64 (as opposed to 32 on the Blue Gene/L).
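
To use the development class instead, the class line in the job files shown in the examples below would simply change to the following (keeping bg_size and the wall clock limit within the bgp_dev limits above):

# @ class = bgp_dev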

b. Examples

Running a pure MPI job through LoadLeveler

Step 1: An example LoadLeveler job file to run a parallel MPI job, using Virtual Node (VN) mode, is shown below, followed by an explanation of each line. In this example the file is called mympi.ll.

# Example MPI LoadLeveler Job file
# @ shell = /bin/bash
#
# @ job_name = my_run
#
# @ job_type = bluegene
#
# @ wall_clock_limit     = 00:20:00
#
# Groups to select from: UC, UC_merit, NZ, NZ_merit
# @ group = NZ
# Your project number, either bfcs or nesi, followed by 5 digits
# @ account_no = nesi00000
#
# @ output               = $(job_name).$(schedd_host).$(jobid).out
# @ error                = $(job_name).$(schedd_host).$(jobid).err
# @ notification         = never
# @ class                = bgp
#
# @ bg_connection = prefer_torus
# @ bg_size = 64
#
# @ queue

/bgsys/drivers/ppcfloor/bin/mpirun -mode VN -np 256 -exe ./my_executable

Note that all the lines in this command file are required, unless stated below. The meaning of each line in this command file is as follows:

# Example.... - is a comment, provided an @ symbol does not follow the # symbol. Any line starting with # and followed by an @ symbol is interpreted by LoadLeveler.

# @ shell = /bin/bash - Specifies the Unix shell to be used for the job.

# @ job_name = my_run - This allows the user to give a name to the job. This is not mandatory, but is useful for identifying output files.

# @ job_type = bluegene - informs LoadLeveler that the job is to be run on the Blue Gene/P.

# @ wall_clock_limit = 00:20:00 - Specifies a wall clock limit of 20 minutes for the job. The wall clock limit has the format hh:mm:ss or mm:ss and cannot exceed 24 hrs.

# @ group = NZ - specifies the group that the user belongs to. The name of the group will be provided when the user registers to use the system. LoadLeveler recognises four groups only: NZ, NZ_merit, UC, or UC_merit. If you are unsure of your group, run the command "whatgroupami" on the login node.

# @ account_no = nesi00000 - is the project number. This is the number we issue to you when you register a project. It is either bfcs (UC LoadLeveler groups) or nesi (NZ LoadLeveler groups) followed by five digits. You can find all the active projects you are participating in by running the command "whatprojectami" on the login node.

# @ output = $(job_name).$(schedd_host).$(jobid).out
# @ error = $(job_name).$(schedd_host).$(jobid).err
The above lines specify the files to which stdout and stderr from the job will be redirected. There is no default, so the user must set something here. The use of $(schedd_host).$(jobid) is recommended as this matches the hostid/jobid reported by the LoadLeveler command llq.

# @ notification = never - Suppresses email notification of job completion.

# @ class = bgp - specifies that the job is to be submitted to the 'bgp' job class.

# @ bg_connection = prefer_torus - informs LoadLeveler that there is a preference for the partition of compute nodes to be wired as a torus. If this is not possible, LoadLeveler will provide a mesh partition instead.

# @ bg_size = 64 - specifies to LoadLeveler the size of the job in units of compute nodes.

Note: The smallest number of compute nodes that can be allocated is 64.

# @ queue - This line tells LoadLeveler that this is the last LoadLeveler command in the job file.

/bgsys/drivers/ppcfloor/bin/mpirun -mode VN -np 256 -exe ./my_executable

/bgsys/drivers/ppcfloor/bin/mpirun - is the name of the MPI job launcher

-mode VN - starts the compute nodes up in Virtual Node Mode

-np 256 - is the total number of instances of the MPI programme to start. This is satisfied by starting 64 compute nodes (defined by the LoadLeveler statement # @ bg_size = 64) and launching 4 MPI tasks on each node.

-exe ./my_executable - informs mpirun that the name of the MPI executable to launch is my_executable.
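
For comparison, assuming the same 64-node partition, the equivalent Dual mode run (2 MPI tasks per node, as in the table above) would look something like:

/bgsys/drivers/ppcfloor/bin/mpirun -mode DUAL -np 128 -exe ./my_executable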

Step 2: Submit the file just created to LoadLeveler using the following command:

llsubmit mympi.ll

Step 3: Monitor the progress of the job using the following command:

llq -b
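
If a job needs to be removed from the queue, the standard LoadLeveler command llcancel can be used with the job id reported by llq (the job id shown below is illustrative only):

llcancel foster.12345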

Running a hybrid MPI and OpenMP job through LoadLeveler

Step 1: An example LoadLeveler job file to run a hybrid MPI and OpenMP job, using SMP mode, is shown below. In this example the file is called hybrid.ll.

# Example MPI/OpenMP  LoadLeveler Job file
# @ shell = /bin/bash
#
# @ job_name = my_run
#
# @ job_type = bluegene
#
# @ wall_clock_limit     = 00:20:00
#
# @ group = UC
# @ account_no = bfcs00000
#
# @ output               = $(job_name).$(schedd_host).$(jobid).out
# @ error                = $(job_name).$(schedd_host).$(jobid).err
# @ notification         = never
# @ class                = bgp
#
# @ bg_connection = prefer_torus
# @ bg_size = 64
#
# @ queue

/bgsys/drivers/ppcfloor/bin/mpirun -mode SMP -np 64 -env OMP_NUM_THREADS=4 -exe ./my_executable

Note: The difference between this LoadLeveler script and the previous one is the mpirun statement. An explanation of this statement is provided below:

/bgsys/drivers/ppcfloor/bin/mpirun - is the name of the MPI job launcher

-mode SMP - starts the compute nodes up in Symmetrical Multiprocessing (SMP) mode

-np 64 - is the total number of instances of the MPI programme to start, i.e. one MPI instance will be started on each of the 64 compute nodes. This number has to match the number specified by the LoadLeveler statement # @ bg_size.

-env OMP_NUM_THREADS=4 - states that 4 OpenMP threads will be started on each compute node when the OpenMP portions of the hybrid programme are executed.

-exe ./my_executable - informs mpirun that the name of the MPI executable to launch is my_executable.

Step 2: Submit the file just created to LoadLeveler using the following command:

llsubmit hybrid.ll

Step 3: Monitor the progress of the job using the following command:

llq -b