This document is a quick introduction to HPC at BlueFern. It contains basic information for the impatient and assumes you have already successfully applied for access.
Lightweight work on the front-end node (the node that you log in to; either Foster or Kerr) is fine, but this node won't provide any more performance than an average desktop PC. For more demanding work, it is best to run interactive jobs on the development nodes, or to use a LoadLeveler batch job. The development nodes are p3n14-c for AIX and p4n14-c for Linux; you can ssh to them from the login nodes, but not from your desktop PC. There are special LoadLeveler classes for accessing these nodes, p7aix_dev and p7linux_dev respectively; more about LoadLeveler below.
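For example, once you are logged in to a front-end node, reaching a development node for interactive work is a single hop (hostnames as given above; the commands below are a sketch and cannot be run outside the BlueFern network):

```shell
# From the front-end node (Foster or Kerr), hop to a development node.
# p3n14-c is the AIX development node; use p4n14-c for Linux.
ssh p3n14-c

# Check how busy the node is before starting heavy interactive work.
uptime
```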
There are a number of factors to consider when choosing a system/cluster for running jobs. The most important are typically:
If the job can be run in parallel, an important criterion for deciding where to run it is whether it can make use of multiple nodes (as when using MPI) or has to run on a shared-memory machine (as when OpenMP is used).
Here are some general guidelines:
Some software is required to make use of our facilities:
You will need a terminal client program that supports the secure shell (SSH) protocol for network communication with remote servers. Users of Mac OS X and Linux (and other Unix variants) typically don't have to install anything and can use the built-in terminal programs.
If you are a Windows user, we recommend MobaXterm. It is an SSH client with a built-in X11 server and VNC support that does not need to be installed on a computer before you can use it. We have prepared a bundle that you can download directly from BlueFern's wiki here. After you download the bundle, unzip it and double-click MobaXterm_Personal_5.1.exe to launch the tool. The bundle includes all of BlueFern's HPC systems as saved sessions (bookmarks). If your user id on your local computer is different from your BlueFern user id, you will need to update the saved sessions by entering your BlueFern id. If you need more help with this tool, the vendor's documentation is available here.
If you are using a UC student lab system, be aware that your environment is quite locked down. You can only unzip the MobaXterm bundle into a temporary directory or your home directory P:\, but after that you can copy a shortcut from that directory to your desktop to run MobaXterm. Also, incoming connections to your X server are blocked by Active Directory policy, so you can only use SSH tunnelling from a MobaXterm SSH session to start X clients on your desktop.
You can also use PuTTY for SSH and either X-Win32 or Cygwin/X as the X11 server. You may need to ask your system administrator to install these for you. Cygwin/X is easy to install; installation is done over the internet using a Windows setup.exe program. See this page for more detailed instructions. There is an extensive list of SSH clients at http://en.wikipedia.org/wiki/Comparison_of_SSH_clients.
To use graphical programs on BlueFern systems and see the results on your local screen, you will need to run an X server program on your local computer. Linux users will find X Window support already installed with most distributions. Mac OS X ships with a program called X11, which is not installed by default but is on the system disks. Microsoft Windows users can use the X11 tunnelling and forwarding options with PuTTY; see PuttyX11forwarding. Another option is to install Xming. If installing Xming, you should also install the optional font package. If your graphics hardware does not work well with Xming, you could try Xming-mesa, from the same site.
You will also need software that supports secure transfer of files between your computer and the BlueFern systems. The command-line programs scp and sftp can be used from within terminal programs on Linux or Mac OS X computers. On Microsoft Windows, MobaXterm provides a GUI for downloading and uploading files via the currently connected scp or sftp session; the scp and sftp commands are also available in MobaXterm's local shell. If you are a PuTTY user you can use pscp and psftp, which work in a similar way to scp and sftp. There is also another free tool called WinSCP, which offers a graphical interface for file transfers.
Make sure you know your user name and password, which BlueFern will send to you. You can access several different configurations; normally students are assigned to hpclogin1 or bgfen1, and researchers to the other systems:
|System|Host Name (Alias)|Host Name (computer name)|IP Address|
|BlueFern p755 running AIX| | | |
|BlueFern p755 running Linux| | | |
|Visualisation Cluster (Linux)|popper.canterbury.ac.nz|viz0.canterbury.ac.nz|18.104.22.168|
|AIX Student system| | | |
|BlueGene/L (Linux) student system| | | |
From PuTTY: fill in the PuTTY Configuration with the Host Name from the table above. You can add your username to the Auto-login username field under Connection->Data, then save the session identified by this Host Name.
From a Unix or Linux command line (such as on the BlueFern systems): use a command like ssh foster.canterbury.ac.nz to log in to the Blue Gene/P front-end node, for example. You can add your username to the command, as in ssh ajd41@foster.canterbury.ac.nz, to log in as ajd41 instead of your local user name.
The first time you log in, several things will happen. ssh will warn you that it doesn't recognize the machine you are logging in to and will ask you to confirm that you want to connect. Check the IP address against the value in the table above and, if everything matches, type yes. You will then be asked for your password. If this is your very first login on one of our systems, you will also be instructed to change your password. Please do so by following the instructions given to you at that time.
From PuTTY: edit the session properties, go to Connection->SSH->X11, tick Enable X11 forwarding, and make sure MIT-Magic-Cookie-1 is selected as well. From a Unix or Linux command line (such as on the BlueFern systems), use a command like ssh -X p3n14-c to forward your X11 display to the AIX development node.
It is up to you (within reason). There are a number of cases that we will review in turn.
All the BlueFern systems (Power 755, BlueGene/P, Visualisation cluster) are set up with one server that users can log in to directly, the "login node" (e.g. kerr, p2n14, foster, viz0). The main computational clusters are accessed indirectly by submitting batch job scripts via a scheduling system called "LoadLeveler", except for the visualisation cluster, which uses a combination of SLURM and VizStack to allocate resources interactively.
Whether it is your own or someone else's, you will need to compile it from source for the system you are targeting. The process is slightly different for each of the three systems you can log in to. You will also have a choice of compiling your code for serial or parallel execution. In the latter case you may need to rewrite some of your code to take full advantage of parallelism. The details are beyond the scope of this document, but there is documentation on other pages of this wiki that you can look at:
and some more on the internet about parallelizing your code with MPI:
You will need to transfer files in and out of the system. These files can be source code for your own programs, input data for your applications and, most importantly, results.
All the BlueFern systems run some version of the UNIX operating system and it is useful to have some knowledge of the shell and other command-line programs that you can use to manipulate files. If you are new to UNIX systems, we recommend that you work through one of the many online tutorials such as this "Tutorial for Beginners" http://www.ee.surrey.ac.uk/Teaching/Unix/index.html.
To transfer data to and from our systems you need to use either scp (command line) or an sftp (secure file transfer protocol) client. Numerous Linux file managers support sftp. For Mac OS X and Windows users we recommend FileZilla. We have instructions on how to use FileZilla here.
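From a terminal on Linux or Mac OS X, the command-line equivalents look like this (the hostname, user code and file names are examples only; substitute your own, and note these commands need a connection to the BlueFern network):

```shell
# Upload a local input file to your scratch directory on BlueFern
# (replace 'usercode' with your own user code).
scp input.dat usercode@foster.canterbury.ac.nz:/hpc/scratch/usercode/

# Download a results file back to the current directory on your PC.
scp usercode@foster.canterbury.ac.nz:/hpc/scratch/usercode/results.out .

# Or start an interactive sftp session instead.
sftp usercode@foster.canterbury.ac.nz
```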
Note that binary executable files from Microsoft Windows PCs will not run on any of our BlueFern systems. In order to work with such programs, you must obtain the source code and recompile for use on Linux or AIX. Unfortunately, not all programs will have such source code available.
Your files go into a "Unix"-type file system (for Windows users, that means there are no drive letters). Your files will usually go into your own home directory, called /hpc/home/usercode. There are a few other places where you can put files, each with some conditions; here is a summary:
Your files have to be available to all the nodes for processing. To achieve this we use a special "parallel" file system developed by IBM: GPFS (see also here and the Wikipedia entry). GPFS stands for General Parallel File System. The following page gives an overview of Unix file system structure along with a review of basic navigation.
The way you organize your files is up to you, but it might be helpful to create separate subdirectories for each job that you submit and to have a separate directory for program source code.
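For instance, the per-job layout suggested above can be set up in a few commands (the directory and program names are only an illustration):

```shell
# One directory per job, plus a shared directory for source code.
mkdir -p src job001 job002

# Keep each job's script, input and output together.
touch src/myprog            # stands in for your compiled program
cp src/myprog job001/
```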
Once your software and all of your data are on the system, you will want to get results. Our facilities are primarily set up to run jobs in the background in an efficient way. This means you will have to decide how many processors your program may use (just one if you kept it "serial") and how long it should run for. Once you have done this, you can write a small script and submit your job to the queues via LoadLeveler (see Getting Started on the Power755 Cluster and Getting Started on the BlueGene P for submitting jobs on the Power 755 and BlueGene/P respectively). Typically you would transfer your files to your /hpc/scratch/usercode directory and submit jobs from there.
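As a sketch of what such a script can look like, here is a minimal serial LoadLeveler job command file; the class name, time limit and program name are assumptions for illustration, so check the pages linked above for the classes and limits that apply on each system:

```shell
#!/bin/sh
# Minimal LoadLeveler job command file (save as e.g. myjob.ll).
# @ job_name         = myjob
# @ job_type         = serial
# @ class            = p7linux_dev
# @ output           = $(job_name).$(jobid).out
# @ error            = $(job_name).$(jobid).err
# @ wall_clock_limit = 00:10:00
# @ queue

# Everything below the 'queue' directive runs as the job itself.
cd /hpc/scratch/usercode
./myprog
```

You submit it with `llsubmit myjob.ll` and check its progress with `llq`.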
Alternatively, for experiments and small jobs you can use the parallel environment on the BlueFern cluster. There are two nodes you can use for this (one for AIX, the other for Linux). Once you have your program (or a script calling your program), you have to create a host.list file. The file should contain as many lines as the number of processors you want to use, and each line should name a valid node on the cluster; a node can be repeated as many times as necessary. Here is a list of valid nodes you can use in your host.list:
Development-Compute LPAR Name
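For example, a host.list asking for four processors on the AIX development node mentioned earlier (p3n14-c; substitute p4n14-c for Linux) would contain four lines:

```
p3n14-c
p3n14-c
p3n14-c
p3n14-c
```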
Once you have your host.list ready, in the same directory as your program, you can execute your program with the following:
poe [executable] [options]
"Executable" is the name of the program/script you want to run and "options" can take many values. The most useful option for this simple guide is "
-procs n" where "
n" is the number of processors you want to use. You can find more details on using the parallel environment here. Note that poe relies on the file .rhosts in your home directory containing the name of the node where you submit the poe command. We set it up like this:
... and the file must be readable only by you; use chmod 600 ~/.rhosts to change the permissions if necessary.
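Putting it together on the AIX development node, the session might look like this (the program name is an example, and poe itself is only available on the cluster):

```shell
# Allow poe to authenticate back to the node you submit from.
echo p3n14-c >> ~/.rhosts
chmod 600 ~/.rhosts

# Run the program on the 4 processors listed in ./host.list.
poe ./myprog -procs 4
```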
After completing simulations on the BlueFern systems, you will typically need to post-process some output data. Graphical display of the output can be useful for identifying bugs in programs or for helping to interpret and summarize results. BlueFern has dedicated hardware and software on the Visualisation cluster (viz0) for remote visualisation and client/server application set-ups (e.g. ParaView). The Visualisation cluster has access to the same file system as the other BlueFern systems where simulations are run, so there is no need for large file transfers for analysis. For further information please see Getting Started on the Visualisation Cluster.
We hope this quick tour is useful in getting you started. Once you have played around with this material a little, you may want to look at the more detailed instructions on the wiki.