BlueGene/L

BlueFern also has a massively parallel computer available for use: a BlueGene/L system from IBM.

Basic Blocks

Our BlueGene/L has a total of 4096 CPUs. The processors are a custom version of the PPC440 running at 700MHz. This may seem underpowered compared to more modern CPUs, but it is designed to get good throughput with memory running at a similar speed, and it offers a good ratio of floating-point performance to power consumption (MFlops per watt). The BlueGene/L is built from the following elements, starting from the smallest:

  • Chip - a dual-core PPC440 as described above. This is also referred to as a compute node (2 cores).
  • Compute card - a pair of compute nodes assembled together along with 1GB of memory (4 cores).
  • Compute node card (or node card) - two rows of eight compute cards assembled on one of these (64 cores).
  • Midplane - 16 node cards are stacked in one of these (1024 cores in one midplane).
  • Rack - two midplanes form a rack (2048 cores).

Our system consists of 2 racks for a total of 4096 cores, as the short sketch below illustrates. A node card can also hold I/O cards in addition to compute cards; these have 4 cores each, which we do not count as they are not used for actual computations.
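As a quick sanity check on those numbers, the short C program below simply multiplies out the hierarchy described above. It is a minimal illustrative sketch, not code that runs on the machine itself:

    #include <stdio.h>

    int main(void)
    {
        int cores_per_chip          = 2;   /* dual-core PPC440 compute node   */
        int chips_per_compute_card  = 2;   /* compute card = 2 compute nodes  */
        int cards_per_node_card     = 16;  /* two rows of eight compute cards */
        int node_cards_per_midplane = 16;
        int midplanes_per_rack      = 2;
        int racks                   = 2;   /* our installation                */

        int cores = cores_per_chip * chips_per_compute_card * cards_per_node_card
                  * node_cards_per_midplane * midplanes_per_rack * racks;

        printf("Total compute cores: %d\n", cores);  /* prints 4096 */
        return 0;
    }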

Communication

Communication between nodes is vital in this kind of setup. In the BlueGene/L, three kinds of networks are used to make communication efficient:

  • 3D torus - the classic 2D torus is the doughnut. To picture it, start with a rectangular sheet of paper: whenever you walk off one edge you reappear on the opposite side. Connecting two opposite sides together gives something that looks like a tube; connecting the opposite ends of the tube gives the 2D torus (the doughnut). To create a 3D torus you start with a cube instead of a sheet of paper. The basic idea is that each node is connected to its neighbors in all 3 dimensions (6 neighbors in total, 2 in each direction), and nodes at an edge are connected to nodes on the opposite side (see the sketch after this list).
  • Collective network
  • The Barrier Network (Global interrupt)
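The wrap-around structure of the torus is easy to express with modular arithmetic. The following C sketch is purely illustrative (it has nothing to do with the machine's actual routing hardware); it lists the six neighbors of a node in a 3D torus whose dimensions are given by the example constants DIM_X, DIM_Y and DIM_Z:

    #include <stdio.h>

    /* Example torus dimensions; the values are arbitrary. */
    #define DIM_X 8
    #define DIM_Y 8
    #define DIM_Z 8

    /* Wrap a coordinate so stepping off one edge lands on the opposite side. */
    static int wrap(int coord, int dim)
    {
        return ((coord % dim) + dim) % dim;
    }

    int main(void)
    {
        int x = 0, y = 3, z = 7;  /* an arbitrary node sitting on an edge */

        printf("Neighbors of (%d,%d,%d):\n", x, y, z);
        printf("  x: (%d,%d,%d) and (%d,%d,%d)\n",
               wrap(x - 1, DIM_X), y, z, wrap(x + 1, DIM_X), y, z);
        printf("  y: (%d,%d,%d) and (%d,%d,%d)\n",
               x, wrap(y - 1, DIM_Y), z, x, wrap(y + 1, DIM_Y), z);
        printf("  z: (%d,%d,%d) and (%d,%d,%d)\n",
               x, y, wrap(z - 1, DIM_Z), x, y, wrap(z + 1, DIM_Z));
        return 0;
    }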

The details of these networks are beyond the scope of this document. The last two networks are designed to improve the performance of MPICH, the MPI message-passing library which is the cornerstone of parallel programming on this machine. The system is connected to the same GPFS file system available from the other BlueFern systems (P5-575).
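To give a flavour of what message passing looks like from the programmer's side, here is a minimal MPI "hello world" in C. It is a generic MPI example rather than anything BlueGene/L-specific; each core runs its own copy of the program and reports its rank:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size;

        MPI_Init(&argc, &argv);                /* start the MPI runtime           */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id (0..size-1)   */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of MPI processes   */

        printf("Hello from rank %d of %d\n", rank, size);

        MPI_Finalize();                        /* shut the MPI runtime down       */
        return 0;
    }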

Software configuration

Loading a full-blown operating system such as Linux or AIX is counterproductive on a massively parallel computer of this size, so a bare-bones operating system has been created to run the nodes. The BlueGene/L itself is designed only to do computations. For servicing, compiling and submitting jobs, specific "nodes" are used which are in fact not part of the BlueGene/L itself. In our case these nodes are provided by our P5-575 cluster.

Because the system cannot be accessed directly, programs have to be cross-compiled on another system. Jobs also need to be queued with LoadLeveler from that system.

More details on all the topics discussed here can be found in the following IBM book. You can also find more resources on our wiki.
