Blog

The following packages have been updated to the latest versions (October 2013):

  • xlf 11.1
  • xlc 9.0
  • XL MASS and SMP libraries

Concurrent Computing Course

Prof. Andrzej Bargiela, an Erskine visitor to UC, is teaching a short course on concepts of concurrent programming that students are invited to attend.  This short course will provide some important theoretical background for anyone interested in parallel algorithms and programming:


Lectures on Concurrent Computing
Department of Computer Science & Software Engineering
Lecturer: Prof. Andrzej Bargiela, University of Nottingham, UK, Visiting Erskine Fellow (Feb.-May 2014)

The general objective of this short course on Concurrent Computing will be to link the mathematical specification of algorithms that may benefit from concurrent execution with the necessary consideration of the correctness of their implementation. The first four lectures will be given within COSC413, but additional participants are welcome.

Lecture 1: 1.00pm-2.00pm, Wednesday, 26 March 2014, room Erskine 315
Concurrent computing abstraction: introduction of a formalism that enables reasoning about the correctness of concurrent programs; atomic instructions, interleaving and proofs by mathematical induction.

Lecture 2: 2.00pm-3.00pm, Wednesday, 26 March 2014, room Erskine 315
Mutual exclusion problem: discussion of the interdependence of concurrent programs and of the various attempts to ensure their correct execution using the most basic computer functionality, that of memory interlock.

Lecture 3: 1.00pm-2.00pm, Wednesday, 2 April 2014, room Erskine 315
Fine-grained atomic operations (semaphore synchronization): refinement of the synchronization of concurrent processes by building complex instructions.

Lecture 4: 2.00pm-3.00pm, Wednesday, 2 April 2014, room Erskine 315
Coarse-grained atomic operations (monitor synchronization): refinement of the synchronization of concurrent processes by building abstract data types.

Optional tutorials/practicals
Students will be offered an opportunity to undertake a project using an emulator of the Ada language. This provides an environment for empirically validating the correctness of concurrent program implementations, as well as an easy introduction to Ada programming.

Additional lectures (beginning of May; date(s) and room TBA)
The objective will be to expand on the formal proof of correctness in the context of distributed computing environments.

Lecture 5: Ada rendezvous for distributed synchronization: synchronization of concurrent processes executing on distributed hardware; asymmetric communication-based synchronization with a privileged process.

Lecture 6: Distributed mutual exclusion: synchronization of concurrent processes executing on distributed hardware; symmetric communication-based synchronization with all processes being equal.

Lecturer profile
Andrzej Bargiela (www.bargiela.com) is Professor of Computer Science at the University of Nottingham, UK. His external appointments have included Visiting Professorships at the University of Alberta, Canada, Helsinki University of Technology, Finland, Tokyo Institute of Technology, Japan, University of Bari, Italy and Krakow Technical University, Poland. His research falls under the general heading of Computational Intelligence and involves the study of the representation of information and uncertainty, mathematical foundations of Granular Computing, information abstraction, human-centred information processing, fuzzy logic, parallel, distributed and neural computation, and the modeling and optimization of systems with structural and information uncertainty. He is Associate Editor of the IEEE Transactions on Systems, Man and Cybernetics (Systems) and Associate Editor of Information Sciences. He served as President of the European Council for Modelling and Simulation and serves/served as a reviewer for research funding bodies in the UK, Germany, Italy and Poland.

A maintenance day is scheduled for Wednesday, April 9th.  All BlueFern systems are likely to be affected as various machines are shut down and restarted. Although the systems may be functioning correctly for brief periods, you are advised not to run any HPC jobs unless you're prepared for them to be halted suddenly.

Update 9/4/14: Today's maintenance is now finished. All systems should be operating normally.

Two scripts, hsubmit and cfxhold, have been created to control loadleveler jobs on Kerr that use Ansys CFX or Fluent licenses. Here is the workflow to use them:

  1. Submit your CFX or Fluent script with hsubmit, for example "hsubmit myscript.ll". Hsubmit works by changing the permissions of the directory containing myscript.ll to read-only, which puts the submitted job into an immediate user hold. After five seconds (to allow llsubmit to complete), hsubmit makes your directory writable again.
  2. A job in the user hold state won't run until the user issues llhold -r, and that's where cfxhold comes in.  Cfxhold has two modes of operation:
    1. Run cfxhold with no arguments, or incorrect arguments, to get a listing of the currently available HPC licenses.
    2. Run cfxhold with the number of HPC licenses you want (currently no more than 20 can be requested) and one or more loadleveler job IDs. It will keep running and querying the Ansys license server until both the requisite number of licenses and sufficient free cores for your job are available, and will then issue an "llhold -r" command to release the job ID(s) you specified. For example, "cfxhold 4 p1n14-c.70947.0 p1n14-c.70949.0 &" will keep running in the background until four HPC licenses are available from the license server and four slots are free in the loadleveler class used for p1n14-c.70947.0 and p1n14-c.70949.0. You can use "llq -u $USER" to list the loadleveler jobs you have in the queue.
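The workflow above can be summarised in a short shell sketch. It is not the actual hsubmit or cfxhold code: free_hpc_licenses and free_slots are hypothetical helpers standing in for the real queries to the Ansys license server and LoadLeveler, and POLL_INTERVAL is a hypothetical setting.

```shell
#!/bin/sh
# Minimal sketch of the hsubmit/cfxhold workflow (NOT the BlueFern scripts).
# free_hpc_licenses and free_slots are hypothetical helpers; POLL_INTERVAL
# is a hypothetical polling interval in seconds.

# hsubmit idea: submit while the job's directory is read-only, so the job
# enters an immediate user hold, then make the directory writable again.
hsubmit_sketch() {
    script=$1
    dir=$(dirname "$script")
    chmod a-w "$dir"          # directory becomes read-only
    llsubmit "$script"        # job is submitted into user hold
    status=$?
    sleep 5                   # give llsubmit time to complete
    chmod u+w "$dir"          # restore the owner's write permission
    return "$status"
}

# cfxhold idea: wait until enough HPC licenses AND free slots exist,
# then release the held job(s) with "llhold -r".
cfxhold_sketch() {
    want=$1
    shift
    if [ "$want" -gt 20 ]; then
        echo "cfxhold_sketch: at most 20 HPC licenses can be requested" >&2
        return 1
    fi
    # The slot check uses the class of the first job ID given.
    until [ "$(free_hpc_licenses)" -ge "$want" ] &&
          [ "$(free_slots "$1")" -ge "$want" ]; do
        sleep "${POLL_INTERVAL:-60}"
    done
    llhold -r "$@"            # release every job ID passed in
}
```

Because the jobs are only released once both conditions hold, a job should not start while its licenses are missing, which is the reason for submitting it into user hold first.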

Let me know if you run into any issues using these scripts.  See also the post about Setting up CFX and Ansys licenses and the CFX script on our LoadLeveler page.

Login Security Beefed Up

We have recently increased security on our systems with a program that blocks access from hosts where repeated login failures originate. We get tens of thousands of such failures every month from malicious internet hosts. One side effect is that legitimate off-campus hosts may be blocked if a user on that host has more than a few login failures. If you think this is happening to you or a colleague, please send a support request to support.nesi.org.nz with the name or IP address of the host you're using, and we will re-enable access for that host.
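As a rough illustration of how failure-based blocking works, the sketch below counts "Failed password" lines per source host in sshd-style log output and prints the hosts that reach a threshold. It is not the program BlueFern runs; the log format and the default threshold of 5 are assumptions.

```shell
#!/bin/sh
# Minimal sketch of failure-based host blocking (NOT BlueFern's program).
# Reads sshd-style log lines on stdin and prints each host whose number of
# "Failed password" lines reaches the threshold (default 5, an assumption).
hosts_to_block() {
    threshold=${1:-5}
    awk -v t="$threshold" '
        /Failed password/ {
            # the host is the word following "from"
            for (i = 1; i < NF; i++)
                if ($i == "from") { count[$(i + 1)]++; break }
        }
        END { for (h in count) if (count[h] >= t) print h }
    '
}
```

For example, "hosts_to_block 5 < /var/log/secure" would list candidate hosts on a system that logs sshd messages to that file; the log path varies between systems.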

Part 1 of this webinar series will provide an overview of turbulence modelling techniques with a detailed focus on RANS models and accurately predicting the key flow features of wall-bounded flows.

Date: Wednesday Oct 30th, 2013

Time: 1:00pm - 2:00pm AEDT (3:00pm - 4:00pm NZDT).

See http://www.leapaust.com.au/solving-turbulent-flows-using-ansys-cfd-webinar-part-1 for further details.

A maintenance day is scheduled for Thursday, October 17.  Machines in the linux queues and the visualization cluster will be unavailable while new software is installed on them.

LEAP, the distributors of Ansys and many other CFD programmes, are running a CAD and CAE Competition open to any final-year undergraduates attending Australian or NZ universities in 2013. Entries don't have to be in until October, but you must register very soon. See http://www.leapaust.com.au/uniposter2013.html for more details.

We will be running some system maintenance procedures on July 17th that will take the BlueFern systems offline.  Users will be logged off and the loadleveler queues will be shut down.  More information will be added to this post later on.

System Maintenance Finished

The system maintenance scheduled for April 17th has now finished.

We will be running some system maintenance procedures on April 17th that will take the BlueFern systems offline.  Users will be logged off and the loadleveler queues will be shut down.

Loadleveler is currently draining the job queues to take the systems offline. If you have a loadleveler job that doesn't need the maximum wallclock limit of three days, you can submit it with the "# @ wall_clock_limit" directive set so that the job completes before 8am on Wednesday; your job will then still run, provided enough CPU resources are available for it.
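Working out the limit is simple arithmetic: the time between submission and the 8am deadline, written as hh:mm:ss. A minimal sketch, assuming both times are given in seconds since the epoch and subtracting a hypothetical 10-minute safety margin:

```shell
#!/bin/sh
# Sketch: compute a wall_clock_limit (hh:mm:ss) that ends a job before a deadline.
# Both arguments are seconds since the epoch; the 600 s margin is an assumption.
wall_clock_limit_until() {
    now=$1
    deadline=$2
    margin=600                       # finish 10 minutes early (hypothetical)
    secs=$((deadline - now - margin))
    if [ "$secs" -le 0 ]; then
        echo "wall_clock_limit_until: deadline too close" >&2
        return 1
    fi
    printf '%02d:%02d:%02d\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}
```

On the Linux nodes, GNU date can supply the epoch values, e.g. wall_clock_limit_until "$(date +%s)" "$(date -d '2013-04-17 08:00' +%s)". The printed value then goes into the directive, for example "# @ wall_clock_limit = 20:00:00".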

Loadleveler Project Accounting

BlueFern is shortly to begin accounting for all projects.  Up until now most users have been assigned the default account of bfcs00000, but on April 8th we will remove access to this account (this is now done) and, consequently, you should begin using the "#@ account_no" clause in your loadleveler scripts.  For example:

#@ account_no = bfcs00247

You can find out what account_no code(s) to use in your loadleveler script by running the command

whatprojectami

which lists the accounts you have access to on the BlueGene/P and Power7 machines. If whatprojectami doesn't list any accounts for you, please send an email to bluefern@canterbury.ac.nz and we'll start the process of assigning an account ID to you.
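For example, a job script header might place the clause alongside the other directives as below. This is a sketch: the job name, class, limit, output file names and my_program are placeholders, and bfcs00247 should be replaced by a code that whatprojectami lists for you.

```shell
#!/bin/sh
# Sketch of a LoadLeveler job script header using the account_no clause.
# Job name, class, limit and file names below are placeholders.
# @ job_name = my_job
# @ account_no = bfcs00247
# @ class = p7linux
# @ wall_clock_limit = 01:00:00
# @ output = $(job_name).$(jobid).out
# @ error = $(job_name).$(jobid).err
# @ queue

./my_program
```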

Dear BlueFern users,

After our successful maintenance day yesterday (January 17th, 2013), we have shifted the majority of nodes in the Power7 cluster from AIX to Linux.  This has made the cluster more friendly to open source software and you will now be able to run open source applications (often available only on the Linux side of the cluster) on a bigger scale.

As part of this shift, we have also moved the InfiniBand switch (used for high-performance communication between tasks running on multiple nodes) from AIX to Linux. (For compatibility reasons, the InfiniBand switch unfortunately cannot be used from AIX and Linux simultaneously.)

This affects how parallel (MPI) jobs should be submitted. Linux jobs should now use InfiniBand to get the best performance, while AIX jobs (which previously used InfiniBand) can no longer use it: if your AIX job requests InfiniBand, LoadLeveler will refuse to run the job with the error message "cannot meet the adapter requirement".

Therefore, the correct settings for your MPI jobs (as per the Getting Started on the Power755 Cluster page) are:

  • For Linux:

    # @ class = p7linux
    # @ rset = rset_mcm_affinity
    # @ task_affinity = core(1)
    # @ network.MPI_LAPI = sn_single,shared,US,,instances=2
    # @ queue
    
    # suggested environment settings:
    export MP_EAGER_LIMIT=65536
    export MP_SHARED_MEMORY=yes
    export MEMORY_AFFINITY=MCM
  • For AIX:

    # @ class = p7aix
    # @ rset = rset_mcm_affinity
    # @ task_affinity = core(1)
    # @ queue
    
    # suggested environment settings:
    export MP_SHARED_MEMORY=yes
    export MEMORY_AFFINITY=MCM
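Put together, a Linux MPI job script could look roughly like the sketch below. The job_type, node and tasks_per_node keywords, the node and task counts, the wall clock limit and the poe launch of the hypothetical my_mpi_program are assumptions added for illustration; the remaining lines are the Linux settings above.

```shell
#!/bin/sh
# Sketch of a complete Linux MPI job built around the settings above.
# job_type, node, tasks_per_node, wall_clock_limit and the poe line are
# assumptions; adjust the counts to your job.
# @ job_type = parallel
# @ class = p7linux
# @ node = 2
# @ tasks_per_node = 32
# @ rset = rset_mcm_affinity
# @ task_affinity = core(1)
# @ network.MPI_LAPI = sn_single,shared,US,,instances=2
# @ wall_clock_limit = 02:00:00
# @ queue

# suggested environment settings:
export MP_EAGER_LIMIT=65536
export MP_SHARED_MEMORY=yes
export MEMORY_AFFINITY=MCM

poe ./my_mpi_program
```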

We hope you will benefit from this change. If you run into any issues, please do not hesitate to contact the BlueFern team at bluefern@nesi.org.nz.

HPC maintenance finished

The BlueGene/P and Power7 systems are now available again.

Merry Christmas

BlueFern is closed over the Christmas/New Year break from Dec 21st, with staff available again on January 3rd.  We wish all our users a Merry Christmas and a restful New Year.