
Current Questions

This page is for asking questions about UCSC.

I Can't Log in!

  1. In all cases, please send in a support request to support.nesi.org.nz with your BlueFern username.  If you've never logged in to our systems before then it's possible there is a problem with the password that we have emailed you.  If you have logged in before, there may be issues such as being over disc quota, etc.
  2. We run a program that blocks access for hosts from which repeated login failures originate.  We get tens of thousands of such failures every month on our systems from malicious internet hosts.  One side effect of this is that legitimate off-campus hosts may be blocked, if a user on that host has more than a few login failures.  The indicator of this effect is that ssh breaks the connection, without you ever getting a chance to enter your password.  If you think this is occurring for you or a colleague, please send in a support request to support.nesi.org.nz with the name or IP address of the host you're using and we will re-enable access for this host.

How do I use the BlueGene/P?

See Getting started on the BlueGene P

How do I use the power755 cluster?

See Getting Started on the Power755 Cluster

Why won't my job run?

See the RunningJobs and LoadLeveler pages for help on this.

How can I check what is running under my account and kill any runaway processes?

Use llkill

llkill is a script that searches the nodes of our HPC and visualization clusters.  It lists the processes that you own and lets you kill all of them or only selected ones.  See the wiki page for more information or run "llkill --help".

Manually:

To check which processes are running under your account, use the ps command. To list the processes you own on the node you are logged in to, run ps -U $USER, which displays something like:

    UID    PID    TTY  TIME CMD
   5501 176162 pts/16  0:00 bash
   5501 340014      -  0:00 sshd
   5501 545060      -  0:00 mysim.exe
   ...

Suppose you decide to stop the "mysim.exe" process listed above. It has the process ID (PID) 545060, so you can run kill 545060, which sends a TERM (terminate) signal to that process. Alternatively, kill -INT 545060 sends an INT (interrupt) signal, which has the same effect as typing control-C while running the process interactively. You can also use kill -INT -1 to interrupt every process you are permitted to signal, that is, all of your own processes on that machine, which is sometimes a handy "scattergun" approach.
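The escalation described above (try INT first, fall back to TERM) can be sketched as a small shell function. This is only an illustration, assuming a POSIX shell; the name stop_gently and the three-second grace period are made up for this example and are not part of llkill or any system tool.

```shell
#!/bin/sh
# stop_gently is a hypothetical helper for illustration only.
# Usage: stop_gently PID
stop_gently() {
    pid=$1
    for sig in INT TERM; do
        # kill -0 delivers no signal; it only tests whether we can signal the PID.
        kill -0 "$pid" 2>/dev/null || return 0
        kill -s "$sig" "$pid" 2>/dev/null
        # Give the process up to three seconds to exit before escalating.
        for i in 1 2 3; do
            sleep 1
            kill -0 "$pid" 2>/dev/null || return 0
        done
    done
    # Still signalable after INT and TERM: report it and let the user decide.
    echo "PID $pid did not exit" >&2
    return 1
}
```

For the example above you would run stop_gently 545060. The function deliberately stops short of kill -KILL, since a process killed that way gets no chance to clean up.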

The situation gets messier if you have to check all of our compute nodes (use llstatus to get a list of nodes). You can script the commands like this:

for M in p1n01-c p1n02-c p1n03-c p1n04-c p1n05-c p1n06-c p1n07-c p1n08-c p1n09-c p1n10-c p1n11-c p1n12-c p1n13-c
do
    echo "Tasks on $M"
    ssh "$M" ps axuww | fgrep "$USER"
done

To kill a process on a remote machine, use ssh to run kill there, e.g. ssh p1n07-c kill -INT 545060 interrupts PID 545060 on p1n07-c.
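The fgrep filter in the loop above matches your username anywhere on the line, so it can also show other users' processes whose command line happens to contain it. A possible refinement is to compare only the first column of the ps output. This is a minimal sketch: own_procs is a hypothetical helper name, and it assumes the first column of ps axuww holds the full username (some ps implementations truncate long names or print a numeric UID instead).

```shell
#!/bin/sh
# own_procs is a hypothetical helper: it reads "ps aux"-style output on stdin
# and keeps the header line plus the rows whose first column equals the given user.
own_procs() {
    awk -v user="$1" 'NR == 1 || $1 == user'
}

# Usage against a compute node (node name taken from the loop above):
#   ssh p1n07-c ps axuww | own_procs "$USER"
```

Because the filter runs locally on the output of ssh, nothing needs to be installed on the compute nodes.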

Can I check resources on all available machines and possibly specify one manually?

You can use llstatus to check the load on machines (it's the LdAvg column) and add the clause

# @ requirements  = (Machine=="p1n10-c")

to specify that the job must run on a particular node, in this example p1n10-c. Such a requirement is not advisable, however, because you may face an extra-long wait before that particular machine becomes available.
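To show where the requirements clause sits, here is a sketch of a complete job command file. It is an assumption-laden example rather than a site template: the job name, wall-clock limit, output file names, and the UC group are placeholders, and your account may also need a class keyword that is not shown here.

```shell
#!/bin/sh
# Sketch of a LoadLeveler job command file; all values are placeholders.
# @ job_name         = pinned_test
# @ job_type         = serial
# @ group            = UC
# @ wall_clock_limit = 00:10:00
# @ output           = $(job_name).$(jobid).out
# @ error            = $(job_name).$(jobid).err
# @ requirements     = (Machine == "p1n10-c")
# @ queue

# Everything after "# @ queue" is ordinary shell that runs on the chosen node.
node=$(uname -n)
echo "Job running on $node"
```

You would submit the file with llsubmit and can then confirm from the output file that the job ran on the requested node.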

My POE jobs fail with permission errors

See the POE page

LoadLeveler is asking for my "group": which group am I in?

To submit a job via the LoadLeveler scheduling system, you need to specify the group you belong to. LoadLeveler recognizes only four groups: NZ, NZ_merit, UC, and UC_merit. "Merit" means that funding has been obtained for a supercomputing resource; jobs associated with a "merit" group are scheduled at a higher priority than jobs in other groups. If you are unsure which group or groups you belong to, run the following command on one of the login nodes.

To specify your group in a LoadLeveler script, just add:

For example:

 


CategoryHpc
CategoryQuestions