Howto

Be sure to check the how not to page before running jobs to learn the rules of the road.

Get an account

In 2013, the maintenance and operation of the BASS transitioned to the UNC Computer-Science Facilities Support team. This makes it available for use by all CS account holders, but a separate account is required to get a home directory on the machine. To get an account on BASS, send an email to bassaccounts@cs.unc.edu asking for one.

Users without an existing CS login must specify the information below. Incomplete requests will be delayed until all information is provided.

  • Desired Username
  • Email address where we can contact you
  • UNC PID (may be empty for off-campus users)
  • UNC account number from which the monthly $27 support fee will be charged (this is 5% of the standard CS monthly recharge-center fee because the account will not be receiving full-time support on all CS machines). Those wishing to pay through another mechanism should describe this in the email.

Get on the mailing list

You'll probably want to be on the bass@cs.unc.edu mailing list so that you can get updates about machine status and such. You can get onto the list using the interface at https://fafnir.cs.unc.edu/mailman/listinfo/bass.

Set up your environment

Setting environment for MPI

See more at Using MPI

Some versions of the Message-Passing Interface (MPI) we are using rely on secure-shell (SSH) connections to launch parallel jobs. This means that you need to set up an SSH key pair so that you can log on from one grid node to another without a password. This is done using the ssh-keygen program; run man ssh-keygen for details on how it works. The basic approach is as follows:

cd
mkdir -p .ssh                     (Don't worry if this fails)
chmod 0700 .ssh
ssh-keygen -t dsa              (Press return to accept all defaults)
cd .ssh
touch authorized_keys
cat id_dsa.pub >> authorized_keys

Also, you need to be using the bash shell to run large MPI jobs; tcsh and other shells have environment-size limits that are too small for this to work.
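
A quick, optional sanity check (not required) is to try a key-based connection back to the node you are on:

ssh localhost hostname            (Should not ask for a password; answer yes if asked to accept the host key)
echo $SHELL                       (Should print /bin/bash if you plan to run large MPI jobs)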

Selecting an MPI environment

There is a locally-built utility called pathmunge that can be used to set up your environment in several ways, one of which is to select an MPI version. To set the MPI version to openmpi-1.4.1, put the following in your .bash_profile file right after your PATH is set:

. /usr/local/bin/pathmunge.sh
pathmunge usempi openmpi-1.4.1

To see which versions are currently available, type

. /usr/local/bin/pathmunge.sh
pathmunge usempi list

Selecting a compiler

More details at Compiling

The default compiler is the version of GCC that the operating system shipped with (GNU 4.1 as of 2009/01/31). Other compilers are also available. To use the GCC 4.4 compilers, run gcc44 rather than gcc, and likewise g++44 and gfortran44 in place of g++ and gfortran.
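
For example, to build a C++ or Fortran program with the GCC 4.4 compilers (the source-file names here are only placeholders):

g++44 -O2 -o myprogram myprogram.cpp
gfortran44 -O2 -o mysolver mysolver.f90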

To use the Sun Studio compilers using our locally-built pathmunge utility, place the following in your .bash_profile after your PATH is set:

. /usr/local/bin/pathmunge.sh
pathmunge prepend /opt/sunstudioceres/bin

Access AFS files

You can access files in AFS on the compile node, but not on other nodes (so don't rely on this for submitted jobs). This is because the AFS client has to be tightly coupled with the kernel and tends to make things unstable. Copy any files that need to be accessed from the compute nodes to your home directory. You can make a link to your AFS home directory from your bass home directory using a command like:

ln -s /afs/cs.unc.edu/home/`whoami` ~/unc_afs
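
You can then copy data through that link on the compile node before submitting jobs; for example (the project path below is a hypothetical placeholder):

cp -r ~/unc_afs/myproject/input_data ~/input_data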

There are two kinds of nodes:

  • The compile node (bass.cs.unc.edu) has access to your AFS directories.
  • The compute nodes (where parallel jobs run) do not have access to your AFS directories. The compute nodes run on the NFS file system local to the BASS, so all files needed to run your jobs must be copied to your BASS home directory space or other scratch space before execution.

Disk space configuration, access and usage

Each bass system has access to the following NFSv3-mounted file systems:

/stage       16.0 Terabyte space located on file server bass-thor.cs.unc.edu
/home        2.0 Terabyte space located on file server bass-files.cs.unc.edu
/home1       2.0 Terabyte space located on file server bass-files.cs.unc.edu

NOTE: only the /home space is backed up to tape!

The bass-files server uses a Sun StorageTek 6140 disk array on a SAN. The /stage data space is located on a Sun Model X4540 storage server. Each bass node NFSv3-mounts these file systems from the bass-files and bass-thor file servers.

It is recommended that you use your home directory for compiling and general work; this space IS backed up! None of the other space is backed up! You can use your home directory space for various files and executables. If you need to read/write large amounts of input and output data, use the /stage space for that purpose.

NOTE: /stage should NOT be used for long-term storage. For performance reasons, /stage is RAID 0 (striping) across 46 disk drives, so if one drive goes bad you can lose your data. You should move your data in/out of this space in a timely manner.
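
A typical pattern (the directory and file names below are placeholders; check with your group for any conventions on /stage) is to stage data in before a run and copy results back to backed-up space afterwards:

mkdir -p /stage/`whoami`
cp ~/big_input.dat /stage/`whoami`/          # stage input data before the run
# ... run your jobs, reading and writing under /stage/<username> ...
cp /stage/`whoami`/results.dat ~/            # copy results back to your (backed-up) home directory
rm -r /stage/`whoami`                        # clean up when done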

Getting data files to and from the bass system

You can use an SFTP (secure FTP) client to connect to host bass.cs.unc.edu from anywhere to upload or download data to your home directory, /stage, or /nanoscratch space. You may want to try a GUI utility like FileZilla, which has clients available for Linux, Windows, and Mac OS X.
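
For example, from a Linux or Mac OS X command line (the file name is a placeholder):

sftp user@bass.cs.unc.edu
scp mydata.tar.gz user@bass.cs.unc.edu:      (copies the file into your bass home directory)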

You can access files in /afs/cellname space on host bass.cs.unc.edu. Note that bass.cs.unc.edu is the only bass node that runs AFS. For example, you can copy files to/from your /afs/cs.unc.edu/home/user account on bass.cs.unc.edu.

You can connect from a Windows client to the Samba server running on host bass-files.cs.unc.edu or host bass-thor.cs.unc.edu:

\\bass-files.cs.unc.edu\home
\\bass-files.cs.unc.edu\nanoscratch
\\bass-thor.cs.unc.edu\stage

You can use the Linux smbclient utility (an FTP-like client) to access these disk shares directly from a Linux machine.
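
For example, to browse the home share from a Linux machine inside the .cs.unc.edu domain (substitute your own user name; you will be prompted for your password):

smbclient //bass-files.cs.unc.edu/home -U user@cs.unc.edu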

Note that the Windows SMB disk-share protocol is firewalled from outside the .cs.unc.edu domain; that is, you must have an IP address in the .cs.unc.edu domain to access Samba or any Windows share. For example, to connect to bass-files from a Windows machine, click Start->Run and enter "\\bass-files.cs.unc.edu". If you are logged into a Computer Science machine, you will not be prompted for a user/password. If you are not, you will need to enter your user name and password; enter your user name as "user@cs.unc.edu".

Users in the NSRG group can access nanodata at /nanodata on all nodes. If you are in this group, but don't have access to /nanodata, send an email to the mailing list.

Submitting jobs

For an X Windows-based Grid Engine GUI, run the qmon command after setting your DISPLAY variable.

The best way to run jobs with the Sun Grid Engine is to make a special script file that describes the parameters of the run and then submit that script using the qsub command to the grid engine. The contents of that script depend on how the job should communicate: examples are provided here for several common cases.

Running a set of independent non-parallel jobs on the GRID

If you want to run 10 copies of the same program with different inputs, the script could look like the following (from /home/examples/scripts/independent_jobs.bash):

# Special comment lines to the grid engine start
# with #$ and they should contain the command-line arguments for the
# qsub command.  See 'man qsub' for more options.
#
#$ -S /bin/bash
#$ -t 1-10
#$ -tc 4
#$ -o $HOME/tmp/$JOB_NAME.$JOB_ID.$TASK_ID
#$ -j y
#$ -cwd
# The above arguments mean:
#       -S /bin/bash : Run this set of jobs using /bin/bash
#       -t 1-10 : Run 10 separate instances with the SGE_TASK_ID set from 1 through 10
#       -tc 4 : Run only at most 4 jobs at a time.  ** Use -tc 100 if you submit > 100
#       -o : Put the output files in ~/tmp, named by job name and ID, and task ID
#       -j y : Join the error and output files for each job
#       -cwd : Run the job in the Current Working Directory (where the script is)

# The following are among the useful environment variables set when each
# job is run:
#       $SGE_TASK_ID : Which job I am from the above range
#       $SGE_TASK_LAST : Last number from the above range
#               (Equal to the number of tasks if range starts with 1
#                and has a stride of 1.)

# This will be run once on each of the compute nodes selected, with the variable "$SGE_TASK_ID" set
# to the correct instance.
echo "This is job $SGE_TASK_ID"

This will produce a number of files in ~/tmp, named after the script with the grid-engine job ID in the name, that list the output from each job. If you want to run a program over a set of input files, you can name them file1 through file100, change the task range to -t 1-100, and use the following in place of echo:

myprogram < file$SGE_TASK_ID

If you prefer to use a command-line argument to tell which instance of the job you are running (perhaps seeding a random-number generator), you can pass the task ID on the command line:

myprogram $SGE_TASK_ID

The Sun Grid Engine scheduler will release each job as resources become available. The jobs will not be able to communicate with each other via either shared memory or MPI, and the jobs must not use multiple threads. If you want to use multiple threads, see the section below on running shared-memory parallel jobs. If you want to run 500 jobs, replace 1-10 with 1-500 above and use -tc 100 to limit the grid engine to running only 100 jobs at a time.

Wei Liu has provided a helper script that will create a script to run a batch job. You can find it at File:Config.zip.

Note: If you submit a large number of jobs using a script that runs qsub multiple times (using Wei Liu's example or your own), you need to either run these jobs in the background (see below) or make sure that you limit the scripts to submit no more than 100 jobs at the same time. The -tc option to qsub takes care of this for you if you use the example script above.

Available queues

These are the queues available for user job submission on the machine. You normally select which queue to run on by requesting resources, rather than explicitly listing a queue.

  • background.q : This queue can only be used for single-processor or shared-memory jobs, but it does not accrue usage against your group. Jobs in the background queue run when there is not another job scheduled on a particular slot. They are stopped and re-started when other jobs are run on a node. Consider using this queue when you have a large number of sequential jobs -- it will keep the machine full without blocking regular jobs from running.
  • comp.q : The default CPU queue, with 15 16-way shared-memory nodes, each with 32GB of RAM. This is where groups of single-processor jobs should normally be submitted.
  • himem.q : A 16-way shared-memory node with 128GB of RAM.
  • gpu.q : A queue with the same number of slots as there are GPUs on that node. This is currently 4 per node.
  • gpu1.q : A queue with one slot per GPU node, no matter how many GPUs are on that node. Jobs will be allocated in a round-robin fashion on this queue.
  • gpucomp.q : The graphics-processor queue (actually, the CPUs on the GPU nodes). CPU jobs should not normally be submitted to this queue.
  • all.q : All of the above processors. Special permission is required to submit to this queue.

Running a set of Matlab jobs on the GRID

The Matlab page describes how to submit a set of Matlab jobs to the grid.

Running background jobs on the GRID

The background queue provides a way to use spare cycles on the machine, which will help keep it busy all the time. This makes the managers of the machine happy, so we encourage its use by not counting background job time against your group's scheduling priority. Of course, if priority (non-background) jobs come in, then they will put the background jobs to sleep until they complete. If you have a large number of jobs to run with no particular deadline, consider running them in the background.

You do that by adding two arguments to your submission script: '-l background' and '-P GROUPNAME-bg', where GROUPNAME is replaced by the name of your group. Here's how to run the same set of jobs as above, but on the CISMM background queue:

# Special comment lines to the grid engine start
# with #$ and they should contain the command-line arguments for the
# qsub command.  See 'man qsub' for more options.
#
#$ -S /bin/bash
#$ -t 1-10
#$ -o $HOME/tmp/$JOB_NAME.$JOB_ID.$TASK_ID
#$ -j y
#$ -cwd
#$ -l background
#$ -P CISMM-bg
# The above arguments mean:
#       -S /bin/bash : Run this set of jobs using /bin/bash
#       -t 1-10 : Run 10 separate instances with the SGE_TASK_ID set from 1 through 10
#       -o : Put the output files in ~/tmp, named by job name and ID, and task ID
#       -j y : Join the error and output files for each job
#       -cwd : Run the job in the Current Working Directory (where the script is)
#       -l background : Run on the background queue
#       -P CISMM-bg : Run on CISMM's background environment.

# The following are among the useful environment variables set when each
# job is run:
#       $SGE_TASK_ID : Which job I am from the above range
#       $SGE_TASK_LAST : Last number from the above range
#               (Equal to the number of tasks if range starts with 1
#                and has a stride of 1.)

# This will be run once on each of the compute nodes selected, with the variable "$SGE_TASK_ID" set
# to the correct instance.
echo "This is job $SGE_TASK_ID"

To find the available background queues, use 'qconf -sprjl' and look for ones with '-bg' at the end. The relevant ones for projects are BRIC-bg, CISMM-bg, CS-bg, IAM-bg, MBRL-bg, RENCI-bg, and VLP-bg.

You can run shared-memory parallel jobs on this queue using the '-pe smp N' option (where N is the number of cores you need). You cannot run GPU jobs or MPI jobs on the background queue, because those resources cannot be guaranteed to foreground jobs and because partially-stopped MPI jobs time out and crash.
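
The same options can also be given directly on the qsub command line; for example, to submit a single 4-thread job to the CISMM background queue (the script name is a placeholder):

qsub -S /bin/bash -l background -P CISMM-bg -pe smp 4 my_threaded_job.bash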

Run an MPI program on the GRID

The following is a sample (bash) shell script for a "hello world" MPI program. Some MPI examples are in the directory /home/examples/mpi on bass-comp0; this script is saved under the name mpi.sh in that directory. Use qsub mpi.sh to run the hello_world MPI program. The output will be placed in your home directory as mpi.sh.o<jobid>; run the qstat command to determine the job ID. These examples can also be downloaded from File:Mpi talk.tar.gz. They were provided by Todd Gamblin from RENCI.

  • Run the following using qsub scriptname, where scriptname is the name you save the script under.
#$ -S /bin/bash
#$ -pe MPI 20
#$ -V
#$ -j y
#$ -cwd
# If using a starred MPI environment (See Using MPI):
`which mpirun` hello_world
# Otherwise:
`which mpirun` -np $NSLOTS -hostfile $TMPDIR/machines hello_world
# ---------------------------

Compiling

To compile an MPI program, you must first set up your environment. To compile C/C++ programs so that they can run on the bass, use the 'mpiCC' command in place of g++ or CC; it knows where to find all the needed include files and libraries. More information about compiling is available at Using MPI.
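
For example, after selecting an MPI version with pathmunge as described above (the source-file name is a placeholder):

mpiCC -O2 -o hello_world hello_world.cpp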

Running an MPI job on the grid requires a slightly more complicated launch script. The script itself is run using the qsub command with the script as an argument. The following is an example TCSH script that will run a parallel ray-tracer from the examples directory.

  • Run the following using qsub scriptname, where scriptname is the name you save the script under (or /home/examples/scripts/mpi_raytrace.tcsh).
#$ -S /bin/tcsh -pe MPI 20 -V -o $HOME/tmp/$JOB_NAME.$JOB_ID -j y
#   -S /bin/tcsh : Run the jobs using /bin/tcsh on this script
#   -pe MPI 20 : Run in the "MPI" parallel environment, with 20 job slots
#       (MPI is the compute queue, gMPI is the GPU queue, aMPI is the all queue).
#   -V : All environment variables active within qsub should be exported to the job
#   -o : Put the output files in ~/tmp, named by job name and ID, and task ID
#   -j y : Join the error and output files for each job (must come after -o).
# NOTE: If your login shell is tcsh, you will only be able to submit jobs of up
# to about 350 slots until the environment-variable length limit is increased.  To
# send larger jobs, for now both your login shell and the shell used to run the
# job must be bash.

# $TMPDIR/machines is filled in by the Grid Engine 
# $NSLOTS holds the number of slots (processes) allocated to the job.
setenv WDIR /home/examples/tests/sge/mpi
cd $WDIR
# If using a starred MPI environment (See Using MPI):
`which mpirun` $WDIR/shootmpi -s 80 20 -r 1 $WDIR/1M1J.opt.wld ~/MPI_test.ppm
# Otherwise:
`which mpirun` -np $NSLOTS -hostfile $TMPDIR/machines $WDIR/shootmpi -s 80 20 -r 1 $WDIR/1M1J.opt.wld ~/MPI_test.ppm

It should complete in about five minutes once it has begun to run, producing a file named MPI_test.ppm in your home directory and an output file in a tmp directory under your home directory. The image file can be viewed with the GIMP program or with IrfanView.

The following is a BASH-shell script to run the same program.

  • Run the following using qsub scriptname, where scriptname is the name you save the script under (or /home/examples/scripts/mpi_raytrace.bash).
#$ -S /bin/bash -pe MPI 20 -V -o $HOME/tmp/$JOB_NAME.$JOB_ID -j y
#   -S /bin/bash : Run the jobs using /bin/bash on this script
#   -pe MPI 20 : Run in the "MPI" parallel environment, with 20 job slots
#       (MPI is the compute queue, gMPI is the GPU queue, aMPI is the all queue).
#   -V : All environment variables active within qsub should be exported to the job
#   -o : Put the output files in ~/tmp, named by job name and ID, and task ID
#   -j y : Join the error and output files for each job (must come after -o).

# $TMPDIR/machines is filled in by the Grid Engine
# $NSLOTS holds the number of slots (processes) allocated to the job.
WDIR=/home/examples/tests/sge/mpi
cd $WDIR
# If using a starred MPI environment (See Using MPI):
`which mpirun` $WDIR/shootmpi -s 80 20 -r 1 $WDIR/1M1J.opt.wld ~/MPI_test.ppm
# Otherwise:
`which mpirun` -np $NSLOTS -hostfile $TMPDIR/machines $WDIR/shootmpi -s 80 20 -r 1 $WDIR/1M1J.opt.wld ~/MPI_test.ppm

If you are going to submit the script from the same directory it should run in, you can avoid the whole $WDIR setting above by adding the line #$ -cwd to the script -- that will run it in the current working directory at the time it is submitted.

Available MPI parallel environments

These are the parallel environments (-pe option) available for MPI user job submission on the machine:

  • MPI : This is where MPI jobs should normally be submitted. It attempts to fill a node up with job instances before moving to the next node.
  • rrMPI : This parallel environment distributes job instances in a round-robin fashion to all of the available nodes.

Running a set of shared-memory parallel jobs on the GRID

To submit a number of independent jobs, each of which uses more than one shared-memory thread, submit using the '-t' option, but submit to a parallel environment rather than to a queue. If you want 10 jobs, each of which requires a 16-processor machine, use the following (/home/examples/scripts/independent_smp.bash):

#$ -S /bin/bash
#$ -t 1-10
#$ -o $HOME/tmp/$JOB_NAME.$JOB_ID.$TASK_ID
#$ -j y
#$ -pe smp 16
#$ -tc 6  # If we're running 16-way parallel, only do 6 at a time
#           to make sure that we don't use more than 100 cores at
#           once.

# Run the job.
echo "I am job $SGE_TASK_ID, and am running with 16 reserved processors"

OpenMP

The g++ compiler supports OpenMP when the -fopenmp flag is used on both the compile and link lines. Only version 4.4 (gcc44, g++44) fully supports OpenMP, so use that version. If you want an OpenMP job to use only a specific number of processors rather than all of the available processors when it runs, for example 8, add the line

export OMP_NUM_THREADS=8

into your script before the program-execution line. This is useful if you need to be able to run on a host that already has one or two jobs running on it, or if you want to limit the size of each of your jobs so that multiple ones can fit on the same node. Make the number of processors requested match the number your job will use (-pe smp 8 in this case).
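
Putting this together, a minimal sketch of compiling and running an 8-thread OpenMP program (the source-file name is a placeholder) looks like:

g++44 -fopenmp -O2 -o my_omp_program my_omp_program.cpp
# Inside the submission script (submitted with -pe smp 8):
export OMP_NUM_THREADS=8
./my_omp_program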

Compile and run CUDA programs

CUDA is the C-like language and environment developed by NVIDIA to enable general-purpose programming of the G80 and later series of graphics cards. To get the CUDA SDK running on BASS, do the following:

  • Add /usr/local/cuda/bin to your PATH (in bash, this can be done by putting 'PATH=/usr/local/cuda/bin:$PATH' into your ~/.bash_profile; in tcsh and csh, this can be done by putting 'setenv PATH ${PATH}:/usr/local/cuda/bin' into your ~/.cshrc file).
  • Add /usr/local/cuda/lib64 to your LD_LIBRARY_PATH (in bash, this can be done by putting 'LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib64:/usr/local/cuda/lib64' into your ~/.bash_profile file; in tcsh and csh, this can be done by putting 'setenv LD_LIBRARY_PATH /usr/local/cuda/lib64' into your ~/.cshrc file, or if you already have a LD_LIBRARY_PATH then 'setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:/usr/local/cuda/lib64').

You should be able to compile and link CUDA programs on bass-comp0.

Testing

  • Download and run the latest NVIDIA_CUDA_SDK_*_Linux.run script (to be able to run it, you'll need to do 'chmod +x NVIDIA_CUDA_SDK_*' after downloading it). Tell it where you would like to put the resulting SDK source code. You have to tell it where to find CUDA, because it is not installed in the standard location; tell it to look in /usr/local/cuda.

Now you're ready to make the SDK example projects.

  • 'cd' into the directory where you installed the example projects.
  • Type 'make'.

Then you should be able to qlogin to a GPU node and run the programs (which are placed in bin/linux/release under your CUDA SDK directory).

Submitting your own jobs

Submit jobs to the GPU nodes by requesting a GPU host in your qsub invocation (-l gpu_host=TRUE,gpus=1). An example script that runs 10 copies of such a job follows:

#$ -S /bin/bash
#$ -t 1-10
#$ -o $HOME/tmp/$JOB_NAME.$JOB_ID.$TASK_ID
#$ -j y
#$ -cwd
#$ -l gpu_host,gpus=1

# Run the job. (Replace the "echo" command with your CUDA program)
echo "I am job $SGE_TASK_ID, and am running on a CUDA node"

To determine which GPU to run on, we assume that jobs are allocated sequentially and that only one parallel job runs on each node. In that case, you can use your task or rank ID (available from SGE or MPI) and the number of GPUs on the node (available from CUDA) to determine which one to use; a bash sketch follows the list below:

  • SGE: $SGE_TASK_ID % GPUS_on_this_node.
  • MPI: MPI_rank % GPUS_on_this_node.
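
A minimal bash sketch of this calculation, assuming one array task per node as described above (the program name and its --device flag are placeholders):

GPUS=`grep -ic nvidia /proc/bus/pci/devices`    # number of GPUs on this node (see "Determine The Node Type")
DEVICE=`expr $SGE_TASK_ID % $GPUS`
./my_cuda_program --device $DEVICE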

Details on CUDA mutual exclusion

(Comments from John Stone at UIUC.) There's presently no mutual exclusion mechanism in CUDA at all. Every process that wants to use the CUDA devices has to "fend for itself". In essence, this is the same problem as when one runs two programs on a single node, and they both want to allocate all of one of the shared resources (e.g. RAM, /tmp space, etc). With modern OSs, one can deal with most of these issues now by using kernel-enforced process limits that restrict how much physical/virtual memory a process or process group can use, and these features are now built into most of the queueing systems and enforce resource usage policies. Presently, there's no analogous mechanism on CUDA, though it would certainly be nice to have one. I'd previously suggested to NVIDIA that it'd be nice to have driver flags or other configurable settings to control what CUDA devices show up as available, when processes query for them. At the time my thought was mainly to avoid using GPUs that were already under heavy graphics load for CUDA calculations, but the situation you have both described shows that there would be a benefit to having some form of "limit" system that could interact with queueing systems and such, much like we already have for CPU/memory/disk resources.

Run a job requiring OpenGL

(This approach currently fails because we have not been able to get the console. We're debugging the process.)

Some programs that use graphics cards to compute require the ability to open OpenGL windows on the host. If you have such a job, you can get it working by doing the following:

 qlogin -now n -l gpus=4,gpu_host
 startx -- :0 >& ${HOME}/startx-`/bin/hostname -s`.log &
 sleep 5
 export DISPLAY=:0
 <run your GL application>
 kill -TERM `/sbin/pidof X`
 sleep 5

The portion after qlogin could be submitted as part of a command shell for jobs that have been submitted with qsub; in that case only one job should be run per node.

Determine The Node Type

It can be useful to know what type of node a process is running on and alter the process accordingly. Csh/tcsh have Perl-like string-matching operators:

#!/bin/tcsh
set NCPUS=1

if ( $HOSTNAME =~ *gpu* ) then
    echo "this is a gpu node!"
    set NCPUS=4
else if ( $HOSTNAME =~ *comp* ) then
    echo "this is a comp node!"
    set NCPUS=16
else if ( $HOSTNAME =~ *himem* ) then
    echo "this is a himem node!"
    set NCPUS=16
endif 

Be careful with the syntax of the script, as csh/tcsh are picky (especially with where the 'then' goes).

In Linux, the actual processor count on a given host is available via the file /proc/cpuinfo.

#!/bin/tcsh
@ cpuc = `grep processor /proc/cpuinfo | wc -l`

For the NVIDIA GPU hosts, /proc/bus/pci/devices can be grepped for the string "nvidia":

#!/bin/tcsh
@ gpuc = `grep -i nvidia /proc/bus/pci/devices | wc -l`

The bass-comp nodes return 0; the bass-gpu nodes return either 2 or 4, depending on how many cards/quadroplexes are installed.
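
If your job scripts use bash instead, a sketch of the same checks (assuming the host names follow the bass-gpu/bass-comp/bass-himem patterns above) is:

#!/bin/bash
case `hostname -s` in
  *gpu*)   NCPUS=4  ;;
  *comp*)  NCPUS=16 ;;
  *himem*) NCPUS=16 ;;
  *)       NCPUS=1  ;;
esac
CPUC=`grep -c processor /proc/cpuinfo`          # actual processor count
GPUC=`grep -ic nvidia /proc/bus/pci/devices`    # GPU count (0 on the comp nodes)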

Resource limits

Wallclock time

Maximum wallclock time limits allow the grid engine to be smarter about how it assigns jobs, and they allow parts of the cluster to be reserved for periods in the future.

The default maximum wallclock time is 2 days. If your job takes longer than this allotted time, it will be killed by the grid engine. If you require more than 2 days per job, send an email to the mailing list. If you know your job will require less time, it's good manners to request a smaller window of time from the grid engine. To do this, use the -l h_rt=<time> argument to qsub. The time argument can be given either in seconds or as Hours:Minutes:Seconds. It benefits everybody to make sure your time estimates are conservative, but not too conservative.
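
For example, to request a four-hour limit when submitting a script (the script name is a placeholder):

qsub -l h_rt=4:00:00 myscript.bash            (or put "#$ -l h_rt=4:00:00" in the script itself)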

If you submit jobs that are longer than two days, either make sure they don't take more than 50 slots total or contact Russ Taylor ahead of time for permission.

Accessing files from a job

  • The compute nodes do not have access to AFS space.
  • They do have access to your BASS home directory, via NFS mounts, using the same paths as the compile node. You can also access this from departmental Windows machines as \\bass-files.
  • They also have access to temporary storage local to each node, available in the $TMPDIR environment variable. This points to a local disk partition on each compute node. You can create files within the $TMPDIR directory to store temporary results that do not need to persist beyond the end of the job.
  • They also have access to a grid-wide temporary scratch space in /stage. This can be used to send data between the compile nodes and the compute nodes during a run but must be copied to permanent storage if it is to persist. The /stage partition is not backed up. From within the department, you can access this from a Windows machine as \\bass-thor\stage.

Watching the progress of your jobs

qstat: You can watch how the jobs you have submitted to the queue progress using the command qstat. It will show status 'qw' when the job is queued and waiting, and status 'r' when the job is running. To see running and queued jobs from all users, run

qstat -u "*"

To see why one of your jobs (say job number 5534) is not running, use

qalter -w p 5534

This describes what queues have been tried and why they didn't get used.

tail: Also, you can tail -f on the output files in your home directory to see what output each is producing when it is running.

top: A job runs on each of the nodes once per minute and writes its output to /stage/top for that node. These files list the CPU and memory usage of the top processes on each node. qstat will tell you which nodes your jobs are running on, and you can watch the output in /stage/top to see their resource usage.

Ganglia: Finally, you can point your web browser at the ganglia server from within the computer-science department to see how busy the whole machine is, or how busy parts of it are. Note that there are background jobs running on the machine, so it may look full when in fact there is little or no wait for new jobs; clicking on one of the subcluster nodes (GPU or CPU) will show which jobs are foreground (blue) and which are background (yellow).

Email Notification

To receive email notification when your job starts and ends, add

-m be -M user@example

to your qsub script or command line.
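
In a submission script, the same options can be given as grid-engine comment lines:

#$ -m be
#$ -M user@example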

Killing a job

If you realize that your job is running amok, you can kill it using qdel with the job ID listed when you submitted it (also shown in qstat).

Getting a shell for thread-parallel or CUDA jobs

qsub is the preferred approach for launching jobs on the BASS. It enables maximum utilization of the machine, it does not require you to wait until a node is available to run your job, and it does not risk leaving nodes idle after a job completes until the user remembers to log off.

If you need to develop and run thread-parallel programs that don't require running multiple jobs, and you want an interactive shell on a 16-way node so that you don't clobber the compile node, you can use

qlogin -now n -pe smp 16

to get a shell on one of the comp nodes and allocate 16 CPUs for your use. If you are only going to use 4 CPUs, you can use -pe smp 4 (for example, if you're running Matlab). If you want to run on a particular host (for example, bass-gpu35), use -l hostname="bass-gpu35".

You can request the high-memory node (128GB) by doing:

qlogin -now n -l himem
qlogin -now n -pe smp 16 -l himem

The first entry will allocate one processor for your job; the second will allocate 16. You can put a number other than 16 to reserve only some of the processors. Remember to reserve as many processors as you will use, to avoid having other jobs placed on the node to compete with your job. (Note: the himem node is exclusive-access, so running even a single job on that node will prevent others from using it. Please do not use that node unless you require more than 32GB of memory to run.)

If you want a shell on a node that has two (or four) GPUs for CUDA work, you can use:

qlogin -now n -l gpus=2,gpu_host
qlogin -now n -l gpus=4,gpu_host

The -now n parameter tells qlogin to wait if it cannot get a login shell immediately; this is useful when the CPUs are full of quickly-ending jobs, because your request will wait for a free slot rather than failing.

Idle Shells

Idle bash shells created using qlogin are killed after 1 hour. If the specifics of your job require a longer timeout, or if you're running a GUI job that is triggering the auto-logout, you can redefine the TMOUT variable in your shell:

export TMOUT=7200 # A 2-hour timeout
export TMOUT=0 # No timeout