HPC Jobs

The master node of a cluster is a shared resource for all users and is used for preparing, submitting and managing jobs. Never run computationally intensive processes on the master node. Jobs are submitted from the master node, but they actually run on one or more of the compute nodes. Allocating jobs to compute nodes and managing them during their lifetime is the responsibility of the resource manager and the job scheduler. Different clusters use different tools for these tasks; Vikram-100 uses Platform LSF (Load Sharing Facility), which is part of IBM's Platform HPC suite.

Jobs are submitted using the bsub command, which accepts two types of job: interactive jobs and batch jobs. An interactive job provides a login session on a compute node, in which you can issue any sequence of commands directly on that node. This makes interactive jobs useful for experimentation and debugging. A batch job, in contrast, is a scripted job that runs from start to finish without any user intervention. The vast majority of jobs on the cluster are batch jobs; this type of job is appropriate for production runs lasting several hours or days.
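The two job types correspond to two ways of calling bsub. The sketch below wraps each one in a small shell function. The function names start_interactive and submit_batch are hypothetical, and so are the 30-minute limit and single task. The default queue cpu is borrowed from the example job script further down this page; your site may provide a separate queue for interactive work.

```shell
#!/bin/bash
# Sketch only: function names, the 30-minute limit and the default queue
# "cpu" are assumptions, not site policy.

# Interactive job: -Is submits an interactive job and creates a
# pseudo-terminal in shell mode, so the shell runs on a compute node.
start_interactive() {
    local queue="${1:-cpu}"
    bsub -Is -q "$queue" -n 1 -W 00:30 /bin/bash
}

# Batch job: the job script is fed to bsub on standard input, so LSF
# reads the #BSUB lines inside it as job options.
submit_batch() {
    local script="$1"
    bsub < "$script"
}
```

For example, start_interactive gives you a shell on a compute node once the job is dispatched, and typing exit ends the job. By contrast, submit_batch myjob.lsf returns as soon as the job has been queued, and the job then runs unattended.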

When you wish to submit a batch job using bsub, you need to create a job script that specifies the resources your job requires and calls your program. The general structure of a job script is shown below.
#!/bin/bash
#
#BSUB -J hybrid_job_name      # job name (optional)
#BSUB -W 00:10                # wall-clock time (hrs:mins) (optional)
#BSUB -n 24                   # number of tasks in job
#BSUB -q cpu                  # queue (optional)
#BSUB -e errors.%J.hybrid     # error file name in which %J is replaced by the job ID (optional)
#BSUB -o output.%J.hybrid     # output file name in which %J is replaced by the job ID (optional)
 
mpirun -np 24 ./program_name.exe
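Please note that the lines beginning with #BSUB are NOT ordinary comments. Bash ignores them, but LSF reads them as instructions for the job scheduler, so do not delete them as you would a comment. Only the options marked (optional) above may be left out.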
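A saved job script is submitted by redirecting it into bsub, so that LSF reads its #BSUB lines. In the sketch below, the script is assumed to be saved under the hypothetical name job.lsf. It captures the job ID from bsub's confirmation message, which normally has the form Job <12345> is submitted to queue <cpu>. It then derives the output and error file names produced by the -o and -e patterns in the example script. Both functions are hypothetical helpers, not LSF commands.

```shell
#!/bin/bash
# Hypothetical helper: submit a job script and print only its job ID.
# Assumes bsub's usual reply: "Job <12345> is submitted to queue <cpu>."
submit_and_get_id() {
    local script="$1" reply id
    reply=$(bsub < "$script") || return 1
    # Keep only the number inside the first <...> on the "Job <...>" line.
    id=$(printf '%s\n' "$reply" | sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p')
    [ -n "$id" ] || return 1
    printf '%s\n' "$id"
}

# Hypothetical helper: file names produced by the example script's
# -o output.%J.hybrid and -e errors.%J.hybrid options.
output_files_for() {
    local id="$1"
    printf 'output.%s.hybrid\nerrors.%s.hybrid\n' "$id" "$id"
}
```

For example, run jobid=$(submit_and_get_id job.lsf) and then bjobs "$jobid" to check the job's status. Use bkill "$jobid" to cancel it if necessary. When the job finishes, its results are in the files listed by output_files_for "$jobid".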