HPC Jobs
The master node of a cluster is a shared resource for all users and is used for preparing, submitting and managing jobs. Never run any computationally intensive processes on the master node. Jobs are submitted from the master node, but they actually run on one or more of the compute nodes. Allocating jobs to compute nodes and managing them during their lifetime is the responsibility of the resource manager and the job scheduler. Different clusters use different tools to manage resources and schedule jobs. Vikram-100 uses Platform LSF (Load Sharing Facility), which is part of the Platform HPC suite from IBM.
Jobs are submitted using the bsub command. There are two types of job that bsub will accept: interactive jobs and batch jobs. An interactive job provides a login session on a compute node. This enables you to interact directly with the compute node by issuing any sequence of commands within the login session. Consequently, interactive jobs are useful for experimentation and debugging. In contrast, a batch job is a scripted job that runs from start to finish without any user intervention. The vast majority of jobs on the cluster are batch jobs. This type of job is appropriate for production runs of several hours or days.
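The two job types correspond to two common ways of invoking bsub. The sketch below is illustrative rather than specific to this cluster: -Is is the LSF option for an interactive job with a pseudo-terminal, and redirecting a script into bsub is the usual way to submit a batch job. The helper name bsub_command and the file name job.sh are hypothetical. The function only prints the command, so it can be inspected before being typed on the master node.

```shell
#!/bin/bash
# Print (do not run) the bsub command line for a given job type.
# bsub_command and job.sh are hypothetical names used for illustration only.
bsub_command() {
    job_type="$1"
    script="${2:-job.sh}"
    case "$job_type" in
        interactive)
            # -Is requests an interactive session with a pseudo-terminal
            echo "bsub -Is /bin/bash"
            ;;
        batch)
            # the script is fed on standard input so that bsub reads its #BSUB lines
            echo "bsub < $script"
            ;;
        *)
            echo "unknown job type: $job_type" >&2
            return 1
            ;;
    esac
}

bsub_command interactive
bsub_command batch job.sh
```

Resource options such as the queue or the number of tasks may also need to be added to the interactive command, depending on how the cluster is configured.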
    #!/bin/bash
    #
    #BSUB -J hybrid_job_name      # job name (optional)
    #BSUB -W 00:10                # wall-clock time (hrs:mins) (optional)
    #BSUB -n 24                   # number of tasks in job
    #BSUB -q cpu                  # queue (optional)
    #BSUB -e errors.%J.hybrid     # error file name in which %J is replaced by the job ID (optional)
    #BSUB -o output.%J.hybrid     # output file name in which %J is replaced by the job ID (optional)

    mpirun -np 24 ./program_name.exe

Please note that the lines beginning with #BSUB are NOT comments. They are instructions for the job scheduler and must not be omitted.
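Bash itself treats the #BSUB lines as comments; it is bsub that reads them when the script is submitted. One way to review what the scheduler will be asked for is to list the directives before submitting. The sketch below is a minimal example that uses only standard grep and sed. The function name list_bsub_directives is hypothetical and is not part of LSF, and the temporary job script contains a shortened copy of the example above.

```shell
#!/bin/bash
# List the #BSUB directives in a job script, with trailing comments removed.
# list_bsub_directives is a hypothetical helper, not an LSF command.
list_bsub_directives() {
    grep '^#BSUB' "$1" |
        sed -e 's/^#BSUB[[:space:]]*//' -e 's/[[:space:]]*#.*$//'
}

# Demonstration on a temporary copy of a shortened job script.
demo_script=$(mktemp)
cat > "$demo_script" <<'EOF'
#!/bin/bash
#BSUB -n 24    # number of tasks in job
#BSUB -q cpu   # queue (optional)
mpirun -np 24 ./program_name.exe
EOF

list_bsub_directives "$demo_script"
rm -f "$demo_script"
```

For the shortened script this prints the two options -n 24 and -q cpu, one per line. A directive that does not appear in the listing, for example because of a space between # and BSUB, would not be matched by this helper and is worth checking before submission.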