Job Scheduling and Queues
It is the responsibility of the job scheduler to determine when and where jobs will run. The rules that influence these decisions are defined by the job scheduling policy. The policy on Vikram-100 attempts to accommodate the various needs of its users and to ensure that all those who invest in the system receive a fair share of its compute resources; it applies equally to all users. Once a job starts running on a compute node, it will be interrupted only if it exceeds its resource requirements. Jobs are not preempted to make way for later jobs with higher priority.
There are several job queues on Vikram-100. In general, you should not specify a queue name when submitting a job unless your requirements cannot be satisfied by the default queue, or your job is of a specific type (e.g. GPU, serial, or SMP). By default, jobs are routed to an appropriate execution queue based on their resource requirements. The queue details for Vikram-100 are as follows:
| S. No | Queue Name | Maximum Cores (per user per job) | Walltime | Priority | Remarks |
|-------|------------|----------------------------------|----------|----------|---------|
| 1 | short | 2328 of 2328 | 15 mins | 1 | For compiling, debugging, short runs, etc. |
| 2 | defaultq | 512 of 2328 | 1 week | 2 | Default queue |
| 3 | medium | 320 of 2328 | 15 days | 3 | For medium jobs |
| 4 | long | 192 of 2328 | 30 days | 4 | For long-running jobs |
| 5 | gpu | 48 of 480 | 1 week | 0 | Only for GPU jobs |
| 6 | serial | 1 of 2328 | 45 days | 5 | Only one core per job |
| 7 | smp | 24 of 2328 | 30 days | 6 | Only for SMP jobs. Only one node per job. |
| 8 | garuda | 240 of 240 | ∞ | 7 | Only for Garuda users |
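As a concrete illustration of default routing, a minimal LSF job script might look like the following sketch. The job name, core count, walltime, and program name here are placeholders, not values taken from the cluster documentation:

```shell
#!/bin/bash
# Minimal LSF job script sketch. No -q option is given, so the
# scheduler routes the job into an execution queue (defaultq here)
# based on its resource requests.
#BSUB -J myjob            # job name (placeholder)
#BSUB -n 64               # core count (placeholder; within the 512-core defaultq limit)
#BSUB -oo output.%J       # stdout file; %J expands to the job ID
#BSUB -eo errors.%J       # stderr file
#BSUB -W 168:00           # walltime request in hours:minutes (1 week)

mpirun ./myprog           # placeholder program
```

Submit it with `bsub < jobfile`; the scheduler reads the `#BSUB` comments as submission options.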
short queue
An LSF queue called short has been created for test runs. Jobs in the short queue have a higher priority than any other jobs, but they are limited to 15 minutes of CPU time. You can submit a job to the short queue by specifying the queue name as an option via a #BSUB comment in the job script file:

#BSUB -q short

or on the bsub command line when you submit the job:

bsub -q short < jobfile
bsub -J jobname -q short -oo outfile.%J -eo errorfile.%J myprog
gpu queue
- The gpu queue has 20 GPU compute nodes and handles jobs that require access to Nvidia K40 cards for computation (e.g. CUDA or OpenACC programs). In this queue, users can run 5 jobs and queue 10 more. Jobs submitted to this queue do not count towards jobs submitted to any other queue. Submit your job to the gpu queue ONLY if it requires GPU access. Submitting a normal CPU job to the gpu queue is strictly prohibited.
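A GPU job script might look like the sketch below. Note that the exact GPU resource-request syntax depends on the LSF version installed: newer releases provide a `-gpu` submission option, while older ones rely on a site-defined resource string, so check the cluster's local documentation before relying on either. Everything other than `-q gpu` is a placeholder:

```shell
#!/bin/bash
# Sketch of a GPU job script (assumed options; verify locally).
#BSUB -q gpu              # required: this queue is only for GPU jobs
#BSUB -J cuda_job         # job name (placeholder)
#BSUB -n 12               # CPU cores (placeholder; within the 48-core gpu-queue limit)
#BSUB -oo gpu_out.%J
#BSUB -eo gpu_err.%J

./my_cuda_prog            # placeholder CUDA executable
```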
defaultq queue
- This is the default queue for the Vikram-100 HPC; you do not have to explicitly specify it when submitting a job. The queue primarily uses 77 CPU compute nodes; however, if GPU nodes are free and all CPU nodes are occupied, it will schedule jobs onto the GPU nodes, albeit with low priority. It has a walltime limit of 1 week. Users who require more walltime can submit to the medium and long queues.
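When a run needs more than the one-week defaultq walltime, the longer queue can be named explicitly. A sketch, with job size and program name as placeholders:

```shell
#!/bin/bash
# Sketch: explicitly target the long queue for a multi-week run.
#BSUB -q long             # long queue: up to 30 days walltime, 192 cores max
#BSUB -n 128              # placeholder core count, within the long-queue limit
#BSUB -oo long_out.%J
#BSUB -eo long_err.%J

mpirun ./long_running_prog   # placeholder program
```

For runs of up to 15 days, substitute `-q medium` (320-core limit) instead.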
serial queue
- This queue must be used if your program is serial (requiring only one CPU core to run). In this queue, users can run 36 jobs and queue 96 more, but each job can only access one CPU core. Jobs submitted to this queue do not count towards jobs submitted to any other queue.
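A serial job script is correspondingly simple; in this sketch only `-q serial` and `-n 1` are dictated by the queue policy, the rest are placeholders:

```shell
#!/bin/bash
# Sketch of a serial job: one core in the serial queue.
#BSUB -q serial
#BSUB -n 1                # serial queue allows only one core per job
#BSUB -oo serial_out.%J
#BSUB -eo serial_err.%J

./my_serial_prog          # placeholder single-core program
```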
smp queue
- This queue must be used if your program runs only on SMP systems and cannot span multiple nodes. In this queue, users can run 10 jobs. Jobs submitted to this queue do not count towards jobs submitted to any other queue.
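For an SMP (e.g. OpenMP) job, the standard LSF resource string `span[hosts=1]` keeps all requested cores on a single host, matching the one-node-per-job rule of this queue. A sketch with placeholder names:

```shell
#!/bin/bash
# Sketch of an SMP job confined to a single node.
#BSUB -q smp
#BSUB -n 24               # up to 24 cores, i.e. one full node
#BSUB -R "span[hosts=1]"  # keep all cores on one host
#BSUB -oo smp_out.%J
#BSUB -eo smp_err.%J

export OMP_NUM_THREADS=24 # placeholder: match thread count to cores
./my_openmp_prog          # placeholder threaded program
```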
Queue policy
- Users can run a total of 5 jobs at a time across all queues (except the ‘serial’ and ‘gpu’ queues) and queue 5 more.
- In the ‘gpu’ queue, users can run 5 jobs and queue 10 more. Jobs submitted to this queue do not count towards jobs submitted to any other queue.
- In the ‘serial’ queue, users can run 36 jobs and queue 96 more, but each job can only access one CPU core. Jobs submitted to this queue do not count towards jobs submitted to any other queue.
- In the ‘smp’ queue, users can run 10 jobs. Each job can only run on one node.
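To see where you stand against these limits, the standard LSF query commands can be used (output formats vary slightly between LSF versions):

```shell
# Standard LSF commands for checking job counts against queue limits.
bjobs -r              # list your running jobs
bjobs -p              # list your pending (queued) jobs
bjobs -u all -q gpu   # list all users' jobs in the gpu queue
bqueues               # show per-queue limits and current load
```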