Job Scheduling and Queues

The job scheduler determines when and where jobs are run. The rules that influence these decisions are defined by the job scheduling policy. The job scheduling policy on Vikram-100 attempts to accommodate the various needs of its users and to ensure that everyone who invests in the system receives a fair share of its compute resources. The policy applies equally to all users. Once a job starts running on a compute node, it will be interrupted only if it exceeds its resource requirements; jobs are not preempted to make way for later jobs with higher priority.

There are several job queues on Vikram-100. In general, you should not specify a queue name when submitting a job unless your requirements cannot be satisfied by the default queue or your job is of one of the specific types listed below. By default, jobs are routed into an appropriate execution queue based on their resource requirements. The queues on Vikram-100 are detailed below:

Queue Details

S. No   Queue Name   Maximum Cores (per user per job)   Walltime   Priority   Remarks
1       short        2328 of 2328                       15 mins    1          For compiling, debugging, short runs, etc.
2       defaultq     512 of 2328                        1 week     2          Default queue
3       medium       320 of 2328                        15 days    3          For medium jobs
4       long         192 of 2328                        30 days    4          For long-running jobs
5       gpu          48 of 480                          1 week     0          Only for GPU jobs
6       serial       1 of 2328                          45 days    5          Only one core per job
7       smp          24 of 2328                         30 days    6          Only for SMP jobs; only one node per job
8       garuda       240 of 240                         -          7          Only for Garuda users

short queue

    An LSF queue called short has been created for test runs. Jobs in the short queue have a higher priority than jobs in any other queue, but they are limited to 15 minutes of CPU time. You can submit a job to the short queue by specifying the queue name via a #BSUB comment in the job script file:

    #BSUB -q short

    or on the bsub command line when you submit the job:

    bsub -q short < jobfile
    bsub -J jobname -q short -oo outfile.%J -eo errorfile.%J myprog
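
    For reference, the same options can be collected into a complete job script. The sketch below assumes a 16-core MPI program; the job name, file names, core count, and the mpirun launcher are placeholders to adapt to your own program.

    #!/bin/bash
    #BSUB -J testrun                 # job name (placeholder)
    #BSUB -q short                   # submit to the short queue
    #BSUB -n 16                      # number of cores requested (placeholder)
    #BSUB -oo testrun.%J.out         # standard output, overwritten; %J expands to the job ID
    #BSUB -eo testrun.%J.err         # standard error, overwritten

    # launch the program; replace with your site's preferred MPI launcher if different
    mpirun -np 16 ./myprog

    Submit the script with: bsub < jobfile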

gpu queue

    The gpu queue has 20 GPU compute nodes and handles jobs that require access to Nvidia K40 cards for computation (e.g. CUDA and OpenACC programs). In this queue, users can run 5 jobs and queue 10 more. Jobs submitted to this queue do not count towards jobs submitted to any other queue. Submit your job to the gpu queue ONLY if it requires GPU access; submitting a normal CPU job to the gpu queue is strictly prohibited.
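
    As a rough sketch, a GPU job script typically differs only in the queue name. The job name, core count, and executable below are placeholders, and the exact GPU resource request syntax (if Vikram-100 requires one beyond the queue name) is site-specific and not shown here.

    #!/bin/bash
    #BSUB -J cudarun                 # job name (placeholder)
    #BSUB -q gpu                     # submit to the gpu queue
    #BSUB -n 12                      # CPU cores requested on the GPU node (placeholder)
    #BSUB -oo cudarun.%J.out
    #BSUB -eo cudarun.%J.err

    ./my_cuda_prog                   # placeholder for your CUDA/OpenACC executable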

defaultq queue

    This is the default queue on the Vikram-100 HPC; you do not have to explicitly direct your job to it. The queue primarily uses 77 CPU compute nodes; however, if GPU nodes are free and all CPU nodes are occupied, it will schedule your jobs onto the GPU nodes, albeit with low priority. It has a walltime limit of 1 week. If you require more walltime, submit to the medium or long queue.
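
    Because jobs are routed to defaultq automatically, no queue option is needed; the two submissions below are equivalent (jobfile is a placeholder for your own job script).

    bsub < jobfile
    bsub -q defaultq < jobfile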

serial queue

    This queue must be used if your program is serial (requiring only one CPU core to run). In this queue, users can run 36 jobs and queue 96 more, but each job can access only one CPU core. Jobs submitted to this queue do not count towards jobs submitted to any other queue.
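
    A single-core job script for the serial queue could look like the following sketch; the job name, file names, and executable are placeholders.

    #!/bin/bash
    #BSUB -J serialrun               # job name (placeholder)
    #BSUB -q serial                  # submit to the serial queue
    #BSUB -n 1                       # the serial queue allows exactly one core per job
    #BSUB -oo serialrun.%J.out
    #BSUB -eo serialrun.%J.err

    ./my_serial_prog                 # placeholder for your serial executable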

smp queue

    This queue must be used if your program runs only on SMP (shared-memory) systems and cannot span multiple nodes. In this queue, users can run 10 jobs, each confined to a single node. Jobs submitted to this queue do not count towards jobs submitted to any other queue.
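
    For an SMP job (for example, an OpenMP program) confined to one node, a sketch could look like this. The 24-core request matches the per-job limit in the table above; span[hosts=1] is the standard LSF resource requirement for keeping all slots on a single host, and the OpenMP program and file names are assumptions.

    #!/bin/bash
    #BSUB -J smprun                  # job name (placeholder)
    #BSUB -q smp                     # submit to the smp queue
    #BSUB -n 24                      # up to 24 cores, per the smp queue limit
    #BSUB -R "span[hosts=1]"         # keep all slots on a single node
    #BSUB -oo smprun.%J.out
    #BSUB -eo smprun.%J.err

    export OMP_NUM_THREADS=24        # assumption: an OpenMP program using all requested cores
    ./my_openmp_prog                 # placeholder executable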

Queue policy

  • Users can run a total of 5 jobs at a time across all queues (except the ‘serial’ and ‘gpu’ queues) and queue 5 more (see the example below for checking your current usage).
  • In the ‘gpu’ queue, users can run 5 jobs and queue 10 more. Jobs submitted to this queue do not count towards jobs submitted to any other queue.
  • In the ‘serial’ queue, users can run 36 jobs and queue 96 more, but each job can access only one CPU core. Jobs submitted to this queue do not count towards jobs submitted to any other queue.
  • In the ‘smp’ queue, users can run 10 jobs. Each job can run on only one node.
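
To check how your current jobs count against these limits, the standard LSF query commands can be used; username below is a placeholder for your own login name.

    bjobs -u username                # list your running and pending jobs
    bqueues short defaultq gpu       # show per-queue status and job counts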