What Is a Cluster?

A cluster is a collection of computers, called nodes. When you log in to a cluster, you are actually accessing its master node. The master node is the public face of the cluster: it is the only node that is directly visible and accessible from the outside network. The other nodes, including the storage node and all the compute nodes, communicate with the master node and with each other over an internal private network.

The master node is a shared resource for all cluster users. It is used for preparing, submitting and managing jobs, as well as for transferring files. Never run computationally intensive processes on the master node. Jobs are submitted from the master node, but they actually run on one or more of the compute nodes. Allocating jobs to compute nodes and managing them during their lifetime is the responsibility of the resource manager and the job scheduler. Vikram-100 uses a resource manager known as IBM Platform Computing.
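To make the idea of job allocation concrete, here is a toy model in Python of a scheduler that places queued jobs on compute nodes in first-come, first-served order. It is only an illustrative sketch: the function name, the node names and the one-node-per-job placement rule are all assumptions made for this example, and the real resource manager on the cluster is far more sophisticated.

```python
from collections import deque


def schedule(jobs, free_cores):
    """Toy first-come, first-served placement of jobs on compute nodes.

    jobs: list of (job_name, cores_needed) tuples in submission order.
    free_cores: dict mapping a (hypothetical) node name to its free cores.
    Returns (placements, waiting): a dict of job -> node, and the names
    of the jobs that are still queued.
    """
    free = dict(free_cores)  # work on a copy so the caller's dict is untouched
    queue = deque(jobs)
    placements = {}
    while queue:
        name, cores = queue[0]
        # Find the first node with enough free cores for the job at the head.
        node = next((n for n, c in free.items() if c >= cores), None)
        if node is None:
            break  # the head job has to wait, and so does everything behind it
        free[node] -= cores
        placements[name] = node
        queue.popleft()
    return placements, [name for name, _ in queue]


if __name__ == "__main__":
    placed, waiting = schedule(
        [("a", 4), ("b", 8), ("c", 2)],
        {"node1": 8, "node2": 4},
    )
    print(placed, waiting)
```

In this example job "a" is placed, but job "b" needs more cores than any node has free, so it waits in the queue and holds back job "c" behind it. Real schedulers usually avoid this kind of blocking with priorities and backfilling.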

Jobs are submitted using a queue submission command. Two types of job are accepted: interactive jobs and batch jobs. An interactive job provides a login session on a compute node, which lets you interact directly with that node by issuing any sequence of commands within the session. Interactive jobs are therefore useful for experimentation and debugging. In contrast, a batch job is a scripted job that runs from start to finish without any user intervention. The vast majority of jobs on the cluster are batch jobs; this type of job is appropriate for production runs lasting several hours or days.
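As a rough illustration of what a batch job looks like, the Python sketch below generates the text of a job script. It assumes the scheduler accepts LSF-style `#BSUB` directives, which this page does not confirm; the job name, core count, time limit and program name are placeholders invented for the example. Check the cluster's own job submission instructions for the exact directives and queue names.

```python
def render_batch_script(job_name, cores, walltime, command):
    """Return the text of a batch job script using LSF-style directives.

    Assumption: the cluster's scheduler reads #BSUB lines as LSF does.
    walltime is given as "hours:minutes", e.g. "48:00".
    """
    lines = [
        "#!/bin/bash",
        f"#BSUB -J {job_name}",  # job name shown in the queue
        f"#BSUB -n {cores}",     # number of cores (job slots) requested
        f"#BSUB -W {walltime}",  # wall-clock limit, hours:minutes
        "#BSUB -o %J.out",       # standard output file; %J is the job ID
        "#BSUB -e %J.err",       # standard error file
        "",
        command,                 # the work itself, run with no user interaction
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values: a two-day production run on 16 cores.
    print(render_batch_script("demo", 16, "48:00", "./my_program input.dat"))
```

On a system running LSF, a script like this is typically submitted with `bsub < job.sh`, and an interactive session is typically requested with `bsub -Is bash`. Both commands are given here on the assumption that the cluster uses LSF.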
