GPU Resource
Starting in HxMap 4.5, certain jobs require a GPU resource in order to run. Workflow Manager automatically writes the GPU requirement into the job's HTCondor submit description (.sub) file. The cluster nodes must be configured to advertise the corresponding machine resource so that these jobs can match.
Node Configuration
The following parameters need to be added to the HTCondor configuration on the compute node.
## This node has GPUs
use feature : GPUs
This will advertise each discrete GPU in the machine as a GPU resource available via HTCondor.
If the machine is configured with static slots, add the GPU to the slot definition.
SLOT_TYPE_1 = CPUS=100%,GPUS=1
NUM_SLOTS_TYPE_1 = 1
The number of slots advertised with a GPU must be equal to or less than the number of discrete GPUs in the machine. It is possible to define slots with and without a GPU on the same machine.
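As an illustration of mixing slot types, the following is a minimal sketch for a machine assumed to have a single discrete GPU. The 50% CPU split and the second slot type are assumptions chosen for the example, not values taken from the HxMap documentation; adjust them to the actual hardware.

```
## Hypothetical layout: one slot with the GPU, one CPU-only slot
## Slot type 1 receives the single GPU and half of the CPUs
SLOT_TYPE_1 = CPUS=50%,GPUS=1
NUM_SLOTS_TYPE_1 = 1
## Slot type 2 receives the remaining CPUs and no GPU
SLOT_TYPE_2 = CPUS=50%,GPUS=0
NUM_SLOTS_TYPE_2 = 1
```

With this layout only the first slot can match jobs that request a GPU, while the second slot remains available for CPU-only jobs.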
For testing, it can be helpful to restrict the node so that it only matches jobs requesting a GPU. This configuration may also be useful in production when purpose-built machines with dedicated GPU resources are added to the cluster.
## A node can only run jobs requesting GPU
START = (RequestGPUs > 0)
Configuration Test
The simple test described in "Verifying your HTCondor installation" can be adapted to verify GPU job matching by adding the GPU requirements to the submit description file.
test.sub
# GPU requirements
request_GPUs = 1
require_gpus = Capability >= 6.1
The job should match only nodes that advertise the GPU resource.
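For reference, a complete submit description file for such a test might look like the sketch below. Only the two GPU lines come from this page; the executable name, the log and output file names, and the CPU request are hypothetical placeholders and should be replaced with the values used in the existing installation test.

```
# test.sub (sketch; executable and file names are hypothetical)
executable = test.bat
log = test.log
output = test.out
error = test.err

request_cpus = 1

# GPU requirements
request_GPUs = 1
require_gpus = Capability >= 6.1

queue
```

If the job stays idle, it is worth checking that at least one node advertises a GPU whose capability meets the value given in require_gpus.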