Ant cluster

Overview

Ant is a university cluster available to ICP-affiliated personnel. Software can be loaded via Spack or EESSI.

All storage on the cluster is served by a BeeGFS file system. In total, we provide 224 TB of storage shared by all users, with a 4.7 Gbps write rate. Note that there are no per-user storage quotas, so issues can arise when the shared capacity is exhausted.

In total, the cluster consists of 1152 compute cores split across 18 nodes using a 25 Gbps interconnect. Each node is equipped with 2 NVIDIA GPUs.

| Type     | Nodes | Cores | RAM    | GPU          | Node list      |
|----------|-------|-------|--------|--------------|----------------|
| Standard | 18    | 2x32  | 384 GB | 2x NVIDIA L4 | compute[01-18] |
| Login    | 1     | 2x32  | 384 GB | 1x NVIDIA L4 |                |

This cluster is managed by the ICP. If there are any issues, please submit an issue here. The cluster was acquired via a DFG grant in 2022 [1] and became operational in March 2024.

Partitions and nodes

All compute nodes are in a single partition, ant, which is selected by default. For short debugging jobs on the login node, use --partition=debug.

| Partition | Nodes | Time limit | RAM    | GPU          | Node list      |
|-----------|-------|------------|--------|--------------|----------------|
| ant*      | 18    | 2 days     | 384 GB | 2x NVIDIA L4 | compute[01-18] |
| debug     | 1     | 20 min     | 384 GB | 1x NVIDIA L4 |                |
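As a sketch, a SLURM batch script for the default partition might look as follows (resource values, the `gpu` GRES name, and the application binary are illustrative assumptions, not site-verified settings):

```shell
#!/bin/bash
#SBATCH --partition=ant        # default partition; use "debug" for short test jobs
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=64   # both EPYC sockets (2x32 cores)
#SBATCH --gres=gpu:2           # request both L4 GPUs (assumes GRES is named "gpu")
#SBATCH --time=1-00:00:00      # must stay within the 2-day partition limit

srun ./my_simulation           # hypothetical application binary
```

Submit with `sbatch job.sh`; for interactive debugging, `srun --partition=debug --time=00:20:00 --pty bash` is the equivalent one-liner.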

Standard nodes

There are 18 standard nodes, each equipped with 2 AMD EPYC 9374F CPUs (32 cores, 64 threads, 3.85 GHz, 320 W), 384 GB of DDR5 RAM, and a Mellanox ConnectX-4 MCX4121A-ACAT (25 Gbps) network adapter. Each node has 2 NVIDIA L4 GPUs (24 GB GDDR6). The standard nodes are named compute01 through compute18.

Login nodes

There is 1 login node with the same characteristics as the standard nodes, except that it holds only 1 GPU.

Filesystems

The following mount points are available:

  • /work/${USER}: 204 TB of storage shared by all users, \(n\times\)2.7 Gbps parallel read rate and \(n\times\)2.1 Gbps parallel write rate, with \(n\) the number of nodes that read or write concurrently (BeeGFS)

  • /home/${USER}: 4 TB of storage shared by all users, 1.6 Gbps read rate, 850 Mbps serial write rate on the head node (ZFS), 530 Mbps serial write rate on compute nodes (NFS)

  • /dev/shm/: 190 GB of RAM storage (tmpfs) shared by all users, 2.7 Gbps serial write rate, 10.7 Gbps serial read rate
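Since /dev/shm offers the fastest serial I/O of the three mount points but is volatile RAM shared with other jobs, a common pattern is to stage small, frequently-read inputs there and clean up afterwards. A minimal sketch (directory and file names are illustrative):

```shell
# Stage input data into the node-local RAM disk (tmpfs) for fast reads.
SCRATCH="/dev/shm/${USER:-user}_job_demo"   # hypothetical per-job scratch dir
mkdir -p "$SCRATCH"
printf 'lattice=64\n' > "$SCRATCH/params.ini"

# ... run the application against $SCRATCH/params.ini here ...
cat "$SCRATCH/params.ini"

# tmpfs survives the job unless removed, so always clean up explicitly.
rm -rf "$SCRATCH"
```

Results that must outlive the job should be written to /work/${USER} instead, since /dev/shm contents are lost when the node reboots or the files are deleted.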

Data stored under /work/${USER} is accessible from the ICP network under /beegfs/ant_work/${USER} with a 120 Mbps serial read rate. Thanks to caching, subsequent reads of the same, unmodified parts of a file can reach a 37 Gbps read rate.

Data stored under /home/${USER} needs to be transferred via scp, which has a 110 Mbps serial read rate.
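A typical transfer from /home to a local machine might look like the following (the host alias `ant` is a placeholder for the cluster's login node as configured in your SSH setup; the file paths are illustrative):

```shell
# Copy a single file from the cluster's /home to the current local directory.
scp ant:/home/$USER/results.tar.gz .

# For directories or resumable transfers, rsync over SSH is an alternative.
rsync -av ant:/home/$USER/project/ ./project/
```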

Access

Frank Huber is in charge of creating user accounts on Ant. All ICP students and staff members are entitled to access Ant. For external personnel, Christian Holm must personally approve access. To log into Ant, refer to Logging into clusters. See also Using Ant for building software and submitting jobs.

Obligations

DFG funding must be acknowledged as follows: [2]

We acknowledge funding from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) through the Compute Cluster grant no. 492175459.