Using bwForCluster¶
Login¶
There are several gateways that redirect to any of the login nodes in a load-balanced way:
| Hostname | Node type |
|---|---|
|  | login to one of the two Helix login nodes |
|  | login to one of the four JUSTUS 2 login nodes |
|  | login to one of the two JUSTUS VNC visualization login nodes |
Host key fingerprints for Helix:

| Algorithm | Fingerprint (SHA256) |
|---|---|
| RSA |  |
| ECDSA |  |
| ED25519 |  |
Your username for the cluster will be your ICP ID with an st_ prefix. For example, if your ID is ac123456, then your Helix username will be st_ac123456.
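Logging in then works with any SSH client. A minimal sketch, assuming the hypothetical user ac123456 from above and a placeholder for the gateway hostname from the table:

# hypothetical example: replace <helix-gateway> with the Helix gateway hostname listed above
ssh st_ac123456@<helix-gateway>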
More details can be found in the wiki pages of the clusters.
Building dependencies¶
Python¶
# last update: December 2024
module load compiler/gnu/12.1 mpi/openmpi/4.1
CLUSTER_PYTHON_VERSION=3.12.4
curl -L https://www.python.org/ftp/python/${CLUSTER_PYTHON_VERSION}/Python-${CLUSTER_PYTHON_VERSION}.tgz | tar xz
cd Python-${CLUSTER_PYTHON_VERSION}/
./configure --enable-optimizations --with-lto --prefix="${HOME}/bin/cpython-${CLUSTER_PYTHON_VERSION}"
make -j 10
make install
make clean
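An optional sanity check, not part of the original recipe, is to call the freshly installed interpreter directly from its prefix and confirm it reports the expected version:

# optional check: the custom interpreter should report Python ${CLUSTER_PYTHON_VERSION}
"${HOME}/bin/cpython-${CLUSTER_PYTHON_VERSION}/bin/python3" --version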
Boost¶
# last update: December 2024
module load compiler/gnu/12.1 mpi/openmpi/4.1
mkdir boost-build
cd boost-build
BOOST_VERSION=1.82.0
BOOST_DOMAIN="https://boostorg.jfrog.io/artifactory/main"
BOOST_ROOT="${HOME}/bin/boost_mpi_${BOOST_VERSION//./_}"
mkdir -p "${BOOST_ROOT}"
curl -sL "${BOOST_DOMAIN}/release/${BOOST_VERSION}/source/boost_${BOOST_VERSION//./_}.tar.bz2" | tar xj
cd "boost_${BOOST_VERSION//./_}"
echo 'using mpi ;' > tools/build/src/user-config.jam
./bootstrap.sh --with-libraries=filesystem,system,mpi,serialization,test
./b2 -j 4 install --prefix="${BOOST_ROOT}"
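To confirm that the MPI-enabled Boost libraries actually landed in the custom prefix, an optional check (not part of the original recipe) is to list the install directory:

# optional check: libboost_mpi and libboost_serialization should show up here
ls "${BOOST_ROOT}/lib" | grep -E 'boost_(mpi|serialization)'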
FFTW¶
# last update: December 2024
module load compiler/gnu/12.1 mpi/openmpi/4.1
mkdir fftw-build
cd fftw-build
FFTW3_VERSION=3.3.10
FFTW3_ROOT="${HOME}/bin/fftw_${FFTW3_VERSION//./_}"
curl -sL "https://www.fftw.org/fftw-${FFTW3_VERSION}.tar.gz" | tar xz
cd "fftw-${FFTW3_VERSION}"
for floating_point in "" "--enable-float"; do
./configure --enable-shared --enable-mpi --enable-threads --enable-openmp \
--disable-fortran --enable-avx --prefix="${FFTW3_ROOT}" ${floating_point}
make -j 10
make install
make clean
done
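The loop builds both the double-precision and the single-precision (--enable-float) variants into the same prefix. An optional check that both made it, including their MPI and OpenMP flavours:

# optional check: libfftw3*.so and libfftw3f*.so should both be present
ls "${FFTW3_ROOT}/lib" | grep -E 'libfftw3f?(_mpi|_omp)?\.so'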
CUDA¶
# last update: August 2023
module load compiler/gnu/12.1 devel/cuda/12.1
export CLUSTER_CUDA_ROOT="${HOME}/bin/cuda_12_1"
mkdir -p "${CLUSTER_CUDA_ROOT}/lib"
ln -s "${CUDA_HOME}/targets/x86_64-linux/lib/stubs/libcuda.so" "${CLUSTER_CUDA_ROOT}/lib/libcuda.so"
ln -s "${CUDA_HOME}/targets/x86_64-linux/lib/stubs/libcuda.so" "${CLUSTER_CUDA_ROOT}/lib/libcuda.so.1"
Building software¶
ESPResSo¶
Release 4.2:
# last update: August 2023
module load compiler/gnu/12.1 mpi/openmpi/4.1 devel/cmake/3.24.1 devel/cuda/12.1
CLUSTER_FFTW3_VERSION=3.3.10
CLUSTER_BOOST_VERSION=1.82.0
export BOOST_ROOT="${HOME}/bin/boost_mpi_${CLUSTER_BOOST_VERSION//./_}"
export FFTW3_ROOT="${HOME}/bin/fftw_${CLUSTER_FFTW3_VERSION//./_}"
export CUDA_HOME="${CUDA_PATH}"
export CUDA_ROOT="${CUDA_PATH}"
export LD_LIBRARY_PATH="${BOOST_ROOT}/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH="${FFTW3_ROOT}/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}${CUDA_HOME}/targets/x86_64-linux/lib/stubs"
git clone --recursive --branch 4.2 --origin upstream \
https://github.com/espressomd/espresso.git espresso-4.2
cd espresso-4.2
sed -ri 's/find_package\(PythonInterp 3\.[0-9] REQUIRED/find_package\(PythonInterp 3.6 REQUIRED/' CMakeLists.txt
python3 -m pip install --user 'cython>=0.29.21,<3.0'
python3 -m pip install --user -c "requirements.txt" setuptools numpy scipy vtk
mkdir build
cd build
cp ../maintainer/configs/maxset.hpp myconfig.hpp
sed -i "/ADDITIONAL_CHECKS/d" myconfig.hpp
cmake .. -D CMAKE_BUILD_TYPE=Release -D WITH_CUDA=ON -D WITH_CCACHE=OFF -D WITH_SCAFACOS=OFF -D WITH_HDF5=OFF
make -j 4
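As an optional smoke test, not part of the original instructions, the build can be exercised through the pypresso wrapper that CMake generates in the build directory:

# optional check: import the module through the build-tree wrapper
./pypresso -c "import espressomd; print(espressomd.__file__)"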
Release 4.3:
# last update: December 2024
module load compiler/gnu/12.1 mpi/openmpi/4.1 devel/cuda/12.1
CLUSTER_FFTW3_VERSION=3.3.10
CLUSTER_BOOST_VERSION=1.82.0
CLUSTER_PYTHON_VERSION=3.12.4
export BOOST_ROOT="${HOME}/bin/boost_mpi_${CLUSTER_BOOST_VERSION//./_}"
export FFTW3_ROOT="${HOME}/bin/fftw_${CLUSTER_FFTW3_VERSION//./_}"
export CUDA_HOME="${CUDA_PATH}"
export CUDA_ROOT="${HOME}/bin/cuda_12_1"
export LD_LIBRARY_PATH="${BOOST_ROOT}/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH="${FFTW3_ROOT}/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}${CUDA_HOME}/targets/x86_64-linux/lib/stubs"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}${CUDA_ROOT}/lib"
"${HOME}/bin/cpython-${CLUSTER_PYTHON_VERSION}/bin/python" -m venv "${HOME}/venv"
source "${HOME}/venv/bin/activate"
git clone --recursive --branch python --origin upstream \
https://github.com/espressomd/espresso.git espresso-4.3
cd espresso-4.3
python3 -m pip install -c "requirements.txt" cython setuptools numpy scipy vtk cmake
mkdir build
cd build
cp ../maintainer/configs/maxset.hpp myconfig.hpp
sed -i "/ADDITIONAL_CHECKS/d" myconfig.hpp
cmake .. -D CUDAToolkit_ROOT="${CUDA_HOME}" \
-D CMAKE_BUILD_TYPE=Release -D ESPRESSO_BUILD_WITH_CUDA=ON \
-D ESPRESSO_BUILD_WITH_CCACHE=OFF -D ESPRESSO_BUILD_WITH_WALBERLA=ON \
-D ESPRESSO_BUILD_WITH_SCAFACOS=OFF -D ESPRESSO_BUILD_WITH_HDF5=OFF
make -j 10
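The same optional smoke test works for the development build, run inside the activated virtual environment:

# optional check: import the module through the build-tree wrapper
./pypresso -c "import espressomd; print(espressomd.__file__)"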
Submitting jobs¶
Batch command:
sbatch job.sh
Job script:
#!/bin/bash
#SBATCH --partition=cpu-single # Helix offers a variety of partitions
#SBATCH --job-name=test
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --output %j.stdout
#SBATCH --error %j.stderr
# last update: December 2024
module load compiler/gnu/12.1 mpi/openmpi/4.1 devel/cuda/12.1
CLUSTER_FFTW3_VERSION=3.3.10
CLUSTER_BOOST_VERSION=1.82.0
export BOOST_ROOT="${HOME}/bin/boost_mpi_${CLUSTER_BOOST_VERSION//./_}"
export FFTW3_ROOT="${HOME}/bin/fftw_${CLUSTER_FFTW3_VERSION//./_}"
export CUDA_HOME="${CUDA_PATH}"
export CUDA_ROOT="${HOME}/bin/cuda_12_1"
export LD_LIBRARY_PATH="${BOOST_ROOT}/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH="${FFTW3_ROOT}/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}${CUDA_HOME}/targets/x86_64-linux/lib/stubs"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}${CUDA_ROOT}/lib"
export PYTHONPATH="${HOME}/espresso-4.3/build/src/python${PYTHONPATH:+:$PYTHONPATH}"
source "${HOME}/venv/bin/activate"
mpiexec --bind-to core --map-by core python3 script.py
The desired partition needs to be specified via the #SBATCH --partition directive; without it, your job will not be allocated any resources. Helix has the following partitions available:
| Partition | Default Configuration | Limit |
|---|---|---|
|  | ntasks=1, time=00:10:00, mem-per-cpu=2gb | nodes=2, time=00:30:00 |
|  | ntasks=1, time=00:30:00, mem-per-cpu=2gb | nodes=1, time=120:00:00 |
|  | ntasks=1, time=00:30:00, mem-per-cpu=2gb | nodes=1, time=120:00:00 |
|  | nodes=2, time=00:30:00 | nodes=32, time=48:00:00 |
|  | nodes=2, time=00:30:00 | nodes=8, time=48:00:00 |
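The defaults in the table apply when the corresponding options are not set; individual options can also be overridden on the command line at submission time, for example (illustrative resource values, using the cpu-single partition from the job script above):

# hypothetical example: request 16 tasks for two hours on cpu-single without editing job.sh
sbatch --partition=cpu-single --ntasks=16 --time=02:00:00 job.sh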
The documentation recommends using the MPI-specific launcher, i.e. mpiexec or mpirun for OpenMPI, instead of SLURM's srun. The number of processes and the node information are automatically passed to the launcher.
When using srun instead of the MPI-specific launcher, if the job script loads python via module load, it is necessary to preload the SLURM shared objects, like so:
LD_PRELOAD=/usr/lib64/slurm/libslurmfull.so \
sbatch --partition=devel --nodes=2 --ntasks-per-node=2 job.sh
Otherwise, the following fatal error is triggered:
python3: error: plugin_load_from_file: dlopen(/usr/lib64/slurm/auth_munge.so): /usr/lib64/slurm/auth_munge.so: undefined symbol: slurm_conf
python3: error: Couldn't load specified plugin name for auth/munge: Dlopen of plugin file failed
python3: error: cannot create auth context for auth/munge
python3: fatal: failed to initialize auth plugin
Refer to the Helix Slurm documentation for more details on submitting job scripts on Helix.