Loading Modules

Clusters often provide software through modules. A module can be something quite large, like a scientific package such as GROMACS, CP2K or LAMMPS, or something more fundamental, such as a compiler (gcc), an interpreter (python) or a library (cuda, BLAS). To use software on a cluster you will need to load these modules, and often their dependencies. The best way to learn is by doing, so let's work through an example many users will come across: loading GROMACS and Python.

Keep in mind that all of this will be done on the head node, i.e., the node you land on when you first ssh onto the cluster. This is NOT where your main work should be done; ideally, avoid running anything on this node other than job submission. It is, however, a useful place to work out which modules you need to load. When you are ready to submit a job with the appropriate submission script, check out the Submitting Jobs section of the documentation. What follows is designed to help you develop a deeper understanding of how modules work on the cluster. Also check the guide for each cluster on this wiki for its peculiarities.

Loading installed software

Use module load name1 [name2...] to load packages at their default versions, or module load name1/version1 [name2/version2...] to load packages at pinned versions (recommended). Pinning version numbers is good practice because default versions are not always compatible with one another. It also makes build environments and simulation results reproducible.
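A minimal sketch of pinned loading, assuming a bash shell where module is provided by Lmod or Environment Modules. The check_pinned helper and the PINNED_MODULES array are hypothetical names used only for illustration, and the versions are borrowed from the GROMACS example in this guide, so they may not exist on your cluster.

```shell
#!/usr/bin/env bash
# Hypothetical pinned module list; versions borrowed from the GROMACS example,
# adjust them to what module spider reports on your cluster.
PINNED_MODULES=(gcc/8.4.0 openmpi/4.0.5)

# Return non-zero (and name the offender) if any entry lacks a "/version" part.
check_pinned() {
  local m
  for m in "$@"; do
    if [[ "$m" != */?* ]]; then
      echo "not pinned: $m" >&2
      return 1
    fi
  done
  return 0
}

# Only call module if it exists in this shell (it is usually a shell function).
if check_pinned "${PINNED_MODULES[@]}" && command -v module >/dev/null 2>&1; then
  module load "${PINNED_MODULES[@]}"
fi
```

Refusing unpinned names up front means a script never silently picks up whatever the current default happens to be.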

Use module list to show currently loaded packages, module unload name to unload a package and module purge to unload all packages.
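These commands are often used together, for example at the start of a session or script to get a predictable environment. The sketch assumes a bash shell with a module system initialised; reset_modules is a hypothetical helper name and the versions are placeholders taken from the GROMACS example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: clean slate, pinned loads, inspect, then drop one package.
reset_modules() {
  module purge                         # unload all packages
  module load gcc/8.4.0 openmpi/4.0.5  # load pinned versions (placeholders)
  module list                          # show what is currently loaded
  module unload openmpi                # unload a single package by name
}

# module only exists where Lmod or Environment Modules has been initialised.
if command -v module >/dev/null 2>&1; then
  reset_modules
fi
```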

Modules are controlled by “module files”, which specify dependencies, actions to carry out during the load (e.g. setting environment variables), and actions to carry out during unload (e.g. restoring environment variables). Module files can be provided by Spack [Gamblin et al., 2015], EasyBuild [Geimer et al., 2014, Hoste et al., 2012], or EESSI [Dröge et al., 2023]. On clusters that provide multiple package managers, it is necessary to disambiguate package names. This is usually achieved by having users load a specific package manager such as Spack, which makes its modules visible to module spider.
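To make the idea of load and unload actions concrete, here is a plain-bash sketch of the kind of change a module file makes to PATH. It is not how Lmod or Environment Modules are implemented: the function names only mirror the prepend_path and remove_path functions available in Lmod's Lua module files, and /opt/demo/bin is a hypothetical directory.

```shell
#!/usr/bin/env bash
# Plain-bash stand-ins for module file actions (NOT Lmod's implementation).

# "Load" action: put a directory at the front of PATH.
prepend_path() {
  export PATH="$1${PATH:+:$PATH}"
}

# "Unload" action: rebuild PATH without that directory.
remove_path() {
  local dir="$1" entry new=""
  local -a parts
  IFS=':' read -ra parts <<< "$PATH"
  for entry in "${parts[@]}"; do
    [[ "$entry" == "$dir" ]] && continue
    new="${new:+$new:}$entry"
  done
  export PATH="$new"
}

prepend_path /opt/demo/bin   # roughly what loading a hypothetical "demo" module does
echo "$PATH"
remove_path /opt/demo/bin    # roughly what unloading it undoes
echo "$PATH"
```

A real module system also handles edge cases this sketch ignores, such as a directory that was already on PATH before the load.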

Loading EESSI software

Use the following commands to load pre-built software from EESSI [Dröge et al., 2023]:

source /cvmfs/software.eessi.io/versions/2023.06/init/bash
module load GCC/13.2.0 OpenMPI/4.1.6-GCC-13.2.0 CUDA/12.1.1 \
       Python/3.11.5-GCCcore-13.2.0 ESPResSo/4.2.2-foss-2023b pyMBE/0.8.0
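On a node where CVMFS is not mounted, the source command above fails. One hedged way to guard against that in a script is to check that the init script is readable first. eessi_available is a hypothetical helper, and only a subset of the modules listed above is loaded to keep the example short.

```shell
#!/usr/bin/env bash
# Init script path copied from the EESSI example above.
EESSI_INIT=/cvmfs/software.eessi.io/versions/2023.06/init/bash

# Hypothetical helper: succeed only if the init script can be read,
# i.e. the EESSI repository is reachable under /cvmfs on this node.
eessi_available() {
  [[ -r "${1:-$EESSI_INIT}" ]]
}

if eessi_available; then
  source "$EESSI_INIT"
  module load GCC/13.2.0 Python/3.11.5-GCCcore-13.2.0
else
  echo "EESSI is not available under /cvmfs on this node" >&2
fi
```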

Case: GROMACS and Python

Let’s consider the general case where you want to run a GROMACS simulation followed by a Python analysis script that uses MDAnalysis. Again, we will not discuss how to run these programs, but rather how to prepare an environment capable of running them. The first step is to identify which modules must be loaded in order to use the software. This is done with the module spider command, like so:

> module spider gromacs

-------------------------------------------------------------------------------------
gromacs:
-------------------------------------------------------------------------------------
  Versions:
     gromacs/2020.4
     gromacs/2021.3

What this tells us is that there is more than one version of GROMACS on the cluster, and that to find out how to load one of them we should give module spider the specific version. For this case, let’s stick with the 2020 version:

> module spider gromacs/2020.4

-------------------------------------------------------------------------------------
gromacs: gromacs/2020.4
-------------------------------------------------------------------------------------

 You will need to load all module(s) on any one of the lines below before the "gromacs/2020.4" module is available to load.

   gcc/8.4.0  openmpi/4.0.5

 Help:
   GROMACS (GROningen MAchine for Chemical Simulations) is a molecular
   dynamics package primarily designed for simulations of proteins, lipids
   and nucleic acids. It was originally developed in the Biophysical
   Chemistry department of University of Groningen, and is now maintained
   by contributors in universities and research centers across the world.
   GROMACS is one of the fastest and most popular software packages
   available and can run on CPUs as well as GPUs. It is free, open source
   released under the GNU General Public License. Starting from version
   4.6, GROMACS is released under the GNU Lesser General Public License.

So, in order to load gromacs/2020.4, we must first load gcc/8.4.0 and openmpi/4.0.5. Let’s do this and see what happens.

> module load gcc/8.4.0
> module load openmpi/4.0.5
> module load gromacs/2020.4
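For a submission script it can help to wrap these loads in one function that starts from a clean environment and loads the modules in dependency order. This is a sketch assuming a bash job script with the same module system available on the compute nodes; load_gromacs_env and GROMACS_MODULES are hypothetical names.

```shell
#!/usr/bin/env bash
# Versions as found with module spider above; order matters because
# gcc and openmpi must be loaded before gromacs becomes available.
GROMACS_MODULES=(gcc/8.4.0 openmpi/4.0.5 gromacs/2020.4)

# Hypothetical helper: clean environment, then load in dependency order.
load_gromacs_env() {
  module purge || return 1
  module load "${GROMACS_MODULES[@]}" || return 1   # stop early on a reported error
  module list
}

if command -v module >/dev/null 2>&1; then
  load_gromacs_env
fi
```

Calling load_gromacs_env near the top of a job script, before GROMACS is run, reproduces the environment worked out here on the head node.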