What is HPC?

In this section, we will answer some common questions: what is HPC, why should I use it, and what can it do for me? We will also cover some quick terminology so that you are somewhat informed when moving on to the more complicated sections of this documentation.

High performance computing

According to the Advanced Research Computing Group at the USGS, high performance computing, or HPC as it is most often called, is “the practice of aggregating computing power in such a way that delivers much higher performance than one could get from a typical desktop or workstation”. To put that more plainly, it is the practice of gathering more than one computing device, e.g., CPUs, GPUs, etc., and using them together to perform a computational task, as opposed to trying to perform that task on a single CPU. HPC becomes unavoidable when your problem is too large to fit in the memory of a single computer, or takes too long to calculate on a single CPU or GPU.

Why should I use it?

As we mentioned in the previous section, HPC can be used when a specific computational task is not feasible on a single machine. In these cases, HPC can make an impossible calculation possible and save you months of research time.

Another slightly underappreciated reason for using HPC is your own learning. Parallel or distributed computation can be performed in many environments, oftentimes with existing infrastructure. By forcing yourself to use HPC machinery, you will not only learn more about computing, software management, and programming, but also how to make the most of distributed computations. These skills can be applied in a number of areas and do not always require hundreds of thousands of dollars of equipment; see, for example, the use cases of Raspberry Pi clusters. In a more typical office setting, it is possible to simply string together normal desktops on the same network and use them in a distributed fashion as a small compute cluster.

What can it do for me?

Compute clusters can do many things for you. Here is a short list of some of the most common use cases:

  • Running large simulations

  • Running hundreds of smaller simulations simultaneously

  • Performing distributed data analysis

  • Developing distributed software

and the list goes on…

In reality, this is a question that can be answered very quickly. If you have a task that would take more than two days to run locally, you can probably save yourself a lot of time by running it on a cluster instead. Not only that, but you won’t tie up all of your own computing resources while a job runs for a week.

Terminology shortlist

Finally, we cover some of the most basic terminology that you will want to know before reading the rest of the documentation.

Hardware

Physical devices that perform logic operations, store information, or transmit data, such as a CPU, GPU, RAM, SSD, interconnect, etc. If it exists physically in the real world, it is hardware.

Software

The code that people have written to use the hardware.

Node

These are essentially entire computers. They have an operating system, CPUs, sometimes GPUs, and can even be plugged into a display if you really need to.

CPU

A central processing unit, or microprocessor, is the main component of a node. These devices are made of many cores with a shared memory cache, and can communicate with accelerators, such as GPUs. A mainboard may hold more than one CPU.

Cores

Cores are the building blocks of CPUs. You will often hear the phrase “a 64-core node”. Typically, cores are what you reserve on a cluster, but you can also reserve complete nodes.

GPU

A graphics processing unit, or GPU, is a special piece of hardware initially designed for processing images. Nowadays, GPUs are routinely used for general matrix operations. If you can manage to make your problem run on a GPU, you can see very large speedups, in some cases up to 1000x.

Hyperthreading

Each physical core may expose two or more logical cores, also called “hyperthreads”. A physical core can still only execute one thread at any given instant. With hyperthreading enabled, two threads can be assigned to the two logical cores; when those logical cores belong to the same physical core, their execution is regularly suspended and resumed so that they share the hardware fairly (concurrent scheduling). The scheduler leverages downtime, e.g. cache misses, to hand the shared resources from one thread to the other, so that a calculation can proceed on one thread while the other waits for data to arrive. Not all calculations benefit from hyperthreading: bandwidth-limited problems may see around a 30% increase in performance, while other calculations can see a decrease in performance due to contention between the two threads, in which case you should disable hyperthreading in your job allocation script.
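
A quick way to see hyperthreading from the software side is to ask how many logical processors the operating system exposes. The following is a minimal C sketch, assuming a Linux-style system; on a hyperthreaded machine the number printed is typically twice the number of physical cores:

    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        /* Logical processors currently online; with hyperthreading enabled,
         * this is usually 2x the number of physical cores. */
        long logical = sysconf(_SC_NPROCESSORS_ONLN);
        printf("Logical processors visible to the OS: %ld\n", logical);
        return 0;
    }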

Multithreading

APIs such as OpenMP allow an application to run in parallel on several cores of the same CPU, with each core running its own thread. These APIs are often used to automatically split large vectors of data into equally sized chunks, so that a loop operating on the original vector actually operates on a smaller chunk (one chunk per thread). The runtime of the parallel part of the application can then be divided by roughly the number of threads allocated to the application.
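
To make this concrete, here is a minimal OpenMP sketch in C; the vector length and the operation on each element are placeholder assumptions. With the default static scheduling, each thread receives an equally sized chunk of the loop:

    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static double v[N];

        /* Split the iterations into equally sized chunks, one per thread. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++) {
            v[i] = 2.0 * i;   /* placeholder work on each element */
        }

        printf("processed %d elements with up to %d threads\n",
               N, omp_get_max_threads());
        return 0;
    }

Such a program is compiled with an OpenMP-capable compiler (e.g. gcc -fopenmp), and the number of threads can usually be controlled with the OMP_NUM_THREADS environment variable.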

Alternatively, when the work per iteration is uneven, so that execution time does not scale linearly with chunk size, dynamic scheduling can be used instead: more chunks are created than there are threads available, and chunks are handed to free threads until all of them have been processed. Each thread has to tell the scheduler when it has finished its chunk, which introduces some overhead.
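
Here is a sketch of dynamic scheduling, again in C with OpenMP; the expensive() function is an invented stand-in for uneven per-iteration work:

    #include <math.h>
    #include <omp.h>
    #include <stdio.h>

    #define N 100000

    /* Hypothetical workload whose cost varies with i, so chunks are uneven. */
    static double expensive(int i) {
        double s = 0.0;
        for (int k = 0; k < i % 1000; k++)
            s += sin((double)k);
        return s;
    }

    int main(void) {
        static double v[N];

        /* Chunks of 64 iterations are handed to whichever thread is free,
         * at the cost of some scheduling overhead. */
        #pragma omp parallel for schedule(dynamic, 64)
        for (int i = 0; i < N; i++)
            v[i] = expensive(i);

        printf("v[N-1] = %f\n", v[N - 1]);
        return 0;
    }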

MPI

The Message Passing Interface (MPI) is a standardized interface for exchanging data and coordinating work between processes, whether they run on cores of the same CPU, on another CPU on the same mainboard, or on another node. MPI (or something like it) becomes necessary when a calculation spans more than one node, because separate nodes do not share memory.
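
As an illustration, here is a minimal MPI sketch in C: every process (rank) contributes a value, and MPI_Reduce sums the contributions on rank 0. It is only meant to show the basic structure of an MPI program, not a real workload:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */

        /* Each rank contributes its own number; the sum lands on rank 0. */
        int local = rank, total = 0;
        MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum of ranks 0..%d = %d\n", size - 1, total);

        MPI_Finalize();
        return 0;
    }

Such a program is typically compiled with a wrapper like mpicc and started with a launcher such as mpirun (or through the cluster's scheduler), which decides how many processes run and on which nodes.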

MPI+X

MPI is frequently combined with a second level of parallelism, hence “MPI+X”. For example, MPI can exchange data between CPUs (or nodes) while the cores within each CPU share work through OpenMP threads (MPI+OpenMP), or while each MPI process offloads its computation to a GPU through CUDA (MPI+CUDA).
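
A minimal MPI+OpenMP sketch in C, assuming an MPI library with thread support; each MPI process spawns OpenMP threads and each thread reports who it is:

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        /* Ask MPI for a thread level where only the main thread makes MPI calls. */
        int provided;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* MPI handles communication between processes (the "MPI" part);
         * OpenMP parallelizes the work inside each process (the "X" part). */
        #pragma omp parallel
        printf("MPI rank %d, OpenMP thread %d of %d\n",
               rank, omp_get_thread_num(), omp_get_num_threads());

        MPI_Finalize();
        return 0;
    }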