This example implements the Gram-Schmidt orthogonalization method for dense (fully occupied) matrices. For sparse matrices the Householder method is typically more efficient, but the Gram-Schmidt method is still widely used for dense matrices because of its robustness. Its structure also makes it easy to implement on GPUs using CUDA.
Code example Gram-Schmidt method (23 KB)
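To give an impression of how the method maps onto CUDA, the following is a minimal sketch and not the downloadable code itself. It assumes the modified Gram-Schmidt variant, a column-major `n x m` matrix with linearly independent columns, and a single thread block that does all the work. The names `blockDot`, `gramSchmidtKernel`, `gramSchmidt` and `THREADS` are made up for this illustration, and error checking of the CUDA calls is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int THREADS = 256;  // threads in the single block (must be a power of two)

// Block-wide scalar product of two length-n vectors.
// Every thread of the block receives the full result.
__device__ double blockDot(const double* x, const double* y, int n, double* scratch)
{
    double partial = 0.0;
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        partial += x[i] * y[i];
    scratch[threadIdx.x] = partial;
    __syncthreads();
    // Tree reduction in shared memory.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            scratch[threadIdx.x] += scratch[threadIdx.x + stride];
        __syncthreads();
    }
    double result = scratch[0];
    __syncthreads();  // scratch is reused by the next call
    return result;
}

// Modified Gram-Schmidt, in place, on a column-major n x m matrix.
// Assumption: launched with exactly one block of THREADS threads.
__global__ void gramSchmidtKernel(double* a, int n, int m)
{
    __shared__ double scratch[THREADS];
    for (int k = 0; k < m; ++k) {
        double* qk = a + (size_t)k * n;
        // Normalize column k.
        double norm = sqrt(blockDot(qk, qk, n, scratch));
        for (int i = threadIdx.x; i < n; i += blockDim.x)
            qk[i] /= norm;
        __syncthreads();
        // Remove the q_k component from all remaining columns.
        for (int j = k + 1; j < m; ++j) {
            double* aj = a + (size_t)j * n;
            double r = blockDot(qk, aj, n, scratch);
            for (int i = threadIdx.x; i < n; i += blockDim.x)
                aj[i] -= r * qk[i];
            __syncthreads();
        }
    }
}

// Host helper (hypothetical): orthonormalizes the columns of a in place.
void gramSchmidt(std::vector<double>& a, int n, int m)
{
    double* d_a = nullptr;
    size_t bytes = a.size() * sizeof(double);
    cudaMalloc(&d_a, bytes);
    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
    gramSchmidtKernel<<<1, THREADS>>>(d_a, n, m);
    cudaMemcpy(a.data(), d_a, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_a);
}
```

Calling `gramSchmidt(a, n, m)` on a host vector holding the matrix column by column overwrites it with orthonormal columns. Using only one block keeps the synchronization simple, since `__syncthreads()` is enough, but it leaves most of the GPU idle. A version that spreads the work over many blocks needs a way to combine results between blocks, which is the topic of the scalar product example.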
Because of its very small computational cost, the scalar product on its own is certainly not suited for porting to a GPU. However, if a larger algorithm that requires a scalar product is ported to a GPU, the scalar product has to be ported as well. Otherwise the data would have to be transferred back to the CPU, which is even more time-consuming.
This example demonstrates several possible strategies. It also serves as a good example of how atomic operations are used and how they can be avoided, e.g. on older hardware. Some of the examples deliberately do not work, in order to show the pitfalls of massively parallel computation and to demonstrate that atomic operations are really necessary. Note that atomic operations are rather slow, so in this example no speed-up is gained by using them. However, they make the code considerably shorter and easier to read, which is important in a scientific environment where code is continuously developed further.
Code example scalar product (17 KB)
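Two of the strategies discussed above can be sketched as follows. This is an illustration under stated assumptions, not the downloadable code: single precision is used because `atomicAdd` on `float` is available on far more hardware than the `double` version, and the names `blockPartialDot`, `dotAtomic`, `dotPartials`, `dotOnGpu` and `THREADS` are invented for this sketch. Error checking of the CUDA calls is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int THREADS = 256;  // threads per block (must be a power of two)

// Each block sums its share of x[i]*y[i] in shared memory and
// returns the block's partial sum.
__device__ float blockPartialDot(const float* x, const float* y, int n, float* scratch)
{
    float partial = 0.0f;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        partial += x[i] * y[i];
    scratch[threadIdx.x] = partial;
    __syncthreads();
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            scratch[threadIdx.x] += scratch[threadIdx.x + stride];
        __syncthreads();
    }
    return scratch[0];
}

// Strategy A: one atomicAdd per block combines the partial sums.
// A plain "*result += sum" here would be a data race between blocks
// and would silently lose contributions.
__global__ void dotAtomic(const float* x, const float* y, int n, float* result)
{
    __shared__ float scratch[THREADS];
    float sum = blockPartialDot(x, y, n, scratch);
    if (threadIdx.x == 0)
        atomicAdd(result, sum);
}

// Strategy B: no atomics. Each block writes to its own slot,
// and the few partial sums are added up in a second step.
__global__ void dotPartials(const float* x, const float* y, int n, float* partials)
{
    __shared__ float scratch[THREADS];
    float sum = blockPartialDot(x, y, n, scratch);
    if (threadIdx.x == 0)
        partials[blockIdx.x] = sum;
}

// Host helper (hypothetical) that runs either strategy.
float dotOnGpu(const std::vector<float>& x, const std::vector<float>& y, bool useAtomic)
{
    const int n = static_cast<int>(x.size());
    const int blocks = 32;
    float *d_x = nullptr, *d_y = nullptr, *d_out = nullptr;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_y, n * sizeof(float));
    cudaMalloc(&d_out, blocks * sizeof(float));
    cudaMemcpy(d_x, x.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, blocks * sizeof(float));  // the accumulator must start at zero

    if (useAtomic)
        dotAtomic<<<blocks, THREADS>>>(d_x, d_y, n, d_out);
    else
        dotPartials<<<blocks, THREADS>>>(d_x, d_y, n, d_out);

    std::vector<float> out(blocks);
    cudaMemcpy(out.data(), d_out, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_y);
    cudaFree(d_out);

    // Strategy A leaves the result in out[0] and zeros elsewhere;
    // strategy B leaves one partial sum per block.
    float total = 0.0f;
    for (float p : out)
        total += p;
    return total;
}
```

In this sketch the final handful of partial sums is added on the host only to keep the example short. Inside a larger GPU algorithm one would rather finish the sum with a second small kernel, so that the result never leaves the device. Note also that the order in which `atomicAdd` combines the block results is not fixed, so the floating-point result of strategy A may differ slightly from run to run.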