Using GPUs can speed up code in a very impressive way. This is one of the directions we want to pursue for OpenSees.

Simple setup

We have seen from the profiling with Allinea that the function in which OpenSees spends most of its time is dgemm (the matrix-matrix multiplication routine provided by BLAS). It seems reasonable to assume that using a GPU implementation of BLAS may improve performance.

Based on previous experience, it is very easy to replace the standard BLAS implementation with the one provided by NVIDIA at runtime.
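
As a sketch of how such a runtime replacement can work (assuming NVIDIA's NVBLAS drop-in library, which intercepts Level-3 BLAS calls such as dgemm and routes them to cuBLAS; this may not be the exact mechanism used in the test below), a dynamically linked program can be pointed at the GPU simply by preloading the library. A minimal C++ test calling the standard Fortran BLAS symbol:

    // Minimal dgemm test. Built against the reference BLAS:
    //   g++ test_dgemm.cpp -o test_dgemm -lblas
    // Hypothetical GPU redirection via NVBLAS (assumed setup, not
    // necessarily the one used for the measurements below):
    //   NVBLAS_CONFIG_FILE=nvblas.conf LD_PRELOAD=libnvblas.so ./test_dgemm
    #include <vector>
    #include <cstdio>

    // Fortran BLAS symbol; a preloaded NVBLAS intercepts this same symbol.
    extern "C" void dgemm_(const char* transa, const char* transb,
                           const int* m, const int* n, const int* k,
                           const double* alpha, const double* a, const int* lda,
                           const double* b, const int* ldb,
                           const double* beta, double* c, const int* ldc);

    int main() {
        const int n = 1024;
        std::vector<double> a(n * n, 1.0), b(n * n, 2.0), c(n * n, 0.0);
        const double alpha = 1.0, beta = 0.0;
        // C = alpha*A*B + beta*C, column-major as in Fortran BLAS.
        dgemm_("N", "N", &n, &n, &n, &alpha, a.data(), &n, b.data(), &n,
               &beta, c.data(), &n);
        std::printf("c[0] = %f (expect %f)\n", c[0], 2.0 * n);
        return 0;
    }

Because the interception happens at the dynamic-linker level, OpenSees itself does not need to be recompiled.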

The 50K+ DOF Lamb's problem has been tested on Sung's machine, as it is equipped with a fairly powerful GPU (a GTX 970). Here I compare the runtime for this problem on 1, 2 and 4 cores, with and without the GPU.

Cores   Execution time (s)   Execution time with GPU (s)
1       308                  305
2       275                  269
4       512                  509

We notice that there is practically no difference between the execution times with and without the GPU.

I would tend to think that this is due to the memory access pattern seen in the profiling with Allinea MAP. This is a well-known limitation of GPU offloading: copying the data from main memory to the GPU, and the results back, is very slow relative to the computation itself.
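
To make that limitation concrete, here is a small hedged sketch (CUDA C++ with cuBLAS; not taken from the test setup above) that times the host-to-device copies separately from the multiplication itself. For matrices of the size a BLAS interception layer typically sees, the PCIe transfers can take time comparable to the dgemm they feed, which would explain why offloading brings no net gain:

    // Sketch: compare PCIe transfer time with GPU dgemm time.
    // Build: nvcc transfer_vs_compute.cu -lcublas -o transfer_vs_compute
    #include <cstdio>
    #include <vector>
    #include <cuda_runtime.h>
    #include <cublas_v2.h>

    int main() {
        const int n = 2048;
        const size_t bytes = (size_t)n * n * sizeof(double);
        std::vector<double> hA(n * n, 1.0), hB(n * n, 2.0), hC(n * n);

        double *dA, *dB, *dC;
        cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);

        cublasHandle_t handle;
        cublasCreate(&handle);

        cudaEvent_t t0, t1, t2;
        cudaEventCreate(&t0); cudaEventCreate(&t1); cudaEventCreate(&t2);

        cudaEventRecord(t0);
        // Host-to-device copies: the traffic an interception layer
        // pays on every offloaded call.
        cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);
        cudaEventRecord(t1);

        const double alpha = 1.0, beta = 0.0;
        cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                    &alpha, dA, n, dB, n, &beta, dC, n);
        cudaEventRecord(t2);
        cudaEventSynchronize(t2);

        float msCopy = 0, msGemm = 0;
        cudaEventElapsedTime(&msCopy, t0, t1);
        cudaEventElapsedTime(&msGemm, t1, t2);
        std::printf("copy: %.1f ms, dgemm: %.1f ms\n", msCopy, msGemm);

        cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost);
        cublasDestroy(handle);
        cudaFree(dA); cudaFree(dB); cudaFree(dC);
        return 0;
    }

Note that the timings are rough (the first cuBLAS call also carries some one-off initialization cost), but the relative magnitudes are what matter here.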

We will do some more investigation on this matter at a later stage.

 


1 Comment

  1. Dear Daniel,

    I am Fanjie Luo, a civil engineering student interested in HPC and numerical simulation.

    I think the problem is the double-precision performance of the GPU. The GTX 970 is a gaming card with high floating-point performance: 3.92 TFLOPS, but that is its single-precision (FP32) figure. Its double-precision (FP64) performance, which is what most scientific computation, including OpenSees, relies on, is only 1/32 of the single-precision figure, i.e. about 122.5 GFLOPS. (https://www.techpowerup.com/gpu-specs/geforce-gtx-970.c2620)

    The maximum theoretical CPU performance (I assumed an Intel i7-3770) is 3.4 GHz/core * 4 cores * 8 FP64 operations/cycle = 108.8 GFLOPS (FP64). In practice I usually get 80-100 GFLOPS, which is only slightly below the gaming GPU's double-precision performance. If you used a suitable compiler such as the Intel C++ compiler, or linked against the Intel Math Kernel Library, all the floating-point units of the CPU would be used automatically even with only one core specified. If too many threads are used, performance drops because of message passing between cores, but I don't think this is the main reason the performance dropped so sharply when four cores were used.

    If a compute-oriented GPU, such as an NVIDIA Tesla or an AMD Radeon compute card, were used, the double-precision performance would be 1/4 to 1/2 of the single-precision performance, and I think the computation time would drop sharply. Sorry, I haven't had a chance to prove it.

    Please tell me if I got anything wrong.

    Thank you very much.


    Yours Sincerely,

    Fanjie