8 Commits

Author SHA1 Message Date
Francois Coppens
7594e15576 Improved memory allocation on the GPU. 2022-10-10 11:01:53 +02:00
Francois Coppens
bba5cf5f2c Improved version.
- All static arrays replaced by dynamic ones
- Overhead from the checks run before and after the kernels reduced as much as possible by using MKL DGEMM calls.
- Solved bugs due to dimension mismatches.

Overhead time is dramatically reduced because there are no more calls to the naive 'matmul'.
2022-10-02 10:20:11 +02:00
Francois Coppens
5a61ccc6b1 Added cuSOLVER and replaced LAPACKE_dgetrf with cusolverDnDgetrf. 2022-09-23 18:57:54 +02:00
Francois Coppens
00bdcba230 cuBLAS version of Woodbury KxK is working, but the calls to LAPACKE dgetrf/dgetri need to be replaced with cuSOLVER calls so that intermediate results no longer have to be transferred to/from the device. 2022-09-22 14:37:00 +02:00
François Coppens
892358d0d1 Replaced all CBLAS dgemms with cuBLAS dgemms and dgeams. Works but not ideal. 2022-09-09 17:15:12 +02:00
François Coppens
87e319189e - Got rid of NVC compiler warnings
- Included lib paths for MKL/HDF5 and cuBLAS
- Cleaned Makefile
- Added GPU node session request script
2022-07-22 11:34:29 +02:00
Francois Coppens
ebe38e79e3 Added cuBLAS offloaded kernel for Woodbury KxK 2022-07-21 12:21:51 +02:00
Francois Coppens
732045284a Added independent test harness, written in C. It has its own Makefile and datasets. It is completely independent of the main tree. 2022-07-11 14:48:59 +02:00