Francois Coppens
2d5a34faed
Trivial change
2022-10-17 15:26:30 +02:00
Francois Coppens
90bc5090c2
Trivial change
2022-10-17 15:15:16 +02:00
Francois Coppens
6bb95f068d
- Restructured tree
- Added NVTX annotations to GPU kernel.
2022-10-17 14:56:32 +02:00
Francois Coppens
7594e15576
Improved memory allocation on the GPU.
2022-10-10 11:01:53 +02:00
Francois Coppens
bba5cf5f2c
Improved version.
- All static arrays replaced by dynamic ones
- Checks run before and after the kernels now use MKL DGEMM calls wherever possible, to cut their overhead.
- Solved bugs due to dimension mismatches.
Overhead time is dramatically reduced because there are no more calls to the naive 'matmul'.
2022-10-02 10:20:11 +02:00
Francois Coppens
15f959099d
Cleanup
2022-10-02 10:20:11 +02:00
Francois Coppens
c0d21dd9af
Various
2022-10-02 10:20:11 +02:00
Francois Coppens
a63b1289d4
Cleanup: consolidated some pragmas.
2022-09-27 11:11:54 +02:00
Francois Coppens
4e7a334b78
- LAPACKE_dgetrf/ri replaced with cusolverDnDgetrf/rs.
- Solved sign bug in computation of determinant.
Most code is now executed on the device. Some OpenMP pragmas can be consolidated.
2022-09-26 17:06:50 +02:00
Francois Coppens
5a61ccc6b1
Added cuSOLVER and replaced LAPACKE_dgetrf with cusolverDnDgetrf.
2022-09-23 18:57:54 +02:00
Francois Coppens
00bdcba230
cuBLAS version of Woodbury KxK is working, but the calls to LAPACKE dgetrf/ri need to be replaced with cuSOLVER calls so that intermediate results no longer have to be transferred to/from the device.
2022-09-22 14:37:00 +02:00
François Coppens
892358d0d1
Replaced all CBLAS dgemms with cuBLAS dgemms and dgeams. Works but not ideal.
2022-09-09 17:15:12 +02:00
François Coppens
87e319189e
- Got rid of NVC compiler warnings
- Included lib paths for MKL/HDF5 and cuBLAS
- Cleaned Makefile
- Added GPU node session request script
2022-07-22 11:34:29 +02:00
Francois Coppens
fa03590f6f
Resolved some warnings of icx
2022-07-21 13:57:28 +02:00
Francois Coppens
ebe38e79e3
Added cuBLAS offloaded kernel for Woodbury KxK
2022-07-21 12:21:51 +02:00
Francois Coppens
f35ad6a777
Small bugfix in qmckl_slagel_splitting()
2022-07-21 08:16:25 +02:00
Francois Coppens
0a083e2875
Added first version of K x K Woodbury kernel using only CBLAS and LAPACK calls
2022-07-20 19:09:55 +02:00
Francois Coppens
732045284a
Added independent test harness, written in C. It has its own Makefile and datasets. It is completely independent of the main tree.
2022-07-11 14:48:59 +02:00