Commit Graph

3 Commits

Author SHA1 Message Date
Francois Coppens
7594e15576 Improved memory allocation on the GPU. 2022-10-10 11:01:53 +02:00
Francois Coppens
bba5cf5f2c Improved version.
- All static arrays replaced by dynamic ones
- Checks run before and after the kernels now use MKL DGEMM calls wherever possible, removing most of the overhead they induced.
- Fixed bugs caused by dimension mismatches.

Overhead time is dramatically reduced because the naive 'matmul' is no longer called.
2022-10-02 10:20:11 +02:00
François Coppens
892358d0d1 Replaced all CBLAS dgemms with cuBLAS dgemms and dgeams. Works but not ideal. 2022-09-09 17:15:12 +02:00