hoffer
d4f0ccee3b
Add cuBLAS batched DGEMM
2022-04-08 10:44:48 +02:00
hoffer
69b9e0fb89
Add cublas batched
2022-04-07 18:44:59 +02:00
Max Hoffer
cba6477e4a
Merge pull request #1 from TREX-CoE/gpu
Gpu
2022-04-06 17:17:45 +02:00
Max Hoffer
7aad2a79a2
Merge branch 'gpu' into gpu
2022-04-06 17:17:16 +02:00
9cef7048d3
Fix CI
2022-04-06 17:10:23 +02:00
hoffer
fe277b7a6e
OK for OpenMP and cuBLAS
2022-04-06 17:04:00 +02:00
88e8404b2a
Merge branch 'gpu' of github.com:TREX-CoE/qmckl into gpu
2022-04-06 16:38:19 +02:00
cc5f6914f6
Cleaning
2022-04-06 16:26:35 +02:00
hoffer
3b5221531c
Add openmp and cublas
2022-04-06 16:20:29 +02:00
4224991b12
Merge pull request #74 from GianFree/jastrow_hpc
Jastrow hpc
2022-04-06 16:06:19 +02:00
Gianfranco Abrusci
ff6d2e17f2
Merge branch 'gpu' into jastrow_hpc
2022-04-06 14:13:24 +02:00
Gianfranco Abrusci
b79a23897d
qmckl_compute_een_rescaled_e_hpc (C version) working
2022-04-06 14:01:13 +02:00
6c7634038f
Improve configure
2022-04-06 13:48:37 +02:00
0d5d14b8e4
Fix openacc
2022-04-06 11:51:36 +02:00
hoffer
39bcc569e0
Start implementing cublas
2022-04-06 11:16:17 +02:00
0966e1e2b1
Fix OpenACC
2022-04-06 10:42:00 +02:00
2323
72fad819bf
Fix flags
2022-04-06 10:03:56 +02:00
2323
f02e761b79
Fixed configure.ac for GPUs
2022-04-05 19:31:11 +02:00
2323
08f01ece89
Fix configure
2022-04-05 17:57:56 +02:00
0489831e18
Simplified configure
2022-04-05 17:06:29 +02:00
a3a1cc6428
Merge branch 'gpu' of github.com:TREX-CoE/qmckl into gpu
2022-04-05 16:52:43 +02:00
c3424216de
Fix info
2022-04-05 16:52:35 +02:00
Aurélien Delval
63c7f8ea72
Replace placeholder cuBLAS kernels with new C HPC implementation
2022-04-05 16:29:52 +02:00
f8e6d5f06b
Merge pull request #72 from PurplePachyderm/master
Merge in-progress work of GPU ports
2022-04-05 14:44:30 +02:00
Aurélien Delval
0ce0a93522
Fix preprocessor else and remove old cuBLAS interface
2022-04-05 14:37:57 +02:00
Aurélien Delval
eb71a752f5
Fixed naive GPU kernels and ignored variable issue
2022-04-05 14:28:35 +02:00
Gianfranco Abrusci
586eb92801
compute_cord_vect_full done
2022-04-05 14:23:20 +02:00
Aurélien Delval
bc43113b6f
Merge branch 'gpu' into master
2022-04-05 11:46:12 +02:00
2f26ccd4f0
Merge branch 'gpu' of github.com:TREX-CoE/qmckl into gpu
2022-04-05 11:45:11 +02:00
94035929e4
Fixed cppcheck
2022-04-05 11:45:02 +02:00
c7dd46da05
Fixed cppcheck
2022-04-05 11:44:17 +02:00
Aurélien Delval
0e43d33a1d
Merge branch 'gpu' into master
2022-04-05 11:39:16 +02:00
6fb261d635
warnings
2022-04-05 11:15:42 +02:00
731fded4a8
warnings
2022-04-05 11:03:30 +02:00
Aurélien Delval
98097e8fa7
Convert GPU implementations to C
TODO: Fix the naive implementation, which seems to be incorrect (probably an
indexing issue)
2022-04-05 11:02:08 +02:00
hoffer
508b294190
Fix flag for nvc and nvfortran
2022-04-05 10:07:25 +02:00
511eba5843
Fixed dgemm bug
2022-04-05 09:56:13 +02:00
bcdbc49d5f
Cleaning
2022-04-04 23:53:58 +02:00
dd045452f6
Fixed documentation
2022-04-04 17:30:38 +02:00
2a13d8e18d
Merge branch 'gpu' of github.com:TREX-CoE/qmckl into gpu
2022-04-04 16:56:39 +02:00
1f9ea610d4
Moved C version of Jastrow into HPC
2022-04-04 16:56:33 +02:00
hoffer
31a05c47e2
Add flags for nvc and nvfortran to support offload
2022-04-04 12:41:00 +02:00
Aurélien Delval
84013a5f76
Cleanup before merging into QMCkl's GPU branch
2022-04-04 12:12:11 +02:00
7e56b3e2ed
Merge branch 'master' into gpu
2022-04-04 12:11:57 +02:00
bac1eb33f0
Fixed configure for NVIDIA compilers
2022-04-04 12:11:26 +02:00
9f03c32e20
Merge pull request #70 from GianFree/jastrow_c
Jastrow c
2022-04-04 11:55:33 +02:00
Gianfranco Abrusci
35e15205df
Merge branch 'master' into jastrow_c
2022-04-04 11:22:17 +02:00
Aurélien Delval
1173bb2586
Update configure.ac with cuBLAS support
(forgotten in last commit)
2022-04-01 17:56:27 +02:00
Aurélien Delval
26bbd6f341
Start work on cuBLAS implementation
TODO: Replace CPU BLAS calls with cuBLAS calls (will probably require writing a Fortran interface to the functions we're interested in, at least the DGEMMs)
2022-04-01 09:19:56 +02:00
Aurélien Delval
9428eaa19e
Implement computation of tmp_c and dtmp_c in OpenACC
These 2 kernels seem to give good speedup compared to the CPU BLAS
versions. However, the current GPU implementation of factor_een_deriv seems to
be slightly slower (on the tested machine).
TODO:
- Try to improve the factor_een_deriv GPU implementation
- Try out a cuBLAS implementation of tmp_c and dtmp_c
2022-03-30 16:16:06 +02:00