These 2 kernels seem to give good speedup compared to the CPU BLAS
versions. However, the current GPU implementation of factor_een_deriv seems to
be slightly slower (on the tested machine).
TODO:
- Try to improve factor_een_deriv GPU implem
- Try out a cuBLAS implementation of tmp_c and dtmp_c
TODO Start modifying dedicated function to implement offloading
Also, as of now, Fortran preprocessor flags should be passed manually,
we need to manage this in the configure.ac in the future. For now, when
using gfortran, you should pass FCFLAGS="-cpp -DWITH_OPENMP_OFFLOAD" to
enable offloading.
This system adds an additional field to the QMCkl context to store the
offload mode currently in use for each kernel (in this commit, this has
been implemented for Jastrow as an example). This will be useful to test
different offloading versions that can be easily toggled on/off at
compilation and at runtime.
* Fix bugs in the .yml file
* Fix test step in the GitHub CI
When using out of source build all `make` rules should be executed in the corresponding directory (e.g. `_build`)
* Add CI step to test TREXIO on MacOS
* Explicitly provide gcc-10 and gfortran-10 to configure
Recent HPC-related additions break the current build (make) process. This is because the HPC-related functions are not wrapped in the preprocessor ifdef statement.