diff --git a/README.html b/README.html index 0555276..c89740f 100644 --- a/README.html +++ b/README.html @@ -3,7 +3,7 @@ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
- ++The latest version of QMCkl can be downloaded +here, and the source code is accessible on the +GitHub repository. +
+
+QMCkl is built with GNU Autotools, so the usual
+configure ; make ; make check ; make install
scheme will be used.
+
+As usual, the C compiler can be specified with the CC
variable
+and the Fortran compiler with the FC
variable. The compiler
+options are defined using CFLAGS
and FCFLAGS
.
+
+To compile from the source repository, additional dependencies are +required to generate the source files: +
+
+When the repository is downloaded, neither the Makefile nor the
+configure script has been generated yet. ./autogen.sh
has
+to be executed first.
+
The qmckl.h
header file installed in the ${prefix}/include
directory
has to be included in C codes when QMCkl functions are used:
include/
directory.
In a traditional source code, most of the lines of source files of a program
are code, scripts, Makefiles, and only a few lines are comments explaining
@@ -432,12 +486,17 @@ the command-line in the Makefile
, and then the produced files are c
Moreover, within the Emacs text editor the source code blocks can be executed
interactively, in the same spirit as Jupyter notebooks.
+Note that Emacs is not needed for end users because the distributed +tarball contains the generated source code. +
For a tutorial on literate programming with org-mode, follow this link.
@@ -467,46 +526,62 @@ org-mode.-Most of the codes of the TREX CoE are written in Fortran with some scripts in -Bash and Python. Outside of the CoE, Fortran is also important (Casino, Amolqc), -and other important languages used by the community are C and C++ (QMCPack, -QWalk), and Julia is gaining in popularity. The library we design should be -compatible with all of these languages. The QMCkl API has to be compatible -with the C language since libraries with a C-compatible API can be used in -every other language. +Most of the codes of the TREX CoE are written in Fortran with some +scripts in Bash and Python. Outside of the CoE, Fortran is also +important in QMC codes (Casino, Amolqc), and other important +languages used by the community are C and C++ (QMCPack, QWalk), +while Julia and Rust are gaining in popularity. We want QMCkl to be +compatible with all of these languages, so the QMCkl API has to be +compatible with the C language since libraries with a C-compatible +API can be used in every other language.
-High-performance versions of the QMCkl, with the same API, will be rewritten by -the experts in HPC. These optimized libraries will be tuned for specific -architectures, among which we can cite x86 based processors, and GPU -accelerators. Nowadays, the most efficient software tools to take advantage of -low-level features of the processor (intrinsics) and of GPUs are for C++ -developers. It is highly probable that the optimized implementations will be -written in C++, and this is agreement with our choice to make the API -C-compatible. +High-performance versions of QMCkl, with the same API, can be +rewritten by HPC experts. These optimized libraries will be tuned +for specific architectures, among which we can cite x86 based +processors, and GPU accelerators. Nowadays, the most efficient +software tools to take advantage of low-level features +(intrinsics, prefetching, aligned or pinned memory allocation, +…) are for C++ developers. It is highly probable that optimized +implementations will be written in C++, but as the API is +C-compatible this doesn't pose any problem for linking the library +in other languages.
-Fortran is one of the most common languages used by the community, and is simple -enough to make the algorithms readable both by experts in QMC, and experts in -HPC. Hence we propose in this pedagogical implementation of QMCkl to use Fortran -to express the QMC algorithms. As the main languages of the library is C, this -implies that the exposed C functions call the Fortran routine. However, for -internal functions related to system programming, the C language is more natural -than Fortran. +Fortran is one of the most common languages used by the community, +and is simple enough to make the algorithms readable both by +experts in QMC, and experts in HPC. Hence we propose in this +pedagogical implementation of QMCkl to use Fortran to express the +QMC algorithms. However, for internal functions related to system +programming, the C language is more natural than Fortran.
-The Fortran source files should provide a C interface using the
-iso_c_binding
module. The name of the Fortran source files should end with
-_f.f90
to be properly handled by the Makefile
. The names of the functions
-defined in Fortran should be the same as those exposed in the API suffixed by
-_f
.
+As QMCkl appears like a C library, for each Fortran function there
+is an iso_c_binding
interface to make the Fortran function
+callable from C. It is this C interface which is exposed to the
+user. As a consequence, Fortran users of the library never
+call the Fortran routines directly. They call the C binding
+function instead, so an iso_c_binding
 interface is still required:
+
+ ISO_C_BINDING ISO_C_BINDING +Fortran ---------------> C ---------------> Fortran ++ +
+ The name of the Fortran source files should end with _f.f90
to
+be properly handled by the Makefile
and to avoid collision of
+object files (*.o
) with the compiled C source files. The names
+of the functions defined in Fortran should be the same as those
+exposed in the API suffixed by _f
.
@@ -516,9 +591,9 @@ For more guidelines on using Fortran to generate a C interface, see
The authors should follow the recommendations of the C99
SEI+CERT C Coding Standard.
@@ -530,14 +605,16 @@ Compliance can be checked with cppcheck
as:
cppcheck --addon=cert --enable=all *.c &> cppcheck.out
+# or
+make cppcheck ; cat cppcheck.out
The proposed API should allow the library to: deal with memory transfers between CPU and accelerators, and to use different levels of floating-point @@ -547,9 +624,9 @@ functions (see below).
To avoid namespace collisions, we use qmckl_
as a prefix for all exported
functions and variables. All exported header files should have a file name
@@ -562,10 +639,6 @@ produced C files should be xxx.c
and xxx.h
and the nam
produced Fortran file should be xxx.f90
.
-Arrays are in uppercase and scalars are in lowercase. -
-In the names of the variables and functions, only the singular form is allowed. @@ -573,9 +646,9 @@ form is allowed.
In the C language, the number of bits used by the integer types can change from one architecture to another one. To circumvent this problem, we choose to @@ -606,15 +679,15 @@ bindings in other languages in other repositories.
Global variables should be avoided in the library, because it is
possible that one single program needs to use multiple instances
of the library. To solve this problem we propose to use a pointer
to a context
variable, built by the library with the
-qmckl_context_create
function. The =context= contains the global
+qmckl_context_create
 function. The context contains the global
state of the library, and is used as the first argument of many
QMCkl functions.
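The context pattern described above can be sketched as follows. This is only an illustration: the `demo_*` names and the single `precision` field are hypothetical and are not the QMCkl definitions. It shows how state that would otherwise be global lives in an object passed as the first argument, so that several independent instances can coexist in one program.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a library context (NOT the QMCkl type). */
typedef struct demo_context {
  int32_t precision;   /* example of state that would otherwise be global */
} demo_context;

/* Build a new, independent context; returns NULL if allocation fails. */
demo_context* demo_context_create(void) {
  demo_context* ctx = malloc(sizeof *ctx);
  if (ctx != NULL) ctx->precision = 53;
  return ctx;
}

/* The context is the first argument of every function that needs state. */
void demo_set_precision(demo_context* ctx, const int32_t bits) {
  assert(ctx != NULL);
  ctx->precision = bits;
}

int32_t demo_get_precision(const demo_context* ctx) {
  assert(ctx != NULL);
  return ctx->precision;
}

/* Release a context created with demo_context_create. */
void demo_context_destroy(demo_context* ctx) {
  free(ctx);
}
```

Changing the precision in one context leaves any other context untouched, which is the property a global variable cannot provide.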
A single qmckl.h
header to be distributed by the library
is built by concatenating some of the produced header files.
@@ -717,9 +790,9 @@ and the types definitions should be written in the *fh_type.f90
fil
Low-level functions are very simple functions which are leaves of the function call tree (they don't call any other QMCkl function). @@ -727,15 +800,15 @@ the function call tree (they don't call any other QMCkl function).
These functions are pure, and unaware of the QMCkl
-context
. They are not allowed to allocate/deallocate memory, and
+context
. They are not allowed to allocate/deallocate memory, and
if they need temporary memory it should be provided in input.
High-level functions are at the top of the function call tree. They are able to choose which lower-level function to call @@ -743,33 +816,26 @@ depending on the required precision, and do the corresponding type conversions. These functions are also responsible for allocating temporary storage, to simplify the use of accelerators.
- -
-The high-level functions should be pure, unless the introduction
-of non-purity is justified. All the side effects should be made in
-the context
variable.
-
-The number of bits of precision required for a function should be
-given as an input of low-level computational functions. This input
-will be used to define the values of the different thresholds that
-might be used to avoid computing unnecessary noise. High-level
-functions will use the precision specified in the context
-variable.
+The minimal number of bits of precision required for a function
+should be given as an input of low-level computational
+functions. This input will be used to define the values of the
+different thresholds that might be used to avoid computing
+unnecessary noise. High-level functions will use the precision
+specified in the context
variable.
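One possible way to turn a number of bits into a threshold is sketched below. The mapping from bits to the value 2^(-bits), and the names `demo_threshold` and `demo_is_negligible`, are assumptions made for illustration; the text does not specify how QMCkl derives its thresholds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: relative threshold 2^(-bits) for the requested
   number of bits of precision (assumed mapping, not taken from QMCkl). */
double demo_threshold(const int32_t bits) {
  assert(bits > 0 && bits <= 53);
  double eps = 1.0;
  for (int32_t i = 0; i < bits; ++i) eps *= 0.5;
  return eps;
}

/* Returns 1 if `value` is too small, relative to `scale`, to affect a
   result at the requested precision, and 0 otherwise. */
int demo_is_negligible(const double value, const double scale,
                       const int32_t bits) {
  const double a = (value < 0.0) ? -value : value;
  const double s = (scale < 0.0) ? -scale : scale;
  return a < demo_threshold(bits) * s;
}
```

With such a test, a contribution of relative size 1e-9 would be skipped at 24 bits but kept at 53 bits.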
In order to automatize numerical accuracy tests, QMCkl uses -Verificarlo and -its CI functionality. You can read Verificarlo CI's documentation -at the following link. -Reading it is advised to understand the remainder of this section. +Verificarlo and its CI functionality. You can read Verificarlo CI's +documentation at the following link. Reading it is advised to +understand the remainder of this section.
@@ -778,7 +844,7 @@ library, use the following configure command :
QMCKL_DEVEL=1 ./configure --prefix=$PWD/_install --enable-silent-rules --enable-maintainer-mode CC=verificarlo-f FC=verificarlo-f --host=x86_64 --enable-vfc_ci +./configure CC=verificarlo-f FC=verificarlo-f --host=x86_64 --enable-vfc_ci
-If you need more details on these functions or their Fortran
+If you need more detail on these functions or their Fortran
interfaces, have a look at the tools/qmckl_probes
files.
Reducing the scaling of an algorithm usually implies also reducing its arithmetic complexity (number of flops per byte). Therefore, @@ -847,7 +913,7 @@ implemented adapted to different problem sizes.
The atomic basis set is defined as a list of shells. Each shell \(s\) is @@ -432,19 +432,19 @@ gradients and Laplacian of the atomic basis functions.
The following arrays are stored in the context, and need to be set when initializing the library:
-Variable | +Type | +Description | +
---|---|---|
size |
+int64_t |
+Dimension of the vector | +
data |
+double* |
+Elements | +
typedef struct qmckl_vector { + int64_t size; + double* data; +} qmckl_vector; ++
qmckl_vector +qmckl_vector_alloc( qmckl_context context, + const int64_t size); ++
+Allocates a new vector. If the allocation failed the size is zero. +
+ +qmckl_vector +qmckl_vector_alloc( qmckl_context context, + const int64_t size) +{ + /* Should always be true by construction */ + assert (size > (int64_t) 0); + + qmckl_vector result; + result.size = size; + + qmckl_memory_info_struct mem_info = qmckl_memory_info_struct_zero; + mem_info.size = size * sizeof(double); + result.data = (double*) qmckl_malloc (context, mem_info); + + if (result.data == NULL) { + result.size = (int64_t) 0; + } + + return result; +} ++
qmckl_exit_code +qmckl_vector_free( qmckl_context context, + qmckl_vector vector); ++
qmckl_exit_code +qmckl_vector_free( qmckl_context context, + qmckl_vector vector) +{ + /* Always true */ + assert (vector.data != NULL); + + qmckl_exit_code rc; + + rc = qmckl_free(context, vector.data); + if (rc != QMCKL_SUCCESS) { + return rc; + } + + vector.size = (int64_t) 0; + vector.data = NULL; + return QMCKL_SUCCESS; +} ++
Variable | +Type | +Description | +
---|---|---|
size |
+int64_t[2] |
+Dimension of each component | +
data |
+double* |
+Elements | +
+The dimensions use Fortran ordering: two elements differing by one +in the first dimension are consecutive in memory. +
+ +typedef struct qmckl_matrix { + int64_t size[2]; + double* data; +} qmckl_matrix; ++
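The Fortran ordering can be checked with a small self-contained sketch. The `demo_matrix` type and the `demo_mat` macro below are local, hypothetical mirrors of the layout described here, not the library definitions. Element (i,j), with zero-based indices, sits at linear offset `i + j*size[0]`.

```c
#include <assert.h>
#include <stdint.h>

/* Local illustration of a column-major matrix (hypothetical names). */
typedef struct demo_matrix {
  int64_t size[2];
  double* data;
} demo_matrix;

/* Zero-based column-major access: the first index varies fastest. */
#define demo_mat(m, i, j) ((m).data[(i) + (j) * (m).size[0]])

/* Store 10*i + j in element (i,j), then return what sits at offset k. */
double demo_fill_and_peek(demo_matrix m, const int64_t k) {
  assert(k >= 0 && k < m.size[0] * m.size[1]);
  for (int64_t j = 0; j < m.size[1]; ++j) {
    for (int64_t i = 0; i < m.size[0]; ++i) {
      demo_mat(m, i, j) = 10.0 * (double) i + (double) j;
    }
  }
  return m.data[k];
}
```

For a 3x2 matrix, offsets 0, 1, 2 hold the first column and offsets 3, 4, 5 hold the second one.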
qmckl_matrix +qmckl_matrix_alloc( qmckl_context context, + const int64_t size1, + const int64_t size2); ++
+Allocates a new matrix. If the allocation failed the sizes are zero. +
+ +qmckl_matrix +qmckl_matrix_alloc( qmckl_context context, + const int64_t size1, + const int64_t size2) +{ + /* Should always be true by construction */ + assert (size1 * size2 > (int64_t) 0); + + qmckl_matrix result; + + result.size[0] = size1; + result.size[1] = size2; + + qmckl_memory_info_struct mem_info = qmckl_memory_info_struct_zero; + mem_info.size = size1 * size2 * sizeof(double); + result.data = (double*) qmckl_malloc (context, mem_info); + + if (result.data == NULL) { + result.size[0] = (int64_t) 0; + result.size[1] = (int64_t) 0; + } + + return result; +} ++
qmckl_exit_code +qmckl_matrix_free( qmckl_context context, + qmckl_matrix matrix); ++
qmckl_exit_code +qmckl_matrix_free( qmckl_context context, + qmckl_matrix matrix) +{ + /* Always true */ + assert (matrix.data != NULL); + + qmckl_exit_code rc; + + rc = qmckl_free(context, matrix.data); + if (rc != QMCKL_SUCCESS) { + return rc; + } + matrix.data = NULL; + matrix.size[0] = (int64_t) 0; + matrix.size[1] = (int64_t) 0; + + return QMCKL_SUCCESS; +} ++
Variable | +Type | +Description | +
---|---|---|
order |
+int64_t |
+Order of the tensor | +
size |
+int64_t[QMCKL_TENSOR_ORDER_MAX] |
+Dimension of each component | +
data |
+double* |
+Elements | +
+The dimensions use Fortran ordering: two elements differing by one +in the first dimension are consecutive in memory. +
+ +#define QMCKL_TENSOR_ORDER_MAX 16 + +typedef struct qmckl_tensor { + int64_t order; + int64_t size[QMCKL_TENSOR_ORDER_MAX]; + double* data; +} qmckl_tensor; ++
qmckl_tensor +qmckl_tensor_alloc( qmckl_context context, + const int64_t order, + const int64_t* size); ++
+Allocates memory for a tensor. If the allocation failed, the size +is zero. +
+ +qmckl_tensor +qmckl_tensor_alloc( qmckl_context context, + const int64_t order, + const int64_t* size) +{ + /* Should always be true by construction */ + assert (order > 0); + assert (order <= QMCKL_TENSOR_ORDER_MAX); + assert (size != NULL); + + qmckl_tensor result; + result.order = order; + + int64_t prod_size = (int64_t) 1; + for (int64_t i=0 ; i<order ; ++i) { + assert (size[i] > (int64_t) 0); + result.size[i] = size[i]; + prod_size *= size[i]; + } + + qmckl_memory_info_struct mem_info = qmckl_memory_info_struct_zero; + mem_info.size = prod_size * sizeof(double); + + result.data = (double*) qmckl_malloc (context, mem_info); + + if (result.data == NULL) { + memset(&result, 0, sizeof(qmckl_tensor)); + } + + return result; +} ++
qmckl_exit_code +qmckl_tensor_free( qmckl_context context, + qmckl_tensor tensor); ++
qmckl_exit_code +qmckl_tensor_free( qmckl_context context, + qmckl_tensor tensor) +{ + /* Always true */ + assert (tensor.data != NULL); + + qmckl_exit_code rc; + + rc = qmckl_free(context, tensor.data); + if (rc != QMCKL_SUCCESS) { + return rc; + } + + memset(&tensor, 0, sizeof(qmckl_tensor)); + + return QMCKL_SUCCESS; +} ++
+Reshaping occurs in-place and the pointer to the data is copied. +
+qmckl_matrix +qmckl_matrix_of_vector(const qmckl_vector vector, + const int64_t size1, + const int64_t size2); ++
+Reshapes a vector into a matrix. +
+ +qmckl_matrix +qmckl_matrix_of_vector(const qmckl_vector vector, + const int64_t size1, + const int64_t size2) +{ + /* Always true */ + assert (size1 * size2 == vector.size); + + qmckl_matrix result; + + result.size[0] = size1; + result.size[1] = size2; + result.data = vector.data; + + return result; +} ++
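Because only the data pointer is copied, the reshaped object is a view on the same buffer. The sketch below reproduces that behaviour with local, hypothetical types (`alias_vector`, `alias_matrix`) and plain `calloc`/`free` in place of the library allocator. A write through the matrix is visible through the vector, and the buffer must be freed exactly once.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Local illustration types (hypothetical, not the library definitions). */
typedef struct alias_vector { int64_t size;    double* data; } alias_vector;
typedef struct alias_matrix { int64_t size[2]; double* data; } alias_matrix;

/* Reshape without copying: the matrix borrows the vector's buffer. */
alias_matrix alias_matrix_of_vector(const alias_vector v,
                                    const int64_t size1,
                                    const int64_t size2) {
  assert(size1 * size2 == v.size);
  alias_matrix m;
  m.size[0] = size1;
  m.size[1] = size2;
  m.data    = v.data;
  return m;
}

/* Returns 1 if both objects share storage, -1 if allocation failed. */
int alias_demo(void) {
  alias_vector v;
  v.size = 6;
  v.data = calloc(6, sizeof(double));
  if (v.data == NULL) return -1;

  alias_matrix m = alias_matrix_of_vector(v, 2, 3);
  m.data[1 + 2 * m.size[0]] = 42.0;   /* element (1,2) is at offset 5 */

  const int shared = (m.data == v.data) && (v.data[5] == 42.0);

  free(v.data);   /* free once: m.data now dangles and must not be freed */
  return shared;
}
```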
qmckl_tensor +qmckl_tensor_of_vector(const qmckl_vector vector, + const int64_t order, + const int64_t* size); ++
+Reshapes a vector into a tensor. +
+ +qmckl_tensor +qmckl_tensor_of_vector(const qmckl_vector vector, + const int64_t order, + const int64_t* size) +{ + qmckl_tensor result; + result.order = order; + + int64_t prod_size = 1; + for (int64_t i=0 ; i<order ; ++i) { + result.size[i] = size[i]; + prod_size *= size[i]; + } + assert (prod_size == vector.size); + + result.data = vector.data; + + return result; +} ++
qmckl_vector +qmckl_vector_of_matrix(const qmckl_matrix matrix, + const int64_t size); ++
+Reshapes a matrix into a vector. +
+ +qmckl_vector +qmckl_vector_of_matrix(const qmckl_matrix matrix, + const int64_t size) +{ + /* Always true */ + assert (matrix.size[0] * matrix.size[1] == size); + + qmckl_vector result; + + result.size = size; + result.data = matrix.data; + + return result; +} ++
qmckl_tensor +qmckl_tensor_of_matrix(const qmckl_matrix matrix, + const int64_t order, + const int64_t* size); ++
+Reshapes a matrix into a tensor. +
+ +qmckl_tensor +qmckl_tensor_of_matrix(const qmckl_matrix matrix, + const int64_t order, + const int64_t* size) +{ + qmckl_tensor result; + result.order = order; + + int64_t prod_size = 1; + for (int64_t i=0 ; i<order ; ++i) { + result.size[i] = size[i]; + prod_size *= size[i]; + } + assert (prod_size == matrix.size[0] * matrix.size[1]); + + result.data = matrix.data; + + return result; +} ++
qmckl_vector +qmckl_vector_of_tensor(const qmckl_tensor tensor, + const int64_t size); ++
+Reshapes a tensor into a vector. +
+ +qmckl_vector +qmckl_vector_of_tensor(const qmckl_tensor tensor, + const int64_t size) +{ + /* Always true */ + int64_t prod_size = (int64_t) 1; + for (int64_t i=0 ; i<tensor.order ; i++) { + prod_size *= tensor.size[i]; + } + assert (prod_size == size); + + qmckl_vector result; + + result.size = size; + result.data = tensor.data; + + return result; +} ++
qmckl_matrix +qmckl_matrix_of_tensor(const qmckl_tensor tensor, + const int64_t size1, + const int64_t size2); ++
+Reshapes a tensor into a matrix. +
+ +qmckl_matrix +qmckl_matrix_of_tensor(const qmckl_tensor tensor, + const int64_t size1, + const int64_t size2) +{ + /* Always true */ + int64_t prod_size = (int64_t) 1; + for (int64_t i=0 ; i<tensor.order ; i++) { + prod_size *= tensor.size[i]; + } + assert (prod_size == size1 * size2); + + qmckl_matrix result; + + result.size[0] = size1; + result.size[1] = size2; + result.data = tensor.data; + + return result; +} ++
#define qmckl_vec(v, i) v.data[i] +#define qmckl_mat(m, i, j) m.data[(i) + (j)*m.size[0]] + +#define qmckl_ten3(t, i, j, k) t.data[(i) + t.size[0]*((j) + t.size[1]*(k))] +#define qmckl_ten4(t, i, j, k, l) t.data[(i) + t.size[0]*((j) + t.size[1]*((k) + t.size[2]*(l)))] +#define qmckl_ten5(t, i, j, k, l, m) t.data[(i) + t.size[0]*((j) + t.size[1]*((k) + t.size[2]*((l) + t.size[3]*(m))))] ++
{ + int64_t m = 3; + int64_t n = 4; + int64_t p = m*n; + qmckl_vector vec = qmckl_vector_alloc(context, p); + + for (int64_t i=0 ; i<p ; ++i) + qmckl_vec(vec, i) = (double) i; + + for (int64_t i=0 ; i<p ; ++i) + assert( vec.data[i] == (double) i ); + + qmckl_matrix mat = qmckl_matrix_of_vector(vec, m, n); + assert (mat.size[0] == m); + assert (mat.size[1] == n); + assert (mat.data == vec.data); + + for (int64_t j=0 ; j<n ; ++j) + for (int64_t i=0 ; i<m ; ++i) + assert ( qmckl_mat(mat, i, j) == qmckl_vec(vec, i+j*m)) ; + + qmckl_vector vec2 = qmckl_vector_of_matrix(mat, p); + assert (vec2.size == p); + assert (vec2.data == vec.data); + for (int64_t i=0 ; i<p ; ++i) + assert ( qmckl_vec(vec2, i) == qmckl_vec(vec, i) ) ; + + qmckl_vector_free(context, vec); + +} ++
qmckl_dgemm
Matrix multiplication:
@@ -360,7 +1042,7 @@ Matrix multiplication: \] -typedef struct qmckl_determinant_struct { @@ -598,8 +598,8 @@ this mechanism.
When all the data for the slater determinants have been provided, the following
@@ -613,8 +613,8 @@ function returns true
.
To set the basis set, all the following functions need to be @@ -638,24 +638,24 @@ computed to accelerate the calculations.
qmckl_exit_code qmckl_get_det_vgl_alpha(qmckl_context context, double* const det_vgl_alpha); @@ -665,14 +665,14 @@ computed to accelerate the calculations.
context
is not QMCKL_NULL_CONTEXT
[n][3]
in C and (3,n)
in Fortra
qmckl_exit_code qmckl_distance ( @@ -834,8 +834,8 @@ the leading dimension:[n][3]
in C and(3,n)
in Fortra
integer function qmckl_distance_f(context, transa, transb, m, n, & @@ -1002,8 +1002,8 @@ the leading dimension:[n][3]
in C and(3,n)
in Fortra
This function is more efficient when A
and B
are transposed.
@@ -1013,12 +1013,12 @@ This function is more efficient when A
and B
are trans
qmckl_distance_rescaled
qmckl_distance_rescaled
qmckl_distance_rescaled
computes the matrix of the rescaled distances between all
@@ -1036,7 +1036,7 @@ If the input array is normal ('N'
), the xyz coordinates are in
the leading dimension: [n][3]
in C and (3,n)
in Fortran.
context
is not QMCKL_NULL_CONTEXT
[n][3]
in C and (3,n)
in Fortra
qmckl_exit_code qmckl_distance_rescaled ( @@ -1185,8 +1185,8 @@ the leading dimension:[n][3]
in C and(3,n)
in Fortra
integer function qmckl_distance_rescaled_f(context, transa, transb, m, n, & @@ -1356,8 +1356,8 @@ the leading dimension:[n][3]
in C and(3,n)
in Fortra
This function is more efficient when A
and B
are transposed.
@@ -1366,12 +1366,12 @@ This function is more efficient when A
and B
are trans
qmckl_distance_rescaled_deriv_e
qmckl_distance_rescaled_deriv_e
qmckl_distance_rescaled_deriv_e
computes the matrix of the gradient and laplacian of the
@@ -1438,7 +1438,7 @@ If the input array is normal ('N'
), the xyz coordinates are in
the leading dimension: [n][3]
in C and (3,n)
in Fortran.