#+TITLE: QMCkl source code documentation
#+EXPORT_FILE_NAME: index.html
#+PROPERTY: comments org
#+SETUPFILE: https://fniessen.github.io/org-html-themes/org/theme-readtheorg.setup
* Introduction
The ultimate goal of QMCkl is to provide a high-performance
implementation of the main kernels of QMC. In this particular
repository, we focus on the definition of the API and the tests, and
on a /pedagogical/ presentation of the algorithms. We expect the
HPC experts to use this repository as a reference for re-writing
optimized libraries.

Literate programming is particularly well suited to this context.
Source files are written in [[https://karl-voit.at/2017/09/23/orgmode-as-markup-only/][org-mode]] format, to provide useful
comments and LaTeX formulas close to the code. There are several
ways to convert org-mode files into other formats such
as HTML or PDF. For a tutorial on literate programming with
org-mode, follow [[http://www.howardism.org/Technical/Emacs/literate-programming-tutorial.html][this link]].

The code is extracted from the org files using Emacs as a
command-line tool in the =Makefile=, and then the produced files are
compiled.
** Language used
Fortran is one of the most common languages used by the community,
and is simple enough to make the algorithms readable. Hence we
propose to use Fortran in this pedagogical implementation of QMCkl
to express the algorithms. For specific internal functions where
the C language is more natural, C is used.

As Fortran modules generate compiler-dependent files, the use of
modules is restricted to the internal use of the library; otherwise
compatibility with C would be broken.

External dependencies should be kept as few as possible, so
external libraries should be used /only/ if their use is strongly
justified.
** Source code editing
Any text editor can be used to edit org-mode files. For a better
user experience, Emacs is recommended. For users who dislike Emacs, it
is good to know that Emacs can behave like Vim when switched into
"Evil" mode. There is also [[https://www.spacemacs.org][Spacemacs]], which eases the
transition for Vim users.

For users with a preference for Jupyter notebooks, the following
script can convert Jupyter notebooks to org-mode files:
#+BEGIN_SRC sh :tangle nb_to_org.sh
#!/bin/bash
# Usage: nb_to_org.sh notebook.ipynb
# Produces the org-mode file notebook.org in the current directory.
set -e
# Strip the .ipynb extension (and any leading directory) from the argument.
nb=$(basename "$1" .ipynb)
# Convert the notebook to Markdown, then Markdown to org-mode with pandoc.
jupyter nbconvert --to markdown "${nb}.ipynb" --output "${nb}.md"
pandoc "${nb}.md" -o "${nb}.org"
rm "${nb}.md"
#+END_SRC
Pandoc can also convert several Markdown flavors into org-mode.
** Writing in Fortran
The Fortran source files should provide a C interface using
=iso_c_binding=. The name of the Fortran source files should end
with =_f.f90= to be properly handled by the Makefile. The names of
the functions defined in Fortran should be the same as those
exposed in the API, suffixed by =_f=. Fortran interfaces
should also be written in the =qmckl_f.f90= file.

For more guidelines on using Fortran to generate a C interface, see
[[http://fortranwiki.org/fortran/show/Generating+C+Interfaces][this link]].
** Coding style
# TODO: decide on a coding style
To improve readability, we maintain a consistent coding style in
the library.

- For C source files, we will use __(decide on a coding style)__
- For Fortran source files, we will use __(decide on a coding
  style)__

For C source files, the coding style can be checked automatically with [[https://clang.llvm.org/docs/ClangFormat.html][clang-format]].
** Design of the library
The proposed API should allow the library to:
- deal with memory transfers between CPU and accelerators
- use different levels of floating-point precision

We chose a multi-layered design with low-level and high-level
functions (see below).
*** Naming conventions
Use =qmckl_= as a prefix for all exported functions and variables.
All exported header files should have a filename with the prefix
=qmckl_=.

If the name of the org-mode file is =xxx.org=, the name of the
produced C files should be =xxx.c= and =xxx.h= and the name of the
produced Fortran files should be =xxx.f90=.

Arrays are in uppercase and scalars are in lowercase.

In the names of the variables and functions, only the singular
form is allowed.
*** Application programming interface
The application programming interface (API) is designed to be
compatible with the C programming language (not C++), to ensure
that the library will be easily usable in /any/ language. This
implies that only the following data types are allowed in the API:

- 32-bit and 64-bit floats and arrays (=float= and =double=)
- 32-bit and 64-bit integers and arrays (=int32_t= and =int64_t=)
- Pointers should be represented as 64-bit integers (even on
  32-bit architectures)
- ASCII strings are represented as pointers to character
  arrays and terminated by a zero character (C convention).

Complex numbers can be represented by an array of 2 floats.

# TODO : Link to repositories for bindings

To facilitate the use in languages other than C, we provide
bindings for other languages in separate repositories.
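
The data-type rules above can be illustrated with a minimal C sketch.
All the =qmckl_example_= functions below are hypothetical and exist
only for illustration; they are not part of the QMCkl API. The sketch
assumes complex numbers are stored as interleaved (real, imaginary)
pairs, which is one possible reading of "an array of 2 floats".

#+BEGIN_SRC c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical example: sums n complex numbers stored as interleaved
   (real, imaginary) pairs, Z[2*i] and Z[2*i+1]. The result is written
   into S, an array of 2 doubles. Returns 0 on success, -1 on bad input. */
int32_t qmckl_example_complex_sum(const int64_t n, const double* Z, double* S)
{
  if (n < 0 || Z == NULL || S == NULL) return -1;
  S[0] = 0.0;
  S[1] = 0.0;
  for (int64_t i = 0; i < n; ++i) {
    S[0] += Z[2*i];
    S[1] += Z[2*i+1];
  }
  return 0;
}

/* Hypothetical example: length of a zero-terminated ASCII string,
   returned as a 64-bit integer. */
int64_t qmckl_example_label_length(const char* label)
{
  return (int64_t) strlen(label);
}

/* A pointer carried through the API as a 64-bit integer, and back. */
int64_t qmckl_example_pointer_to_integer(void* p)
{
  return (int64_t) (intptr_t) p;
}

void* qmckl_example_integer_to_pointer(const int64_t h)
{
  return (void*) (intptr_t) h;
}
#+END_SRC

Going through =intptr_t= keeps the conversion well defined on both
32-bit and 64-bit architectures.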
*** Global state
Global variables should be avoided in the library, because a
single program may need to use multiple instances
of the library. To solve this problem we propose to use a pointer
to a =context= variable, built by the library with the
=qmckl_context_create= function. The =context= contains the global
state of the library, and is used as the first argument of many
QMCkl functions.

The internal structure of the context is not specified, to give
maximum freedom to the different implementations. The state
is read and modified through getters and setters, prefixed by
=qmckl_context_get_= and =qmckl_context_set_=. When a context
variable is modified by a setter, a copy of the old data structure
is made and updated, and the pointer to the new data structure is
returned, such that the old contexts can still be accessed. It is
also possible to modify the state in an impure fashion, using the
=qmckl_context_update_= functions. The context and its old
versions can be destroyed with =qmckl_context_destroy=.
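
The copy-on-set behavior described above could be sketched as follows.
This is only an illustration under stated assumptions: the
=example_context= layout, the =precision= field, its default value and
all function signatures are hypothetical, since the text only fixes
the function name prefixes and leaves the internal structure
unspecified.

#+BEGIN_SRC c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical internal layout; real implementations are free to differ. */
typedef struct example_context {
  struct example_context* previous;  /* older version, still accessible */
  int32_t precision;                 /* example of a stored setting     */
} example_context;

/* Assumption: the handle seen by the user is the pointer stored in a
   64-bit integer. The value 0 signals an error. */
typedef int64_t qmckl_context;

qmckl_context qmckl_context_create(void)
{
  example_context* ctx = malloc(sizeof *ctx);
  if (ctx == NULL) return 0;
  ctx->previous  = NULL;
  ctx->precision = 53;   /* arbitrary default chosen for this sketch */
  return (qmckl_context) (intptr_t) ctx;
}

/* Pure setter: copies the old structure, updates the copy and returns
   a handle to it. The old context is left untouched. */
qmckl_context qmckl_context_set_precision(const qmckl_context context,
                                          const int32_t precision)
{
  example_context* old = (example_context*) (intptr_t) context;
  if (old == NULL) return 0;
  example_context* new_ctx = malloc(sizeof *new_ctx);
  if (new_ctx == NULL) return 0;
  *new_ctx = *old;
  new_ctx->previous  = old;
  new_ctx->precision = precision;
  return (qmckl_context) (intptr_t) new_ctx;
}

int32_t qmckl_context_get_precision(const qmckl_context context)
{
  const example_context* ctx = (const example_context*) (intptr_t) context;
  return (ctx == NULL) ? 0 : ctx->precision;
}

/* Impure variant: modifies the state in place, without making a copy. */
int32_t qmckl_context_update_precision(const qmckl_context context,
                                       const int32_t precision)
{
  example_context* ctx = (example_context*) (intptr_t) context;
  if (ctx == NULL) return -1;
  ctx->precision = precision;
  return 0;
}

/* Destroys the context together with all its older versions. */
void qmckl_context_destroy(const qmckl_context context)
{
  example_context* ctx = (example_context*) (intptr_t) context;
  while (ctx != NULL) {
    example_context* previous = ctx->previous;
    free(ctx);
    ctx = previous;
  }
}
#+END_SRC

In this sketch, destroying the most recent context frees the whole
chain of older versions, so only the latest handle needs to be passed
to =qmckl_context_destroy=.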
*** Low-level functions
Low-level functions are very simple functions which are leaves of
the function call tree (they don't call any other QMCkl function).

These functions are /pure/, and unaware of the QMCkl
=context=. They are not allowed to allocate/deallocate memory, and
if they need temporary memory it should be provided as an input.
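
As an illustration of these constraints, here is a minimal sketch of
what a low-level function might look like. The function
=qmckl_example_variance= is hypothetical and not part of QMCkl; it
takes no =context=, allocates nothing, and receives its scratch space
=WORK= from the caller.

#+BEGIN_SRC c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical low-level kernel: population variance of the n values
   in A. WORK is caller-provided temporary storage of at least n
   doubles. Returns 0 on success, -1 on invalid input. */
int32_t qmckl_example_variance(const int64_t n, const double* A,
                               double* WORK, double* variance)
{
  if (n <= 0 || A == NULL || WORK == NULL || variance == NULL) return -1;

  double mean = 0.0;
  for (int64_t i = 0; i < n; ++i) {
    mean += A[i];
  }
  mean /= (double) n;

  /* Deviations from the mean are stored in the scratch array. */
  for (int64_t i = 0; i < n; ++i) {
    WORK[i] = A[i] - mean;
  }

  double sum = 0.0;
  for (int64_t i = 0; i < n; ++i) {
    sum += WORK[i] * WORK[i];
  }
  *variance = sum / (double) n;
  return 0;
}
#+END_SRC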
*** High-level functions
High-level functions are at the top of the function call tree.
They are able to choose which lower-level function to call
depending on the required precision, and do the corresponding type
conversions. These functions are also responsible for allocating
temporary storage, to simplify the use of accelerators.

The high-level functions should be pure, unless the introduction
of non-purity is justified. All the side effects should be made in
the =context= variable.

# TODO : We need an identifier for impure functions
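
The dispatch performed by a high-level function could look like the
following sketch. Everything here is hypothetical: the
=example_context= structure is reduced to a single =precision= field,
the function names are invented for illustration, and the cutoff of 24
bits (the significand size of an IEEE single-precision float) is an
assumption, not a QMCkl rule.

#+BEGIN_SRC c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, minimal context holding only the requested precision. */
typedef struct {
  int32_t precision;   /* number of bits of precision requested */
} example_context;

/* Low-level kernels: pure, no allocation, unaware of the context. */
static double example_dot_double(const int64_t n, const double* A, const double* B)
{
  double s = 0.0;
  for (int64_t i = 0; i < n; ++i) s += A[i] * B[i];
  return s;
}

static float example_dot_float(const int64_t n, const float* A, const float* B)
{
  float s = 0.0f;
  for (int64_t i = 0; i < n; ++i) s += A[i] * B[i];
  return s;
}

/* High-level function: picks the kernel from the precision stored in
   the context, allocates the temporary storage and does the type
   conversions itself. Returns 0 on success, -1 on failure. */
int32_t qmckl_example_dot(const example_context* context, const int64_t n,
                          const double* A, const double* B, double* result)
{
  if (context == NULL || n <= 0 || A == NULL || B == NULL || result == NULL)
    return -1;

  if (context->precision > 24) {
    *result = example_dot_double(n, A, B);
    return 0;
  }

  /* Single precision is sufficient: convert the inputs to float. */
  float* WORK = malloc((size_t) (2 * n) * sizeof(float));
  if (WORK == NULL) return -1;
  for (int64_t i = 0; i < n; ++i) {
    WORK[i]     = (float) A[i];
    WORK[n + i] = (float) B[i];
  }
  *result = (double) example_dot_float(n, WORK, WORK + n);
  free(WORK);
  return 0;
}
#+END_SRC

Keeping the allocation in the high-level function means the kernels
themselves stay free of memory management, which is the property the
text relies on to simplify the use of accelerators.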
*** Numerical precision
The number of bits of precision required for a function should be
given as an input of low-level computational functions. This input
will be used to define the values of the different thresholds that
might be used to avoid computing unnecessary noise. High-level
functions will use the precision specified in the =context=
variable.
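
One possible way to turn a number of bits into a threshold is
sketched below. The function =qmckl_example_truncated_sum= is
hypothetical, and the choice of a relative threshold equal to
2^(-precision) is an assumption made for this example. It sums
non-negative terms sorted in non-increasing order and stops as soon as
the next term is too small to matter at the requested precision.

#+BEGIN_SRC c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical low-level kernel. A holds n non-negative terms in
   non-increasing order. Terms smaller than 2^(-precision) times the
   running sum are considered noise and are not accumulated. The
   number of terms actually used is returned in n_used. */
int32_t qmckl_example_truncated_sum(const int32_t precision, const int64_t n,
                                    const double* A, double* result,
                                    int64_t* n_used)
{
  if (precision < 0 || n < 0 || A == NULL || result == NULL || n_used == NULL)
    return -1;

  /* Relative threshold 2^(-precision), built by repeated halving. */
  double threshold = 1.0;
  for (int32_t k = 0; k < precision; ++k) {
    threshold *= 0.5;
  }

  double sum = 0.0;
  int64_t i = 0;
  for (; i < n; ++i) {
    if (A[i] < threshold * sum) break;   /* remaining terms are noise */
    sum += A[i];
  }

  *result = sum;
  *n_used = i;
  return 0;
}
#+END_SRC

With 4 bits of precision and the terms 1.0, 0.5, 0.125 and 0.001, the
last term falls below the threshold and only three terms are summed;
with 53 bits all four are used.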
** Algorithms
Reducing the scaling of an algorithm usually also reduces
its arithmetic intensity (number of flops per byte). Therefore,
for small sizes \(\mathcal{O}(N^3)\) and \(\mathcal{O}(N^2)\)
algorithms are better suited than linear-scaling algorithms. As
QMCkl is a general-purpose library, multiple algorithms adapted to
different problem sizes should be implemented.
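
As a rough illustration, using idealized textbook operation counts
rather than measurements from QMCkl, consider dense double-precision
operands of dimension \(N\), and assume each matrix element is moved
between memory and the processor exactly once. A matrix-matrix product
performs \(2N^3\) flops on three \(N \times N\) matrices, whereas a
matrix-vector product performs \(2N^2\) flops and is dominated by the
traffic of a single matrix:

\[
I_{\text{matrix-matrix}} = \frac{2N^3}{3 \times 8 N^2} = \frac{N}{12}
\ \text{flops/byte},
\qquad
I_{\text{matrix-vector}} \approx \frac{2N^2}{8 N^2} = \frac{1}{4}
\ \text{flops/byte}.
\]

The \(\mathcal{O}(N^3)\) operation reuses each loaded value about
\(N\) times, while the \(\mathcal{O}(N^2)\) one is limited by memory
bandwidth whatever the value of \(N\).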
** Rules for the API
- =stdint.h= should be used for integers (=int32_t=, =int64_t=)
- integers used for counting should always be =int64_t=
- floats should be by default =double=, unless explicitly mentioned
- pointers are converted to =int64_t= to increase portability
* Documentation