1
0
mirror of https://github.com/TREX-CoE/qmckl.git synced 2024-06-27 15:42:54 +02:00
qmckl/src
2021-03-09 01:16:23 +01:00
..
.gitignore Merging org files 2020-11-05 12:57:39 +01:00
Makefile Separated org-mode files 2021-03-09 01:16:23 +01:00
qmckl_ao.org Separated org-mode files 2021-03-09 01:16:23 +01:00
qmckl_context.org Separated org-mode files 2021-03-09 01:16:23 +01:00
qmckl_distance.org Separated org-mode files 2021-03-09 01:16:23 +01:00
qmckl_error.org Separated org-mode files 2021-03-09 01:16:23 +01:00
qmckl_footer.org Separated org-mode files 2021-03-09 01:16:23 +01:00
qmckl_memory.org Separated org-mode files 2021-03-09 01:16:23 +01:00
qmckl_precision.org Separated org-mode files 2021-03-09 01:16:23 +01:00
qmckl.org Separated org-mode files 2021-03-09 01:16:23 +01:00
README.org Separated org-mode files 2021-03-09 01:16:23 +01:00
table_of_contents Separated org-mode files 2021-03-09 01:16:23 +01:00
test_qmckl.org Separated org-mode files 2021-03-09 01:16:23 +01:00

QMCkl source code documentation

Introduction

The ultimate goal of the QMCkl library is to provide a high-performance implementation of the main kernels of QMC. In this particular implementation of the library, we focus on the definition of the API and the tests, and on a pedagogical presentation of the algorithms. We expect the HPC experts to use this repository as a reference for re-writing optimized libraries.

Literate programming

In a traditional source code, most of the lines of source files of a program are code, scripts, Makefiles, and only a few lines are comments explaining parts of the code that are non-trivial to understand. The documentation of the prorgam is usually written in a separate directory, and is often outdated compared to the code.

Literate programming is a different approach to programming, where the program is considered as a publishable-quality document. Most of the lines of the source files are text, mathematical formulas, tables, figures, etc, and the lines of code are just the translation in a computer language of the ideas and algorithms expressed in the text. More importantly, the "document" is structured like a text document with sections, subsections, a bibliography, a table of contents etc, and the place where pieces of code appear are the places where they should belong for the reader to understand the logic of the program, not the places where the compiler expects to find them. Both the publishable-quality document and the binary executable are produced from the same source files.

Literate programming is particularly well adapted in this context, as the central part of this project is the documentation of an API. The implementation of the algorithms is just an expression of the algorithms in a language that can be compiled, so that the correctness of the algorithms can be tested.

We have chosen to write the source files in org-mode format, as any text editor can be used to edit org-mode files. To produce the documentation, there exists multiple possibilities to convert org-mode files into different formats such as HTML or PDF. The source code is easily extracted from the org-mode files invoking the Emacs text editor from the command-line in the Makefile, and then the produced files are compiled. Moreover, within the Emacs text editor the source code blocks can be executed interactively, in the same spirit as Jupyter notebooks.

Source code editing

For a tutorial on literate programming with org-mode, follow this link.

Any text editor can be used to edit org-mode files. For a better user experience Emacs is recommended. For users hating Emacs, it is good to know that Emacs can behave like Vim when switched into ``Evil'' mode.

In the tools/init.el file, we provide a minimal Emacs configuration file for vim users. This file should be copied into .emacs.d/init.el.

For users with a preference for Jupyter notebooks, we also provide the tools/nb_to_org.sh script can convert jupyter notebooks into org-mode files.

Note that pandoc can be used to convert multiple markdown formats into org-mode.

Choice of the programming language

Most of the codes of the TREX CoE are written in Fortran with some scripts in Bash and Python. Outside of the CoE, Fortran is also important (Casino, Amolqc), and other important languages used by the community are C and C++ (QMCPack, QWalk), and Julia is gaining in popularity. The library we design should be compatible with all of these languages. The QMCkl API has to be compatible with the C language since libraries with a C-compatible API can be used in every other language.

High-performance versions of the QMCkl, with the same API, will be rewritten by the experts in HPC. These optimized libraries will be tuned for specific architectures, among which we can cite x86 based processors, and GPU accelerators. Nowadays, the most efficient software tools to take advantage of low-level features of the processor (intrinsics) and of GPUs are for C++ developers. It is highly probable that the optimized implementations will be written in C++, and this is agreement with our choice to make the API C-compatible.

Fortran is one of the most common languages used by the community, and is simple enough to make the algorithms readable both by experts in QMC, and experts in HPC. Hence we propose in this pedagogical implementation of QMCkl to use Fortran to express the QMC algorithms. As the main languages of the library is C, this implies that the exposed C functions call the Fortran routine. However, for internal functions related to system programming, the C language is more natural than Fortran.

The Fortran source files should provide a C interface using the iso_c_binding module. The name of the Fortran source files should end with _f.f90 to be properly handled by the Makefile. The names of the functions defined in Fortran should be the same as those exposed in the API suffixed by _f. Fortran interfaces should also be written in the qmckl_f.f90 file.

For more guidelines on using Fortran to generate a C interface, see this link.

Design of the library

The proposed API should allow the library to: deal with memory transfers between CPU and accelerators, and to use different levels of floating-point precision. We chose a multi-layered design with low-level and high-level functions (see below).

Naming conventions

To avoid namespace collisions, we use qmckl_ as a prefix for all exported functions and variables. All exported header files should have a file name prefixed with qmckl_.

If the name of the org-mode file is xxx.org, the name of the produced C files should be xxx.c and xxx.h and the name of the produced Fortran file should be xxx.f90.

Arrays are in uppercase and scalars are in lowercase.

In the names of the variables and functions, only the singular form is allowed.

Application programming interface

In the C language, the number of bits used by the integer types can change from one architecture to another one. To circumvent this problem, we choose to use the integer types defined in <stdint.h> where the number of bits used for the integers are fixed.

To ensure that the library will be easily usable in any other language than C, we restrict the data types in the interfaces to the following:

  • 32-bit and 64-bit integers, scalars and and arrays (int32_t and int64_t)
  • 32-bit and 64-bit floats, scalars and and arrays (float and double)
  • Pointers are always casted into 64-bit integers, even on legacy 32-bit architectures
  • ASCII strings are represented as a pointers to character arrays and terminated by a '\0' character (C convention).
  • Complex numbers can be represented by an array of 2 floats.
  • Boolean variables are stored as integers, 1 for true and 0 for false
  • Floating point variables should be by default
  • double unless explicitly mentioned
  • integers used for counting should always be int64_t

To facilitate the use in other languages than C, we will provide some bindings in other languages in other repositories.

Global state

Global variables should be avoided in the library, because it is possible that one single program needs to use multiple instances of the library. To solve this problem we propose to use a pointer to a context variable, built by the library with the qmckl_context_create function. The context contains the global state of the library, and is used as the first argument of many QMCkl functions.

The internal structure of the context is not specified, to give a maximum of freedom to the different implementations. Modifying the state is done by setters and getters, prefixed by qmckl_context_set_ an qmckl_context_get_. When a context variable is modified by a setter, a copy of the old data structure is made and updated, and the pointer to the new data structure is returned, such that the old contexts can still be accessed. It is also possible to modify the state in an impure fashion, using the qmckl_context_update_ functions. The context and its old versions can be destroyed with qmckl_context_destroy.

Low-level functions

Low-level functions are very simple functions which are leaves of the function call tree (they don't call any other QMCkl function).

These functions are pure, and unaware of the QMCkl context. They are not allowed to allocate/deallocate memory, and if they need temporary memory it should be provided in input.

High-level functions

High-level functions are at the top of the function call tree. They are able to choose which lower-level function to call depending on the required precision, and do the corresponding type conversions. These functions are also responsible for allocating temporary storage, to simplify the use of accelerators.

The high-level functions should be pure, unless the introduction of non-purity is justified. All the side effects should be made in the context variable.

Numerical precision

The number of bits of precision required for a function should be given as an input of low-level computational functions. This input will be used to define the values of the different thresholds that might be used to avoid computing unnecessary noise. High-level functions will use the precision specified in the context variable.

Algorithms

Reducing the scaling of an algorithm usually implies also reducing its arithmetic complexity (number of flops per byte). Therefore, for small sizes \(\mathcal{O}(N^3)\) and \(\mathcal{O}(N^2)\) algorithms are better adapted than linear scaling algorithms. As QMCkl is a general purpose library, multiple algorithms should be implemented adapted to different problem sizes.

Rules for the API

  • stdint should be used for integers (int32_t, int64_t)
  • integers used for counting should always be int64_t
  • floats should be by default double, unless explicitly mentioned
  • pointers are converted to int64_t to increase portability

Documentation