![Builds](https://github.com/Thukisdo/mlkaps-random-generator/actions/workflows/cmake.yml/badge.svg)
[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://GitHub.com/Naereen/StrapDown.js/graphs/commit-activity)
![Maintainer](https://img.shields.io/badge/maintainer-Thukisdo-blue)
# Random cycle generator for QMC computations
TREX's QMCKL library provides an implementation of the Sherman-Morrison formula.
This tool generates random HDF5 datasets that can be used to benchmark
such implementations.
Specifically, it reproduces the "cycles" generated by QMCKL computations.
A cycle is a sequence of row updates on a matrix, which requires multiple calls to
Sherman-Morrison to update the corresponding inverse matrix.
Note that this tool's main purpose is to generate splitting cycles ([more on this here](#More-on-splitting-cycles)).
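
For reference, the Sherman-Morrison formula gives the inverse of a rank-one modification of an invertible matrix $A$. The symbols $u$, $v$, $e_k$ and $\delta$ are notation introduced here for illustration and are not names used by QMCKL:

$$
(A + u v^T)^{-1} = A^{-1} - \frac{A^{-1} u \, v^T A^{-1}}{1 + v^T A^{-1} u},
\qquad 1 + v^T A^{-1} u \neq 0
$$

An additive update $\delta$ of row $k$ corresponds to $u = e_k$ (the $k$-th unit vector) and $v = \delta$. By the matrix determinant lemma, $\det(A + u v^T) = (1 + v^T A^{-1} u) \det(A)$, so the denominator vanishes exactly when the updated matrix is singular.
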
## Usage
This project uses CMake. First, set up a build directory and build the executable:
```shell
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j 4
```
The executable can then be launched with:
```shell
cd build
./bin/random_generator <output_file> <generation_mode>
```
where `<generation_mode>` is one of the following:
- `matrix_size`: generates a dataset of increasing matrix sizes, stored as
```
/
|-<matrix_size>/
| |-<number_of_splitting_updates>/
| | |-cycle_<xxx>/
| | |- ...
| | |-cycle_<xxx>/
| | |- ...
| |-<number_of_splitting_updates>/
|-<matrix_size>/
...
```
- `update`: generates a dataset of increasing update counts, stored as
```
/
|-<number_of_updates>/
| |-<number_of_splitting_updates>/
| | |-cycle_<xxx>/
| | |- ...
| | |-cycle_<xxx>/
| | |- ...
| |-<number_of_splitting_updates>/
|-<number_of_updates>/
...
```
If you require another generation mode, please open an issue or modify the main accordingly.
# Cycle format
Cycles are stored in the following format:
```
cycle_<xxx>/
|-slater_inverse_t: The transposed-inverse of the slater matrix to update
|-updates: A matrix containing all the (additive) updates to apply to the slater matrix
|-nupdates: The number of updates in the cycle
|-determinant: The determinant of the slater matrix before the updates
|-slater_matrix: The slater matrix to update
|-col_update_index: The index of the row to update in the slater matrix
|-condition_number: The condition number of the slater matrix before the updates
```
Note that the updates are applied to the rows of the transposed-inverse Slater matrix, not to its columns.
As such, the Slater matrix must be transposed before applying the updates.
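
As a sanity check on a loaded cycle, `slater_matrix` multiplied by the transpose of `slater_inverse_t` should be close to the identity. The sketch below assumes both datasets have already been read into nested Python lists (the HDF5 reading step is not shown) and that rows of the lists correspond to rows of the matrices, which depends on the storage order used by the reader. The function names are hypothetical.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def mat_mul(a, b):
    """Plain matrix product of two matrices stored as lists of rows."""
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def max_identity_error(slater_matrix, slater_inverse_t):
    """Largest absolute entry of S * (S^-T)^T - I."""
    product = mat_mul(slater_matrix, transpose(slater_inverse_t))
    n = len(product)
    return max(
        abs(product[i][j] - (1.0 if i == j else 0.0))
        for i in range(n)
        for j in range(n)
    )
```

A large error usually means that the inverse was not transposed, or that the reader swapped rows and columns.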
# More on matrix-update cycles
During a QMC computation, a series of updates is applied to a matrix. Each update affects an entire row/column
of the matrix. This series of successive updates is called a "cycle".
Instead of computing the inverse of this matrix from scratch (which is expensive),
it is possible to update the inverse by applying the Sherman-Morrison algorithm.
This tool generates a dataset containing a number of matrix-update cycles.
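
A minimal sketch of one such inverse update, for an additive update of a single row. It works on the plain (non-transposed) inverse stored as nested lists, and the function name and `tol` parameter are hypothetical. It is not the QMCKL implementation.

```python
def sherman_morrison_row_update(inverse, row, delta, tol=1e-12):
    """Return the inverse of (A + e_row * delta^T), given inverse = A^-1."""
    n = len(inverse)
    # Row vector delta^T * A^-1.
    d_inv = [sum(delta[i] * inverse[i][j] for i in range(n)) for j in range(n)]
    # Denominator 1 + delta^T * A^-1 * e_row.
    denom = 1.0 + d_inv[row]
    if abs(denom) < tol:
        raise ZeroDivisionError("update makes the matrix (nearly) singular")
    # Column `row` of A^-1, i.e. A^-1 * e_row.
    col = [inverse[i][row] for i in range(n)]
    return [
        [inverse[i][j] - col[i] * d_inv[j] / denom for j in range(n)]
        for i in range(n)
    ]
```

Each call costs O(n^2) operations, compared with O(n^3) for inverting the updated matrix from scratch.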
# More on splitting cycles
During a cycle, an update may render the matrix non-invertible. We call such an update a "splitting update".
Some algorithms will then fail to apply the cycle, whereas smarter algorithms will succeed by delaying the update
(applying the remaining updates first), or by applying half of the update and delaying the other half.
Splitting updates increase computation time and are critical for performance evaluation.
As such, this tool is centered around generating splitting updates.
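
The delaying strategy can be sketched as follows. Additive row updates commute, so a splitting update can be postponed until the other updates have been applied. This is an illustrative sketch, not the QMCKL algorithm: the function names and the threshold `tol` are assumptions, and the "half update" strategy is not shown.

```python
def apply_row_update(inverse, row, delta, tol):
    """One Sherman-Morrison step; returns None when the update is splitting."""
    n = len(inverse)
    d_inv = [sum(delta[i] * inverse[i][j] for i in range(n)) for j in range(n)]
    denom = 1.0 + d_inv[row]
    if abs(denom) < tol:
        return None
    col = [inverse[i][row] for i in range(n)]
    return [
        [inverse[i][j] - col[i] * d_inv[j] / denom for j in range(n)]
        for i in range(n)
    ]


def apply_cycle_with_delay(inverse, updates, tol=1e-3):
    """Apply (row, delta) updates, postponing those that are splitting."""
    pending = list(updates)
    while pending:
        delayed = []
        for row, delta in pending:
            result = apply_row_update(inverse, row, delta, tol)
            if result is None:
                delayed.append((row, delta))  # retry after the other updates
            else:
                inverse = result
        if len(delayed) == len(pending):
            raise ArithmeticError("no remaining update can be applied")
        pending = delayed
    return inverse
```

For example, starting from the 2x2 identity, the update `(0, [-1.0, 1.0])` makes both rows equal and is postponed, while `(1, [2.0, 0.0])` is applied first and makes the postponed update applicable.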
# How it works
The hardest case for Sherman-Morrison is when the matrix is almost singular (splitting update).
To reproduce this case, we generate an update that renders a given column collinear to another.
A naive implementation would fail after detecting that the determinant is close to zero.
To fix this, we add a small amount of noise to this update, and follow it with another update that breaks the collinearity.
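
The construction of a single splitting update can be sketched as below. This is an illustration under stated assumptions, not the generator's actual code: the function names, the `scale` parameter, and the use of uniform noise are all hypothetical.

```python
import random


def make_splitting_update(matrix, col, target_col, noise, rng, scale=1.0):
    """Additive update making column `col` equal to scale * column `target_col`, plus noise."""
    return [
        scale * matrix[i][target_col] - matrix[i][col] + rng.uniform(-noise, noise)
        for i in range(len(matrix))
    ]


def apply_column_update(matrix, col, update):
    """Return a copy of `matrix` with `update` added to column `col`."""
    return [
        [value + (update[i] if j == col else 0.0) for j, value in enumerate(row)]
        for i, row in enumerate(matrix)
    ]


# Usage: make column 0 nearly collinear to column 1.
rng = random.Random(0)  # fixed seed so the sketch is reproducible
matrix = [[1.0, 0.0], [0.0, 1.0]]
update = make_splitting_update(matrix, col=0, target_col=1, noise=1e-6, rng=rng)
almost_singular = apply_column_update(matrix, 0, update)
```

After the update, the two columns differ only by the noise, so the matrix is nearly singular without being exactly singular.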
### Chained-updates
We use a basic implementation of this algorithm called "Chained-updates".
It produces updates by traversing the matrix left to right, where each splitting update makes its column
collinear to the column on its right.
A normal update will then break the chain of any previous splitting updates.
We ensure that the final update of the cycle is non-splitting, to guarantee that the final matrix
is invertible.
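
The ordering constraints above can be sketched as a small planner that labels each updated column. How the generator actually interleaves splitting and normal updates is not specified here, so this hypothetical sketch simply places all splitting updates first and keeps the final update normal.

```python
def plan_chained_updates(n_updates, n_splitting):
    """Label columns 0..n_updates-1, left to right, as 'splitting' or 'normal'."""
    if n_updates < 1 or not 0 <= n_splitting < n_updates:
        raise ValueError("need at least one normal update to close the chain")
    # Splitting updates come first; the rest, including the final one, are normal.
    kinds = ["splitting"] * n_splitting + ["normal"] * (n_updates - n_splitting)
    return list(enumerate(kinds))
```

Every splitting update in the plan has a column to its right, and the last entry is always a normal update.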
### Pitfalls
This method has two main drawbacks:
- A cycle cannot consist of a single splitting update, since a normal update is required to break the chain.
- This is the best-case scenario for implementations of Sherman-Morrison that postpone splitting updates.
We could make this harder by generating bi-directional chains.