10
0
mirror of https://github.com/LCPQ/quantum_package synced 2025-05-05 14:44:55 +02:00

Integrals_storage

Anthony Scemama 2017-03-17 16:01:25 +01:00
parent a261f9a08e
commit b6399929d5

113
Integrals_storage.md Normal file

@ -0,0 +1,113 @@
Two electron integrals are of the form (ij|kl), with permutation rules for the 4 indices.
It is not possible to store all the (ij|kl) integrals in memory under the form of a 4-dimensional
array, so we store instead all the non-zero integrals in a data structure, similar to a hash table.
We do use hash tables because :
1. Rehashing the table as it grows is too costly
2. We need some locality if possible : (ij|kl) should be close to (ij|k l+1)
3. We tried boost and C++/STL unordered maps, and the hash tables of Google but it was still too slow
We do not use binary search trees because:
1. We need to be able to add/update some data in a shared-memory parallel environment
2. It makes lots of jumps among memory pages before the data is found
So we decided to implement our own key/value data structure well adapted to our problem.
The Hash function
=================
The key is a function that maps the four indices i,j,k,l into a unique 64-bit
integer, and which returns the same value when then permutation symetry
operations are applied to the indices. As it is a tiny function, it is written
in the module that uses it intensively such that it can be inlined by the compiler,
namely the `Integrals_Bielec/mo_bi_integrals.irp.f` file.
```fortran
subroutine mo_bielec_integrals_index(i,j,k,l,i1)
implicit none
integer, intent(in) :: i,j,k,l
integer(key_kind), intent(out) :: i1
integer(key_kind) :: p,q,r,s,i2
p = min(i,k)
r = max(i,k)
p = p+ishft(r*r-r,-1)
q = min(j,l)
s = max(j,l)
q = q+ishft(s*s-s,-1)
i1 = min(p,q)
i2 = max(p,q)
i1 = i1+ishft(i2*i2-i2,-1)
end
```
The Data Structure
==================
Now that it is possible from (ij|kl) to obtain a unique 64-bit integer and the
value of the integral, these can be used to store the data in a data structure
similar to a hash table. In what follows, we call this data structure a `map`.
A `map` type is essentially an array of `cache_map`s. The `cache_map` type
contains mainly a pointer to an array of values and a pointer to an array of
keys, and a flag to know if the data is sorted by increasing keys.
The `cache_map` keys are stored on signed 16-bits integers, so the value of the
key will not exceed 32768. This guarantees un upper bound on the total number
of values of the `cache_map`, and the largest possible size of the keys array
will be 64kiB. Most of the time, it is expected to fit in the L1 cache.
Search
------
When accessing the `map` data structure, the key is split in two. The 49 leading
bits of the 64-bit key are used to extract the index of the cache map in the
array of `cache_map`s. The 15 least significant bits are extracted to generate
the `cache_map` key:
```
000110101001001110011010010110100011101000011010 | 010011101001101
<-------------- index of the cache_map ------------> <--- cache_key --->
integer*8 integer*2
```
A direct access is made to the `cache_map` (usually in RAM, not in a cache).
If the `cache_map` is not yet sorted, it is sorted. Then the `cache_key` is
searched for in the array of `cache_key`s using a binary search. As the maximum
possible size of the array is 32768, the data will be found in less than 16
tries, which are likely not to cross a memory page boundary. When the key is
found, the corresponding value is at the same index in the array of values. If
the key is not found, a value of zero is returned.
Insert
------
The key is appended to the array of keys and
the value is appended to the array of values. When the `cache_map` arrays are
completely filled with data, the size of the array is doubled
(`cache_map_reallocate`), the keys are sorted (`cache_map_sort`), and the order
is applied to the values array (`dset_order`). When duplicates are found
(`cache_map_unique`), their values are added together such that there is always
a unique key for a unique value. Upon addition of duplicates, the resulting
value may be lower than a threshold so in that case the entry is removed from
the `cache_map` (`cache_map_shrink`).
Parallelism
-----------
When the atomic integrals are written, they are not likely to access
simultaneously the same `cache_map`. So the `cache_map`s are filled and sorted
in parallel with almost no overhead. When all the data has been added, all the
`cache_map`s are sorted concurrently.
During the 4-index transformation, the molecular integrals need to be
frequently updated. Using this data structure, it is possible to modify the
elements in parallel since each `cache_map` has its own lock which is acquired
when the `cache_map` is modified.