Mirror of https://github.com/LCPQ/quantum_package (synced 2025-05-05 14:44:55 +02:00). File `Integrals_storage.md`, commit b6399929d5 "Integrals_storage" (parent a261f9a08e).

Two-electron integrals have the form (ij|kl), with permutation rules relating the four indices. It is not possible to store all the (ij|kl) integrals in memory as a 4-dimensional array, whose size grows as N^4 with the number N of orbitals, so we instead store only the non-zero integrals in a data structure similar to a hash table.

We do not use hash tables because:

1. Rehashing the table as it grows is too costly.
2. We need some locality if possible: (ij|kl) should be stored close to (ij|k,l+1).
3. We tried Boost and C++/STL unordered maps, as well as Google's hash tables, but they were still too slow.

We do not use binary search trees because:

1. We need to be able to add and update data in a shared-memory parallel environment.
2. A lookup makes many jumps across memory pages before the data is found.

We therefore implemented our own key/value data structure, tailored to our problem.

The Hash Function
=================

The key is computed by a function that maps the four indices i,j,k,l to a unique 64-bit integer and returns the same value when the permutation symmetry operations are applied to the indices, that is, swapping i and k, swapping j and l, or exchanging the pair (i,k) with the pair (j,l). As it is a tiny function, it is written in the module that uses it intensively, namely the `Integrals_Bielec/mo_bi_integrals.irp.f` file, so that the compiler can inline it.

```fortran
subroutine mo_bielec_integrals_index(i,j,k,l,i1)
  implicit none
  ! Maps the MO indices (i,j,k,l) to a 64-bit key that is invariant under
  ! i<->k, j<->l, and exchange of the pairs (i,k) and (j,l).
  ! key_kind is the 64-bit integer kind used for the keys.
  ! ishft(x,-1) is x/2 for the non-negative values used here.
  integer, intent(in)            :: i,j,k,l
  integer(key_kind), intent(out) :: i1
  integer(key_kind)              :: p,q,r,s,i2
  ! Triangular index of the unordered pair (i,k)
  p = min(i,k)
  r = max(i,k)
  p = p+ishft(r*r-r,-1)
  ! Triangular index of the unordered pair (j,l)
  q = min(j,l)
  s = max(j,l)
  q = q+ishft(s*s-s,-1)
  ! Triangular index of the unordered pair (p,q)
  i1 = min(p,q)
  i2 = max(p,q)
  i1 = i1+ishft(i2*i2-i2,-1)
end
```

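To check the symmetry property stated above, the same arithmetic can be wrapped in a pure function and evaluated on every index quadruple of a small basis. The module below is a minimal sketch for that purpose: `integral_key_demo`, `integral_key` and `key_is_symmetric` are hypothetical names, and `key_kind` is assumed here to be a 64-bit integer kind obtained with `selected_int_kind(18)`, standing in for the project's own definition.

```fortran
! Minimal sketch: the key arithmetic of mo_bielec_integrals_index as a pure
! function, plus a helper that checks the 8-fold permutation symmetry.
! Module and function names are hypothetical.
module integral_key_demo
  implicit none
  ! Assumption: stands in for the project's 64-bit key kind.
  integer, parameter :: key_kind = selected_int_kind(18)
contains

  pure function integral_key(i, j, k, l) result(i1)
    integer, intent(in) :: i, j, k, l
    integer(key_kind) :: i1
    integer(key_kind) :: p, q, r, s, i2
    p = min(i, k)
    r = max(i, k)
    p = p + ishft(r*r - r, -1)        ! triangular index of the pair (i,k)
    q = min(j, l)
    s = max(j, l)
    q = q + ishft(s*s - s, -1)        ! triangular index of the pair (j,l)
    i1 = min(p, q)
    i2 = max(p, q)
    i1 = i1 + ishft(i2*i2 - i2, -1)   ! triangular index of the pair (p,q)
  end function integral_key

  ! .true. if the 7 other quadruples generated by i<->k, j<->l and
  ! (i,k)<->(j,l) all give the same key as (i,j,k,l).
  pure logical function key_is_symmetric(i, j, k, l)
    integer, intent(in) :: i, j, k, l
    integer(key_kind) :: ref
    ref = integral_key(i, j, k, l)
    key_is_symmetric = all([integral_key(k, j, i, l), integral_key(i, l, k, j), &
                            integral_key(k, l, i, j), integral_key(j, i, l, k), &
                            integral_key(j, k, l, i), integral_key(l, i, j, k), &
                            integral_key(l, k, j, i)] == ref)
  end function key_is_symmetric

end module integral_key_demo
```

A driver can loop over all (i,j,k,l) in 1..N, require `key_is_symmetric` to hold, and record which keys occur; for N = 6, every key from 1 to 231 is produced.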
The Data Structure
==================

Now that each integral (ij|kl) can be mapped to a unique 64-bit integer key, the key/value pairs can be stored in a data structure similar to a hash table. In what follows, we call this data structure a `map`.

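Because the key is a triangular index built from two triangular indices, its range is known in advance for a basis of N molecular orbitals. Writing T(a,b) for the pair index computed in `mo_bielec_integrals_index` (T, P and K_max are notation introduced here for illustration), the key and its largest value are:

$$
\begin{aligned}
T(a,b) &= \min(a,b) + \frac{\max(a,b)\,\bigl(\max(a,b)-1\bigr)}{2}, \\
\mathrm{key}(i,j,k,l) &= T\bigl(T(i,k),\,T(j,l)\bigr), \\
1 \le \mathrm{key}(i,j,k,l) &\le K_{\max} = \frac{P(P+1)}{2}, \qquad P = \frac{N(N+1)}{2}, \\
K_{\max} &\approx \frac{N^4}{8}.
\end{aligned}
$$

For example, with N = 1000 orbitals, P = 500500 and K_max = 125250375250, which is below 2^37 and about 8 times smaller than N^4 = 10^12. Each value between 1 and K_max corresponds to exactly one symmetry-unique integral.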
A `map` is essentially an array of `cache_map`s. The `cache_map` type mainly contains a pointer to an array of values, a pointer to an array of keys, and a flag indicating whether the data is sorted by increasing key.

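As an illustration of this layout, the two container types could be declared as follows. This is a hypothetical sketch, not the project's actual type definitions: the module, type, component and routine names (`cache_map_sketch`, `cache_map_type`, `map_type`, `n_elements`, `map_init`) are assumptions, and 16-bit keys are obtained with `selected_int_kind(4)`.

```fortran
! Hypothetical sketch of the map / cache_map layout described above.
! Names and kinds are illustrative assumptions, not the project's definitions.
module cache_map_sketch
  implicit none
  integer, parameter :: cache_key_kind = selected_int_kind(4)    ! 16-bit signed keys
  integer, parameter :: value_kind     = selected_real_kind(15)  ! double-precision values

  type :: cache_map_type
    integer(cache_key_kind), pointer :: key(:)   => null()  ! keys of this cache_map
    real(value_kind),        pointer :: value(:) => null()  ! value(i) belongs to key(i)
    integer :: n_elements = 0                                ! number of slots in use
    logical :: sorted = .true.                               ! key(1:n_elements) increasing?
  end type cache_map_type

  type :: map_type
    type(cache_map_type), allocatable :: cache(:)   ! indexed by the high bits of the key
  end type map_type

contains

  ! Create n_caches empty cache_maps, each with room for 'capacity' entries.
  subroutine map_init(map, n_caches, capacity)
    type(map_type), intent(out) :: map
    integer, intent(in) :: n_caches, capacity
    integer :: c
    allocate(map%cache(0:n_caches-1))
    do c = 0, n_caches - 1
      allocate(map%cache(c)%key(capacity), map%cache(c)%value(capacity))
    end do
  end subroutine map_init

end module cache_map_sketch
```

Indexing `cache(:)` from 0 in this sketch lets the high bits of a key be used directly as the position of its `cache_map`.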
The `cache_map` keys are stored as signed 16-bit integers, so the value of a key never exceeds 32767. This guarantees an upper bound on the number of values stored in a `cache_map`, and the keys array occupies at most 64 KiB (32768 keys × 2 bytes). Most of the time, the keys array is expected to fit in the L1 cache.

Search
------

When the `map` is accessed, the 64-bit key is split in two. The 49 most significant bits give the index of the `cache_map` in the array of `cache_map`s, and the 15 least significant bits give the key within that `cache_map` (the `cache_key`):

```
0000110101001001110011010010110100011101000011010 | 010011101001101
<------------ index of the cache_map ----------->   <--cache_key-->
                    integer*8                          integer*2
```

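The split amounts to one shift and one mask. The module below is a minimal sketch with hypothetical names (`key_split_sketch`, `split_key`, `join_key`), assuming non-negative 64-bit keys as produced by the hash function.

```fortran
! Minimal sketch of the key split: high 49 bits -> index of the cache_map,
! low 15 bits -> cache_key. Names are hypothetical; keys are assumed >= 0.
module key_split_sketch
  implicit none
  integer, parameter :: key_kind       = selected_int_kind(18)  ! 64-bit full key
  integer, parameter :: cache_key_kind = selected_int_kind(4)   ! 16-bit cache_key
  integer, parameter :: cache_bits     = 15
  integer(key_kind), parameter :: cache_mask = 2_key_kind**cache_bits - 1  ! 32767: low 15 bits set
contains

  pure subroutine split_key(key, idx_cache, cache_key)
    integer(key_kind), intent(in)        :: key
    integer(key_kind), intent(out)       :: idx_cache
    integer(cache_key_kind), intent(out) :: cache_key
    idx_cache = ishft(key, -cache_bits)                           ! drop the 15 low bits
    cache_key = int(iand(key, cache_mask), kind=cache_key_kind)   ! keep them: 0..32767
  end subroutine split_key

  ! Inverse of split_key: rebuild the full key from its two parts.
  pure function join_key(idx_cache, cache_key) result(key)
    integer(key_kind), intent(in)       :: idx_cache
    integer(cache_key_kind), intent(in) :: cache_key
    integer(key_kind) :: key
    key = ior(ishft(idx_cache, cache_bits), int(cache_key, kind=key_kind))
  end function join_key

end module key_split_sketch
```

For instance, `split_key` turns the key 125250375250 (the largest key for N = 1000 orbitals with the hash function above) into the `cache_map` index 3822338 and the `cache_key` 3666, so such a `map` would hold about 3.8 million `cache_map`s.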
The `cache_map` is accessed directly (it usually resides in RAM, not in a cache). If the `cache_map` is not yet sorted, it is sorted first. Then the `cache_key` is looked up in the array of `cache_key`s with a binary search. As the array holds at most 32768 entries, the key is found in at most 16 probes, which are unlikely to cross a memory page boundary. When the key is found, the corresponding value is at the same index in the array of values. If the key is not found, a value of zero is returned.

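The lookup inside a single `cache_map` can be sketched as a standard binary search over the sorted 16-bit keys. The function below uses hypothetical names, assumes the arrays are already sorted, and returns zero when the key is absent, as described above.

```fortran
! Minimal sketch of the lookup inside one cache_map: binary search of a
! 16-bit key in a sorted key array; returns the matching value, or zero.
! Names are hypothetical.
module cache_map_search_sketch
  implicit none
  integer, parameter :: cache_key_kind = selected_int_kind(4)
  integer, parameter :: value_kind     = selected_real_kind(15)
contains

  ! keys(1:n) must be sorted by increasing key; values(i) belongs to keys(i).
  pure function cache_map_lookup(keys, values, n, cache_key) result(v)
    integer, intent(in)                 :: n
    integer(cache_key_kind), intent(in) :: keys(n), cache_key
    real(value_kind), intent(in)        :: values(n)
    real(value_kind) :: v
    integer :: lo, hi, mid
    v = 0.0_value_kind                  ! returned when the key is absent
    lo = 1
    hi = n
    do while (lo <= hi)
      mid = lo + (hi - lo) / 2          ! probe the middle of [lo, hi]
      if (keys(mid) == cache_key) then
        v = values(mid)
        return
      else if (keys(mid) < cache_key) then
        lo = mid + 1                    ! key can only be in the upper half
      else
        hi = mid - 1                    ! key can only be in the lower half
      end if
    end do
  end function cache_map_lookup

end module cache_map_search_sketch
```

Each iteration halves the interval [lo, hi], so for n = 32768 at most floor(log2(32768)) + 1 = 16 probes are made.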
Insert
------

The key is appended to the array of keys and the value is appended to the array of values. When the `cache_map` arrays are full, their size is doubled (`cache_map_reallocate`), the keys are sorted (`cache_map_sort`), and the same permutation is applied to the values array (`dset_order`). Duplicate keys are then detected (`cache_map_unique`) and their values are added together, so that each key appears only once with a single value. After the addition of duplicates, the resulting value may fall below a threshold; in that case the entry is removed from the `cache_map` (`cache_map_shrink`).

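The insertion path can be sketched as below. This is an illustration under assumptions, not the project's code: the routine names are hypothetical, a simple insertion sort that moves keys and values together stands in for `cache_map_sort` followed by `dset_order`, and the threshold test is assumed to act on the absolute value.

```fortran
! Hypothetical sketch of insertion into one cache_map: append, and when the
! arrays are full, double them, sort, merge duplicates and drop small values.
! Insertion sort stands in for cache_map_sort + dset_order; names are assumptions.
module cache_map_insert_sketch
  implicit none
  integer, parameter :: cache_key_kind = selected_int_kind(4)
  integer, parameter :: value_kind     = selected_real_kind(15)
contains

  subroutine cache_map_append(keys, values, n, key, val, thresh)
    integer(cache_key_kind), allocatable, intent(inout) :: keys(:)
    real(value_kind), allocatable, intent(inout)        :: values(:)
    integer, intent(inout)                              :: n       ! entries in use
    integer(cache_key_kind), intent(in)                 :: key
    real(value_kind), intent(in)                        :: val, thresh
    integer(cache_key_kind), allocatable :: ktmp(:)
    real(value_kind), allocatable        :: vtmp(:)
    if (n == size(keys)) then
      ! Arrays full: double their size (cf. cache_map_reallocate) ...
      allocate(ktmp(max(1, 2*size(keys))), vtmp(max(1, 2*size(keys))))
      ktmp(1:n) = keys(1:n)
      vtmp(1:n) = values(1:n)
      call move_alloc(ktmp, keys)
      call move_alloc(vtmp, values)
      ! ... then sort, merge duplicates and shrink.
      call cache_map_compact(keys, values, n, thresh)
    end if
    n = n + 1
    keys(n) = key
    values(n) = val
  end subroutine cache_map_append

  subroutine cache_map_compact(keys, values, n, thresh)
    integer(cache_key_kind), intent(inout) :: keys(:)
    real(value_kind), intent(inout)        :: values(:)
    integer, intent(inout)                 :: n
    real(value_kind), intent(in)           :: thresh
    integer :: i, j, m
    integer(cache_key_kind) :: k
    real(value_kind) :: v
    ! Sort by increasing key, moving each value together with its key.
    do i = 2, n
      k = keys(i)
      v = values(i)
      j = i - 1
      do while (j >= 1)
        if (keys(j) <= k) exit
        keys(j+1) = keys(j)
        values(j+1) = values(j)
        j = j - 1
      end do
      keys(j+1) = k
      values(j+1) = v
    end do
    ! Sum the values of duplicate keys (cf. cache_map_unique).
    m = 0
    do i = 1, n
      if (m > 0) then
        if (keys(m) == keys(i)) then
          values(m) = values(m) + values(i)
          cycle
        end if
      end if
      m = m + 1
      keys(m) = keys(i)
      values(m) = values(i)
    end do
    ! Drop entries whose |value| is below thresh (cf. cache_map_shrink).
    n = 0
    do i = 1, m
      if (abs(values(i)) >= thresh) then
        n = n + 1
        keys(n) = keys(i)
        values(n) = values(i)
      end if
    end do
  end subroutine cache_map_compact

end module cache_map_insert_sketch
```

Because sorting and merging only happen when the arrays are full, their cost is amortized over many appends; between two compactions the data is unsorted, which is why a lookup has to sort the `cache_map` first.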
Parallelism
-----------

When the atomic integrals are written, concurrent writes are unlikely to target the same `cache_map` at the same time, so the `cache_map`s are filled and sorted in parallel with almost no overhead. Once all the data has been added, all the `cache_map`s are sorted concurrently.

During the 4-index transformation, the molecular integrals need to be updated frequently. With this data structure, the elements can be modified in parallel, since each `cache_map` has its own lock, which is acquired while that `cache_map` is being modified.

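A minimal sketch of per-`cache_map` locking with the standard OpenMP lock routines (`omp_init_lock`, `omp_set_lock`, `omp_unset_lock`, `omp_destroy_lock`) follows. The module, type and routine names are hypothetical, and for brevity each `cache_map` is represented by a dense array of 2^15 values instead of key/value arrays. Lines starting with `!$` are only compiled when OpenMP is enabled, so the module also builds and runs serially.

```fortran
! Hypothetical sketch: one OpenMP lock per cache_map, so that concurrent
! updates only serialize when they target the same cache_map.
! Lines starting with !$ are compiled only when OpenMP is enabled.
! For brevity each cache_map is a dense array of 2**15 values.
module locked_cache_sketch
  !$ use omp_lib
  implicit none
  integer, parameter :: key_kind   = selected_int_kind(18)
  integer, parameter :: value_kind = selected_real_kind(15)
  integer, parameter :: cache_bits = 15

  type :: locked_cache
    real(value_kind), allocatable :: value(:)   ! stand-in for the key/value arrays
    !$ integer(omp_lock_kind) :: lock            ! lock protecting this cache_map
  end type locked_cache

contains

  subroutine init_caches(caches, n_caches)
    type(locked_cache), allocatable, intent(out) :: caches(:)
    integer, intent(in) :: n_caches
    integer :: c
    allocate(caches(0:n_caches-1))
    do c = 0, n_caches - 1
      allocate(caches(c)%value(0:2**cache_bits - 1))
      caches(c)%value = 0.0_value_kind
      !$ call omp_init_lock(caches(c)%lock)
    end do
  end subroutine init_caches

  ! Add delta to the entry of 'key', holding only the lock of its cache_map.
  subroutine add_to_entry(caches, key, delta)
    type(locked_cache), intent(inout) :: caches(0:)
    integer(key_kind), intent(in) :: key
    real(value_kind), intent(in) :: delta
    integer(key_kind) :: c, slot
    c    = ishft(key, -cache_bits)                ! which cache_map
    slot = iand(key, 2_key_kind**cache_bits - 1)  ! position inside it
    !$ call omp_set_lock(caches(c)%lock)
    caches(c)%value(slot) = caches(c)%value(slot) + delta
    !$ call omp_unset_lock(caches(c)%lock)
  end subroutine add_to_entry

  subroutine destroy_caches(caches)
    type(locked_cache), intent(inout) :: caches(0:)
    integer :: c
    do c = 0, size(caches) - 1
      !$ call omp_destroy_lock(caches(c)%lock)
    end do
  end subroutine destroy_caches

end module locked_cache_sketch
```

In this sketch two threads only wait for each other when they update the same `cache_map`. A concurrent sort of all `cache_map`s would need no lock at all, since a `!$omp parallel do` over the `cache_map` index gives each thread its own `cache_map`s.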