mirror of
https://github.com/LCPQ/quantum_package
synced 2025-05-05 14:44:55 +02:00
Integrals_storage
parent
a261f9a08e
commit
b6399929d5
113
Integrals_storage.md
Normal file
113
Integrals_storage.md
Normal file
@ -0,0 +1,113 @@
|
||||
Two electron integrals are of the form (ij|kl), with permutation rules for the 4 indices.
|
||||
It is not possible to store all the (ij|kl) integrals in memory under the form of a 4-dimensional
|
||||
array, so we store instead all the non-zero integrals in a data structure, similar to a hash table.
|
||||
|
||||
We do use hash tables because :
|
||||
|
||||
1. Rehashing the table as it grows is too costly
|
||||
2. We need some locality if possible : (ij|kl) should be close to (ij|k l+1)
|
||||
3. We tried boost and C++/STL unordered maps, and the hash tables of Google but it was still too slow
|
||||
|
||||
We do not use binary search trees because:
|
||||
|
||||
1. We need to be able to add/update some data in a shared-memory parallel environment
|
||||
2. It makes lots of jumps among memory pages before the data is found
|
||||
|
||||
So we decided to implement our own key/value data structure well adapted to our problem.
|
||||
|
||||
|
||||
The Hash function
|
||||
=================
|
||||
|
||||
The key is a function that maps the four indices i,j,k,l into a unique 64-bit
|
||||
integer, and which returns the same value when then permutation symetry
|
||||
operations are applied to the indices. As it is a tiny function, it is written
|
||||
in the module that uses it intensively such that it can be inlined by the compiler,
|
||||
namely the `Integrals_Bielec/mo_bi_integrals.irp.f` file.
|
||||
|
||||
```fortran
|
||||
subroutine mo_bielec_integrals_index(i,j,k,l,i1)
|
||||
implicit none
|
||||
integer, intent(in) :: i,j,k,l
|
||||
integer(key_kind), intent(out) :: i1
|
||||
integer(key_kind) :: p,q,r,s,i2
|
||||
p = min(i,k)
|
||||
r = max(i,k)
|
||||
p = p+ishft(r*r-r,-1)
|
||||
q = min(j,l)
|
||||
s = max(j,l)
|
||||
q = q+ishft(s*s-s,-1)
|
||||
i1 = min(p,q)
|
||||
i2 = max(p,q)
|
||||
i1 = i1+ishft(i2*i2-i2,-1)
|
||||
end
|
||||
```
|
||||
|
||||
|
||||
The Data Structure
|
||||
==================
|
||||
|
||||
Now that it is possible from (ij|kl) to obtain a unique 64-bit integer and the
|
||||
value of the integral, these can be used to store the data in a data structure
|
||||
similar to a hash table. In what follows, we call this data structure a `map`.
|
||||
|
||||
A `map` type is essentially an array of `cache_map`s. The `cache_map` type
|
||||
contains mainly a pointer to an array of values and a pointer to an array of
|
||||
keys, and a flag to know if the data is sorted by increasing keys.
|
||||
|
||||
The `cache_map` keys are stored on signed 16-bits integers, so the value of the
|
||||
key will not exceed 32768. This guarantees un upper bound on the total number
|
||||
of values of the `cache_map`, and the largest possible size of the keys array
|
||||
will be 64kiB. Most of the time, it is expected to fit in the L1 cache.
|
||||
|
||||
|
||||
Search
|
||||
------
|
||||
|
||||
When accessing the `map` data structure, the key is split in two. The 49 leading
|
||||
bits of the 64-bit key are used to extract the index of the cache map in the
|
||||
array of `cache_map`s. The 15 least significant bits are extracted to generate
|
||||
the `cache_map` key:
|
||||
|
||||
```
|
||||
000110101001001110011010010110100011101000011010 | 010011101001101
|
||||
<-------------- index of the cache_map ------------> <--- cache_key --->
|
||||
integer*8 integer*2
|
||||
```
|
||||
|
||||
A direct access is made to the `cache_map` (usually in RAM, not in a cache).
|
||||
If the `cache_map` is not yet sorted, it is sorted. Then the `cache_key` is
|
||||
searched for in the array of `cache_key`s using a binary search. As the maximum
|
||||
possible size of the array is 32768, the data will be found in less than 16
|
||||
tries, which are likely not to cross a memory page boundary. When the key is
|
||||
found, the corresponding value is at the same index in the array of values. If
|
||||
the key is not found, a value of zero is returned.
|
||||
|
||||
Insert
|
||||
------
|
||||
|
||||
The key is appended to the array of keys and
|
||||
the value is appended to the array of values. When the `cache_map` arrays are
|
||||
completely filled with data, the size of the array is doubled
|
||||
(`cache_map_reallocate`), the keys are sorted (`cache_map_sort`), and the order
|
||||
is applied to the values array (`dset_order`). When duplicates are found
|
||||
(`cache_map_unique`), their values are added together such that there is always
|
||||
a unique key for a unique value. Upon addition of duplicates, the resulting
|
||||
value may be lower than a threshold so in that case the entry is removed from
|
||||
the `cache_map` (`cache_map_shrink`).
|
||||
|
||||
|
||||
Parallelism
|
||||
-----------
|
||||
|
||||
When the atomic integrals are written, they are not likely to access
|
||||
simultaneously the same `cache_map`. So the `cache_map`s are filled and sorted
|
||||
in parallel with almost no overhead. When all the data has been added, all the
|
||||
`cache_map`s are sorted concurrently.
|
||||
|
||||
During the 4-index transformation, the molecular integrals need to be
|
||||
frequently updated. Using this data structure, it is possible to modify the
|
||||
elements in parallel since each `cache_map` has its own lock which is acquired
|
||||
when the `cache_map` is modified.
|
||||
|
||||
|
Loading…
x
Reference in New Issue
Block a user