Integrals_storage

2025-05-05 14:44:55 +02:00 · 2017-03-17 16:01:25 +01:00 · 2017-03-17 16:01:25 +01:00 · b6399929d5
commit b6399929d5
parent a261f9a08e
1 changed files with 113 additions and 0 deletions
--- a/Integrals_storage.md
+++ b/Integrals_storage.md
@ -0,0 +1,113 @@
 Two electron integrals are of the form (ij|kl), with permutation rules for the 4 indices.
 It is not possible to store all the (ij|kl) integrals in memory under the form of a 4-dimensional
 array, so we store instead all the non-zero integrals in a data structure, similar to a hash table.
 We do use hash tables because :
 1. Rehashing the table as it grows is too costly
 2. We need some locality if possible : (ij|kl) should be close to (ij|k l+1)
 3. We tried boost and C++/STL unordered maps, and the hash tables of Google but it was still too slow
 We do not use binary search trees because:
 1. We need to be able to add/update some data in a shared-memory parallel environment
 2. It makes lots of jumps among memory pages before the data is found
 So we decided to implement our own key/value data structure well adapted to our problem.
 The Hash function
 =================
 The key is a function that maps the four indices i,j,k,l into a unique 64-bit
 integer, and which returns the same value when then permutation symetry
 operations are applied to the indices. As it is a tiny function, it is written
 in the module that uses it intensively such that it can be inlined by the compiler,
 namely the `Integrals_Bielec/mo_bi_integrals.irp.f` file.
 ```fortran
 subroutine mo_bielec_integrals_index(i,j,k,l,i1)
  implicit none
  integer, intent(in) :: i,j,k,l
  integer(key_kind), intent(out) :: i1
  integer(key_kind) :: p,q,r,s,i2
  p = min(i,k)
  r = max(i,k)
  p = p+ishft(r*r-r,-1)
  q = min(j,l)
  s = max(j,l)
  q = q+ishft(s*s-s,-1)
  i1 = min(p,q)
  i2 = max(p,q)
  i1 = i1+ishft(i2*i2-i2,-1)
 end
 ```
 The Data Structure
 ==================
 Now that it is possible from (ij|kl) to obtain a unique 64-bit integer and the
 value of the integral, these can be used to store the data in a data structure
 similar to a hash table. In what follows, we call this data structure a `map`.
 A `map` type is essentially an array of `cache_map`s.  The `cache_map` type
 contains mainly a pointer to an array of values and a pointer to an array of
 keys, and a flag to know if the data is sorted by increasing keys.
 The `cache_map` keys are stored on signed 16-bits integers, so the value of the
 key will not exceed 32768. This guarantees un upper bound on the total number
 of values of the `cache_map`, and the largest possible size of the keys array
 will be 64kiB. Most of the time, it is expected to fit in the L1 cache.
 Search
 ------
 When accessing the `map` data structure, the key is split in two. The 49 leading
 bits of the 64-bit key are used to extract the index of the cache map in the
 array of `cache_map`s.  The 15 least significant bits are extracted to generate
 the `cache_map` key:
 ```
     000110101001001110011010010110100011101000011010  |  010011101001101
   <-------------- index of the cache_map ------------> <--- cache_key --->
                         integer*8                           integer*2
 ```
 A direct access is made to the `cache_map` (usually in RAM, not in a cache).
 If the `cache_map` is not yet sorted, it is sorted.  Then the `cache_key` is
 searched for in the array of `cache_key`s using a binary search. As the maximum
 possible size of the array is 32768, the data will be found in less than 16
 tries, which are likely not to cross a memory page boundary.  When the key is
 found, the corresponding value is at the same index in the array of values.  If
 the key is not found, a value of zero is returned.
 Insert
 ------
 The key is appended to the array of keys and
 the value is appended to the array of values.  When the `cache_map` arrays are
 completely filled with data, the size of the array is doubled
 (`cache_map_reallocate`), the keys are sorted (`cache_map_sort`), and the order
 is applied to the values array (`dset_order`). When duplicates are found
 (`cache_map_unique`), their values are added together such that there is always
 a unique key for a unique value.  Upon addition of duplicates, the resulting
 value may be lower than a threshold so in that case the entry is removed from
 the `cache_map` (`cache_map_shrink`).
 Parallelism
 -----------
 When the atomic integrals are written, they are not likely to access
 simultaneously the same `cache_map`.  So the `cache_map`s are filled and sorted
 in parallel with almost no overhead. When all the data has been added, all the
 `cache_map`s are sorted concurrently.
 During the 4-index transformation, the molecular integrals need to be
 frequently updated. Using this data structure, it is possible to modify the
 elements in parallel since each `cache_map` has its own lock which is acquired
 when the `cache_map` is modified.