From b6399929d58d695304ef5345a088630bf0b8cedf Mon Sep 17 00:00:00 2001 From: Anthony Scemama Date: Fri, 17 Mar 2017 16:01:25 +0100 Subject: [PATCH] Integrals_storage --- Integrals_storage.md | 113 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 113 insertions(+) create mode 100644 Integrals_storage.md diff --git a/Integrals_storage.md b/Integrals_storage.md new file mode 100644 index 0000000..f1fc740 --- /dev/null +++ b/Integrals_storage.md @@ -0,0 +1,113 @@ +Two electron integrals are of the form (ij|kl), with permutation rules for the 4 indices. +It is not possible to store all the (ij|kl) integrals in memory under the form of a 4-dimensional +array, so we store instead all the non-zero integrals in a data structure, similar to a hash table. + +We do use hash tables because : + +1. Rehashing the table as it grows is too costly +2. We need some locality if possible : (ij|kl) should be close to (ij|k l+1) +3. We tried boost and C++/STL unordered maps, and the hash tables of Google but it was still too slow + +We do not use binary search trees because: + +1. We need to be able to add/update some data in a shared-memory parallel environment +2. It makes lots of jumps among memory pages before the data is found + +So we decided to implement our own key/value data structure well adapted to our problem. + + +The Hash function +================= + +The key is a function that maps the four indices i,j,k,l into a unique 64-bit +integer, and which returns the same value when then permutation symetry +operations are applied to the indices. As it is a tiny function, it is written +in the module that uses it intensively such that it can be inlined by the compiler, +namely the `Integrals_Bielec/mo_bi_integrals.irp.f` file. + +```fortran +subroutine mo_bielec_integrals_index(i,j,k,l,i1) + implicit none + integer, intent(in) :: i,j,k,l + integer(key_kind), intent(out) :: i1 + integer(key_kind) :: p,q,r,s,i2 + p = min(i,k) + r = max(i,k) + p = p+ishft(r*r-r,-1) + q = min(j,l) + s = max(j,l) + q = q+ishft(s*s-s,-1) + i1 = min(p,q) + i2 = max(p,q) + i1 = i1+ishft(i2*i2-i2,-1) +end +``` + + +The Data Structure +================== + +Now that it is possible from (ij|kl) to obtain a unique 64-bit integer and the +value of the integral, these can be used to store the data in a data structure +similar to a hash table. In what follows, we call this data structure a `map`. + +A `map` type is essentially an array of `cache_map`s. The `cache_map` type +contains mainly a pointer to an array of values and a pointer to an array of +keys, and a flag to know if the data is sorted by increasing keys. + +The `cache_map` keys are stored on signed 16-bits integers, so the value of the +key will not exceed 32768. This guarantees un upper bound on the total number +of values of the `cache_map`, and the largest possible size of the keys array +will be 64kiB. Most of the time, it is expected to fit in the L1 cache. + + +Search +------ + +When accessing the `map` data structure, the key is split in two. The 49 leading +bits of the 64-bit key are used to extract the index of the cache map in the +array of `cache_map`s. The 15 least significant bits are extracted to generate +the `cache_map` key: + +``` + 000110101001001110011010010110100011101000011010 | 010011101001101 + <-------------- index of the cache_map ------------> <--- cache_key ---> + integer*8 integer*2 +``` + +A direct access is made to the `cache_map` (usually in RAM, not in a cache). +If the `cache_map` is not yet sorted, it is sorted. Then the `cache_key` is +searched for in the array of `cache_key`s using a binary search. As the maximum +possible size of the array is 32768, the data will be found in less than 16 +tries, which are likely not to cross a memory page boundary. When the key is +found, the corresponding value is at the same index in the array of values. If +the key is not found, a value of zero is returned. + +Insert +------ + +The key is appended to the array of keys and +the value is appended to the array of values. When the `cache_map` arrays are +completely filled with data, the size of the array is doubled +(`cache_map_reallocate`), the keys are sorted (`cache_map_sort`), and the order +is applied to the values array (`dset_order`). When duplicates are found +(`cache_map_unique`), their values are added together such that there is always +a unique key for a unique value. Upon addition of duplicates, the resulting +value may be lower than a threshold so in that case the entry is removed from +the `cache_map` (`cache_map_shrink`). + + +Parallelism +----------- + +When the atomic integrals are written, they are not likely to access +simultaneously the same `cache_map`. So the `cache_map`s are filled and sorted +in parallel with almost no overhead. When all the data has been added, all the +`cache_map`s are sorted concurrently. + +During the 4-index transformation, the molecular integrals need to be +frequently updated. Using this data structure, it is possible to modify the +elements in parallel since each `cache_map` has its own lock which is acquired +when the `cache_map` is modified. + +