5 Integrals_storage
Anthony Scemama edited this page 2017-03-17 16:17:32 +01:00

Two electron integrals are of the form (ij|kl), with permutation rules for the 4 indices. It is not possible to store all the (ij|kl) integrals in memory under the form of a 4-dimensional array, so we store instead all the non-zero integrals in a data structure, similar to a hash table.

We do not use hash tables because :

  1. Rehashing the table as it grows is too costly
  2. We need some locality if possible : (ij|kl) should be close to (ij|k l+1)
  3. We tried boost and C++/STL unordered maps, and the hash tables of Google but it was still too slow

We do not use binary search trees because:

  1. We need to be able to add/update some data in a shared-memory parallel environment
  2. It makes lots of jumps among memory pages before the data is found

So we decided to implement our own key/value data structure well adapted to our problem. This is implemented in the Utils/map_module.f90 file.

I would like to re-implement this data-structure in C as an open-source independent library, so that it could be easily used in many other programs and other languages than Fortran. Also, with a library one has the freedom to add some intrinsics for a better use of the hardware. If you are willing to contribute, here is the address of this project : https://github.com/scemama/sparse_array

The Hash function

The key is a function that maps the four indices i,j,k,l into a unique 64-bit integer, and which returns the same value when then permutation symmetry operations are applied to the indices. As it is a tiny function, it is written in the module that uses it intensively such that it can be inlined by the compiler, namely the Integrals_Bielec/mo_bi_integrals.irp.f file.

subroutine mo_bielec_integrals_index(i,j,k,l,i1)
  implicit none
  integer, intent(in) :: i,j,k,l
  integer(key_kind), intent(out) :: i1
  integer(key_kind) :: p,q,r,s,i2
  p = min(i,k)
  r = max(i,k)
  p = p+ishft(r*r-r,-1)
  q = min(j,l)
  s = max(j,l)
  q = q+ishft(s*s-s,-1)
  i1 = min(p,q)
  i2 = max(p,q)
  i1 = i1+ishft(i2*i2-i2,-1)
end

The Data Structure

Now that it is possible from (ij|kl) to obtain a unique 64-bit integer and the value of the integral, these can be used to store the data in a data structure similar to a hash table. In what follows, we call this data structure a map.

A map type is essentially an array of cache_maps. The cache_map type contains mainly a pointer to an array of values and a pointer to an array of keys, and a flag to know if the data is sorted by increasing keys.

The cache_map keys are stored on signed 16-bits integers, so the value of the key will not exceed 32768. This guarantees an upper bound on the total number of values of the cache_map, and the largest possible size of the keys array will be 64kiB. Most of the time, it is expected to fit in the L1 cache.

When accessing the map data structure, the key is split in two. The 49 leading bits of the 64-bit key are used to extract the index of the cache map in the array of cache_maps. The 15 least significant bits are extracted to generate the cache_map key:

     000110101001001110011010010110100011101000011010  |  010011101001101
   <-------------- index of the cache_map ------------> <--- cache_key --->
                         integer*8                           integer*2

A direct access is made to the cache_map (usually in RAM, not in a cache). If the cache_map is not yet sorted, it is sorted. Then the cache_key is searched for in the array of cache_keys using a binary search. As the maximum possible size of the array is 32768, the data will be found in less than 16 tries, which are likely not to cross a memory page boundary. When the key is found, the corresponding value is at the same index in the array of values. If the key is not found, a value of zero is returned.

Insert

The key is appended to the array of keys and the value is appended to the array of values. When the cache_map arrays are completely filled with data, the size of the array is doubled (cache_map_reallocate), the keys are sorted (cache_map_sort), and the order is applied to the values array (dset_order). When duplicates are found (cache_map_unique), their values are added together such that there is always a unique key for a unique value. Upon addition of duplicates, the resulting value may be lower than a threshold so in that case the entry is removed from the cache_map (cache_map_shrink).

Thread safety

When the atomic integrals are written, they are not likely to access simultaneously the same cache_map. So the cache_maps are filled and sorted concurrently with almost no overhead. When all the data has been added, all the cache_maps are sorted concurrently.

During the 4-index transformation, the molecular integrals need to be frequently updated. Using this data structure, it is possible to modify the elements in a parallel section each cache_map has its own lock which is acquired when the cache_map is modified.