Table of Contents
Two electron integrals are of the form (ij|kl), with permutation rules for the 4 indices. It is not possible to store all the (ij|kl) integrals in memory under the form of a 4-dimensional array, so we store instead all the non-zero integrals in a data structure, similar to a hash table.
We do not use hash tables because :
- Rehashing the table as it grows is too costly
- We need some locality if possible : (ij|kl) should be close to (ij|k l+1)
- We tried boost and C++/STL unordered maps, and the hash tables of Google but it was still too slow
We do not use binary search trees because:
- We need to be able to add/update some data in a shared-memory parallel environment
- It makes lots of jumps among memory pages before the data is found
So we decided to implement our own key/value data structure well adapted to our problem.
This is implemented in the Utils/map_module.f90
file.
I would like to re-implement this data-structure in C as an open-source independent library, so that it could be easily used in many other programs and other languages than Fortran. Also, with a library one has the freedom to add some intrinsics for a better use of the hardware. If you are willing to contribute, here is the address of this project : https://github.com/scemama/sparse_array
The Hash function
The key is a function that maps the four indices i,j,k,l into a unique 64-bit
integer, and which returns the same value when then permutation symmetry
operations are applied to the indices. As it is a tiny function, it is written
in the module that uses it intensively such that it can be inlined by the compiler,
namely the Integrals_Bielec/mo_bi_integrals.irp.f
file.
subroutine mo_bielec_integrals_index(i,j,k,l,i1)
implicit none
integer, intent(in) :: i,j,k,l
integer(key_kind), intent(out) :: i1
integer(key_kind) :: p,q,r,s,i2
p = min(i,k)
r = max(i,k)
p = p+ishft(r*r-r,-1)
q = min(j,l)
s = max(j,l)
q = q+ishft(s*s-s,-1)
i1 = min(p,q)
i2 = max(p,q)
i1 = i1+ishft(i2*i2-i2,-1)
end
The Data Structure
Now that it is possible from (ij|kl) to obtain a unique 64-bit integer and the
value of the integral, these can be used to store the data in a data structure
similar to a hash table. In what follows, we call this data structure a map
.
A map
type is essentially an array of cache_map
s. The cache_map
type
contains mainly a pointer to an array of values and a pointer to an array of
keys, and a flag to know if the data is sorted by increasing keys.
The cache_map
keys are stored on signed 16-bits integers, so the value of the
key will not exceed 32768. This guarantees an upper bound on the total number
of values of the cache_map
, and the largest possible size of the keys array
will be 64kiB. Most of the time, it is expected to fit in the L1 cache.
Search
When accessing the map
data structure, the key is split in two. The 49 leading
bits of the 64-bit key are used to extract the index of the cache map in the
array of cache_map
s. The 15 least significant bits are extracted to generate
the cache_map
key:
000110101001001110011010010110100011101000011010 | 010011101001101
<-------------- index of the cache_map ------------> <--- cache_key --->
integer*8 integer*2
A direct access is made to the cache_map
(usually in RAM, not in a cache).
If the cache_map
is not yet sorted, it is sorted. Then the cache_key
is
searched for in the array of cache_key
s using a binary search. As the maximum
possible size of the array is 32768, the data will be found in less than 16
tries, which are likely not to cross a memory page boundary. When the key is
found, the corresponding value is at the same index in the array of values. If
the key is not found, a value of zero is returned.
Insert
The key is appended to the array of keys and
the value is appended to the array of values. When the cache_map
arrays are
completely filled with data, the size of the array is doubled
(cache_map_reallocate
), the keys are sorted (cache_map_sort
), and the order
is applied to the values array (dset_order
). When duplicates are found
(cache_map_unique
), their values are added together such that there is always
a unique key for a unique value. Upon addition of duplicates, the resulting
value may be lower than a threshold so in that case the entry is removed from
the cache_map
(cache_map_shrink
).
Thread safety
When the atomic integrals are written, they are not likely to access
simultaneously the same cache_map
. So the cache_map
s are filled and sorted
concurrently with almost no overhead. When all the data has been added, all the
cache_map
s are sorted concurrently.
During the 4-index transformation, the molecular integrals need to be
frequently updated. Using this data structure, it is possible to modify the
elements in a parallel section each cache_map
has its own lock which is acquired
when the cache_map
is modified.