Two electron integrals are of the form (ij|kl), with permutation rules for the 4 indices. It is not possible to store all the (ij|kl) integrals in memory under the form of a 4-dimensional array, so we store instead all the non-zero integrals in a data structure, similar to a hash table.
We do not use hash tables because :
- Rehashing the table as it grows is too costly
- We need some locality if possible : (ij|kl) should be close to (ij|k l+1)
- We tried boost and C++/STL unordered maps, and the hash tables of Google but it was still too slow
We do not use binary search trees because:
- We need to be able to add/update some data in a shared-memory parallel environment
- It makes lots of jumps among memory pages before the data is found
So we decided to implement our own key/value data structure well
adapted to our problem. This is implemented in the
Utils/map_module.f90
file.
I would like to re-implement this data-structure in C as an open-source independent library, so that it could be easily used in many other programs and other languages than Fortran. Also, with a library one has the freedom to add some intrinsics for a better use of the hardware. If you are willing to contribute, here is the address of this project : https://github.com/scemama/sparse_array
The Hash function
The key is a function that maps the four indices i,j,k,l into a
unique 64-bit integer, and which returns the same value when then
permutation symmetry operations are applied to the indices. As it is a
tiny function, it is written in the module that uses it intensively such
that it can be inlined by the compiler, namely the
Integrals_Bielec/mo_bi_integrals.irp.f
file.
subroutine mo_bielec_integrals_index(i,j,k,l,i1)
implicit none
integer, intent(in) :: i,j,k,l
integer(key_kind), intent(out) :: i1
integer(key_kind) :: p,q,r,s,i2
= min(i,k)
p = max(i,k)
r = p+ishft(r*r-r,-1)
p = min(j,l)
q = max(j,l)
s = q+ishft(s*s-s,-1)
q = min(p,q)
i1 = max(p,q)
i2 = i1+ishft(i2*i2-i2,-1)
i1 end
The Data Structure
Now that it is possible from (ij|kl) to obtain a unique 64-bit
integer and the value of the integral, these can be used to store the
data in a data structure similar to a hash table. In what follows, we
call this data structure a map
.
A map
type is essentially an array of
cache_map
s. The cache_map
type contains mainly
a pointer to an array of values and a pointer to an array of keys, and a
flag to know if the data is sorted by increasing keys.
The cache_map
keys are stored on signed 16-bits
integers, so the value of the key will not exceed 32768. This guarantees
an upper bound on the total number of values of the
cache_map
, and the largest possible size of the keys array
will be 64kiB. Most of the time, it is expected to fit in the L1
cache.
Search
When accessing the map
data structure, the key is split
in two. The 49 leading bits of the 64-bit key are used to extract the
index of the cache map in the array of cache_map
s. The 15
least significant bits are extracted to generate the
cache_map
key:
000110101001001110011010010110100011101000011010 | 010011101001101
<-------------- index of the cache_map ------------> <--- cache_key --->
integer*8 integer*2
A direct access is made to the cache_map
(usually in
RAM, not in a cache). If the cache_map
is not yet sorted,
it is sorted. Then the cache_key
is searched for in the
array of cache_key
s using a binary search. As the maximum
possible size of the array is 32768, the data will be found in less than
16 tries, which are likely not to cross a memory page boundary. When the
key is found, the corresponding value is at the same index in the array
of values. If the key is not found, a value of zero is returned.
Insert
The key is appended to the array of keys and the value is appended to
the array of values. When the cache_map
arrays are
completely filled with data, the size of the array is doubled
(cache_map_reallocate
), the keys are sorted
(cache_map_sort
), and the order is applied to the values
array (dset_order
). When duplicates are found
(cache_map_unique
), their values are added together such
that there is always a unique key for a unique value. Upon addition of
duplicates, the resulting value may be lower than a threshold so in that
case the entry is removed from the cache_map
(cache_map_shrink
).
Thread safety
When the atomic integrals are written, they are not likely to access
simultaneously the same cache_map
. So the
cache_map
s are filled and sorted concurrently with almost
no overhead. When all the data has been added, all the
cache_map
s are sorted concurrently.
During the 4-index transformation, the molecular integrals need to be
frequently updated. Using this data structure, it is possible to modify
the elements in a parallel section each cache_map
has its
own lock which is acquired when the cache_map
is
modified.