Nuclearistes/algorithms.org

25 KiB

Important algorithms for CIPSI

#+LaTeX_CLASS_OPTIONS:[aspectratio=169]

Determinant-driven methods

  • Integral-driven : sequential access to $\mathcal{O}(N^4)$ integrals, indirect access to vectors

    for (i,j,k,l,integral) in all_integrals:
        pairs = find_determinant_pairs(i,j,k,l,ijkl)
        for (d1,d2) in pairs:
            do_work(d1,d2)
  • Determinant-driven : sequential access to vectors, indirect access to integrals

    for d1 in determinants:
      for d2 in determinants:
        i,j,k,l = get_excitation(d1,d2)
        do_work(d1,d2)
  • Integral-driven: $\mathcal{O}(\nmo^4)$
  • Determinant-driven: $\mathcal{O}(\Ndet^2)$
  • Efficient CIPSI: How to be efficient within a determinant-driven approach

Data structures

Slater-Condon's rules

\begin{eqnarray*} \langle I | {\cal O}_1 | I \rangle & = & \sum_{i \in D} \langle \varphi_i | {\cal O}_1 | \varphi_i \rangle \\ \langle I | {\cal O}_2 | I \rangle & = & \frac{1}{2} \sum_{(i,j) \in D} \langle \varphi_i \varphi_j | {\cal O}_2 | \varphi_i \varphi_j \rangle \langle \varphi_i \varphi_j | {\cal O}_2 | \varphi_j \varphi_i \rangle \\ \langle I | {\cal O}_1 | \hat{T}_i^j I \rangle & = & \langle \varphi_i | {\cal O}_1 | \varphi_j \rangle \\ \langle I | {\cal O}_2 | \hat{T}_i^j I \rangle & = & \sum_{k \in D} \langle \varphi_i \varphi_k | {\cal O}_2 | \varphi_j \varphi_k \rangle - \langle \varphi_i \varphi_k | {\cal O}_2 | \varphi_k \varphi_j \rangle \\ \langle I | {\cal O}_2 | \hat{T}_{ik}^{jl} I \rangle & = & \langle \varphi_i \varphi_k | {\cal O}_2 | \varphi_j \varphi_l \rangle - \langle \varphi_i \varphi_k | {\cal O}_2 | \varphi_l \varphi_j \rangle \end{eqnarray*}

Need for functions : $f(I, J) \rightarrow (i,j,k,l,\phi)$

Double-determinant representation

A Slater determinant can be written as a Waller-Hartree double determinant

\[ |I \rangle = \hat{I}\, |\rangle = -1^p \times \hat{I}_\uparrow \, \hat{I}_\downarrow\, |\rangle = -1^p \times \hat{I}_\uparrow \, |\rangle \otimes\ \hat{I}_\downarrow \, |\rangle \]

Storage:

  • 1 determinant: one integer for $\hat{I}_\uparrow$ and one integer for $\hat{I}_\downarrow$
  • Set the bit to 1 if the orbital is occupied
  • $>64$ orbitals: $N_{\text{int}}$ integers for $\hat{I}_\uparrow$ and for $\hat{I}_\downarrow$

Fast determinant comparisons

Bitwise operations (1 CPU cycle):

  • and, or, xor, shl, shr: logical
  • shl, shr: shift left/right
  • lzcnt, tzcnt : Number of leading/trailing zero bits
  • popcnt : Number of bits set to 1

Example: degree of excitation between $|I\rangle$ and $|J\rangle$:

 integer function degree(det_i, det_j, N_int)
   integer,   intent(in) :: N_int
   integer*8, intent(in) :: det_i(N_int,2), det_j(N_int,2)
   integer               :: two_d, i
   two_d = 0
   do i=1,N_int
      two_d = two_d + popcnt( ieor( det_i(i,1), det_j(i,1) ) ) &
                    + popcnt( ieor( det_i(i,2), det_j(i,2) ) ) 
   end do
   degree = rshift(two_d,1)
 end function degree

Fast determinant comparisons

/scemama/Nuclearistes/src/branch/master/slaterRules.pdf To get the orbital indices: number of leading/trailing zeros gives the positions of the 1's.

Integrals

Constraints

  • Integrals require a fast random access
  • 8-fold permtutation symmetry $\langle ij|kl \rangle = \langle kj|il \rangle = \cdots$
  • Many integrals are zero: need for a sparse data structure

Implementation

  • Hash table
  • $f(i,j,k,l) -> K$ gives the same $K$ for all similar permutations
  • $f(i+1,j,k,l) - f(i,j,k,l)$ is likely to be 1 : locality
  • Array (cache) for $128^4$ frequently used integrals

Integrals

Access Array Hash table
$i,j,k,l$ 9.72 125.79
$i,j,l,k$ 9.72 120.64
$i,k,j,l$ 10.29 144.65
$l,k,j,i$ 88.62 125.79
$l,k,i,j$ 88.62 120.64
Random 170.00 370.00
Time to access integrals (in nanoseconds/integral) with different access patterns. The time to generate random numbers (measured as 67~ns/integral) was not counted in the random access results.

Efficient direct CI

Sorting

Popular misconception

Sorting is not $\mathcal{O}(N \log(N))$ :

(linear in $N$, log in $M$)

.   B_ignoreheading

  • A is an array of $N$ integer values
  • The bitmask is an integer with only one bit set to one (00001000)

    void radix_sort(int* A, size_t N, int bitmask) {
      if (bitmask == 0) return;
      int left[N], right[N];
      int p=0 ; int q=0 ;
      for (int i=0 ; i<N ; i++) {
          if (A[i] & bitmask) { right[q] = A[i]; q++; }
          else                { left [p] = A[i]; p++; } }
      radix_sort(left , p, bitmask >> 1) ;
      radix_sort(right, q, bitmask >> 1) ;
      for (int i=0 ; i<p ; i++) { A[  i] = left [i] ; }
      for (int i=0 ; i<q ; i++) { A[p+i] = right[i] ; }
    }

Double-determinant representation of $\Psi$

\[ \mPsi = \sum_\mi \mci |\mi\rangle = \sum_{\mk=1}^{\Ndetup} \sum_{\mm=1}^{\Ndetdn} C_{\textcolor{darkgreen}{km}} \mDup \mDdn \]

  • If $\mDup$ and $\mDdn$ are represented as $\nmo$ -bit strings, this transformation can be done in $\mathcal{O}(\Ndet \times \nmo)$ (sorting).
  • Searching for same-spin excitations: looping over $\mk$ or $\mm$ : $\mathcal{O}(\Ndetup) \sim \mathcal{O}(\sqrt{\Ndet})$

$\mathcal{H} | \Psi \rangle$

For all $\mi = \mDup \mDdn$ in $\mPsi$:

  • Find indices $p$ of $\uparrow$ singles and $\uparrow \uparrow$ doubles \[ \langle \mi | \mathcal{H} | \Psi \rangle = \sum_J \langle \mi | \mathcal{H}| J \rangle c_J = \sum_p \langle \mDup \mDdn | \mathcal{H} | D_p^\uparrow \mDdn \rangle C_{pm} \]
  • Find indices $q$ of $\downarrow$ singles and $\downarrow \downarrow$ doubles \[ \langle \mi | \mathcal{H} | \Psi \rangle = \sum_J \langle \mi | \mathcal{H}| J \rangle c_J = \sum_q \langle \mDup \mDdn | \mathcal{H} | \mDup D_q^\downarrow \rangle C_{kq} \]
  • Find indices $pq$ of $\uparrow \downarrow$ doubles:

    • Find indices $p$ of $\uparrow$ singles
    • Find indices $q$ of $\downarrow$ singles \\

    \[ \langle \mi | \mathcal{H} | \Psi \rangle = \sum_J \langle \mi | \mathcal{H}| J \rangle c_J = \sum_{pq} \langle \mDup \mDdn | \mathcal{H} | D_p^\uparrow D_q^\downarrow \rangle C_{pq} \]

Stochastic evaluation of the PT2 correction and selection

Epstein-Nesbet Second order correction

Consider a wave function $\mPsi$ expanded on an arbitrary set $\mD$ of $\Ndet$ orthonormal Slater determinants, \[ \mPsi = \sum_{\mi \in \mD} {\mci} | \mi \rangle, \;\;\; \Evar = \frac{ \langle \mPsi | \mH | \mPsi \rangle}{\langle \mPsi | \mPsi \rangle} \]

The Epstein-Nesbet 2nd order correction to the energy is \[ \Ept = \sum_{\ma \in \mA} \frac{ \langle \mPsi | \mH | \ma \rangle \langle \ma | \mH | \mPsi \rangle}{\Evar - \langle \ma | \mH | \ma \rangle } \]

The set $\mA$ contains the Slater determinants

  • that are not in $\mD$
  • for which $d(\mi,\ma) = 1$ or $2$ for at least one pair $(\mi,\ma)$

Formal scaling

\[ \Ept = \sum_{\ma \in \mA} \frac{ \langle \mPsi | \mH | \ma \rangle \langle \ma | \mH | \mPsi \rangle}{\Evar - \langle \ma | \mH | \ma \rangle} = \sum_{\ma \in \mA} \frac{ \left( \sum_{\mi \in \mD} \mci \langle \mi | \mH | \ma \rangle \right)^2 }{\Evar - \langle \ma | \mH | \ma \rangle } \]

  • Size of $\mA$ : size of $(\hat{T}_1 + \hat{T}_2)|\mPsi\rangle$
  • Number of non-zero terms : $d(\mi,\ma) \le 2 \sim \Ndet \times \left[ \textcolor{darkgreen}{\left( \nea \times (\nmo - \nea) \right)^2 }\right]$
  • Expensive

Solutions to make simulations possible

"Non-general'' but conventional solutions:

  • Partition the MO space into different classes (active, virtual, inactive, \emph{etc})
  • Use another zeroth-order Hamiltonian (CAS-PT2, NEV-PT2)

Solutions applicable to any wave function:

  • Truncation of $\mD$ to consider only contributions due to large ${\mci}$
    But: Truncation $\longrightarrow$ bias because $\Ept$ is a sum of same-sign values (negative).
  • Algorithmic improvement
  • Monte Carlo sampling in $\mA$.
    But: Statistical error decreases as $\mathcal{O}\left(1/\sqrt{\Nsamples}\right) \Longrightarrow$ Difficult to get $\textcolor{darkgreen}{10^{-5} a.u}$ precision.
  • Parallelism

:B_ignoreheading:

Central idea

  • Choose an arbitrary ordering of $|\mi \rangle$. Natural choice: \[w_\mi = \frac{\mci ^2}{\langle \mPsi | \mPsi \rangle } \]
  • Make \emph{disjoint} groups $\mA_{\mi}$ of $| \ma \rangle$ originating from the same generator $| \mi \rangle$
  • Each $\mA_\mi$ has its own contribution $\epsi$ to $\Ept$

/scemama/Nuclearistes/src/branch/master/fig1.pdf

Central idea

\begin{eqnarray*} \Ept &=& \sum_{\ma \in \mA} \frac{ \left( \langle \mPsi | \mH | \ma \rangle \right)^2 }{\Evar - \langle \ma | \mH | \ma \rangle } \\ &=& \sum_{\mi \in \mD} \sum_{\ma_\mi \in \mA_\mi} \frac{ \left( \langle \mPsi | \mH | \ma_\mi \rangle \right)^2}{\Evar - \langle \ma_\mi | \mH | \ma_\mi \rangle} \\ &=& \sum_{\mi \in \mD} \epsi \end{eqnarray*}

Contribution per internal determinant

\[ \epsi = \sum_{\ma_\mi \in \mA_\mi} \frac{ \left( \langle \mPsi | \mH | \ma_\mi \rangle \right)^2}{\Evar - \langle \ma_\mi | \mH | \ma_\mi \rangle} \]

From $\mathcal{O}(N_\text{det}^2)$ to $\mathcal{O}(N_\text{det}^{3/2})$

\[ \mPsi = \sum_\mi \mci |\mi\rangle = \sum_{\mk=1}^{\Ndetup} \sum_{\mm=1}^{\Ndetdn} C_{\textcolor{red}{km}} \mDup \mDdn \]

  • Sorting is $\mathcal{O}(\Ndet)$
  • $\langle \mi | \mH | \ma \rangle \langle \ma | \mH | \mj \rangle = 0$ when $d(\mi, \mj) > 4$
  • Loop over $\Ndetup$ determinants (rows of the $C$ matrix)
    Remove all the rows where $d(\mDup, \mDupp)>4$ ($\sim \mathcal{O}(\sqrt{\Ndet})$)
  • Loop over $\Ndetdn$ determinants (columns of the $C$ matrix)
    Remove all the columns where $d(\mDdn, \mDdnn)>4)$
  • The remaining number of determinants is bounded by the size of the CISDTQ space

Finer filtering

\[ \epsi = \sum_{\ma \in \mA_\mi} \frac{ \langle \mPsi'_I | \mH | \ma_\mi \rangle \langle \ma_\mi | \mH | \mPsi'_I \rangle}{\Evar - \langle \ma_\mi | \mH | \ma_\mi \rangle} \]

  • We know that all the $| \ma_\mi \rangle$ are singles and doubles with respect to $| \mi \rangle$
  • $|\mPsi'_I\rangle$ is the projection of $|\mPsi \rangle$ on the subspace of determinants in $\mD$ which are no more than quadruply excited with respect to $| \mi \rangle$
  • For a subset of excitations $ij \rightarrow ab$, $|\mPsi'\rangle$ is filtered further with possible hole/particle constraints

Monte Carlo sampling

\[ \epsi = \sum_{\ma_\mi \in \mA_\mi} \frac{ \left( \langle \mPsi | \mH | \ma_\mi \rangle \right)^2}{\Evar - \langle \ma_\mi | \mH | \ma_\mi \rangle} \]

  1. $\langle \mPsi | \mH | \ma_\mi \rangle = \sum_{\mj \ge \mi} \mcj\, \langle\, \mj | \mH | \ma_\mi \rangle$
  2. $\langle \ma_\mi | \mH | \ma_\mi \rangle$ is always large (otherwise $|\ma_\mi \rangle$ would be better in the \textcolor{red}{variational space}, and PT is questionable)

:B_ignoreheading:

  • $\forall \mi \in \mD : \epsi \le 0$
  • $|\epsi|$ is expected to decrease as $\mci^2$
  • The computational cost decreases with $\mi$

Monte Carlo formulation

\[ \Ept = \sum_{\mi \in \mD} \epsi = \sum_{\mi \in \mD} \mpi \frac{\epsi}{\mpi} = \left\langle \frac{\epsi}{\mpi} \right\rangle_{\mpi} \]

Naive sampling

Uniform sampling: \mpi = \frac{1}{\Ndet}$

/scemama/Nuclearistes/media/branch/master/dist2_noise.png

Improved sampling

Sampling : $\mpi = \mci^2$

/scemama/Nuclearistes/media/branch/master/eici2.png

Lazy evaluation

Only $\Ndet$ contributions $\epsi \longrightarrow$ all $\epsi$ can be stored in memory.

Lazy Evaluation (Wikipedia)

In programming language theory, lazy evaluation, or call-by-need is an evaluation strategy which delays the evaluation of an expression until its value is needed (non-strict evaluation) and which also avoids repeated evaluations (sharing).

:B_ignoreheading:

def lazy_e(i):
  if not e_is_computed[i]:
      e[i] = compute_e(i)
      e_is_computed[i] = true
  return e[i]

Monte Carlo with Lazy Evaluation

\[ \Ept = \sum_{\mi \in \mD} \epsi = \sum_{\mi \in \mD} \mpi \frac{\epsi}{\mpi} = \left\langle \frac{\epsi}{\mpi} \right\rangle_{\mpi} \]

  • Draw a generator determinant $|\mi\,\rangle$ with probability $\mpi$
  • Increment $n_\mi$, the number of evaluations of $\epsi$
  • If $\epsi$ is not already computed, compute it and store its value
  • $\Ept \sim \sum_{\mi \in \mD} \frac{n_\mi}{\Nsamples} \frac{\epsi}{\mpi}$
  • Statistical error : $\mathcal{O}\left(1/\sqrt{\Nsamples}\right)$
  • Lazy evaluation : Exponential acceleration (time to solution)

Monte Carlo with Lazy Evaluation

Monte Carlo with Lazy Evaluation

Monte Carlo with Lazy Evaluation

Monte Carlo with Variance reduction

  • Noise can be smoothed out by averaging
  • Split $\mD$ into $\mM$ \emph{equiprobable} sets : "Comb"

\[ \Ept = \sum_{\mi \in \mD} \epsi = \sum_{\mk=1}^{\mM} \sum_{\mi_\mk \in \mD_\mk} \epsik \]

New Monte Carlo estimator

\[ \Ept = \left \langle \frac{1}{\mM} \sum_{\mk=1}^{\mM} \frac{\epsik}{{\textcolor{red}{p_{I_k}}}} \right \rangle_{\textcolor{red}{(p_{I_1}, \dots, p_{I_M})}} \]

Monte Carlo with Variance reduction

Monte Carlo with Variance reduction

Monte Carlo with Variance reduction

Monte Carlo with Variance reduction

/scemama/Nuclearistes/media/branch/master/dist2.png

Monte Carlo with Variance reduction

Monte Carlo with Variance reduction

Monte Carlo with Variance reduction

Hybrid deterministic/stochastic scheme

  • When all the determinants have been drawn, the \emph{exact} $\Ept$ can be computed
  • $\Longrightarrow$ The result with zero statistical error can be reached in a finite time
  • In typical wave functions, $90\%$ of the norm is on a few determinants
  • Compute the few first contributions $\epsi$, and perform the MC in the rest

\[ \Ept = \sum_{\mi \in \mD_D} \epsi + \left \langle \frac{1}{\mM} \sum_{\mk=1}^{\mM} \frac{\epsik}{\textcolor{red}{p_{I_k}}} \right \rangle_{(\textcolor{red}{p_{I \in \mD_S})}} \]

Hybrid deterministic/stochastic scheme

Make the deterministic part grow during the calculation.

At each MC step

  • Draw a random number
  • Find the determinants selected by the comb (increment $n_\mi$'s)
  • Compute the $\epsi$ which have not been yet computed
  • Compute deterministically the first non-computed determinant
  • If a tooth of the comb is completely filled $\Longrightarrow$ Deterministic

At any time

\[ \Ept(t) = \sum_{\mi \in \mD_D(t)} \epsi + \sum_{\mi \in \mD_S(t)} \frac{1}{\mM(t)} \frac{n_\mi(t)}{\Nsamples(t)} \frac{\epsi}{\mpi} \]

Hybrid deterministic/stochastic scheme

Hybrid deterministic/stochastic scheme

Hybrid deterministic/stochastic scheme

Hybrid deterministic/stochastic scheme

Some timings: Cr$_2$, $2\,10^7$ determinants, 800 cores

Basis $\Ept$ Wall-clock time
cc-pVDZ \textcolor{red}{$-0.068\,3(1)$} 14 min
\textcolor{red}{$-0.068\,36(1)$} 55 min
\textcolor{red}{$-0.068\,361(1)$} 2.4 hr
\textcolor{red}{$-0.068\,360\,604$} 3 hr
cc-pVTZ \textcolor{red}{$-0.124\,4(5)$} 19 min
\textcolor{red}{$-0.124\,7(1)$} 58 min
\textcolor{red}{$-0.124\,63(1)$} 3.5 hr
\textcolor{red}{$-0.124\,642(1)$} 8.7 hr
$\sim$ 15 hr (estimated)
cc-pVQZ \textcolor{red}{$-0.155\,8(5)$} 56 min
\textcolor{red}{$-0.155\,9(1)$} 2.5 hr
\textcolor{red}{$-0.155\,95(1)$} 9.0 hr
\textcolor{red}{$-0.155\,952(1)$} 18.5 hr
$\sim$ 29 hr (estimated)