Final changes

2024-06-25 12:35:28 +02:00 · acc4f921dc
commit acc4f921dc
parent 80098d38fe
1 changed file with 3 additions and 3 deletions
--- a/Manuscript/stochastic_triples.tex
+++ b/Manuscript/stochastic_triples.tex
@@ -210,7 +210,7 @@ To reduce the fluctuations of the statistical estimator, we apply importance sam
 P^{abc} = \frac{1}{\mathcal{N}} \frac{1}{\max \left(\epsilon_{\min}, \epsilon_a + \epsilon_b + \epsilon_c \right)}
 \end{equation}
 where $\mathcal{N}$ normalizes the sum such that $\sum_{abc} P^{abc} = 1$, and $\epsilon_{\min}$ is an arbitrary minimal denominator to ensure that $P^{abc}$ does not diverge. In our calculations, we have set $\epsilon_{\min}$ to 0.2~a.u.
-\anthony{The algorithm is not very sensitive to the value of $\epsilon_{\min}$ as long as it is taken within reasonable bounds, in the range of the level-shift parameter in SCF calculations.}
+\anthony{The algorithm is not very sensitive to the value of $\epsilon_{\min}$ as long as it is taken within reasonable bounds (in the range of the level-shift parameter of SCF calculations).}
 The perturbative contribution is then evaluated as an average over $M$ samples
 \begin{equation}
 E_{(T)} = \left\langle \frac{E^{abc}}{P^{abc}} \right \rangle_{P^{abc}} =
@@ -442,7 +442,7 @@ Xeon Gold 6130     & $2 \times 16$ & $2\times 22$   &  8  &  2.1  &    256.0
 ARM Q80            &          $80$ &          $32$  &  2  &  2.8  &    204.8         &     1~792 &   547 \\  % 292.492
 \end{tabular}
 \end{ruledtabular}
-\caption{\label{tab:flops} Average performance of the code measured as the number of double precision (DP) floating-point operations per second (Flop/s) on different machines.}
+\caption{\label{tab:flops} \anthony{Characteristics of the different machines, and the measured performance of the code in terms of double precision (DP) floating-point operations per second (Flop/s).}}
 \end{table*}

 Table~\ref{tab:flops} summarizes the performance tests.
@@ -483,7 +483,7 @@ Notably, with fewer cores, the bandwidth per core \anthony{and the amount of ava
 For the benzene molecule in the triple-zeta basis set, the arithmetic intensity is 3.33~flops/byte.
 This intensity corresponds to a threshold of approximately 30 cores for the ARM server and 32 cores for the AMD server.
 \anthony{Beyond these thresholds, the heavy demand on memory bandwidth results in a decrease in speedup.
-Beyond 64 cores on the ARM server, we observe a severe performance drop due to the limited size of the L3 cache: each matrix multiplication requires 494~kb to host the three matrices, and with 64 independent threads the L3 cache is already full.}
+Beyond 64 cores on the ARM server, we observe a severe performance drop due to the limited size of the L3 cache: each matrix multiplication requires 494~kb for the three matrices, and with 64 independent threads the L3 cache is already full.}


 %%%
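
Note on the first hunk: the importance-sampling step it touches can be illustrated with a small, self-contained Python sketch. This is not the manuscript's implementation. The orbital energies, the stand-in `toy_contribution` for $E^{abc}$, the function names, and the choice to enumerate all ordered triples are assumptions made only for illustration. The sketch builds $P^{abc} \propto 1/\max(\epsilon_{\min}, \epsilon_a + \epsilon_b + \epsilon_c)$, draws $M$ samples from it, and averages $E^{abc}/P^{abc}$, which is an unbiased estimate of $\sum_{abc} E^{abc}$.

```python
import itertools
import random


def sampling_probabilities(eps_virt, eps_min=0.2):
    """Return all (a, b, c) triples and their normalized probabilities P^{abc}.

    P^{abc} is proportional to 1 / max(eps_min, e_a + e_b + e_c); the
    normalization plays the role of the constant N in the manuscript.
    Assumption: all ordered triples are enumerated (no symmetry reduction).
    """
    triples = list(itertools.product(range(len(eps_virt)), repeat=3))
    weights = [
        1.0 / max(eps_min, eps_virt[a] + eps_virt[b] + eps_virt[c])
        for a, b, c in triples
    ]
    norm = sum(weights)
    return triples, [w / norm for w in weights]


def toy_contribution(eps_virt, a, b, c):
    """Hypothetical stand-in for E^{abc}; NOT the real (T) expression."""
    d = eps_virt[a] + eps_virt[b] + eps_virt[c] + 1.0
    return -1.0e-3 / (d * d)


def estimate(eps_virt, n_samples, eps_min=0.2, seed=0):
    """Return (exact sum, stochastic estimate) for the toy contributions."""
    rng = random.Random(seed)
    triples, probs = sampling_probabilities(eps_virt, eps_min)
    contrib = [toy_contribution(eps_virt, *t) for t in triples]
    exact = sum(contrib)
    # Draw triple indices according to P^{abc}, then average E^{abc} / P^{abc}.
    drawn = rng.choices(range(len(triples)), weights=probs, k=n_samples)
    approx = sum(contrib[i] / probs[i] for i in drawn) / n_samples
    return exact, approx


if __name__ == "__main__":
    # Toy virtual-orbital energies in a.u. (illustrative values only).
    energies = [0.05, 0.1, 0.3, 0.7, 1.2, 2.0]
    exact, approx = estimate(energies, n_samples=20000)
    print(f"exact = {exact:.6e}, estimate = {approx:.6e}")
```

In this toy setting, changing `eps_min` alters only the variance of the estimate, not its expectation, since every triple keeps a nonzero probability. That is consistent with the remark in the diff that the algorithm is not very sensitive to $\epsilon_{\min}$.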