From acc4f921dcc0041ab231bbad14efc7b318cfd680 Mon Sep 17 00:00:00 2001
From: Anthony Scemama
Date: Tue, 25 Jun 2024 12:35:28 +0200
Subject: [PATCH] Final changes

---
 Manuscript/stochastic_triples.tex | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Manuscript/stochastic_triples.tex b/Manuscript/stochastic_triples.tex
index c9f6fce..ce182e4 100644
--- a/Manuscript/stochastic_triples.tex
+++ b/Manuscript/stochastic_triples.tex
@@ -210,7 +210,7 @@ To reduce the fluctuations of the statistical estimator, we apply importance sam
 P^{abc} = \frac{1}{\mathcal{N}} \frac{1}{\max \left(\epsilon_{\min}, \epsilon_a + \epsilon_b + \epsilon_c \right)}
 \end{equation}
 where $\mathcal{N}$ normalizes the sum such that $\sum_{abc} P^{abc} = 1$, and $\epsilon_{\min}$ is an arbitrary minimal denominator to ensure that $P^{abc}$ does not diverge. In our calculations, we have set $\epsilon_{\min}$ to 0.2~a.u.
-\anthony{The algorithm is not very sensitive to the value of $\epsilon_{\min}$ as long as it is taken within reasonable bounds, in the range of the level-shift parameter in SCF calculations.}
+\anthony{The algorithm is not very sensitive to the value of $\epsilon_{\min}$ as long as it is taken within reasonable bounds (in the range of the level-shift parameter of SCF calculations).}
 The perturbative contribution is then evaluated as an average over $M$ samples
 \begin{equation}
 E_{(T)} = \left\langle \frac{E^{abc}}{P^{abc}} \right \rangle_{P^{abc}} =
@@ -442,7 +442,7 @@ Xeon Gold 6130 & $2 \times 16$ & $2\times 22$ & 8 & 2.1 & 256.0
 ARM Q80 & $80$ & $32$ & 2 & 2.8 & 204.8 & 1~792 & 547 \\ % 292.492
 \end{tabular}
 \end{ruledtabular}
-\caption{\label{tab:flops} Average performance of the code measured as the number of double precision (DP) floating-point operations per second (Flop/s) on different machines.}
+\caption{\label{tab:flops} \anthony{Characteristics of the different machines, and the measured performance of the code in terms of double precision (DP) floating-point operations per second (Flop/s).}}
 \end{table*}
 
 Table~\ref{tab:flops} summarizes the performance tests.
@@ -483,7 +483,7 @@ Notably, with fewer cores, the bandwidth per core \anthony{and the amount of ava
 For the benzene molecule in the triple-zeta basis set, the arithmetic intensity is 3.33~flops/byte.
 This intensity corresponds to a threshold of approximately 30 cores for the ARM server and 32 cores for the AMD server.
 \anthony{Beyond these thresholds, the heavy demand on memory bandwidth results in a decrease in speedup.
-Beyond 64 cores on the ARM server, we observe a severe performance drop due to the limited size of the L3 cache: each matrix multiplication requires 494~kb to host the three matrices, and with 64 independent threads the L3 cache is already full.}
+Beyond 64 cores on the ARM server, we observe a severe performance drop due to the limited size of the L3 cache: each matrix multiplication requires 494~kb for the three matrices, and with 64 independent threads the L3 cache is already full.}
 %%%