The many-body Green's function Bethe-Salpeter formalism is steadily asserting itself as a new efficient and accurate tool in the armada of computational methods available to chemists in order to predict neutral electronic excitations in molecular systems.
In particular, the combination of the so-called $GW$ approximation of many-body perturbation theory, giving access to reliable charged excitations and quasiparticle energies, and the Bethe-Salpeter formalism, allowing to catch excitonic effects, has shown to provide accurate excitation energies in many chemical scenarios with a typical error of $0.1$--$0.3$ eV.
In this \textit{Perspective} article, we provide a historical overview of the Bethe-Salpeter formalism, with a particular focus on its condensed-matter roots, and we propose a critical review of its strengths and weaknesses for different chemical situations, such as \titou{bla bla bla}.
Future directions of developments and improvements are also discussed.
In its press release announcing the attribution of the 2013 Nobel prize in Chemistry to Karplus, Levitt and Warshel, the Royal Swedish Academy of Sciences concluded by stating \textit{``Today the computer is just as important a tool for chemists as the test tube.
Simulations are so realistic that they predict the outcome of traditional experiments.''}\cite{Nobel_2003}
Martin Karplus Nobel lecture moderated this bold statement, introducing his presentation by a 1929 quote from Dirac emphasizing that laws of quantum mechanics are \textit{``much too complicated to be soluble''}, urging the scientist to develop \textit{``approximate practical methods''}. This is where the methodology community stands, attempting to develop robust approximations to study with increasing accuracy the properties of ever more complex systems.
The study of neutral electronic excitations in condensed matter systems, from molecules to extended solids, has witnessed the development of a large number of such approximate methods with numerous applications to a large variety of fields, from the prediction of the colour of precious metals and stones for jewellery, to the understanding, \eg, of the basic principles behind photovoltaics, photocatalysis or DNA damage under irradiation in the context of biology.
The present Perspective aims at describing the current status and upcoming challenges for the Bethe-Salpeter equation (BSE) formalism \cite{Salpeter_1951,Strinati_1988} that, while sharing many features with time-dependent density functional theory (TD-DFT), \cite{Runge_1984,Casida_1995,Dreuw_2005} including computational cost scaling with system size, relies on a different formalism, with specific difficulties but also potential solutions to known issues. \cite{Blase_2018}
Originally developed in the framework of nuclear physics, \cite{Salpeter_1951} the BSE formalism has emerged in condensed-matter physics around the 1960's at the semi-empirical tight-binding level with the study of the optical properties of simple semiconductors. \cite{Sham_1966,Strinati_1984,Delerue_2000}
Three decades later, the first \textit{ab initio} implementations, starting with small clusters \cite{Onida_1995,Rohlfing_1998} extended semiconductors and wide-gap insulators, \cite{Albrecht_1997,Benedict_1998,Rohlfing_1999} paved the way to the popularization in the solid-state physics community of the BSE formalism.
Following early applications to periodic polymers and molecules, [REFS] BSE gained much momentum in the quantum chemistry community with, in particular, several benchmarks \cite{Korbel_2014,Jacquemin_2015a,Bruneval_2015,Jacquemin_2015b,Hirose_2015,Jacquemin_2017,Krause_2017,Gui_2018} on large molecular systems performed with the very same parameters (geometries, basis sets, etc) than the available higher-level reference calculations, \cite{Schreiber_2008} such as CC3. \cite{Christiansen_1995}
Such comparisons were grounded in the development of codes replacing the plane-wave paradigm of solid-state physics by well-documented correlation-consistent Gaussian basis sets, \cite{Dunning_1989} together with adequate auxiliary bases when resolution-of-the-identity (RI) techniques were used. [REFS]
An important conclusion drawn from these calculations was that the quality of the BSE excitation energies is strongly correlated to the deviation of the preceding $GW$ HOMO-LUMO gap
with the experimental (photoemission) fundamental gap \cite{Bredas_2014}
\begin{equation}
\EgFun = I - A,
\end{equation}
where $I = E_0^{N-1}- E_0^N$ and $A = E_0^{N+1}- E_0^N$ are the ionization potential and the electron affinity of the $N$-electron system, $E_s^{N}$ is the total energy of the $s$th excited state of the $N$-electron system, and $E_0^N$ corresponds to the $N$-electron ground-state energy.
Standard $G_0W_0$ calculations starting with Kohn-Sham (KS) eigenstates generated with (semi)local functionals yield much larger HOMO-LUMO gaps than the input KS gap
but still too small as compared to the experimental value, \ie,
\begin{equation}
\Eg^{\KS}\ll\Eg^{G_0W_0} < \EgFun.
\end{equation}
Such an underestimation of the fundamental gap leads to a similar underestimation of the optical gap $\EgOpt$, \ie, the lowest optical excitation energy.
\begin{equation}
\EgOpt = E_1^{N} - E_0^{N} = \EgFun + \EB,
\end{equation}
where $\EB$ is the excitonic effect, that is, the stabilization implied by the attraction of the excited electron and its hole left behind.
Because of this, we have $\EgOpt < \EgFun$.
Such a residual HOMO-LUMO gap problem can be significantly improved by adopting exchange-correlation (xc) functionals with a tuned amount of exact exchange \cite{Stein_2009,Kronik_2012} that yield a much improved KS gap as a starting point for the $GW$ correction. \cite{Bruneval_2013,Rangel_2016,Knight_2016}
Alternatively, self-consistent schemes, \cite{Hybertsen_1986,Shishkin_2007,Blase_2011,Faber_2011} where corrected eigenvalues, and possibly orbitals, \cite{Faleev_2004, vanSchilfgaarde_2006, Kotani_2007, Ke_2011} are reinjected in the construction of $G$ and $W$, have been shown to lead to a significant improvement of the quasiparticle energies in the case of molecular systems, with the advantage of significantly removing the dependence on the starting point functional. \cite{Rangel_2016,Kaplan_2016,Caruso_2016}
As a result, BSE excitation singlet energies starting from such improved quasiparticle energies were found to be in much better agreement with reference calculations.
For sake of illustration, an average error of $0.2$ eV was found for the well-known Thiel set \cite{Schreiber_2008} gathering more than hundred representative singlet excitations from a large variety of representative molecules. \cite{Jacquemin_2015a,Bruneval_2015,Gui_2018,Krause_2017}
This is equivalent to the best TD-DFT results obtained by scanning a large variety of global hybrid functionals with various amounts of exact exchange.
A very remarkable success of the BSE formalism lies in the description of the charge-transfer (CT) excitations, a notoriously difficult problem for TD-DFT calculations adopting standard functionals. \cite{Dreuw_2004}
Similar difficulties emerge in solid-state physics for semiconductors where extended Wannier excitons, characterized by weakly overlapping electrons and holes, cause a dramatic deficit of spectral weight at low energy. \cite{Botti_2004}
These difficulties can be ascribed to the lack of long-range electron-hole interaction with local xc functionals.
It can be cured through an exact exchange contribution, a solution that explains in particular the success of optimally-tuned range-separated hybrids for the description of CT excitations. \cite{Stein_2009,Kronik_2012}
The analysis of the screened Coulomb potential matrix elements in the BSE kernel (see below) reveals that such long-range (non-local) electron-hole interactions are properly described, including in environments (solvents, molecular solid, etc) where screening reduces the long-range electron-hole interactions.
The success of the BSE formalism to treat CT excitations has been demonstrated in several studies, \cite{Blase_2011b,Baumeier_2012,Duchemin_2012,Sharifzadeh_2013,Cudazzo_2010,Cudazzo_2013} opening the way to important applications such as doping, photovoltaics or photocatalysis in organic systems.\\
%We now leave the description of successes to discuss difficulties and Perspectives.\\
The BSE formalism \cite{Salpeter_1951,Strinati_1988,Albrecht_1998,Rohlfing_1998,Benedict_1998,vanderHorst_1999} belongs to the family of Green's function many-body perturbation theories (MBPT) \cite{Hedin_1965,Aryasetiawan_1998,Onida_2002,Reining_2017,ReiningBook} together with, for example, the algebraic-diagrammatic construction (ADC) techniques in quantum chemistry. \cite{Dreuw_2015}
While the density stands as the basic variable in DFT, Green's function MBPT takes the one-body Green's function as the central quantity. The (time-ordered) one-body Green's function reads
where $\ket{N}$ is the $N$-electron ground-state wave function.
The operators $\Hat{\psi}(\bx t)$ and $\Hat{\psi}^{\dagger}(\bx't')$ remove and add an electron (respectively) in space-spin-time positions ($\bx t$) and ($\bx't'$), while \titou{$T$ is the time-ordering operator}.
For ($t > t'$) the one-body Green's function provides the amplitude of probability of finding, on top of the ground-state Fermi sea, an electron in ($\bx t$) that was previously introduced in ($\bx't'$), while for ($t < t'$) the propagation of a hole is monitored.\\
A central property of the one-body Green's function is that its spectral representation presents poles at the charged excitation energies of the system
where $\mu$ is the chemical potential, $\eta$ is a positive infinitesimal, $\varepsilon_s = E_s^{N+1}- E_0^{N}$ for $\varepsilon_s > \mu$, and $\varepsilon_s = E_0^{N}- E_s^{N-1}$ for $\varepsilon_s < \mu$.
\titou{The $f_s$'s are the so-called Lehmann amplitudes that reduce to one-body orbitals in the case of single-determinant many-body wave functions [more ??].}
Unlike KS eigenvalues, the Green's function poles $\lbrace\varepsilon_s \rbrace$ are thus the proper \titou{charging} energies of the $N$-electron system, leading to well-defined ionization potentials and electronic affinities. Contrary to standard $\Delta$SCF techniques, the knowledge of $G$ provides the full ionization spectrum, as measured by direct and inverse photoemission, not only that associated with frontier orbitals.
The knowledge of $\Sigma$ allows thus to obtain the true addition/removal energies, namely the entire spectrum of occupied and virtual electronic energy levels, at the cost of solving a generalized one-body eigenvalue equation. [INTRODUCE QUASIPARTICLES and OTHER solutions ??] \\
While the equations reported above are formally exact, it remains to provide an expression for the xc self-energy operator $\Sigma$.
This is where Green's function practical theories differ.
Developed by Lars Hedin in 1965 with application to the interacting homogeneous electron gas, \cite{Hedin_1965} the $GW$ approximation
\cite{Aryasetiawan_1998,Farid_1999,Onida_2002,Ping_2013,Leng_2016,Golze_2019rev} follows the path of linear response by considering the variation of $G$ with respect to an external perturbation.
The obtained equation, when compared with the equation for the time-evolution of $G$ [Eq.~\eqref{eq:Gmotion}], leads to a formal expression for the self-energy
where $W$ is the dynamically-screened Coulomb potential and $\Gamma$ is a ``vertex" function that can be written as $\Gamma(12,3)=\delta(12)\delta(13)+\order{W}$, where $\order{W}$ means a corrective term with leading linear order in terms of $W$.
The neglect of the vertex leads to the so-called $GW$ approximation for $\Sigma$ that can be regarded as the lowest-order perturbation in terms of the screened Coulomb potential $W$ with
where $\chi_0$ is the independent electron susceptibility and $v$ the bare Coulomb potential.
In practice, the input $G$ and $\chi_0$ needed to build $\Sigma$ are taken to be the ``best'' Green's function and susceptibility that can be easily calculated, namely the DFT or Hartree-Fock (HF) ones where the $\lbrace\varepsilon_p, f_p \rbrace$ of Eq.~\eqref{eq:spectralG} are taken to be KS (or HF) eigenstates.
Taking then $(\Sigma^{\GW}-V^{\XC})$ as a correction to the DFT xc potential $V^{\XC}$, a first-order correction to the input KS energies $\lbrace\varepsilon_p^{\KS}\rbrace$ is obtained as follows:
Such an approach, where input KS energies are corrected to yield better electronic energy levels, is labeled as the single-shot, or perturbative, $G_0W_0$ technique. This simple scheme was used in the early $GW$ studies of extended semiconductors and insulators, \cite{Strinati_1980,Hybertsen_1986,Godby_1988,Linden_1988}
surfaces, \cite{Northrup_1991,Blase_1994,Rohlfing_1995} and 2D systems, \cite{Blase_1995} allowing to dramatically reduced the errors associated with KS eigenvalues in conjunction with common local or gradient-corrected approximations to the xc potential.
In particular, the well-known ``band gap" problem, \cite{Perdew_1983,Sham_1983} namely the underestimation of the occupied to unoccupied bands energy gap at the LDA KS level, was dramatically reduced, bringing the agreement with experiment to within a few tenths of an eV [REFS] with an $\mathcal{O}(N^4)$ computational scaling (see below).
Another important feature compared to other perturbative techniques, the $GW$ formalism can tackle finite size and periodic systems, and does not present any divergence in the limit of zero gap (metallic) systems. \cite{Campillo_1999}
However, remaining a low order perturbative approach starting with single-determinant mean-field solutions, it is not intended to explore strongly correlated systems. \cite{Verdozzi_1995}\\
While TD-DFT starts with the variation of the charge density $\rho$ with respect to an external local perturbation, the BSE formalism considers a generalized 4-points susceptibility that monitors the variation of the Green's function with respect to a non-local external perturbation:
that relates the full (interacting) Green's function, $G$, to its Hartree version, $G_0$, obtained by replacing the $\lbrace\varepsilon_s, f_s \rbrace$ by the Hartree eigenvalues and eigenfunctions.
The derivative with respect to $U$ of the Dyson equation yields
where it is traditional to neglect the derivative $(\partial W /\partial G)$ that introduces again higher orders in $W$.
Taking the static limit, \ie, $W(\omega=0)$, for the screened Coulomb potential, that replaces thus the static DFT xc kernel, and expressing Eq.~\eqref{eq:DysonL} in the standard product space $\lbrace\phi_i(\br)\phi_a(\br')\rbrace$ [where $(i,j)$ are occupied orbitals and $(a,b)$ are unoccupied orbitals), leads to an eigenvalue problem similar to the so-called Casida's equations in TD-DFT:
We emphasise that these equations can be solved at exactly the same cost as the standard TD-DFT equations once the quasiparticle energies and screened Coulomb potential $W$ are inherited from preceding $GW$ calculations. This defines the standard (static) BSE@$GW$ scheme that we discuss in this \textit{Perspective}, emphasizing its pros and cons. \\
As emphasized above, the BSE eigenvalue equation in the occupied-to-virtual product space is formally equivalent to that of TD-DFT or TD-HF. Searching iteratively for the lowest eigenstates presents the same $\mathcal{O}(N^4)$ matrix-vector multiplication computational cost within BSE and TD-DFT. Concerning the construction of the BSE Hamiltonian, it is no more expensive than building the TD-DFT one with hybrid functionals, reducing again to $\mathcal{O}(N^4)$ operations with standard resolution-of-identity techniques. At the price of sacrifying the knowledge of the eigenvectors, the BSE absorption spectrum can be known with $\mathcal{O}(N^3)$ operations using iterative techniques. \cite{Ljungberg_2015} With the same restriction on the eigenvectors, a time-propagation approach, similar to that implemented for TD-DFT, \cite{Yabana_1996} combined with stochastic techniques to reduce the cost of building the BSE Hamiltonian matrix elements, allows quadratic scaling with systems size. \cite{Rabani_2015}
In practice, the main bottleneck for standard BSE calculations as compared to TD-DFT resides in the preceding $GW$ calculations that scale as $\mathcal{O}(N^4)$ with system size using PWs or RI techniques, but with a rather large prefactor.
%%Such a cost is mainly associated with calculating the free-electron susceptibility with its entangled summations over occupied and virtual states.
%%While attempts to bypass the $GW$ calculations are emerging, replacing quasiparticle energies by Kohn-Sham eigenvalues matching energy electron addition/removal, \cite{Elliott_2019}
The field of low-scaling $GW$ calculations is however witnessing significant advances. While the sparcity of ..., \cite{Foerster_2011,Wilhelm_2018} efficient real-space-grid and time techniques are blooming, \cite{Rojas_1995,Liu_2016} borrowing in particular the well-known Laplace transform approach used in quantum chemistry. \cite{Haser_1992}
Together with a stochastic sampling of virtual states, this family of techniques allow to set up linear scaling $GW$ calculations. \cite{Vlcek_2017} The separability of occupied and virtual states summations lying at the heart of these approaches are now blooming in quantum chemistry withing the Interpolative Separable Density Fitting (ISDF) approach applied to calculating with cubic scaling the susceptibility needed in RPA and $GW$ calculations. \cite{Lu_2017,Duchemin_2019,Gao_2020} These ongoing developments pave the way to applying the $GW$/BSE formalism to systems comprising several hundred atoms on standard laboratory clusters. \\
The analysis of the singlet-triplet splitting is central to numerous applications such as singlet fission, thermally activated delayed fluorescence (TADF) or
stability analysis of restricted closed-shell solutions at the HF \cite{Seeger_1977} and TD-DFT \cite{Bauernschmitt_1996} levels.
contaminating as well TD-DFT calculations with popular range-separated hybrids (RSH) that generally contains a large fraction of exact exchange in the long-range. \cite{Sears_2011}
While TD-DFT with RSH can benefit from tuning the range-separation parameter as a mean to act on the triplet instability, \cite{Sears_2011} BSE calculations do not offer this pragmatic way-out since the screened Coulomb potential that builds the kernel does not offer any parameter to tune.
a first cure was offered by hybridizing TD-DFT and BSE, namely adding to the BSE kernel the correlation part of the underlying DFT functional used to build the susceptibility and resulting screened Coulomb potential $W$. \cite{Holzer_2018b}
An additional issue concerns the formalism taken to calculate the ground-state energy for a given atomic configuration. Since the BSE formalism presented so far concerns the calculation of the electronic excitations, namely the difference of energy between the GS and the ES, gradients of the ES absolute energy require
This points to another direction for the BSE foramlism, namely the calculation of GS total energy with the correlation energy calculated at the BSE level. Such a task was performed by several groups using in particular the adiabatic connection fluctuation-dissipation theorem (ACFDT), focusing in particular on small dimers. \cite{Olsen_2014,Holzer_2018b,Li_2020,Loos_2020}
As a chemist, it is maybe difficult to understand the concept of dynamical properties, the motivation behind their introduction, and their actual usefulness.
Here, we will try to give a pedagogical example showing the importance of dynamical quantities and their main purposes. \cite{Romaniello_2009,Sangalli_2011,Zhang_2013,ReiningBook}
where $\omega$ is one of the neutral excitation energies of the system associated with the transition vector $\bx$.
If we assume that the operator $\bA$ has a matrix representation of size $K \times K$, this \textit{linear} set of equations yields $K$ excitation energies.
However, in practice, $K$ might be very large, and it might therefore be practically useful to recast this system as two smaller coupled systems, such that
where the blocks $\bA_1$ and $\bA_2$, of sizes $K_1\times K_1$ and $K_2\times K_2$ (with $K_1+ K_2= K$), can be associated with, for example, the single and double excitations of the system.
Note that this \textit{exact} decomposition does not alter, in any case, the values of the excitation energies, not their eigenvectors.
Solving separately each row of the system \eqref{eq:lin_sys_split} yields
\begin{subequations}
\begin{gather}
\label{eq:row1}
\bA_1 \bx_1 + \tr{\bb}\bx_2 = \omega\bx_1,
\\
\label{eq:row2}
\bx_2 = (\omega\bI - \bA_2)^{-1}\bb\bx_1.
\end{gather}
\end{subequations}
Substituting Eq.~\eqref{eq:row2} into Eq.~\eqref{eq:row1} yields the following effective \textit{non-linear}, frequency-dependent operator
which has, by construction, exactly the same solutions than the linear system \eqref{eq:lin_sys} but a smaller dimension.
For example, an operator $\Tilde{\bA}_1(\omega)$ built in the basis of single excitations can potentially provide excitation energies for double excitations thanks to its frequency-dependent nature, the information from the double excitations being ``folded'' into $\Tilde{\bA}_1(\omega)$ via Eq.~\eqref{eq:row2}. \cite{Romaniello_2009,Sangalli_2011,ReiningBook}
How have we been able to reduce the dimension of the problem while keeping the same number of solutions?
To do so, we have transformed a linear operator $\bA$ into a non-linear operator $\Tilde{\bA}_1(\omega)$ by making it frequency dependent.
In other words, we have sacrificed the linearity of the system in order to obtain a new, non-linear systems of equations of smaller dimension.
This procedure converting degrees of freedom into frequency or energy dependence is very general and can be applied in various contexts. \cite{Gatti_2007,Garniron_2018}
Thanks to its non-linearity, Eq.~\eqref{eq:non_lin_sys} can produce more solutions than its actual dimension.
However, because there is no free lunch, this non-linear system is obviously harder to solve than its corresponding linear analogue given by Eq.~\eqref{eq:lin_sys}.
Nonetheless, approximations can be now applied to Eq.~\eqref{eq:non_lin_sys} in order to solve it efficiently.
One of these approximations is the so-called \textit{static} approximation, which corresponds to fix the frequency to a particular value.
For example, as commonly done within the Bethe-Salpeter formalism, $\Tilde{\bA}_1(\omega)=\Tilde{\bA}_1\equiv\Tilde{\bA}_1(\omega=0)$.
In such a way, the operator $\Tilde{\bA}_1$ is made linear again by removing its frequency-dependent nature.
This approximation comes with a heavy price as the number of solutions provided by the system of equations \eqref{eq:non_lin_sys} has now been reduced from $K$ to $K_1$.
Coming back to our example, in the static approximation, the operator $\Tilde{\bA}_1$ built in the basis of single excitations cannot provide double excitations anymore, and the only $K_1$ excitation energies are associated with single excitations.