* Trust region
*Compute the next step with the trust region algorithm*
The Newton method is an iterative method to find a minimum of a given
function. It uses a Taylor series of the targeted function, truncated
at the second order, and takes the minimizer of this quadratic model
as the new position. By repeating this procedure, the method finds a
minimum, local or global depending on the starting point and on the
convexity/non-convexity of the targeted function.
The goal of the trust region is to constrain the step size of the
Newton method to a certain area around the current position, where the
Taylor series is a good approximation of the targeted function. This
area is called the "trust region".
In addition, depending on the agreement between the Taylor expansion
of the energy and the real energy, the size of the trust region is
updated at each iteration, so that the step sizes do not become too
large. Moreover, since a criterion cancels the step if the energy
increases (more precisely, if $\rho < 0.1$), the algorithm cannot
diverge. \newline
References: \newline
Nocedal & Wright, Numerical Optimization, chapter 4 (1999), \newline
https://link.springer.com/book/10.1007/978-0-387-40065-5, \newline
ISBN: 978-0-387-40065-5 \newline
By using the first and the second derivatives, the Newton method gives
a step:
\begin{align*}
\textbf{x}_{(k+1)}^{\text{Newton}} = - \textbf{H}_{(k)}^{-1} \cdot
\textbf{g}_{(k)}
\end{align*}
which leads to the minimizer of the Taylor series.
!!! Warning: the Newton method gives the minimizer if and only if
$\textbf{H}$ is positive definite; otherwise it leads to a saddle point !!!
But we want a step $\textbf{x}_{(k+1)}$ with a constraint on its (Euclidean) norm:
\begin{align*}
||\textbf{x}_{(k+1)}|| \leq \Delta_{(k+1)}
\end{align*}
which is equivalent to
\begin{align*}
\textbf{x}_{(k+1)}^T \cdot \textbf{x}_{(k+1)} \leq \Delta_{(k+1)}^2
\end{align*}
with: \newline
$\textbf{x}_{(k+1)}$ is the step for the (k+1)-th iteration (vector of
size n) \newline
$\textbf{H}_{(k)}$ is the hessian at the k-th iteration (n by n
matrix) \newline
$\textbf{g}_{(k)}$ is the gradient at the k-th iteration (vector of
size n) \newline
$\Delta_{(k+1)}$ is the trust radius for the (k+1)-th iteration
\newline
Thus we want to constrain the step size $\textbf{x}_{(k+1)}$ into a
hypersphere of radius $\Delta_{(k+1)}$.\newline
So, if $||\textbf{x}_{(k+1)}^{\text{Newton}}|| \leq \Delta_{(k+1)}$ and
$\textbf{H}$ is positive definite, the solution is the step given by
the Newton method,
$\textbf{x}_{(k+1)} = \textbf{x}_{(k+1)}^{\text{Newton}}$.
Otherwise, we have to constrain the step size. For simplicity, we will
drop the indices $_{(k)}$ and $_{(k+1)}$. To restrict the step size,
we put a constraint on $\textbf{x}$ with a Lagrange multiplier.
Starting from the Taylor series of a function E (here, the energy)
truncated at the 2nd order, we have:
\begin{align*}
E(\textbf{x}) = E +\textbf{g}^T \cdot \textbf{x} + \frac{1}{2}
\cdot \textbf{x}^T \cdot \textbf{H} \cdot \textbf{x} +
\mathcal{O}(||\textbf{x}||^3)
\end{align*}
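For illustration, this quadratic model can be evaluated with two BLAS
calls; a minimal sketch, not tangled with the rest of the code (in the
actual algorithm the energy prediction is handled by "trust_e_model"):
#+BEGIN_SRC f90
double precision function quad_model(n, e0, g, H, x)

  implicit none

  integer, intent(in)          :: n
  double precision, intent(in) :: e0, g(n), H(n,n), x(n)
  double precision             :: Hx(n)
  double precision, external   :: ddot

  ! Hx = H.x
  call dgemv('N', n, n, 1d0, H, n, x, 1, 0d0, Hx, 1)

  ! m(x) = E + g^T.x + 1/2 x^T.H.x
  quad_model = e0 + ddot(n, g, 1, x, 1) + 0.5d0 * ddot(n, x, 1, Hx, 1)

end function quad_model
#+END_SRC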
With the constraint on the norm of $\textbf{x}$ we can write the
Lagrangian
\begin{align*}
\mathcal{L}(\textbf{x},\lambda) = E + \textbf{g}^T \cdot \textbf{x} + \frac{1}{2} \cdot \textbf{x}^T \cdot \textbf{H} \cdot \textbf{x}
+ \frac{1}{2} \lambda (\textbf{x}^T \cdot \textbf{x} - \Delta^2)
\end{align*}
Where: \newline
$\lambda$ is the Lagrange multiplier \newline
$E$ is the energy at the k-th iteration $\Leftrightarrow
E(\textbf{x} = \textbf{0})$ \newline
To solve this problem, we search for a stationary point where the
first derivative of $\mathcal{L}$ with respect to $\textbf{x}$
vanishes, i.e.
\begin{align*}
\frac{\partial \mathcal{L}(\textbf{x},\lambda)}{\partial \textbf{x}}=0
\end{align*}
The derivative is:
\begin{align*}
\frac{\partial \mathcal{L}(\textbf{x},\lambda)}{\partial \textbf{x}}
= \textbf{g} + \textbf{H} \cdot \textbf{x} + \lambda \cdot \textbf{x}
\end{align*}
So, we search $\textbf{x}$ such that:
\begin{align*}
\frac{\partial \mathcal{L}(\textbf{x},\lambda)}{\partial \textbf{x}}
= \textbf{g} + \textbf{H} \cdot \textbf{x} + \lambda \cdot \textbf{x} = 0
\end{align*}
We can rewrite that as:
\begin{align*}
\textbf{g} + \textbf{H} \cdot \textbf{x} + \lambda \cdot \textbf{x} = \textbf{g} + (\textbf{H} +\textbf{I} \lambda) \cdot \textbf{x} = 0
\end{align*}
where $\textbf{I}$ is the identity matrix.
By doing so, the solution is:
\begin{align*}
(\textbf{H} +\textbf{I} \lambda) \cdot \textbf{x}= -\textbf{g}
\end{align*}
\begin{align*}
\textbf{x}= - (\textbf{H} + \textbf{I} \lambda)^{-1} \cdot \textbf{g}
\end{align*}
with $\textbf{x}^T \cdot \textbf{x} = \Delta^2$.
We have to solve this equation to find the $\textbf{x}$ on the border
of the trust region, i.e. $||\textbf{x}|| = \Delta$. This is now a
one-dimensional problem, because we can express $\textbf{x}$ as a
function of $\lambda$:
\begin{align*}
\textbf{x}(\lambda) = - (\textbf{H} + \textbf{I} \lambda)^{-1} \cdot \textbf{g}
\end{align*}
Since the hessian is symmetric, it is diagonalizable. So we have:
\begin{align*}
\textbf{H} = \textbf{W} \cdot \textbf{h} \cdot \textbf{W}^T
\end{align*}
with: \newline
$\textbf{H}$, the hessian matrix \newline
$\textbf{W}$, the matrix containing the eigenvectors \newline
$\textbf{w}_i$, the i-th eigenvector, i.e. i-th column of $\textbf{W}$ \newline
$\textbf{h}$, the diagonal matrix containing the eigenvalues in
ascending order \newline
$h_i$, the i-th eigenvalue \newline
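In practice, such an eigendecomposition can be obtained with LAPACK's
dsyev, which returns the eigenvalues in ascending order. A minimal
sketch, not tangled (in the actual algorithm, e_val and w are received
as inputs, the hessian being diagonalized beforehand):
#+BEGIN_SRC f90
subroutine diagonalize_hessian(n, H, W, h_val)

  implicit none

  integer, intent(in)           :: n
  double precision, intent(in)  :: H(n,n)
  double precision, intent(out) :: W(n,n), h_val(n)
  double precision, allocatable :: work(:)
  integer                       :: lwork, info

  lwork = max(1, 3*n - 1)
  allocate(work(lwork))

  ! dsyev overwrites its matrix argument with the eigenvectors
  W = H
  call dsyev('V', 'U', n, W, n, h_val, work, lwork, info)
  if (info /= 0) then
    print*, 'dsyev failed, info =', info
  endif

  deallocate(work)

end subroutine diagonalize_hessian
#+END_SRC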
Now we use the fact that adding a constant on the diagonal just shifts
the eigenvalues:
\begin{align*}
\textbf{H} + \textbf{I} \lambda = \textbf{W} \cdot (\textbf{h} +\textbf{I} \lambda) \cdot \textbf{W}^T
\end{align*}
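Inverting both sides, and using the fact that $\textbf{W}$ is
orthogonal and that $(\textbf{h} + \textbf{I} \lambda)^{-1}$ is
diagonal with elements $1/(h_i + \lambda)$:
\begin{align*}
(\textbf{H} + \textbf{I} \lambda)^{-1} = \textbf{W} \cdot (\textbf{h}
+ \textbf{I} \lambda)^{-1} \cdot \textbf{W}^T
\end{align*}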
By doing so we can express $\textbf{x}$ as a function of $\lambda$
\begin{align*}
\textbf{x}(\lambda) = - \sum_{i=1}^n \frac{\textbf{w}_i^T \cdot
\textbf{g}}{h_i + \lambda} \cdot \textbf{w}_i
\end{align*}
with $\lambda \neq - h_i$.
The quantity of interest in our case is the norm of $\textbf{x}$,
because we want $||\textbf{x}|| = \Delta$. Thanks to the
orthonormality of the eigenvectors
$\left\{\textbf{w}_i \right\}_{i=1}^n$ we have:
\begin{align*}
||\textbf{x}(\lambda)||^2 = \sum_{i=1}^n \frac{(\textbf{w}_i^T \cdot
\textbf{g})^2}{(h_i + \lambda)^2}
\end{align*}
So $||\textbf{x}(\lambda)||^2$ is a function of $\lambda$ alone.
Studying the properties of this function, we see that:
\begin{align*}
\lim_{\lambda\to\infty} ||\textbf{x}(\lambda)|| = 0
\end{align*}
and if $\textbf{w}_i^T \cdot \textbf{g} \neq 0$:
\begin{align*}
\lim_{\lambda\to -h_i} ||\textbf{x}(\lambda)|| = + \infty
\end{align*}
From these limits, and knowing that $h_1$ is the lowest eigenvalue,
we can conclude that $||\textbf{x}(\lambda)||$ is a continuous and
strictly decreasing function on the interval $\lambda \in
(-h_1;\infty)$ (the derivative of $||\textbf{x}(\lambda)||^2$ with
respect to $\lambda$, $-2 \sum_{i=1}^n (\textbf{w}_i^T \cdot
\textbf{g})^2/(h_i + \lambda)^3$, is negative there). Thus, there is
exactly one $\lambda$ in this interval which gives
$||\textbf{x}(\lambda)|| = \Delta$, so the solution is unique.
Since $\textbf{x} = - (\textbf{H} + \lambda \textbf{I})^{-1} \cdot
\textbf{g}$ and we want to reduce the norm of $\textbf{x}$, clearly
$\lambda > 0$ ($\lambda = 0$ being the unconstrained solution). But
the Newton method is only defined for a positive definite hessian
matrix, so $(\textbf{H} + \textbf{I} \lambda)$ must be positive
definite. Consequently, in the case where $\textbf{H}$ is not positive
definite, $\lambda$ must be greater than $- h_1$ to ensure the
positive definiteness:
\begin{align*}
\lambda > 0 \quad \text{and} \quad \lambda \geq - h_1
\end{align*}
From that there are four cases:
- if $\textbf{H}$ is positive definite, $-h_1 < 0$ and
  $\lambda \in (0,\infty)$
- if $\textbf{H}$ is not positive definite and $\textbf{w}_1^T \cdot
  \textbf{g} \neq 0$, $(\textbf{H} + \textbf{I} \lambda)$ must be
  positive definite, $-h_1 > 0$ and $\lambda \in (-h_1, \infty)$
- if $\textbf{H}$ is not positive definite, $\textbf{w}_1^T \cdot
  \textbf{g} = 0$ and $||\textbf{x}(-h_1)|| > \Delta$ (removing the
  $i=1$ term from the sum), $(\textbf{H} + \textbf{I} \lambda)$ must be
  positive definite, $-h_1 > 0$ and $\lambda \in (-h_1, \infty)$
- if $\textbf{H}$ is not positive definite, $\textbf{w}_1^T \cdot
  \textbf{g} = 0$ and $||\textbf{x}(-h_1)|| \leq \Delta$ (removing the
  $i=1$ term from the sum), $(\textbf{H} + \textbf{I} \lambda)$ must be
  positive definite, $-h_1 > 0$ and $\lambda = -h_1$. This case is
  similar to the one where $\textbf{H}$ is positive definite and
  $||\textbf{x}(\lambda = 0)|| \leq \Delta$, but we can also add to
  $\textbf{x}$ the first eigenvector $\textbf{w}_1$ times a constant,
  to ensure the condition $||\textbf{x}(\lambda = -h_1)|| = \Delta$
  and escape from the saddle point.
Thus to find the solution, we can write:
\begin{align*}
||\textbf{x}(\lambda)|| = \Delta
\end{align*}
\begin{align*}
||\textbf{x}(\lambda)|| - \Delta = 0
\end{align*}
Taking the square of this equation
\begin{align*}
(||\textbf{x}(\lambda)|| - \Delta)^2 = 0
\end{align*}
we obtain a function with one minimum, at the optimal $\lambda$.
Since we have the formula of $||\textbf{x}(\lambda)||^2$, we solve
\begin{align*}
(||\textbf{x}(\lambda)||^2 - \Delta^2)^2 = 0
\end{align*}
But in practice, it is more effective to solve:
\begin{align*}
\left(\frac{1}{||\textbf{x}(\lambda)||^2} - \frac{1}{\Delta^2}\right)^2 = 0
\end{align*}
To do that, we just use the Newton method in "trust_newton", using
the first and second derivatives of $(||\textbf{x}(\lambda)||^2 -
\Delta^2)^2$ with respect to $\lambda$.
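For reference, the derivatives of $f(\lambda) =
||\textbf{x}(\lambda)||^2$ with respect to $\lambda$, which enter
these Newton iterations through the chain rule, are:
\begin{align*}
f(\lambda) = \sum_{i=1}^n \frac{(\textbf{w}_i^T \cdot
\textbf{g})^2}{(h_i + \lambda)^2}, \quad
\frac{\partial f}{\partial \lambda} = - 2 \sum_{i=1}^n
\frac{(\textbf{w}_i^T \cdot \textbf{g})^2}{(h_i + \lambda)^3}, \quad
\frac{\partial^2 f}{\partial \lambda^2} = 6 \sum_{i=1}^n
\frac{(\textbf{w}_i^T \cdot \textbf{g})^2}{(h_i + \lambda)^4}
\end{align*}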
This will give the optimal $\lambda$ to compute the
solution $\textbf{x}$ with the formula seen previously:
\begin{align*}
\textbf{x}(\lambda) = - \sum_{i=1}^n \frac{\textbf{w}_i^T \cdot
\textbf{g}}{h_i + \lambda} \cdot \textbf{w}_i
\end{align*}
The solution $\textbf{x}(\lambda)$ with the optimal $\lambda$ is our
step from the k-th to the (k+1)-th iteration; it is denoted $\textbf{x}^*$.
#+BEGIN_SRC f90 :comments org :tangle trust_region_step.irp.f
#+END_SRC
** Evolution of the trust region
We initialize the trust region at the first iteration using a radius
\begin{align*}
\Delta = ||\textbf{x}(\lambda=0)||
\end{align*}
For the next iterations, the trust region evolves depending on the
agreement between the energy predicted by the Taylor series truncated
at the 2nd order and the real energy. If the truncated Taylor series
represents the energy landscape correctly, the trust region is
extended, otherwise it is reduced. To measure this agreement we use
the ratio $\rho$, cf. "rho_model" and "trust_e_model". From that we
use the following values:
- if $\rho \geq 0.75$, then $\Delta = 2 \Delta$,
- if $0.5 \leq \rho < 0.75$, then $\Delta = \Delta$,
- if $0.25 \leq \rho < 0.5$, then $\Delta = 0.5 \Delta$,
- if $\rho < 0.25$, then $\Delta = 0.25 \Delta$.
In addition, if $\rho < 0.1$ the iteration is cancelled and restarted
with a smaller trust region, until the energy decreases.
#+BEGIN_SRC f90 :comments org :tangle trust_region_step.irp.f
#+END_SRC
** Summary
To summarize, knowing the hessian (eigenvectors and eigenvalues), the
gradient and the radius of the trust region we can compute the norm of
the Newton step
\begin{align*}
||\textbf{x}(\lambda = 0)||^2 = ||- \textbf{H}^{-1} \cdot \textbf{g}||^2 = \sum_{i=1}^n
\frac{(\textbf{w}_i^T \cdot \textbf{g})^2}{h_i^2}, \quad h_i \neq 0
\end{align*}
- if $h_1 \geq 0$ and $||\textbf{x}(\lambda = 0)|| \leq \Delta$:
  $\textbf{x}(\lambda=0)$ is in the trust region, it is not necessary
  to put a constraint on $\textbf{x}$, and the solution is the
  unconstrained one, $\textbf{x}^* = \textbf{x}(\lambda = 0)$.
- else if $h_1 < 0$, $\textbf{w}_1^T \cdot \textbf{g} = 0$ and
  $||\textbf{x}(\lambda = -h_1)|| \leq \Delta$ (removing the $i=1$
  term from the sum): the solution is $\textbf{x}^* =
  \textbf{x}(\lambda = -h_1)$, similarly to the previous case.
  But we can also add to $\textbf{x}$ the first eigenvector
  $\textbf{w}_1$ times a constant, to ensure the condition
  $||\textbf{x}(\lambda = -h_1)|| = \Delta$ and escape from the
  saddle point.
- else if $h_1 < 0$ and $\textbf{w}_1^T \cdot \textbf{g} \neq 0$: we
  have to search $\lambda \in (-h_1, \infty)$ such that
  $||\textbf{x}(\lambda)|| = \Delta$, by solving with the Newton method
\begin{align*}
(||\textbf{x}(\lambda)||^2 - \Delta^2)^2 = 0
\end{align*}
or
\begin{align*}
\left(\frac{1}{||\textbf{x}(\lambda)||^2} - \frac{1}{\Delta^2}\right)^2 = 0
\end{align*}
which is numerically more stable, and finally compute
\begin{align*}
\textbf{x}^* = \textbf{x}(\lambda) = - \sum_{i=1}^n \frac{\textbf{w}_i^T \cdot
\textbf{g}}{h_i + \lambda} \cdot \textbf{w}_i
\end{align*}
- else if $h_1 \geq 0$ and $||\textbf{x}(\lambda = 0)|| > \Delta$: we
  do exactly the same thing as in the previous case, but searching
  $\lambda \in (0, \infty)$
- else if $h_1 < 0$, $\textbf{w}_1^T \cdot \textbf{g} = 0$ and
  $||\textbf{x}(\lambda = -h_1)|| > \Delta$ (removing the $i=1$ term
  from the sum): again, we do exactly the same thing as in the
  previous case, searching $\lambda \in (-h_1, \infty)$.
For the cases where $\textbf{w}_1^T \cdot \textbf{g} = 0$ it is in
fact not necessary to remove the $i = 1$ term from the sum, since the
terms where $|h_i + \lambda| < 10^{-6}$ are not computed.
After that, we take this vector $\textbf{x}^*$, called "x", and
transform it into an antisymmetric matrix $\textbf{X}$, called
m_x. This matrix $\textbf{X}$ is used to compute a rotation
matrix $\textbf{R} = \exp(\textbf{X})$ in "rotation_matrix".
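Note that since $\textbf{X}$ is antisymmetric, the resulting
$\textbf{R}$ is automatically orthogonal:
\begin{align*}
\textbf{R}^T \cdot \textbf{R} = \exp(\textbf{X}^T) \cdot
\exp(\textbf{X}) = \exp(-\textbf{X}) \cdot \exp(\textbf{X}) = \textbf{I}
\end{align*}
so the rotation preserves the orthonormality of the molecular orbitals.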
NB:
An improvement can be made by using an elliptical trust region.
#+BEGIN_SRC f90 :comments org :tangle trust_region_step.irp.f
#+END_SRC
** Code
Provided:
| mo_num | integer | number of MOs |
Cf. qp_edit in the orbital optimization section for some constants/thresholds.
Input:
| m | integer | number of MOs |
| n | integer | m*(m-1)/2 |
| n2 | integer | m*(m-1)/2 or 1 if the hessian is diagonal |
| H(n,n2) | double precision | hessian |
| v_grad(n) | double precision | gradient |
| e_val(n) | double precision | eigenvalues of the hessian |
| W(n, n) | double precision | eigenvectors of the hessian |
| rho | double precision | agreement between the model and the reality, |
| | | represents the quality of the energy prediction |
| nb_iter | integer | number of iterations |
Input/Output:
| delta | double precision | radius of the trust region |
Output:
| x(n) | double precision | vector containing the step |
Internal:
| accu | double precision | temporary variable to compute the step |
| lambda | double precision | Lagrange multiplier |
| trust_radius2 | double precision | square of the radius of the trust region |
| norm2_x | double precision | norm^2 of the vector x |
| norm2_g | double precision | norm^2 of the vector containing the gradient |
| tmp_wtg(n) | double precision | tmp_wtg(i) = w_i^T . g |
| i, j, k | integer | indexes |
Function:
| dnrm2 | double precision | BLAS function computing the norm |
| f_norm_trust_region_omp | double precision | computes ||x(lambda)||^2 |
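Schematically, the subroutine is meant to be called inside an
optimization loop of the following shape. This is a non-tangled
sketch: get_gradient_and_hessian and diagonalize_hessian are
hypothetical placeholders (the real loop lives in the orbital
optimization code), only the call to trust_region_step matches the
actual interface.
#+BEGIN_SRC f90
subroutine optimization_loop_sketch(n)

  implicit none

  integer, intent(in)           :: n
  double precision, allocatable :: H(:,:), w(:,:), v_grad(:), e_val(:), x(:)
  double precision              :: rho, delta
  integer                       :: nb_iter
  logical                       :: converged

  allocate(H(n,n), w(n,n), v_grad(n), e_val(n), x(n))

  nb_iter   = 0
  delta     = 0d0   ! set inside trust_region_step when nb_iter = 0
  rho       = 0.5d0 ! neutral value before the first update
  converged = .false.

  do while (.not. converged)
    call get_gradient_and_hessian(n, v_grad, H) ! hypothetical
    call diagonalize_hessian(n, H, w, e_val)    ! hypothetical, cf. sketch above
    ! n2 = n: dense hessian
    call trust_region_step(n, n, nb_iter, v_grad, rho, e_val, w, x, delta)
    ! apply the step, recompute the energy, update rho
    ! (cf. "rho_model" and "trust_e_model") and test convergence here
  enddo

  deallocate(H, w, v_grad, e_val, x)

end subroutine optimization_loop_sketch
#+END_SRC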
#+BEGIN_SRC f90 :comments org :tangle trust_region_step.irp.f
subroutine trust_region_step(n,n2,nb_iter,v_grad,rho,e_val,w,x,delta)

  include 'pi.h'

  !BEGIN_DOC
  ! Compute the step in the trust region
  !END_DOC

  implicit none

  ! Variables

  ! in
  integer, intent(in)             :: n,n2
  double precision, intent(in)    :: v_grad(n), rho
  integer, intent(inout)          :: nb_iter
  double precision, intent(in)    :: e_val(n), w(n,n2)

  ! inout
  double precision, intent(inout) :: delta

  ! out
  double precision, intent(out)   :: x(n)

  ! Internal
  double precision                :: accu, lambda, trust_radius2
  double precision                :: norm2_x, norm2_g
  double precision, allocatable   :: tmp_wtg(:)
  integer                         :: i,j,k
  double precision                :: t1,t2,t3
  integer                         :: n_neg_eval

  ! Functions
  double precision                :: ddot, dnrm2
  double precision                :: f_norm_trust_region_omp

  print*,''
  print*,'=================='
  print*,'---Trust_region---'
  print*,'=================='

  call wall_time(t1)

  ! Allocation
  allocate(tmp_wtg(n))
#+END_SRC
*** Initialization and norm
The norm of the step is useful for the trust region algorithm: we
start from a first guess, and the radius of the trust region then
evolves during the optimization.
avoid_saddle is a test to avoid saddle points.
#+BEGIN_SRC f90 :comments org :tangle trust_region_step.irp.f
  ! Initialization of the Lagrange multiplier
  lambda = 0d0

  ! List of w^T.g, to avoid the recomputation
  tmp_wtg = 0d0
  if (n == n2) then
    do j = 1, n
      do i = 1, n
        tmp_wtg(j) = tmp_wtg(j) + w(i,j) * v_grad(i)
      enddo
    enddo
  else
    ! For the diagonal case
    do j = 1, n
      k = int(w(j,1)+1d-15)
      tmp_wtg(j) = v_grad(k)
    enddo
  endif

  ! Replacement of the small tmp_wtg corresponding to a negative eigenvalue
  ! in the case of avoid_saddle
  if (avoid_saddle .and. e_val(1) < - thresh_eig) then
    i = 2
    ! Number of negative eigenvalues
    do while (e_val(i) < - thresh_eig)
      if (tmp_wtg(i) < thresh_wtg2) then
        if (version_avoid_saddle == 1) then
          tmp_wtg(i) = 1d0
        elseif (version_avoid_saddle == 2) then
          tmp_wtg(i) = DABS(e_val(i))
        elseif (version_avoid_saddle == 3) then
          tmp_wtg(i) = dsqrt(DABS(e_val(i)))
        else
          tmp_wtg(i) = thresh_wtg2
        endif
      endif
      i = i + 1
    enddo

    ! For the first one it's a little bit different
    if (tmp_wtg(1) < thresh_wtg2) then
      tmp_wtg(1) = 0d0
    endif

  endif

  ! Norm^2 of x, ||x||^2
  norm2_x = f_norm_trust_region_omp(n,e_val,tmp_wtg,0d0)
  ! We just use this norm for nb_iter = 0, in order to initialize the
  ! trust radius delta. We don't care about the sign of the eigenvalues,
  ! we just want the size of the step of a normal Newton-Raphson
  ! algorithm. Anyway, if the step is too big it will be reduced.
  !print*,'||x||^2 :', norm2_x

  ! Norm^2 of the gradient, ||v_grad||^2
  norm2_g = (dnrm2(n,v_grad,1))**2
  !print*,'||grad||^2 :', norm2_g
#+END_SRC
*** Trust radius initialization
At the first iteration (nb_iter = 0) we initialize the trust region
with the norm of the step generated by the Newton method,
$\textbf{x}_1 = - \textbf{H}_0^{-1} \cdot \textbf{g}_0$ (this norm was
computed just above with f_norm_trust_region_omp).
#+BEGIN_SRC f90 :comments org :tangle trust_region_step.irp.f
  ! trust radius
  if (nb_iter == 0) then
    trust_radius2 = norm2_x
    ! To avoid an infinite loop of cancellation of this first step
    ! without changing delta
    nb_iter = 1

    ! Compute delta, delta = sqrt(trust_radius)
    delta = dsqrt(trust_radius2)
  endif
#+END_SRC
*** Modification of the trust radius
Depending on $\rho$ (which represents the agreement between the model
and the reality, cf. rho_model), the trust region evolves: we update
delta (the radius of the trust region).
To avoid a too large trust region, we also impose a maximum size.
#+BEGIN_SRC f90 :comments org :tangle trust_region_step.irp.f
  ! Modification of the trust radius in function of rho
  if (rho >= 0.75d0) then
    delta = 2d0 * delta
  elseif (rho >= 0.5d0) then
    delta = delta
  elseif (rho >= 0.25d0) then
    delta = 0.5d0 * delta
  else
    delta = 0.25d0 * delta
  endif

  ! Maximum size of the trust region
  !if (delta > 0.5d0 * n * pi) then
  !  delta = 0.5d0 * n * pi
  !  print*,'Delta > delta_max, delta = 0.5d0 * n * pi'
  !endif

  if (delta > 1d10) then
    delta = 1d10
  endif

  !print*, 'Delta :', delta
#+END_SRC
*** Calculation of the optimal lambda
We search the solution of $(||\textbf{x}(\lambda)||^2 - \Delta^2)^2 = 0$:
- if $||\textbf{x}|| > \Delta$ or $h_1 < 0$, we have to add a constant
  $\lambda > 0 \quad \text{and} \quad \lambda > -h_1$
- if $||\textbf{x}|| \leq \Delta$ and $h_1 \geq 0$, the solution is the
  unconstrained one, $\lambda = 0$.
More details are given at the beginning of this file.
#+BEGIN_SRC f90 :comments org :tangle trust_region_step.irp.f
  ! By giving delta, we search (||x||^2 - delta^2)^2 = 0
  ! and not (||x||^2 - delta)^2 = 0

  ! Research of lambda to solve ||x(lambda)|| = Delta

  ! Display
  !print*, 'e_val(1) = ', e_val(1)
  !print*, 'w_1^T.g =', tmp_wtg(1)

  ! H positive definite
  if (e_val(1) > - thresh_eig) then
    norm2_x = f_norm_trust_region_omp(n,e_val,tmp_wtg,0d0)
    !print*, '||x(0)||=', dsqrt(norm2_x)
    !print*, 'Delta=', delta

    ! H positive definite, ||x(lambda = 0)|| <= Delta
    if (dsqrt(norm2_x) <= delta) then
      !print*, 'H positive definite, ||x(lambda = 0)|| <= Delta'
      !print*, 'lambda = 0, no lambda optimization'
      lambda = 0d0

    ! H positive definite, ||x(lambda = 0)|| > Delta
    else
      ! Constraint solution
      !print*, 'H positive definite, ||x(lambda = 0)|| > Delta'
      !print*,'Computation of the optimal lambda...'
      call trust_region_optimal_lambda(n,e_val,tmp_wtg,delta,lambda)
    endif

  ! H indefinite
  else
    if (DABS(tmp_wtg(1)) < thresh_wtg) then
      norm2_x = f_norm_trust_region_omp(n,e_val,tmp_wtg, - e_val(1))
      !print*, 'w_1^T.g <', thresh_wtg,', ||x(lambda = -e_val(1))|| =', dsqrt(norm2_x)
    endif

    ! H indefinite, w_1^T.g = 0, ||x(lambda = -e_val(1))|| <= Delta
    if (dsqrt(norm2_x) <= delta .and. DABS(tmp_wtg(1)) < thresh_wtg) then
      ! Add e_val(1) in order to have (H - e_val(1) I) positive definite
      !print*, 'H indefinite, w_1^T.g = 0, ||x(lambda = -e_val(1))|| <= Delta'
      !print*, 'lambda = -e_val(1), no lambda optimization'
      lambda = - e_val(1)

    ! H indefinite, w_1^T.g = 0, ||x(lambda = -e_val(1))|| > Delta
    ! and
    ! H indefinite, w_1^T.g =/= 0
    else
      ! Constraint solution/ add lambda
      !if (DABS(tmp_wtg(1)) < thresh_wtg) then
      !  print*, 'H indefinite, w_1^T.g = 0, ||x(lambda = -e_val(1))|| > Delta'
      !else
      !  print*, 'H indefinite, w_1^T.g =/= 0'
      !endif
      !print*, 'Computation of the optimal lambda...'
      call trust_region_optimal_lambda(n,e_val,tmp_wtg,delta,lambda)
    endif

  endif

  ! Recomputation of the norm^2 of the step x
  norm2_x = f_norm_trust_region_omp(n,e_val,tmp_wtg,lambda)
  print*,''
  print*,'Summary after the trust region:'
  print*,'lambda:', lambda
  print*,'||x||:', dsqrt(norm2_x)
  print*,'delta:', delta
#+END_SRC
*** Calculation of the step x
x refers to $\textbf{x}^*$.
We compute x as a function of lambda, using the formula:
\begin{align*}
\textbf{x}^* = \textbf{x}(\lambda) = - \sum_{i=1}^n \frac{\textbf{w}_i^T \cdot \textbf{g}}{h_i
+ \lambda} \cdot \textbf{w}_i
\end{align*}
#+BEGIN_SRC f90 :comments org :tangle trust_region_step.irp.f
  ! Initialisation
  x = 0d0

  ! Calculation of the step x
  if (n == n2) then
    ! Normal version
    if (.not. absolute_eig) then

      do i = 1, n
        if (DABS(e_val(i)) > thresh_eig .and. DABS(e_val(i)+lambda) > thresh_eig) then
          do j = 1, n
            x(j) = x(j) - tmp_wtg(i) * W(j,i) / (e_val(i) + lambda)
          enddo
        endif
      enddo

    ! Version to use the absolute value of the eigenvalues
    else

      do i = 1, n
        if (DABS(e_val(i)) > thresh_eig) then
          do j = 1, n
            x(j) = x(j) - tmp_wtg(i) * W(j,i) / (DABS(e_val(i)) + lambda)
          enddo
        endif
      enddo

    endif
  else
    ! If the hessian is diagonal
    ! Normal version
    if (.not. absolute_eig) then

      do i = 1, n
        if (DABS(e_val(i)) > thresh_eig .and. DABS(e_val(i)+lambda) > thresh_eig) then
          j = int(w(i,1) + 1d-15)
          x(j) = - tmp_wtg(i) * 1d0 / (e_val(i) + lambda)
        endif
      enddo

    ! Version to use the absolute value of the eigenvalues
    else

      do i = 1, n
        if (DABS(e_val(i)) > thresh_eig) then
          j = int(w(i,1) + 1d-15)
          x(j) = - tmp_wtg(i) * 1d0 / (DABS(e_val(i)) + lambda)
        endif
      enddo

    endif
  endif

  double precision :: beta, norm_x

  ! Test
  ! If w_1^T.g = 0, the limit of ||x(lambda)|| when lambda tends to
  ! -e_val(1) is not + infinity. So ||x(lambda=-e_val(1))|| can be
  ! smaller than delta; in that case we add the first eigenvector
  ! multiplied by a constant, to ensure the condition
  ! ||x(lambda=-e_val(1))|| = delta and escape the saddle point
  if (avoid_saddle .and. e_val(1) < - thresh_eig) then
    if (tmp_wtg(1) < 1d-15 .and. (1d0 - dsqrt(norm2_x)/delta) > 1d-3 ) then

      ! norm of x
      norm_x = dnrm2(n,x,1)

      ! Computes the coefficient for w_1
      beta = delta**2 - norm_x**2

      ! Updates the step x
      x = x + W(:,1) * dsqrt(beta)

      ! Recomputes the norm to check
      norm_x = dnrm2(n,x,1)

      print*, 'Add w_1 * dsqrt(delta^2 - ||x||^2):'
      print*, '||x||', norm_x
    endif
  endif
#+END_SRC
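This correction gives exactly $||\textbf{x}|| = \Delta$: since the
$i = 1$ term is excluded from the sum, $\textbf{x}$ is orthogonal to
$\textbf{w}_1$, so with $\beta = \Delta^2 - ||\textbf{x}||^2$:
\begin{align*}
||\textbf{x} + \sqrt{\beta} \cdot \textbf{w}_1||^2 = ||\textbf{x}||^2
+ \beta = \Delta^2
\end{align*}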
*** Transformation of x
x is a vector of size n, so it can be written as an m by m
antisymmetric matrix m_x, cf. "mat_to_vec_index" and "vec_to_mat_index".
#+BEGIN_SRC f90 :comments org :tangle trust_region_step.irp.f
  ! ! Step transformation vector -> matrix
  ! ! Vector with n elements -> mo_num by mo_num matrix
  ! do j = 1, m
  !   do i = 1, m
  !     if (i>j) then
  !       call mat_to_vec_index(i,j,k)
  !       m_x(i,j) = x(k)
  !     else
  !       m_x(i,j) = 0d0
  !     endif
  !   enddo
  ! enddo
  !
  ! ! Antisymmetrization of the previous matrix
  ! do j = 1, m
  !   do i = 1, m
  !     if (i<j) then
  !       m_x(i,j) = - m_x(j,i)
  !     endif
  !   enddo
  ! enddo
#+END_SRC
*** Deallocation, end
#+BEGIN_SRC f90 :comments org :tangle trust_region_step.irp.f
  deallocate(tmp_wtg)

  call wall_time(t2)
  t3 = t2 - t1
  print*,'Time in trust_region:', t3

  print*,'======================'
  print*,'---End trust_region---'
  print*,'======================'

end
#+END_SRC