BLAS functions
Basic linear algebra data types and operations are described in this file. The data types are private, so that HPC implementations can use whatever data structures they prefer.
These data types are expected to be used internally in QMCkl. They are not intended to be passed to external codes.
1 Data types
1.1 Vector
| Variable | Type | Description |
|---|---|---|
| size | int64_t | Dimension of the vector |
| data | double* | Elements |
qmckl_vector qmckl_vector_alloc( qmckl_context context, const int64_t size);
Allocates a new vector. If the allocation fails, the returned size is zero.
qmckl_exit_code qmckl_vector_free( qmckl_context context, qmckl_vector* vector);
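As an illustration, a minimal allocation/deallocation sketch (usable only inside QMCkl's own sources, where these private types are visible; qmckl_context_create and qmckl_context_destroy belong to the public API):

```c
qmckl_context context = qmckl_context_create();

qmckl_vector v = qmckl_vector_alloc(context, 100);
if (v.size == 0) {
  /* allocation failed: handle the error */
}

/* ... use v ... */

qmckl_exit_code rc = qmckl_vector_free(context, &v);
qmckl_context_destroy(context);
```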
1.2 Matrix
| Variable | Type | Description |
|---|---|---|
| size | int64_t[2] | Dimension of each component |
| data | double* | Elements |
The dimensions use Fortran ordering: two elements differing by one in the first dimension are consecutive in memory.
qmckl_matrix qmckl_matrix_alloc( qmckl_context context, const int64_t size1, const int64_t size2);
Allocates a new matrix. If the allocation fails, the returned sizes are zero.
qmckl_exit_code qmckl_matrix_free( qmckl_context context, qmckl_matrix* matrix);
1.3 Tensor
| Variable | Type | Description |
|---|---|---|
| order | int32_t | Order of the tensor |
| size | int64_t[QMCKL_TENSOR_ORDER_MAX] | Dimension of each component |
| data | double* | Elements |
The dimensions use Fortran ordering: two elements differing by one in the first dimension are consecutive in memory.
qmckl_tensor qmckl_tensor_alloc( qmckl_context context, const int32_t order, const int64_t* size);
Allocates memory for a tensor. If the allocation fails, the returned sizes are zero.
qmckl_exit_code qmckl_tensor_free (qmckl_context context, qmckl_tensor* tensor);
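For instance, a sketch of allocating and freeing a rank-3 tensor (hypothetical sizes; context is assumed to be a valid qmckl_context, as in the sketch above):

```c
int64_t size[3] = { 3, 4, 5 };

qmckl_tensor t = qmckl_tensor_alloc(context, 3, size);
if (t.size[0] == 0) {
  /* allocation failed */
}

/* ... use t ... */

qmckl_tensor_free(context, &t);
```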
1.4 Reshaping
Reshaping is done in place: the data is not copied, only the pointer to it, so the reshaped object shares its storage with the original one.
1.4.1 Vector -> Matrix
qmckl_matrix qmckl_matrix_of_vector(const qmckl_vector vector, const int64_t size1, const int64_t size2);
Reshapes a vector into a matrix.
1.4.2 Vector -> Tensor
qmckl_tensor qmckl_tensor_of_vector(const qmckl_vector vector, const int32_t order, const int64_t* size);
Reshapes a vector into a tensor.
1.4.3 Matrix -> Vector
qmckl_vector qmckl_vector_of_matrix(const qmckl_matrix matrix);
Reshapes a matrix into a vector.
1.4.4 Matrix -> Tensor
qmckl_tensor qmckl_tensor_of_matrix(const qmckl_matrix matrix, const int32_t order, const int64_t* size);
Reshapes a matrix into a tensor.
1.4.5 Tensor -> Vector
qmckl_vector qmckl_vector_of_tensor(const qmckl_tensor tensor);
Reshapes a tensor into a vector.
1.4.6 Tensor -> Matrix
qmckl_matrix qmckl_matrix_of_tensor(const qmckl_tensor tensor, const int64_t size1, const int64_t size2);
Reshapes a tensor into a matrix.
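As a sketch, reshaping a 12-element vector into a 3×4 matrix that shares the same storage (context is assumed to be a valid qmckl_context):

```c
qmckl_vector v = qmckl_vector_alloc(context, 12);

/* View the same 12 elements as a 3 x 4 column-major matrix. No data is copied. */
qmckl_matrix m = qmckl_matrix_of_vector(v, 3, 4);

/* m and v now share their storage: element (i,j) of m is element i + 3*j of v. */
```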
1.5 Access macros
Macros are provided to ease the access to vectors, matrices and tensors. Matrices use column-major ordering, as in Fortran.
double qmckl_vec (qmckl_vector v, int64_t i); // v[i]
double qmckl_mat (qmckl_matrix m, int64_t i, int64_t j); // m[j][i]
double qmckl_ten3(qmckl_tensor t, int64_t i, int64_t j, int64_t k); // t[k][j][i]
double qmckl_ten4(qmckl_tensor t, int64_t i, int64_t j, int64_t k, int64_t l);
double qmckl_ten5(qmckl_tensor t, int64_t i, int64_t j, int64_t k, int64_t l, int64_t m);
double qmckl_ten6(qmckl_tensor t, int64_t i, int64_t j, int64_t k, int64_t l, int64_t m, int64_t n);
...
For example:
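A minimal sketch, assuming the macros expand to the underlying storage element (as the v[i] and m[j][i] comments suggest), so that they can also appear on the left-hand side of an assignment; context is assumed to be a valid qmckl_context:

```c
qmckl_matrix A = qmckl_matrix_alloc(context, 3, 4);

/* Fill A(i,j) = i + 10*j, traversing column by column (column-major order). */
for (int64_t j = 0 ; j < 4 ; ++j) {
  for (int64_t i = 0 ; i < 3 ; ++i) {
    qmckl_mat(A, i, j) = (double) (i + 10*j);
  }
}

double x = qmckl_mat(A, 2, 1);   /* element (2,1): here 12.0 */
```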
1.6 Set all elements
1.6.1 Vector
qmckl_vector qmckl_vector_set(qmckl_vector vector, double value);
1.6.2 Matrix
qmckl_matrix qmckl_matrix_set(qmckl_matrix matrix, double value);
1.6.3 Tensor
qmckl_tensor qmckl_tensor_set(qmckl_tensor tensor, double value);
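A short sketch, assuming a valid context:

```c
qmckl_vector v = qmckl_vector_alloc(context, 10);
v = qmckl_vector_set(v, 0.0);     /* all 10 elements become 0.0 */

qmckl_matrix m = qmckl_matrix_alloc(context, 3, 3);
m = qmckl_matrix_set(m, 1.0);     /* all 9 elements become 1.0 */
```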
1.7 Copy to/from double*
qmckl_exit_code qmckl_double_of_vector(const qmckl_context context, const qmckl_vector vector, double* const target, const int64_t size_max);
Converts a vector to a double*.
qmckl_exit_code qmckl_double_of_matrix(const qmckl_context context, const qmckl_matrix matrix, double* const target, const int64_t size_max);
Converts a matrix to a double*.
qmckl_exit_code qmckl_double_of_tensor(const qmckl_context context, const qmckl_tensor tensor, double* const target, const int64_t size_max);
Converts a tensor to a double*.
qmckl_exit_code qmckl_vector_of_double(const qmckl_context context, const double* target, const int64_t size_max, qmckl_vector* vector);
Converts a double* to a vector.
qmckl_exit_code qmckl_matrix_of_double(const qmckl_context context, const double* target, const int64_t size_max, qmckl_matrix* matrix);
Converts a double* to a matrix.
qmckl_exit_code qmckl_tensor_of_double(const qmckl_context context, const double* target, const int64_t size_max, qmckl_tensor* tensor);
Converts a double* to a tensor.
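A round-trip sketch between a qmckl_matrix and a plain C buffer, assuming the destination matrix must already be allocated with matching dimensions and a valid context is available:

```c
qmckl_matrix m = qmckl_matrix_alloc(context, 3, 4);
m = qmckl_matrix_set(m, 1.0);

double buffer[12];

/* qmckl_matrix -> double* */
qmckl_exit_code rc = qmckl_double_of_matrix(context, m, buffer, 12);
if (rc != QMCKL_SUCCESS) { /* handle error */ }

/* double* -> qmckl_matrix (m keeps its 3 x 4 dimensions) */
rc = qmckl_matrix_of_double(context, buffer, 12, &m);
if (rc != QMCKL_SUCCESS) { /* handle error */ }
```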
1.8 Allocate and copy to double*
double* qmckl_alloc_double_of_vector(const qmckl_context context, const qmckl_vector vector);
double* qmckl_alloc_double_of_matrix(const qmckl_context context, const qmckl_matrix matrix);
double* qmckl_alloc_double_of_tensor(const qmckl_context context, const qmckl_tensor tensor);
2 Matrix operations
2.1 qmckl_dgemm
Matrix multiplication with a BLAS interface:
\[ C_{ij} = \beta C_{ij} + \alpha \sum_{k} A_{ik} \cdot B_{kj} \]
| Variable | Type | In/Out | Description |
|---|---|---|---|
| context | qmckl_context | in | Global state |
| TransA | char | in | 'N' for no transpose, 'T' for transposed \(A\) |
| TransB | char | in | 'N' for no transpose, 'T' for transposed \(B\) |
| m | int64_t | in | Number of rows of matrices \(A\) and \(C\) |
| n | int64_t | in | Number of columns of matrices \(B\) and \(C\) |
| k | int64_t | in | Number of columns of \(A\) and rows of \(B\) |
| alpha | double | in | α |
| A | double[][lda] | in | Array containing the matrix \(A\) |
| lda | int64_t | in | Leading dimension of array A |
| B | double[][ldb] | in | Array containing the matrix \(B\) |
| ldb | int64_t | in | Leading dimension of array B |
| beta | double | in | β |
| C | double[][ldc] | out | Array containing the matrix \(C\) |
| ldc | int64_t | in | Leading dimension of array C |
Requirements:

- context is not QMCKL_NULL_CONTEXT
- m > 0
- n > 0
- k > 0
- lda >= m
- ldb >= n
- ldc >= n
- A is allocated with at least \(m \times k \times 8\) bytes
- B is allocated with at least \(k \times n \times 8\) bytes
- C is allocated with at least \(m \times n \times 8\) bytes
qmckl_exit_code qmckl_dgemm ( const qmckl_context context, const char TransA, const char TransB, const int64_t m, const int64_t n, const int64_t k, const double alpha, const double* A, const int64_t lda, const double* B, const int64_t ldb, const double beta, double* const C, const int64_t ldc );
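A usage sketch with column-major arrays, computing \(C = A\,B\) for a 2×3 matrix \(A\) and a 3×2 matrix \(B\) (context is assumed to be a valid qmckl_context):

```c
/* A is 2x3, stored column by column: its rows are (1 2 3) and (4 5 6). */
double A[6] = { 1., 4.,   2., 5.,   3., 6. };
/* B is 3x2: its columns are the first two columns of the identity. */
double B[6] = { 1., 0., 0.,   0., 1., 0. };
double C[4] = { 0. };   /* 2x2 result */

qmckl_exit_code rc =
  qmckl_dgemm(context, 'N', 'N', 2, 2, 3,
              1.0, A, 2, B, 3, 0.0, C, 2);

/* C now holds { 1., 4., 2., 5. } in column-major order,
   i.e. the first two columns of A. */
```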
2.2 qmckl_dgemm_safe
"Size-safe" proxy function with the same functionality as qmckl_dgemm
but with 3 additional arguments. These arguments size_max_M
(where M
is a matix)
are required primarily for the Python API, where compatibility with
NumPy arrays implies that sizes of the input and output arrays are provided.
| Variable | Type | In/Out | Description |
|---|---|---|---|
| context | qmckl_context | in | Global state |
| TransA | char | in | 'N' for no transpose, 'T' for transposed \(A\) |
| TransB | char | in | 'N' for no transpose, 'T' for transposed \(B\) |
| m | int64_t | in | Number of rows of matrices \(A\) and \(C\) |
| n | int64_t | in | Number of columns of matrices \(B\) and \(C\) |
| k | int64_t | in | Number of columns of \(A\) and rows of \(B\) |
| alpha | double | in | α |
| A | double[][lda] | in | Array containing the matrix \(A\) |
| size_max_A | int64_t | in | Size of the matrix \(A\) |
| lda | int64_t | in | Leading dimension of array A |
| B | double[][ldb] | in | Array containing the matrix \(B\) |
| size_max_B | int64_t | in | Size of the matrix \(B\) |
| ldb | int64_t | in | Leading dimension of array B |
| beta | double | in | β |
| C | double[][ldc] | out | Array containing the matrix \(C\) |
| size_max_C | int64_t | in | Size of the matrix \(C\) |
| ldc | int64_t | in | Leading dimension of array C |
Requirements:

- context is not QMCKL_NULL_CONTEXT
- m > 0
- n > 0
- k > 0
- lda >= m
- ldb >= n
- ldc >= n
- A is allocated with at least \(m \times k \times 8\) bytes
- B is allocated with at least \(k \times n \times 8\) bytes
- C is allocated with at least \(m \times n \times 8\) bytes
- size_max_A >= m * k
- size_max_B >= k * n
- size_max_C >= m * n
qmckl_exit_code qmckl_dgemm_safe ( const qmckl_context context, const char TransA, const char TransB, const int64_t m, const int64_t n, const int64_t k, const double alpha, const double* A, const int64_t size_max_A, const int64_t lda, const double* B, const int64_t size_max_B, const int64_t ldb, const double beta, double* const C, const int64_t size_max_C, const int64_t ldc );
2.3 qmckl_matmul
Matrix multiplication using the qmckl_matrix data type:
\[ C_{ij} = \beta C_{ij} + \alpha \sum_{k} A_{ik} \cdot B_{kj} \]
| Variable | Type | In/Out | Description |
|---|---|---|---|
| context | qmckl_context | in | Global state |
| TransA | char | in | 'N' for no transpose, 'T' for transposed \(A\) |
| TransB | char | in | 'N' for no transpose, 'T' for transposed \(B\) |
| alpha | double | in | α |
| A | qmckl_matrix | in | Matrix \(A\) |
| B | qmckl_matrix | in | Matrix \(B\) |
| beta | double | in | β |
| C | qmckl_matrix | out | Matrix \(C\) |
qmckl_exit_code qmckl_matmul ( const qmckl_context context, const char TransA, const char TransB, const double alpha, const qmckl_matrix A, const qmckl_matrix B, const double beta, qmckl_matrix* const C );
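A sketch with the qmckl_matrix type, assuming \(C\) must be allocated beforehand with the dimensions of the product and that a valid context is available:

```c
qmckl_matrix A = qmckl_matrix_alloc(context, 3, 4);
qmckl_matrix B = qmckl_matrix_alloc(context, 4, 2);
qmckl_matrix C = qmckl_matrix_alloc(context, 3, 2);

A = qmckl_matrix_set(A, 1.0);
B = qmckl_matrix_set(B, 1.0);
C = qmckl_matrix_set(C, 0.0);

/* C = 2 A.B : every element of C becomes 2 * 4 = 8. */
qmckl_exit_code rc =
  qmckl_matmul(context, 'N', 'N', 2.0, A, B, 0.0, &C);
```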
2.4 qmckl_adjugate
Given a matrix \(\mathbf{A}\), the adjugate matrix \(\text{adj}(\mathbf{A})\) is the transpose of the matrix of cofactors of \(\mathbf{A}\).
\[ \mathbf{B} = \text{adj}(\mathbf{A}) = \text{det}(\mathbf{A}) \, \mathbf{A}^{-1} \]
See also: https://en.wikipedia.org/wiki/Adjugate_matrix
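For a 2×2 matrix the adjugate can be written down explicitly, which illustrates both the definition and the relation above:

\[ \mathbf{A} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad \text{adj}(\mathbf{A}) = \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}, \qquad \text{adj}(\mathbf{A})\,\mathbf{A} = (ad - bc)\,\mathbf{I} = \text{det}(\mathbf{A})\,\mathbf{I}. \]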
| Variable | Type | In/Out | Description |
|---|---|---|---|
| context | qmckl_context | in | Global state |
| n | int64_t | in | Number of rows and columns of the input matrix |
| A | double[][lda] | in | Array containing the \(n \times n\) matrix \(A\) |
| lda | int64_t | in | Leading dimension of array A |
| B | double[][ldb] | out | Adjugate of \(A\) |
| ldb | int64_t | in | Leading dimension of array B |
| det_l | double | inout | Determinant of \(A\) |
Requirements:

- context is not QMCKL_NULL_CONTEXT
- n > 0
- lda >= n
- A is allocated with at least \(n \times n \times 8\) bytes
- ldb >= n
- B is allocated with at least \(n \times n \times 8\) bytes
qmckl_exit_code qmckl_adjugate ( const qmckl_context context, const int64_t n, const double* A, const int64_t lda, double* const B, const int64_t ldb, double* det_l );
For small matrices (≤ 5×5), we use specialized routines for performance reasons. For larger sizes, we rely on the LAPACK library.
subroutine adjugate_general(context, na, A, LDA, B, LDB, det_l)
  use qmckl
  implicit none
  integer(qmckl_context), intent(in)    :: context
  integer*8,              intent(in)    :: LDA
  integer*8,              intent(in)    :: LDB
  integer*8,              intent(in)    :: na
  double precision,       intent(in)    :: A (LDA,na)
  double precision,       intent(out)   :: B (LDB,na)
  double precision,       intent(inout) :: det_l

  double precision :: work(LDA*max(na,64))
  integer          :: inf
  integer          :: ipiv(LDA)
  integer          :: lwork
  integer(8)       :: i, j
We first copy the array A into the array B.
B(1:na,1:na) = A(1:na,1:na)
Then, we compute the LU factorization \(LU=A\) using the dgetrf routine.
call dgetrf(na, na, B, LDB, ipiv, inf )
By convention, the determinant of \(L\) is equal to one, so the determinant of \(A\) is equal to the determinant of \(U\), which is simply computed as the product of its diagonal elements.
det_l = 1.d0
j=0_8
do i=1,na
  j = j+min(abs(ipiv(i)-i),1)
  det_l = det_l*B(i,i)
enddo
As dgetrf returns \(PLU=A\), where \(P\) is a permutation matrix, the sign of the determinant is given by \((-1)^m\), where \(m\) is the number of row permutations.
if (iand(j,1_8) /= 0_8) then
  det_l = -det_l
endif
Then, the inverse of \(A\) is computed using dgetri:
lwork = SIZE(work)
call dgetri(na, B, LDB, ipiv, work, lwork, inf )
and the adjugate matrix is computed as the product of the determinant with the inverse:
B(:,:) = B(:,:)*det_l

end subroutine adjugate_general
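As a usage sketch of qmckl_adjugate on the 2×2 example given above (column-major arrays; context is assumed to be a valid qmckl_context):

```c
/* A = [[1, 2], [3, 4]] in row notation, stored column by column. */
double A[4] = { 1., 3.,   2., 4. };
double B[4] = { 0. };
double det  = 0.;

qmckl_exit_code rc = qmckl_adjugate(context, 2, A, 2, B, 2, &det);

/* det is now -2.0 and B holds { 4., -3., -2., 1. } in column-major order,
   i.e. adj(A) = [[4, -2], [-3, 1]] in row notation. */
```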
2.5 qmckl_adjugate_safe
"Size-safe" proxy function with the same functionality as qmckl_adjugate
but with 2 additional arguments. These arguments size_max_M
(where M
is a matix)
are required primarily for the Python API, where compatibility with
NumPy arrays implies that sizes of the input and output arrays are provided.
| Variable | Type | In/Out | Description |
|---|---|---|---|
| context | qmckl_context | in | Global state |
| n | int64_t | in | Number of rows and columns of the input matrix |
| A | double[][lda] | in | Array containing the \(n \times n\) matrix \(A\) |
| size_max_A | int64_t | in | Size of the matrix \(A\) |
| lda | int64_t | in | Leading dimension of array A |
| B | double[][ldb] | out | Adjugate of \(A\) |
| size_max_B | int64_t | in | Size of the matrix \(B\) |
| ldb | int64_t | in | Leading dimension of array B |
| det_l | double | inout | Determinant of \(A\) |
Requirements:

- context is not QMCKL_NULL_CONTEXT
- n > 0
- lda >= n
- A is allocated with at least \(n \times n \times 8\) bytes
- ldb >= n
- B is allocated with at least \(n \times n \times 8\) bytes
- size_max_A >= n * n
- size_max_B >= n * n
qmckl_exit_code qmckl_adjugate_safe ( const qmckl_context context, const int64_t n, const double* A, const int64_t size_max_A, const int64_t lda, double* const B, const int64_t size_max_B, const int64_t ldb, double* det_l );
For small matrices (≤ 5×5), we use specialized routines for performance reasons. For larger sizes, we rely on the LAPACK library.
2.5.1 C interface
2.6 qmckl_transpose
Transposes a matrix: \(A^\dagger_{ji} = A_{ij}\).
| Variable | Type | In/Out | Description |
|---|---|---|---|
| context | qmckl_context | in | Global state |
| A | qmckl_matrix | in | Input matrix |
| At | qmckl_matrix | out | Transposed matrix |
qmckl_exit_code qmckl_transpose (qmckl_context context, const qmckl_matrix A, qmckl_matrix At );
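A sketch, assuming the destination matrix must be allocated beforehand with the transposed dimensions and that a valid context is available:

```c
qmckl_matrix A  = qmckl_matrix_alloc(context, 2, 3);
qmckl_matrix At = qmckl_matrix_alloc(context, 3, 2);

A = qmckl_matrix_set(A, 1.0);

qmckl_exit_code rc = qmckl_transpose(context, A, At);
/* At(i,j) now equals A(j,i). */
```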
3 Utilities
Trick to make Intel MKL use its optimized kernels on AMD CPUs: providing a mkl_serv_intel_cpu_true symbol that always returns 1 overrides MKL's CPU detection, so the CPU is reported as Intel.
int mkl_serv_intel_cpu_true() { return 1; }