Front end API

1 Coding conventions

  • integer types will be defined using types given in stdint.h
  • pointers are always initialized to NULL
  • when memory is freed, the pointer is set to NULL
  • assert.h should be used extensively
  • variable names are in lower case
  • #define constants are in upper case
  • structs are suffixed by _s
  • types are suffixed by _t
  • API calls return trexio_exit_code (except for the trexio_open function)

1.1 Memory allocation

Memory allocation of structures can be facilitated by using the following macros, which ensure that the size of the allocated object is the same as the size of the data type pointed to by the pointer.

#define MALLOC(T) (T*) malloc (sizeof(T))
#define CALLOC(N,T) (T*) calloc ( (N) , sizeof(T) )

When a pointer is freed, it should be set to NULL. This can be facilitated by the use of the following macro:

#define FREE(X) { free(X) ; (X)=NULL; }
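
As an illustration of how the three macros fit together, here is a minimal sketch. The point_t structure and the demo_alloc function are hypothetical names introduced only for this example; they are not part of TREXIO.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MALLOC(T) (T*) malloc (sizeof(T))
#define CALLOC(N,T) (T*) calloc ( (N) , sizeof(T) )
#define FREE(X) { free(X) ; (X)=NULL; }

/* Hypothetical structure, used only for this illustration. */
typedef struct point_s {
  double  x;
  double  y;
  int64_t tag;
} point_t;

/* Allocates one point and an array of n counters (n >= 1), then
   releases both. Returns 0 on success, -1 if an allocation failed. */
int demo_alloc (const size_t n)
{
  point_t* p      = MALLOC(point_t);
  int32_t* counts = CALLOC(n, int32_t);

  if (p == NULL || counts == NULL) {
    FREE(p);
    FREE(counts);
    return -1;
  }

  p->x = 1.0;  p->y = 2.0;  p->tag = 42;
  assert (counts[0] == 0);   /* calloc zero-initializes the array */

  FREE(p);
  FREE(counts);

  /* After FREE both pointers are NULL, so a second FREE is harmless. */
  assert (p == NULL && counts == NULL);
  FREE(p);
  return 0;
}
```

Because FREE expands to a braced block, writing it as the unbraced body of an if that is followed by else does not compile; always bracing the body, as done above, avoids the problem.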

The maximum string size for the filenames is 4096 characters.

#define TREXIO_MAX_FILENAME_LENGTH 4096

2 Front end

All calls to TREXIO are thread-safe. The TREXIO front end is modular, which simplifies the implementation of new back ends.

2.1 Error handling

Macro                       Code  Description
TREXIO_FAILURE              -1    'Unknown failure'
TREXIO_SUCCESS              0     'Success'
TREXIO_INVALID_ARG_1        1     'Invalid argument 1'
TREXIO_INVALID_ARG_2        2     'Invalid argument 2'
TREXIO_INVALID_ARG_3        3     'Invalid argument 3'
TREXIO_INVALID_ARG_4        4     'Invalid argument 4'
TREXIO_INVALID_ARG_5        5     'Invalid argument 5'
TREXIO_END                  6     'End of file'
TREXIO_READONLY             7     'Read-only file'
TREXIO_ERRNO                8     strerror(errno)
TREXIO_INVALID_ID           9     'Invalid ID'
TREXIO_ALLOCATION_FAILED    10    'Allocation failed'
TREXIO_HAS_NOT              11    'Element absent'
TREXIO_INVALID_NUM          12    'Invalid dimensions'
TREXIO_NUM_ALREADY_EXISTS   13    'Dimensioning variable already exists'
TREXIO_DSET_ALREADY_EXISTS  14    'Dataset already exists'
TREXIO_OPEN_ERROR           15    'Error opening file'
TREXIO_LOCK_ERROR           16    'Error locking file'
TREXIO_UNLOCK_ERROR         17    'Error unlocking file'
TREXIO_FILE_ERROR           18    'Invalid file handle'
TREXIO_GROUP_READ_ERROR     19    'Error reading group'
TREXIO_GROUP_WRITE_ERROR    20    'Error writing group'
TREXIO_ELEM_READ_ERROR      21    'Error reading element'
TREXIO_ELEM_WRITE_ERROR     22    'Error writing element'

The trexio_string_of_error function converts an exit code into a string. The destination string is assumed to be large enough to contain the error message (typically 128 characters).

◉ Decoding errors

To decode the error messages, trexio_string_of_error converts an error code into a string. The Fortran wrapper trexio_string_of_error_f copies that string into a buffer of 128 characters.

The text strings are extracted from the previous table.

const char*
trexio_string_of_error (const trexio_exit_code error)
{
  switch (error) {
  case TREXIO_FAILURE:
          return "Unknown failure";
          break;
  case TREXIO_SUCCESS:
          return "Success";
          break;
  case TREXIO_INVALID_ARG_1:
          return "Invalid argument 1";
          break;
  case TREXIO_INVALID_ARG_2:
          return "Invalid argument 2";
          break;
  case TREXIO_INVALID_ARG_3:
          return "Invalid argument 3";
          break;
  case TREXIO_INVALID_ARG_4:
          return "Invalid argument 4";
          break;
  case TREXIO_INVALID_ARG_5:
          return "Invalid argument 5";
          break;
  case TREXIO_END:
          return "End of file";
          break;
  case TREXIO_READONLY:
          return "Read-only file";
          break;
  case TREXIO_ERRNO:
          return strerror(errno);
          break;
  case TREXIO_INVALID_ID:
          return "Invalid ID";
          break;
  case TREXIO_ALLOCATION_FAILED:
          return "Allocation failed";
          break;
  case TREXIO_HAS_NOT:
          return "Element absent";
          break;
  case TREXIO_INVALID_NUM:
          return "Invalid dimensions";
          break;
  case TREXIO_NUM_ALREADY_EXISTS:
          return "Dimensioning variable already exists";
          break;
  case TREXIO_DSET_ALREADY_EXISTS:
          return "Dataset already exists";
          break;
  case TREXIO_OPEN_ERROR:
          return "Error opening file";
          break;
  case TREXIO_LOCK_ERROR:
          return "Error locking file";
          break;
  case TREXIO_UNLOCK_ERROR:
          return "Error unlocking file";
          break;
  case TREXIO_FILE_ERROR:
          return "Invalid file handle";
          break;
  case TREXIO_GROUP_READ_ERROR:
          return "Error reading group";
          break;
  case TREXIO_GROUP_WRITE_ERROR:
          return "Error writing group";
          break;
  case TREXIO_ELEM_READ_ERROR:
          return "Error reading element";
          break;
  case TREXIO_ELEM_WRITE_ERROR:
          return "Error writing element";
          break;
  }
  return "Unknown error";
}

void
trexio_string_of_error_f (const trexio_exit_code error, char result[128])
{
  strncpy(result, trexio_string_of_error(error), 128);
}
interface
   subroutine trexio_string_of_error (error, string) bind(C, name='trexio_string_of_error_f')
     use, intrinsic :: iso_c_binding
     import
     integer (trexio_exit_code), intent(in), value :: error
     character, intent(out) :: string(128)
   end subroutine trexio_string_of_error
end interface
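
The wrapper above uses strncpy with the full buffer length, which leaves the buffer without a terminating NUL if a message ever reaches 128 characters. That is harmless for a Fortran character array, but a C caller that copies messages into its own buffer may prefer a variant that always terminates. The sketch below shows one way to do it. Everything in it is a stand-in: the typedef of trexio_exit_code as int32_t is an assumption, string_of_error_stub only mimics three rows of the table, and report_error is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for this sketch only. In a real program these come from the
   TREXIO header; the int32_t typedef is an assumption. Values are copied
   from the table above. */
typedef int32_t trexio_exit_code;
#define TREXIO_FAILURE   ( (trexio_exit_code) -1 )
#define TREXIO_SUCCESS   ( (trexio_exit_code)  0 )
#define TREXIO_READONLY  ( (trexio_exit_code)  7 )

/* Reduced imitation of trexio_string_of_error (three codes only). */
static const char* string_of_error_stub (const trexio_exit_code error)
{
  switch (error) {
  case TREXIO_FAILURE:  return "Unknown failure";
  case TREXIO_SUCCESS:  return "Success";
  case TREXIO_READONLY: return "Read-only file";
  }
  return "Unknown error";
}

/* Copies the message for `error` into `buf` (capacity `len`) and always
   NUL-terminates it. Returns 1 if `error` is TREXIO_SUCCESS, else 0. */
int report_error (const trexio_exit_code error, char* buf, const size_t len)
{
  assert (buf != NULL && len > 0);
  strncpy(buf, string_of_error_stub(error), len - 1);
  buf[len - 1] = '\0';
  return error == TREXIO_SUCCESS;
}
```

A caller would typically test the return value and print buf only when it is 0.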

2.2 Back ends

TREXIO has several back ends:

  1. TREXIO_HDF5 relies on extensive use of the HDF5 library and the associated file format. The HDF5 file is binary and tailored to high-performance I/O. This back end is the default one. HDF5 can be compiled with MPI for parallel I/O. Note that HDF5 has to be downloaded and installed independently of TREXIO, which may be an obstacle, especially when the user is not allowed to install external software. The produced files usually have the .h5 extension.
  2. TREXIO_TEXT relies on basic file I/O in C, namely fopen, fclose, fprintf, fscanf, etc. from the stdio.h library. This back end is not optimized for performance. It is intended for debugging or, for example, for cases where the user wants to modify some data manually within the file. This back end is expected to work "out-of-the-box" since it has no external dependencies, which can be useful for users who do not have access to the HDF5 library. The produced files usually have the .txt extension.

Additional back ends can be implemented thanks to the modular nature of the front end. This can be achieved by adding a new case (corresponding to the desired back end) in the front-end switch. Then the corresponding back-end has/read/write functions have to be implemented. For example, see the commented lines that correspond to the TREXIO_JSON back end (not implemented yet).

typedef int32_t back_end_t;

#define TREXIO_HDF5             ( (back_end_t) 0 )
#define TREXIO_TEXT             ( (back_end_t) 1 )
/*#define TREXIO_JSON             ( (back_end_t) 2 )*/
#define TREXIO_INVALID_BACK_END ( (back_end_t) 2 )
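
Since TREXIO_INVALID_BACK_END is the value that follows the last implemented back end, a single range test is enough to reject unknown values, and that test stays valid when a new back end is inserted before it. A minimal sketch of this idea follows; back_end_name is a hypothetical helper and the printable names are chosen only for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef int32_t back_end_t;

#define TREXIO_HDF5             ( (back_end_t) 0 )
#define TREXIO_TEXT             ( (back_end_t) 1 )
#define TREXIO_INVALID_BACK_END ( (back_end_t) 2 )

/* Hypothetical helper: returns a printable name, or NULL when the value
   is outside the range of implemented back ends. */
const char* back_end_name (const back_end_t back_end)
{
  if (back_end <  0) return NULL;
  if (back_end >= TREXIO_INVALID_BACK_END) return NULL;

  switch (back_end) {
  case TREXIO_HDF5: return "hdf5";
  case TREXIO_TEXT: return "text";
  }
  return NULL;
}
```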

2.3 Read/write behavior

Every time a reading function is called, the data is read from the disk. If data needs to be cached, this is left to the user of the library.

Writing to TREXIO files is done with transactions (all-or-nothing effect) in a per-group fashion. File writes are attempted when the write (TREXIO_HDF5) or flush (TREXIO_TEXT) function is called explicitly, or when the TREXIO file is closed. If writing is impossible because the data is not valid, no data is written.

The order in which the data is written is not necessarily consistent with the order in which the function calls were made.

The TREXIO files are supposed to be opened by only one program at a time: if the same TREXIO file is modified simultaneously by multiple concurrent programs, the behavior is not specified.

2.4 TREXIO file type

trexio_s is the main type for TREXIO files, visible to the users of the library. This type is kept opaque, and all modifications to the files are necessarily done through functions that take such a type as argument.

File creation and opening functions return TREXIO file handles, namely pointers to trexio_s types. All functions accessing the TREXIO files take the TREXIO file handle as their first argument.

typedef struct trexio_s trexio_t;
struct trexio_s {
  char*             file_name;
  pthread_mutex_t   thread_lock;
  back_end_t        back_end;
  char              mode;
  char              padding[7];   /* Ensures the proper alignment of back ends */
};

2.5 Polymorphism of the file handle

Polymorphism of the trexio_t type is handled by ensuring that the corresponding types for all back ends can be safely cast to trexio_t. This is done by making the back-end structs start with struct trexio_s:

struct trexio_back_end_s {
  trexio_t     parent ;
  /* add below specific back-end data */
};

2.6 File opening

trexio_open creates a new TREXIO file or opens an existing one.

input parameters:

  1. file_name - string containing file name
  2. mode - character containing open mode (see below)
    • 'w' - (write) creates a new file as READWRITE (overwrite existing file)
    • 'r' - (read) opens existing file as READONLY
  3. back_end - integer number (or the corresponding global parameter) specifying the back end
    • TREXIO_HDF5 - for HDF5 back end (integer alternative: 0)
    • TREXIO_TEXT - for TEXT back end (integer alternative: 1)

output: trexio_t file handle

Note: the file_name in the TEXT back end actually corresponds to the name of the folder where the .txt data files are stored. The actual name of each .txt file corresponds to the group name provided in trex.config (e.g. nucleus.txt for nuclei-related data). These names are populated by generator.py (i.e. they are hard-coded), which is why the user should avoid renaming the .txt data files.

trexio_t*
trexio_open(const char* file_name, const char mode,
            const back_end_t back_end)
{

  if (file_name == NULL) return NULL;
  if (file_name[0] == '\0') return NULL;
  /* Check overflow in file_name */

  if (back_end <  0) return NULL;
  if (back_end >= TREXIO_INVALID_BACK_END) return NULL;

  if (mode != 'r' && mode != 'w') return NULL;

  trexio_t* result = NULL;
  void* result_tmp = NULL;

  /* Allocate data structures */
  switch (back_end) {

  case TREXIO_TEXT:
    result_tmp = malloc(sizeof(trexio_text_t));
    break;

  case TREXIO_HDF5:
    result_tmp = malloc(sizeof(trexio_hdf5_t));
    break;
/*
  case TREXIO_JSON:
    result = (trexio_t*) malloc (sizeof(trexio_json_t));
    break;
*/
  }
  result = (trexio_t*) result_tmp;

  assert (result != NULL);    /* TODO: Error handling */


  /* Data for the parent type */

  result->file_name   = CALLOC(TREXIO_MAX_FILENAME_LENGTH, char);
  strncpy(result->file_name, file_name, TREXIO_MAX_FILENAME_LENGTH);
  if (result->file_name[TREXIO_MAX_FILENAME_LENGTH-1] != '\0') {
    free(result->file_name);
    free(result);
    return NULL;
  }

  result->back_end    = back_end;
  result->mode        = mode;
  int irc = pthread_mutex_init ( &(result->thread_lock), NULL);
  assert (irc == 0);

  trexio_exit_code rc;

  /* Back end initialization */

  rc = TREXIO_OPEN_ERROR;

  switch (back_end) {

  case TREXIO_TEXT:
    rc = trexio_text_init(result);
    break;

  case TREXIO_HDF5:
    rc = trexio_hdf5_init(result);
    break;
/*
  case TREXIO_JSON:
    rc = trexio_json_init(result);
    break;
*/
  }

  if (rc != TREXIO_SUCCESS) {
    free(result->file_name);
    free(result);
    return NULL;
  }

  /* File locking */

  rc = TREXIO_LOCK_ERROR;

  switch (back_end) {

  case TREXIO_TEXT:
    rc = trexio_text_lock(result);
    break;
  /* HDF5 v.>=1.10 has file locking activated by default */
  case TREXIO_HDF5:
    rc = TREXIO_SUCCESS;
    break;
/*
  case TREXIO_JSON:
    rc = trexio_json_lock(result);
    break;
*/
  }

  if (rc != TREXIO_SUCCESS) {
    free(result->file_name);
    free(result);
    return NULL;
  }

  return result;
}
interface
   integer(8) function trexio_open_c (filename, mode, backend) bind(C, name="trexio_open")
     use, intrinsic :: iso_c_binding
     import
     character(kind=c_char), dimension(*)       :: filename
     character, intent(in), value               :: mode
     integer(trexio_backend), intent(in), value :: backend
   end function trexio_open_c
end interface
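
The file name overflow check in trexio_open relies on two properties of strncpy: it pads the destination with NUL bytes when the source is shorter than the limit, and it writes no terminating NUL when the source has as many characters as the limit or more. Inspecting the last byte of the buffer is therefore sufficient. The sketch below isolates this technique; MAX_LEN is a small stand-in for TREXIO_MAX_FILENAME_LENGTH and copy_file_name is a hypothetical helper.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LEN 16   /* small stand-in for TREXIO_MAX_FILENAME_LENGTH */

/* Returns a heap copy of `name`, or NULL if `name` is empty or does not
   fit in MAX_LEN bytes including the terminating NUL.
   The caller frees the result. */
char* copy_file_name (const char* const name)
{
  if (name == NULL) return NULL;
  if (name[0] == '\0') return NULL;

  char* copy = (char*) calloc(MAX_LEN, sizeof(char));
  if (copy == NULL) return NULL;

  strncpy(copy, name, MAX_LEN);

  /* A non-NUL last byte means that `name` filled the whole buffer. */
  if (copy[MAX_LEN-1] != '\0') {
    free(copy);
    return NULL;
  }
  return copy;
}
```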

2.7 File closing

trexio_close closes an existing trexio_t file.

input parameters: file – TREXIO file handle.

output: trexio_exit_code exit code.

trexio_exit_code
trexio_close (trexio_t* file)
{

  if (file == NULL) return TREXIO_FILE_ERROR;

  trexio_exit_code rc = TREXIO_FAILURE;

  assert(file->back_end < TREXIO_INVALID_BACK_END);

  /* Terminate the back end */
  switch (file->back_end) {

  case TREXIO_TEXT:
    rc = trexio_text_deinit(file);
    break;

  case TREXIO_HDF5:
    rc = trexio_hdf5_deinit(file);
    break;
/*
  case TREXIO_JSON:
    rc = trexio_json_deinit(file);
    break;
*/
  }

  if (rc != TREXIO_SUCCESS) {
    FREE(file->file_name);
    FREE(file);
    return rc;
  }

  /* File unlocking */

  rc = TREXIO_UNLOCK_ERROR;

  switch (file->back_end) {

  case TREXIO_TEXT:
    rc = trexio_text_unlock(file);
    break;

  case TREXIO_HDF5:
    rc = TREXIO_SUCCESS;
    break;
/*
  case TREXIO_JSON:
    rc = trexio_json_unlock(file);
    break;
*/
  }

  /* Terminate front end */

  FREE(file->file_name);

  int irc = pthread_mutex_destroy( &(file->thread_lock) );

  free(file);

  if (irc != 0) return TREXIO_ERRNO;
  if (rc != TREXIO_SUCCESS) return rc;

  return TREXIO_SUCCESS;
}
interface
   integer function trexio_close (trex_file) bind(C)
     use, intrinsic :: iso_c_binding
     integer(8), intent(in), value :: trex_file
   end function trexio_close
end interface

2.8 C helper functions

trexio_exit_code
transform_str (char** dest, const char** src, uint64_t str_max_num, uint32_t str_max_len)
{

  if (dest == NULL) return TREXIO_INVALID_ARG_1;
  assert (str_max_num > 0);
  assert (str_max_len > 0);

  char* tmp_str = (char*) calloc(str_max_num*(str_max_len+1)+1, sizeof(char));

  for (uint64_t i=0; i<str_max_num; i++){
    dest[i] = tmp_str;
    strncpy(tmp_str, src[i], str_max_len);
    tmp_str += str_max_len + 1;
  }

  /* tmp_str cannot be freed here, but it is taken care of when the pointer to dest is deallocated */

  return TREXIO_SUCCESS;
}
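
The helper packs all strings into one contiguous, zero-initialized buffer and makes each dest[i] point to its own slot of str_max_len + 1 bytes, so that every slot keeps a terminating NUL even when the source string is truncated. The self-contained sketch below reproduces this layout under a hypothetical name, pack_strings, with a check on the allocation added; it is an illustration and not the library function.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Packs n strings into one zero-initialized buffer made of n slots of
   (max_len + 1) bytes; dest[i] points to slot i.
   Returns 0 on success, -1 on bad arguments or failed allocation. */
int pack_strings (char** dest, const char** src,
                  const uint64_t n, const uint32_t max_len)
{
  if (dest == NULL || src == NULL) return -1;
  if (n == 0 || max_len == 0) return -1;

  const uint64_t slot = (uint64_t) max_len + 1;
  char* buf = (char*) calloc(n * slot, sizeof(char));
  if (buf == NULL) return -1;

  for (uint64_t i = 0; i < n; ++i) {
    dest[i] = buf + i * slot;
    strncpy(dest[i], src[i], max_len);   /* the last byte of the slot stays NUL */
  }
  return 0;
}
```

Since dest[0] is the start of the single allocation, calling free(dest[0]) releases all the strings at once.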

3 Templates for front end

Consider the following block of trex.json:

{
  "nucleus": {
      "num"                : [ "int"  , [                     ] ]
    , "charge"             : [ "float", [ "nucleus.num"       ] ]
    , "coord"              : [ "float", [ "nucleus.num", "3"  ] ]
    , "label"              : [ "str" ,  [ "nucleus.num"       ] ]
  }
}

TREXIO is generated automatically by the generator.py Python script based on the tree-like configuration provided in the trex.json file. Because of that, generalized templates can be implemented and re-used. This approach reduces the number of bugs compared with a manual copy-paste-modify scheme.

All templates presented below use the $var$ notation to indicate a variable that will be replaced by generator.py. Sometimes the upper case is used, i.e. $VAR$ (for example, in #define statements). A more detailed description of each variable can be found below:

Template variable             Description                                          Example
$group$                       Name of the group                                    nucleus
$group_num$                   Name of the dimensioning variable (scalar)           nucleus_num
$group_dset$                  Name of the dataset (vector/matrix/tensor)           nucleus_coord
$group_dset_rank$             Rank of the dataset                                  2
$group_dset_dim$              Selected dimension of the dataset                    nucleus_num
$group_dset_dim_list$         All dimensions of the dataset                        {nucleus_num, 3}
$group_dset_dtype$            Basic type of the dataset (int/float/char)           float
$group_dset_h5_dtype$         Type of the dataset in HDF5                          double
$group_dset_std_dtype_in$     Input type of the dataset in TEXT [fscanf]           %lf
$group_dset_std_dtype_out$    Output type of the dataset in TEXT [fprintf]         %24.16e
$group_dset_dtype_default$    Default datatype of the dataset [C]                  double/int32_t
$group_dset_dtype_single$     Single precision datatype of the dataset [C]         float/int32_t
$group_dset_dtype_double$     Double precision datatype of the dataset [C]         double/int64_t
$default_prec$                Default precision for read/write without suffix [C]  64/32
$group_dset_f_dtype_default$  Default datatype of the dataset [Fortran]            real(8)/integer(4)
$group_dset_f_dtype_single$   Single precision datatype of the dataset [Fortran]   real(4)/integer(4)
$group_dset_f_dtype_double$   Double precision datatype of the dataset [Fortran]   real(8)/integer(8)
$group_dset_f_dims$           Dimensions in Fortran                                (:,:)

Note: parent group name is always added to the child objects upon construction of TREXIO (e.g. num of nucleus group becomes nucleus_num and should be accessed accordingly within TREXIO).
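
To make the $var$ notation concrete, the sketch below performs the kind of textual replacement that turns a template line into generated code. It is written in C only to stay in the language of this document: the actual generator is the generator.py Python script, and the substitute function shown here is a hypothetical illustration, not a description of how that script works.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of `tmpl` in which every occurrence of
   `key` (e.g. "$group_num$") is replaced by `value`.
   Returns NULL on failure. The caller frees the result. */
char* substitute (const char* tmpl, const char* key, const char* value)
{
  assert (tmpl != NULL && key != NULL && value != NULL);
  const size_t klen = strlen(key);
  const size_t vlen = strlen(value);
  if (klen == 0) return NULL;

  /* First pass: count the occurrences to size the output buffer. */
  size_t count = 0;
  for (const char* p = strstr(tmpl, key); p != NULL; p = strstr(p + klen, key)) {
    ++count;
  }

  const size_t out_len = strlen(tmpl) - count * klen + count * vlen;
  char* out = (char*) calloc(out_len + 1, sizeof(char));
  if (out == NULL) return NULL;

  /* Second pass: copy the text before each occurrence, then the value. */
  char* dst = out;
  const char* cur = tmpl;
  for (const char* p = strstr(cur, key); p != NULL; p = strstr(cur, key)) {
    memcpy(dst, cur, (size_t) (p - cur));
    dst += p - cur;
    memcpy(dst, value, vlen);
    dst += vlen;
    cur = p + klen;
  }
  strcpy(dst, cur);   /* remaining text and the terminating NUL */
  return out;
}
```

For example, replacing $group_num$ by nucleus_num in the template name trexio_read_$group_num$_64 gives trexio_read_nucleus_num_64.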

The TREXIO generator parses the trex.json file. TREXIO operates with names of variables based on the first (parent group) and second (child object) levels of trex.json. The parsed data is divided into 2 parts:

  1. Dimensioning variables (contain num in their names). These are always scalar integers.
  2. Datasets. These can be vectors, matrices or tensors. The types are indicated in trex.json. Currently supported types: int, float and strings.

For each of the aforementioned objects, TREXIO provides has, read and write functionality. TREXIO supports I/O with single or double precision for integer and floating point numbers.

3.1 Templates for front end has/read/write a dimension

This section concerns API calls related to dimensioning variables.

Function name                 Description                                        Precision
trexio_has_$group_num$        Check if a dimensioning variable exists in a file  ---
trexio_read_$group_num$       Read a dimensioning variable                       Single
trexio_write_$group_num$      Write a dimensioning variable                      Single
trexio_read_$group_num$_32    Read a dimensioning variable                       Single
trexio_write_$group_num$_32   Write a dimensioning variable                      Single
trexio_read_$group_num$_64    Read a dimensioning variable                       Double
trexio_write_$group_num$_64   Write a dimensioning variable                      Double

3.1.1 C templates for front end

The C templates that correspond to each of the functions listed above can be found below. The first parameter is the TREXIO file handle. The second parameter is the variable to be written to or read from the TREXIO file (except for trexio_has_ functions). Suffixes _32 and _64 correspond to API calls dealing with single and double precision, respectively. The basic (non-suffixed) API call on dimensioning variables deals with single precision (see the table above).

trexio_exit_code
trexio_read_$group_num$_64 (trexio_t* const file, int64_t* const num)
{
  if (file == NULL) return TREXIO_INVALID_ARG_1;

  uint64_t u_num = 0;
  trexio_exit_code rc = TREXIO_GROUP_READ_ERROR;

  switch (file->back_end) {

  case TREXIO_TEXT:
    rc = trexio_text_read_$group_num$(file, &u_num);
    break;

  case TREXIO_HDF5:
    rc = trexio_hdf5_read_$group_num$(file, &u_num);
    break;
/*
  case TREXIO_JSON:
    rc =trexio_json_read_$group_num$(file, &u_num);
    break;
*/
  }

  if (rc != TREXIO_SUCCESS) return rc;

  *num = (int64_t) u_num;
  return TREXIO_SUCCESS;
}
trexio_exit_code
trexio_write_$group_num$_64 (trexio_t* const file, const int64_t num)
{
  if (file == NULL) return TREXIO_INVALID_ARG_1;
  if (num  <  0   ) return TREXIO_INVALID_ARG_2;
  if (trexio_has_$group_num$(file) == TREXIO_SUCCESS) return TREXIO_NUM_ALREADY_EXISTS;

  trexio_exit_code rc = TREXIO_GROUP_WRITE_ERROR;

  switch (file->back_end) {

  case TREXIO_TEXT:
    rc = trexio_text_write_$group_num$(file, (int64_t) num);
    break;

  case TREXIO_HDF5:
    rc = trexio_hdf5_write_$group_num$(file, (int64_t) num);
    break;
/*
  case TREXIO_JSON:
    rc = trexio_json_write_$group_num$(file, (int64_t) num);
    break;
*/
  }
  if (rc != TREXIO_SUCCESS) return rc;

  return TREXIO_SUCCESS;
}
trexio_exit_code
trexio_read_$group_num$_32 (trexio_t* const file, int32_t* const num)
{
  if (file == NULL) return TREXIO_INVALID_ARG_1;

  uint64_t u_num = 0;
  trexio_exit_code rc = TREXIO_GROUP_READ_ERROR;

  switch (file->back_end) {

  case TREXIO_TEXT:
    rc = trexio_text_read_$group_num$(file, &u_num);
    break;

  case TREXIO_HDF5:
    rc = trexio_hdf5_read_$group_num$(file, &u_num);
    break;
/*
  case TREXIO_JSON:
    rc =trexio_json_read_$group_num$(file, &u_num);
    break;
*/
  }

  if (rc != TREXIO_SUCCESS) return rc;

  *num = (int32_t) u_num;
  return TREXIO_SUCCESS;
}
trexio_exit_code
trexio_write_$group_num$_32 (trexio_t* const file, const int32_t num)
{

  if (file == NULL) return TREXIO_INVALID_ARG_1;
  if (num  <  0   ) return TREXIO_INVALID_ARG_2;
  if (trexio_has_$group_num$(file) == TREXIO_SUCCESS) return TREXIO_NUM_ALREADY_EXISTS;

  trexio_exit_code rc = TREXIO_GROUP_WRITE_ERROR;

  switch (file->back_end) {

  case TREXIO_TEXT:
    rc = trexio_text_write_$group_num$(file, (int64_t) num);
    break;

  case TREXIO_HDF5:
    rc = trexio_hdf5_write_$group_num$(file, (int64_t) num);
    break;
/*
  case TREXIO_JSON:
    rc = trexio_json_write_$group_num$(file, (int64_t) num);
    break;
*/
  }
  if (rc != TREXIO_SUCCESS) return rc;

  return TREXIO_SUCCESS;
}
trexio_exit_code
trexio_read_$group_num$ (trexio_t* const file, int32_t* const num)
{
  return trexio_read_$group_num$_32(file, num);
}
trexio_exit_code
trexio_write_$group_num$ (trexio_t* const file, const int32_t num)
{
  return trexio_write_$group_num$_32(file, num);
}
trexio_exit_code
trexio_has_$group_num$ (trexio_t* const file)
{

  if (file == NULL) return TREXIO_INVALID_ARG_1;

  assert(file->back_end < TREXIO_INVALID_BACK_END);

  switch (file->back_end) {

  case TREXIO_TEXT:
    return trexio_text_has_$group_num$(file);
    break;

  case TREXIO_HDF5:
    return trexio_hdf5_has_$group_num$(file);
    break;
/*
  case TREXIO_JSON:
    return trexio_json_has_$group_num$(file);
    break;
*/
  }
  return TREXIO_FAILURE;

}

3.1.2 Fortran templates for front end

The Fortran templates provide access to the C API calls from Fortran. These templates are based on the use of iso_c_binding. Pointers have to be passed by value.

interface
   integer function trexio_write_$group_num$_64 (trex_file, num) bind(C)
     use, intrinsic :: iso_c_binding
     integer(8), intent(in), value :: trex_file
     integer(8), intent(in), value :: num
   end function trexio_write_$group_num$_64
end interface
interface
   integer function trexio_read_$group_num$_64 (trex_file, num) bind(C)
     use, intrinsic :: iso_c_binding
     integer(8), intent(in), value :: trex_file
     integer(8), intent(out) :: num
   end function trexio_read_$group_num$_64
end interface
interface
   integer function trexio_write_$group_num$_32 (trex_file, num) bind(C)
     use, intrinsic :: iso_c_binding
     integer(8), intent(in), value :: trex_file
     integer(4), intent(in), value :: num
   end function trexio_write_$group_num$_32
end interface
interface
   integer function trexio_read_$group_num$_32 (trex_file, num) bind(C)
     use, intrinsic :: iso_c_binding
     integer(8), intent(in), value :: trex_file
     integer(4), intent(out) :: num
   end function trexio_read_$group_num$_32
end interface
interface
   integer function trexio_write_$group_num$ (trex_file, num) bind(C)
     use, intrinsic :: iso_c_binding
     integer(8), intent(in), value :: trex_file
     integer(4), intent(in), value :: num
   end function trexio_write_$group_num$
end interface
interface
   integer function trexio_read_$group_num$ (trex_file, num) bind(C)
     use, intrinsic :: iso_c_binding
     integer(8), intent(in), value :: trex_file
     integer(4), intent(out) :: num
   end function trexio_read_$group_num$
end interface
interface
   integer function trexio_has_$group_num$ (trex_file) bind(C)
     use, intrinsic :: iso_c_binding
     integer(8), intent(in), value :: trex_file
   end function trexio_has_$group_num$
end interface

3.2 Templates for front end has/read/write a dataset

This section concerns API calls related to datasets.

Function name                 Description                          Precision
trexio_has_$group_dset$       Check if a dataset exists in a file  ---
trexio_read_$group_dset$      Read a dataset                       Double
trexio_write_$group_dset$     Write a dataset                      Double
trexio_read_$group_dset$_32   Read a dataset                       Single
trexio_write_$group_dset$_32  Write a dataset                      Single
trexio_read_$group_dset$_64   Read a dataset                       Double
trexio_write_$group_dset$_64  Write a dataset                      Double

3.2.1 C templates for front end

The C templates that correspond to each of the functions listed above can be found below. The first parameter is the TREXIO file handle. The second parameter is the variable to be written to or read from the TREXIO file (except for trexio_has_ functions). Suffixes _32 and _64 correspond to API calls dealing with single and double precision, respectively. The basic (non-suffixed) API call on datasets deals with double precision (see the table above).

trexio_exit_code
trexio_read_$group_dset$_64 (trexio_t* const file, $group_dset_dtype_double$* const $group_dset$)
{

  if (file  == NULL) return TREXIO_INVALID_ARG_1;
  if ($group_dset$ == NULL) return TREXIO_INVALID_ARG_2;

  trexio_exit_code rc;
  int64_t $group_dset_dim$ = 0;

  /* Error handling for this call is added by the generator */
  rc = trexio_read_$group_dset_dim$_64(file, &($group_dset_dim$));

  if ($group_dset_dim$ == 0L) return TREXIO_INVALID_NUM;

  uint32_t rank = $group_dset_rank$;
  uint64_t dims[$group_dset_rank$] = {$group_dset_dim_list$};

  assert(file->back_end < TREXIO_INVALID_BACK_END);

  switch (file->back_end) {

  case TREXIO_TEXT:
    return trexio_text_read_$group_dset$(file, $group_dset$, rank, dims);
    break;

  case TREXIO_HDF5:
    return trexio_hdf5_read_$group_dset$(file, $group_dset$, rank, dims);
    break;
/*
  case TREXIO_JSON:
    return trexio_json_read_$group_dset$(file, $group_dset$, rank, dims);
    break;
*/
  }
  return TREXIO_FAILURE;
}
trexio_exit_code
trexio_write_$group_dset$_64 (trexio_t* const file, const $group_dset_dtype_double$* $group_dset$)
{

  if (file  == NULL) return TREXIO_INVALID_ARG_1;
  if ($group_dset$ == NULL) return TREXIO_INVALID_ARG_2;
  if (trexio_has_$group_dset$(file) == TREXIO_SUCCESS) return TREXIO_DSET_ALREADY_EXISTS;

  trexio_exit_code rc;
  int64_t $group_dset_dim$ = 0;

  /* Error handling for this call is added by the generator */
  rc = trexio_read_$group_dset_dim$_64(file, &($group_dset_dim$));

  if ($group_dset_dim$ == 0L) return TREXIO_INVALID_NUM;

  uint32_t rank = $group_dset_rank$;
  uint64_t dims[$group_dset_rank$] = {$group_dset_dim_list$};

  assert(file->back_end < TREXIO_INVALID_BACK_END);

  switch (file->back_end) {

  case TREXIO_TEXT:
    return trexio_text_write_$group_dset$(file, $group_dset$, rank, dims);
    break;

  case TREXIO_HDF5:
    return trexio_hdf5_write_$group_dset$(file, $group_dset$, rank, dims);
    break;
/*
  case TREXIO_JSON:
    return trexio_json_write_$group_dset$(file, $group_dset$, rank, dims);
    break;
*/
  }
  return TREXIO_FAILURE;
}
trexio_exit_code
trexio_read_$group_dset$_32 (trexio_t* const file, $group_dset_dtype_single$* const $group_dset$)
{

  if (file  == NULL) return TREXIO_INVALID_ARG_1;
  if ($group_dset$ == NULL) return TREXIO_INVALID_ARG_2;

  trexio_exit_code rc;
  int64_t $group_dset_dim$ = 0;

  /* Error handling for this call is added by the generator */
  rc = trexio_read_$group_dset_dim$_64(file, &($group_dset_dim$));

  if ($group_dset_dim$ == 0L) return TREXIO_INVALID_NUM;

  uint32_t rank = $group_dset_rank$;
  uint64_t dims[$group_dset_rank$] = {$group_dset_dim_list$};

  uint64_t dim_size = 1;
  for (unsigned int i=0; i<rank; ++i){
    dim_size *= dims[i];
  }

  $group_dset_dtype_double$* $group_dset$_64 = CALLOC(dim_size, $group_dset_dtype_double$);
  if ($group_dset$_64 == NULL) return TREXIO_ALLOCATION_FAILED;

  assert(file->back_end < TREXIO_INVALID_BACK_END);

  rc = TREXIO_FAILURE;

  switch (file->back_end) {

  case TREXIO_TEXT:
    rc = trexio_text_read_$group_dset$(file, $group_dset$_64, rank, dims);
    break;

  case TREXIO_HDF5:
    rc = trexio_hdf5_read_$group_dset$(file, $group_dset$_64, rank, dims);
    break;
/*
  case TREXIO_JSON:
    rc = trexio_json_read_$group_dset$(file, $group_dset$_64, rank, dims);
    break;
*/
  }

  if (rc != TREXIO_SUCCESS){
    FREE($group_dset$_64);
    return rc;
  }

  for (uint64_t i=0; i<dim_size; ++i){
    $group_dset$[i] = ($group_dset_dtype_single$) $group_dset$_64[i];
  }

  FREE($group_dset$_64);
  return TREXIO_SUCCESS;
}
trexio_exit_code
trexio_write_$group_dset$_32 (trexio_t* const file, const $group_dset_dtype_single$* $group_dset$)
{

  if (file  == NULL) return TREXIO_INVALID_ARG_1;
  if ($group_dset$ == NULL) return TREXIO_INVALID_ARG_2;
  if (trexio_has_$group_dset$(file) == TREXIO_SUCCESS) return TREXIO_DSET_ALREADY_EXISTS;

  trexio_exit_code rc;
  int64_t $group_dset_dim$ = 0;

  /* Error handling for this call is added by the generator */
  rc = trexio_read_$group_dset_dim$_64(file, &($group_dset_dim$));

  if ($group_dset_dim$ == 0L) return TREXIO_INVALID_NUM;

  uint32_t rank = $group_dset_rank$;
  uint64_t dims[$group_dset_rank$] = {$group_dset_dim_list$};

  uint64_t dim_size = 1;
  for (unsigned int i=0; i<rank; ++i){
    dim_size *= dims[i];
  }

  $group_dset_dtype_double$* $group_dset$_64 = CALLOC(dim_size, $group_dset_dtype_double$);
  if ($group_dset$_64 == NULL) return TREXIO_ALLOCATION_FAILED;

  /* A type conversion from single to double precision is required since the back end only accepts 64-bit data */
  for (uint64_t i=0; i<dim_size; ++i){
    $group_dset$_64[i] = ($group_dset_dtype_double$) $group_dset$[i];
  }

  assert(file->back_end < TREXIO_INVALID_BACK_END);

  rc = TREXIO_FAILURE;
  switch (file->back_end) {

  case TREXIO_TEXT:
    rc = trexio_text_write_$group_dset$(file, $group_dset$_64, rank, dims);
    break;

  case TREXIO_HDF5:
    rc = trexio_hdf5_write_$group_dset$(file, $group_dset$_64, rank, dims);
    break;
/*
  case TREXIO_JSON:
    rc = trexio_json_write_$group_dset$(file, $group_dset$_64, rank, dims);
    break;
*/
  }

  FREE($group_dset$_64);

  if (rc != TREXIO_SUCCESS) return rc;

  return TREXIO_SUCCESS;
}
trexio_exit_code
trexio_read_$group_dset$ (trexio_t* const file, $group_dset_dtype_default$* const $group_dset$)
{
  return trexio_read_$group_dset$_$default_prec$(file, $group_dset$);
}
trexio_exit_code
trexio_write_$group_dset$ (trexio_t* const file, const $group_dset_dtype_default$* $group_dset$)
{
  return trexio_write_$group_dset$_$default_prec$(file, $group_dset$);
}
trexio_exit_code
trexio_has_$group_dset$ (trexio_t* const file)
{

  if (file  == NULL) return TREXIO_INVALID_ARG_1;

  assert(file->back_end < TREXIO_INVALID_BACK_END);

  switch (file->back_end) {

  case TREXIO_TEXT:
    return trexio_text_has_$group_dset$(file);
    break;

  case TREXIO_HDF5:
    return trexio_hdf5_has_$group_dset$(file);
    break;
/*
  case TREXIO_JSON:
    return trexio_json_has_$group_dset$(file);
    break;
*/
  }
  return TREXIO_FAILURE;
}
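
The _32 dataset templates share two steps: computing the total number of elements as the product of the dimensions, and converting element by element through a temporary double-precision buffer. The sketch below isolates both steps for a floating-point dataset. The function names total_size and widen are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Number of elements of a dataset with the given dimensions. */
uint64_t total_size (const uint64_t* const dims, const uint32_t rank)
{
  assert (dims != NULL);
  uint64_t dim_size = 1;
  for (uint32_t i = 0; i < rank; ++i) {
    dim_size *= dims[i];
  }
  return dim_size;
}

/* Returns a newly allocated double-precision copy of `src` (n elements),
   or NULL if the allocation failed. The caller frees the result. */
double* widen (const float* const src, const uint64_t n)
{
  assert (src != NULL);
  double* dst = (double*) calloc(n, sizeof(double));
  if (dst == NULL) return NULL;

  for (uint64_t i = 0; i < n; ++i) {
    dst[i] = (double) src[i];
  }
  return dst;
}
```

With the nucleus_coord example, dims would be {nucleus_num, 3}, so two nuclei give a buffer of 6 elements.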

3.2.2 Fortran templates for front end

The Fortran templates provide access to the C API calls from Fortran. These templates are based on the use of iso_c_binding. Pointers have to be passed by value.

interface
   integer function trexio_write_$group_dset$_64 (trex_file, dset) bind(C)
     use, intrinsic :: iso_c_binding
     integer(8), intent(in), value :: trex_file
     $group_dset_f_dtype_double$, intent(in) :: dset$group_dset_f_dims$
   end function trexio_write_$group_dset$_64
end interface
interface
   integer function trexio_read_$group_dset$_64 (trex_file, dset) bind(C)
     use, intrinsic :: iso_c_binding
     integer(8), intent(in), value :: trex_file
     $group_dset_f_dtype_double$, intent(out) :: dset$group_dset_f_dims$
   end function trexio_read_$group_dset$_64
end interface
interface
   integer function trexio_write_$group_dset$_32 (trex_file, dset) bind(C)
     use, intrinsic :: iso_c_binding
     integer(8), intent(in), value :: trex_file
     $group_dset_f_dtype_single$, intent(in) :: dset$group_dset_f_dims$
   end function trexio_write_$group_dset$_32
end interface
interface
   integer function trexio_read_$group_dset$_32 (trex_file, dset) bind(C)
     use, intrinsic :: iso_c_binding
     integer(8), intent(in), value :: trex_file
     $group_dset_f_dtype_single$, intent(out) :: dset$group_dset_f_dims$
   end function trexio_read_$group_dset$_32
end interface
interface
   integer function trexio_write_$group_dset$ (trex_file, dset) bind(C)
     use, intrinsic :: iso_c_binding
     integer(8), intent(in), value :: trex_file
     $group_dset_f_dtype_default$, intent(in) :: dset$group_dset_f_dims$
   end function trexio_write_$group_dset$
end interface
interface
   integer function trexio_read_$group_dset$ (trex_file, dset) bind(C)
     use, intrinsic :: iso_c_binding
     integer(8), intent(in), value :: trex_file
     $group_dset_f_dtype_default$, intent(out) :: dset$group_dset_f_dims$
   end function trexio_read_$group_dset$
end interface
interface
   integer function trexio_has_$group_dset$ (trex_file) bind(C)
     use, intrinsic :: iso_c_binding
     integer(8), intent(in), value :: trex_file
   end function trexio_has_$group_dset$
end interface

3.3 Sparse data structures

Sparse data structures are typically used for large tensors such as two-electron integrals. For example, in the trex.json file the electron repulsion integrals (eri) are declared as a sparse array:

"ao_2e_int"  : {
  "eri_num"  : [ "int", [  ] ],
  "eri"      : [ "float sparse", [ "ao.num", "ao.num", "ao.num", "ao.num" ] ]
}

The electron repulsion integral \(\langle ij | kl \rangle\) is represented as a quartet of integers \((i,j,k,l)\) and a floating point value.

To store \(N\) integrals in the file, we store

  • An array of quartets of integers
  • An array of values (floats)

Both arrays have the same size \(N\), the number of non-zero integrals. Knowing the maximum dimensions makes it possible to check that the integers are in a valid range, and also lets the library choose the smallest integer representation to compress the storage.
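
The layout described above can be sketched as two parallel arrays. The code below is a minimal illustration, not the library's implementation: sparse_eri_t, index_bytes_for_dim and quartet_in_range are hypothetical names, and the width thresholds assume unsigned 0-based indices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory layout: N quartets and N values, kept in parallel. */
typedef struct sparse_eri_s {
  int64_t  num;     /* N, the number of non-zero integrals             */
  int32_t* index;   /* 4*N integers: i,j,k,l of each integral          */
  double*  value;   /* N values; value[n] belongs to index[4*n..4*n+3] */
} sparse_eri_t;

/* Smallest unsigned width (in bytes) able to hold 0-based indices < dim_max. */
size_t index_bytes_for_dim(const int64_t dim_max)
{
  assert(dim_max > 0);
  if (dim_max <= (int64_t) UINT8_MAX  + 1) return sizeof(uint8_t);
  if (dim_max <= (int64_t) UINT16_MAX + 1) return sizeof(uint16_t);
  if (dim_max <= (int64_t) UINT32_MAX + 1) return sizeof(uint32_t);
  return sizeof(uint64_t);
}

/* Check that the n-th quartet only contains indices in [0, dim_max). */
bool quartet_in_range(const sparse_eri_t* eri, const int64_t n,
                      const int64_t dim_max)
{
  assert(eri != NULL && n >= 0 && n < eri->num);
  for (int m = 0; m < 4; ++m) {
    const int32_t idx = eri->index[4*n + m];
    if (idx < 0 || idx >= dim_max) return false;
  }
  return true;
}
```

With ao.num equal to 200, for instance, every index fits in a single byte, so the quartets would need 4 bytes per integral instead of 16 with 32-bit integers.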

Fortran uses 1-based array indexing, while C uses 0-based indexing. Internally, we use a 0-based representation but the Fortran binding does the appropriate conversion when reading or writing.
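
The conversion amounts to shifting every index by one, in one direction when writing and in the other when reading. A minimal sketch in C, where to_zero_based and to_one_based are hypothetical helpers and not part of the API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift 1-based (Fortran-style) indices to the 0-based stored form, in place. */
void to_zero_based(int32_t* index, const int64_t count)
{
  assert(index != NULL || count == 0);
  for (int64_t n = 0; n < count; ++n) {
    assert(index[n] >= 1);   /* a valid 1-based index is never 0 */
    index[n] -= 1;
  }
}

/* Shift 0-based stored indices back to 1-based indices, in place. */
void to_one_based(int32_t* index, const int64_t count)
{
  assert(index != NULL || count == 0);
  for (int64_t n = 0; n < count; ++n) {
    assert(index[n] >= 0);
    index[n] += 1;
  }
}
```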

As the number of integrals to store can be prohibitively large, the integrals can be read or written in chunks. The functions therefore take two extra parameters:

  • offset : the index of the first integral to read. An offset of zero means that reading starts at the first integral.
  • num : the number of integrals to read.
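
To illustrate how offset and num interact, the sketch below reads values chunk by chunk from an in-memory array standing in for the file. mock_read_chunk_value and sum_by_chunks are hypothetical names. Clamping a chunk that runs past the last integral, and reporting the number of values actually read through num_read, are assumptions about behavior that the text does not specify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy at most `num` values, starting at integral `offset` (0-based), from
   `stored` (holding `nmax` values) into `buffer`.
   Returns 0 on success and -1 on invalid arguments. */
int mock_read_chunk_value(const double* stored, const int64_t nmax,
                          const int64_t offset, const int64_t num,
                          double* buffer, int64_t* num_read)
{
  if (stored == NULL || buffer == NULL || num_read == NULL) return -1;
  if (nmax < 0 || offset < 0 || num < 0 || offset > nmax)   return -1;

  /* Clamp the chunk so that it never runs past the last stored integral. */
  const int64_t count = (num > nmax - offset) ? nmax - offset : num;
  memcpy(buffer, stored + offset, (size_t) count * sizeof(double));
  *num_read = count;
  return 0;
}

/* Sum all values while holding at most `chunk` of them in memory. */
double sum_by_chunks(const double* stored, const int64_t nmax,
                     const int64_t chunk)
{
  assert(chunk > 0 && chunk <= 64);
  double buffer[64];
  double total = 0.0;
  for (int64_t offset = 0; offset < nmax; offset += chunk) {
    int64_t got = 0;
    if (mock_read_chunk_value(stored, nmax, offset, chunk, buffer, &got) != 0)
      break;
    for (int64_t n = 0; n < got; ++n) total += buffer[n];
  }
  return total;
}
```

The caller advances offset by the chunk size on each iteration, so the buffer size is bounded by num and never by the total number of integrals.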

We provide a function to read a chunk of indices, and a function to read a chunk of values, because some users might want to read only the values of the integrals, or only the indices.

Here is an example for the indices:

trexio_exit_code
trexio_read_chunk_ao_2e_int_eri_index_32(trexio_t* const file,
                                         const int64_t offset,
                                         const int64_t num,
                                         int32_t* buffer)
{
  if (file  == NULL) return TREXIO_INVALID_ARG_1;
  if (offset   < 0L) return TREXIO_INVALID_ARG_2;
  if (num      < 0L) return TREXIO_INVALID_ARG_3;

  const uint32_t rank = 4;  // To be set by generator : number of indices

  int64_t nmax;             // Max number of integrals
  trexio_exit_code rc;

  rc = trexio_read_ao_2e_int_eri_num(file, &nmax);
  if (rc != TREXIO_SUCCESS) return rc;

  switch (file->back_end) {

  case TREXIO_TEXT:
    return trexio_text_read_chunk_ao_2e_int_eri_index(file, buffer, offset, num, rank, nmax);
    break;

  case TREXIO_HDF5:
    return trexio_hdf5_read_chunk_ao_2e_int_eri_index(file, buffer, offset, num, rank, nmax);
    break;

  default:
    return TREXIO_FAILURE;  /* Impossible case */
  }
}

And here is the corresponding example for the values:

trexio_exit_code
trexio_read_chunk_ao_2e_int_eri_value_64(trexio_t* const file,
                                         const int64_t offset,
                                         const int64_t num,
                                         double* buffer)
{
  if (file  == NULL) return TREXIO_INVALID_ARG_1;
  if (offset   < 0L) return TREXIO_INVALID_ARG_2;
  if (num      < 0L) return TREXIO_INVALID_ARG_3;

  int64_t nmax;             // Max number of integrals
  trexio_exit_code rc;

  rc = trexio_read_ao_2e_int_eri_num(file, &nmax);
  if (rc != TREXIO_SUCCESS) return rc;

  switch (file->back_end) {

  case TREXIO_TEXT:
    return trexio_text_read_chunk_ao_2e_int_eri_value(file, buffer, offset, num, nmax);
    break;

  case TREXIO_HDF5:
    return trexio_hdf5_read_chunk_ao_2e_int_eri_value(file, buffer, offset, num, nmax);
    break;

  default:
    return TREXIO_FAILURE;  /* Impossible case */
  }
}

4 Fortran helper/wrapper functions

The function below adapts the C function trexio_open to Fortran. This is needed because strings in C are terminated by the null character \0, unlike strings in Fortran. Note that the Fortran interface calls the main TREXIO API, which is written in C.

contains
   integer(8) function trexio_open (filename, mode, backend)
     use, intrinsic :: iso_c_binding
     implicit none
     character(len=*), intent(in)  :: filename
     character, intent(in), value :: mode
     integer(trexio_backend), intent(in), value   :: backend
     character(len=len_trim(filename)+1) :: filename_c

     filename_c = trim(filename) // c_null_char
     trexio_open = trexio_open_c(filename_c, mode, backend)
   end function trexio_open
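
The need for the extra character can be seen from the C side: a Fortran character variable is a fixed-length, blank-padded buffer with no terminator, whereas C string functions stop at the first \0. The following sketch, in which fortran_to_c_string is a hypothetical helper rather than part of TREXIO, performs in C the equivalent of trim(filename) // c_null_char.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a blank-padded buffer of `len` characters (no terminator) into `dst`
   as a NUL-terminated C string, dropping trailing blanks.
   Returns the resulting length, or -1 if `dst` (capacity `cap`) is too small. */
long fortran_to_c_string(const char* src, size_t len, char* dst, size_t cap)
{
  assert(src != NULL && dst != NULL);
  while (len > 0 && src[len - 1] == ' ') --len;   /* same effect as trim() */
  if (len + 1 > cap) return -1;                   /* room for the '\0'     */
  memcpy(dst, src, len);
  dst[len] = '\0';
  return (long) len;
}
```

Without the terminator, a C function such as strlen would keep reading past the end of the Fortran buffer.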

Author: TREX-CoE

Created: 2021-06-04 Fri 16:16
