Parallel HDF5

It is possible to read and write parallel HDF5 files using MPI. For this, the HDF5 binaries loaded by HDF5.jl must have been compiled with parallel support, and linked to the specific MPI implementation that will be used for parallel I/O.

Parallel-enabled HDF5 libraries are usually available on computing clusters, where they are linked to the installed MPI implementations. They are also available via the package managers of a number of Linux distributions.

Finally, note that the MPI.jl package is lazy-loaded by HDF5.jl using Requires. In practice, this means that in Julia code, MPI must be imported before HDF5 for parallel functionality to be available.

Setting up Parallel HDF5

The following step-by-step guide assumes that you already have access to parallel-enabled HDF5 libraries linked to an existing MPI installation.

1. Using system-provided MPI libraries

A system-provided MPI library can be selected with MPIPreferences.jl. After installing MPIPreferences.jl, run julia --project -e 'using MPIPreferences; MPIPreferences.use_system_binary()'. MPIPreferences.jl then identifies any available MPI implementation and stores the information in a LocalPreferences.toml file. See the MPI.jl docs for details.
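The stored selection can be inspected from a fresh Julia session. The following is a minimal sketch, assuming MPIPreferences.jl is installed in the active project; it only reads the preferences and does not change them. The variable name uses_system_mpi is introduced here for illustration.

```julia
using MPIPreferences

# These constants are loaded from LocalPreferences.toml when the package is
# loaded; if no preference has been stored, package defaults are reported.
println("MPI binary: ", MPIPreferences.binary)
println("MPI ABI:    ", MPIPreferences.abi)

# After use_system_binary() and a restart of Julia, the binary preference
# is expected to be "system".
uses_system_mpi = MPIPreferences.binary == "system"
uses_system_mpi || @warn "MPI.jl is not configured to use a system MPI library"
```

Because preferences are read when the package is loaded, changes made by use_system_binary() only become visible after Julia has been restarted.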

2. Using parallel HDF5 libraries

Migration from HDF5.jl v0.16 and earlier

The way to use a system-provided HDF5 library changed in HDF5.jl v0.17. Previously, the library path was set by the environment variable JULIA_HDF5_PATH, which required rebuilding HDF5.jl afterwards. In v0.17 and later, the environment variable has been removed and no longer has an effect (if the same setup must also work with v0.16 and earlier, it is still recommended to also set the environment variable). Instead, proceed as described below.

As detailed in Using custom or system provided HDF5 binaries, set the preferences libhdf5 and libhdf5_hl to the full paths of the parallel HDF5 binaries. This can be done by:

julia> using HDF5

julia> HDF5.API.set_libraries!("/path/to/your/", "/path/to/your/")

3. Loading MPI-enabled HDF5

In Julia code, MPI.jl must be loaded before HDF5.jl for MPI functionality to be available:

using MPI
using HDF5

@assert HDF5.has_parallel()
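When the assertion fails, it helps to know which HDF5 library was actually loaded. The following is a small diagnostic sketch, assuming both MPI.jl and HDF5.jl are installed in the active project; the variable name libver is introduced here for illustration.

```julia
using MPI   # must be loaded before HDF5
using HDF5

# Version of the HDF5 library that HDF5.jl has loaded.
libver = HDF5.API.h5_get_libversion()
println("HDF5 library version: ", libver)

if HDF5.has_parallel()
    println("Parallel HDF5 is available.")
else
    @warn "The loaded HDF5 library has no parallel support; check the libhdf5 and libhdf5_hl preferences."
end
```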

Notes to HPC cluster administrators

More information on setting up an HPC cluster can be found in the docs of MPI.jl. After performing steps 1 and 2, the LocalPreferences.toml file could look something like the following:

[MPIPreferences]
_format = "1.0"
abi = "OpenMPI"
binary = "system"
libmpi = "/software/mpi/lib/"
mpiexec = "/software/mpi/bin/mpiexec"

[HDF5]
libhdf5 = "/path/to/your/"
libhdf5_hl = "/path/to/your/"

Reading and writing data in parallel

A parallel HDF5 file may be opened by passing an MPI.Comm (and optionally an MPI.Info) object to h5open. For instance:

comm = MPI.COMM_WORLD
info = MPI.Info()
ff = h5open(filename, "w", comm, info)

MPI-distributed data is typically written by first creating a dataset describing the global dimensions of the data. The following example writes a 10 × Nproc array distributed over Nproc MPI processes.

Nproc = MPI.Comm_size(comm)
myrank = MPI.Comm_rank(comm)
M = 10
A = fill(myrank, M)  # local data
dims = (M, Nproc)    # dimensions of global data

# Create dataset
dset = create_dataset(ff, "/data", datatype(eltype(A)), dataspace(dims))

# Write local data
dset[:, myrank + 1] = A
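Putting the fragments above together, a complete script might look like the following sketch. It assumes a working parallel HDF5 setup as described earlier; the file name parallel_example.h5 and the variables dset_in and B are hypothetical choices made for this illustration. After writing, each process reopens the file and reads back its own column.

```julia
using MPI   # load MPI before HDF5
using HDF5

# Launch with the system MPI launcher, for example:
#   mpiexec -n 4 julia --project script.jl
# (script.jl is a placeholder name for this file).
MPI.Initialized() || MPI.Init()
comm = MPI.COMM_WORLD
info = MPI.Info()
Nproc = MPI.Comm_size(comm)
myrank = MPI.Comm_rank(comm)

# Hypothetical output path; it must be reachable by all processes.
filename = "parallel_example.h5"

M = 10
A = fill(myrank, M)  # local data
dims = (M, Nproc)    # dimensions of global data

# Open the file and create the dataset (called by all processes).
ff = h5open(filename, "w", comm, info)
dset = create_dataset(ff, "/data", datatype(eltype(A)), dataspace(dims))

# Each process writes its own column.
dset[:, myrank + 1] = A

# Close objects explicitly so that the file is closed by all processes together.
close(dset)
close(ff)

# Reopen the file in parallel and read back the local column.
ff = h5open(filename, "r", comm, info)
dset_in = ff["data"]
B = vec(dset_in[:, myrank + 1])
close(dset_in)
close(ff)

# No explicit MPI.Finalize() is needed: by default MPI.jl finalizes MPI
# when Julia exits.
```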

Note that metadata operations, such as create_dataset, must be called collectively (on all processes at the same time, with the same arguments), but the actual writing to the dataset may be done independently. See Collective Calling Requirements in Parallel HDF5 Applications for the exact requirements.

Sometimes, it may be more efficient to write data in chunks, so that each process writes to a separate chunk of the file. This is especially the case when data is uniformly distributed among MPI processes. In this example, this can be achieved by passing chunk=(M, 1) to create_dataset.

For better performance, it is sometimes preferable to perform collective I/O when reading and writing datasets in parallel. This is achieved by passing dxpl_mpio=:collective to create_dataset. See also the HDF5 docs.
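The two options described above can be combined in a single create_dataset call. The following is a minimal sketch under the same setup assumptions as the rest of this page; the file name collective_example.h5 and the variable col are hypothetical. With collective transfers, every process in the communicator is expected to take part in each read and write of the dataset, which is the case here.

```julia
using MPI   # load MPI before HDF5
using HDF5

MPI.Initialized() || MPI.Init()
comm = MPI.COMM_WORLD
Nproc = MPI.Comm_size(comm)
myrank = MPI.Comm_rank(comm)

M = 10
A = fill(myrank, M)  # local data

# Hypothetical file name, chosen for this example.
ff = h5open("collective_example.h5", "w", comm, MPI.Info())

# One chunk per process, with collective data transfers.
dset = create_dataset(
    ff, "/data", datatype(eltype(A)), dataspace((M, Nproc));
    chunk=(M, 1),
    dxpl_mpio=:collective,
)

# All processes write, then read back, their own column.
dset[:, myrank + 1] = A
col = vec(dset[:, myrank + 1])

close(dset)
close(ff)
```

Whether chunking or collective I/O actually improves performance depends on the file system and the data layout, so it is worth measuring both variants on the target machine.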

A few more examples are available in test/mpio.jl.