Parallel HDF5
It is possible to read and write parallel HDF5 files using MPI. For this, the HDF5 binaries loaded by HDF5.jl must have been compiled with parallel support, and linked to the specific MPI implementation that will be used for parallel I/O.
Parallel-enabled HDF5 libraries are usually included in computing clusters and linked to the available MPI implementations. They are also available via the package manager of a number of Linux distributions.
Finally, note that the MPI.jl package is lazy-loaded by HDF5.jl using Requires. In practice, this means that, in Julia code, MPI must be imported before HDF5 for parallel functionality to be available.
Setting up Parallel HDF5
The following step-by-step guide assumes one already has access to parallel-enabled HDF5 libraries linked to an existing MPI installation.
1. Using system-provided MPI libraries
Using a system-provided MPI library can be done with MPIPreferences.jl. After installing MPIPreferences.jl, run
julia --project -e 'using MPIPreferences; MPIPreferences.use_system_binary()'
which identifies any available MPI implementation and stores the information in a file LocalPreferences.toml. See the MPI.jl docs for details.
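The same can also be done from within a Julia session, optionally pointing MPIPreferences.jl at a specific MPI library. The following is only a sketch; the library path is a placeholder for a system-specific location:
julia> using MPIPreferences

# The library_names keyword is optional; the path below is a placeholder
# for the actual location of the system MPI library.
julia> MPIPreferences.use_system_binary(; library_names = ["/path/to/libmpi.so"])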
2. Using parallel HDF5 libraries
How to use a system-provided HDF5 library has changed in HDF5.jl v0.17. Previously, the library path was set via the environment variable JULIA_HDF5_PATH, which required rebuilding HDF5.jl afterwards. The environment variable has been removed and no longer has an effect (for backward compatibility with older HDF5.jl versions it may still be useful to set it as well). Instead, proceed as described below.
As detailed in Using custom or system provided HDF5 binaries, set the preferences libhdf5 and libhdf5_hl to the full paths of the parallel HDF5 binaries. This can be done by:
julia> using HDF5
julia> HDF5.API.set_libraries!("/path/to/your/libhdf5.so", "/path/to/your/libhdf5_hl.so")
3. Loading MPI-enabled HDF5
In Julia code, MPI.jl must be loaded before HDF5.jl for MPI functionality to be available:
using MPI
using HDF5
@assert HDF5.has_parallel()
Notes to HPC cluster administrators
More information on setting this up on an HPC cluster can be found in the MPI.jl docs. After performing steps 1 and 2, the LocalPreferences.toml file could look something like the following:
[MPIPreferences]
_format = "1.0"
abi = "OpenMPI"
binary = "system"
libmpi = "/software/mpi/lib/libmpi.so"
mpiexec = "/software/mpi/bin/mpiexec"
[HDF5]
libhdf5 = "/path/to/your/libhdf5.so"
libhdf5_hl = "/path/to/your/libhdf5_hl.so"
Reading and writing data in parallel
A parallel HDF5 file may be opened by passing an MPI.Comm (and optionally an MPI.Info) object to h5open; MPI must have been initialized (e.g. via MPI.Init()) beforehand. For instance:
comm = MPI.COMM_WORLD
info = MPI.Info()
ff = h5open(filename, "w", comm, info)
MPI-distributed data is typically written by first creating a dataset describing the global dimensions of the data. The following example writes a 10 × Nproc array distributed over Nproc MPI processes.
Nproc = MPI.Comm_size(comm)
myrank = MPI.Comm_rank(comm)
M = 10
A = fill(myrank, M) # local data
dims = (M, Nproc) # dimensions of global data
# Create dataset
dset = create_dataset(ff, "/data", datatype(eltype(A)), dataspace(dims))
# Write local data
dset[:, myrank + 1] = A
Note that metadata operations, such as create_dataset, must be called collectively (on all processes at the same time, with the same arguments), but the actual writing to the dataset may be done independently. See Collective Calling Requirements in Parallel HDF5 Applications for the exact requirements.
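Putting the above pieces together, a complete program could look something like the following sketch. The file name pdata.h5 and the final read-back check are illustrative additions only; the script is meant to be launched on several processes, e.g. with mpiexec -n 4 julia script.jl:
using MPI
using HDF5

MPI.Init()

comm = MPI.COMM_WORLD
info = MPI.Info()
Nproc = MPI.Comm_size(comm)
myrank = MPI.Comm_rank(comm)

M = 10
A = fill(myrank, M)                 # local data: one column per process
dims = (M, Nproc)                   # dimensions of the global dataset

# Collective: open the file and create the dataset on all processes
ff = h5open("pdata.h5", "w", comm, info)
dset = create_dataset(ff, "/data", datatype(eltype(A)), dataspace(dims))

# Independent: each process writes only its own column
dset[:, myrank + 1] = A
close(ff)

# Read the data back, again opening the file collectively
ff = h5open("pdata.h5", "r", comm, info)
B = read(ff["data"])                # full (M, Nproc) array on every process
@assert B[:, myrank + 1] == A
close(ff)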
Sometimes it may be more efficient to write data in chunks, so that each process writes to a separate chunk of the file. This is especially the case when data is uniformly distributed among MPI processes. In this example, this can be achieved by passing chunk=(M, 1) to create_dataset, as in the sketch below.
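For instance, the create_dataset call in the example above could be replaced by something like the following, which gives each process exactly one chunk to write to:
# One chunk per process column, so each process writes to its own chunk
dset = create_dataset(ff, "/data", datatype(eltype(A)), dataspace(dims);
                      chunk = (M, 1))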
For better performance, it is sometimes preferable to perform collective I/O when reading and writing datasets in parallel. This is achieved by passing dxpl_mpio=:collective to create_dataset; a sketch is shown below. See also the HDF5 docs.
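For example, again modifying the create_dataset call from the example above. The write itself is unchanged, but it is now performed collectively, so all processes must take part in it:
# Request collective MPI-I/O transfers for this dataset
dset = create_dataset(ff, "/data", datatype(eltype(A)), dataspace(dims);
                      dxpl_mpio = :collective)
dset[:, myrank + 1] = A   # performed as a collective write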
A few more examples are available in test/mpio.jl.