Dataset

Many dataset operations are available through the indexing interface, which is aliased to the functional interface. The functional interface is described below.

HDF5.Dataset - Type
HDF5.Dataset

A mutable wrapper for an HDF5 dataset handle (HDF5.API.hid_t).

source
HDF5.create_dataset - Function
create_dataset(
    parent::Union{File, Group}, 
    path::Union{AbstractString, Nothing},
    datatype::Union{Datatype, Type},
    dataspace::Union{Dataspace, Dims, Nothing};
    properties...)

Arguments

  • parent - Parent File or Group.
  • path - String describing the path of the dataset within the HDF5 file, or nothing to create an anonymous dataset.
  • datatype - Datatype or Type of the dataset.
  • dataspace - Dataspace or Dims of the dataset. If nothing, a null (empty) dataset is created.
  • properties - Keyword name-value pairs that set properties of the dataset.

Keywords

There are many keyword properties that can be set. Below are a few select keywords.

  • max_dims - Dims describing the maximum size of the dataset. Required for resizable datasets. Unlimited dimensions are denoted by HDF5.UNLIMITED.
  • chunk - Dims describing the size of a chunk. Needed to apply filters.
  • filters - AbstractVector{<: Filters.Filter} describing the order of the filters to apply to the data. See Filters
  • external - Tuple{AbstractString, Integer, Integer} (filepath, offset, filesize) External dataset file location, data offset, and file size. See API.h5p_set_external.
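
As a rough illustration of the signature and the chunk keyword above, the following sketch creates a chunked dataset, fills it through the indexing interface, and reads it back. The file path and the dataset name "grid" are made up for the example, and the chunk shape is an arbitrary choice.

```julia
using HDF5

# Hypothetical scratch file used only for this sketch.
path = tempname() * ".h5"

h5open(path, "w") do file
    # Allocate a 100x40 Float64 dataset stored in 10x10 chunks.
    dset = create_dataset(file, "grid", Float64, (100, 40); chunk=(10, 10))
    # Fill the dataset through the indexing interface.
    dset[:, :] = reshape(collect(1.0:4000.0), 100, 40)
end

# Reopen the file and read the whole dataset back into memory.
data = h5open(path, "r") do file
    read_dataset(file, "grid")
end
```

Here create_dataset only allocates the dataset; no data is written until the indexed assignment runs.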

Additionally, the initial create, transfer, and access property lists can each be provided as a keyword.

source
Base.copyto! - Function
copyto!(output_buffer::AbstractArray{T}, obj::Union{DatasetOrAttribute}) where T

Copy [part of] an HDF5 dataset or attribute to a preallocated output buffer. The output buffer must be convertible to a pointer and have a contiguous layout.

source
Base.similar - Function
similar(obj::DatasetOrAttribute, [::Type{T}], [dims::Integer...]; normalize = true)

Return an Array{T} or Matrix{UInt8} that can contain [part of] the dataset.

The normalize keyword will normalize the buffer for string and array datatypes.
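
One plausible way to combine similar and copyto! is to allocate a buffer once and read into it, which avoids a fresh allocation on every read. This is only a sketch: the file path and the dataset name "grid" are invented for the example.

```julia
using HDF5

# Hypothetical scratch file used only for this sketch.
path = tempname() * ".h5"

h5open(path, "w") do file
    write_dataset(file, "grid", collect(reshape(1.0:12.0, 3, 4)))
end

buf = h5open(path, "r") do file
    dset = file["grid"]
    # Allocate an uninitialized array with the dataset's element type and size.
    out = similar(dset)
    # Fill the preallocated buffer from the dataset.
    copyto!(out, dset)
    out
end
```

The same buffer can be reused across repeated reads of datasets with identical shape and element type.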

source
HDF5.create_external_dataset - Function
create_external_dataset(parent, name, filepath, dtype, dspace, offset = 0)

Create an external dataset with data in an external file.

  • parent - File or Group
  • name - Name of the Dataset
  • filepath - File path to where the data is stored
  • dtype - Datatype, Type, or value where datatype is applicable
  • offset - Offset, in bytes, from the beginning of the file to the location in the file where the data starts.

See also API.h5p_set_external to link to multiple segments.
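
The sketch below shows one way this might be used: raw Int32 values are written to a plain binary file, and an external dataset in the HDF5 file points at them. The file paths and the dataset name "ext" are hypothetical, and the example assumes that a Dims tuple is accepted for dspace, as it is for create_dataset.

```julia
using HDF5

# Hypothetical scratch files used only for this sketch.
raw_path = tempname() * ".bin"
h5_path = tempname() * ".h5"

# Write twelve Int32 values as raw bytes in native byte order.
write(raw_path, Int32.(1:12))

values = h5open(h5_path, "w") do file
    # Assumption: a Dims tuple is accepted for dspace; the offset defaults to 0.
    dset = HDF5.create_external_dataset(file, "ext", raw_path, Int32, (12,))
    # Reading goes through the HDF5 library but pulls bytes from raw_path.
    read(dset)
end
```

The HDF5 file stores only the dataset metadata, so the external file has to remain available wherever the dataset is read.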

source
HDF5.get_datasets - Function
get_datasets(file::HDF5.File) -> datasets::Vector{HDF5.Dataset}

Get all the datasets in an HDF5 file without loading the data.

source
HDF5.write_dataset - Function
write_dataset(parent::Union{File,Group}, name::Union{AbstractString,Nothing}, data; pv...)

Create and write a dataset with data. Keywords are forwarded to create_dataset. Providing nothing as the name will create an anonymous dataset.

See also create_dataset

source
HDF5.read_dataset - Function
read_dataset(parent::Union{File,Group}, name::AbstractString)

Read the dataset named name from parent. This will typically return an array. The dataset will be opened, read, and closed.

See also HDF5.open_dataset, Base.read
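
A minimal sketch of how write_dataset, get_datasets, and read_dataset might fit together. The file path, the group name "g", and the dataset names are invented for the example.

```julia
using HDF5

# Hypothetical scratch file used only for this sketch.
path = tempname() * ".h5"

h5open(path, "w") do file
    # Create and write in one step; the element type and shape come from the data.
    write_dataset(file, "a", [1, 2, 3])
    g = create_group(file, "g")
    write_dataset(g, "b", [1.0 2.0; 3.0 4.0])
end

names, b = h5open(path, "r") do file
    # Collect dataset handles without loading any data, then record their paths.
    ds = HDF5.get_datasets(file)
    paths = sort([HDF5.name(d) for d in ds])
    # Open, read, and close a single dataset by name.
    paths, read_dataset(file["g"], "b")
end
```

Since get_datasets returns handles rather than arrays, it is a cheap way to survey a file before deciding what to read.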

source

Chunks

HDF5.do_read_chunk - Function
do_read_chunk(dataset::Dataset, offset)

Read a raw chunk at a given offset. offset is a 1-based list of rank ndims(dataset) and must fall on a chunk boundary.

source
do_read_chunk(dataset::Dataset, index::Integer)

Read a raw chunk at a given index. index is 1-based and consecutive up to the number of chunks.

source
HDF5.do_write_chunk - Function
do_write_chunk(dataset::Dataset, offset, chunk_bytes::AbstractArray, filter_mask=0)

Write a raw chunk at a given offset. chunk_bytes is an AbstractArray that can be converted to a pointer, Ptr{Cvoid}. offset is a 1-based list of rank ndims(dataset) and must fall on a chunk boundary.

source
do_write_chunk(dataset::Dataset, index, chunk_bytes::AbstractArray, filter_mask=0)

Write a raw chunk at a given linear index. chunk_bytes is an AbstractArray that can be converted to a pointer, Ptr{Cvoid}. index is 1-based and consecutive up to the number of chunks.

source
HDF5.get_chunk_index - Function
HDF5.get_chunk_index(dataset_id, offset)

Get the 0-based index of a chunk from its 0-based offset, where the offset is given in Julia's column-major order. For a 1-based API, see HDF5.ChunkStorage.

source
HDF5.get_chunk_info_all - Function
HDF5.get_chunk_info_all(dataset, [dxpl])

Obtain information on all the chunks in a dataset. Returns a Vector{ChunkInfo{N}}. The fields of ChunkInfo{N} are

  • offset - NTuple{N, Int} indicating the offset of the chunk in terms of elements, reversed to F-order
  • filter_mask - Cuint, 32-bit flags indicating whether filters have been applied to the chunk
  • addr - haddr_t, byte-offset of the chunk in the file
  • size - hsize_t, size of the chunk in bytes
source
HDF5.get_chunk_length - Function
HDF5.get_chunk_length(dataset_id)

Retrieves the chunk size in bytes. Equivalent to API.h5d_get_chunk_info(dataset_id, index)[:size].

source
HDF5.get_chunk_offset - Function
HDF5.get_chunk_offset(dataset_id, index)

Get the 0-based offset of a chunk from its 0-based index. The offsets are returned in Julia's column-major order rather than HDF5's row-major order. For a 1-based API, see HDF5.ChunkStorage.

source
HDF5.get_num_chunks - Function
HDF5.get_num_chunks(dataset_id)

Returns the number of chunks in a dataset. Equivalent to API.h5d_get_num_chunks(dataset_id, HDF5.H5S_ALL).

source
HDF5.read_chunk - Function
HDF5.read_chunk(dataset_id, offset, [buf]; dxpl_id = HDF5.API.H5P_DEFAULT, filters = Ref{UInt32}())

Helper method to read chunks via 0-based offsets in a Tuple.

Argument buf is optional and defaults to a Vector{UInt8} of length determined by HDF5.get_chunk_length. Argument dxpl_id can be supplied as a keyword and defaults to HDF5.API.H5P_DEFAULT. The filter mask can be retrieved by supplying a Ref{UInt32} via the filters keyword argument.

This method returns Vector{UInt8}.

source
HDF5.read_chunk(dataset_id, index::Integer, [buf]; dxpl_id = HDF5.API.H5P_DEFAULT, filters = Ref{UInt32}())

Helper method to read chunks via 0-based integer index.

Argument buf is optional and defaults to a Vector{UInt8} of length determined by HDF5.API.h5d_get_chunk_info. Argument dxpl_id can be supplied as a keyword and defaults to HDF5.API.H5P_DEFAULT. The filter mask can be retrieved by supplying a Ref{UInt32} via the filters keyword argument.

This method returns Vector{UInt8}.

source
HDF5.write_chunk - Function
HDF5.write_chunk(dataset_id, offset, buf::AbstractArray; dxpl_id = HDF5.API.H5P_DEFAULT, filter_mask = 0)

Helper method to write chunks via 0-based offsets, with offset given as a Tuple.

source
HDF5.write_chunk(dataset_id, index::Integer, buf::AbstractArray; dxpl_id = API.H5P_DEFAULT, filter_mask = 0)

Helper method to write chunks via 0-based integer index.
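
The following sketch reads one raw chunk back through the 0-based helper. It uses a one-dimensional, unfiltered dataset so that the raw bytes are simply the stored elements and no dimension ordering is involved. The file path and the dataset name "v" are hypothetical, and the example assumes an HDF5 library recent enough to support the chunk query functions.

```julia
using HDF5

# Hypothetical scratch file used only for this sketch.
path = tempname() * ".h5"

h5open(path, "w") do file
    # Eight Int32 values stored as two chunks of four elements each.
    dset = create_dataset(file, "v", Int32, (8,); chunk=(4,))
    dset[:] = Int32.(1:8)
end

nchunks, vals = h5open(path, "r") do file
    dset = file["v"]
    n = HDF5.get_num_chunks(dset)
    # Read the raw bytes of the chunk starting at 0-based element offset 4.
    bytes = HDF5.read_chunk(dset, (4,))
    # No filters were applied, so the bytes are the native Int32 values.
    n, collect(reinterpret(Int32, bytes))
end
```

With filters such as compression enabled, the returned bytes would be the filtered chunk and would need decoding before reinterpretation.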

source

Private Implementation

These functions select private implementations of the public high-level API. They should be used for diagnostic purposes only.