Dataset

Many dataset operations are available through the indexing interface, which is aliased to the functional interface. The functional interface is described below.
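
As a minimal sketch of that aliasing, the indexing form and the functional form below should create and open equivalent datasets. It assumes HDF5.jl is installed; h5open and tempname() are standard calls not covered in this section, and the file and dataset names are placeholders.

```julia
using HDF5

fn = tempname() * ".h5"

h5open(fn, "w") do f
    f["a"] = collect(1:5)                     # indexing interface
    HDF5.write_dataset(f, "b", collect(1:5))  # functional interface
end

equivalent = h5open(fn, "r") do f
    da = f["a"]                      # indexing returns an HDF5.Dataset
    db = HDF5.open_dataset(f, "b")   # functional equivalent
    ok = da isa HDF5.Dataset && read(da) == read(db)
    close(da)
    close(db)
    ok
end
```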

HDF5.Dataset — Type

HDF5.Dataset

A mutable wrapper for an HDF5 dataset identifier, HDF5.API.hid_t.

HDF5.create_dataset — Function

create_dataset(parent, path, datatype, dataspace; properties...)

Arguments

parent - File or Group
path - String describing the path of the dataset within the HDF5 file, or nothing to create an anonymous dataset
datatype - Datatype or Type of the dataset
dataspace - Dataspace or Dims of the dataset
properties - keyword name-value pairs that set properties of the dataset

Keywords

Many keyword properties can be set. A few selected keywords are listed below.

chunk - Dims describing the size of a chunk. Required for applying filters.
filters - AbstractVector{<: Filters.Filter} describing the order in which filters are applied to the data. See Filters.
external - Tuple{AbstractString, Integer, Integer} (filepath, offset, filesize) giving the external file location, the data offset, and the file size. See API.h5p_set_external.

Additionally, the initial create, transfer, and access properties can be provided as keywords:

dcpl - DatasetCreateProperties
dxpl - DatasetTransferProperties
dapl - DatasetAccessProperties
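
The following sketch creates a chunked, compressed dataset and reads it back. It assumes the built-in shuffle and deflate filters are available as HDF5.Filters.Shuffle() and HDF5.Filters.Deflate(); datatype, dataspace, write and read are standard HDF5.jl calls, and the file and dataset names are placeholders.

```julia
using HDF5

fn = tempname() * ".h5"
data = rand(100, 50)

h5open(fn, "w") do f
    # chunk must be set for filters to apply; filters run in the order listed.
    dset = create_dataset(f, "compressed", datatype(Float64), dataspace(size(data));
                          chunk = (20, 10),
                          filters = [HDF5.Filters.Shuffle(), HDF5.Filters.Deflate()])
    write(dset, data)
    close(dset)
end

roundtrip = h5open(fn, "r") do f
    read(f["compressed"])
end
```

Shuffle is listed before Deflate because byte shuffling usually makes the data more compressible.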

Base.copyto! — Function

copyto!(output_buffer::AbstractArray{T}, obj::Union{DatasetOrAttribute}) where T

Copy [part of] an HDF5 dataset or attribute to a preallocated output buffer. The output buffer must be convertible to a pointer and have a contiguous layout.
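
A sketch of reusing one preallocated buffer across several equally sized datasets. The dataset names frame1 to frame3 and the helper sum_frames are made up for illustration. A plain Matrix{Float64} is contiguous, so it meets the pointer requirement.

```julia
using HDF5

fn = tempname() * ".h5"
h5open(fn, "w") do f
    for i in 1:3
        f["frame$i"] = fill(Float64(i), 4, 4)
    end
end

# Hypothetical helper: sums all frames while allocating the buffer only once.
function sum_frames(fn)
    h5open(fn, "r") do f
        buf = Matrix{Float64}(undef, 4, 4)
        total = 0.0
        for i in 1:3
            dset = f["frame$i"]
            copyto!(buf, dset)   # read into the existing buffer
            total += sum(buf)
            close(dset)
        end
        total
    end
end

total = sum_frames(fn)
```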

Base.similar — Function

similar(obj::DatasetOrAttribute, [::Type{T}], [dims::Integer...]; normalize = true)

Return an Array{T} or Matrix{UInt8} that can hold [part of] the dataset.

The normalize keyword will normalize the buffer for string and array datatypes.
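
For a plain numeric dataset, similar should return an uninitialized Array with the dataset's element type and size, which pairs naturally with copyto!. A minimal sketch, with a placeholder file and dataset name:

```julia
using HDF5

fn = tempname() * ".h5"
h5open(fn, "w") do f
    f["m"] = Int32[1 2 3; 4 5 6]
end

buf = h5open(fn, "r") do f
    dset = f["m"]
    b = similar(dset)   # uninitialized buffer shaped like the dataset
    copyto!(b, dset)    # fill it from the file
    close(dset)
    b
end
```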

HDF5.create_external_dataset — Function

create_external_dataset(parent, name, filepath, dtype, dspace, offset = 0)

Create an external dataset with data in an external file.

parent - File or Group
name - Name of the Dataset
filepath - Path of the file where the data is stored
dtype - Datatype, Type, or value where datatype is applicable
dspace - Dataspace or Dims of the dataset
offset - Offset, in bytes, from the beginning of the file to the location in the file where the data starts.

See also API.h5p_set_external to link to multiple segments.
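
The sketch below stores four Float64 values in a separate raw file behind 16 bytes of padding, then exposes them as an external dataset with an offset of 16. It assumes the raw file is written in the host's native byte order, which matches datatype(Float64). Both paths come from tempname() and are placeholders.

```julia
using HDF5

raw_values = [1.0, 2.0, 3.0, 4.0]
rawfile = tempname() * ".bin"
open(rawfile, "w") do io
    write(io, zeros(UInt8, 16))   # unrelated header bytes to skip
    write(io, raw_values)         # 4 Float64 values in native byte order
end

fn = tempname() * ".h5"
h5open(fn, "w") do f
    # The data stays in rawfile; offset 16 skips the padding.
    dset = HDF5.create_external_dataset(f, "ext", rawfile,
                                        datatype(Float64), dataspace((4,)), 16)
    close(dset)
end

ext = h5open(fn, "r") do f
    read(f["ext"])
end
```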

HDF5.get_datasets — Function

get_datasets(file::HDF5.File) -> datasets::Vector{HDF5.Dataset}

Get all the datasets in an HDF5 file without loading their data.
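
A sketch listing dataset paths in a small file with one nested group. create_group and HDF5.name are HDF5.jl calls not covered in this section, and the names a, g and b are placeholders.

```julia
using HDF5

fn = tempname() * ".h5"
h5open(fn, "w") do f
    f["a"] = [1, 2, 3]
    g = create_group(f, "g")
    g["b"] = [4.0, 5.0]
    close(g)
end

dset_names = h5open(fn, "r") do f
    dsets = HDF5.get_datasets(f)   # open Dataset handles; no data is read
    n = sort([HDF5.name(d) for d in dsets])
    foreach(close, dsets)
    n
end
```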

HDF5.open_dataset — Function

open_dataset(parent::Union{File, Group}, path::AbstractString; properties...)

Open an existing HDF5.Dataset at path under parent.

Optional keyword arguments include any keywords that belong to DatasetAccessProperties or DatasetTransferProperties.
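
A sketch of opening a dataset and reading only a slice of it through the indexing interface. The file and dataset name are placeholders.

```julia
using HDF5

fn = tempname() * ".h5"
h5open(fn, "w") do f
    f["x"] = collect(10:10:100)
end

part = h5open(fn, "r") do f
    dset = HDF5.open_dataset(f, "x")   # opening does not read the data
    p = dset[3:5]                      # reads only elements 3 through 5
    close(dset)
    p
end
```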

HDF5.write_dataset — Function

write_dataset(parent::Union{File,Group}, name::Union{AbstractString,Nothing}, data; pv...)

Create a dataset and write data to it. Keywords are forwarded to create_dataset. Passing nothing as the name creates an anonymous dataset.
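
A minimal sketch in which the chunk keyword is forwarded to create_dataset. The chunk count from HDF5.get_num_chunks indicates that the property was applied. The file and dataset names are placeholders.

```julia
using HDF5

fn = tempname() * ".h5"
h5open(fn, "w") do f
    # chunk is a dataset creation property forwarded to create_dataset.
    HDF5.write_dataset(f, "data", collect(1:100); chunk = (10,))
end

nchunks, back = h5open(fn, "r") do f
    dset = f["data"]
    n = HDF5.get_num_chunks(dset)   # 100 elements / 10 per chunk
    r = read(dset)
    close(dset)
    (n, r)
end
```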

Chunks

HDF5.do_read_chunk — Function

do_read_chunk(dataset::Dataset, offset)

Read a raw chunk at a given offset. offset is a 1-based collection of length ndims(dataset) and must fall on a chunk boundary.

do_read_chunk(dataset::Dataset, index::Integer)

Read a raw chunk at a given index. index is 1-based and runs consecutively from 1 to the number of chunks.
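
A sketch that reads the same raw chunk both by offset and by index. It assumes do_read_chunk returns a (filter_mask, bytes) tuple, which should be checked against your HDF5.jl version. It also assumes a little-endian host, so the unfiltered bytes reinterpret directly as Int32.

```julia
using HDF5

fn = tempname() * ".h5"
h5open(fn, "w") do f
    # 12 Int32 values in chunks of 4: chunks start at 1-based offsets 1, 5 and 9.
    HDF5.write_dataset(f, "x", Int32.(1:12); chunk = (4,))
end

by_offset, by_index = h5open(fn, "r") do f
    dset = f["x"]
    # Assumption: the return value is (filter_mask, raw_bytes).
    _, bytes_a = HDF5.do_read_chunk(dset, (5,))   # 1-based offset of the 2nd chunk
    _, bytes_b = HDF5.do_read_chunk(dset, 2)      # 1-based index of the same chunk
    close(dset)
    # Little-endian host assumed: the raw bytes map directly onto Int32.
    (reinterpret(Int32, bytes_a), reinterpret(Int32, bytes_b))
end
```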

HDF5.do_write_chunk — Function

do_write_chunk(dataset::Dataset, offset, chunk_bytes::AbstractArray, filter_mask=0)

Write a raw chunk at a given offset. chunk_bytes is an AbstractArray that can be converted to a pointer, Ptr{Cvoid}. offset is a 1-based collection of length ndims(dataset) and must fall on a chunk boundary.

do_write_chunk(dataset::Dataset, index, chunk_bytes::AbstractArray, filter_mask=0)

Write a raw chunk at a given linear index. chunk_bytes is an AbstractArray that can be converted to a pointer, Ptr{Cvoid}. index is 1-based and runs consecutively from 1 to the number of chunks.
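
A sketch that fills an unfiltered chunked dataset by writing raw chunks at 1-based offsets, then reads the dataset back in the normal way. It assumes a little-endian host, so reinterpreting Int32 values as UInt8 produces the bytes HDF5 stores for an unfiltered chunk. The filter_mask argument is left at its default of 0.

```julia
using HDF5

fn = tempname() * ".h5"
h5open(fn, "w") do f
    dset = create_dataset(f, "x", datatype(Int32), dataspace((8,)); chunk = (4,))
    # No filters, so the raw chunk bytes are just the Int32 values (little-endian assumed).
    HDF5.do_write_chunk(dset, (1,), collect(reinterpret(UInt8, Int32[1, 2, 3, 4])))
    HDF5.do_write_chunk(dset, (5,), collect(reinterpret(UInt8, Int32[5, 6, 7, 8])))
    close(dset)
end

written = h5open(fn, "r") do f
    read(f["x"])
end
```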

HDF5.get_chunk_index — Function

HDF5.get_chunk_index(dataset_id, offset)

Get the 0-based index of the chunk at a 0-based offset, using Julia's column-major order. For a 1-based API, see HDF5.ChunkStorage.
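
A sketch that uses a 4 by 9 dataset split into 2 by 3 chunks, giving a 2 by 3 grid of chunks. With column-major numbering, stepping one chunk along the first dimension adds 1 to the index, and stepping one chunk along the second dimension adds 2 (the number of chunk rows). The expected indices in the comments follow from that arithmetic. The file and dataset names are placeholders.

```julia
using HDF5

fn = tempname() * ".h5"
idx = h5open(fn, "w") do f
    dset = create_dataset(f, "m", datatype(Float64), dataspace((4, 9)); chunk = (2, 3))
    write(dset, zeros(4, 9))
    # 0-based element offset (Julia order) -> 0-based chunk index.
    i = (HDF5.get_chunk_index(dset, (0, 0)),   # chunk (0, 0) -> 0
         HDF5.get_chunk_index(dset, (2, 0)),   # chunk (1, 0) -> 1
         HDF5.get_chunk_index(dset, (0, 3)),   # chunk (0, 1) -> 2
         HDF5.get_chunk_index(dset, (2, 6)))   # chunk (1, 2) -> 1 + 2 * 2 = 5
    close(dset)
    i
end
```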

HDF5.get_chunk_info_all — Function

HDF5.get_chunk_info_all(dataset, [dxpl])

Obtain information on all the chunks in a dataset. Returns a Vector{ChunkInfo{N}}. The fields of ChunkInfo{N} are

offset - NTuple{N, Int} indicating the offset of the chunk in terms of elements, reversed to F-order (Julia's column-major order)
filter_mask - Cuint, 32-bit flags indicating whether filters have been applied to the chunk
addr - haddr_t, byte-offset of the chunk in the file
size - hsize_t, size of the chunk in bytes
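
A sketch that lists the chunks of a small unfiltered Int32 dataset split into three chunks of four elements each. The results are sorted so the sketch does not depend on iteration order. The offsets are expected to be 0-based element offsets, and the addr field is omitted because it depends on the file layout.

```julia
using HDF5

fn = tempname() * ".h5"
h5open(fn, "w") do f
    HDF5.write_dataset(f, "x", Int32.(1:12); chunk = (4,))   # three chunks of 16 bytes
end

chunk_summary = h5open(fn, "r") do f
    dset = f["x"]
    chunks = HDF5.get_chunk_info_all(dset)
    close(dset)
    # Keep offset, filter mask and stored size for each chunk.
    sort([(c.offset, c.filter_mask, Int(c.size)) for c in chunks])
end
```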

HDF5.get_chunk_length — Function

HDF5.get_chunk_length(dataset_id)

Retrieves the chunk size in bytes. Equivalent to API.h5d_get_chunk_info(dataset_id, index)[:size].
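
For an unfiltered dataset, the chunk length should equal the number of elements per chunk times the element size. In this sketch, 5 by 2 chunks of Float64 give 5 * 2 * 8 = 80 bytes. The file and dataset names are placeholders.

```julia
using HDF5

fn = tempname() * ".h5"
nbytes = h5open(fn, "w") do f
    dset = create_dataset(f, "x", datatype(Float64), dataspace((10, 10)); chunk = (5, 2))
    write(dset, zeros(10, 10))
    n = HDF5.get_chunk_length(dset)   # 5 * 2 elements * 8 bytes per Float64
    close(dset)
    n
end
```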

HDF5.get_chunk_offset — Function

HDF5.get_chunk_offset(dataset_id, index)

Get the 0-based offset of a chunk from its 0-based index. The offsets are returned in Julia's column-major order rather than HDF5's row-major order. For a 1-based API, see HDF5.ChunkStorage.
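
A sketch of the inverse relation between get_chunk_offset and get_chunk_index, using a 4 by 9 dataset with 2 by 3 chunks. The expected offsets enumerate the 2 by 3 chunk grid in column-major order, scaled by the chunk size.

```julia
using HDF5

fn = tempname() * ".h5"
offsets, roundtrip_ok = h5open(fn, "w") do f
    dset = create_dataset(f, "m", datatype(Float64), dataspace((4, 9)); chunk = (2, 3))
    write(dset, zeros(4, 9))
    # 0-based chunk index -> 0-based element offset, in Julia order.
    o = [Int.(HDF5.get_chunk_offset(dset, i)) for i in 0:5]
    # get_chunk_index should map each offset back to its index.
    ok = all(HDF5.get_chunk_index(dset, o[i + 1]) == i for i in 0:5)
    close(dset)
    (o, ok)
end
```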

HDF5.get_num_chunks — Function

HDF5.get_num_chunks(dataset_id)

Returns the number of chunks in a dataset. Equivalent to API.h5d_get_num_chunks(dataset_id, HDF5.H5S_ALL).
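
A sketch that counts the chunks of a fully written 4 by 9 dataset with 2 by 3 chunks. Because every element is written, all 2 * 3 = 6 chunks should be present. The file and dataset names are placeholders.

```julia
using HDF5

fn = tempname() * ".h5"
count = h5open(fn, "w") do f
    dset = create_dataset(f, "m", datatype(Float64), dataspace((4, 9)); chunk = (2, 3))
    write(dset, ones(4, 9))   # write every element, touching every chunk
    n = HDF5.get_num_chunks(dset)
    close(dset)
    n
end
```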

HDF5.get_num_chunks_per_dim — Function

HDF5.get_num_chunks_per_dim(dataset_id)

Get the number of chunks in each dimension in Julia's column-major order.
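
A sketch showing the per-dimension chunk counts for a 4 by 9 dataset with 2 by 3 chunks. In Julia order these should be 4 / 2 = 2 and 9 / 3 = 3. The file and dataset names are placeholders.

```julia
using HDF5

fn = tempname() * ".h5"
per_dim = h5open(fn, "w") do f
    dset = create_dataset(f, "m", datatype(Float64), dataspace((4, 9)); chunk = (2, 3))
    write(dset, zeros(4, 9))
    p = HDF5.get_num_chunks_per_dim(dset)   # expected (2, 3), first dimension first
    close(dset)
    p
end
```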

HDF5.read_chunk — Function

HDF5.read_chunk(dataset_id, offset, [buf]; dxpl_id = HDF5.API.H5P_DEFAULT, filters = Ref{UInt32}())

Helper method to read chunks via 0-based offsets in a Tuple.

Argument buf is optional and defaults to a Vector{UInt8} of length determined by HDF5.get_chunk_length. Argument dxpl_id can be supplied as a keyword and defaults to HDF5.API.H5P_DEFAULT. The filter mask can be retrieved by passing a Ref{UInt32} as the filters keyword argument.

This method returns Vector{UInt8}.

HDF5.read_chunk(dataset_id, index::Integer, [buf]; dxpl_id = HDF5.API.H5P_DEFAULT, filters = Ref{UInt32}())

Helper method to read chunks via 0-based integer index.

Argument buf is optional and defaults to a Vector{UInt8} of length determined by HDF5.API.h5d_get_chunk_info. Argument dxpl_id can be supplied as a keyword and defaults to HDF5.API.H5P_DEFAULT. The filter mask can be retrieved by passing a Ref{UInt32} as the filters keyword argument.

This method returns Vector{UInt8}.
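
A sketch that uses the index form to read the second chunk (0-based index 1) of an unfiltered Int32 dataset and captures the filter mask through the filters keyword. A little-endian host is assumed for the reinterpret step, and the file and dataset names are placeholders.

```julia
using HDF5

fn = tempname() * ".h5"
h5open(fn, "w") do f
    HDF5.write_dataset(f, "x", Int32.(1:12); chunk = (4,))
end

chunk_bytes, mask = h5open(fn, "r") do f
    dset = f["x"]
    filters = Ref{UInt32}()
    # 0-based index 1 is the second chunk; the buffer is allocated automatically.
    b = HDF5.read_chunk(dset, 1; filters = filters)
    close(dset)
    (b, filters[])
end
```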

HDF5.write_chunk — Function

HDF5.write_chunk(dataset_id, offset, buf::AbstractArray; dxpl_id = HDF5.API.H5P_DEFAULT, filter_mask = 0)

Helper method to write chunks via 0-based offsets, with offset given as a Tuple.

HDF5.write_chunk(dataset_id, index::Integer, buf::AbstractArray; dxpl_id = API.H5P_DEFAULT, filter_mask = 0)

Helper method to write chunks via 0-based integer index.
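
A sketch that overwrites the second chunk (0-based index 1) of an existing unfiltered dataset with raw bytes and then reads the whole dataset back. It assumes a little-endian host, and the file and dataset names are placeholders.

```julia
using HDF5

fn = tempname() * ".h5"
h5open(fn, "w") do f
    HDF5.write_dataset(f, "x", Int32.(1:12); chunk = (4,))
end

h5open(fn, "r+") do f
    dset = f["x"]
    # Raw bytes for the replacement chunk (little-endian host, no filters).
    new_bytes = collect(reinterpret(UInt8, Int32[50, 60, 70, 80]))
    HDF5.write_chunk(dset, 1, new_bytes)   # 0-based index 1: the second chunk
    close(dset)
end

updated = h5open(fn, "r") do f
    read(f["x"])
end
```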

Private Implementation

These functions select private implementations of the public high-level API. They should be used for diagnostic purposes only.

HDF5._get_chunk_info_all_by_index — Function

_get_chunk_info_all_by_index(dataset, [dxpl])

Implementation of get_chunk_info_all via HDF5.API.h5d_get_chunk_info.

We expect this to be slower, O(N^2), than using h5d_chunk_iter, since each call to h5d_get_chunk_info iterates through the B-tree structure.

HDF5._get_chunk_info_all_by_iter — Function

_get_chunk_info_all_by_iter(dataset, [dxpl])

Implementation of get_chunk_info_all via HDF5.API.h5d_chunk_iter.

We expect this to be faster, O(N), than using h5d_get_chunk_info, since it iterates through the chunks only once.
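
For diagnostics, the following sketch compares the two private implementations on the same dataset. It assumes a libhdf5 recent enough to provide H5Dchunk_iter. Each ChunkInfo is reduced to a plain tuple of its fields and sorted by offset, so the comparison does not depend on iteration order. The helper as_tuples is hypothetical.

```julia
using HDF5

fn = tempname() * ".h5"
h5open(fn, "w") do f
    HDF5.write_dataset(f, "x", Int32.(1:12); chunk = (4,))
end

same_info = h5open(fn, "r") do f
    dset = f["x"]
    # Hypothetical helper: ChunkInfo -> (offset, filter_mask, addr, size), sorted.
    as_tuples(v) = sort([(c.offset, c.filter_mask, c.addr, c.size) for c in v])
    a = as_tuples(HDF5._get_chunk_info_all_by_index(dset))   # O(N^2) path
    b = as_tuples(HDF5._get_chunk_info_all_by_iter(dset))    # O(N) path
    close(dset)
    a == b && length(a) == 3
end
```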
