Dataset
Many dataset operations are available through the indexing interface, which is aliased to the functional interface. The functional interface is described below.
HDF5.Dataset — Type
HDF5.Dataset
A mutable wrapper for an HDF5 Dataset HDF5.API.hid_t.
HDF5.create_dataset — Function
create_dataset(parent, path, datatype, dataspace; properties...)
Arguments
- parent - File or Group
- path - String describing the path of the dataset within the HDF5 file, or nothing to create an anonymous dataset
- datatype - Datatype or Type of the dataset
- dataspace - Dataspace or Dims of the dataset
- properties - keyword name-value pairs that set properties of the dataset
Keywords
There are many keyword properties that can be set. Below are a few select keywords.
- chunk - Dims describing the size of a chunk. Needed to apply filters.
- filters - AbstractVector{<: Filters.Filter} describing the order of the filters to apply to the data. See Filters.
- external - Tuple{AbstractString, Integer, Integer} (filepath, offset, filesize): external dataset file location, data offset, and file size. See API.h5p_set_external.
Additionally, the initial create, transfer, and access properties can be provided as a keyword:
- dcpl - DatasetCreateProperties
- dxpl - DatasetTransferProperties
- dapl - DatasetAccessProperties
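As a minimal sketch of the call described above, assuming a recent HDF5.jl release that accepts a Julia type and a Dims tuple directly, the following creates a chunked dataset and fills it through the indexing interface. The scratch file path and the dataset name "A" are illustrative assumptions.

```julia
using HDF5

# Hypothetical scratch file; any writable path works.
path = tempname() * ".h5"

dims = h5open(path, "w") do file
    # 100x100 Float64 dataset stored as 10x10 chunks.
    dset = create_dataset(file, "A", Float64, (100, 100); chunk=(10, 10))
    # Write the whole dataset through the indexing interface.
    dset[:, :] = ones(100, 100)
    # The do-block returns the dataset size to the caller.
    size(dset)
end
```

Setting chunk is what makes filters such as compression applicable, so it is usually the first keyword to decide on.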
Base.copyto! — Function
copyto!(output_buffer::AbstractArray{T}, obj::Union{DatasetOrAttribute}) where T
Copy [part of] an HDF5 dataset or attribute to a preallocated output buffer. The output buffer must be convertible to a pointer and have a contiguous layout.
Base.similar — Function
similar(obj::DatasetOrAttribute, [::Type{T}], [dims::Integer...]; normalize = true)
Return an Array{T} or Matrix{UInt8} that can contain [part of] the dataset.
The normalize keyword will normalize the buffer for string and array datatypes.
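The two methods above are meant to be used together: similar allocates a buffer shaped like the dataset, and copyto! fills it. The following is a sketch under the assumption that the installed HDF5.jl version provides both methods for Dataset; the file path and dataset name are illustrative.

```julia
using HDF5

path = tempname() * ".h5"                 # hypothetical scratch file
data = reshape(collect(1.0:12.0), 3, 4)   # 3x4 Float64 matrix

buf = h5open(path, "w") do file
    write_dataset(file, "A", data)
    dset = file["A"]      # open the dataset through the indexing interface
    out = similar(dset)   # uninitialized buffer matching the dataset
    copyto!(out, dset)    # fill the existing buffer in place
    out
end
```

Reusing one buffer across many reads avoids allocating a fresh array for each read.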
HDF5.create_external_dataset — Function
create_external_dataset(parent, name, filepath, dtype, dspace, offset = 0)
Create an external dataset with data in an external file.
- parent - File or Group
- name - Name of the Dataset
- filepath - File path to where the data is stored
- dtype - Datatype, Type, or value where datatype is applicable
- offset - Offset, in bytes, from the beginning of the file to the location in the file where the data starts.
See also API.h5p_set_external to link to multiple segments.
HDF5.get_datasets — Function
get_datasets(file::HDF5.File) -> datasets::Vector{HDF5.Dataset}
Get all the datasets in an HDF5 file without loading the data.
HDF5.open_dataset — Function
open_dataset(parent::Union{File, Group}, path::AbstractString; properties...)
Open an existing HDF5.Dataset at path under parent.
Optional keyword arguments include any keywords that belong to DatasetAccessProperties or DatasetTransferProperties.
HDF5.write_dataset — Function
write_dataset(parent::Union{File,Group}, name::Union{AbstractString,Nothing}, data; pv...)
Create and write a dataset with data. Keywords are forwarded to create_dataset. Providing nothing as the name will create an anonymous dataset.
See also create_dataset
HDF5.read_dataset — Function
read_dataset(parent::Union{File,Group}, name::AbstractString)
Read the dataset named name from parent. This will typically return an array. The dataset will be opened, read, and closed.
See also HDF5.open_dataset, Base.read
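A short round trip may help tie write_dataset, read_dataset, and open_dataset together. This is a sketch only; the scratch path and the dataset name "numbers" are assumptions made for illustration.

```julia
using HDF5

path = tempname() * ".h5"   # hypothetical scratch file
data = collect(1:10)

# Create the file and write the dataset in one call.
h5open(path, "w") do file
    write_dataset(file, "numbers", data)
end

# read_dataset opens, reads, and closes the dataset.
roundtrip = h5open(path, "r") do file
    read_dataset(file, "numbers")
end

# open_dataset returns a handle that must be closed by the caller.
via_open = h5open(path, "r") do file
    dset = open_dataset(file, "numbers")
    try
        read(dset)
    finally
        close(dset)
    end
end
```

read_dataset is the convenient choice for a one-off read, while open_dataset suits repeated or partial access to the same dataset.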
Chunks
HDF5.do_read_chunk — Function
do_read_chunk(dataset::Dataset, offset)
Read a raw chunk at a given offset. offset is a 1-based list of rank ndims(dataset) and must fall on a chunk boundary.
do_read_chunk(dataset::Dataset, index::Integer)
Read a raw chunk at a given index. index is 1-based and consecutive up to the number of chunks.
HDF5.do_write_chunk — Function
do_write_chunk(dataset::Dataset, offset, chunk_bytes::AbstractArray, filter_mask=0)
Write a raw chunk at a given offset. chunk_bytes is an AbstractArray that can be converted to a pointer, Ptr{Cvoid}. offset is a 1-based list of rank ndims(dataset) and must fall on a chunk boundary.
do_write_chunk(dataset::Dataset, index, chunk_bytes::AbstractArray, filter_mask=0)
Write a raw chunk at a given linear index. chunk_bytes is an AbstractArray that can be converted to a pointer, Ptr{Cvoid}. index is 1-based and consecutive up to the number of chunks.
HDF5.get_chunk_index — Function
HDF5.get_chunk_index(dataset_id, offset)
Get 0-based index of chunk from 0-based offset returned in Julia's column-major order. For a 1-based API, see HDF5.ChunkStorage.
HDF5.get_chunk_info_all — Function
HDF5.get_chunk_info_all(dataset, [dxpl])
Obtain information on all the chunks in a dataset. Returns a Vector{ChunkInfo{N}}. The fields of ChunkInfo{N} are
- offset - NTuple{N, Int} indicating the offset of the chunk in terms of elements, reversed to F-order
- filter_mask - Cuint, 32-bit flags indicating whether filters have been applied to the chunk
- addr - haddr_t, byte-offset of the chunk in the file
- size - hsize_t, size of the chunk in bytes
HDF5.get_chunk_length — Function
HDF5.get_chunk_length(dataset_id)
Retrieves the chunk size in bytes. Equivalent to API.h5d_get_chunk_info(dataset_id, index)[:size].
HDF5.get_chunk_offset — Function
HDF5.get_chunk_offset(dataset_id, index)
Get 0-based offset of chunk from 0-based index. The offsets are returned in Julia's column-major order rather than HDF5's row-major order. For a 1-based API, see HDF5.ChunkStorage.
HDF5.get_num_chunks — Function
HDF5.get_num_chunks(dataset_id)
Returns the number of chunks in a dataset. Equivalent to API.h5d_get_num_chunks(dataset_id, HDF5.H5S_ALL).
HDF5.get_num_chunks_per_dim — Function
HDF5.get_num_chunks_per_dim(dataset_id)
Get the number of chunks in each dimension in Julia's column-major order.
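The relationship between 0-based chunk indices and 0-based element offsets can be sketched in plain Julia. This illustrates the arithmetic only, assuming chunks are enumerated in column-major order over the chunk grid; the helper names below are hypothetical and are not part of HDF5.jl.

```julia
# Hypothetical helpers; extent and chunk are sizes in elements, in Julia order.

# Number of chunks along each dimension (partial edge chunks count as one).
chunks_per_dim(extent, chunk) = cld.(extent, chunk)

# 0-based linear chunk index -> 0-based element offset of that chunk.
function chunk_offset_from_index(extent, chunk, index0)
    grid = CartesianIndices(chunks_per_dim(extent, chunk))
    pos0 = Tuple(grid[index0 + 1]) .- 1   # 0-based position in the chunk grid
    return pos0 .* chunk
end

# 0-based element offset -> 0-based linear chunk index.
function chunk_index_from_offset(extent, chunk, offset0)
    grid = LinearIndices(chunks_per_dim(extent, chunk))
    pos1 = fld.(offset0, chunk) .+ 1      # 1-based position in the chunk grid
    return grid[pos1...] - 1
end

extent = (6, 10)   # dataset size
chunk = (2, 5)     # chunk size

grid_size = chunks_per_dim(extent, chunk)            # (3, 2)
offset = chunk_offset_from_index(extent, chunk, 4)   # (2, 5)
index = chunk_index_from_offset(extent, chunk, offset)   # 4
```

Because the first dimension varies fastest, index 4 lands on the second chunk along each dimension of the 3x2 chunk grid.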
HDF5.read_chunk — Function
HDF5.read_chunk(dataset_id, offset, [buf]; dxpl_id = HDF5.API.H5P_DEFAULT, filters = Ref{UInt32}())
Helper method to read chunks via 0-based offsets in a Tuple.
Argument buf is optional and defaults to a Vector{UInt8} of length determined by HDF5.get_chunk_length. Argument dxpl_id can be supplied as a keyword and defaults to HDF5.API.H5P_DEFAULT. Argument filters can be retrieved by supplying a Ref{UInt32} value via a keyword argument.
This method returns Vector{UInt8}.
HDF5.read_chunk(dataset_id, index::Integer, [buf]; dxpl_id = HDF5.API.H5P_DEFAULT, filters = Ref{UInt32}())
Helper method to read chunks via 0-based integer index.
Argument buf is optional and defaults to a Vector{UInt8} of length determined by HDF5.API.h5d_get_chunk_info. Argument dxpl_id can be supplied as a keyword and defaults to HDF5.API.H5P_DEFAULT. Argument filters can be retrieved by supplying a Ref{UInt32} value via a keyword argument.
This method returns Vector{UInt8}.
HDF5.write_chunk — Function
HDF5.write_chunk(dataset_id, offset, buf::AbstractArray; dxpl_id = HDF5.API.H5P_DEFAULT, filter_mask = 0)
Helper method to write chunks via 0-based offsets offset as a Tuple.
HDF5.write_chunk(dataset_id, index::Integer, buf::AbstractArray; dxpl_id = API.H5P_DEFAULT, filter_mask = 0)
Helper method to write chunks via 0-based integer index.
Private Implementation
These functions select private implementations of the public high-level API. They should be used for diagnostic purposes only.
HDF5._get_chunk_info_all_by_index — Function
_get_chunk_info_all_by_index(dataset, [dxpl])
Implementation of get_chunk_info_all via HDF5.API.h5d_get_chunk_info.
We expect this will be slower, O(N^2), than using h5d_chunk_iter since each call to h5d_get_chunk_info iterates through the B-tree structure.
HDF5._get_chunk_info_all_by_iter — Function
_get_chunk_info_all_by_iter(dataset, [dxpl])
Implementation of get_chunk_info_all via HDF5.API.h5d_chunk_iter.
We expect this will be faster, O(N), than using h5d_get_chunk_info since this allows us to iterate through the chunks once.