Dataset
Many dataset operations are available through the indexing interface, which is aliased to the functional interface. The functional interface is described below.
HDF5.Dataset
— Type
HDF5.Dataset
A mutable wrapper for an HDF5 Dataset `HDF5.API.hid_t`.
HDF5.create_dataset
— Function
create_dataset(
    parent::Union{File, Group},
    path::Union{AbstractString, Nothing},
    datatype::Union{Datatype, Type},
    dataspace::Union{Dataspace, Dims, Nothing};
    properties...)
Arguments
- `parent`: parent file `File` or `Group`.
- `path`: `String` describing the path of the dataset within the HDF5 file, or `nothing` to create an anonymous dataset
- `datatype`: `Datatype` or `Type` of the dataset
- `dataspace`: `Dataspace` or `Dims` of the dataset. If `nothing`, then it will create a null (empty) dataset.
- `properties`: keyword name-value pairs that set properties of the dataset
Keywords
There are many keyword properties that can be set. Below are a few select keywords.
- `max_dims` - `Dims` describing the maximum size of the dataset. Required for resizable datasets. Unlimited dimensions are denoted by `HDF5.UNLIMITED`.
- `chunk` - `Dims` describing the size of a chunk. Needed to apply filters.
- `filters` - `AbstractVector{<: Filters.Filter}` describing the order of the filters to apply to the data. See `Filters`
- `external` - `Tuple{AbstractString, Integer, Integer}` `(filepath, offset, filesize)` External dataset file location, data offset, and file size. See `API.h5p_set_external`.
Additionally, the initial create, transfer, and access properties can be provided as a keyword:
- `dcpl` - `DatasetCreateProperties`
- `dxpl` - `DatasetTransferProperties`
- `dapl` - `DatasetAccessProperties`
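
The signature above can be exercised with a minimal sketch like the following, assuming HDF5.jl is installed. The scratch file from `tempname()`, the dataset name `"data"`, and the sample matrix `A` are placeholders chosen for illustration and are not part of the API.

```julia
using HDF5

path = tempname() * ".h5"                 # hypothetical scratch file
A = collect(reshape(1.0:16.0, 4, 4))      # sample 4x4 Float64 matrix

h5open(path, "w") do f
    # Create a 4x4 Float64 dataset stored in 2x2 chunks, then fill it.
    dset = create_dataset(f, "data", Float64, (4, 4); chunk=(2, 2))
    write(dset, A)
    close(dset)
end

B = h5read(path, "data")                  # read the whole dataset back
```

Only the `chunk` keyword is used here; the other keywords listed above would be passed the same way.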
Base.copyto!
— Function
copyto!(output_buffer::AbstractArray{T}, obj::Union{DatasetOrAttribute}) where T
Copy [part of] an HDF5 dataset or attribute to a preallocated output buffer. The output buffer must be convertible to a pointer and have a contiguous layout.
Base.similar
— Function
similar(obj::DatasetOrAttribute, [::Type{T}], [dims::Integer...]; normalize = true)
Return an `Array{T}` or `Matrix{UInt8}` that can contain [part of] the dataset.
The `normalize` keyword will normalize the buffer for string and array datatypes.
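
A short sketch of how `similar` and `copyto!` might be combined to read into a preallocated buffer, assuming an HDF5.jl version that provides both methods. The file path, the dataset name `"data"`, and the random matrix are illustrative placeholders.

```julia
using HDF5

path = tempname() * ".h5"          # hypothetical scratch file
A = rand(3, 4)
h5write(path, "data", A)           # store a small Float64 matrix

buf = h5open(path, "r") do f
    dset = f["data"]
    out = similar(dset)            # buffer sized and typed after the dataset
    copyto!(out, dset)             # fill the preallocated buffer from the file
    out
end
```

Reusing one buffer across repeated reads avoids allocating a new array each time.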
HDF5.create_external_dataset
— Function
create_external_dataset(parent, name, filepath, dtype, dspace, offset = 0)
Create an external dataset with data in an external file.
- `parent` - File or Group
- `name` - Name of the Dataset
- `filepath` - File path to where the data is stored
- `dtype` - Datatype, Type, or value where `datatype` is applicable
- `offset` - Offset, in bytes, from the beginning of the file to the location in the file where the data starts.
See also `API.h5p_set_external` to link to multiple segments.
HDF5.get_datasets
— Function
get_datasets(file::HDF5.File) -> datasets::Vector{HDF5.Dataset}
Get all the datasets in an HDF5 file without loading the data.
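
As an illustration, the sketch below lists the names of the datasets in a small file. The file path and the dataset names `"a"` and `"b"` are made up for the example, and `HDF5.name` is used on the assumption that it returns the absolute path of each object.

```julia
using HDF5

path = tempname() * ".h5"          # hypothetical scratch file
h5open(path, "w") do f
    f["a"] = [1, 2, 3]
    f["b"] = [4.0, 5.0]
end

names = h5open(path, "r") do f
    # Collect only the names; the dataset contents are not read here.
    sort([HDF5.name(d) for d in HDF5.get_datasets(f)])
end
```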
HDF5.open_dataset
— Function
open_dataset(parent::Union{File, Group}, path::AbstractString; properties...)
Open an existing `HDF5.Dataset` at `path` under `parent`.
Optional keyword arguments include any keywords that belong to `DatasetAccessProperties` or `DatasetTransferProperties`.
HDF5.write_dataset
— Function
write_dataset(parent::Union{File,Group}, name::Union{AbstractString,Nothing}, data; pv...)
Create and write a dataset with `data`. Keywords are forwarded to `create_dataset`. Providing `nothing` as the name will create an anonymous dataset.
See also `create_dataset`
HDF5.read_dataset
— Function
read_dataset(parent::Union{File,Group}, name::AbstractString)
Read the dataset named `name` from `parent`. This will typically return an array. The dataset will be opened, read, and closed.
See also `HDF5.open_dataset`, `Base.read`
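
The following sketch shows one way `write_dataset`, `read_dataset`, and `open_dataset` might be used together. The file path and the dataset name `"numbers"` are placeholders for illustration.

```julia
using HDF5

path = tempname() * ".h5"          # hypothetical scratch file
data = collect(1:10)

h5open(path, "w") do f
    HDF5.write_dataset(f, "numbers", data)       # create and write in one call
end

result, dims = h5open(path, "r") do f
    values = HDF5.read_dataset(f, "numbers")     # open, read, and close in one call
    dset = HDF5.open_dataset(f, "numbers")       # keep a handle to inspect metadata
    sz = size(dset)
    close(dset)
    values, sz
end
```

`read_dataset` suits one-off reads, while `open_dataset` is the better fit when the handle is needed for several operations.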
Chunks
HDF5.do_read_chunk
— Function
do_read_chunk(dataset::Dataset, offset)
Read a raw chunk at a given offset. `offset` is a 1-based list of rank `ndims(dataset)` and must fall on a chunk boundary.
do_read_chunk(dataset::Dataset, index::Integer)
Read a raw chunk at a given index. `index` is 1-based and consecutive up to the number of chunks.
HDF5.do_write_chunk
— Function
do_write_chunk(dataset::Dataset, offset, chunk_bytes::AbstractArray, filter_mask=0)
Write a raw chunk at a given offset. `chunk_bytes` is an `AbstractArray` that can be converted to a pointer, `Ptr{Cvoid}`. `offset` is a 1-based list of rank `ndims(dataset)` and must fall on a chunk boundary.
do_write_chunk(dataset::Dataset, index, chunk_bytes::AbstractArray, filter_mask=0)
Write a raw chunk at a given linear index. `chunk_bytes` is an `AbstractArray` that can be converted to a pointer, `Ptr{Cvoid}`. `index` is 1-based and consecutive up to the number of chunks.
HDF5.get_chunk_index
— Function
HDF5.get_chunk_index(dataset_id, offset)
Get 0-based index of chunk from 0-based `offset` returned in Julia's column-major order. For a 1-based API, see `HDF5.ChunkStorage`.
HDF5.get_chunk_info_all
— Function
HDF5.get_chunk_info_all(dataset, [dxpl])
Obtain information on all the chunks in a dataset. Returns a `Vector{ChunkInfo{N}}`. The fields of `ChunkInfo{N}` are
- `offset` - `NTuple{N, Int}` indicating the offset of the chunk in terms of elements, reversed to F-order
- `filter_mask` - `Cuint`, 32-bit flags indicating whether filters have been applied to the chunk
- `addr` - `haddr_t`, byte-offset of the chunk in the file
- `size` - `hsize_t`, size of the chunk in bytes
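
A hedged sketch of inspecting the fields listed above, assuming the underlying HDF5 library is recent enough to support chunk queries. The file path, the dataset name `"data"`, and the 4x4 shape with 2x2 chunks are illustrative choices.

```julia
using HDF5

path = tempname() * ".h5"          # hypothetical scratch file
h5open(path, "w") do f
    dset = create_dataset(f, "data", Float64, (4, 4); chunk=(2, 2))
    write(dset, ones(4, 4))        # writing every element allocates all four chunks
    close(dset)
end

infos = h5open(path, "r") do f
    HDF5.get_chunk_info_all(f["data"])
end

for info in infos
    println("offset=", info.offset, " bytes=", info.size, " filter_mask=", info.filter_mask)
end
```

With no filters applied, each chunk is expected to occupy 2 * 2 * 8 = 32 bytes.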
HDF5.get_chunk_length
— Function
HDF5.get_chunk_length(dataset_id)
Retrieves the chunk size in bytes. Equivalent to `API.h5d_get_chunk_info(dataset_id, index)[:size]`.
HDF5.get_chunk_offset
— Function
HDF5.get_chunk_offset(dataset_id, index)
Get 0-based offset of chunk from 0-based `index`. The offsets are returned in Julia's column-major order rather than HDF5's row-major order. For a 1-based API, see `HDF5.ChunkStorage`.
HDF5.get_num_chunks
— Function
HDF5.get_num_chunks(dataset_id)
Returns the number of chunks in a dataset. Equivalent to `API.h5d_get_num_chunks(dataset_id, HDF5.H5S_ALL)`.
HDF5.get_num_chunks_per_dim
— Function
HDF5.get_num_chunks_per_dim(dataset_id)
Get the number of chunks in each dimension in Julia's column-major order.
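
The chunk query helpers described above can be combined as in this sketch. The file path, dataset name, and the 4x4 dataset with 2x2 chunks are assumptions made for the example; the file is reopened before querying so that all chunks are on disk.

```julia
using HDF5

path = tempname() * ".h5"          # hypothetical scratch file
h5open(path, "w") do f
    dset = create_dataset(f, "data", Float64, (4, 4); chunk=(2, 2))
    write(dset, zeros(4, 4))       # allocate every chunk
    close(dset)
end

nchunks, per_dim, chunk_bytes, roundtrip_ok = h5open(path, "r") do f
    dset = f["data"]
    n = Int(HDF5.get_num_chunks(dset))
    per_dim = HDF5.get_num_chunks_per_dim(dset)
    nbytes = HDF5.get_chunk_length(dset)
    # Converting a 0-based index to an offset and back should give the same index.
    ok = all(0:(n - 1)) do i
        offset = Int.(HDF5.get_chunk_offset(dset, i))
        HDF5.get_chunk_index(dset, offset) == i
    end
    n, per_dim, nbytes, ok
end
```

Both the indices and the offsets in these helpers are 0-based, unlike ordinary Julia array indexing.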
HDF5.read_chunk
— Function
HDF5.read_chunk(dataset_id, offset, [buf]; dxpl_id = HDF5.API.H5P_DEFAULT, filters = Ref{UInt32}())
Helper method to read chunks via 0-based offsets in a `Tuple`.
Argument `buf` is optional and defaults to a `Vector{UInt8}` of length determined by `HDF5.get_chunk_length`. Argument `dxpl_id` can be supplied as a keyword and defaults to `HDF5.API.H5P_DEFAULT`. Argument `filters` can be retrieved by supplying a `Ref{UInt32}` value via a keyword argument.
This method returns `Vector{UInt8}`.
HDF5.read_chunk(dataset_id, index::Integer, [buf]; dxpl_id = HDF5.API.H5P_DEFAULT, filters = Ref{UInt32}())
Helper method to read chunks via 0-based integer `index`.
Argument `buf` is optional and defaults to a `Vector{UInt8}` of length determined by `HDF5.API.h5d_get_chunk_info`. Argument `dxpl_id` can be supplied as a keyword and defaults to `HDF5.API.H5P_DEFAULT`. Argument `filters` can be retrieved by supplying a `Ref{UInt32}` value via a keyword argument.
This method returns `Vector{UInt8}`.
HDF5.write_chunk
— Function
HDF5.write_chunk(dataset_id, offset, buf::AbstractArray; dxpl_id = HDF5.API.H5P_DEFAULT, filter_mask = 0)
Helper method to write chunks via 0-based offsets `offset` as a `Tuple`.
HDF5.write_chunk(dataset_id, index::Integer, buf::AbstractArray; dxpl_id = API.H5P_DEFAULT, filter_mask = 0)
Helper method to write chunks via 0-based integer `index`.
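
A minimal round-trip sketch for the raw chunk helpers, assuming an unfiltered Float64 dataset. The file path, dataset name, and the 2x2 `tile` are illustrative; the chunk is written as raw bytes at the 0-based offset `(0, 0)` and read back after reopening the file.

```julia
using HDF5

path = tempname() * ".h5"          # hypothetical scratch file
tile = [1.0 3.0; 2.0 4.0]          # one 2x2 chunk worth of Float64 values

h5open(path, "w") do f
    dset = create_dataset(f, "data", Float64, (4, 4); chunk=(2, 2))
    bytes = collect(reinterpret(UInt8, vec(tile)))   # raw bytes in column-major order
    HDF5.write_chunk(dset, (0, 0), bytes)            # 0-based offset of the first chunk
    close(dset)
end

roundtrip = h5open(path, "r") do f
    raw = HDF5.read_chunk(f["data"], (0, 0))         # Vector{UInt8} holding the chunk
    reshape(collect(reinterpret(Float64, raw)), 2, 2)
end
```

Because these helpers bypass the filter pipeline, the caller is responsible for any compression or encoding of the bytes.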
Private Implementation
These functions select private implementations of the public high-level API. They should be used for diagnostic purposes only.
HDF5._get_chunk_info_all_by_index
— Function
_get_chunk_info_all_by_index(dataset, [dxpl])
Implementation of `get_chunk_info_all` via `HDF5.API.h5d_get_chunk_info`.
We expect this will be slower, O(N^2), than using `h5d_chunk_iter` since each call to `h5d_get_chunk_info` iterates through the B-tree structure.
HDF5._get_chunk_info_all_by_iter
— Function
_get_chunk_info_all_by_iter(dataset, [dxpl])
Implementation of `get_chunk_info_all` via `HDF5.API.h5d_chunk_iter`.
We expect this will be faster, O(N), than using `h5d_get_chunk_info` since this allows us to iterate through the chunks once.