Filters

HDF5 supports filters for compression and validation: these are applied sequentially to each chunk of a dataset when writing data, and in reverse order when reading data.
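As a rough illustration of that ordering, the sketch below uses two toy stand-in filters written in plain Julia. ToyFilter, tag, flip, write_chunk, and read_chunk are hypothetical names, not part of HDF5.jl. The sketch only models the order of application; real HDF5 filters operate on raw chunk buffers inside the library.

```julia
# A toy filter: `encode` runs on write, `decode` undoes it on read.
struct ToyFilter
    encode::Function
    decode::Function
end

# Hypothetical stand-ins: `tag` prepends a marker byte, `flip` reverses the bytes.
tag  = ToyFilter(b -> vcat(0xff, b), b -> b[2:end])
flip = ToyFilter(reverse, reverse)

pipeline = [tag, flip]

# Writing applies the filters first to last.
write_chunk(chunk, filters) = foldl((c, f) -> f.encode(c), filters; init=chunk)

# Reading undoes them last to first.
read_chunk(stored, filters) = foldl((c, f) -> f.decode(c), reverse(filters); init=stored)

chunk    = UInt8[1, 2, 3, 4]
stored   = write_chunk(chunk, pipeline)
restored = read_chunk(stored, pipeline)
```

Here restored equals chunk only because read_chunk walks the pipeline in reverse; decoding with tag first would strip the wrong byte.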

These can be set by passing a filter or vector of filters as a filters property to DatasetCreateProperties or via the filters keyword argument of create_dataset.

Example

HDF5.Filters - Module

HDF5.Filters

This module contains the interface for using filters in HDF5.jl.

Example Usage

using HDF5
using HDF5.Filters

# Create a new file
fn = tempname()

# Create test data
data = rand(1000, 1000)

# Open temp file for writing
f = h5open(fn, "w") 

# Create datasets
dsdeflate = create_dataset(f, "deflate", datatype(data), dataspace(data),
                           chunk=(100, 100), deflate=3)

dsshufdef = create_dataset(f, "shufdef", datatype(data), dataspace(data),
                           chunk=(100, 100), shuffle=true, deflate=3)

dsfiltdef = create_dataset(f, "filtdef", datatype(data), dataspace(data),
                           chunk=(100, 100), filters=Filters.Deflate(3))

dsfiltshufdef = create_dataset(f, "filtshufdef", datatype(data), dataspace(data),
                               chunk=(100, 100), filters=[Filters.Shuffle(), Filters.Deflate(3)])

# Write data
write(dsdeflate, data)
write(dsshufdef, data)
write(dsfiltdef, data)
write(dsfiltshufdef, data)

close(f)
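Reading filtered data back needs no extra arguments, since the library undoes the filters on its own. The following is a minimal round-trip sketch, assuming HDF5.jl is installed; the dataset name and the compressible test matrix are illustrative choices, not taken from the example above.

```julia
using HDF5

fn = tempname()

# A 100×100 matrix with repeated columns, so it compresses well.
data = repeat(collect(1.0:100.0), 1, 100)

# Write the data through a shuffle + deflate pipeline.
h5open(fn, "w") do f
    ds = create_dataset(f, "shufdef", datatype(data), dataspace(data);
                        chunk=(50, 50),
                        filters=[HDF5.Filters.Shuffle(), HDF5.Filters.Deflate(3)])
    write(ds, data)
end

# Read it back; decompression and unshuffling happen transparently.
roundtrip = h5open(fn, "r") do f
    read(f["shufdef"])
end
```

If the pipeline is lossless, as it is for shuffle and deflate, roundtrip should compare equal to data.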

Additional Examples

See test/filter.jl for further examples.


Built-in Filters

HDF5.Filters.Shuffle - Type
Shuffle()

The shuffle filter de-interlaces a block of data by reordering the bytes. All the bytes from one consistent byte position of each data element are placed together in one block; all bytes from a second consistent byte position of each data element are placed together in a second block; and so on. For example, given three data elements of a 4-byte datatype stored as 012301230123, shuffling will reorder the data as 000111222333. This can be a valuable step in an effective compression algorithm because the bytes in each byte position are often closely related to each other, and putting them together can increase the compression ratio.

As implied above, the primary value of the shuffle filter lies in its coordinated use with a compression filter; it does not provide data compression when used alone. When the shuffle filter is applied to a dataset immediately prior to the use of a compression filter, the compression ratio achieved is often superior to that achieved by the use of a compression filter without the shuffle filter.
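The reordering described above can be sketched in plain Julia. shuffle_bytes and unshuffle_bytes are hypothetical helpers written for illustration; they are not the library's implementation, which runs inside the HDF5 C library.

```julia
# Group byte position k of every element together.
function shuffle_bytes(bytes::AbstractVector{UInt8}, elsize::Integer)
    n = length(bytes) ÷ elsize
    # Each column of the elsize×n matrix is one element; transposing and
    # flattening places all first bytes together, then all second bytes, etc.
    return vec(permutedims(reshape(bytes, elsize, n)))
end

# Inverse operation: restore the original element-by-element layout.
function unshuffle_bytes(bytes::AbstractVector{UInt8}, elsize::Integer)
    n = length(bytes) ÷ elsize
    return vec(permutedims(reshape(bytes, n, elsize)))
end

# Three 4-byte elements, each stored as the bytes 0, 1, 2, 3.
raw      = UInt8[0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]
shuffled = shuffle_bytes(raw, 4)
restored = unshuffle_bytes(shuffled, 4)
```

With this input, shuffled holds the bytes in the order 000111222333 from the example above, and restored matches raw again. The byte count is unchanged, which is why shuffling alone does not compress anything.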


HDF5.Filters.Szip - Type
Szip(coding=:nn, pixels_per_block=8)

Szip lossless compression filter. Options:

  • coding: the coding method: either :ec (entropy coding) or :nn (nearest neighbors, default)
  • pixels_per_block: The number of pixels or data elements in each data block (typically 8, 10, 16, or 32)


HDF5.Filters.ExternalFilter - Type
ExternalFilter(filter_id::API.H5Z_filter_t, flags::Cuint, data::Vector{Cuint}, name::String, config::Cuint)
ExternalFilter(filter_id, flags, data::Integer...)
ExternalFilter(filter_id, data::AbstractVector{<:Integer} = Cuint[])

Intended to support arbitrary, unregistered, external filters. It allows filter objects for internal or proprietary filters to be created quickly without subtyping HDF5.Filters.Filter. Where possible, users are encouraged to define subtypes of HDF5.Filters.Filter instead.

Fields / Arguments

  • filter_id - (required) Integer filter identifier.
  • flags - (optional) bit vector describing general properties of the filter. Defaults to API.H5Z_FLAG_MANDATORY
  • data - (optional) auxiliary data for the filter. See cd_values. Defaults to Cuint[]
  • name - (optional) String describing the name of the filter. Defaults to "Unknown Filter with id [filter_id]"
  • config - (optional) bit vector representing information about the filter regarding whether it is able to encode data, decode data, neither, or both. Defaults to 0.

See also:

flags bits

  • API.H5Z_FLAG_OPTIONAL
  • API.H5Z_FLAG_MANDATORY

config bits

  • API.H5Z_FILTER_CONFIG_ENCODE_ENABLED
  • API.H5Z_FILTER_CONFIG_DECODE_ENABLED

External Filter Packages

Several external Julia packages implement HDF5 filter plugins in Julia. As they are independent of HDF5.jl, they must be installed in order to use their plugins.

The H5Zblosc.jl, H5Zbzip2.jl, H5Zlz4.jl, and H5Zzstd.jl packages are maintained as independent subdirectory packages within the HDF5.jl repository.

H5Zblosc.jl

H5Zbzip2.jl

H5Zlz4.jl

H5Zzstd.jl

Other External Filters

Additional filters can be dynamically loaded by the HDF5 library. See External Links below for more information.

Using an ExternalFilter

ExternalFilter can be used to insert a dynamically loaded filter into the FilterPipeline in an ad-hoc fashion.

Example for bitshuffle

If we do not have a defined subtype of Filter for the bitshuffle filter we can create an ExternalFilter. From the header file or list of registered plugins, we see that the bitshuffle filter has an id of 32008.

Furthermore, the header describes two options:

  1. block_size (optional). Default is 0.
  2. compression - This can be 0 or BSHUF_H5_COMPRESS_LZ4 (2 as defined in the C header)

using HDF5
using HDF5.Filters

bitshuf = ExternalFilter(32008, Cuint[0, 0])
bitshuf_comp = ExternalFilter(32008, Cuint[0, 2])

data_A = rand(0:31, 1024)
data_B = rand(32:63, 1024)

filename, _ = mktemp()
h5open(filename, "w") do h5f
    # Indexing style
    h5f["ex_data_A", chunk=(32,), filters=bitshuf] = data_A
    # Procedural style
    d, dt = create_dataset(h5f, "ex_data_B", data_B, chunk=(32,), filters=[bitshuf_comp])
    write(d, data_B)
end

Creating a new Filter type

Examining the bitshuffle filter source code we see that three additional data components get prepended to the options. These are

  1. The major version
  2. The minor version
  3. The element size in bytes of the type via H5Tget_size.

using HDF5
import HDF5.Filters: FILTERS, Filter, FilterPipeline, filterid
using HDF5.API

const H5Z_BSHUF_ID = API.H5Z_filter_t(32008)
struct BitShuffleFilter <: HDF5.Filters.Filter
    major::Cuint
    minor::Cuint
    elem_size::Cuint
    block_size::Cuint
    compression::Cuint
    BitShuffleFilter(block_size, compression) = new(0, 0, 0, block_size, compression)
end
# filterid is the only required method of the filter interface
# since we are using an externally registered filter
filterid(::Type{BitShuffleFilter}) = H5Z_BSHUF_ID
FILTERS[H5Z_BSHUF_ID] = BitShuffleFilter

function Base.push!(p::FilterPipeline, f::BitShuffleFilter)
    ref = Ref(f)
    GC.@preserve ref begin
        API.h5p_set_filter(p.plist, H5Z_BSHUF_ID, API.H5Z_FLAG_OPTIONAL, 2, pointer_from_objref(ref) + sizeof(Cuint)*3)
    end
    return p
end

Because the first three elements are not supplied directly through h5p_set_filter, we also need to implement a custom Base.push! method for FilterPipeline.

Filter Interface

The filter interface is used to describe filters and obtain information on them.

HDF5.Filters.Filter - Type
Filter

Abstract type to describe HDF5 Filters. See the Extended Help for information on implementing a new filter.

Extended Help

Filter interface

The Filter interface is implemented on subtypes of Filter.

See API.h5z_register for details.

Required Methods to Implement

  • filterid - registered filter ID
  • filter_func - implement the actual filter

Optional Methods to Implement

  • filtername - defaults to "Unnamed Filter"
  • encoder_present - defaults to true
  • decoder_present - defaults to true
  • can_apply_func - defaults to nothing
  • set_local_func - defaults to nothing

Advanced Methods to Implement

  • can_apply_cfunc - Defaults to wrapping @cfunction around the result of can_apply_func
  • set_local_cfunc - Defaults to wrapping @cfunction around the result of set_local_func
  • filter_cfunc - Defaults to wrapping @cfunction around the result of filter_func
  • register_filter - Defaults to using the above functions to register the filter

Implement the Advanced Methods to avoid @cfunction from generating a runtime closure which may not work on all systems.
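As a hedged sketch of the required and optional methods, the block below defines a do-nothing filter type. PassThroughFilter, PASS_THROUGH_ID, pass_through, and the ID value 32768 are hypothetical and chosen only for illustration. The callback's argument types are an assumption modeled on the C H5Z_func_t signature; see API.h5z_register for the authoritative details. The sketch stops short of registering the filter.

```julia
using HDF5
import HDF5.Filters: Filter, filterid, filtername, filter_func

# Hypothetical filter type and ID, chosen only for illustration.
struct PassThroughFilter <: Filter end
const PASS_THROUGH_ID = HDF5.API.H5Z_filter_t(32768)

# Callback following the assumed H5Z_func_t argument layout. It leaves the
# buffer untouched and reports that all `nbytes` bytes are still valid.
function pass_through(flags::Cuint, cd_nelmts::Csize_t, cd_values::Ptr{Cuint},
                      nbytes::Csize_t, buf_size::Ptr{Csize_t},
                      buf::Ptr{Ptr{Cvoid}})::Csize_t
    return nbytes
end

# Required methods
filterid(::Type{PassThroughFilter}) = PASS_THROUGH_ID
filter_func(::Type{PassThroughFilter}) = pass_through

# Optional method
filtername(::Type{PassThroughFilter}) = "toy pass-through filter"
```

A real filter would transform the buffer, update buf_size when it reallocates, and return 0 to signal failure.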

HDF5.Filters.FilterPipeline - Type
FilterPipeline(plist::DatasetCreateProperties)

The filter pipeline associated with plist. Acts like an AbstractVector{Filter}, supporting the following operations:

  • length(pipeline): the number of filters.
  • pipeline[i] to return the ith filter.
  • pipeline[FilterType] to return a filter of type FilterType
  • push!(pipeline, filter) to add an extra filter to the pipeline.
  • append!(pipeline, filters) to add multiple filters to the pipeline.
  • delete!(pipeline, FilterType) to remove a filter of type FilterType from the pipeline.
  • empty!(pipeline) to remove all filters from the pipeline.
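
The operations listed above might be exercised as follows. This is a minimal sketch that assumes a property list can be created with HDF5.DatasetCreateProperties(chunk=(100,)); the variable names are illustrative.

```julia
using HDF5
using HDF5.Filters

# A fresh dataset creation property list with a chunked layout (assumed constructor form).
plist = HDF5.DatasetCreateProperties(chunk=(100,))
pipeline = Filters.FilterPipeline(plist)

push!(pipeline, Filters.Shuffle())    # add one filter
push!(pipeline, Filters.Deflate(3))   # add another after it
n_filters = length(pipeline)

first_filter = pipeline[1]            # look up by position
deflate = pipeline[Filters.Deflate]   # look up by type

empty!(pipeline)                      # remove every filter again
n_after = length(pipeline)
```

Because the pipeline is a view onto plist, these changes affect any dataset later created with that property list.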
HDF5.Filters.can_apply_func - Function
can_apply_func(::Type{F}) where {F<:Filter}

Return a function indicating whether the filter can be applied, or nothing if no function exists. The function signature is func(dcpl_id::API.hid_t, type_id::API.hid_t, space_id::API.hid_t). See API.h5z_register.

HDF5.Filters.can_apply_cfunc - Function
can_apply_cfunc(::Type{F}) where {F<:Filter}

Return a C function pointer for the can apply function. By default, this will return the result of using @cfunction on the function specified by can_apply_func(F) or C_NULL if nothing.

Overriding this will allow @cfunction to return a Ptr{Nothing} rather than a CFunction closure, which may not work on all systems.
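
To illustrate the distinction, the plain-Julia sketch below passes a named top-level function to @cfunction, which yields an ordinary C function pointer instead of a runtime closure. toy_can_apply is a hypothetical stand-in for a real callback, and Int64 and Cint are assumed stand-ins for API.hid_t and the callback's return type, so the example does not depend on HDF5.jl.

```julia
# Hypothetical callback with the three-identifier argument shape described above.
# A positive return value is taken here to mean "the filter can be applied".
function toy_can_apply(dcpl_id::Int64, type_id::Int64, space_id::Int64)::Cint
    return Cint(1)
end

# Naming the function directly (no `$` interpolation) produces a plain pointer.
ptr = @cfunction(toy_can_apply, Cint, (Int64, Int64, Int64))

# The pointer can be invoked like any other C function.
result = ccall(ptr, Cint, (Int64, Int64, Int64), 0, 0, 0)
```

An override of can_apply_cfunc for a concrete filter type could return a pointer built this way, using the real API types in place of the stand-ins.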

HDF5.Filters.set_local_func - Function
set_local_func(::Type{F}) where {F<:Filter}

Return a function that sets dataset specific parameters or nothing if no function exists. The function signature is func(dcpl_id::API.hid_t, type_id::API.hid_t, space_id::API.hid_t). See API.h5z_register.

HDF5.Filters.set_local_cfunc - Function
set_local_cfunc(::Type{F}) where {F<:Filter}

Return a C function pointer for the set local function. By default, this will return the result of using @cfunction on the function specified by set_local_func(F) or C_NULL if nothing.

Overriding this will allow @cfunction to return a Ptr{Nothing} rather than a CFunction closure, which may not work on all systems.

HDF5.Filters.filter_cfunc - Function
filter_cfunc(::Type{F}) where {F<:Filter}

Return a C function pointer for the filter function. By default, this will return the result of using @cfunction on the function specified by filter_func(F) or will throw an error if nothing.

Overriding this will allow @cfunction to return a Ptr{Nothing} rather than a CFunction closure, which may not work on all systems.
