Kerchunk.DeltaFilter
Kerchunk.FixedScaleOffsetFilter
Kerchunk.Fletcher32Filter
Kerchunk.QuantizeFilter
Kerchunk.ReferenceStore
Kerchunk._get_file_bytes
Kerchunk.add_scale_offset_filter_and_set_mask!
Kerchunk.apply_templates
Kerchunk.do_correction!
Kerchunk.materialize
Kerchunk.move_compressor_from_filters!
Kerchunk.readbytes
Kerchunk.resolve_uri
DeltaFilter(; DecodingType, [EncodingType = DecodingType])
Delta-based compression for Zarr arrays. (Delta encoding is Julia's `diff`; decoding is Julia's `cumsum`.)
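The round trip can be sketched in Base Julia. This is a minimal illustration of the encoding scheme, not the filter's actual implementation; delta codecs conventionally keep the first element and store successive differences:

```julia
x = Int32[3, 5, 9, 10]

# Encode: keep the first element, then store successive differences.
encoded = vcat(x[1], diff(x))   # Int32[3, 2, 4, 1]

# Decode: a cumulative sum reconstructs the original values.
decoded = cumsum(encoded)       # Int32[3, 5, 9, 10]

@assert decoded == x
```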
FixedScaleOffsetFilter{T,TENC}(scale, offset)
A compressor that scales and offsets the data.
Note
The geographic CF standards define scale/offset decoding as `x * scale + offset`, but this filter defines it as `x / scale + offset`. Constructing a `FixedScaleOffsetFilter` from CF data means `FixedScaleOffsetFilter(1/cf_scale_factor, cf_add_offset)`.
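For example, constructing the filter from CF attributes might look like this (the `scale_factor` and `add_offset` values here are hypothetical, and the type parameters follow the `{T,TENC}` signature above):

```julia
# Hypothetical CF attributes from a NetCDF-style dataset.
cf_scale_factor = 0.01
cf_add_offset   = 273.15

# CF decoding is `x * scale_factor + add_offset`, but this filter computes
# `x / scale + offset` — so invert the scale factor when constructing it.
filter = FixedScaleOffsetFilter{Float64, Int16}(1 / cf_scale_factor, cf_add_offset)
```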
Fletcher32Filter()
A codec that computes and verifies the Fletcher32 checksum; it does not actually compress the data.
Note that this goes from UInt8 to UInt8: encoding appends a 4-byte checksum, and decoding verifies that checksum and crops the last 4 bytes of the data.
QuantizeFilter(; digits, DecodingType, [EncodingType = DecodingType])
Quantization-based compression for Zarr arrays.
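Conceptually, quantization rounds values so that only `digits` digits of precision are kept, which makes the data far more compressible downstream. A simplified decimal sketch (the real codec, following numcodecs' quantize, rounds using a binary scale derived from `digits` rather than a decimal one):

```julia
digits = 2
x = [3.14159, 2.71828]

# Round each value to `digits` decimal places; the rounded values compress
# much better because their low-order bits are now regular.
quantized = round.(x .* 10.0^digits) ./ 10.0^digits   # [3.14, 2.72]
```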
ReferenceStore(filename_or_dict) <: Zarr.AbstractStore
A `ReferenceStore` is a "fake filesystem" encoded by some key-value store dictionary, either held in memory or read from a JSON file in the Kerchunk format.
Generally, you will only need to construct this if you have an in-memory Dict or other representation, or if you want to explicitly modify the store before constructing a ZGroup, which eagerly loads metadata.
Extended help
Implementation
The reference store has several fields:
- `mapper`: the actual key-value store in which file information is stored, as either a string of base64 bytes, a `[single_uri]`, or a `[uri, byte_offset, byte_length]` triple. The type here is parametrized, so this may be mutable if in memory, or immutable, e.g. a JSON3.Object.
- `zmetadata`: the top-level Zarr metadata, sometimes stored separately.
- `templates`: a key-value store for template expansion, if URLs need to be compressed.
- `cache`: a key-value store for explicitly downloaded or otherwise modified keys.
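A minimal in-memory construction might look like the following sketch. The keys and values here are hypothetical; the exact reference layout (base64 bytes, `[uri]`, or `[uri, offset, length]`) is defined by the Kerchunk specification:

```julia
using Kerchunk, Zarr

# Hypothetical in-memory reference dict in the Kerchunk layout.
refs = Dict{String, Any}(
    ".zgroup"      => "{\"zarr_format\": 2}",
    "data/.zarray" => "{...}",                        # array metadata JSON (elided)
    "data/0.0"     => ["s3://bucket/file.nc", 0, 1024],  # uri, offset, length
)

store = Kerchunk.ReferenceStore(refs)
# group = Zarr.zopen(store)   # open the store as a Zarr group
```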
_get_file_bytes(store::ReferenceStore, reference)
By hook or by crook, this function will return the bytes for the given reference. The reference could be a base64 encoded binary string, a path to a file, or a subrange of a file.
add_scale_offset_filter_and_set_mask!(zarray::Dict, zattrs::Dict)
Adapts the CF metadata conventions of scale/offset, valid_range, _FillValue, and _Unsigned by modifying the Zarr metadata:
- An additional reinterpretation filter is added to the filter stack if `_Unsigned = true`. This allows the values to be interpreted as UInts instead of Ints, which removes the sign error that would otherwise plague your dataset.
- A `FixedScaleOffsetFilter` replaces `scale_factor` and `add_offset`.
- `valid_range` and `_FillValue` are mutated based on the scale factor and added offset, and the native Zarr `fill_value` is replaced by the mutated and read `_FillValue`.
apply_templates(store::ReferenceStore, source::String)
This function applies the templates stored in `store` to the source string, and returns the resolved string.
It uses Mustache.jl under the hood, but all `{{template}}` values are set to not URI-encode characters.
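A sketch of template expansion, assuming a store whose `templates` field maps a short key to a base URL (both values hypothetical):

```julia
# Suppose the store was built with templates like:
#   templates == Dict("u" => "https://example.com/data")

resolved = Kerchunk.apply_templates(store, "{{u}}/chunk.0.0")
# expected: "https://example.com/data/chunk.0.0"
```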
do_correction!(f!, store, path)
Applies `f!` to the parsed `.zarray` and `.zattrs` files for the array at path `path` in the Zarr store `store`. These corrections mutate the files `.zarray` and `.zmetadata`, and attempt to save them to the store.
Available corrections are `add_scale_offset_filter_and_set_mask!` and `move_compressor_from_filters!`.
TODOs:
- Make this work for consolidated metadata (check for the presence of a .zmetadata key)?
Usage
st, = Zarr.storefromstring("reference://catalog.json")
Kerchunk.do_correction!(Kerchunk.add_scale_offset_filter_and_set_mask!, st, "SomeVariable")
zopen(st)
materialize(path, store::ReferenceStore)
Materialize a Zarr directory from a Kerchunk catalog. This actually downloads and writes the files to the given path, and you can open that with any Zarr reader.
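For example, to turn a Kerchunk catalog into a plain on-disk Zarr directory (paths here are hypothetical):

```julia
using Kerchunk, Zarr

st, = Zarr.storefromstring("reference://catalog.json")
Kerchunk.materialize("materialized_dataset.zarr", st)

# The result is an ordinary Zarr directory, openable by any Zarr reader:
# Zarr.zopen("materialized_dataset.zarr")
```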
move_compressor_from_filters!(zarray, zattrs)
Checks whether the last entry of `zarray["filters"]` is actually a compressor and, if there is no other compressor, moves it from the filter array to the `zarray["compressor"]` field.
This is a common issue with Kerchunk metadata, since it seems numcodecs doesn't distinguish between compressors and filters. This function will not be needed for Zarr v3 datasets, since the compressors and filters are all codecs in that schema.
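A sketch of the metadata shape this corrects, under the assumption that the function recognizes the trailing codec (here "zlib", a hypothetical example) as a compressor:

```julia
# Hypothetical .zarray metadata as emitted by some Kerchunk writers:
# the compressor has been appended to the filter list instead.
zarray = Dict{String, Any}(
    "filters"    => Any[Dict("id" => "shuffle"), Dict("id" => "zlib", "level" => 5)],
    "compressor" => nothing,
)
zattrs = Dict{String, Any}()

Kerchunk.move_compressor_from_filters!(zarray, zattrs)

# Afterwards, the trailing "zlib" entry should live in zarray["compressor"]
# and no longer appear in zarray["filters"].
```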
readbytes(path, start::Integer, stop::Integer)::Vector{UInt8}
Read bytes from a file at a given range.
resolve_uri(store::ReferenceStore, source::String)
This function resolves a string which may or may not have templating to a URI.