Gotchas & Troubleshooting

Objects are cached during loading

JLD2 caches objects during loading, so loading the same dataset twice may return the very same object. This can lead to surprising results if you mutate a loaded array: the change is visible the next time you access the dataset, even though the underlying file is not being edited!

julia> jldsave("demo.jld2", a=zeros(2))
julia> f = jldopen("demo.jld2")
JLDFile /home/runner/work/JLD2.jl/JLD2.jl/docs/build/demo.jld2 (read-only)
 └─🔢 a
julia> a = f["a"] # bind loaded array to name `a`
2-element Vector{Float64}:
 0.0
 0.0
julia> a[1] = 42; # editing the underlying array
julia> f["a"]
2-element Vector{Float64}:
 42.0
  0.0
julia> a = nothing # remove all references to the loaded array
julia> GC.gc(true) # call GC to remove the cache
julia> f["a"] # a new copy is loaded from the file
2-element Vector{Float64}:
 0.0
 0.0
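
If you intend to modify loaded data, one way to sidestep the cache is to work on a copy. The following is a minimal sketch, not an official recommendation: the temporary directory is chosen here only for illustration, and `copy` is enough for a flat array, whereas nested containers would need `deepcopy`.

```julia
using JLD2

# Write the demo file into a temporary directory (chosen here for illustration).
path = joinpath(mktempdir(), "demo.jld2")
jldsave(path; a = zeros(2))

f = jldopen(path)
a = copy(f["a"])    # independent copy; the object held by the cache is untouched
a[1] = 42           # modifies only the copy
reloaded = f["a"]   # cached or freshly loaded array, still all zeros
close(f)

println(a)
println(reloaded)
```

Because the mutation happens on the copy, it does not matter whether the second access returns the cached object or a fresh one.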

Cross-compatibility

JLD2 tries to write files in a way that allows you to load them on different operating systems and, in particular, on both 32-bit and 64-bit systems. However, many Julia structs are inherently different on different architectures, which makes this impossible in general. In particular, moving data from a 64-bit system to a 32-bit system is only guaranteed to work for basic datatypes.
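
One concrete source of such differences is `Int`, which is an alias for `Int64` on 64-bit systems and for `Int32` on 32-bit systems. The sketch below only illustrates that point. The struct names are hypothetical, and whether a particular type round-trips between architectures still has to be checked case by case.

```julia
# `Int` follows the word size of the machine running Julia.
word_size = Sys.WORD_SIZE      # 64 or 32
int_bits  = 8 * sizeof(Int)    # matches the word size

# Hypothetical struct whose field type depends on the architecture.
struct PlatformDependent
    n::Int
end

# Hypothetical struct with a fixed-width field, identical on every architecture.
struct FixedWidth
    n::Int64
end

println("word size: ", word_size, ", Int is ", Int)
println(fieldtype(PlatformDependent, :n), " vs ", fieldtype(FixedWidth, :n))
```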

Security

Beware of opening JLD2 files from untrusted sources. A malicious file may execute code on your computer. See e.g. this project's issue #117. To check a file, you can use debug tooling provided by JLD2 to view what kinds of objects are stored. Details on the available tools are described below.
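
As a rough sketch of such a check, the loop below prints the header messages of every top-level entry using `JLD2.print_header_messages`, without accessing the entries via `f[name]`. The file name and contents are made up so that the example is self-contained. The sketch makes no claim that this inspection is sufficient to establish that a file is safe.

```julia
using JLD2

# Example file created here only so that the sketch is self-contained.
path = joinpath(mktempdir(), "untrusted.jld2")
jldsave(path; a = 42, b = [1, 2, 3])

f = jldopen(path)
names = collect(keys(f))   # names of the top-level entries
for name in names
    println("== ", name, " ==")
    JLD2.print_header_messages(f, name)
end
close(f)
```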

Viewing header messages

Following the HDF5 format specification, JLD2 stores metadata and all information required to interpret the stored data for each dataset in the form of so-called header messages. Each HDF5 group, dataset, and committed datatype consists of an object header followed by a variable number of header messages.

There are different types of header messages, for example to encode the datatype or the layout (i.e. single element or array).

These can be printed for inspection using JLD2:

julia> jldsave("test.jld2";
           a = 42,
           b = [1,2,3,4,5],
           c = (1,2),
       )
julia> f = jldopen("test.jld2")
JLDFile /home/runner/work/JLD2.jl/JLD2.jl/docs/build/test.jld2 (read-only)
 ├─🔢 a
 ├─🔢 b
 └─🔢 c
julia> JLD2.print_header_messages(f, "a")
┌─ Header Message: HmFillValue
│ ┌─ offset:	RelOffset(55)
│ │  size:	2
│ └─ flags:	0
│    version:	3
└─   flags:	9
┌─ Header Message: HmDataspace
│ ┌─ offset:	RelOffset(61)
│ │  size:	4
│ └─ flags:	0
│    version:	2
│    dimensionality:	0
│    flags:	0
│    dataspace_type:	0
│    dim_offset:	RelOffset(69)
└─   dimensions:	()
┌─ Header Message: HmDatatype
│ ┌─ offset:	RelOffset(69)
│ │  size:	12
│ └─ flags:	1
│    datatype_offset:	RelOffset(73)
└─   dt:	JLD2.FixedPointDatatype(0x30, 0x08, 0x00, 0x00, 0x00000008, 0x0000, 0x0040)
┌─ Header Message: HmDataLayout
│ ┌─ offset:	RelOffset(85)
│ │  size:	12
│ └─ flags:	0
│    version:	4
│    layout_class:	LcCompact
│    data_size:	8
│    data_address:	RelOffset(93)
└─   data:	UInt8[0x2a, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]
┌─ Header Message: HmNil
│ ┌─ offset:	RelOffset(101)
│ │  size:	16
│ └─ flags:	0
└─

Here we see, among other things, a

  • dataspace message which states that "a" is a single (scalar) element
  • datatype message
  • datalayout message of the compact type which means that the data is so small it was directly stored as part of the message.
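
To see that the compact payload really is the stored value, the bytes from the `data` field can be decoded by hand. This is a small sketch that assumes the 8 bytes are a little-endian `Int64`, which is consistent with the byte pattern shown above.

```julia
# Payload bytes copied from the `data` field of the HmDataLayout message above.
bytes = UInt8[0x2a, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]

# Reinterpret the 8 bytes as one Int64. `ltoh` converts from little-endian
# to the host byte order (a no-op on little-endian machines).
value = ltoh(only(reinterpret(Int64, bytes)))
println(value)
```

The result is the value saved as `a` in the `jldsave` call.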

julia> JLD2.print_header_messages(f, "b")
┌─ Header Message: HmFillValue
│ ┌─ offset:	RelOffset(132)
│ │  size:	2
│ └─ flags:	0
│    version:	3
└─   flags:	9
┌─ Header Message: HmDataspace
│ ┌─ offset:	RelOffset(138)
│ │  size:	12
│ └─ flags:	0
│    version:	2
│    dimensionality:	1
│    flags:	0
│    dataspace_type:	1
│    dim_offset:	RelOffset(146)
└─   dimensions:	(5,)
┌─ Header Message: HmDatatype
│ ┌─ offset:	RelOffset(154)
│ │  size:	12
│ └─ flags:	1
│    datatype_offset:	RelOffset(158)
└─   dt:	JLD2.FixedPointDatatype(0x30, 0x08, 0x00, 0x00, 0x00000008, 0x0000, 0x0040)
┌─ Header Message: HmDataLayout
│ ┌─ offset:	RelOffset(170)
│ │  size:	18
│ └─ flags:	0
│    version:	4
│    layout_class:	LcContiguous
│    data_address:	RelOffset(216)
└─   data_size:	40
┌─ Header Message: HmNil
│ ┌─ offset:	RelOffset(192)
│ │  size:	16
│ └─ flags:	0
└─

The important differences from "a" are that the dataspace now reports the dimensions of the array as (5,) and that the data layout has changed to contiguous, which means that the data is stored as a single block starting at the offset reported in data_address.
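
The reported data_size can be cross-checked against the dimensions and the element type. The sketch below assumes that the `0x00000008` entry of the `FixedPointDatatype` is the element size in bytes. It uses `Int64` explicitly so that the arithmetic is the same on 32-bit systems.

```julia
b = Int64[1, 2, 3, 4, 5]

element_size = sizeof(eltype(b))       # 8 bytes per Int64
data_size = length(b) * element_size   # 5 elements * 8 bytes

println(data_size)
```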

julia> JLD2.print_header_messages(f, "c")
┌─ Header Message: HmFillValue
│ ┌─ offset:	RelOffset(4825)
│ │  size:	2
│ └─ flags:	0
│    version:	3
└─   flags:	9
┌─ Header Message: HmDataspace
│ ┌─ offset:	RelOffset(4831)
│ │  size:	4
│ └─ flags:	0
│    version:	2
│    dimensionality:	0
│    flags:	0
│    dataspace_type:	0
│    dim_offset:	RelOffset(4839)
└─   dimensions:	()
┌─ Header Message: HmDatatype
│ ┌─ offset:	RelOffset(4839)
│ │  size:	10
│ └─ flags:	3
│    version:	3
│    msgtype:	2
│    dt:	JLD2.SharedDatatype(RelOffset(4520))
└─   datatype_offset:	RelOffset(4520)
┌─ Header Message: HmDataLayout
│ ┌─ offset:	RelOffset(4853)
│ │  size:	20
│ └─ flags:	0
│    version:	4
│    layout_class:	LcCompact
│    data_size:	16
│    data_address:	RelOffset(4861)
└─   data:	UInt8[0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]
┌─ Header Message: HmNil
│ ┌─ offset:	RelOffset(4877)
│ │  size:	16
│ └─ flags:	0
└─

For dataset c we see that the datatype is a shared datatype, which is stored elsewhere in the file and is referenced by its offset. This is, of course, also a regular HDF5 object, and we can print its header messages by supplying the offset:

julia> JLD2.print_header_messages(f, JLD2.RelOffset(4520))
┌─ Header Message: HmDatatype
│ ┌─ offset:	RelOffset(4527)
│ │  size:	38
│ └─ flags:	64
│    datatype_offset:	RelOffset(4531)
└─   dt:	JLD2.CompoundDatatype(0x00000010, [Symbol("1"), Symbol("2")], [0, 8], JLD2.H5Datatype[JLD2.FixedPointDatatype(0x30, 0x08, 0x00, 0x00, 0x00000008, 0x0000, 0x0040), JLD2.FixedPointDatatype(0x30, 0x08, 0x00, 0x00, 0x00000008, 0x0000, 0x0040)])
┌─ Header Message: HmAttribute
│ ┌─ offset:	RelOffset(4569)
│ │  size:	65
│ └─ flags:	0
│    version:	2
│    flags:	1
│    name_size:	11
│    datatype_size:	10
│    dataspace_size:	4
│    name:	julia_type
│    vshared:	3
│    sharedtype:	2
│    datatype:	JLD2.SharedDatatype(RelOffset(256))
│    dataspace_offset:	RelOffset(4602)
│    dataspace_message:	UInt8[0x02, 0x00, 0x00, 0x00]
└─   data_offset:	RelOffset(4606)
    data: "Tuple{Int64, Int64}"

This object consists of just two messages:

  • The datatype message defines the HDF5 datatype and therefore describes the byte layout and field types.
  • The attribute message has the name julia_type and as payload the Julia DataType signature Tuple{Int64, Int64}, which is needed for reconstruction.
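
The two messages together are enough to decode the compact payload of "c" by hand. The sketch below is illustrative only and is not how JLD2 reconstructs objects internally. It takes the field offsets `[0, 8]` from the `CompoundDatatype` shown above and assumes both fields are little-endian `Int64` values.

```julia
# Payload bytes copied from the `data` field of the HmDataLayout message for "c".
bytes = UInt8[0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
              0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]

# Field offsets in bytes, as listed in the CompoundDatatype.
offsets = [0, 8]

# Decode one Int64 per field; `ltoh` converts from little-endian to host order.
fields = [ltoh(only(reinterpret(Int64, bytes[o+1:o+8]))) for o in offsets]

# Assemble the tuple described by the julia_type attribute.
c = (fields...,)
println(c, " :: ", typeof(c))
```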