MultifileArrays

MultifileArrays implements "lazy concatenation" of file data. The primary function, load_series, loads data from disk on demand and stores "slices" in a temporary buffer. This allows you to treat a series of files as if they were one large contiguous array.
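The idea can be made concrete with a minimal sketch of a lazily concatenated array in plain Julia. The LazySlices type and its loader function are hypothetical illustrations, not the MultifileArrays implementation. Here the "files" are simulated by a function that builds each slice on request, and only one slice is held in the buffer at a time:

```julia
# Hypothetical sketch, not the MultifileArrays implementation: an array whose
# last dimension indexes "files", with only one slice kept in `buffer`.
mutable struct LazySlices{T,N,F,B<:AbstractArray{T}} <: AbstractArray{T,N}
    loader::F        # loader(k) returns slice k as an array
    buffer::B        # temporary buffer holding the current slice
    current::Int     # index of the slice currently in `buffer`
    nslices::Int     # number of files/slices
    nloads::Int      # number of calls to `loader` so far
end

function LazySlices(loader, nslices::Integer)
    buf = loader(1)                       # load slice 1 to learn eltype and size
    T, N = eltype(buf), ndims(buf) + 1
    return LazySlices{T,N,typeof(loader),typeof(buf)}(loader, buf, 1, nslices, 1)
end

Base.size(A::LazySlices) = (size(A.buffer)..., A.nslices)

function Base.getindex(A::LazySlices{T,N}, I::Vararg{Int,N}) where {T,N}
    @boundscheck checkbounds(A, I...)
    k = I[N]
    if k != A.current                     # requested slice is not in memory
        copyto!(A.buffer, A.loader(k))    # (re)load it into the buffer
        A.current = k
        A.nloads += 1
    end
    return A.buffer[Base.front(I)...]
end

# Simulated "files": slice k is a 2×3 matrix filled with the value k.
A = LazySlices(k -> fill(Float64(k), 2, 3), 4)
x = A[1, 2, 3]                            # loads slice 3 into the buffer
```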

Further examples are described in the API section, but a simple demo, using a directory dir that contains a series of PNG files, might be

julia> using MultifileArrays, FileIO

julia> img = load_series(load, "myimage_*.png"; dir)

Performance tips

While MultifileArrays is convenient, there are some performance caveats to keep in mind:

  • to reduce the number of times that a file needs to be (re)loaded from disk, iterate over the resulting array in an order consistent with the file-by-file slicing, i.e., finish with all the elements from one file before moving on to the next.
  • operations that can be performed "slice at a time" (e.g., visualization with ImageView) are even more efficient than scalar (single-element) indexing, because the latter must check on every access whether the requested slice index corresponds to the currently loaded file.
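The effect of access order can be sketched with a small model, assuming (as described above) that only one file's slice is held in memory at a time and that touching a different slice forces a reload. The count_loads helper is hypothetical:

```julia
# Hypothetical model, not MultifileArrays code: count how many file loads a
# sequence of slice (file) indices would trigger when only one slice is cached.
function count_loads(slice_sequence)
    loads = 0
    current = 0                 # 0 means "nothing loaded yet"
    for s in slice_sequence
        if s != current         # a different file is needed: reload it
            loads += 1
            current = s
        end
    end
    return loads
end

# A 2×2×3 array: three files, each holding one 2×2 slice.
# File-consistent order: the slice index k varies slowest.
file_order = [k for k in 1:3 for j in 1:2 for i in 1:2]
# Interleaved order: k varies fastest, so every access switches files.
interleaved = [k for j in 1:2 for i in 1:2 for k in 1:3]

count_loads(file_order)    # 3
count_loads(interleaved)   # 12
```

Julia's column-major loops, such as for I in CartesianIndices(A), vary the first index fastest and the last (file) index slowest, so they naturally follow the file-by-file order.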

For uncompressed data, alternative approaches that exploit memory-mapping may yield better performance. The StackViews package allows you to "glue" such arrays together.
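For illustration, memory-mapping in plain Julia uses the Mmap standard library. This minimal sketch writes two small headerless raw files (a made-up format; real formats usually have a header, which would require an offset) and maps each one back as a matrix without reading it eagerly:

```julia
using Mmap

# Write two uncompressed 4×3 Float64 frames as headerless raw files
# (hypothetical format chosen only for this illustration).
dir = mktempdir()
paths = [joinpath(dir, "frame_$k.raw") for k in 1:2]
for (k, p) in enumerate(paths)
    write(p, fill(Float64(k), 4, 3))     # raw bytes of the matrix, no header
end

# Map each file as a 4×3 matrix; the OS reads pages only when they are touched.
frames = [Mmap.mmap(p, Matrix{Float64}, (4, 3)) for p in paths]
```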

API

MultifileArrays.MultifileArrays (Module)

MultifileArrays creates lazily-loaded multidimensional arrays from files. Here are the main functions:

  • load_chunked: Load an array from chunks stored in files in filenames.
  • load_series: Create a lazily-loaded array A from a set of files.
  • select_series: Create a vector of filenames from filepattern.
MultifileArrays.load_chunked (Function)
A = load_chunked(lazyloader, filenames)

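Load an array from chunks stored in files in filenames. filenames must be shaped so that it is "extended" along the dimension of concatenation: for example, to concatenate chunks along the third dimension, filenames should have size (1, 1, n), where n is the number of files.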

When each chunk has the same size and is equivalent to a single slice of the final array, load_series may yield better performance.
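Conceptually, reading along the concatenation dimension requires translating a global index into a file and a position within that file. Here is a minimal sketch of that bookkeeping, using the chunk lengths 1000 and 555 from the two-file example that follows. locate is a hypothetical helper, not how load_chunked or BlockArrays implement it:

```julia
# Hypothetical helper (not part of MultifileArrays or BlockArrays): map a
# global index k along the concatenation dimension to (chunk, local index).
function locate(k::Integer, chunklens::AbstractVector{<:Integer})
    ends = cumsum(chunklens)              # last global index covered by each chunk
    1 <= k <= last(ends) || throw(BoundsError(1:last(ends), k))
    c = searchsortedfirst(ends, k)        # first chunk whose range reaches k
    offset = c == 1 ? 0 : ends[c - 1]     # global indices consumed by earlier chunks
    return (c, k - offset)
end

chunklens = [1000, 555]                   # images stored in each of the two files
locate(1000, chunklens)                   # (1, 1000): last image of the first file
locate(1001, chunklens)                   # (2, 1): first image of the second file
```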

Examples

Suppose you have 2 files, myimage_1.tiff and myimage_2.tiff, with the first storing 1000 two-dimensional images and the second storing 555 images of the same shape. Then you can load a contiguous 3d array with

julia> using BlockArrays, FileIO, MultifileArrays

julia> filenames = reshape(["myimage_1.tiff", "myimage_2.tiff"], (1, 1, 2))
1×1×2 Array{String, 3}:
[:, :, 1] =
 "myimage_1.tiff"

[:, :, 2] =
 "myimage_2.tiff"

julia> img = load_chunked(fn -> load(fn; mmap=true), filenames);

julia> size(img)
(512, 512, 1555)

In the TiffImages package, mmap=true allows you to "virtually" load the data by memory-mapping, supporting arrays much larger than the available memory.

Note

load_chunked requires that you manually load the BlockArrays package.

MultifileArrays.load_series (Method)
A = load_series(f, filepattern; dir=pwd())

Create a lazily-loaded array A from a set of files. f(filename) should create an array from the filename, and filepattern is a pattern matching the names of the desired files. The file names should have a numeric portion that indicates ordering; ordering is numeric rather than alphabetical, so left-padding with zeros is optional. See select_series for details about the pattern-matching.
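To see why numeric ordering matters, compare Julia's default lexicographic sort with a sort keyed on the extracted integer. This is a sketch of the idea using a made-up list of names and an assumed pattern, not the package's own code:

```julia
# Made-up names; the pattern below is an assumption matching them.
names = ["image1.tiff", "image10.tiff", "image2.tiff", "image12.tiff"]

# Julia's default string sort is lexicographic, so "image10" precedes "image2".
alphabetical = sort(names)

# Sorting by the captured integer gives the intended numeric order.
numkey(name) = parse(Int, match(r"image(\d+)\.tiff", name)[1])
numeric = sort(names; by = numkey)
```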

Examples

Suppose you are currently in a directory with files image01.tiff ... image12.tiff. Then either

julia> using FileIO, MultifileArrays

julia> img = load_series(load, "image*.tiff")

or the more precise regular-expression form

julia> img = load_series(load, r"image(\d+)\.tiff");

suffice to load the image files.
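Because an unescaped . in a regular expression matches any character, escaping it as \. makes the pattern stricter. A quick check using Julia's built-in regex support, with a made-up file name:

```julia
loose  = r"image(\d+).tiff"    # '.' matches any single character
strict = r"image(\d+)\.tiff"   # '\.' matches only a literal dot

occursin(loose, "image07Xtiff")                 # true: 'X' is accepted in place of '.'
occursin(strict, "image07Xtiff")                # false
parse(Int, match(strict, "image07.tiff")[1])    # 7: the capture holds the digits
```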

MultifileArrays.load_series (Method)
A = load_series(f, filenames::AbstractArray{<:AbstractString}, buffer::AbstractArray)

Create a lazily-loaded array A from a set of files. f is a function that loads the data from a specific file into a pre-allocated array buffer, meaning that

f(buffer, filename)

should fill buffer with the contents of filename.

filenames should be an array of file names with shape equivalent to the trailing dimensions of A, i.e., those that follow the dimensions of buffer.
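The relationship between buffer, filenames and A can be spelled out with an eager (non-lazy) reference version. In this sketch, load_series_eager, fake_load! and the in-memory store standing in for files are all hypothetical. The point is that size(A) == (size(buffer)..., size(filenames)...) and that each file fills the slot selected by its position in filenames:

```julia
# Hypothetical eager reference, not the package's lazy implementation:
# allocate the whole output, then let f!(dest, filename) fill one slot per file.
function load_series_eager(f!, filenames::AbstractArray{<:AbstractString}, buffer::AbstractArray)
    out = similar(buffer, (size(buffer)..., size(filenames)...))
    lead = ntuple(_ -> Colon(), ndims(buffer))      # one Colon per buffer dimension
    for I in CartesianIndices(filenames)
        f!(view(out, lead..., I), filenames[I])     # trailing indices select the file
    end
    return out
end

# In-memory stand-in for three 2×2 image files on disk.
store = Dict("a" => [1 2; 3 4], "b" => [5 6; 7 8], "c" => [0 0; 0 1])
fake_load!(dest, name) = copyto!(dest, store[name])

filenames = ["a" "b" "c"]                 # 1×3 Matrix{String}
A = load_series_eager(fake_load!, filenames, similar(store["a"]))
```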

The advantage of this syntax is that it provides greater control than load_series(f, filepattern) over the choice of files and the shape of the overall output.

Note

StackViews provides an alternative approach that may yield better performance if you can either load all the files into memory at once or use lazy mmap-based loading.

Examples

Suppose you are currently in a directory with files image_z=1_t=1.tiff through image_z=5_t=30.tiff, where each file corresponds to a 2d (x, y) slice and the filename indicates the z and t coordinates. You could arrange the file names into a matrix filenames, with z varying along the rows and t along the columns:

5×30 Matrix{String}:
 "image_z=1_t=1.tiff"  "image_z=1_t=2.tiff"  "image_z=1_t=3.tiff"  …  "image_z=1_t=29.tiff"  "image_z=1_t=30.tiff"
 "image_z=2_t=1.tiff"  "image_z=2_t=2.tiff"  "image_z=2_t=3.tiff"     "image_z=2_t=29.tiff"  "image_z=2_t=30.tiff"
 "image_z=3_t=1.tiff"  "image_z=3_t=2.tiff"  "image_z=3_t=3.tiff"     "image_z=3_t=29.tiff"  "image_z=3_t=30.tiff"
 "image_z=4_t=1.tiff"  "image_z=4_t=2.tiff"  "image_z=4_t=3.tiff"     "image_z=4_t=29.tiff"  "image_z=4_t=30.tiff"
 "image_z=5_t=1.tiff"  "image_z=5_t=2.tiff"  "image_z=5_t=3.tiff"     "image_z=5_t=29.tiff"  "image_z=5_t=30.tiff"
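One way to build such a matrix, assuming all 150 files exist with exactly these names, is a two-dimensional comprehension in which z varies along the rows and t along the columns:

```julia
# z runs down the rows (first dimension), t across the columns (second dimension).
filenames = ["image_z=$(z)_t=$(t).tiff" for z in 1:5, t in 1:30]
# all(isfile, filenames) can confirm that the files are present before loading.
```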

and then

julia> buf = load(first(filenames));

julia> img = load_series(load!, filenames, buf)

would create a 4-dimensional output. load! would ideally load directly into its first argument, but could be defined as

load!(dest, filename) = copyto!(dest, load(filename))

if needed.

MultifileArrays.select_series (Method)
filenames = select_series(filepattern; dir=pwd())

Create a vector of filenames from filepattern. filepattern may be a string containing a * character or a regular expression capturing a digit-substring. The substring matched by the * (or by the capture group) is parsed as an integer that determines file order.
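The effect of a single * can be sketched by translating the pattern into an anchored regular expression whose capture group accepts only digits. star_to_regex is a hypothetical helper written for illustration, not the select_series implementation:

```julia
# Hypothetical helper, not part of MultifileArrays: convert a pattern with one
# '*' into an anchored regex whose capture group matches only digits.
function star_to_regex(pattern::AbstractString)
    parts = split(pattern, '*')
    length(parts) == 2 || throw(ArgumentError("expected exactly one '*' in \"$pattern\""))
    before, after = parts
    # PCRE's \Q...\E quotes literal text, so the '.' in ".png" is not a wildcard.
    return Regex("^\\Q" * before * "\\E(\\d+)\\Q" * after * "\\E\$")
end

rx = star_to_regex("myimage_*.png")
names = ["myimage_12.png", "myimage_2.png", "myimage_ab.png", "notes.txt"]
matched = filter(n -> occursin(rx, n), names)             # drops non-numeric names
order_keys = [parse(Int, match(rx, n)[1]) for n in matched]
```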

When dir contains no extraneous files and the filenames sort alphabetically into the desired sequence, readdir is a simpler alternative. select_series is useful when either of these conditions fails.

Examples

Suppose you have a directory with myimage_1.png, myimage_2.png, ..., myimage_12.png. Then

julia> select_series("myimage_*.png")
12-element Vector{String}:
 "myimage_1.png"
 "myimage_2.png"
 ⋮
 "myimage_12.png"

Note

The myimage_ part of the string is essential: the * must match only integer data. For the "generic wildcard" meaning of *, which matches any sequence of characters, use the Glob package.
