Registering a new format

You register a new format by adding a call of the form

add_format(fmt, magic, extension, libraries...)

to FileIO's registry. It's generally best to experiment with this locally and make sure everything works before submitting a pull request. To make the required changes, you'll need a development copy of FileIO, which you can get with pkg> dev FileIO.

Before the arguments of add_format are explained in detail, here is a real example that could be used to register an I/O package for one of the Netpbm image formats:

add_format(format"PPMBinary", "P6", ".ppm", [:Netpbm => UUID("f09324ee-3d7c-5217-9330-fc30815ba969")])

Briefly, this indicates that files in this format typically have extension .ppm, the file contents typically start with "P6" (the byte sequence [0x50, 0x36]), and these files can be read and written by the Netpbm package. (The UUID is Julia's unique identifier for this registered package and can be obtained from the Project.toml file.)
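The equivalence between the string "P6" and the byte sequence [0x50, 0x36] can be checked in plain Julia. This is only a small illustration that needs nothing from FileIO; the variable names are made up for the example.

```julia
# The magic string "P6" and its underlying bytes carry the same information.
magic_string = "P6"
magic_bytes = Vector{UInt8}(magic_string)   # copy of the string's bytes

println(magic_bytes == UInt8[0x50, 0x36])          # true
println(codeunits(magic_string) == magic_bytes)    # true
```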

Argument fmt

fmt is a DataFormat type, most conveniently created as format"IDENTIFIER". If this file format has not previously been supported, you can make up IDENTIFIER yourself: there is no external standard, and the identifier is just a "tag" used internally by FileIO and its support routines. You should generally choose something that makes it easy to guess what format it refers to. Two existing fmts that appear on this page are format"PPMBinary" and format"PNG".

Argument magic

magic typically contains the magic bytes that identify the format. While the file format can sometimes be guessed from the extension (e.g., "pic.png" would likely be a PNG image file), the name of a file can be changed by the user, so it may have no relationship to the content of the file. Moreover, there are many cases in which two or more different formats use the same extension, leading to ambiguity about the nature of the file. Is a .gbv file a Genie Timeline file or a PCB CAD file? Is that .fst file an audio file, a puzzle game file, or an R serialized dataframe file?

To identify the file uniquely, good format designers will include "magic bytes" as part of the content of the file to ensure that one can recognize or validate the format of the file. Typically, these magic bytes are the first bytes in the file, although there are many exceptions.
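As a rough sketch of what checking magic bytes involves, the helper below compares the first bytes of a stream against an expected sequence. The name starts_with_magic is hypothetical and this is not FileIO's own implementation; it only handles the common case where the magic bytes sit at the very start of the stream.

```julia
# Hypothetical helper (not part of FileIO): does `io` begin with `magic`?
function starts_with_magic(io::IO, magic::Vector{UInt8})
    mark(io)                          # remember the current position
    header = read(io, length(magic))  # may return fewer bytes near end of stream
    reset(io)                         # rewind so later readers see the whole stream
    return header == magic
end

png_magic = UInt8[0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]

# A stream that begins with the PNG signature, and a text file merely named like one.
looks_like_png = IOBuffer(vcat(png_magic, UInt8[0x00, 0x00, 0x00, 0x0d]))
renamed_text   = IOBuffer("just text saved as pic.png")

println(starts_with_magic(looks_like_png, png_magic))   # true
println(starts_with_magic(renamed_text, png_magic))     # false
```

The second call shows why the extension alone is not trusted: the content does not carry the PNG signature, whatever the file is called.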

Warning

Formats that use common extensions (e.g., .out) and lack magic bytes cannot be registered with FileIO, because permitting this would force us to choose one particular format above all others. In such cases, your package should provide its own I/O without using FileIO. To avoid name conflicts with FileIO, it may be best to avoid exporting names like load and save from your package; use module-qualifiers like MyPkg.load instead.

Some formats have multiple "flavors" of magic bytes (which might, for example, include a "format version" number); in such cases magic can be a list of byte sequences. In other cases, files cannot be identified by a specific set of bytes, but there's a pattern that can be exploited: magic can be a function that returns true or false depending on whether an I/O stream is consistent with the format.

Examples of magic bytes include:

  • GIF image files can have magic bytes corresponding to the ASCII characters in "GIF87a", i.e., [0x47, 0x49, 0x46, 0x38, 0x37, 0x61]. Alternatively, they might use "GIF89a", which signals a different version of the GIF format.
  • PLY mesh files can come in two flavors, ASCII and binary. Their magic bytes are "ply\nformat ascii 1.0" and "ply\nformat binary_little_endian 1.0", respectively. These magic bytes are human-readable and span the first two lines of the file.
  • BedGraph genomic data files do not have official magic bytes, but they do have a structure which can be fairly reliably recognized by a suitable detection function. (Though it would have made life far more straightforward if the creators of the format had just added some magic bytes!)
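The two non-trivial cases above (several accepted byte sequences, and a detection function) can be sketched in plain Julia. Everything here is illustrative: matches_any_magic and looks_like_bedgraph are hypothetical names, the BedGraph check is a deliberately simplified assumption (four whitespace-separated columns: name, integer start, integer end, numeric value, with optional header lines), and neither function is FileIO's actual detector.

```julia
# Case 1: several accepted byte sequences (the two GIF versions).
gif_magics = [Vector{UInt8}("GIF87a"), Vector{UInt8}("GIF89a")]

function matches_any_magic(bytes::Vector{UInt8}, magics)
    return any(m -> length(bytes) >= length(m) && bytes[1:length(m)] == m, magics)
end

# Case 2: no magic bytes, so inspect the structure of the first few lines.
function looks_like_bedgraph(io::IO; max_lines::Integer = 10)
    seen_data = false
    for (n, line) in enumerate(eachline(io))
        n > max_lines && break
        s = strip(line)
        # Skip blank lines, comments, and optional header lines.
        if isempty(s) || startswith(s, "#") || startswith(s, "track") || startswith(s, "browser")
            continue
        end
        fields = split(s)
        length(fields) == 4 || return false
        tryparse(Int, fields[2]) === nothing && return false
        tryparse(Int, fields[3]) === nothing && return false
        tryparse(Float64, fields[4]) === nothing && return false
        seen_data = true
    end
    return seen_data   # true only if at least one data line was consistent
end

println(matches_any_magic(Vector{UInt8}("GIF89a..."), gif_magics))

sample = "track type=bedGraph\nchr1\t0\t100\t0.5\nchr1\t100\t200\t1.25\n"
println(looks_like_bedgraph(IOBuffer(sample)))
println(looks_like_bedgraph(IOBuffer("P6\n640 480\n255\n")))
```

A detector like this can only say that a stream is consistent with the format, which is why real magic bytes are preferable when a format has them.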

Argument extension

This can be a string or list of strings. Each should start with '.'.

Example: the Nearly Raw Raster Data format uses [".nrrd",".nhdr"].
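For illustration, the snippet below checks a list of extensions against the leading-'.' rule and matches filenames against it using Julia's splitext. The helper has_known_extension is a hypothetical name, and the comparison is a plain case-sensitive one; how FileIO itself normalizes or matches extensions is not covered here.

```julia
extensions = [".nrrd", ".nhdr"]

# Every registered extension should start with '.'.
println(all(startswith(ext, ".") for ext in extensions))

# splitext returns (root, ext), and ext keeps its leading '.'.
has_known_extension(filename, exts) = splitext(filename)[2] in exts

println(has_known_extension("scan.nhdr", extensions))
println(has_known_extension("scan.png", extensions))
```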

Argument libraries

This argument specifies the package or packages that can support input and/or output for the format. Each package specification should be of the form name => uuid, where name is the name of the package (encoded as a Symbol, e.g., :FeatherFiles) and uuid is the UUID from the package's Project.toml. The first package listed has the highest priority; FileIO will try to use it to perform the requested operation, and move on to the next only if that fails. Failure might occur because the user does not have the package installed, or because the package's handler threw an error.
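The priority order described above amounts to a try-in-order loop. The sketch below imitates that behavior with stand-in loader functions; load_with_fallback, broken_loader, and working_loader are hypothetical names, and FileIO's real dispatch code is more involved than this.

```julia
# Hypothetical stand-in for the fallback logic; handlers are name => loader pairs
# listed in priority order.
function load_with_fallback(handlers, filename)
    failures = Symbol[]
    for (name, loader) in handlers
        try
            return name, loader(filename)   # first handler that succeeds wins
        catch
            # e.g. the package is not installed, or its loader threw an error
            push!(failures, name)
        end
    end
    error("no handler could load $filename (tried: $(join(failures, ", ")))")
end

broken_loader(filename)  = error("cannot decode $filename")
working_loader(filename) = "contents of $filename"

handlers = [:FirstChoice => broken_loader, :SecondChoice => working_loader]
name, data = load_with_fallback(handlers, "image.png")
println(name)   # SecondChoice
```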

Some packages support only loading or only saving; the entry for such a package can include LOAD or SAVE as a specifier for the supported operation. Likewise, some packages rely on system libraries available only on certain platforms, and their entries can include a platform specifier.

If your package isn't (yet) registered, you can alternatively specify the handler as the module itself. In such cases, your call to add_format will likely be made from within your module or at the Julia REPL rather than in FileIO's registry. An exception is MimeWriter, a sub-module of FileIO that can write a few MIME formats.

Here's a real-world example (contained in FileIO's src/registry.jl) for PNG:

add_format(
    format"PNG",
    UInt8[0x89,0x50,0x4e,0x47,0x0d,0x0a,0x1a,0x0a],
    ".png",
    [idImageIO],
    [idQuartzImageIO, OSX],
    [idImageMagick],
    [MimeWriter, SAVE]
)

idImageIO, idQuartzImageIO, and idImageMagick are name => uuid pairs for three different packages. QuartzImageIO is available only on macOS (OSX). The MimeWriter module (which is internally accessible to FileIO) only supports output (SAVE), not input.

Examples

For further examples, users are encouraged to inspect FileIO's registry directly.