Registering a new format
You register a new format by adding
add_format(fmt, magic, extension, libraries...)
to FileIO's registry. It's generally best if you experiment with this locally and make sure everything works before submitting a pull request. You'll need to
pkg> dev FileIO to make the required changes.
Before going into detail explaining the arguments of
add_format, here is a real example that could be used to register an I/O package for one of the Netpbm image formats:
add_format(format"PPMBinary", "P6", ".ppm", [:Netpbm => UUID("f09324ee-3d7c-5217-9330-fc30815ba969")])
Briefly, this indicates that files in this format typically have extension
.ppm, the file contents typically start with "P6" (the byte sequence
[0x50, 0x36]), and these files can be read and written by the Netpbm package. (The
UUID is Julia's unique identifier for this registered package and can be obtained from the package's Project.toml file.)
fmt is a
DataFormat type, most conveniently created as
format"IDENTIFIER". If this file format has not previously been supported, you can make up
IDENTIFIER yourself. There is no external standard; this is just a "tag" used internally by FileIO and its support routines. You should generally choose something that makes it easy to guess what format it refers to. Examples of some existing identifiers:
format"PNG": the format for png image files
format"HDF5": the format for hierarchical data files v5
magic typically contains the magic bytes that identify the format. While the file format can sometimes be guessed from the extension (e.g., "pic.png" would likely be a PNG image file), the name of a file can be changed by the user at any time, so it may have no relationship to the content of the file. Moreover, there are many cases in which two or more different formats use the same extension, leading to ambiguity about the nature of the file. Is a
.gbv file a Genie Timeline file or a PCB CAD file? Is that
.fst file an audio file, a puzzle game file, or an R serialized dataframe file?
To identify the file uniquely, good format designers will include "magic bytes" as part of the content of the file to ensure that one can recognize or validate the format of the file. Typically, these magic bytes are the first bytes in the file, although there are many exceptions.
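As a rough illustration of what "the first bytes in the file" means in practice, the sketch below checks whether a stream begins with a given byte sequence. The helper `starts_with_magic` is hypothetical and written only for this example; it is not part of FileIO and does not reflect how FileIO performs the check internally.

```julia
# Hypothetical helper (not part of FileIO): does `io` begin with `magic`?
function starts_with_magic(io::IO, magic::AbstractVector{UInt8})
    pos = position(io)                 # remember where we started
    header = read(io, length(magic))   # may return fewer bytes near end of stream
    seek(io, pos)                      # leave the stream as we found it
    return header == magic
end

ppm_magic = Vector{UInt8}("P6")        # the bytes [0x50, 0x36]

starts_with_magic(IOBuffer("P6\n2 2\n255\n"), ppm_magic)  # true
starts_with_magic(IOBuffer("P3\n2 2\n255\n"), ppm_magic)  # false
```

The check looks only at content, so renaming the file has no effect on the result.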
Formats that use common extensions (e.g.,
.out) and lack magic bytes cannot be registered with FileIO, because permitting this would force us to choose one particular format above all others. In such cases, your package should provide its own I/O without using FileIO. To avoid name conflicts with FileIO, it may be best to avoid exporting names like
save from your package; use module-qualifiers like
Some formats have multiple "flavors" of magic bytes (which might, for example, include a "format version" number); in such cases
magic can be a list of byte sequences. In other cases, files cannot be identified by a specific set of bytes, but there's a pattern that can be exploited:
magic can be a function that returns
true or false depending on whether an I/O stream is consistent with the format.
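A detection function of this kind might look like the sketch below. The format it recognizes is invented for illustration (optional `#` comment lines followed by a line such as `version=2`), and the name `detect_myformat` is hypothetical. The sketch restores the stream position itself as a precaution; the text above does not say whether FileIO requires that.

```julia
# Hypothetical detector for an invented format: skip leading '#' comment
# lines, then require a line of the form "version=<digits>".
function detect_myformat(io::IO)
    pos = position(io)
    try
        while !eof(io)
            line = readline(io)
            startswith(line, "#") && continue      # ignore comment lines
            return occursin(r"^version=\d+$", line)
        end
        return false        # empty stream, or comments only
    finally
        seek(io, pos)       # restore the stream position
    end
end

detect_myformat(IOBuffer("# made by a tool\nversion=2\ndata..."))  # true
detect_myformat(IOBuffer("P6\n2 2\n255\n"))                        # false
```

Such a function would take the place of the byte sequence in the `magic` position of `add_format`.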
Examples of magic bytes include:
- GIF image files can have magic bytes corresponding to the ASCII characters in
"GIF87a", that is, the byte sequence [0x47, 0x49, 0x46, 0x38, 0x37, 0x61]. Alternatively, they might use
"GIF89a", which signals a different version of the GIF format.
- PLY mesh files can come in two flavors, ASCII and binary. Their magic bytes are
"ply\nformat ascii 1.0" and
"ply\nformat binary_little_endian 1.0", respectively. These magic bytes are human-readable and span the first two lines of the file.
- BedGraph genomic data files do not have official magic bytes, but they do have a structure which can be fairly reliably recognized by a suitable detection function. (Though it would have made life far more straightforward if the creators of the format had just added some magic bytes!)
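The GIF case above can be sketched as a list of byte sequences, one per flavor. The helper `matches_any` is hypothetical and only illustrates the idea of accepting any listed sequence; it is not FileIO's implementation.

```julia
# The two GIF flavors, written as strings and converted to bytes.
gif_magics = [Vector{UInt8}("GIF87a"), Vector{UInt8}("GIF89a")]

# The byte form and the ASCII form describe the same sequence.
gif_magics[1] == [0x47, 0x49, 0x46, 0x38, 0x37, 0x61]   # true

# Hypothetical helper: true if `bytes` begins with any of the listed sequences.
matches_any(bytes, magics) =
    any(m -> length(bytes) >= length(m) && bytes[1:length(m)] == m, magics)

matches_any(Vector{UInt8}("GIF89a\x01\x00"), gif_magics)           # true
matches_any(Vector{UInt8}("ply\nformat ascii 1.0\n"), gif_magics)  # false
```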
extension can be a string or a list of strings. Each should start with a dot (for example, ".ppm").
Example: the Nearly Raw Raster Data format uses
The libraries argument specifies the package or packages that can support input and/or output for the format. Each package specification should be of the form
name => uuid, where
name is the name of the package (encoded as a Symbol) and
uuid is the
UUID from the package's
Project.toml. The first package listed has the highest priority; FileIO will try to use it to perform the requested operation, and move on to the next only if that fails. Failure might occur because the user does not have the package installed, or because the package's handler threw an error.
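The priority rule can be pictured with the sketch below. It is a simplified model, not FileIO's actual dispatch code; the handler functions and `load_with_fallback` are hypothetical stand-ins for real packages.

```julia
# Hypothetical stand-ins for package handlers; not FileIO's internals.
first_choice(path)  = error("package not installed")   # simulates a failing handler
second_choice(path) = "loaded $path with the second handler"

# Try each handler in listed order; move on only if the current one throws.
function load_with_fallback(path, handlers)
    for handler in handlers
        try
            return handler(path)
        catch err
            @debug "handler failed, trying the next one" handler err
        end
    end
    error("no handler could load $path")
end

load_with_fallback("pic.ppm", [first_choice, second_choice])
```

Because a later handler is consulted only after an earlier one fails, the order in which packages are listed determines which one users get by default.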
Some packages may only support specific forms of I/O, and can use
LOAD or SAVE as specifiers for supported operations. Likewise, some packages rely on system libraries available only on certain platforms, and can include a platform specifier.
If your package isn't (yet) registered, you can alternatively specify the handler as the module itself. In such cases, your call to
add_format will likely be made from within your module or at the Julia REPL rather than in FileIO's registry. An exception is
MimeWriter, a sub-module of
FileIO that can write a few MIME formats.
Here's a real-world example (contained in FileIO's
src/registry.jl) for PNG:
add_format(
    format"PNG",
    UInt8[0x89,0x50,0x4e,0x47,0x0d,0x0a,0x1a,0x0a],
    ".png",
    [idImageIO],
    [idQuartzImageIO, OSX],
    [idImageMagick],
    [MimeWriter, SAVE]
)
Here idImageIO, idQuartzImageIO, and idImageMagick are
name => uuid pairs for three different packages. QuartzImageIO is available only on macOS (OSX), and the
MimeWriter module (which is internally accessible to FileIO) only supports output (
SAVE), not input.
For further examples, users are encouraged to inspect FileIO's registry directly.