TranscodingStreams.jl
TranscodingStreams.jl is a package for transcoding (e.g. compressing or decompressing) data streams. It exports the type TranscodingStream, a subtype of IO that supports the usual I/O operations, just like the other I/O streams in the standard library.
Introduction
TranscodingStream has two type parameters, C<:Codec and S<:IO, so the full type is written TranscodingStream{C<:Codec,S<:IO}. It wraps an underlying I/O stream of type S with a codec of type C, which defines the transformation (transcoding) applied to the data. For example, when C is a lossless decompression codec and S is a file stream, TranscodingStream{C,S} behaves like a data stream that incrementally decompresses data read from the file.
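To make the role of the two type parameters concrete, here is a self-contained toy model in plain Julia. It does not use TranscodingStreams.jl: ToyCodec, UppercaseCodec, transform, ToyTranscodingStream and readall_bytes are hypothetical names invented for this sketch. It only illustrates the idea of an IO subtype parameterized by a codec type C and a wrapped stream type S that transforms bytes as they are read.

```julia
# Toy model only; NOT the TranscodingStreams.jl implementation.
# ToyCodec, UppercaseCodec, transform, ToyTranscodingStream and readall_bytes
# are hypothetical names used to illustrate the {C,S} type parameters.
abstract type ToyCodec end

# A hypothetical codec that upper-cases ASCII letters.
struct UppercaseCodec <: ToyCodec end
transform(::UppercaseCodec, b::UInt8) = UInt8('a') <= b <= UInt8('z') ? b - 0x20 : b

# The wrapper is parameterized by the codec type C and the wrapped stream type S.
struct ToyTranscodingStream{C<:ToyCodec,S<:IO} <: IO
    codec::C
    stream::S
end

# Bytes are transformed lazily as they are read from the wrapped stream.
# (The real package processes buffered chunks rather than single bytes.)
Base.eof(s::ToyTranscodingStream) = eof(s.stream)
Base.read(s::ToyTranscodingStream, ::Type{UInt8}) = transform(s.codec, read(s.stream, UInt8))

# Read every remaining byte using only eof and read(::IO, UInt8).
function readall_bytes(s::IO)
    out = UInt8[]
    while !eof(s)
        push!(out, read(s, UInt8))
    end
    return out
end

s = ToyTranscodingStream(UppercaseCodec(), IOBuffer("hello, world"))
println(String(readall_bytes(s)))  # HELLO, WORLD
```

Because the codec and the wrapped stream are type parameters, the toy wrapper works with any IO (an IOBuffer here, a file in the gzip example above) without knowing what kind of stream it is reading from.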
Codecs are defined in other packages listed below:
| Package | Library | Format | Codec | Stream | Description |
|---|---|---|---|---|---|
| CodecZlib.jl | zlib | RFC1952 | GzipCompression | GzipCompressionStream | Compress data in gzip (.gz) format. |
| | | | GzipDecompression | GzipDecompressionStream | Decompress data in gzip (.gz) format. |
| | | RFC1950 | ZlibCompression | ZlibCompressionStream | Compress data in zlib format. |
| | | | ZlibDecompression | ZlibDecompressionStream | Decompress data in zlib format. |
| | | RFC1951 | DeflateCompression | DeflateCompressionStream | Compress data in deflate format. |
| | | | DeflateDecompression | DeflateDecompressionStream | Decompress data in deflate format. |
| CodecBzip2.jl | bzip2 | | Bzip2Compression | Bzip2CompressionStream | Compress data in bzip2 (.bz2) format. |
| | | | Bzip2Decompression | Bzip2DecompressionStream | Decompress data in bzip2 (.bz2) format. |
| CodecXz.jl | xz | The .xz File Format | XzCompression | XzCompressionStream | Compress data in xz (.xz) format. |
| | | | XzDecompression | XzDecompressionStream | Decompress data in xz (.xz) format. |
| CodecZstd.jl | zstd | Zstandard Compression Format | ZstdCompression | ZstdCompressionStream | Compress data in zstd (.zst) format. |
| | | | ZstdDecompression | ZstdDecompressionStream | Decompress data in zstd (.zst) format. |
Install the packages you need by calling Pkg.add(<package name>) in a Julia session. For example, to read gzip-compressed files, call Pkg.add("CodecZlib") and use GzipDecompression or GzipDecompressionStream. By convention, codec type names match .*(Co|Deco)mpression, and stream type names add the Stream suffix to the codec name (e.g. GzipDecompression and GzipDecompressionStream). All codecs are subtypes of TranscodingStreams.Codec, and all streams are subtypes of Base.IO.
Note that the codec packages depend on TranscodingStreams.jl, not the other way around, so you can install only the codec packages you need without installing all of them. You can also define your own codec the same way these packages do: TranscodingStreams.jl requires a codec to implement a few interface functions, which are described later.
Examples
Read lines from a gzip-compressed file
The following snippet is an example of using CodecZlib.jl, which exports GzipDecompressionStream{S} as an alias of TranscodingStream{GzipDecompression,S} where S<:IO:
```julia
using CodecZlib
stream = GzipDecompressionStream(open("data.txt.gz"))
for line in eachline(stream)
    # do something...
end
close(stream)
```
Note that the final close call closes the underlying file as well. Alternatively, the open(<stream type>, <filepath>) do ... end syntax closes the file automatically when the block exits:
```julia
using CodecZlib
open(GzipDecompressionStream, "data.txt.gz") do stream
    for line in eachline(stream)
        # do something...
    end
end
```
Save a data matrix with Zstd compression
Writing compressed data is just as easy. The one thing to keep in mind is to call close after writing; otherwise, buffered data is not flushed and the output file will be incomplete:
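```julia
using CodecZstd
# writedlm is in Base on Julia 0.6; on Julia 0.7 and later, also add `using DelimitedFiles`
mat = randn(100, 100)
stream = ZstdCompressionStream(open("data.mat.zst", "w"))
writedlm(stream, mat)
close(stream)  # required: flushes buffered data and finishes the compressed stream
```
The open(<stream type>, ...) do ... end form works here as well: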
```julia
using CodecZstd
mat = randn(100, 100)
open(ZstdCompressionStream, "data.mat.zst", "w") do stream
    writedlm(stream, mat)
end
```
Explicitly finish transcoding by writing TOKEN_END
When writing data, the end of a data stream is indicated by calling close, which writes an epilogue if necessary and flushes all buffered data to the underlying I/O stream. If you need to mark the end of the data explicitly without closing the stream, for example to retrieve the compressed bytes from an in-memory buffer, write TranscodingStreams.TOKEN_END to the transcoding stream and then flush it:
```julia
using CodecZstd
using TranscodingStreams
buf = IOBuffer()
stream = ZstdCompressionStream(buf)
write(stream, "foobarbaz"^100, TranscodingStreams.TOKEN_END)  # write the data, then mark its end
flush(stream)                                                 # push all compressed bytes into buf
compressed = take!(buf)
close(stream)
```
Use an identity (no-op) codec
The Identity codec, which passes data through unchanged, is sometimes useful. The following example chooses a decompression codec based on the file extension and falls back to Identity for uncompressed files:
```julia
using CodecZlib
using CodecBzip2
using TranscodingStreams
using TranscodingStreams.CodecIdentity
function makestream(filepath)
    if endswith(filepath, ".gz")
        codec = GzipDecompression()
    elseif endswith(filepath, ".bz2")
        codec = Bzip2Decompression()
    else
        codec = Identity()
    end
    return TranscodingStream(codec, open(filepath))
end
makestream("data.txt.gz")
makestream("data.txt.bz2")
makestream("data.txt")
```
Transcode data in one shot
TranscodingStreams.jl extends the transcode function to transcode data in one shot. transcode takes a codec object as its first argument and a data vector as its second argument:
```julia
using CodecZlib
decompressed = transcode(ZlibDecompression(), b"x\x9cKL*JLNLI\x04R\x00\x19\xf2\x04U")
String(decompressed)
```
API
TranscodingStreams.TranscodingStream — Method.
TranscodingStream(codec::Codec, stream::IO; bufsize::Integer=16384)
Create a transcoding stream with codec and stream.
Examples
```julia
julia> using TranscodingStreams
julia> using CodecZlib
julia> file = open(Pkg.dir("TranscodingStreams", "test", "abra.gzip"));
julia> stream = TranscodingStream(GzipDecompression(), file)
TranscodingStreams.TranscodingStream{CodecZlib.GzipDecompression,IOStream}(<state=idle>)
julia> readstring(stream)
"abracadabra"
```
Base.transcode — Method.
transcode(codec::Codec, data::Vector{UInt8})::Vector{UInt8}
Transcode data by applying codec.
Examples
```julia
julia> using CodecZlib
julia> data = Vector{UInt8}("abracadabra");
julia> compressed = transcode(ZlibCompression(), data);
julia> decompressed = transcode(ZlibDecompression(), compressed);
julia> String(decompressed)
"abracadabra"
```
TranscodingStreams.TOKEN_END — Constant.
A special token indicating the end of data.
TOKEN_END may be written to a transcoding stream, as in write(stream, TOKEN_END); this terminates the current transcoding block.
Call flush(stream) after write(stream, TOKEN_END) to make sure that all data are written to the underlying stream.
Identity()
Create an identity (no-op) codec.
IdentityStream(stream::IO)
Create an identity (no-op) stream.
Defining a new codec
TranscodingStreams.Codec — Type.
An abstract codec type.
Any codec supporting transcoding interfaces must be a subtype of this type.
TranscodingStreams.initialize — Function.
initialize(codec::Codec)::Void
Initialize codec.
TranscodingStreams.finalize — Function.
finalize(codec::Codec)::Void
Finalize codec.
TranscodingStreams.startproc — Function.
startproc(codec::Codec, state::Symbol)::Symbol
Start data processing with codec in the given state.
TranscodingStreams.process — Function.
process(codec::Codec, input::Memory, output::Memory)::Tuple{Int,Int,Symbol}
Process data with codec, reading from input and writing to output.
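The interface functions above are listed only by signature. To show how such a lifecycle can fit together (initialize, repeated process calls, finalize), here is a self-contained toy model in plain Julia that does not use TranscodingStreams.jl. ToyUpper, toy_initialize, toy_finalize, toy_process and toy_transcode are hypothetical names, and plain vectors stand in for Memory. We assume, for this sketch only, that the Tuple{Int,Int,Symbol} returned by process means (bytes consumed, bytes produced, status) and that an empty input signals the end of the data; startproc is omitted.

```julia
# Toy model of the codec lifecycle; NOT the TranscodingStreams.jl API.
# All names here (ToyUpper, toy_initialize, toy_finalize, toy_process,
# toy_transcode) are hypothetical, and plain vectors stand in for `Memory`.

struct ToyUpper end   # a stateless codec that upper-cases ASCII letters

toy_initialize(::ToyUpper) = nothing   # a real codec might allocate resources here
toy_finalize(::ToyUpper) = nothing     # ... and release them here

# Read from `input`, write to `output`, and report (Δin, Δout, status):
# bytes consumed, bytes produced, and :ok or :end.
# Assumed convention for this sketch: an empty `input` means the data has ended.
function toy_process(::ToyUpper, input::AbstractVector{UInt8}, output::AbstractVector{UInt8})
    isempty(input) && return (0, 0, :end)
    n = min(length(input), length(output))   # never write past the output buffer
    for i in 1:n
        b = input[i]
        output[i] = UInt8('a') <= b <= UInt8('z') ? b - 0x20 : b
    end
    return (n, n, :ok)
end

# Drive the codec until it reports :end, using a deliberately small output
# buffer so that several process calls are needed.
function toy_transcode(codec, data::AbstractVector{UInt8}; outbufsize::Int = 4)
    outbufsize >= 1 || throw(ArgumentError("outbufsize must be positive"))
    toy_initialize(codec)
    result = UInt8[]
    outbuf = Vector{UInt8}(undef, outbufsize)
    pos = firstindex(data)
    try
        while true
            input = view(data, pos:lastindex(data))   # bytes not yet consumed
            Δin, Δout, status = toy_process(codec, input, outbuf)
            status == :error && error("codec reported an error")
            append!(result, view(outbuf, 1:Δout))
            pos += Δin
            status == :end && break
        end
    finally
        toy_finalize(codec)   # always release codec resources, even on error
    end
    return result
end

println(String(toy_transcode(ToyUpper(), Vector{UInt8}("abracadabra"))))  # ABRACADABRA
```

Reporting how many bytes were consumed and produced lets the driver work with fixed-size buffers: the codec never needs the whole input or a large enough output at once. In the real package, a codec implements these functions for its own Codec subtype, and TranscodingStream or transcode drives the calls.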