TranscodingStreams.jl
TranscodingStreams.jl is a package for transcoding data streams (e.g. compressing or decompressing them). This package exports a type, TranscodingStream, which is a subtype of IO and supports various I/O operations, just like the other I/O streams in the standard library.
Introduction
TranscodingStream has two type parameters, C<:Codec and S<:IO, so the full type is TranscodingStream{C,S} where {C<:Codec,S<:IO}. It wraps an underlying I/O stream of type S with a codec of type C, and the codec defines the transformation (or transcoding) applied to the data passing through. For example, when C is a lossless decompression codec and S is a file, TranscodingStream{C,S} behaves like a data stream that incrementally decompresses data from the file.
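To make the two type parameters concrete, here is a minimal sketch in plain Julia that does not use TranscodingStreams.jl at all. The names ToyCodec, Upcase, transform, and ToyStream are hypothetical; they only mimic the shape of TranscodingStream{C,S}. The real type buffers data and drives the codec through the interface functions described later, rather than transforming one byte at a time as this toy does.

```julia
# Hypothetical stand-ins that mimic the shape of Codec and TranscodingStream{C,S};
# nothing here comes from TranscodingStreams.jl.
abstract type ToyCodec end

# A toy "codec" that upper-cases ASCII letters and passes every other byte through.
struct Upcase <: ToyCodec end
transform(::Upcase, b::UInt8) = UInt8('a') <= b <= UInt8('z') ? b - 0x20 : b

# The wrapper is parameterized by the codec type C and the wrapped stream type S,
# so each (codec, stream) combination is its own concrete type.
struct ToyStream{C<:ToyCodec,S<:IO} <: IO
    codec::C
    stream::S
end

# Reading one byte pulls a byte from the wrapped stream and passes it through the codec.
Base.eof(s::ToyStream) = eof(s.stream)
Base.read(s::ToyStream, ::Type{UInt8}) = transform(s.codec, read(s.stream, UInt8))
Base.close(s::ToyStream) = close(s.stream)  # closing the wrapper closes the wrapped stream

s = ToyStream(Upcase(), IOBuffer("abracadabra"))
s isa ToyStream{Upcase}  # true: the codec type is part of the stream's type
read(s, String)          # "ABRACADABRA"
close(s)
```

Because ToyStream subtypes IO and defines eof and byte-level read, generic Base functions such as read(s, String) work on it unchanged, which is the same reason TranscodingStream can be used wherever an IO is expected.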
Codecs are defined in other packages listed below:
Package | Library | Format | Codec | Stream | Description |
---|---|---|---|---|---|
CodecZlib.jl | zlib | RFC1952 | GzipCompression | GzipCompressionStream | Compress data in gzip (.gz) format. |
CodecZlib.jl | zlib | RFC1952 | GzipDecompression | GzipDecompressionStream | Decompress data in gzip (.gz) format. |
CodecZlib.jl | zlib | RFC1950 | ZlibCompression | ZlibCompressionStream | Compress data in zlib format. |
CodecZlib.jl | zlib | RFC1950 | ZlibDecompression | ZlibDecompressionStream | Decompress data in zlib format. |
CodecZlib.jl | zlib | RFC1951 | DeflateCompression | DeflateCompressionStream | Compress data in deflate format. |
CodecZlib.jl | zlib | RFC1951 | DeflateDecompression | DeflateDecompressionStream | Decompress data in deflate format. |
CodecBzip2.jl | bzip2 | | Bzip2Compression | Bzip2CompressionStream | Compress data in bzip2 (.bz2) format. |
CodecBzip2.jl | bzip2 | | Bzip2Decompression | Bzip2DecompressionStream | Decompress data in bzip2 (.bz2) format. |
CodecXz.jl | xz | The .xz File Format | XzCompression | XzCompressionStream | Compress data in xz (.xz) format. |
CodecXz.jl | xz | The .xz File Format | XzDecompression | XzDecompressionStream | Decompress data in xz (.xz) format. |
CodecZstd.jl | zstd | Zstandard Compression Format | ZstdCompression | ZstdCompressionStream | Compress data in zstd (.zst) format. |
CodecZstd.jl | zstd | Zstandard Compression Format | ZstdDecompression | ZstdDecompressionStream | Decompress data in zstd (.zst) format. |
Install the packages you need by calling Pkg.add(<package name>) in a Julia session (on Julia 0.7 and later, run using Pkg first). For example, if you want to read gzip-compressed files, call Pkg.add("CodecZlib") to use GzipDecompression or GzipDecompressionStream. By convention, codec type names match the pattern .*(Co|Deco)mpression, and each stream type is named after its codec with a Stream suffix. All codecs are subtypes of TranscodingStreams.Codec and all streams are subtypes of Base.IO.
Importantly, these packages depend on TranscodingStreams.jl and not the other way around, so you can install just the codec packages you need without installing all of them. You can also define your own codec, just as these packages do: TranscodingStreams.jl requires a codec to implement a few interface functions, which are described later.
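The naming convention is regular enough to check mechanically. The two helpers below, is_codec_name and stream_name, are hypothetical (no package exports them) and use only Base Julia; they simply encode the two rules stated above.

```julia
# Hypothetical helpers (not exported by any package) encoding the naming convention.
# Codec type names match .*(Co|Deco)mpression, anchored here to the whole name.
is_codec_name(name::AbstractString) = occursin(r"^.*(Co|Deco)mpression$", name)

# A stream type is named after its codec with a "Stream" suffix.
function stream_name(codec_name::AbstractString)
    is_codec_name(codec_name) || throw(ArgumentError("not a codec name: $codec_name"))
    return codec_name * "Stream"
end

is_codec_name("GzipDecompression")  # true
is_codec_name("Identity")           # false: Identity is a codec outside this pattern
stream_name("ZstdCompression")      # "ZstdCompressionStream"
```

Note that the Identity codec described later keeps the Stream-suffix rule (IdentityStream) even though its codec name does not end in Compression or Decompression.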
Examples
Read lines from a gzip-compressed file
The following snippet uses CodecZlib.jl, which exports GzipDecompressionStream{S} as an alias of TranscodingStream{GzipDecompression,S} where S<:IO:
using CodecZlib
stream = GzipDecompressionStream(open("data.txt.gz"))
for line in eachline(stream)
    # do something...
end
close(stream)
Note that the final close call closes the underlying file as well. Alternatively, the open(<stream type>, <filepath>) do ... end syntax closes the stream, and the file with it, when the block finishes:
using CodecZlib
open(GzipDecompressionStream, "data.txt.gz") do stream
    for line in eachline(stream)
        # do something...
    end
end
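The snippets above assume that data.txt.gz already exists. The self-contained variant below, assuming CodecZlib.jl is installed, first writes a small gzip file with GzipCompressionStream into a temporary directory and then reads it back line by line. The file name and its two lines of text are made up for illustration.

```julia
using CodecZlib

path = joinpath(mktempdir(), "data.txt.gz")

# Write two lines through a compression stream; close flushes the buffered
# compressed data and also closes the underlying file.
stream = GzipCompressionStream(open(path, "w"))
write(stream, "first line\nsecond line\n")
close(stream)

# Read the file back through a decompression stream, one line at a time.
stream = GzipDecompressionStream(open(path))
lines = collect(eachline(stream))
close(stream)

println(lines)  # ["first line", "second line"]
```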
Save a data matrix with Zstd compression
Writing compressed data is just as easy. The one thing to keep in mind is to call close after writing; otherwise, buffered data may never reach the file and the output will be incomplete:
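using CodecZstd
# on Julia 0.7 and later, writedlm comes from DelimitedFiles: using DelimitedFiles
mat = randn(100, 100)
stream = ZstdCompressionStream(open("data.mat.zst", "w"))
writedlm(stream, mat)
close(stream)  # flushes buffered data and closes the file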
The open(<stream type>, ...) do ... end form works here as well:
using CodecZstd
# on Julia 0.7 and later, writedlm comes from DelimitedFiles: using DelimitedFiles
mat = randn(100, 100)
open(ZstdCompressionStream, "data.mat.zst", "w") do stream
    writedlm(stream, mat)
end
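The examples above only write the compressed matrix. A round-trip sketch, assuming CodecZstd.jl and DelimitedFiles are available (DelimitedFiles is a standard library in older Julia releases and a separately installed package in newer ones), reads the file back through ZstdDecompressionStream with readdlm. It writes to a temporary directory instead of the working directory.

```julia
using CodecZstd
using DelimitedFiles  # provides writedlm and readdlm

path = joinpath(mktempdir(), "data.mat.zst")
mat = randn(100, 100)

# Compress while writing; close flushes the remaining data and closes the file.
stream = ZstdCompressionStream(open(path, "w"))
writedlm(stream, mat)
close(stream)

# Decompress while reading; readdlm parses the tab-delimited text into a Matrix{Float64}.
stream = ZstdDecompressionStream(open(path))
mat2 = readdlm(stream)
close(stream)

mat2 == mat  # Julia prints Float64 values in a form that parses back to the same value
```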
Explicitly finish transcoding by writing TOKEN_END
When writing data, the end of a data stream is indicated by calling close, which may write an epilogue if necessary and flushes all buffered data to the underlying I/O stream. If you need to mark the end of the data explicitly without closing the stream, you can write TranscodingStreams.TOKEN_END to the transcoding stream as follows:
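using CodecZstd
using TranscodingStreams
buf = IOBuffer()
stream = ZstdCompressionStream(buf)
# write the data, then TOKEN_END to terminate the current transcoding block
write(stream, "foobarbaz"^100, TranscodingStreams.TOKEN_END)
# push everything still buffered in the stream through to buf
flush(stream)
compressed = take!(buf)
close(stream)  # also closes buf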
Use an identity (no-op) codec
Sometimes the Identity codec, which passes data through unchanged, is useful. The following example creates a decompression stream based on the extension of a filepath, falling back to Identity for files that are not compressed:
using CodecZlib
using CodecBzip2
using TranscodingStreams
using TranscodingStreams.CodecIdentity
function makestream(filepath)
    if endswith(filepath, ".gz")
        codec = GzipDecompression()
    elseif endswith(filepath, ".bz2")
        codec = Bzip2Decompression()
    else
        codec = Identity()
    end
    return TranscodingStream(codec, open(filepath))
end
makestream("data.txt.gz")
makestream("data.txt.bz2")
makestream("data.txt")
Transcode data in one shot
TranscodingStreams.jl extends the transcode function to transcode data in one shot. transcode takes a codec object as its first argument and a data vector as its second argument:
using CodecZlib
decompressed = transcode(ZlibDecompression(), b"x\x9cKL*JLNLI\x04R\x00\x19\xf2\x04U")
String(decompressed)  # "abracadabra"
API
TranscodingStreams.TranscodingStream (Method)
TranscodingStream(codec::Codec, stream::IO; bufsize::Integer=16384)
Create a transcoding stream with codec and stream.
Examples
julia> using TranscodingStreams
julia> using CodecZlib
julia> file = open(Pkg.dir("TranscodingStreams", "test", "abra.gzip"));
julia> stream = TranscodingStream(GzipDecompression(), file)
TranscodingStreams.TranscodingStream{CodecZlib.GzipDecompression,IOStream}(<state=idle>)
julia> readstring(stream)
"abracadabra"
Base.transcode (Method)
transcode(codec::Codec, data::Vector{UInt8})::Vector{UInt8}
Transcode data by applying codec.
Examples
julia> using CodecZlib
julia> data = Vector{UInt8}("abracadabra");
julia> compressed = transcode(ZlibCompression(), data);
julia> decompressed = transcode(ZlibDecompression(), compressed);
julia> String(decompressed)
"abracadabra"
TranscodingStreams.TOKEN_END (Constant)
A special token indicating the end of data.
TOKEN_END may be written to a transcoding stream, as in write(stream, TOKEN_END), which terminates the current transcoding block.
Call flush(stream) after write(stream, TOKEN_END) to make sure that all data are written to the underlying stream.
Identity()
Create an identity (no-op) codec.
IdentityStream(stream::IO)
Create an identity (no-op) stream.
Defining a new codec
TranscodingStreams.Codec (Type)
An abstract codec type.
Any codec supporting the transcoding interface must be a subtype of this type.
TranscodingStreams.initialize (Function)
initialize(codec::Codec)::Void
Initialize codec.
TranscodingStreams.finalize (Function)
finalize(codec::Codec)::Void
Finalize codec.
TranscodingStreams.startproc (Function)
startproc(codec::Codec, state::Symbol)::Symbol
Start data processing with codec, given state.
TranscodingStreams.process (Function)
process(codec::Codec, input::Memory, output::Memory)::Tuple{Int,Int,Symbol}
Do data processing with codec.
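How the four interface functions fit together can be illustrated with a standalone mock in plain Julia. This is not TranscodingStreams.jl's API: plain byte vectors stand in for Memory, and the names CopyCodec, mock_initialize, mock_finalize, mock_startproc, mock_process, and drive are hypothetical. The sketch also assumes that the tuple returned by process means (input bytes consumed, output bytes written, status), and that an empty input signals the end of data, to which the codec answers :end. The copy codec behaves like the Identity codec described above.

```julia
# Standalone mock of the codec interface; it does NOT use TranscodingStreams.jl.
# Assumptions: process returns (Δin, Δout, status), where Δin is the number of input
# bytes consumed, Δout the number of output bytes written, and status is :ok to keep
# going or :end once an empty input signals that no more data will arrive.
abstract type MockCodec end

# Hypothetical copy codec: output equals input, like the Identity codec.
struct CopyCodec <: MockCodec end

mock_initialize(::CopyCodec) = nothing            # nothing to allocate
mock_finalize(::CopyCodec) = nothing              # nothing to release
mock_startproc(::CopyCodec, state::Symbol) = :ok  # no per-run setup needed

function mock_process(::CopyCodec, input::AbstractVector{UInt8}, output::AbstractVector{UInt8})
    isempty(input) && return (0, 0, :end)   # assumed end-of-data signal
    n = min(length(input), length(output))  # copy as much as fits in the output
    copyto!(output, 1, input, 1, n)
    return (n, n, :ok)
end

# Hypothetical driver playing the role of the stream: it calls process repeatedly with
# a small output buffer and collects the results until the codec reports :end.
function drive(codec::MockCodec, data::Vector{UInt8}; bufsize::Int = 4)
    mock_initialize(codec)
    mock_startproc(codec, :start)  # the state symbol is a placeholder in this mock
    out = UInt8[]
    buf = Vector{UInt8}(undef, bufsize)
    pos = 1
    try
        while true
            Δin, Δout, status = mock_process(codec, view(data, pos:lastindex(data)), buf)
            pos += Δin
            append!(out, view(buf, 1:Δout))
            status === :end && break
        end
    finally
        mock_finalize(codec)
    end
    return out
end

String(drive(CopyCodec(), Vector{UInt8}("abracadabra")))  # "abracadabra"
```

The small default output buffer forces several process calls for one input, which is the situation a real codec must handle: it may consume or produce only part of the data per call and report how far it got.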