Reference
TranscodingStream
TranscodingStreams.TranscodingStream — Method
TranscodingStream(codec::Codec, stream::IO;
                  bufsize::Integer=16384,
                  stop_on_end::Bool=false,
                  sharedbuf::Bool=(stream isa TranscodingStream))
Create a transcoding stream with codec and stream.
A TranscodingStream object wraps an input/output stream object stream, and transcodes the byte stream using codec. It is a subtype of IO and supports most of the I/O functions in the standard library.
See the docs (https://bicycle1885.github.io/TranscodingStreams.jl/stable/) for available codecs, examples, and more details of the type.
Arguments
- codec: The data transcoder. The transcoding stream does the initialization and finalization of codec. Therefore, a codec object is not reusable once it is passed to a transcoding stream.
- stream: The wrapped stream. It must be opened before being passed to the constructor.
- bufsize: The initial buffer size (the default size is 16 KiB). The buffer may be extended whenever codec requests it.
- stop_on_end: The flag to stop reading on the :end return code from codec. The transcoded data are readable even after the transcoding process has stopped. With this flag on, stream is not closed when the wrapper stream is closed with close. Note that some extra data may be read from stream into an internal buffer, and thus stream must be a TranscodingStream object and sharedbuf must be true to reuse stream.
- sharedbuf: The flag to share buffers between adjacent transcoding streams. The value must be false if stream is not a TranscodingStream object.
Examples
julia> using TranscodingStreams
julia> file = open(joinpath(dirname(dirname(pathof(TranscodingStreams))), "README.md"));
julia> stream = TranscodingStream(Noop(), file);
julia> readline(stream)
"TranscodingStreams.jl"
julia> close(stream)
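The constructor can also be tried without touching the file system. The following is a minimal sketch that wraps an in-memory IOBuffer with the Noop codec; the sample text and the bufsize value of 64 are arbitrary choices made for illustration, not recommendations.

```julia
using TranscodingStreams

# Wrap an in-memory buffer; the Noop codec passes bytes through unchanged.
# bufsize=64 is an arbitrary small value chosen for this sketch.
stream = TranscodingStream(Noop(), IOBuffer("first line\nsecond line\n"); bufsize=64)

first_line = readline(stream)    # standard IO functions work on the wrapper
rest = read(stream, String)      # read everything that remains

# stop_on_end is off, so closing the wrapper also closes the wrapped IOBuffer.
close(stream)
```

Because the wrapper is a subtype of IO, the same calls should work unchanged when Noop() is replaced with a real codec.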
Base.transcode — Function
transcode(
    ::Type{C},
    data::Union{Vector{UInt8},Base.CodeUnits{UInt8}},
)::Vector{UInt8} where {C<:Codec}
Transcode data by applying a codec C().
Note that this method allocates and deallocates C() on every call, which is handy but less efficient when transcoding a number of objects. transcode(codec, data) is the recommended method in terms of performance.
Examples
julia> using CodecZlib
julia> data = b"abracadabra";
julia> compressed = transcode(ZlibCompressor, data);
julia> decompressed = transcode(ZlibDecompressor, compressed);
julia> String(decompressed)
"abracadabra"
transcode(
    codec::Codec,
    data::Union{Vector{UInt8},Base.CodeUnits{UInt8},Buffer},
    [output::Union{Vector{UInt8},Base.CodeUnits{UInt8},Buffer}],
)::Vector{UInt8}
Transcode data by applying codec.
If output is unspecified, then this method will allocate it.
Note that this method does not initialize or finalize codec. This is efficient when you transcode a number of pieces of data, but you need to call TranscodingStreams.initialize and TranscodingStreams.finalize explicitly.
Examples
julia> using CodecZlib
julia> data = b"abracadabra";
julia> codec = ZlibCompressor();
julia> TranscodingStreams.initialize(codec)
julia> compressed = Vector{UInt8}();
julia> transcode(codec, data, compressed);
julia> TranscodingStreams.finalize(codec)
julia> codec = ZlibDecompressor();
julia> TranscodingStreams.initialize(codec)
julia> decompressed = transcode(codec, compressed);
julia> TranscodingStreams.finalize(codec)
julia> String(decompressed)
"abracadabra"
TranscodingStreams.unsafe_transcode! — Function
unsafe_transcode!(output::Buffer, codec::Codec, input::Buffer)
Transcode input by applying codec and storing the results in output without validation of input or output. Note that this method does not initialize or finalize codec. This is efficient when you transcode a number of pieces of data, but you need to call TranscodingStreams.initialize and TranscodingStreams.finalize explicitly.
TranscodingStreams.transcode! — Function
transcode!(output::Buffer, codec::Codec, input::Buffer)
Transcode input by applying codec and storing the results in output with validation of input and output. Note that this method does not initialize or finalize codec. This is efficient when you transcode a number of pieces of data, but you need to call TranscodingStreams.initialize and TranscodingStreams.finalize explicitly.
TranscodingStreams.TOKEN_END — Constant
A special token indicating the end of data.
TOKEN_END may be written to a transcoding stream like write(stream, TOKEN_END), which will terminate the current transcoding block.
Call flush(stream) after write(stream, TOKEN_END) to make sure that all data are written to the underlying stream.
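The write-then-flush sequence can be sketched as follows. UpperCodec is a hypothetical codec defined only for this illustration (it upper-cases ASCII letters and keeps no internal buffer); with a real compressor the block written to the underlying stream would be a complete compressed frame instead.

```julia
using TranscodingStreams

# Hypothetical codec for this sketch: upper-cases ASCII letters.
struct UpperCodec <: TranscodingStreams.Codec end

function TranscodingStreams.process(::UpperCodec,
                                    input::TranscodingStreams.Memory,
                                    output::TranscodingStreams.Memory,
                                    error::TranscodingStreams.Error)
    # Empty input marks the end of a block; nothing is buffered here.
    input.size == 0 && return (0, 0, :end)
    n = Int(min(input.size, output.size))
    for i in 1:n
        b = input[i]
        output[i] = (0x61 <= b <= 0x7a) ? b - 0x20 : b
    end
    return (n, n, :ok)
end

sink = IOBuffer()
stream = TranscodingStream(UpperCodec(), sink)
write(stream, "hello")
write(stream, TranscodingStreams.TOKEN_END)  # terminate the current block
flush(stream)                                # push all data to `sink`
written = String(take!(sink))
close(stream)
```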
TranscodingStreams.unsafe_read — Function
unsafe_read(input::IO, output::Ptr{UInt8}, nbytes::Int)::Int
Copy at most nbytes from input into output.
This function is similar to Base.unsafe_read but differs in the following ways:
- It does not throw EOFError when it fails to read nbytes from input.
- It returns the number of bytes written to output.
- It does not block if there are buffered data in input.
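A short sketch of the first two differences: the request below asks for more bytes than the input holds, and the call is expected to return the count actually copied rather than throw. The buffer size of 8 and the sample text are arbitrary.

```julia
using TranscodingStreams

input = IOBuffer("abc")          # only 3 bytes are available
output = zeros(UInt8, 8)         # room for up to 8 bytes

# Ask for 8 bytes; GC.@preserve keeps `output` alive while its pointer is used.
nread = GC.@preserve output begin
    TranscodingStreams.unsafe_read(input, pointer(output), length(output))
end
```

The return value tells the caller how much of output is valid, so the unused tail of the array should be ignored.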
TranscodingStreams.unread — Function
unread(stream::TranscodingStream, data::AbstractVector{UInt8})
Insert data at the current reading position of stream.
The next read(stream, sizeof(data)) call will read the data that were just inserted.
data must not alias any internal buffers in stream.
TranscodingStreams.unsafe_unread — Function
unsafe_unread(stream::TranscodingStream, data::Ptr, nbytes::Integer)
Insert nbytes pointed to by data at the current reading position of stream.
The data are copied into the internal buffer and hence data can be safely used after the operation without interfering with the stream.
data must not alias any internal buffers in stream.
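A typical use of unread is peeking at a header and then pushing it back. The sketch below does this on a NoopStream over an in-memory buffer; the sample text is arbitrary, and head is a fresh array returned by read, so it does not alias the stream's internal buffer.

```julia
using TranscodingStreams

stream = NoopStream(IOBuffer("hello world"))

head = read(stream, 5)                    # consume the first five bytes
TranscodingStreams.unread(stream, head)   # push them back at the read position
again = read(stream, 5)                   # the same bytes come out again
rest = read(stream, String)               # the remainder is unaffected
close(stream)
```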
Base.position — Method
position(stream::TranscodingStream)
Return the number of bytes read from or written to stream.
Note that the returned value will be different from that of the underlying stream wrapped by stream. This is because stream buffers some data and the codec may change the length of data.
Base.skip — Function
skip(stream::TranscodingStream, offset)
Read bytes from stream until offset bytes have been read or eof(stream) is reached.
Return stream, discarding read bytes.
This function will not throw an EOFError if eof(stream) is reached before offset bytes can be read.
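The behaviour of position and skip can be sketched on a NoopStream over a ten-byte in-memory buffer. The data and offsets are arbitrary; the stream is put into read mode with an ordinary read first, which is a precaution taken in this sketch rather than a documented requirement.

```julia
using TranscodingStreams

stream = NoopStream(IOBuffer("0123456789"))

first_byte = read(stream, UInt8)   # the stream is now in read mode
skip(stream, 3)                    # discard the next three bytes
pos = position(stream)             # bytes read so far, as seen by the wrapper
chunk = String(read(stream, 2))    # continue reading after the skipped bytes

skip(stream, 100)                  # fewer than 100 bytes remain; no EOFError
at_end = eof(stream)
close(stream)
```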
Statistics
TranscodingStreams.Stats — Type
I/O statistics.
Its object has four fields:
- in: the number of bytes supplied into the stream
- out: the number of bytes consumed out of the stream
- transcoded_in: the number of bytes transcoded from the input buffer
- transcoded_out: the number of bytes transcoded to the output buffer
Note that, since the transcoding stream does buffering, in is transcoded_in + {size of buffered data} and out is transcoded_out - {size of buffered data}.
TranscodingStreams.stats — Function
stats(stream::TranscodingStream)
Create an I/O statistics object of stream.
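As a rough illustration, the statistics can be inspected in the middle of a read. The sketch below uses a NoopStream over an eleven-byte buffer and reads only five bytes, so some data should still be sitting in the internal buffer; how many bytes are buffered at that point depends on the implementation and is not assumed here.

```julia
using TranscodingStreams

stream = NoopStream(IOBuffer("hello world"))
head = read(stream, 5)                   # consume five of the eleven bytes

s = TranscodingStreams.stats(stream)     # snapshot taken while still open
consumed = s.out                         # bytes handed to the reader so far
buffered = s.transcoded_out - s.out      # bytes transcoded but not yet read
close(stream)
```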
Codec
TranscodingStreams.Noop — Type
Noop()
Create a noop codec.
Noop (no operation) is a codec that does nothing. The data read from or written to the stream are kept as-is without any modification. This is often useful as a buffered stream or an identity element of a composition of streams.
The implementations are specialized for this codec. For example, a Noop stream uses only one buffer rather than a pair of buffers, which avoids copying data between two buffers and gives higher throughput than a naive implementation.
TranscodingStreams.NoopStream — Type
NoopStream(stream::IO)
Create a noop stream.
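A minimal sketch of NoopStream used as a plain write buffer in front of another stream. The sink here is an in-memory IOBuffer chosen for illustration; the contents are checked only after flush, since small writes may stay in the internal buffer until then.

```julia
using TranscodingStreams

sink = IOBuffer()
stream = NoopStream(sink)

write(stream, "buffered ")     # may be held in the internal buffer
write(stream, "output")
flush(stream)                  # force buffered bytes into `sink`

written = String(take!(sink))
close(stream)
```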
Base.position — Method
position(stream::NoopStream)
Get the current position of stream.
Note that this method may return a wrong position when
- some data have been inserted by TranscodingStreams.unread, or
- the position of the wrapped stream has been changed outside of this package.
TranscodingStreams.Codec — Type
An abstract codec type.
Any codec supporting the transcoding protocol must be a subtype of this type.
Transcoding protocol
Transcoding proceeds by calling some functions in a specific way. We call this "transcoding protocol" and any codec must implement it as described below.
There are seven functions for a codec to implement:
- expectedsize: return the expected size of transcoded data
- pledgeinsize: tell the codec the total input size
- minoutsize: return the minimum output size of process
- initialize: initialize the codec
- finalize: finalize the codec
- startproc: start processing with the codec
- process: process data with the codec.
These are defined in the TranscodingStreams module, and a new codec type must extend these methods if necessary. Implementing a process method is mandatory but the others are optional. expectedsize, minoutsize, pledgeinsize, initialize, finalize, and startproc have a default implementation.
Your codec type is denoted by C and its object by codec.
Errors that occur in these methods are supposed to be unrecoverable and the stream will go to the panic mode. Only Base.isopen and Base.close are available in that mode.
expectedsize
The expectedsize(codec::C, input::Memory)::Int method takes codec and input, and returns the expected size of transcoded data. This method will be used as a hint to determine the size of a data buffer when transcode is called. A good hint will reduce the number of buffer resizes and hence result in better performance.
pledgeinsize
The pledgeinsize(codec::C, insize::Int64, error::Error)::Symbol method is used when transcode is called to tell the codec the total input size. This is called after startproc and before process. Some compressors can add this total input size to a header, making expectedsize accurate during later decompression. By default this just returns :ok. If there is an error, the return code must be :error and the error argument must be set to an exception object. Setting an inaccurate insize may cause the codec to error later on while processing data. A negative insize means unknown content size.
minoutsize
The minoutsize(codec::C, input::Memory)::Int method takes codec and input, and returns the minimum required size of the output memory when process is called. For example, an encoder of base64 will write at least four bytes to the output and hence it is reasonable to return 4 with this method.
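To make the two size hints concrete, here is a sketch of a hypothetical hexadecimal encoder, which always writes two output bytes per input byte. HexEncoder and HEX_DIGITS are names invented for this example; the codec is deliberately simpler than base64 so that the arithmetic is easy to follow.

```julia
using TranscodingStreams

# Hypothetical codec: encode each byte as two lowercase hex digits.
struct HexEncoder <: TranscodingStreams.Codec end

const HEX_DIGITS = b"0123456789abcdef"

# Hint: the output is exactly twice as long as the input.
TranscodingStreams.expectedsize(::HexEncoder, input::TranscodingStreams.Memory)::Int =
    2 * Int(input.size)

# process needs room for at least one encoded byte, i.e. two output bytes.
TranscodingStreams.minoutsize(::HexEncoder, input::TranscodingStreams.Memory)::Int = 2

function TranscodingStreams.process(::HexEncoder,
                                    input::TranscodingStreams.Memory,
                                    output::TranscodingStreams.Memory,
                                    error::TranscodingStreams.Error)
    input.size == 0 && return (0, 0, :end)
    # Encode only as many bytes as fit into the output.
    n = Int(min(input.size, output.size ÷ 2))
    for i in 1:n
        byte = input[i]
        output[2i - 1] = HEX_DIGITS[(byte >> 4) + 1]
        output[2i] = HEX_DIGITS[(byte & 0x0f) + 1]
    end
    return (n, 2n, :ok)
end

encoded = String(transcode(HexEncoder(), Vector{UInt8}("Hi!")))
```

Without the minoutsize method, the default could hand process an output region of a single byte, in which case this encoder could make no progress.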
initialize
The initialize(codec::C)::Nothing method takes codec and returns nothing. This is called once and only once before starting any data processing. Therefore, you may initialize codec (e.g. allocating memory needed to process data) with this method. If initialization fails for some reason, it may throw an exception and no other methods (including finalize) will be called. Therefore, you need to release the memory before throwing an exception.
finalize
The finalize(codec::C)::Nothing method takes codec and returns nothing. This is called once and only once just before the transcoding stream goes to the close mode (i.e. when Base.close is called) or just after startproc or process throws an exception. Other errors that happen inside the stream (e.g. EOFError) will not call this method. Therefore, you may finalize codec (e.g. freeing memory) with this method. If finalization fails for some reason, it may throw an exception. You should release the allocated memory in codec before returning or throwing an exception in finalize because otherwise nobody can release the memory. Even when an exception is thrown while finalizing a stream, the stream will go to the close mode for safety.
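The lifecycle described for initialize and finalize can be observed with a small sketch. TrackedCodec is a hypothetical pass-through codec whose only purpose is to record when the two hooks run; a real codec would allocate and free native state at those points instead of flipping flags.

```julia
using TranscodingStreams

# Hypothetical pass-through codec that records lifecycle calls.
mutable struct TrackedCodec <: TranscodingStreams.Codec
    initialized::Bool
    finalized::Bool
end
TrackedCodec() = TrackedCodec(false, false)

function TranscodingStreams.initialize(codec::TrackedCodec)
    codec.initialized = true   # a real codec might allocate native state here
    return nothing
end

function TranscodingStreams.finalize(codec::TrackedCodec)
    codec.finalized = true     # a real codec would free that state here
    return nothing
end

function TranscodingStreams.process(::TrackedCodec,
                                    input::TranscodingStreams.Memory,
                                    output::TranscodingStreams.Memory,
                                    error::TranscodingStreams.Error)
    input.size == 0 && return (0, 0, :end)
    n = Int(min(input.size, output.size))
    for i in 1:n
        output[i] = input[i]   # copy bytes unchanged
    end
    return (n, n, :ok)
end

codec = TrackedCodec()
stream = TranscodingStream(codec, IOBuffer("payload"))
after_construct = (codec.initialized, codec.finalized)
data = read(stream, String)
close(stream)
after_close = (codec.initialized, codec.finalized)
```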
startproc
The startproc(codec::C, mode::Symbol, error::Error)::Symbol method takes codec, mode, and error, and returns a status code. This resets the state of the codec and is called before the stream starts processing data. After a call to startproc, pledgeinsize can be optionally called. mode is either :read or :write. The return code must be :ok if codec is ready to process data. Otherwise, it must be :error and the error argument must be set to an exception object.
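The error-reporting convention can be sketched with a hypothetical codec that refuses to run in write mode. ReadOnlyCodec and its error message are invented for this example; the sketch assumes that the exception stored in error is the one the stream raises when it enters the panic mode.

```julia
using TranscodingStreams

# Hypothetical pass-through codec that only supports reading.
struct ReadOnlyCodec <: TranscodingStreams.Codec end

function TranscodingStreams.startproc(::ReadOnlyCodec, mode::Symbol,
                                      error::TranscodingStreams.Error)
    if mode !== :read
        # Report the failure through the error container, not by throwing.
        error[] = ArgumentError("ReadOnlyCodec supports only :read mode")
        return :error
    end
    return :ok
end

function TranscodingStreams.process(::ReadOnlyCodec,
                                    input::TranscodingStreams.Memory,
                                    output::TranscodingStreams.Memory,
                                    error::TranscodingStreams.Error)
    input.size == 0 && return (0, 0, :end)
    n = Int(min(input.size, output.size))
    for i in 1:n
        output[i] = input[i]
    end
    return (n, n, :ok)
end

# Reading works as usual.
reader = TranscodingStream(ReadOnlyCodec(), IOBuffer("abc"))
text = read(reader, String)
close(reader)

# Writing switches the stream to :write mode, which calls startproc.
writer = TranscodingStream(ReadOnlyCodec(), IOBuffer())
caught = try
    write(writer, "data")
    nothing
catch err
    err
end
close(writer)   # close remains available after the failure
```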
process
The process(codec::C, input::Memory, output::Memory, error::Error)::Tuple{Int,Int,Symbol} method takes codec, input, output and error, and returns a consumed data size, a produced data size and a status code. This is called repeatedly while processing data. The input (input) and output (output) data are Memory objects, each of which is a pointer to a contiguous memory region with a size. You must read input data from input, transcode the bytes, and then write the output data to output. Finally you need to return the size of read data, the size of written data, and the :ok status code so that the caller can know how many bytes are consumed and produced in the method. When transcoding reaches the end of a data stream, this method is notified by an empty input. In that case, the method needs to write the buffered data (if any) to output. If there is no data to write, the status code must be set to :end. The process method will be called repeatedly until it returns the :end status code. If an error happens while processing data, the error argument must be set to an exception object and the return code must be :error.
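Putting the protocol together, the following sketch defines a complete codec by implementing only process and relying on the defaults for everything else. XorCodec and the key 0x5a are hypothetical; the codec keeps no internal buffer, so an empty input can be answered with :end immediately.

```julia
using TranscodingStreams

# Hypothetical codec: XOR every byte with a fixed key.
struct XorCodec <: TranscodingStreams.Codec
    key::UInt8
end

function TranscodingStreams.process(codec::XorCodec,
                                    input::TranscodingStreams.Memory,
                                    output::TranscodingStreams.Memory,
                                    error::TranscodingStreams.Error)
    # Empty input signals the end of the data. This codec buffers nothing,
    # so there is nothing left to write and it reports :end.
    input.size == 0 && return (0, 0, :end)
    # Consume as many bytes as fit into the output region.
    n = Int(min(input.size, output.size))
    for i in 1:n
        output[i] = xor(input[i], codec.key)
    end
    # (bytes consumed, bytes produced, status)
    return (n, n, :ok)
end

plain = Vector{UInt8}("transcoding")

# One-shot use through transcode.
masked = transcode(XorCodec(0x5a), plain)

# Streaming use: applying the same key again restores the original bytes.
stream = TranscodingStream(XorCodec(0x5a), IOBuffer(masked))
restored = read(stream, String)
close(stream)
```

A fresh XorCodec is created for the stream because, as noted for the constructor arguments, a codec object is not reusable once a transcoding stream owns it.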
TranscodingStreams.expectedsize — Function
expectedsize(codec::Codec, input::Memory)::Int
Return the expected size of the transcoded input with codec.
The default method returns input.size.
TranscodingStreams.pledgeinsize — Function
pledgeinsize(codec::Codec, insize::Int64, error::Error)::Symbol
Tell the codec the total input size.
The default method does nothing and returns :ok.
TranscodingStreams.minoutsize — Function
minoutsize(codec::Codec, input::Memory)::Int
Return the minimum output size to be ensured when calling process.
The default method returns max(1, div(input.size, 4)).
TranscodingStreams.initialize — Function
initialize(codec::Codec)::Nothing
Initialize codec.
The default method does nothing.
TranscodingStreams.finalize — Function
finalize(codec::Codec)::Nothing
Finalize codec.
The default method does nothing.
TranscodingStreams.startproc — Function
startproc(codec::Codec, mode::Symbol, error::Error)::Symbol
Start data processing with codec in the given mode.
The default method does nothing and returns :ok.
TranscodingStreams.process — Function
process(codec::Codec, input::Memory, output::Memory, error::Error)::Tuple{Int,Int,Symbol}
Do data processing with codec.
There is no default method.
Internal types
TranscodingStreams.Memory — Type
A contiguous memory.
This type works like a Vector.
TranscodingStreams.Error — Type
Container of transcoding error.
An object of this type is used to notify the caller of an exception that happened inside a transcoding method. The error field is undefined at first but will be filled when data processing fails. The error should be set by calling the setindex! method (e.g. error[] = ErrorException("error!")).
TranscodingStreams.State — Type
A mutable state type of transcoding streams.
See Developer's notes for details.