Reference

TranscodingStream

TranscodingStreams.TranscodingStream (Method)
TranscodingStream(codec::Codec, stream::IO;
                  bufsize::Integer=16384,
                  stop_on_end::Bool=false,
                  sharedbuf::Bool=(stream isa TranscodingStream))

Create a transcoding stream with codec and stream.

A TranscodingStream object wraps an input/output stream object stream, and transcodes the byte stream using codec. It is a subtype of IO and supports most of the I/O functions in the standard library.

See the docs (https://bicycle1885.github.io/TranscodingStreams.jl/stable/) for available codecs, examples, and more details of the type.

Arguments

  • codec: The data transcoder. The transcoding stream does the initialization and finalization of codec. Therefore, a codec object is not reusable once it is passed to a transcoding stream.
  • stream: The wrapped stream. It must be opened before being passed to the constructor.
  • bufsize: The initial buffer size (the default size is 16KiB). The buffer may be extended whenever codec requests so.
  • stop_on_end: The flag to stop reading on the :end return code from codec. The transcoded data remain readable even after the transcoding process has stopped. With this flag on, stream is not closed when the wrapper stream is closed with close. Note that some extra data may be read from stream into an internal buffer, and thus stream must be a TranscodingStream object and sharedbuf must be true to reuse stream.
  • sharedbuf: The flag to share buffers between adjacent transcoding streams. The value must be false if stream is not a TranscodingStream object.

Examples

julia> using TranscodingStreams

julia> file = open(joinpath(dirname(dirname(pathof(TranscodingStreams))), "README.md"));

julia> stream = TranscodingStream(Noop(), file);

julia> readline(stream)
"TranscodingStreams.jl"

julia> close(stream)
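
The keyword arguments can be exercised without touching the file system. The following is a minimal sketch, assuming an in-memory IOBuffer as the wrapped stream and an arbitrary bufsize of 64; the variable names are illustrative only.

```julia
using TranscodingStreams

io = IOBuffer("first line\nsecond line\n")

# A small initial buffer; it may be extended if the codec requests more room.
stream = TranscodingStream(Noop(), io; bufsize=64)

first_line = readline(stream)   # read through the wrapper, not through `io`
rest = read(stream, String)     # everything that is left

close(stream)                   # stop_on_end is false, so `io` is closed as well
```

Reading through the wrapper rather than the wrapped stream keeps the internal buffer and the underlying position consistent.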
Base.transcode (Function)
transcode(
    ::Type{C},
    data::Union{Vector{UInt8},Base.CodeUnits{UInt8}},
)::Vector{UInt8} where {C<:Codec}

Transcode data by applying a codec C().

Note that this method allocates and deallocates a C() object on every call, which is convenient but less efficient when transcoding many objects. For better performance, prefer transcode(codec, data).

Examples

julia> using CodecZlib

julia> data = b"abracadabra";

julia> compressed = transcode(ZlibCompressor, data);

julia> decompressed = transcode(ZlibDecompressor, compressed);

julia> String(decompressed)
"abracadabra"
transcode(
    codec::Codec,
    data::Union{Vector{UInt8},Base.CodeUnits{UInt8},Buffer},
    [output::Union{Vector{UInt8},Base.CodeUnits{UInt8},Buffer}],
)::Vector{UInt8}

Transcode data by applying codec.

If output is unspecified, then this method will allocate it.

Note that this method does not initialize or finalize codec. This is efficient when you transcode a number of pieces of data, but you need to call TranscodingStreams.initialize and TranscodingStreams.finalize explicitly.

Examples

julia> using CodecZlib

julia> data = b"abracadabra";

julia> codec = ZlibCompressor();

julia> TranscodingStreams.initialize(codec)

julia> compressed = Vector{UInt8}();

julia> transcode(codec, data, compressed);

julia> TranscodingStreams.finalize(codec)

julia> codec = ZlibDecompressor();

julia> TranscodingStreams.initialize(codec)

julia> decompressed = transcode(codec, compressed);

julia> TranscodingStreams.finalize(codec)

julia> String(decompressed)
"abracadabra"
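
Reusing one codec object pays off when several pieces of data are transcoded in a row. The sketch below assumes CodecZlib is installed; the pieces vector and the try/finally layout are illustrative choices, not part of the package.

```julia
using CodecZlib
using TranscodingStreams

pieces = [b"alpha", b"beta", b"gamma"]

codec = ZlibCompressor()
TranscodingStreams.initialize(codec)
compressed = try
    # The same codec object is used for every piece.
    [transcode(codec, piece) for piece in pieces]
finally
    # Finalize the codec even if one of the calls throws.
    TranscodingStreams.finalize(codec)
end

# Check the round trip with the convenience method that manages its own codec.
roundtrip = [String(transcode(ZlibDecompressor, c)) for c in compressed]
```

The try/finally block is one way to make sure finalize runs exactly once, whatever happens in between.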
TranscodingStreams.TOKEN_END (Constant)

A special token indicating the end of data.

TOKEN_END may be written to a transcoding stream like write(stream, TOKEN_END), which will terminate the current transcoding block.

Note

Call flush(stream) after write(stream, TOKEN_END) to make sure that all data are written to the underlying stream.
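
A short sketch of the write-then-flush pattern described in the note, assuming CodecZlib is installed and an in-memory IOBuffer as the sink:

```julia
using CodecZlib
using TranscodingStreams

buf = IOBuffer()
stream = ZlibCompressorStream(buf)

# Write the payload and terminate the current transcoding block.
write(stream, "some data", TranscodingStreams.TOKEN_END)
flush(stream)               # push the finished block into `buf`

compressed = take!(buf)     # collect the bytes before closing anything
close(stream)

restored = String(transcode(ZlibDecompressor, compressed))
```

Taking the bytes before close(stream) matters here because closing the wrapper also closes the wrapped IOBuffer.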

TranscodingStreams.unsafe_read (Function)
unsafe_read(input::IO, output::Ptr{UInt8}, nbytes::Int)::Int

Copy at most nbytes from input into output.

This function is similar to Base.unsafe_read but is different in some points:

  • It does not throw EOFError when it fails to read nbytes from input.
  • It returns the number of bytes written to output.
  • It does not block if there are buffered data in input.
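
The differences listed above can be seen by asking for more bytes than the input holds. This is a sketch under the assumption that a NoopStream over a three-byte IOBuffer is the input; the output array and its size are arbitrary.

```julia
using TranscodingStreams

stream = NoopStream(IOBuffer("abc"))
out = zeros(UInt8, 8)       # room for more bytes than the input provides

# Request up to 8 bytes. Fewer may be returned, and no EOFError is thrown.
n = GC.@preserve out TranscodingStreams.unsafe_read(stream, pointer(out), length(out))

received = out[1:n]         # only the first `n` bytes of `out` are meaningful
close(stream)
```

Because the return value may be smaller than the request, callers that need an exact byte count have to loop until enough data has arrived or eof is reached.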
TranscodingStreams.unread (Function)
unread(stream::TranscodingStream, data::AbstractVector{UInt8})

Insert data at the current reading position of stream.

The next read(stream, sizeof(data)) call will read the data that were just inserted.

data must not alias any internal buffers in stream.
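
A typical use is peeking at a prefix and then pushing it back. The sketch below assumes a NoopStream over an in-memory buffer; the "header:payload" content is made up for illustration.

```julia
using TranscodingStreams

stream = NoopStream(IOBuffer("header:payload"))

peeked = read(stream, 7)                     # consume "header:" into a fresh vector
TranscodingStreams.unread(stream, peeked)    # push the same bytes back

everything = read(stream, String)            # the header is read again, then the rest
close(stream)
```

The vector returned by read is a fresh copy, so it does not alias the stream's internal buffers.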

TranscodingStreams.unsafe_unread (Function)
unsafe_unread(stream::TranscodingStream, data::Ptr, nbytes::Integer)

Insert nbytes bytes pointed to by data at the current reading position of stream.

The data are copied into the internal buffer, and hence data can be safely used after the operation without interfering with the stream.

data must not alias any internal buffers in stream.

Base.position (Method)
position(stream::TranscodingStream)

Return the number of bytes read from or written to stream.

Note that the returned value will be different from that of the underlying stream wrapped by stream. This is because stream buffers some data and the codec may change the length of data.

Base.skip (Function)
skip(stream::TranscodingStream, offset)

Read bytes from stream until offset bytes have been read or eof(stream) is reached.

Return stream, discarding read bytes.

This function will not throw an EOFError if eof(stream) is reached before offset bytes can be read.
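
As a small illustration of the EOF behavior, the following sketch assumes a NoopStream over an in-memory buffer and skips past the end on purpose:

```julia
using TranscodingStreams

stream = NoopStream(IOBuffer("hello world"))

skip(stream, 6)                 # discard "hello "
tail = read(stream, String)     # read whatever is left

skip(stream, 100)               # more than remains: stops at EOF without an EOFError
at_end = eof(stream)
close(stream)
```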


Statistics

TranscodingStreams.Stats (Type)

I/O statistics.

An object of this type has four fields:

  • in: the number of bytes supplied into the stream
  • out: the number of bytes consumed out of the stream
  • transcoded_in: the number of bytes transcoded from the input buffer
  • transcoded_out: the number of bytes transcoded to the output buffer

Note that, since the transcoding stream does buffering, in is transcoded_in + {size of buffered data} and out is transcoded_out - {size of buffered data}.
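
The counters can be inspected with TranscodingStreams.stats. The following sketch assumes CodecZlib is installed and reads a decompression stream to the end before querying it; no particular values are claimed for the compressed side.

```julia
using CodecZlib
using TranscodingStreams

data = Vector{UInt8}("abracadabra"^100)
compressed = transcode(ZlibCompressor, data)

stream = ZlibDecompressorStream(IOBuffer(compressed))
restored = read(stream)

# Query the counters while the stream is still open.
s = TranscodingStreams.stats(stream)
println("in = ", s.in, ", out = ", s.out)
println("transcoded_in = ", s.transcoded_in, ", transcoded_out = ", s.transcoded_out)

close(stream)
```

Once everything has been read, no data should remain buffered, so in and out are expected to match transcoded_in and transcoded_out respectively.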


Codec

TranscodingStreams.Noop (Type)
Noop()

Create a noop codec.

Noop (no operation) is a codec that does nothing. The data read from or written to the stream are kept as-is without any modification. This is often useful as a buffered stream or an identity element of a composition of streams.

The implementations are specialized for this codec. For example, a Noop stream uses only one buffer rather than a pair of buffers. This avoids copying data between two buffers, so the throughput is higher than that of a naive implementation.

Base.position (Method)
position(stream::NoopStream)

Get the current position of stream.

Note that this method may return a wrong position when

  • some data have been inserted by TranscodingStreams.unread, or
  • the position of the wrapped stream has been changed outside of this package.
TranscodingStreams.Codec (Type)

An abstract codec type.

Any codec supporting the transcoding protocol must be a subtype of this type.

Transcoding protocol

Transcoding proceeds by calling some functions in a specific way. We call this "transcoding protocol" and any codec must implement it as described below.

There are seven functions for a codec to implement:

  • expectedsize: return the expected size of transcoded data
  • pledgeinsize: tell the codec the total input size
  • minoutsize: return the minimum output size of process
  • initialize: initialize the codec
  • finalize: finalize the codec
  • startproc: start processing with the codec
  • process: process data with the codec.

These are defined in the TranscodingStreams module, and a new codec type must extend these methods if necessary. Implementing a process method is mandatory but the others are optional: expectedsize, minoutsize, pledgeinsize, initialize, finalize, and startproc have default implementations.

Your codec type is denoted by C and its object by codec.

Errors that occur in these methods are supposed to be unrecoverable, and the stream will enter panic mode. Only Base.isopen and Base.close are available in that mode.

expectedsize

The expectedsize(codec::C, input::Memory)::Int method takes codec and input, and returns the expected size of transcoded data. This method will be used as a hint to determine the size of a data buffer when transcode is called. A good hint will reduce the number of buffer resizes and hence result in better performance.

pledgeinsize

The pledgeinsize(codec::C, insize::Int64, error::Error)::Symbol method is used when transcode is called to tell the codec the total input size. This is called after startproc and before process. Some compressors can add this total input size to a header, making expectedsize accurate during later decompression. By default this just returns :ok. If there is an error, the return code must be :error and the error argument must be set to an exception object. Setting an inaccurate insize may cause the codec to error later on while processing data. A negative insize means unknown content size.

minoutsize

The minoutsize(codec::C, input::Memory)::Int method takes codec and input, and returns the minimum required size of the output memory when process is called. For example, an encoder of base64 will write at least four bytes to the output and hence it is reasonable to return 4 with this method.
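
The two size hints matter most when the output length differs from the input length. The sketch below uses a hypothetical HexEncoder codec (not part of the package) that writes two hexadecimal digits per input byte, so it asks for at least two bytes of output and expects twice the input size.

```julia
using TranscodingStreams

struct HexEncoder <: TranscodingStreams.Codec end   # hypothetical codec

const HEX_DIGITS = Vector{UInt8}("0123456789abcdef")

# Every input byte becomes two output bytes.
TranscodingStreams.expectedsize(::HexEncoder, input::TranscodingStreams.Memory) =
    2 * Int(input.size)

# process needs room for at least one encoded byte (two digits).
TranscodingStreams.minoutsize(::HexEncoder, ::TranscodingStreams.Memory) = 2

function TranscodingStreams.process(::HexEncoder,
                                    input::TranscodingStreams.Memory,
                                    output::TranscodingStreams.Memory,
                                    error::TranscodingStreams.Error)
    input.size == 0 && return 0, 0, :end       # empty input marks the end of data
    n = Int(min(input.size, output.size ÷ 2))  # bytes that fit into the output
    for i in 1:n
        b = input[i]
        output[2i - 1] = HEX_DIGITS[(b >> 4) + 1]
        output[2i] = HEX_DIGITS[(b & 0x0f) + 1]
    end
    return n, 2n, :ok
end

encoded = String(transcode(HexEncoder, b"Hi"))
```

Returning 2 from minoutsize is what guarantees that process can always make progress on a non-empty input.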

initialize

The initialize(codec::C)::Nothing method takes codec and returns nothing. This is called once and only once before starting any data processing. Therefore, you may initialize codec (e.g. allocating memory needed to process data) with this method. If initialization fails for some reason, it may throw an exception and no other methods (including finalize) will be called. Therefore, you need to release the memory before throwing an exception.

finalize

The finalize(codec::C)::Nothing method takes codec and returns nothing. This is called once and only once just before the transcoding stream enters close mode (i.e. when Base.close is called) or just after startproc or process throws an exception. Other errors that happen inside the stream (e.g. EOFError) will not call this method. Therefore, you may finalize codec (e.g. freeing memory) with this method. If finalization fails for some reason, it may throw an exception. You should release the memory allocated in codec before returning or throwing an exception in finalize because otherwise nobody can release it. Even when an exception is thrown while finalizing a stream, the stream will enter close mode for safety.

startproc

The startproc(codec::C, mode::Symbol, error::Error)::Symbol method takes codec, mode, and error, and returns a status code. This resets the state of the codec and is called before the stream starts processing data. After a call to startproc, pledgeinsize can be optionally called. mode is either :read or :write. The return code must be :ok if codec is ready to process data. Otherwise, it must be :error and the error argument must be set to an exception object.
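
To illustrate the reset role of startproc, the sketch below defines a hypothetical CountingCopy codec (not part of the package) that copies bytes unchanged and counts them. The counter is cleared in startproc, so each transcode call starts from zero.

```julia
using TranscodingStreams

mutable struct CountingCopy <: TranscodingStreams.Codec   # hypothetical codec
    count::Int
end
CountingCopy() = CountingCopy(0)

# Reset per-run state; :ok tells the caller the codec is ready.
function TranscodingStreams.startproc(codec::CountingCopy, mode::Symbol,
                                      error::TranscodingStreams.Error)
    codec.count = 0
    return :ok
end

function TranscodingStreams.process(codec::CountingCopy,
                                    input::TranscodingStreams.Memory,
                                    output::TranscodingStreams.Memory,
                                    error::TranscodingStreams.Error)
    input.size == 0 && return 0, 0, :end
    n = Int(min(input.size, output.size))
    for i in 1:n
        output[i] = input[i]
    end
    codec.count += n
    return n, n, :ok
end

codec = CountingCopy()
TranscodingStreams.initialize(codec)    # default implementation
transcode(codec, b"abc")
first_count = codec.count
transcode(codec, b"hello")
second_count = codec.count              # counts only the second run
TranscodingStreams.finalize(codec)      # default implementation
```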

process

The process(codec::C, input::Memory, output::Memory, error::Error)::Tuple{Int,Int,Symbol} method takes codec, input, output and error, and returns the consumed data size, the produced data size, and a status code. This is called repeatedly while processing data. The input (input) and output (output) data are Memory objects, each of which is a pointer to a contiguous memory region with a size. You must read input data from input, transcode the bytes, and then write the output data to output. Finally, you need to return the size of the read data, the size of the written data, and the :ok status code so that the caller knows how many bytes were consumed and produced by the method. When transcoding reaches the end of a data stream, this method is notified by an empty input. In that case, the method needs to write the buffered data (if any) to output. If there is no data to write, the status code must be set to :end. The process method will be called repeatedly until it returns the :end status code. If an error happens while processing data, the error argument must be set to an exception object and the return code must be :error.
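
Since process is the only mandatory method, a stateless codec can be very small. The following is a minimal sketch, assuming a hypothetical Rot13 codec and helper function rot13 (neither is part of the package); all other protocol functions fall back to their defaults.

```julia
using TranscodingStreams

struct Rot13 <: TranscodingStreams.Codec end   # hypothetical codec

# Rotate ASCII letters by 13 places; leave every other byte untouched.
function rot13(b::UInt8)
    if UInt8('a') <= b <= UInt8('z')
        return UInt8('a') + (b - UInt8('a') + 0x0d) % 0x1a
    elseif UInt8('A') <= b <= UInt8('Z')
        return UInt8('A') + (b - UInt8('A') + 0x0d) % 0x1a
    else
        return b
    end
end

function TranscodingStreams.process(::Rot13,
                                    input::TranscodingStreams.Memory,
                                    output::TranscodingStreams.Memory,
                                    error::TranscodingStreams.Error)
    # Empty input signals the end of data; nothing is buffered, so finish.
    if input.size == 0
        return 0, 0, :end
    end
    n = Int(min(input.size, output.size))   # process only what fits
    for i in 1:n
        output[i] = rot13(input[i])
    end
    return n, n, :ok                        # consumed, produced, status
end

encoded = String(transcode(Rot13, b"Hello, World!"))

# The same codec type also works through a stream.
stream = TranscodingStream(Rot13(), IOBuffer(encoded))
decoded = read(stream, String)
close(stream)
```

Processing only min(input.size, output.size) bytes per call is acceptable because the caller keeps calling process until the input is exhausted and :end is returned.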

TranscodingStreams.expectedsize (Function)
expectedsize(codec::Codec, input::Memory)::Int

Return the expected size of the transcoded input with codec.

The default method returns input.size.

TranscodingStreams.pledgeinsize (Function)
pledgeinsize(codec::Codec, insize::Int64, error::Error)::Symbol

Tell the codec the total input size.

The default method does nothing and returns :ok.

TranscodingStreams.minoutsize (Function)
minoutsize(codec::Codec, input::Memory)::Int

Return the minimum output size to be ensured when calling process.

The default method returns max(1, div(input.size, 4)).

TranscodingStreams.startproc (Function)
startproc(codec::Codec, mode::Symbol, error::Error)::Symbol

Start data processing with codec in the given mode.

The default method does nothing and returns :ok.

TranscodingStreams.process (Function)
process(codec::Codec, input::Memory, output::Memory, error::Error)::Tuple{Int,Int,Symbol}

Do data processing with codec.

There is no default method.


Internal types

TranscodingStreams.Error (Type)

Container for a transcoding error.

An object of this type is used to notify the caller of an exception that happened inside a transcoding method. The error field is undefined at first but will be filled when data processing fails. The error should be set by calling the setindex! method (e.g. error[] = ErrorException("error!")).
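
The reporting pattern can be sketched with a hypothetical AsciiOnly codec (not part of the package) that copies bytes and rejects anything above 0x7f. It assumes that the caller rethrows the stored exception when process returns :error.

```julia
using TranscodingStreams

struct AsciiOnly <: TranscodingStreams.Codec end   # hypothetical codec

function TranscodingStreams.process(::AsciiOnly,
                                    input::TranscodingStreams.Memory,
                                    output::TranscodingStreams.Memory,
                                    error::TranscodingStreams.Error)
    input.size == 0 && return 0, 0, :end
    n = Int(min(input.size, output.size))
    for i in 1:n
        b = input[i]
        if b > 0x7f
            # Store the exception in the container and report :error.
            error[] = ErrorException("non-ASCII byte found")
            return 0, 0, :error
        end
        output[i] = b
    end
    return n, n, :ok
end

passed = String(transcode(AsciiOnly, b"plain"))

# Invalid input: capture whatever exception reaches the caller.
caught = try
    transcode(AsciiOnly, UInt8[0x61, 0xff])
    nothing
catch err
    err
end
```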
