Converter
Added in version 2.24.
- class Converter(*args, **kwargs)
Implementations: CharsetConverter, ZlibCompressor, ZlibDecompressor
GConverter is an interface for streaming conversions. GConverter is implemented by objects that convert binary data in various ways. The conversion can be stateful and may fail at any point.
Some example conversions are: character set conversion, compression, decompression and regular expression replace.
Methods
- class Converter
- convert(inbuf: Sequence[int], outbuf: Sequence[int], flags: ConverterFlags) -> tuple[ConverterResult, int, int]
This is the main operation used when converting data. It is meant to be called multiple times in a loop, and each call does some work: it produces some output (in outbuf), consumes some input (from inbuf), or both. If it is not possible to do any work, an error is returned.

Note that a single call may not consume all input (or any input at all). A call may also produce output even when given no input, because of state stored in the converter.

If any data was produced or consumed and an error then happens, only the successful conversion is reported, and the error is returned on the next call.
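As a minimal illustration of partial consumption (plain Python, not GIO), `hex_convert` below is a hypothetical stand-in for a single convert() call. It hex-encodes as much input as fits into a fixed-size outbuf and reports (bytes_read, bytes_written), so the caller has to keep the unconsumed input and call again.

```python
def hex_convert(inbuf: bytes, outbuf: bytearray) -> tuple[int, int]:
    """Hypothetical converter step: hex-encode as much of inbuf as fits in outbuf.

    Each input byte needs 2 output bytes, so a short outbuf means only a
    prefix of inbuf is consumed. Returns (bytes_read, bytes_written).
    """
    n = min(len(inbuf), len(outbuf) // 2)
    encoded = inbuf[:n].hex().encode("ascii")
    outbuf[: len(encoded)] = encoded
    return n, len(encoded)


def convert_all(data: bytes, outbuf_size: int = 5) -> bytes:
    """Call hex_convert repeatedly, feeding it whatever it has not consumed yet."""
    outbuf = bytearray(outbuf_size)
    result = bytearray()
    pos = 0
    while pos < len(data):
        read, written = hex_convert(data[pos:], outbuf)
        if read == 0 and written == 0:
            # In GIO terms this would be a NO_SPACE error.
            raise ValueError("no progress possible: outbuf too small")
        pos += read  # drop only the consumed prefix
        result += outbuf[:written]
    return bytes(result)


print(convert_all(b"GIO"))  # each call consumes at most 2 input bytes
```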
A full conversion loop involves calling this method repeatedly, each time giving it new input and output space. When there is no more input data after the data in inbuf, the flag INPUT_AT_END must be set. Unless an error occurs, the loop returns CONVERTED each time until all data is consumed and all output is produced, and then returns FINISHED instead. Note that FINISHED may be returned even if INPUT_AT_END is not set, for instance in a decompression converter where the end of the data is detectable from the data itself (and there might even be other data after the end of the compressed data).

When some data has been successfully converted, bytes_read is set to the number of bytes read from inbuf, and bytes_written is set to the number of bytes written to outbuf; in Python, both come back together with the result as the tuple (result, bytes_read, bytes_written). If there is more data to output or consume (i.e. unless INPUT_AT_END is specified), CONVERTED is returned; if no more data is to be output, FINISHED is returned.

On error, ERROR is returned and error is set accordingly (in the Python bindings, the error is raised as a GLib.Error). Some errors need special handling:
- NO_SPACE is returned if there is not enough space to write the resulting converted data. The application should call the function again with a larger outbuf to continue.
- PARTIAL_INPUT is returned if there is not enough input to fully determine what the conversion should produce and the INPUT_AT_END flag is not set. This happens, for example, with an incomplete multibyte sequence when converting text, or when a regexp matches up to the end of the input (and may match further input). It may also happen when inbuf is empty (inbuf_size is zero) and there is no more data to produce.

When PARTIAL_INPUT happens, the application should read more input and then call the function again. If further input shows that there is no more data, call the function again with the same data but with the INPUT_AT_END flag set. This may cause the conversion to finish, as in the regexp match case, or to fail again with PARTIAL_INPUT, as in a charset conversion where the input is actually partial.

After convert() has returned FINISHED, the converter object is in an invalid state where it is not allowed to call convert() anymore. At this point you can only free the object or call reset() to return it to its initial state.

If the flag FLUSH is set, the conversion is modified to try to write out all internal state to the output. The application has to call the function multiple times with the flag set; when the available input has been consumed and all internal state has been produced, FLUSHED (or FINISHED if really at the end) is returned instead of CONVERTED. This is similar to what happens at the end of the input stream, but done in the middle of the data.

Flushing means different things for different conversions. For instance, in a compression converter it means writing all the compression state to the output, so that decompressing the compressed data gives back all the input data so far. Doing this may make the final file larger due to padding, though. Another example is a regexp conversion where the flushed data ends in a match but a longer match is still possible: without flushing, the converter would ask for more input, but when flushing it treats this as the end of input and performs the match.
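The compression case can be illustrated by analogy with Python's standard zlib module rather than GIO's ZlibCompressor. Here `zlib.Z_SYNC_FLUSH` plays the role of a flush request: after it, everything fed in so far can be decompressed even though the stream has not ended, at the cost of a small marker added to the output.

```python
import zlib

compressor = zlib.compressobj()
decompressor = zlib.decompressobj()

first = b"hello, " * 20
part = compressor.compress(first)  # zlib may keep all of this buffered internally
part += compressor.flush(zlib.Z_SYNC_FLUSH)  # push pending state out mid-stream

# Everything written so far decompresses to all the input so far,
# even though the compressed stream has not been finished.
assert decompressor.decompress(part) == first

# Compression can continue after the flush; Z_FINISH ends the stream.
rest = compressor.compress(b"world") + compressor.flush(zlib.Z_FINISH)
assert decompressor.decompress(rest) == b"world"
assert decompressor.eof
```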
Flushing is not always possible (for example, when a charset converter flushes at a partial multibyte sequence). Converters are supposed to produce as much output as possible and then return an error (typically PARTIAL_INPUT).

Added in version 2.24.
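The partial multibyte case can be illustrated with Python's standard incremental decoder (an analogy, not GIO's CharsetConverter). An incomplete UTF-8 sequence produces no output until more bytes arrive, and if the input really ends there, decoding fails, much like a converter reporting PARTIAL_INPUT.

```python
import codecs

decoder = codecs.getincrementaldecoder("utf-8")()

# "é" is two bytes in UTF-8 (0xC3 0xA9). Feed "caf" plus only the first byte.
text = decoder.decode(b"caf\xc3")
print(repr(text))  # 'caf': the lone 0xC3 is held back, nothing is guessed

# Once the second byte arrives, the held-back character can be produced.
text += decoder.decode(b"\xa9")
print(repr(text))  # 'café'

# If the input truly ends mid-sequence, the decoder has to fail instead.
truncated = codecs.getincrementaldecoder("utf-8")()
truncated.decode(b"caf\xc3")
try:
    truncated.decode(b"", final=True)
except UnicodeDecodeError as exc:
    print("incomplete input:", exc.reason)
```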
- Parameters:
inbuf – the buffer containing the data to convert.
outbuf – a buffer to write converted data in.
flags – a ConverterFlags value controlling the conversion details
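The calling loop described for convert() can be sketched in plain Python. Everything here is hypothetical and independent of GIO: ConverterResult and ConverterFlags only mirror the documented names, ConverterError stands in for the raised error (with NO_SPACE and PARTIAL_INPUT as string codes), and CrlfToLf is an invented stateful converter that rewrites CR LF as LF and must wait for more input when a chunk ends in CR. The run() function shows the caller's side: it retries with a larger outbuf on NO_SPACE, fetches more input on PARTIAL_INPUT, and retries the same data with INPUT_AT_END once input runs out.

```python
import enum


class ConverterResult(enum.Enum):
    # Stand-in mirroring the names of Gio.ConverterResult.
    ERROR = 0
    CONVERTED = 1
    FINISHED = 2
    FLUSHED = 3


class ConverterFlags(enum.IntFlag):
    # Stand-in mirroring the names of Gio.ConverterFlags.
    NONE = 0
    INPUT_AT_END = 1
    FLUSH = 2


class ConverterError(Exception):
    # Hypothetical stand-in for a GLib.Error carrying an IOErrorEnum code.
    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code  # "NO_SPACE" or "PARTIAL_INPUT"


class CrlfToLf:
    """Hypothetical stateful converter that rewrites CR LF as LF."""

    def __init__(self) -> None:
        self.finished = False

    def reset(self) -> None:
        self.finished = False

    def convert(self, inbuf: bytes, outbuf: bytearray, flags: ConverterFlags):
        if self.finished:
            raise RuntimeError("convert() after FINISHED; call reset() first")
        at_end = bool(flags & ConverterFlags.INPUT_AT_END)
        flush = bool(flags & ConverterFlags.FLUSH)
        read = written = 0
        stop = None
        while read < len(inbuf):
            byte, step = inbuf[read], 1
            if byte == 0x0D:  # CR: decide based on the following byte
                if read + 1 < len(inbuf):
                    if inbuf[read + 1] == 0x0A:
                        byte, step = 0x0A, 2
                elif not (at_end or flush):
                    stop = "PARTIAL_INPUT"  # a trailing CR may start CR LF
                    break
            if written == len(outbuf):
                stop = "NO_SPACE"
                break
            outbuf[written] = byte
            written += 1
            read += step
        if stop is not None:
            if read == 0 and written == 0:
                raise ConverterError(stop)  # no progress: report the error now
            return ConverterResult.CONVERTED, read, written  # error comes next call
        if at_end:
            self.finished = True
            return ConverterResult.FINISHED, read, written
        if flush:
            return ConverterResult.FLUSHED, read, written
        if not inbuf:
            raise ConverterError("PARTIAL_INPUT")  # nothing to do without input
        return ConverterResult.CONVERTED, read, written


def run(conv, chunks, outbuf_size=2):
    """Drive conv over an iterable of input chunks until it returns FINISHED."""
    chunks = iter(chunks)
    outbuf = bytearray(outbuf_size)
    pending, out, at_end = b"", bytearray(), False
    while True:
        flags = ConverterFlags.INPUT_AT_END if at_end else ConverterFlags.NONE
        try:
            result, read, written = conv.convert(pending, outbuf, flags)
        except ConverterError as err:
            if err.code == "NO_SPACE":
                outbuf = bytearray(max(1, 2 * len(outbuf)))  # retry, larger outbuf
                continue
            if err.code == "PARTIAL_INPUT" and not at_end:
                chunk = next(chunks, None)
                if chunk is None:
                    at_end = True  # retry the same data with INPUT_AT_END
                else:
                    pending += chunk  # retry with more input
                continue
            raise
        pending = pending[read:]  # keep whatever was not consumed
        out += outbuf[:written]
        if result is ConverterResult.FINISHED:
            return bytes(out)


print(run(CrlfToLf(), [b"one\r", b"\ntwo\r\n"]))  # prints b'one\ntwo\n'
```

Keeping the unconsumed tail (pending[read:]) between calls is what lets the caller cope with a converter that consumes only part of its input. After run() returns, the converter has returned FINISHED, so it would need reset() before being reused.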
Virtual Methods
- class Converter
- do_convert(inbuf: Sequence[int] | None, outbuf: Sequence[int], flags: ConverterFlags) -> tuple[ConverterResult, int, int]
This is the virtual method behind convert(). Implementations override it to perform the conversion, and it follows the same contract as convert(), described above. In brief, an implementation should:
- do some work on each call, writing into outbuf and/or consuming from inbuf, and return (result, bytes_read, bytes_written);
- if it made progress before hitting an error, report only the progress and return the error on the next call;
- report NO_SPACE when outbuf is too small, and PARTIAL_INPUT when more input is needed and INPUT_AT_END is not set;
- return FINISHED once all input is consumed and all output is produced at the end of the stream, and FLUSHED once a FLUSH request has been satisfied.
Note that inbuf may be None.

Added in version 2.24.
- Parameters:
inbuf – the buffer containing the data to convert.
outbuf – a buffer to write converted data in.
flags – a ConverterFlags value controlling the conversion details
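For the implementer's side, here is a plain-Python sketch of the logic a do_convert() override might contain. AsciiOnly, ConverterError and its string codes are hypothetical, and the two enums only mirror the documented names; how PyGObject registers such a class and marshals outbuf is not shown and has not been verified here. The point is the reporting rule: when progress was made before a problem, return CONVERTED for that progress and let the error surface on the next call.

```python
import enum


class ConverterResult(enum.Enum):
    # Stand-in mirroring the names of Gio.ConverterResult.
    ERROR = 0
    CONVERTED = 1
    FINISHED = 2
    FLUSHED = 3


class ConverterFlags(enum.IntFlag):
    # Stand-in mirroring the names of Gio.ConverterFlags.
    NONE = 0
    INPUT_AT_END = 1
    FLUSH = 2


class ConverterError(Exception):
    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code  # hypothetical codes: "NO_SPACE", "INVALID_DATA", ...


class AsciiOnly:
    """Hypothetical converter: copies ASCII bytes, rejects any byte >= 0x80."""

    def do_convert(self, inbuf, outbuf: bytearray, flags: ConverterFlags):
        data = bytes(inbuf or b"")  # inbuf may be None per the signature
        n, limit = 0, min(len(data), len(outbuf))
        while n < limit and data[n] < 0x80:
            n += 1
        outbuf[:n] = data[:n]
        if n < len(data):
            if n > 0:
                # Progress was made: report it; the problem resurfaces next call.
                return ConverterResult.CONVERTED, n, n
            if len(outbuf) == 0:
                raise ConverterError("NO_SPACE")
            raise ConverterError("INVALID_DATA")  # data[0] is not ASCII
        if flags & ConverterFlags.INPUT_AT_END:
            return ConverterResult.FINISHED, n, n
        if flags & ConverterFlags.FLUSH:
            return ConverterResult.FLUSHED, n, n
        if not data:
            raise ConverterError("PARTIAL_INPUT")  # no input, nothing pending
        return ConverterResult.CONVERTED, n, n
```

A call on b"ok\xff" returns (CONVERTED, 2, 2); only the following call, which starts at the 0xFF byte, raises.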