Question about decoding over-flexibility #10

rpeszek · 2020-05-27T02:17:56Z

I have been trying to use this package and to bridge it to my typed-encoding project.
I have more of a question than an issue:

decodeXyz functions seem to be overly permissive. Here is an example of decoding an invalid ASCII ByteString:

>>> decodeStrictByteStringExplicit ASCII "\239"
Right "\239"

Here is a UTF8 example:

>>> decodeStrictByteStringExplicit UTF8 "\192\NUL"
Right "\NUL"
>>> encodeStrictByteStringExplicit UTF8 "\NUL"
"\NUL"

I imagine this is just part of the design.
Is there a way to check that encoded text is valid other than
a round trip decodeXyz followed by encodeXyz to see if the outcome is the same?

Thank you for your help!
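
For reference, the round-trip check described above can be sketched generically. This is only a sketch: `roundTrips`, `toyDecode`, and `toyEncode` are hypothetical names that are not part of the package, and the toy codec works on lists of byte values so that the example depends on nothing outside base.

```haskell
import Data.Char (chr, ord)

-- | Accept input only if it decodes and then re-encodes to exactly the same
-- value. The decoder and encoder are passed in as arguments, so nothing here
-- depends on a particular library.
roundTrips :: Eq b => (b -> Either e1 s) -> (s -> Either e2 b) -> b -> Bool
roundTrips decode encode input =
  case decode input of
    Left _        -> False
    Right decoded -> either (const False) (== input) (encode decoded)

-- Hypothetical toy codec over lists of byte values, for illustration only.
-- The decoder is deliberately permissive: it accepts values above 127.
toyDecode :: [Int] -> Either String String
toyDecode = Right . map chr

-- The encoder is strict: it rejects anything outside 7-bit ASCII.
toyEncode :: String -> Either String [Int]
toyEncode = traverse enc
  where
    enc c
      | ord c < 128 = Right (ord c)
      | otherwise   = Left ("not ASCII: " ++ show c)
```

With the toy codec, `roundTrips toyDecode toyEncode [104, 105]` is `True` and `roundTrips toyDecode toyEncode [239]` is `False`. Assuming the package's decode and encode functions both return an `Either`, the same helper could be applied to them. For the UTF8 example above it would reject "\192\NUL", because re-encoding the decoded "\NUL" does not reproduce the original bytes.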

swamp-agr · 2020-05-27T18:17:57Z

Hi @rpeszek,

Regarding a particular Char and ASCII: I would suggest using encodeable from the Encoding typeclass:

>>> encodeable ASCII '\239'
False

Regarding more complex examples, let me check further.

Thanks,
Andrey

rpeszek · 2020-05-29T03:44:57Z

Thanks for the reply.
Unfortunately, I really only care about the more complicated cases.

rpeszek · 2020-05-31T03:11:52Z

Here is a longer list of curiosities:
For a number of 1-byte encodings, encoding characters > \255 succeeds.

Example:

>>> Encoding.encodeStringExplicit (Encoding.encodingFromString "cp1257") "\x100"
Right "\194"

You can successfully encode and then fail to decode:

>>> Encoding.encodeStringExplicit EncCP932.CP932 "\DEL"
Right "\DEL"

>>> Encoding.decodeStringExplicit EncCP932.CP932 "\DEL"
Left (IllegalCharacter 127)

Issues like these will make it hard to reason about code.
I understand that this is very much legacy stuff; I just wanted to share my findings. Maybe the limitations could be documented.

Most property tests I could think of find issues.
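
As an illustration of the kind of property involved, here is a minimal sketch of an "encode, then decode" check written against plain functions. `encodeDecodeFailures`, `laxEncode`, and `strictDecode` are hypothetical names that are not part of the package. The toy codec is chosen so that it mirrors the \DEL behaviour shown above: the encoder accepts a character that the decoder then rejects.

```haskell
import Data.Char (chr, ord)

-- | Return the inputs for which encoding succeeds but decoding the encoded
-- value does not give back the original input.
encodeDecodeFailures
  :: Eq s => (s -> Either e1 b) -> (b -> Either e2 s) -> [s] -> [s]
encodeDecodeFailures encode decode = filter violates
  where
    violates s = case encode s of
      Left _  -> False              -- rejected by the encoder: nothing to check
      Right b -> case decode b of
        Right s' -> s' /= s         -- decoded to something different
        Left _   -> True            -- encoded fine, but decoding failed

-- Hypothetical toy encoder: accepts every character below 128, including \DEL.
laxEncode :: String -> Either String [Int]
laxEncode = traverse enc
  where
    enc c
      | ord c < 128 = Right (ord c)
      | otherwise   = Left ("cannot encode: " ++ show c)

-- Hypothetical toy decoder: rejects byte 127, so it disagrees with laxEncode.
strictDecode :: [Int] -> Either String String
strictDecode = traverse dec
  where
    dec n
      | n >= 0 && n < 127 = Right (chr n)
      | otherwise         = Left ("illegal byte: " ++ show n)
```

Running `encodeDecodeFailures laxEncode strictDecode ["abc", "\DEL", "\x100"]` yields `["\DEL"]`: "abc" round-trips, "\x100" is rejected by the encoder and so is skipped, and "\DEL" encodes but fails to decode. A property-testing library could generate the input list instead of a fixed sample.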
