
Question about decoding over-flexibility #10

Open
rpeszek opened this issue May 27, 2020 · 3 comments

Comments

@rpeszek

rpeszek commented May 27, 2020

I have been trying to use this package, and I am also trying to bridge it to my typed-encoding project.
This is more of a question than an issue:

The decodeXyz functions seem overly permissive. Here is an example of decoding an invalid ASCII ByteString:

>>> decodeStrictByteStringExplicit ASCII "\239"
Right "\239"
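For ASCII in particular, validity can be checked on the raw bytes before decoding: every byte must be below 128. A minimal sketch using only the `bytestring` package; `isValidAscii` is a hypothetical helper, not part of this package's API:

```haskell
import qualified Data.ByteString as BS
import Data.Word (Word8)

-- A byte is valid 7-bit ASCII only if its high bit is clear.
isAsciiByte :: Word8 -> Bool
isAsciiByte w = w < 128

-- Hypothetical helper: valid ASCII input contains only ASCII bytes.
isValidAscii :: BS.ByteString -> Bool
isValidAscii = BS.all isAsciiByte

main :: IO ()
main = do
  print (isValidAscii (BS.pack [104, 105]))  -- "hi": True
  print (isValidAscii (BS.pack [239]))       -- the "\239" from the example: False
```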

Here is a UTF8 example:

>>> decodeStrictByteStringExplicit UTF8 "\192\NUL"
Right "\NUL"
>>> encodeStrictByteStringExplicit UTF8 "\NUL"
"\NUL"

I imagine this is just part of the design.
Is there a way to check that encoded text is valid, other than
a round trip (decodeXyz followed by encodeXyz) to see whether the result matches the original input?
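The round-trip check can be packaged as a reusable predicate. A minimal sketch, assuming the decoder and encoder both return `Either` (as the `Explicit` functions above appear to); `roundTripValid` and the two toy ASCII functions are hypothetical stand-ins so the example runs without this package:

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import Data.Char (ord)

-- Accept the input only if decoding succeeds AND re-encoding the decoded
-- String reproduces exactly the original encoded value.
roundTripValid
  :: Eq b
  => (b -> Either de String)   -- decoder for some encoding
  -> (String -> Either ee b)   -- encoder for the same encoding
  -> b
  -> Bool
roundTripValid decode encode input =
  case decode input of
    Left _  -> False
    Right s -> either (const False) (== input) (encode s)

-- Hypothetical stand-ins: a lenient decoder that accepts any byte,
-- and a strict encoder that rejects characters above '\127'.
lenientAsciiDecode :: BS.ByteString -> Either String String
lenientAsciiDecode = Right . BC.unpack

strictAsciiEncode :: String -> Either String BS.ByteString
strictAsciiEncode s
  | all ((< 128) . ord) s = Right (BC.pack s)
  | otherwise             = Left "non-ASCII character"

main :: IO ()
main = do
  print (roundTripValid lenientAsciiDecode strictAsciiEncode (BC.pack "hi"))   -- True
  print (roundTripValid lenientAsciiDecode strictAsciiEncode (BS.pack [239]))  -- False
```

With the package, one would presumably pass something like `decodeStrictByteStringExplicit UTF8` together with an `Either`-returning encoder for the same encoding (assumed). Applied to the UTF-8 example above, this check would reject `"\192\NUL"`, because the decoded `"\NUL"` re-encodes to a single byte.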

Thank you for your help!

@swamp-agr
Collaborator

Hi @rpeszek,

Regarding individual Chars and ASCII: I would suggest using encodeable from the Encoding typeclass:

>>> encodeable ASCII '\239'
False
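`encodeable` works per character, so checking a whole `String` before encoding amounts to requiring it for every character. A minimal sketch; `asciiEncodeable` is a hypothetical stand-in for `encodeable ASCII` (assumed to accept exactly `'\0'..'\127'`), and `allEncodeable`/`badPositions` are illustrative helpers. Note that this checks text before encoding; it does not validate bytes that are already encoded.

```haskell
-- Hypothetical stand-in for `encodeable ASCII`, so the sketch runs
-- without the encoding package (assumption: ASCII is '\0'..'\127').
asciiEncodeable :: Char -> Bool
asciiEncodeable c = c <= '\127'

-- A String is encodeable only if every character is.
-- With the package this would presumably be `all (encodeable ASCII)`.
allEncodeable :: (Char -> Bool) -> String -> Bool
allEncodeable = all

-- Zero-based positions of characters that cannot be encoded.
badPositions :: (Char -> Bool) -> String -> [Int]
badPositions ok s = [i | (i, c) <- zip [0 ..] s, not (ok c)]

main :: IO ()
main = do
  print (allEncodeable asciiEncodeable "plain text")    -- True
  print (allEncodeable asciiEncodeable "caf\233")       -- False
  print (badPositions asciiEncodeable "caf\233 \239")   -- [3,5]
```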

Regarding more complex examples, let me check further.

Thanks,
Andrey

@rpeszek
Author

rpeszek commented May 29, 2020

Thanks for the reply.
Unfortunately, I really only care about the more complicated cases.

@rpeszek
Author

rpeszek commented May 31, 2020

Here is a longer list of curiosities.
For a number of single-byte encodings, encoding characters above \255 succeeds.

Example:

>>> Encoding.encodeStringExplicit (Encoding.encodingFromString "cp1257") "\x100"
Right "\194"

You can successfully encode and then fail to decode:

>>> Encoding.encodeStringExplicit EncCP932.CP932 "\DEL"
Right "\DEL"
>>> Encoding.decodeStringExplicit EncCP932.CP932 "\DEL"
Left (IllegalCharacter 127)
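The mismatch above is the kind of thing an encode-then-decode property catches: whenever encoding succeeds, decoding the result should give the original text back. A self-contained sketch that checks the property exhaustively over single characters `'\0'..'\255'` rather than with a property-testing library; `encodeDecodeFailures` is an illustrative helper, and `toyEncode`/`toyDecode` are a hypothetical codec built to reproduce the same asymmetry, not the real CP932 tables:

```haskell
import qualified Data.ByteString as BS
import Data.Char (chr, ord)

-- Property: if encoding a String succeeds, decoding the result must give
-- the same String back. Returns every input that violates it.
encodeDecodeFailures
  :: (String -> Either ee b)
  -> (b -> Either de String)
  -> [String]
  -> [String]
encodeDecodeFailures encode decode = filter violates
  where
    violates s = case encode s of
      Left _        -> False  -- refusing to encode is acceptable
      Right encoded -> either (const True) (/= s) (decode encoded)

-- Hypothetical toy codec: the encoder accepts '\DEL' (127),
-- but the decoder rejects byte 127, mirroring the report above.
toyEncode :: String -> Either String BS.ByteString
toyEncode s
  | all ((< 128) . ord) s = Right (BS.pack (map (fromIntegral . ord) s))
  | otherwise             = Left "unencodable"

toyDecode :: BS.ByteString -> Either String String
toyDecode bs
  | BS.any (>= 127) bs = Left "illegal byte"
  | otherwise          = Right (map (chr . fromIntegral) (BS.unpack bs))

main :: IO ()
main = print (encodeDecodeFailures toyEncode toyDecode [[c] | c <- ['\0' .. '\255']])
-- prints ["\DEL"]
```

Assuming `encodeStringExplicit enc` and `decodeStringExplicit enc` have the `Either` shapes shown above, they could be passed in place of the toy pair; the encoded type is left polymorphic so either `String` or `ByteString` output works.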

Issues like these make it hard to reason about code.
I understand that this is very much legacy territory; I just wanted to share my findings. Maybe the limitations could be documented.

Most of the property tests I could think of turn up issues.


2 participants