fix codec overriding #210

Open
komuw opened this issue Jun 7, 2020 · 0 comments

komuw commented Jun 7, 2020

see:

naz/tests/test_codec.py

Lines 177 to 232 in 8b1ec3d

    @skip(
        """
        TODO: fix this. It does not work.
        Note: Encodings are first looked up in the registry's cache.
        Thus if you call `register_codecs` and then call it again with different
        codecs, the second set of codecs may not take effect.
        i.e. codecs.lookup(encoding) will return the first codecs since they were stored
        in the cache.
        There doesn't appear to be a way to clear the codec cache at runtime.
        see: https://docs.python.org/3/library/codecs.html#codecs.lookup
        This test passes when called in isolation but it fails when all tests are run together.
        This is because, when all tests are run together, `naz.codec.register_codecs` will already have
        been called and registered the inbuilt codecs. So when this test runs, it tries to override
        an already registered codec. And then it calls `codecs.lookup()`. However,
        lookup will return the codec that is in the cache, which is the inbuilt codec, instead of the one
        we just tried to register in this test.
        We should look for a way to clear the codec cache at runtime.
        There's a C API for that, `_PyCodec_Forget`:
        https://sourcegraph.com/github.com/python/cpython@3.8/-/blob/Python/codecs.c#L193
        We need to figure out how to call it or find other alternatives.
        I have sent an email to Marc-Andre Lemburg asking for advice.
        """
    )
    def test_codec_overriding(self):
        """
        tests that users are able to override an inbuilt codec
        with their own implementation.
        """

        class OverridingCodec(codecs.Codec):
            # All the methods have to be staticmethods because they are passed to `codecs.CodecInfo`
            @staticmethod
            def encode(input, errors="strict"):
                return codecs.utf_8_encode(input, errors)

            @staticmethod
            def decode(input, errors="strict"):
                return codecs.utf_8_decode(input, errors)

        custom_codecs = {
            "gsm0338": codecs.CodecInfo(
                name="gsm0338", encode=OverridingCodec.encode, decode=OverridingCodec.decode,
            ),
        }
        # register, this will override the inbuilt `gsm0338` codec with a custom one.
        naz.codec.register_codecs(custom_codecs)

        new_codec = codecs.lookup("gsm0338")
        self.assertNotEqual(new_codec.encode, naz.codec.GSM7BitCodec.encode)
        self.assertNotEqual(new_codec.decode, naz.codec.GSM7BitCodec.decode)
        self.assertEqual(new_codec.encode, OverridingCodec.encode)
        self.assertEqual(new_codec.decode, OverridingCodec.decode)
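
The caching behaviour described in the skip note can be reproduced with the standard library alone, without naz. The following is a minimal sketch: the codec name `democodec`, the `current` dict and the helper functions are made up for illustration and are not part of naz. The search function changes what it would return between the two lookups, yet `codecs.lookup` keeps handing back the first result because it never calls the search function a second time.

```python
import codecs


def encode_v1(input, errors="strict"):
    return codecs.utf_8_encode(input, errors)


def encode_v2(input, errors="strict"):
    return codecs.latin_1_encode(input, errors)


def decode(input, errors="strict"):
    return codecs.utf_8_decode(input, errors)


# Mutable state so the search function can change its answer later.
current = {"encode": encode_v1}
calls = []  # records every time the search function is actually consulted


def search(name):
    # Hypothetical codec name used only for this demonstration.
    if name != "democodec":
        return None
    calls.append(name)
    return codecs.CodecInfo(name=name, encode=current["encode"], decode=decode)


codecs.register(search)

first = codecs.lookup("democodec")   # search() runs, result is cached
current["encode"] = encode_v2        # try to "override" the codec
second = codecs.lookup("democodec")  # served from the cache, search() is not called

print(second is first)               # the very same CodecInfo object
print(second.encode is encode_v1)    # still the first implementation
print(len(calls))                    # how many times search() was consulted
```

This is the same situation the skipped test runs into: by the time it registers its own `gsm0338`, an earlier lookup has already populated the cache.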

I asked Marc-Andre Lemburg (author of the codecs module in Python) whether you can override an inbuilt codec. His reply:

It's not really supported to override builtin codecs via a search
function. The only way is to monkey patch the encodings package
module implementing the codec and then only if you manage to
implement this patching before the codec gets used for the first
time, since the codec subsystem uses a cache for codecs to improve
performance.

Note that the codec search function registry is mainly meant to
*add* new codecs, not to override existing ones.

This was via email.
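
Since that reply, Python 3.10 added `codecs.unregister`, which removes a search function and clears the registry's cache. That may be enough for codecs that were registered through a search function we control (as naz does for `gsm0338`), though it does not help with codecs shipped in the `encodings` package. Below is a minimal sketch under those assumptions. The codec name `demooverride` and the `make_search` helper are hypothetical, and on Python older than 3.10 the replacement step is simply skipped.

```python
import codecs
import sys

NAME = "demooverride"  # hypothetical codec name, not a real naz codec


def original_encode(input, errors="strict"):
    return codecs.utf_8_encode(input, errors)


def replacement_encode(input, errors="strict"):
    return codecs.latin_1_encode(input, errors)


def make_search(encode):
    """Build a search function that serves NAME with the given encoder."""

    def search(name):
        if name != NAME:
            return None
        return codecs.CodecInfo(name=NAME, encode=encode, decode=codecs.utf_8_decode)

    return search


original_search = make_search(original_encode)
codecs.register(original_search)
# This lookup populates the cache with the original codec.
assert codecs.lookup(NAME).encode is original_encode

replaced = False
if sys.version_info >= (3, 10):
    # Removes the search function and clears the codec cache.
    codecs.unregister(original_search)
    codecs.register(make_search(replacement_encode))
    replaced = codecs.lookup(NAME).encode is replacement_encode

print("replaced:", replaced)
```

For this to work in naz, `register_codecs` would need to keep a reference to the search function it registered so that a later call can unregister it first. That is an assumption about a possible design, not a description of the current code.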

We should re-architect based on this new info.
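
One possible direction, sketched here as an assumption and not as a decided design: keep user-supplied codecs in an application-level mapping that is consulted before `codecs.lookup`, so that overriding never depends on the interpreter's codec cache. The `CodecRegistry` class and its methods are hypothetical names, not existing naz APIs.

```python
import codecs


class CodecRegistry:
    """Hypothetical application-level registry; falls back to `codecs.lookup`."""

    def __init__(self):
        self._custom = {}

    def register(self, name, codec_info):
        # A later registration for the same name replaces the earlier one.
        self._custom[name.lower()] = codec_info

    def lookup(self, name):
        try:
            return self._custom[name.lower()]
        except KeyError:
            # Unknown to us: defer to Python's own codec machinery.
            return codecs.lookup(name)

    def encode(self, text, encoding, errors="strict"):
        data, _consumed = self.lookup(encoding).encode(text, errors)
        return data


def utf8_encode(input, errors="strict"):
    return codecs.utf_8_encode(input, errors)


def latin1_encode(input, errors="strict"):
    return codecs.latin_1_encode(input, errors)


registry = CodecRegistry()
registry.register(
    "gsm0338",
    codecs.CodecInfo(name="gsm0338", encode=utf8_encode, decode=codecs.utf_8_decode),
)
# Registering again under the same name takes effect immediately.
registry.register(
    "gsm0338",
    codecs.CodecInfo(name="gsm0338", encode=latin1_encode, decode=codecs.latin_1_decode),
)

print(registry.lookup("gsm0338").encode is latin1_encode)
print(registry.encode("hello", "utf-8"))
```

The trade-off is that callers must go through the registry instead of `str.encode(..., "gsm0338")`, since the latter still resolves names through Python's cached lookup.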
