Fix/bugre #764 json serialize crash w bytes instead of str in pattern.py #768
In Python 3 all str are Unicode, so `item = item.encode('utf-8')` is no longer needed. `json.dumps` also needs `ensure_ascii=False` to print Unicode characters as-is rather than as \uXXXX escape sequences. We should get rid of six completely.
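A minimal sketch of the two behaviors described above, using only the standard library. The sample values (`pattern_key`, `doc`) are made up for illustration and are not taken from pattern.py.

```python
import json

pattern_key = "name"  # illustrative value, not from mtools

# In Python 3, str.encode() returns bytes, and json.dumps cannot
# serialize a bytes value: it raises TypeError.
encoded = pattern_key.encode("utf-8")
try:
    json.dumps({"key": encoded})
    bytes_error = None
except TypeError as exc:
    bytes_error = str(exc)

# A plain str serializes fine. ensure_ascii decides whether non-ASCII
# characters are escaped or written as-is.
doc = {"city": "Zürich"}
escaped = json.dumps(doc)                       # non-ASCII escaped as \uXXXX
readable = json.dumps(doc, ensure_ascii=False)  # non-ASCII kept as characters

print(bytes_error)
print(escaped)
print(readable)
```

Both outputs decode to the same data with `json.loads`; `ensure_ascii=False` only changes how the text is written out.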
Remove the "six" package and replace its calls with their Python 3 equivalents. Since we no longer support Python 2, six is no longer needed.
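For illustration, here are a few common six helpers and their plain Python 3 replacements. This is a generic sketch: the function `describe` and the sample data are hypothetical, and these are not necessarily the six calls that mtools actually used.

```python
def describe(value):
    """Classify a value using built-in types instead of six aliases."""
    # six.string_types  ->  str
    if isinstance(value, str):
        return "text"
    # six.binary_type  ->  bytes
    if isinstance(value, bytes):
        return "bytes"
    # six.integer_types  ->  int
    if isinstance(value, int):
        return "integer"
    return "other"


counts = {"a": 1, "b": 2}
# six.iteritems(counts)  ->  counts.items()
pairs = sorted(counts.items())

print(describe("x"), describe(b"x"), describe(3))
print(pairs)
```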
@bugre Thanks for investigating and following through with a PR! :)
…_JSONSerialize_crash_w_bytes
* 'develop' of github.com:bugre/mtools:
  Add VSCode config to .gitignore
  Bump dev version
  Update for 1.6.1 release
  Fix Flake8 complaints
  Fix rueckstiess#765: mloginfo --clients: more robust parsing of client metadata (rueckstiess#766)
  Fix rueckstiess#761: mtools should use python3 in shebangs
  Fix rueckstiess#698: Add rounding option for mloginfo --queries (rueckstiess#758)
  More specific match for checkpoint duration log line
  Fix rueckstiess#258: Add timezone to mloginfo summary
- Add simplification regex for key: [list,list] value patterns
- Change test (__main()__) format, using a dict to store test pattern and expected result, easier to catch code changes that break result
- Add fDebug and some debug prints of pattern processing.
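The dict-based test format mentioned in this commit could look roughly like the sketch below. The names `simplify`, `TEST_CASES` and `run_cases` are hypothetical, and `simplify` is a deliberately trivial stand-in (it only replaces integers with 1), not the simplification code in pattern.py.

```python
import re


def simplify(pattern):
    # Stand-in for the real simplification logic: replace every
    # integer literal with 1.
    return re.sub(r"\b\d+\b", "1", pattern)


# Each key is an input pattern, each value is the expected result.
TEST_CASES = {
    '{"a": 42}': '{"a": 1}',
    '{"a": 5, "b": 10}': '{"a": 1, "b": 1}',
    '{"a": "text"}': '{"a": "text"}',
}


def run_cases(cases):
    """Return (input, expected, actual) for every case that fails."""
    return [
        (source, expected, simplify(source))
        for source, expected in cases.items()
        if simplify(source) != expected
    ]


failures = run_cases(TEST_CASES)
print("failures:", failures)
```

Keeping input and expected output side by side means a code change that alters any result shows up as a non-empty failure list.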
Hi @stennie,
Let me know what you think.
It's a pattern with a list of URLs: '$all : [ "https://xxxxx.xx" ] ..'. See comment at rueckstiess#764 (comment)
The improved regex uses a positive lookbehind to check for a " (quote) before the ']' (closing list bracket). It correctly handles cases where a ']' is part of the value, and list values that are URLs ("nnn://aaa.bbb") are now correctly simplified to '1'. A few debug prints were updated to print to stderr. Adjusted some test cases to include one with a closing bracket as part of the value. Added the correct "expected" simplified output for @niccottrell's use case with {..."$all" : [ "url1", "url2" ] ...}.
After running the 'fixed' code over a few days of logs, I found two situations that gave wrong results, because I wasn't correctly protecting the start of the list in the regex. So I've added a positive lookahead to ensure that after the '[' there is only whitespace and then a " (quote) MUST follow. I have also added a new test case. I hope this generates a better and more precise result. Fix #768
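The lookahead/lookbehind idea can be sketched as follows. The regex and the names `LIST_OF_STRINGS` and `simplify_string_lists` are assumptions written for illustration; this is not the regex from the PR, and it does not handle escaped quotes inside values.

```python
import re

# Hypothetical pattern:
#   \[(?=\s*")   a '[' that is followed only by whitespace and then a quote
#   .*?          the list contents, matched lazily
#   (?<=")\s*\]  a ']' whose preceding non-space character is a quote
LIST_OF_STRINGS = re.compile(r'\[(?=\s*").*?(?<=")\s*\]')


def simplify_string_lists(pattern):
    """Replace every list of quoted strings with the placeholder 1."""
    return LIST_OF_STRINGS.sub("1", pattern)


# URLs and a ']' inside a quoted value do not end the match early.
print(simplify_string_lists('{"tags": {"$all": [ "https://a.b/x", "c]d" ]}}'))
# → {"tags": {"$all": 1}}

# A '[' that is not followed by a quote is left alone.
print(simplify_string_lists('{"name": "x[1]"}'))
# → {"name": "x[1]"}
```

The lookahead keeps a bracket inside an ordinary string value from being treated as the start of a list, and the lookbehind keeps a ']' inside a quoted value from being treated as the end of one.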
@bugre I think there is still some work to do on test cases for patterns (regex with escaped I added your debug output to Regards,
Thanks @stennie. Nice solution with '--debug'. I have a few really busy weeks ahead, so it will probably take some time until I can get back to this code. But I'll have a look as soon as I can.
Description of changes
In mtools/utils/pattern.py, a behavior change from Python 2 to Python 3 (string.encode('utf-8')) generates a byte string rather than a standard Unicode string, and JSON can't serialize this output later. As mtools 1.6.0 is no longer Python 2.7 compatible, and in Python 3 all strings are Unicode, the part that causes the problem isn't needed anymore. I also used the opportunity to remove the 'six' package (introduced for the Python 2 to Python 3 transition).
I also added ensure_ascii=False to the json.dumps call so that Unicode characters are printed as characters and not as escaped character codes.
Testing
I've added an additional test case in mtools/utils/pattern.py for this situation.
Fixes #764