Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/bugre #764 json serialize crash w bytes instead of str in pattern.py #768

Merged

Conversation

bugre
Copy link
Contributor

@bugre bugre commented Jan 2, 2020

Description of changes

in mtools/utils/pattern.py a behavior change from py2 to py3 (string.encode('utf-u')) generates a byte string output and not a standart unicode string. And thus, JSON can't serialize this output later.

As mtools 1.6.0 no longer is Python 2.7 compatible, and in Python3 all strings are unicode the part that generates the problem isn't needed anymore. Also used the opportunity to remove 'six' package (introduced on Python 2 to Python3 transition).

Also added ensure_ascii=False to json.dumps call so that unicode chars are printed as 'chars' and not as octa or hex char codes.

 return json.dumps(doc, sort_keys=True, separators=(', ', ': '), ensure_ascii=False)

Testing

i've added an additional test case in mtools/utils/pattern.py for this situation


    ##
    ## 20191231 - bugre - issue#764 - adding some more test cases.. based on our mongodb logs (mongod 4.0.3)
    ##
    s = (r'{_id: ObjectId(\'528556616dde23324f233168\'), curList: [ "€", "XYZ", "Krown"], allowedSnacks: 1000 }')
    print(json2pattern(s))

Fixes #764

bugre added 3 commits January 2, 2020 14:01
In Python 3 all str are unicode, so no need for
`item = item.encode('utf-8')`anymore.

And json.dumps, needs ' ensure_ascii=False ' to print unicode
chars correctly and not as \unnn or \xnnn notation.

We should get rit of six completelly.
Remove "six" package and replacing its calls with the python3 equivalent
as we no longer support Python 2, we can remove six usage also.
@stennie stennie self-assigned this Jan 2, 2020
@stennie stennie added this to the 1.6.1 milestone Jan 2, 2020
@stennie
Copy link
Collaborator

stennie commented Jan 2, 2020

@bugre Thanks for investigating and following through with a PR! :)

bugre added 4 commits January 4, 2020 18:13
…_JSONSerialize_crash_w_bytes

* 'develop' of github.com:bugre/mtools:
  Add VSCode config to .gitignore
  Bump dev version
  Update for 1.6.1 release
  Fix Flake8 complaints
  Fix rueckstiess#765: mloginfo --clients: more robust parsing of client metadata (rueckstiess#766)
  Fix rueckstiess#761: mtools should use python3 in shebangs
  Fix rueckstiess#698: Add rounding option for mloginfo --queries (rueckstiess#758)
  More specific match for checkpoint duration log line
  Fix rueckstiess#258: Add timezone to mloginfo summary
- Add simplification regex for key: [list,list] value patterns
- Change test (__main()__)  format, using a dict to store test pattern
  and expected result,  easier to catch code changes that break result
- Add fDebug and some debug prints of pattern processing.
@bugre
Copy link
Contributor Author

bugre commented Jan 6, 2020

Hi @stennie ,
I've made a couple of changes on the last commit

  • improve (i hope) the main() test pattern/verification process, adding the pattern and the expected 'result' to a dict to easier spot code changes that change/break the expected result. (*1)
  • add a regex that simplifies "{... "attrib" : ["val1", "val2", "3"],...}" to "{..."atrib" : [1],... }".
    I've tested this change as far i could and know the code (that is not much) so, if you could double check it would be best. Maybe add a note about it on next release , asking for extra care.
    This change, instead of doing it with a regex, it could probably also be made inside the 'def _decode_pattern_list(...)' code, but i wasn't sure the side effects of it.
    --
    (*1) This is a sample output, if you run only the "util/pattern.py" with the new pattern/result dict. Let me know if you would like that i revert it to the original code.
✔ ~/dsv/mtools [fix/bugre-#764_JSONSerialize_crash_w_bytes|✚ 1⚑ 3] 
07:20 $ /usr/local/bin/python3 /Users/werner.michels/dsv/mtools/mtools/util/pattern.py
OK:
  Input : {d: {$gt: 2, $lt: 4}, b: {$gte: 3}, c: {$nin: [1, "foo", "bar"]}, "$or": [{a:1}, {b:1}] }
  Expect: {"$or": [{"a": 1}, {"b": 1}], "b": 1, "c": {"$nin": 1}, "d": 1}
  Output: {"$or": [{"a": 1}, {"b": 1}], "b": 1, "c": {"$nin": 1}, "d": 1}
ERROR ******* :
  Input : {aa: {$gt: 22, $lt: 4}, "b": {$nin: [1, 2, 3]}, "$or": [{a:1}, {b:1}] }
  Expect: {"$or": [{"a": 1}, {"b": 1}], "a": 1, "b": {"$nin": 1}}
  Output: {"$or": [{"a": 1}, {"b": 1}], "aa": 1, "b": {"$nin": 1}}

Let me know what you think.

bugre added 3 commits January 6, 2020 11:27
It's a pattern with list of url '$all : [ "https://xxxxx.xx" ] ..'
See comment at rueckstiess#764 (comment)
The improved regex using positive lookbehind to check for a " (quote) before ']'
(closing list bracket) will correctly handle cases where a ']' is part of the value and
 also cases where list values are url's "nnn://aaa.bbb"  will correctly be simplified to '1'

Few debug print updated to print to stderr

Adjusted some test cases, to include one with 'closing bracket' as part of value

Added the correct "expected" simplified output for @niccottrell's use case with
{..."$all" : [ "url1", "url2" ] ...}.
@bugre
Copy link
Contributor Author

bugre commented Jan 12, 2020

After running the 'fixed' code over a few days of logs, found two situations that got me the wrong results, because i wasn't correctly protecting the starting of the list with regex. So i've added a positive lookahead to asure that after the '[' i only get space and than MUST a " (quote) exist.

Have also added a new test case.

Hope that this does generate a better and more precise result.

Fix #768

@stennie stennie merged commit b7827b8 into rueckstiess:develop Feb 10, 2020
@stennie
Copy link
Collaborator

stennie commented Feb 10, 2020

@bugre I think there is still some work to do on test cases for patterns (regex with escaped [s, for example). However, mloginfo previously silently skipped these so that isn't new behaviour.

I added your debug output to mloginfo --debug via 200b0a1. The output is rather verbose but could be useful for debugging log patterns.

Regards,
Stennie

@bugre
Copy link
Contributor Author

bugre commented Feb 11, 2020

Thanks @stennie. Nice solution with '--debug'.

I've a few really busy weeks ahead, so it probably will take some time until i can get back to this code. But i'll have a look as soon i can.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

mloginfo --queries crash "TypeError: Object of type bytes is not JSON serializable"
2 participants