Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix/invalid utf8 during encoding #75

Merged
merged 30 commits into from
Oct 9, 2015

Conversation

coryvirok
Copy link
Contributor

No description provided.

JSON encoding fails sometimes if there is a local variable or custom data
passed in that cannot be converted to unicode and then to utf8 encoded
bytes. This change implements a custom iterencode() method for the JSON
encoder. It takes each chunk of JSON that is produced during JSON serialization
and attempts to decode it using UTF-8 encoding. If a decode error occurs,
a custom unicode string will be added to the JSON output that gives
information about the decode error and the data that produced it.

Needs more testing.

@brianr
The check to see if a value was in the list of scrub fields needs
to do a .lower() which will do an implicit conversion of the value
if it's not already a unicode string. That can cause UnicodeDecodeErrors
so I added a _to_text() function which will attempt a couple of
different ways at converting bytes to unicode, (for python 2.)

I also bumped the minor version and added a beta.1 tag since this
code is potentially buggy, (unicode is always difficult to get
right, especially across Python versions.)

@brianr
None was not encoded properly, neither were integer types that needed to be sanitized.
Python 3 bytes instances will be verified to decode into utf-8. If they do not decode
they will be sent over as base64.

@brianr
Don't return bytes from the JSONErrorIgnoringEncoder.encode()
Decode the base64 data into ascii before putting into the
custom <UnicodeDecodeError> string.
Remove the reason: part of the custom <UnicodeDecodeError>
string.
I was almost able to get serialization and scrubbing working using the old
method of overriding iterencode() inside a custom JSONEncoder class but
I finally ran into a 'bug' I couldn't get around.

I was relying on iterencode() to yield each element in the JSON dict/list
separately from the JSON format characters. _iterencode_dict() did this
nicely, however _iterencode_list() munges together JSON formatters, (e.g. ',')
with the elements in the object, making it near impossible to correctly
escape undecodable keys/values. This was buggy and more or less unmaintainable.
So, I revamped the serialization + scrubbing to use a custom object
traversal function which allows us to provide callbacks for each node in
the JSON object being traversed.

Now, serialization + scrubbing is all done using the same mechanism, a
Transform. These transforms provide special-purpose callbacks for each
type we expect to see in the object being traversed.

Most of the tests are passing in Python 2.7 but many are still failing.

Saving progress...

@brianr @jondeandres
Install the frameworks using Travis so we can test across different
framework versions.
The ScrubUrlTransform object needed to receive a suffixes= parameter
in order for the rollbar.init() code to configure it to only check
certain key suffixes.

Fixed a bunch of tests. Mainly, I moved tests out of test_rollbar
and into test_*_transform.
Don't create the Flask tests if flask is not installed or the Python
version isn't supported by Flask.

Added a method to BaseTest that doesn't exist in some Python versions.
Python3 bytes need to be handled with the transform_string() method.
Added force_lower() which will attempt to turn whatever it's passed
into a lowercased value even if it needs to typecast it.
Added a map() helper to fix inconcistent implementations for Python 2/3.
Fixed a few tests that treated Python 2/3 results differently but
checked the Python version incorrectly.
Python 2.6 namedtuples don't have a __dict__ attribute.
I also removed some unnecessary code from the scruburl scrub method
that was causing tests to fail for Python 2.6
I think that Travis is running the unit tests from a different base
directory or something because it was failing with errors regarding
the name of the CustomRepr class in the tests. Hopefully this fixes.
Remove commented out code.
Move dict_merge() into lib
Moved the logic for types that are allowed to be circularly
referenced into the traverse() algorithm. This makes things
much simpler since we no longer have to worry about the
correct type handling in the transform.transform_circular_reference()
method.

Fixed a bug in the URL scrubbing logic which was scrubbing
non-strings if the suffix matched. Added a test for this.

Finally, based on Brian's feedback, changed the format of
the circular reference label.

@brianr
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants