Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add depythonize_on_error #38

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

LilyFoote
Copy link
Contributor

@LilyFoote LilyFoote commented Mar 1, 2023

I'm opening this PR more as a place to start a discussion for my use-case and the changes I needed to make to support it. I'm hoping it's possible to turn this into a feature that's more broadly useful to other users.

My use case is I'm building a profiler/debugger called Kolo (https://kolo.app/). As part of this we have a Python library which inspects Python frame objects and serializes information about them (such as local variables) as json. In this I make use of a custom JSONEncoder to serialize otherwise unserializable data (as their repr).

I have been reimplementing this library in Rust for performance, and json encoding is a measurable slowdown in my profiling, so I'm trying to use pythonize and serde_json to avoid calling Python's json.dumps.

With the changes in this branch, and the following implementation of a dump_json function my tests all pass:

use pyo3::intern;
use pyo3::prelude::*;
use pythonize::depythonize_on_error;
use serde_json::Value;

fn on_error(data: &PyAny) -> &PyAny {
    let py = data.py();
    match data.repr() {
        Ok(repr) => repr,
        Err(_) => intern!(py, "SerializationError"),
    }
}

pub fn dump_json(data: &PyAny) -> Result<Value, PyErr> {
    let data: Value = depythonize_on_error(data, &Some(on_error))?;
    Ok(data)
}

Is a feature allowing some fallback behaviour on errors desirable for this library? Or is there a better way to achieve what I'm trying to do?

This allows a user to pass a custom function that can do extra handling
whenever an error happens in deserialization.

Inspired by the `default` parameter to Python's `json.dump`.
https://docs.python.org/3/library/json.html#json.dump
Comment on lines -66 to -67
} else if obj.is_instance_of::<PyByteArray>()? || obj.is_instance_of::<PyBytes>()? {
self.deserialize_bytes(visitor)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this because it was the quickest and easiest way to use the on_error callback when handling a bytes object. Obviously this breaks other use cases of this library, so a different solution will be needed here.

Some(on_error) => {
// Do we recurse infinitely if on_error returns something we can't
// deserialize?
self.input = on_error(self.input);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not certain that reassigning to self.input here is appropriate. Maybe that could break serializing some nested data?

Comment on lines +106 to +108
// Do we recurse infinitely if on_error returns something we can't
// deserialize?
self.input = on_error(self.input);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how best to avoid this infinite recursion.

/// Attempt to convert a Python object to an instance of `T`
pub fn depythonize_on_error<'de, T>(
obj: &'de PyAny,
on_error: &'de Option<fn(&PyAny) -> &PyAny>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't thought deeply about whether a different lifetime might be appropriate for on_error.

Comment on lines +378 to +385
let key = match key {
key if key.is_instance_of::<PyUnicode>()? => key,
key if key.is_none() => intern!(py, "null"),
key if key.is_instance_of::<PyBool>()? => match key.is_true()? {
true => intern!(py, "true"),
false => intern!(py, "false"),
},
key => key.str()?,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I'm (mostly) matching the behaviour of json.dumps that converts "dict keys of a basic type" (str, int, float, bool, None) to strings.

Again, this probably shouldn't live here, but I'm not clear what the right approach is.

@davidhewitt
Copy link
Owner

Sorry for having just seen this, been a long week for me. Thanks for opening the PR! The idea is the on_error callback gives a substitution which can then be deserialized instead?

I wonder if we should call this default like in json.dumps?

I'll need to have a play with the implementation to understand if it's the right approach. Will try to do this in the next few days.

@LilyFoote
Copy link
Contributor Author

Sorry for having just seen this, been a long week for me.

No worries!

The idea is the on_error callback gives a substitution which can then be deserialized instead?

Pretty much - in my use case I use the repr of the thing that failed to serialize, plus a further fallback if the repr itself is broken.

I wonder if we should call this default like in json.dumps?

I considered that. I'm not sure if that's going to be helpful or confusing, since serde has its own concept of "default", which is different. (https://serde.rs/container-attrs.html#default)

I'll need to have a play with the implementation to understand if it's the right approach. Will try to do this in the next few days.

Awesome!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants