
Remove json_clean #706

Closed
martinRenou opened this issue Jul 1, 2021 · 4 comments · Fixed by #708

Comments

@martinRenou
Contributor

martinRenou commented Jul 1, 2021

cc. @SylvainCorlay @minrk @afshin
Also @ibdafna, you might be interested.

Problem

The json_clean function is responsible for some slowness in ipykernel, both in comms (jupyter widgets) and in the kernel protocol.

This function walks every communication message and makes copies of all the list-like objects and dicts it contains. This is fine for execution requests, but with comms it can be very slow, since people pass entire dataframes through them (see beakerx_tabledisplay and ipydatagrid for example).
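To illustrate the cost, here is a hypothetical sketch of what a json_clean-style deep copy looks like (this is not ipykernel's actual implementation): every container in the message is rebuilt recursively, so the cost grows linearly with the size of the message, even when the message was already valid JSON.

```python
def clean(obj):
    # Recursively rebuild every container, copying the whole message.
    # For a large comm payload (e.g. a dataframe), this duplicates
    # the entire structure in memory before serialization even starts.
    if isinstance(obj, dict):
        return {str(k): clean(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [clean(v) for v in obj]
    return obj
```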

After the object has been cleaned by json_clean, it is serialized to a JSON string by jupyter_client inside the send and serialize methods by self.pack.

Solution?

We should remove this json_clean method, and instead, we should use the official JSON library API to do the "cleaning" while serializing the message, directly in jupyter_client. This is already achieved for dates (see this and this) and could be done for other objects (bytes, etc).
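As a sketch of what "cleaning while serializing" could look like (function name and exact conversions are assumptions, not jupyter_client's actual code): the standard library's json.dumps accepts a default= hook that is invoked only for objects it cannot serialize natively, so well-formed messages pay no extra traversal cost.

```python
import base64
import datetime
import json

def json_default(obj):
    # Called by json.dumps only when it hits a non-serializable object,
    # so plain dicts/lists/strings are serialized with no extra work.
    if isinstance(obj, datetime.datetime):
        return obj.isoformat()
    if isinstance(obj, bytes):
        return base64.b64encode(obj).decode("ascii")
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

msg = {"date": datetime.datetime(2021, 7, 1), "buf": b"\x00\x01"}
serialized = json.dumps(msg, default=json_default)
```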

More context

For what it's worth, I tried making the json_clean function a no-op locally, and it worked perfectly with most of the notebooks I tried (Folium, Plotly, Altair, Interactive Widgets, ipydatagrid, IPython.display).

Only the Matplotlib inline backend stopped working, due to incorrect serialization of bytes (which could be handled in jupyter_client, see the proposed solution).

@JohanMabille
Contributor

JohanMabille commented Jul 1, 2021

I agree we could avoid using it for every single message. However, I need a means of detecting whether a variable is JSON serializable or not for the debugger (see #697), and I'm currently using this json_clean function. I would be happy to use any alternative that could help reduce the size of the code base.

@martinRenou
Contributor Author

martinRenou commented Jul 1, 2021

I guess you could do something like:

import json

try:
    json.dumps(obj)
except TypeError:
    ...
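Wrapped up as a small helper for the debugger use case, that idea might look like the following sketch (the function name is mine, not an ipykernel API):

```python
import json

def is_json_serializable(obj) -> bool:
    """Return True if obj round-trips through json.dumps without error."""
    try:
        json.dumps(obj)
    except (TypeError, ValueError):
        # TypeError: unsupported type; ValueError: e.g. circular reference
        # with check_circular checks.
        return False
    return True
```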

@Carreau
Member

Carreau commented Jul 1, 2021

+1 for stricter types down the line.

@martinRenou
Contributor Author

martinRenou commented Jul 2, 2021

The json_clean function is responsible for some slowness in ipykernel

I should have backed this up with evidence instead of raw statements. I ran some experiments this morning, taking a comm message from ipydatagrid's repository examples.

This example loads a dataset containing all the trees of Paris and displays it in a grid widget. One could argue that ipydatagrid should use binary buffers, and it should. But it serves as a good example for testing json_clean's performance.

This was the result (I saved the comm message to a JSON file before running the test):

[image: speedup benchmark]
