Handling custom serialization with MsgPack directly #4379

Closed
jakirkham opened this issue Dec 18, 2020 · 7 comments · Fixed by #4531

Comments

@jakirkham
Member

Today we use things like extract_serialize to pull out objects that MsgPack can't handle and serialize them alongside the MsgPack payload. In the benchmarks we have done, our extra handling code takes about 3x more time than MsgPack alone. An interesting idea to follow up on would be to see whether we can add an ExtType (or something similar) to the default encoding/decoding hooks to handle out-of-band buffers and merely track where to insert them later. This would be analogous to how pickle works with out-of-band buffers. It may speed up serializing and deserializing by making fewer passes over the data and leveraging MsgPack's own passes. In theory we could get up to a 4x speedup in serialization by following this strategy.
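For reference, the pickle behaviour mentioned above can be sketched with the standard library alone (protocol 5, Python 3.8+). The message contents and payload size here are made up for illustration; the point is only that the buffer is collected on the side and the pickle stream keeps a placeholder for it.

```python
import pickle

# Illustrative payload; any object supporting the buffer protocol would do.
payload = bytearray(b"x" * 1_000_000)

# Wrapping in PickleBuffer marks the data as eligible for out-of-band transfer.
message = {"op": "update", "data": pickle.PickleBuffer(payload)}

# buffer_callback receives each out-of-band buffer instead of pickle
# copying its bytes into the stream.
buffers = []
header = pickle.dumps(message, protocol=5, buffer_callback=buffers.append)

# On the receiving side the same buffers are supplied in the same order,
# and pickle inserts them where the placeholders are.
restored = pickle.loads(header, buffers=buffers)
restored_bytes = bytes(memoryview(restored["data"]))
```

The header stays small because the megabyte of data never enters it; only the order of `buffers` has to be preserved between the two sides.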

@jakirkham
Member Author

cc @mrocklin @quasiben

@mrocklin
Member

mrocklin commented Jan 5, 2021 via email

@jakirkham
Member Author

This came up again today when we were revisiting one of the cases where the idea originated: the scheduler being slow. In that case about 50% of the time on the scheduler is split evenly between receiving/deserializing data (admittedly this was before the HLG work) and performing transitions. Additionally, another ~20% of the time on the scheduler is spent sending data, most of which is spent serializing it. As a result, it seems reasonable to conclude that improvements in serialization/deserialization are well worth our time.

@mrocklin
Member

@jakirkham maybe spending time on Cythonization is the wrong move if this can be done easily.

@jakirkham
Member Author

Yeah, I was mostly curious whether the extract_serialize Cythonization would be easy to test and would show some notable improvement. Agreed, if that's not easy, we can just ignore it.

Looking at transitions atm.

@madsbk, if you have some time on Monday, maybe we can chat about this? 🙂

@jakirkham
Member Author

Just chatted with @madsbk about this. I think we will start by trying to remove extract_serialize and letting MsgPack do this work ( #4379 ). Our hope is that this already addresses performance for the status message use case ( #4376 ).
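A minimal sketch of what letting MsgPack do this work could look like, assuming the `msgpack` Python package. This is not the code that was merged; the `OutOfBand` wrapper, the `OOB_EXT_CODE` value, and the `dumps`/`loads` helpers are hypothetical names invented for this example. MsgPack's `default` hook replaces each wrapped object with a small `ExtType` holding its index, and `ext_hook` puts the buffer back on the way in, so the message is traversed once by MsgPack itself.

```python
import msgpack

OOB_EXT_CODE = 1  # hypothetical extension code, chosen only for this sketch


class OutOfBand:
    """Hypothetical marker for data that should travel beside the header."""

    def __init__(self, data):
        self.data = data


def dumps(msg):
    """Pack msg, returning (header, buffers) with buffers kept out of band."""
    buffers = []

    def default(obj):
        # Called by msgpack only for objects it cannot pack itself.
        if isinstance(obj, OutOfBand):
            buffers.append(obj.data)
            index = len(buffers) - 1
            return msgpack.ExtType(OOB_EXT_CODE, index.to_bytes(4, "little"))
        raise TypeError(f"cannot serialize {type(obj).__name__}")

    header = msgpack.packb(msg, default=default, use_bin_type=True)
    return header, buffers


def loads(header, buffers):
    """Unpack header, inserting buffers where their placeholders were."""

    def ext_hook(code, data):
        if code == OOB_EXT_CODE:
            return buffers[int.from_bytes(data, "little")]
        return msgpack.ExtType(code, data)  # leave unknown extensions alone

    return msgpack.unpackb(header, ext_hook=ext_hook, raw=False)


msg = {"op": "update", "keys": ["a", "b"], "data": OutOfBand(b"x" * 1000)}
header, buffers = dumps(msg)
roundtrip = loads(header, buffers)
```

A wrapper class is used here because MsgPack packs `bytes`, `bytearray`, and `memoryview` natively, so the `default` hook would never see them.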

Will continue pushing on optimizing transitions and moving communication out of there ( #4454 ) ( #4451 ).

cc @quasiben (for vis)

@jakirkham
Member Author

As a first step we are adding a fast path for things that can be handled by MsgPack alone ( #4480 ). Though we are still interested in improving the serialization workflow overall, which may be handled in later work.
