vine: deserialize argument infile before forking #3902
Merged
Proposed Changes
To fix #3892
In serverless mode, we use `cloudpickle` to serialize and deserialize arguments to and from a file. However, if some of the arguments are Python packages or objects, `cloudpickle` serializes them by reference by default, and unpickling them triggers the import of their dependencies at load time. (difference between pickle and cloudpickle for reference)

For example, the code snippets dump exactly the same `Data` object, except that one serializes by reference and the other serializes by value. The first produces a smaller file (<100 KB) and unpickling takes 0.8 s; the second produces a much larger file (>1200 KB) and unpickling takes 0.0001 s. Pickling by reference creates a smaller file but takes longer to load, because the imports happen at unpickling time.

In `fork` mode, each child process performs these imports itself, so the work is not shared through the environment inherited from the parent, and every function call pays the import latency. The imports happen at load time, that is, when `cloudpickle.load(...)` is invoked. Loading the arguments before forking lets the library import those packages once in advance, so the children avoid this slowdown.

As discussed here, there are other possible approaches. One disadvantage of this one is that if unpickling the argument file is unavoidably expensive, the change slightly reduces concurrency, because the child processes could otherwise have deserialized in parallel.
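The effect described above can be illustrated with a small, self-contained sketch. It is not the code from this PR: it uses the standard library's `pickle` instead of `cloudpickle` (both store the class of an importable object by reference), the module name `heavy_dep_demo` and the helper functions are hypothetical, and `os.fork` is assumed to be available (POSIX only). The sketch checks whether a forked child finds the module already imported, depending on whether the parent loaded the argument file before forking.

```python
import importlib
import os
import pickle
import sys
import tempfile

# Hypothetical module standing in for a dependency that is costly to import.
MODULE_NAME = "heavy_dep_demo"
MODULE_SOURCE = (
    "class Data:\n"
    "    def __init__(self, value):\n"
    "        self.value = value\n"
)


def make_argument_file(workdir):
    """Create the demo module and pickle a Data instance (class stored by reference)."""
    with open(os.path.join(workdir, MODULE_NAME + ".py"), "w") as f:
        f.write(MODULE_SOURCE)
    sys.path.insert(0, workdir)
    importlib.invalidate_caches()
    module = importlib.import_module(MODULE_NAME)
    infile = os.path.join(workdir, "args.pkl")
    with open(infile, "wb") as f:
        pickle.dump(module.Data(42), f)
    # Forget the module so that the next load has to import it again.
    del sys.modules[MODULE_NAME]
    return infile


def child_sees_module_preloaded(infile, load_in_parent):
    """Fork one child; return True if the module was imported before the child started."""
    if load_in_parent:
        with open(infile, "rb") as f:
            pickle.load(f)  # the import happens once, in the parent
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: report its state, then exit without running parent code.
        try:
            os.close(read_fd)
            preloaded = MODULE_NAME in sys.modules
            if not load_in_parent:
                with open(infile, "rb") as f:
                    pickle.load(f)  # the import happens here, in every child
            os.write(write_fd, b"1" if preloaded else b"0")
        finally:
            os._exit(0)
    os.close(write_fd)
    answer = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return answer == b"1"


with tempfile.TemporaryDirectory() as workdir:
    infile = make_argument_file(workdir)
    preloaded_when_child_loads = child_sees_module_preloaded(infile, load_in_parent=False)
    sys.modules.pop(MODULE_NAME, None)
    preloaded_when_parent_loads = child_sees_module_preloaded(infile, load_in_parent=True)
    sys.path.remove(workdir)

print("module preloaded when the child loads the file: ", preloaded_when_child_loads)
print("module preloaded when the parent loads the file:", preloaded_when_parent_loads)
```

When the parent loads the file first, the child inherits the already populated `sys.modules` through the fork and does not repeat the import. The cost is that the load itself is serialized in the parent, which is the concurrency trade-off mentioned above.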
Merge Checklist
The following items must be completed before PRs can be merged.
Check these off to verify you have completed all steps.
- `make test`: Run local tests prior to pushing.
- `make format`: Format source code to comply with lint policies. Note that some lint errors can only be resolved manually (e.g., Python).
- `make lint`: Run lint on source code prior to pushing.