vine: deserialize argument infile before forking #3902

JinZhou5042 (member) commented Aug 7, 2024

Proposed Changes

Fixes #3892.

In serverless mode, we use cloudpickle to serialize arguments to a file and to deserialize them from it. However, if some of the arguments are Python packages or objects from importable packages, cloudpickle serializes them by reference by default, so unpickling them triggers the import of their dependencies at load time (see the difference between pickle and cloudpickle for reference).

For example, consider dumping exactly the same Data object twice, once by reference and once by value. The by-reference file is small (<100 KB) but takes 0.8 s to unpickle; the by-value file is much larger (>1200 KB) but unpickles in 0.0001 s. Pickling by reference yields a smaller file, but loading it is slower because the imports happen at unpickling time.
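
To make the by-reference behavior concrete, here is a small sketch using the standard-library `pickle` module, which, like cloudpickle's default for functions and classes from importable modules, records a module-level function only by its module and attribute name. `load_by_reference` is a hypothetical helper written for illustration, and the stdlib module `colorsys` merely stands in for a heavy package. The helper deletes the module from `sys.modules` to mimic a process that has not imported it yet, so the import is redone inside `pickle.loads`.

```python
import importlib
import pickle
import sys
import time


def load_by_reference(module_name="colorsys", attr="rgb_to_hsv"):
    """Pickle a module-level function, then unpickle it as if in a fresh process.

    Returns (payload, loaded_object, load_seconds, module_was_reimported).
    """
    module = importlib.import_module(module_name)
    # A module-level function is pickled by reference: only the module name
    # and attribute name go into the payload, not the function's code.
    payload = pickle.dumps(getattr(module, attr))

    # Forget the module so the next load has to import it again.
    del sys.modules[module_name]

    start = time.perf_counter()
    obj = pickle.loads(payload)  # unpickling imports module_name at this point
    elapsed = time.perf_counter() - start
    return payload, obj, elapsed, module_name in sys.modules
```

With a package that has many dependencies in place of `colorsys`, `load_seconds` grows with the cost of those imports while the payload stays small, which is the trade-off measured above. cloudpickle (2.0 and later) also provides `cloudpickle.register_pickle_by_value(module)` to switch a module to by-value pickling.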

In fork mode, performing these imports in each child process does not reuse the environment the children already share with the parent, and thus adds latency to every function call. The imports happen at load time, i.e., when cloudpickle.load(...) is invoked. Loading the arguments before forking lets the library import these packages once, in advance, so the forked children inherit them and avoid this slowdown.
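
The ordering can be sketched as follows, assuming a POSIX system and using the standard-library `pickle` in place of cloudpickle. `load_args`, `invoke_forked`, and the `(args, kwargs)` layout of the argument file are hypothetical and are not TaskVine's actual library code. The parent deserializes once and then forks; the child calls the function with the objects it inherited and sends the result back over a pipe.

```python
import os
import pickle


def load_args(path):
    """Deserialize a (hypothetical) (args, kwargs) argument file in the parent.

    Any imports triggered by by-reference objects happen here, once.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


def invoke_forked(func, args, kwargs):
    """Run func(*args, **kwargs) in a forked child and return its result.

    The child inherits the already-deserialized arguments, plus any modules the
    parent imported while loading them, so it does no unpickling of its own.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process
        os.close(read_fd)
        try:
            try:
                result = ("ok", func(*args, **kwargs))
            except Exception as exc:  # report failures to the parent
                result = ("error", repr(exc))
            with os.fdopen(write_fd, "wb") as out:
                pickle.dump(result, out)
        finally:
            os._exit(0)  # never fall through into the parent's code

    # Parent: read the child's result, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as inp:
        status, value = pickle.load(inp)
    os.waitpid(pid, 0)
    if status == "error":
        raise RuntimeError(f"child failed: {value}")
    return value
```

Because `load_args` runs in the parent, an expensive load is performed serially instead of being overlapped across children, which is the concurrency cost discussed below.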

As discussed here, there are other possible approaches. One disadvantage of this one is that if unpickling the argument file is unavoidably expensive, this change will slightly reduce concurrency, because the child processes could otherwise have performed the deserialization in parallel.

Merge Checklist

The following items must be completed before PRs can be merged.
Check these off to verify you have completed all steps.

  • make test: Run local tests prior to pushing.
  • make format: Format source code to comply with lint policies. Note that some lint errors can only be resolved manually (e.g., Python).
  • make lint: Run lint on source code prior to pushing.
  • Manual Update: Update the manual to reflect user-visible changes.
  • Type Labels: Select a GitHub label for the type: bugfix, enhancement, etc.
  • Product Labels: Select a GitHub label for the product: TaskVine, Makeflow, etc.
  • PR RTM: Mark your PR as ready to merge.

dthain merged commit 7348e3e into cooperative-computing-lab:master on Aug 7, 2024
8 checks passed
JinZhou5042 deleted the vine_deserialize_args_before_forking branch on August 7, 2024 13:53
Linked issue (closed by this pull request): TaskVine Futures: High latency on chained tasks.