Migrate to Dask #242

Closed
dlqqq opened this issue Jun 23, 2023 · 0 comments · Fixed by #244
Labels: enhancement (New feature or request)

dlqqq (Member) commented Jun 23, 2023

Problem

We are currently using Ray in our backend to parallelize certain tasks. However, for our use case, Dask may be more appropriate. Here is our team's rationale:

  • Dask is primarily a single-node parallel computing library for Python, whereas Ray is oriented toward distributed (multi-node) compute.
    • In general, Dask seems better optimized for our single-node use case, offering more granular control over how workers are run on the hardware.
  • Ray is blocking cross-platform Python 3.11 compatibility for Jupyter AI.
  • Ray does not allow for multithreading.
    • From our recent investigation (use Dask to parallelize document learning #182), learning tasks (via /learn) see a ~2x performance gain from multiprocessing and a ~10x gain from multithreading (see the sketch after this list).
    • Forking processes incurs very significant overhead due to the size of the libraries Jupyter AI imports: the Dask dashboard shows roughly 3-4 GB of Python code being pickled and unpickled for each process.
    • Furthermore, most of our long-running tasks spend the majority of their lifetime waiting on I/O, so Python's GIL is not the limiting factor in performance here.
  • Spawning Ray actors significantly slows server start time, because each initialized Ray actor requires spawning a new process, which, as discussed above, is expensive for Jupyter AI. Furthermore, Ray actors consume a substantial amount of memory (again, because each is a separate process) despite being idle for most of their lifetime.
    • This is not a problem with Ray itself, but with our initial implementation, which consists almost exclusively of Ray actors.
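
To make the thread-based direction concrete, here is a minimal sketch of a single-node, threads-only Dask setup. The worker/thread counts and the load_document helper are illustrative placeholders, not the actual Jupyter AI implementation:

```python
from dask.distributed import Client, LocalCluster

# Threads, not forked processes: avoids pickling gigabytes of imported
# libraries into each worker, and is sufficient because /learn is I/O-bound.
cluster = LocalCluster(
    n_workers=1,           # single worker on the Jupyter server node
    threads_per_worker=8,  # parallelism comes from threads, not forks
    processes=False,       # keep everything in-process
)
client = Client(cluster)

def load_document(path: str) -> str:
    """I/O-bound work; the GIL is released while waiting on the filesystem."""
    with open(path) as f:
        return f.read()

futures = client.map(load_document, ["a.md", "b.md", "c.md"])
docs = client.gather(futures)
```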

Proposed Solution

Implement a comprehensive proof-of-concept for migrating the backend to Dask.
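
As a rough illustration of the direction (not the final design), a call site that previously invoked a Ray actor method could instead submit work to a shared Dask client. learn_documents and the file name below are hypothetical placeholders:

```python
from dask.distributed import Client

client = Client(processes=False)  # single-node, in-process threaded workers

def learn_documents(paths: list[str]) -> int:
    """Stand-in for the /learn work previously performed inside a Ray actor."""
    return len(paths)

# Instead of something like `actor.learn.remote(paths)`, submit the function
# to Dask and resolve the resulting future.
future = client.submit(learn_documents, ["notebook.ipynb"])
print(future.result())
```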

Additional context

We have to be careful to avoid thread starvation, i.e. to handle the case in which a worker thread (for whatever reason) spends a significant amount of time executing code that blocks its kernel thread.
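
One possible mitigation, sketched with dask.distributed's secede()/rejoin() APIs: a task that is about to run long-blocking code can secede from the worker's thread pool so it does not hold a scheduling slot, then rejoin once it finishes. blocking_embed is a hypothetical placeholder, and this is only one option, not a settled design:

```python
import time
from dask.distributed import Client, secede, rejoin

def blocking_embed(chunk: str) -> str:
    time.sleep(5)  # stand-in for kernel-thread-blocking work
    return chunk.upper()

def guarded_task(chunk: str) -> str:
    secede()  # free this worker thread's slot so other tasks can run
    try:
        return blocking_embed(chunk)
    finally:
        rejoin()  # block until a slot frees up, then return to the pool

client = Client(processes=False)
print(client.submit(guarded_task, "hello").result())
```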
