Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Asynchronous Contents API #324

Merged
merged 15 commits into from
Dec 10, 2020
Merged

Conversation

mwakaba2
Copy link
Member

@mwakaba2 mwakaba2 commented Oct 30, 2020

Description
Hi, here's the draft for adding an async option for the contents api.

So far, I added an async option for each class/mixin below:

  • ContentsManager and AsyncContentsManager
  • Checkpoints and AsyncCheckpoints
  • GenericCheckpointsMixin and AsyncGenericCheckpointsMixin

@kevin-bates and/or @Zsailer I'd appreciate your review and feedback!

Open Questions
I added a test for the configuration, but how do we want to test the new classes since most methods are not implemented and left up to the user?

Checklist

  • Update Contents Api Doc
  • Add test for contents configuration
  • Implement AsyncFileContentsManager
  • Update test_ap/test_manager to test both FileContentsManager and AsyncFileContentsManager
  • Rebase with master branch to include public fixture update.
  • Remove python 3.5 support code.
  • Fix code scanning lint errors
  • Test new classes with my team's custom ContentsManager
  • Update docs to include best use cases for the async classes.
  • Fix tests

@mwakaba2 mwakaba2 marked this pull request as draft November 2, 2020 18:13
@kevin-bates
Copy link
Member

Thanks @mwakaba2 - this is a good start.

how do we want to test the new classes since most methods are not implemented and left up to the user?

You're right, most tests leverage the FileContentsManager and things should get "concrete" when adding its async form. At that time, I suspect we could use parameters like in the kernels tests, which run the tests using both classes...

@pytest.fixture(params=["MappingKernelManager", "AsyncMappingKernelManager"])
def argv(request):
    if request.param == "AsyncMappingKernelManager" and sys.version_info < (3, 6):
        pytest.skip("Kernel manager is AsyncMappingKernelManager, Python version < 3.6")
    return ["--ServerApp.kernel_manager_class=jupyter_server.services.kernels.kernelmanager." + request.param]

@mwakaba2
Copy link
Member Author

mwakaba2 commented Nov 4, 2020

@kevin-bates Since FileContentsManager is a subclass of ContentsManager, one option is to create AsyncFileContentsManager for asynchronous file operations and test that. Then, I think we'll be able to test both classes in the test_api file. However, it looks like asyncio doesn't support async file operations by default, so we'll need to use something like aiofiles: file support for asyncio.

Another option would be to create a no-op or simple custom ContentsManager class that inherits from the async classes, and test that. To mimic network IO, the async methods can include asyncio.sleep statements.
I think the first option would be a complete solution and can rely on the existing tests, whereas the second will not exactly test real situations. What would you recommend?

@kevin-bates
Copy link
Member

Hi Mariko - I think taking the forward-moving branch by implementing AsyncFileContentsManager is the best approach. Thank you.

@mwakaba2
Copy link
Member Author

Hi @kevin-bates, I added the AsyncFileContentsManager class and updated tests to test both async and non-async classes.
Instead of aiofiles, I added a new dependency anyio to run some synchronous callables in a thread with run_sync_in_worker_thread. This is used to run most os operations and nbformat methods concurrently. The nbformat team is looking into making the api asynchronous (jupyter/nbformat#194), but it looks like that'll take some time to get that feature out, so using anyio for now.
Please let me know if you have any questions and I'd appreciate your review!

@pytest.fixture(params=["FileContentsManager", "AsyncFileContentsManager"])
def argv(request):
if request.param == "AsyncFileContentsManager" and sys.version_info < (3, 6):
pytest.skip("Kernel manager is AsyncFileContentsManager, Python version < 3.6")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy/paste - Kernel -> Contents. Actually, since we no longer support 3.5, this condition can be dropped altogether. 😄

@kevin-bates
Copy link
Member

Hi @mwakaba2 - thank you for this great update. I hope to be able to take this for spin next week - I hope that's okay. Things look real good via a quick scan - just had the one immediate comment (to show I did look at things 😄 ).

Also, we've made some fairly large changes to the pytest landscape. This will manifest in your additional tests in the form for requiring fp_ prefix on the "public" fixtures - which I suspect to be fairly isolated, but a merge with master might be helpful.

Thanks for the links to nbformat. The additional dependency on anyio seems fine, especially since @bollwyvl is on board!

@mwakaba2
Copy link
Member Author

Hi @kevin-bates, hope you had a good weekend. Yeah that works! I see some code scanning lint errors, so I'll fix those along with removing the python 3.5 pytest skip logic, rebasing with master to include public fixture changes, and testing the new classes with my team's custom contentsmanager. Thanks for the quick review!

@mwakaba2
Copy link
Member Author

Hi @kevin-bates, is it possible to publish a dev version to PyPI to test this branch?

@kevin-bates
Copy link
Member

Hi @mwakaba2 - I'm not sure there's precedent for this and reading up on publishing dev packages to pypi (as I'm not familiar with that particular aspect) I'm finding it not a recommended practice.

What I've done in the past is to use pip with a reference to the GH repo@branch, like so:

pip install --upgrade git+https://github.com/mwakaba2/jupyter_server@async-contents-api

Would that be sufficient?

Btw, I brought up your PR in the today's server meeting and will try to spend time with this. I apologize for the delay.

@mwakaba2
Copy link
Member Author

Ah ok no problem. Yeah, we're using an in-house tool to manage 3rd party python dependencies so I'm not able to use pip directly, but I think I found a way to test it.
And thanks! I saw the meeting notes. I'm hoping to complete testing this week.

@codecov-io
Copy link

codecov-io commented Nov 20, 2020

Codecov Report

Merging #324 (1a5395d) into master (ca492c1) will decrease coverage by 1.63%.
The diff coverage is 53.92%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #324      +/-   ##
==========================================
- Coverage   67.37%   65.74%   -1.64%     
==========================================
  Files          55       56       +1     
  Lines        5361     6025     +664     
  Branches      696      804     +108     
==========================================
+ Hits         3612     3961     +349     
- Misses       1545     1832     +287     
- Partials      204      232      +28     
Impacted Files Coverage Δ
jupyter_server/pytest_plugin.py 0.00% <0.00%> (ø)
jupyter_server/files/handlers.py 31.11% <33.33%> (+1.56%) ⬆️
jupyter_server/base/handlers.py 67.26% <40.00%> (ø)
jupyter_server/services/contents/checkpoints.py 46.51% <45.23%> (-1.22%) ⬇️
...upyter_server/services/contents/filecheckpoints.py 61.40% <54.16%> (-5.60%) ⬇️
jupyter_server/services/contents/fileio.py 74.34% <61.11%> (-5.22%) ⬇️
jupyter_server/services/contents/filemanager.py 67.23% <64.60%> (-2.13%) ⬇️
jupyter_server/nbconvert/handlers.py 23.00% <66.66%> (+0.77%) ⬆️
jupyter_server/view/handlers.py 57.14% <66.66%> (ø)
...pyter_server/services/contents/largefilemanager.py 82.41% <82.22%> (-1.26%) ⬇️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ca492c1...5033aa8. Read the comment docs.

@kevin-bates
Copy link
Member

Hi Mariko - I was just going to post that LargeFileManager should be included since it's currently the default contents_manager_class value. Thank you for the update.

I was able to take AsyncFileContentsManager for a spin yesterday and found the functionality to be fine. I decided to run some crude tests using curl and uploading/fetching larger file contents (189 MB). A "test" simply consisted of timing 3 curl commands (each in the background) to simulate 3 "simultaneous" requests. Larger content was used solely to trigger longer request times and ensure the previous requests hadn't finished prior to the subsequent requests. I.e., I wasn't as concerned about the request time as I was about getting multiple requests "serviced" simultaneously.

I found that the current implementation didn't exhibit the blocking behavior that I experienced with kernel startup before we implemented async/await support - which surprised me. I also found that it seems like the current implementation was slightly better in that you could see a "clear winner" amongst the 3 requests (upwards of 50% faster), whereas, with the async implementation, the times tended to be the same across the 3 requests. I suspect this is because of "too much cooperation" since each await is essentially a pause point for that executor.

Since the primary purpose of this is to address high-latent filesystems, I suspect there will be some "wins" there. I also wonder if we should take a close look at where awaits have been introduced and ensure they make sense.

I'm curious if you see similar behavior. Thank you for working on this.

@kevin-bates
Copy link
Member

Hi Mariko, would you mind splitting the PEP440 update out in a separate PR? This is something I had on my plate to do today (along with an update to RELEASE.md to update the dev version following the release tagging).

I think only the dev and post suffixes use a dot: https://www.python.org/dev/peps/pep-0440/#public-version-identifiers

If you'd rather I do that, that's fine also.

@kevin-bates
Copy link
Member

😄 Apparently someone also had this on their plate yesterday: 8763b5b

I'll go ahead and submit a PR to add the dot and update RELEASE.md.

@jasongrout
Copy link
Contributor

I'll go ahead and submit a PR to add the dot and update RELEASE.md.

You may remember that we did this sort of thing in https://github.com/jupyter/jupyter_core/blob/master/jupyter_core/version.py

@kevin-bates
Copy link
Member

Thanks @jasongrout - I just saw your comment (after submitting #348). I hadn't forgotten about your change in core, but personally found that to be overly complex. That said, if others prefer that approach, I'm okay with adopting it.

@jasongrout
Copy link
Contributor

jasongrout commented Nov 20, 2020

personally found that to be overly complex

It could have been way more complex for full pep440 compliance, but we dialed it back :).

The purpose for the extra stuff in that file is to provide a good versioninfo tuple that mirrors the versioninfo tuple from Python more faithfully, and to make it so that you don't have to think about how pep440 formats version numbers (dot? consistent abbreviation?). I think it probably does that as simply as possible, but (a) if there is something simpler, I'd love to see it - I think it's a bit much to stomach as well, and (b) of course, if you don't want those two goals, that's a different story.

@mwakaba2
Copy link
Member Author

mwakaba2 commented Nov 20, 2020

Hi @kevin-bates, thanks for testing this out! I have some bugs to fix, so I think I need a little more time to test this out, but yep I agree. There are definitely some awaits that can be removed. I'm interested in the tests you performed and would like to try it out after I fix some bugs on my end. Could you share the commands you ran for the curl tests?

As for the pep440 update, I made that quick change just to see if the tests are still working in this branch. I'll remove the pep440 commit and update this branch once your PR is merged to the main branch.

I'm taking next week off, so I'll be able to resume testing the week after that (11/30). I noticed you're taking time off in December and was wondering if another jupyter member can help with the code review/release?

@kevin-bates
Copy link
Member

Thanks @mwakaba2 - that all sounds great. Have a great week off!

Although I'll be out for (essentially) December, my plan is to be around for this kind of stuff. @Zsailer can you help review this and move this forward as well. Perhaps we can target the Server Team meeting of December 3 to sync up with things?

@kevin-bates
Copy link
Member

I just realized you had asked to see the test scripts. I've included them in the attached tar file but did not include the large content file. The README describes how to set these up and how to construct the larger payload. As I said, they are crude and minimally viable. 😄

contents-test.tar.gz

@mwakaba2
Copy link
Member Author

mwakaba2 commented Dec 1, 2020

@kevin-bates Thanks for putting together the test bundle.
I ran your put/get scripts 5 times to calculate the avg for each scenario:

  • Before (current implementation on master branch)
$ jupyter server --ServerApp.contents_manager_class=jupyter_server.services.contents.largefilemanager.FileContentsManager --debug on main branch)
  • After (FileContentsManager)
$ jupyter server --ServerApp.contents_manager_class=jupyter_server.services.contents.largefilemanager.FileContentsManager --debug
  • After (AsyncFileContentsManager)
$ jupyter server --ServerApp.contents_manager_class=jupyter_server.services.contents.largefilemanager.AsyncFileContentsManager --debug

Here are the results.
Fastest request out of three "huge" requests

Before
---------
PUT
201 PUT /api/contents/contents-test/huge1.json?token=fd559df06790466297db4a89b210b66ef79a104e935ba17d (::1) 604.49ms
201 PUT /api/contents/contents-test/huge2.json?token=fd559df06790466297db4a89b210b66ef79a104e935ba17d (::1) 517.84ms
201 PUT /api/contents/contents-test/huge2.json?token=fd559df06790466297db4a89b210b66ef79a104e935ba17d (::1) 555.70ms
201 PUT /api/contents/contents-test/huge2.json?token=fd559df06790466297db4a89b210b66ef79a104e935ba17d (::1) 490.21ms
201 PUT /api/contents/contents-test/huge2.json?token=fd559df06790466297db4a89b210b66ef79a104e935ba17d (::1) 707.03ms
---------
avg 575.054ms

GET
200 GET /api/contents/contents-test/huge2.json?token=fd559df06790466297db4a89b210b66ef79a104e935ba17d (::1) 941.87ms
200 GET /api/contents/contents-test/huge2.json?token=fd559df06790466297db4a89b210b66ef79a104e935ba17d (::1) 843.39ms
200 GET /api/contents/contents-test/huge1.json?token=fd559df06790466297db4a89b210b66ef79a104e935ba17d (::1) 873.16ms
201 PUT /api/contents/contents-test/huge1.json?token=fd559df06790466297db4a89b210b66ef79a104e935ba17d (::1) 506.65ms
200 GET /api/contents/contents-test/huge2.json?token=fd559df06790466297db4a89b210b66ef79a104e935ba17d (::1) 873.18ms
---------
avg 807.65ms

After (non-async)
-----------------
PUT
201 PUT /api/contents/contents-test/huge1.json?token=b9c8c67f686f6b746cc4479ddbaee06f8526659d0b10b5a1 (::1) 727.68ms
201 PUT /api/contents/contents-test/huge2.json?token=b9c8c67f686f6b746cc4479ddbaee06f8526659d0b10b5a1 (::1) 520.14ms
201 PUT /api/contents/contents-test/huge1.json?token=b9c8c67f686f6b746cc4479ddbaee06f8526659d0b10b5a1 (::1) 494.54ms
201 PUT /api/contents/contents-test/huge2.json?token=b9c8c67f686f6b746cc4479ddbaee06f8526659d0b10b5a1 (::1) 558.08ms
201 PUT /api/contents/contents-test/huge3.json?token=b9c8c67f686f6b746cc4479ddbaee06f8526659d0b10b5a1 (::1) 599.44ms
---------
avg 579.976ms

GET
200 GET /api/contents/contents-test/huge1.json?token=b9c8c67f686f6b746cc4479ddbaee06f8526659d0b10b5a1 (::1) 836.26ms
200 GET /api/contents/contents-test/huge1.json?token=b9c8c67f686f6b746cc4479ddbaee06f8526659d0b10b5a1 (::1) 819.93ms
200 GET /api/contents/contents-test/huge2.json?token=b9c8c67f686f6b746cc4479ddbaee06f8526659d0b10b5a1 (::1) 834.84ms
200 GET /api/contents/contents-test/huge2.json?token=b9c8c67f686f6b746cc4479ddbaee06f8526659d0b10b5a1 (::1) 857.20ms
200 GET /api/contents/contents-test/huge2.json?token=b9c8c67f686f6b746cc4479ddbaee06f8526659d0b10b5a1 (::1) 828.08ms
---------
avg 835.26ms

After (async)
-----------------
PUT
201 PUT /api/contents/contents-test/huge1.json?token=207ea8b331c25affbb8e607138e46fff0c587935d83cd3f4 (::1) 1253.11ms
201 PUT /api/contents/contents-test/huge2.json?token=207ea8b331c25affbb8e607138e46fff0c587935d83cd3f4 (::1) 893.71ms
201 PUT /api/contents/contents-test/huge3.json?token=207ea8b331c25affbb8e607138e46fff0c587935d83cd3f4 (::1) 917.52ms
201 PUT /api/contents/contents-test/huge1.json?token=207ea8b331c25affbb8e607138e46fff0c587935d83cd3f4 (::1) 1457.99ms
201 PUT /api/contents/contents-test/huge2.json?token=207ea8b331c25affbb8e607138e46fff0c587935d83cd3f4 (::1) 1363.17ms
---------
avg 1177.1ms

GET
200 GET /api/contents/contents-test/huge1.json?token=207ea8b331c25affbb8e607138e46fff0c587935d83cd3f4 (::1) 859.31ms
200 GET /api/contents/contents-test/huge1.json?token=207ea8b331c25affbb8e607138e46fff0c587935d83cd3f4 (::1) 851.68ms
200 GET /api/contents/contents-test/huge2.json?token=207ea8b331c25affbb8e607138e46fff0c587935d83cd3f4 (::1) 876.50ms
200 GET /api/contents/contents-test/huge2.json?token=207ea8b331c25affbb8e607138e46fff0c587935d83cd3f4 (::1) 886.85ms
200 GET /api/contents/contents-test/huge3.json?token=207ea8b331c25affbb8e607138e46fff0c587935d83cd3f4 (::1) 859.66ms
---------
avg 866.8ms

I confirmed that there's no noticeable performance difference between "Before" and "After (FileContentsManager)", so that's good. 😀
As for the asynchronous implementation, as you saw, it looks like there's a 104.69% increase in response time for the fastest "huge" PUT request. The increase is minimal for the GET request.

So I removed some await statements to reduce the PUT response time. The increase is minimal now.

After (async, optimal)
-----------------
PUT
201 PUT /api/contents/contents-test/huge1.json?token=4f24b1be4ce39fdfd8d911b737f718e1572e4c1bc2dd7c93 (::1) 747.23ms
201 PUT /api/contents/contents-test/huge1.json?token=4f24b1be4ce39fdfd8d911b737f718e1572e4c1bc2dd7c93 (::1) 512.65ms
201 PUT /api/contents/contents-test/huge2.json?token=4f24b1be4ce39fdfd8d911b737f718e1572e4c1bc2dd7c93 (::1) 587.67ms
201 PUT /api/contents/contents-test/huge2.json?token=4f24b1be4ce39fdfd8d911b737f718e1572e4c1bc2dd7c93 (::1) 576.43ms
201 PUT /api/contents/contents-test/huge3.json?token=4f24b1be4ce39fdfd8d911b737f718e1572e4c1bc2dd7c93 (::1) 538.99ms
---------
avg 592.59ms

This is just an update for PUT, so I'll look into optimizations for the other file operations next (e.g. rename, delete).

@mwakaba2 mwakaba2 marked this pull request as ready for review December 4, 2020 20:34
@mwakaba2
Copy link
Member Author

mwakaba2 commented Dec 4, 2020

Hi @kevin-bates and @Zsailer, the PR is ready for review.
After testing my team’s custom manager with the asynchronous Contents API, I confirmed that the terminal lag was reduced by 12 ~ 38%.

I performed three tests:

  1. Executing a ~10 second command in the terminal with no user-triggered notebook operations: 12.57% decrease in command execution time. (the files are fetched in the background every 10 seconds.)
  2. Executing a ~10 second command while manually fetching a large notebook (1.1 MB): 28.9% decrease in command execution time.
  3. Executing a ~10 second command while saving a large notebook (1.1 MB): 37.97% decrease in command execution time
    The percentage was calculated by comparing before and after results. (Result = average of 5 command execution times)

@Zsailer Zsailer self-requested a review December 10, 2020 18:37
Copy link
Member

@kevin-bates kevin-bates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great contribution @mwakaba2. As Jupyter expands across more filesystem types and concurrency demands increase, I believe the async contents you've provided will prove useful - thank you!

@Zsailer
Copy link
Member

Zsailer commented Dec 10, 2020

Thanks, @mwakaba2! Looks great! Merging away! 🎉

@Zsailer Zsailer merged commit 6fef2a8 into jupyter-server:master Dec 10, 2020
@mwakaba2 mwakaba2 deleted the async-contents-api branch December 12, 2020 20:14
Zsailer added a commit to Zsailer/jupyter_server that referenced this pull request Nov 18, 2022
hMED22 pushed a commit to hMED22/jupyter_server that referenced this pull request Jan 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants