Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

restore: trying out mmap-based file extraction #3524

Merged
merged 8 commits into from
Aug 5, 2020
Merged

Conversation

zkat
Copy link
Contributor

@zkat zkat commented Jul 16, 2020

Fixes: NuGet/Home#9807

Results are promising:

Benchmark results from one of our test benchmark projects (the Orchard example)

Note: I couldn't get the NuGet.Client-based benchmarks to work, but the other two worked fine.

TODO

  • (SOLVED) You'll notice a bunch of "file already in use" errors in the test failures in CI. These seem to be spurious/random and I'm not sure exactly what's triggering them but I can probably work around it with a try{} around the mmap code that falls through to a regular write. These are rare enough that it shouldn't affect performance, and if we catch the right exception, won't affect correctness (because the incoming stream won't have been consumed at all).
  • Have a conversation about max file size to mmap: I'm not sure how many of these extractions we're doing in parallel, but my assumption is "not many, if any at all", so I'm inclined to just... leave it at the current 10mb setting (files smaller than 10mb will be fully slurped into RAM)
  • Should we bother flushing? Basically none of our code flushes streams to disk, and a process crash won't prevent mmapped files from being persisted, so I'm inclined to not take the (relatively small) perf hit, and just go "don't power off your computer :)", which is what we're saying already, anyway, for these extractions.

@zkat zkat changed the title spike: trying out mmap-based file extraction restore: trying out mmap-based file extraction Jul 17, 2020
Copy link
Member

@zivkan zivkan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never used mmap before in .NET, and don't remember reading much about it in the past. How confident are we that this will work for everyone in all scenarios? Should we consider checking an environment variable to allow people to turn it off, considering otherwise the only workaround will be to downgrade to older tooling, which might not be an option for other reasons?

Should we bother flushing? Basically none of our code flushes streams to disk

Dispose implicitly flushes. And therefore it's not a perf hit. In fact, for async code this is a problem because dispose is synchornous, and therefore flushing on dispose can block the thread when the code is otherwise async. Therefore for perf it's important to call FlushAsync in async code before disposing. But I think this is a bigger problem with network streams, rather than disk IO. In any case, the method modified in this PR is not async, so it doesn't really matter.

I don't know if it is true that mmaped files also flush on dispose for mmaped files in .NET. I would guess so, as it would be very confusing for developers like us knowing that one type of stream flushes on dispose but another does not.

a process crash won't prevent mmapped files from being persisted

making mmap more relisient than "normal" streams :)

@zkat zkat force-pushed the dev-zkat-mmap-extraction branch 4 times, most recently from ef632d6 to e31d8b2 Compare August 4, 2020 18:40
@zkat zkat requested review from zivkan and dtivel August 4, 2020 18:42
@zkat
Copy link
Contributor Author

zkat commented Aug 4, 2020

This is ready for another review! I simplified the whole thing. The only stream type that didn't have .Length was the Stream that comes from the ZipArchiveEntry. I've created a wrapper Stream in that case so I can pass the size in from the entry, and that gets rid of all breaking changes (and all visible changes, for that matter. This is now thoroughly tucked away inside CopyToFile)

@zkat zkat requested a review from dtivel August 4, 2020 22:44
@zkat
Copy link
Contributor Author

zkat commented Aug 4, 2020

@dtivel fixed. I had to change how I'm doing the mmap because of issues I had on Linux, but this version is actually passing tests AND performing well :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Speed up nupkg extraction using mmap/CreateFileMapping
5 participants