HADOOP-17195. ABFS Store thread pool for stream IO. #2294
Change-Id: I6915539cfafe7164c404dfc153653710280d9bf6
🎊 +1 overall
This message was automatically generated.
Looking at this a bit more
Closing this, but leaving it up as the PoC to say "we should have a shared thread pool for lower startup costs". Switching to buffering on disk would be the way to guarantee an end to OOM problems; I am happy for the S3A blocks class to be moved to hadoop-common to address this.
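As a rough illustration of the "shared thread pool for lower startup costs" idea, here is a minimal sketch, assuming one executor is created per store instance and handed to every output stream, so opening a stream creates no threads. The class `SharedUploadPool`, its method names and the thread-name prefix are hypothetical, not the ABFS API. It uses only `java.util.concurrent`.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical store-level pool: created once per store instance and
 * handed to every output stream, so opening a stream creates no threads.
 */
class SharedUploadPool implements AutoCloseable {
  private final ThreadPoolExecutor executor;

  SharedUploadPool(int threads) {
    AtomicInteger counter = new AtomicInteger();
    // Daemon threads with recognisable names for thread dumps.
    ThreadFactory factory = r -> {
      Thread t = new Thread(r, "abfs-upload-" + counter.incrementAndGet());
      t.setDaemon(true);
      return t;
    };
    // NOTE: the work queue is unbounded, so this shares threads but does
    // not by itself bound how many pending uploads (and buffers) exist.
    executor = new ThreadPoolExecutor(threads, threads, 60L, TimeUnit.SECONDS,
        new LinkedBlockingQueue<>(), factory);
    // Let idle threads exit so an idle store holds no threads at all.
    executor.allowCoreThreadTimeOut(true);
  }

  /** Every stream opened by the store receives the same executor. */
  ExecutorService executorForStream() {
    return executor;
  }

  @Override
  public void close() {
    executor.shutdown();
  }
}
```

Sharing the executor only removes per-stream thread startup. Because the queue is unbounded, it does nothing about memory, which is the OOM concern raised in the later comments.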
Maybe I was being pessimistic there. If the number of writes a single stream can have active is throttled, the number of open blocks that stream can have allocated is also limited. But the ability to buffer on disk is the way to robustly avoid scale issues with many active threads.
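The throttling argument can be sketched with a per-stream `Semaphore`. This is a minimal sketch, assuming a block buffer is allocated only after a permit is taken and the permit is returned when that block's upload finishes, so at most `maxActive` blocks exist per stream. `ThrottledBlockWriter`, `writeBlock`, `awaitPendingUploads` and the sleeping `upload` stand-in are hypothetical names, not the ABFS output stream.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical per-stream writer: a block buffer is only allocated after a
 * permit is taken, and the permit is returned when that block's upload ends,
 * so at most maxActive blocks are ever allocated for this stream.
 */
class ThrottledBlockWriter {
  private final int maxActive;
  private final int blockSize;
  private final Semaphore permits;
  private final ExecutorService uploader; // e.g. a store-wide shared pool
  private final AtomicInteger liveBlocks = new AtomicInteger();
  private final AtomicInteger peakBlocks = new AtomicInteger();

  ThrottledBlockWriter(int maxActive, int blockSize, ExecutorService uploader) {
    this.maxActive = maxActive;
    this.blockSize = blockSize;
    this.permits = new Semaphore(maxActive);
    this.uploader = uploader;
  }

  /** Blocks the writing thread while maxActive uploads are already pending. */
  void writeBlock(byte[] data) {
    permits.acquireUninterruptibly(); // throttle before allocating memory
    byte[] block = new byte[blockSize];
    System.arraycopy(data, 0, block, 0, Math.min(data.length, blockSize));
    peakBlocks.accumulateAndGet(liveBlocks.incrementAndGet(), Math::max);
    try {
      uploader.execute(() -> {
        try {
          upload(block);
        } finally {
          liveBlocks.decrementAndGet(); // decrement before freeing the permit
          permits.release();
        }
      });
    } catch (RuntimeException e) { // e.g. RejectedExecutionException
      liveBlocks.decrementAndGet();
      permits.release();
      throw e;
    }
  }

  /** Waits until every pending upload of this stream has finished. */
  void awaitPendingUploads() {
    permits.acquireUninterruptibly(maxActive);
    permits.release(maxActive);
  }

  int peakBlocks() { return peakBlocks.get(); }
  int liveBlocks() { return liveBlocks.get(); }

  // Stand-in for the remote append call; sleeps to simulate latency.
  private void upload(byte[] block) {
    try {
      Thread.sleep(5);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

This bounds memory per stream to `maxActive * blockSize`, but the total still grows with the number of open streams. That is why the comment points to disk buffering as the robust fix.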
This is the successor to #2179.
To actually defend against OOMs, the per-stream queue length is what needs to be managed. Looking at the patch, it still has the problem of #2179: you need one buffer per pending upload in the pools.
Ultimately the S3A connector fixed this by going to disk buffering by default. A more performant design might be a blocking byte buffer factory that limits the number of buffers the streams can request, putting an upper bound on the amount of memory a single ABFS store instance can demand.
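A blocking byte buffer factory of the kind suggested could look roughly like this. It is a minimal sketch, assuming one factory per store instance whose fixed permit count caps total buffer memory at `maxBuffers * bufferSize`, with released buffers recycled. `BoundedBufferFactory` and its methods are hypothetical names. For brevity the sketch does not guard against releasing a buffer twice or releasing a foreign buffer.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;

/**
 * Hypothetical store-wide buffer factory: callers block in acquire() once
 * maxBuffers buffers are outstanding, so total buffer memory is capped.
 */
class BoundedBufferFactory {
  private final int bufferSize;
  private final int maxBuffers;
  private final Semaphore available;
  private final ConcurrentLinkedQueue<ByteBuffer> recycled = new ConcurrentLinkedQueue<>();

  BoundedBufferFactory(int maxBuffers, int bufferSize) {
    this.maxBuffers = maxBuffers;
    this.bufferSize = bufferSize;
    this.available = new Semaphore(maxBuffers, true); // fair: FIFO waiters
  }

  /** Blocks until a buffer is free; this is the back-pressure on writers. */
  ByteBuffer acquire() {
    available.acquireUninterruptibly();
    return take();
  }

  /** Non-blocking variant: returns null when the memory budget is used up. */
  ByteBuffer tryAcquire() {
    return available.tryAcquire() ? take() : null;
  }

  /** Returns a buffer to the pool and wakes one blocked caller. */
  void release(ByteBuffer buffer) {
    recycled.offer(buffer);
    available.release();
  }

  /** Upper bound on heap this factory will ever hand out. */
  long maxMemory() {
    return (long) maxBuffers * bufferSize;
  }

  // Reuse a released buffer if possible, otherwise allocate lazily.
  private ByteBuffer take() {
    ByteBuffer b = recycled.poll();
    if (b == null) {
      b = ByteBuffer.allocate(bufferSize);
    }
    b.clear(); // resets position/limit; old contents are not zeroed
    return b;
  }
}
```

Unlike the per-stream throttle, the limit here applies across all streams of one store, so the number of open streams no longer multiplies the memory demand. Writers simply block when the budget is exhausted.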