-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: use only loop executor for fsspec
source
#999
Conversation
ab20f43
to
e397896
Compare
fsspec
optional backends such as s3fs
…ec-optional-backends * origin/source-futures-submit: Future init do not use named arguments for the path/url annotation update submit interface
fsspec
optional backends such as s3fs
fsspec
source
* origin/main: fix: url and object splitting for local files (#1007)
Trace for the error:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good simplification! I see that you need work-arounds for fsspec backends that don't have async and Python 3.8, which doesn't have to_thread
, and that's okay.
Although sshfs
has been added to the test dependencies, I think the only ssh test is disabled. I wonder if running sshd on the test-runner and connecting to
ssh `whoami`@localhost
would be an option? It's not a big deal.
Isolating the glitchiness of the test to S3 is good—we can provide the functionality without testing it because it's one of the things fsspec is supposed to do on its own. (We should only be responsible for using the fsspec API correctly in Uproot—there's a "separation of concerns.") It's too bad that it can't be reproduced outside of Uproot, but I know you put a lot of time into trying to do that.
I think this PR is ready to go as-is! Thanks!
Good idea I can try this in a different PR, I can use the cache directory for
I'm 90% sure it's |
This PR removes the support for different types of executors for the
fsspec
source. Theuse_threads
andnum_workers
options are also dropped.The loop executor exists only for compatibility with the remaining sources and does not hold any resources. The loop is accessed directly from
fsspec.asyn
andsubmit
is basically justasyncio.run_coroutine_threadsafe
.With this PR, whenever the
fsspec
filesystem does not supportasync
calls, the methods are wrapped into a coroutine that runs them in a separate thread so they are not blocking. This may spawn too many short-lived threads but from limited testing I haven't found this to be an issue. The alternative would be to run them as coroutines in thefsspec
loop but they wouldn't run concurrently as they are blocking calls. Another alternative would be to use something likehttps://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.run_in_executor
to run the coroutines in an executor. Edit: #999 (comment)This PR took a long time to finish mainly due to an intermittent error in the CI. After much testing I am still not sure what the cause of this is, but I think it has to do with
s3fs
specifically. I could not reproduce this error outside of pytest running on the uproot repo, I even created another repository to test this but could not reproduce it using the same pytest code and uproot version. Since it's proven so hard to trigger and I don't think this PR causes this error (since it can also be reproduced on the main branch of uproot), I think we can merge this PR and I will try to fix / understand the error in another PR where I enable thes3fs
tests.The code I used to reproduce this:
Notice in the start where I have defined multiple pairs of
fs, path
then I get some bytes from the file. Only when I run this line with thes3
backend it produces the error. Using a sync-only backend such asgithub
or another async one such ashttps
does not cause the error to trigger, so this is why I thinks3fs
is the culprit. I disabled the test that useds3fs
and the error also disappeared.