feat: use only loop executor for `fsspec` source #999

lobis · 2023-10-19T03:55:42Z

This PR removes support for different executor types in the `fsspec` source. The `use_threads` and `num_workers` options are also dropped.

The loop executor exists only for compatibility with the remaining sources and does not hold any resources. The loop is accessed directly from `fsspec.asyn`, and `submit` is basically just `asyncio.run_coroutine_threadsafe`.
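As a rough illustration of the idea (not the code in this PR), an executor that only forwards coroutines to a loop it does not own could look like the sketch below. The class name, `start_background_loop`, and `fetch_chunk` are hypothetical; the background loop is a stdlib stand-in for the loop that would come from `fsspec.asyn`.

```python
import asyncio
import threading


class LoopExecutor:
    """Thin wrapper that submits coroutines to an event loop it does not own."""

    def __init__(self, loop):
        # The loop is borrowed, not created here, so there is nothing to release.
        self._loop = loop

    def submit(self, coroutine_function, *args, **kwargs):
        # Schedule the coroutine on the loop's thread and return a
        # concurrent.futures.Future that resolves with its result.
        coroutine = coroutine_function(*args, **kwargs)
        return asyncio.run_coroutine_threadsafe(coroutine, self._loop)

    def shutdown(self):
        # No-op: kept only so the interface matches executors that own resources.
        pass


def start_background_loop():
    """Assumption: stand-in for the fsspec-managed loop (a loop in a daemon thread)."""
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    return loop


async def fetch_chunk(start, stop):
    # Hypothetical coroutine standing in for an async byte-range read.
    await asyncio.sleep(0.01)
    return bytes(range(start, stop))


loop = start_background_loop()
executor = LoopExecutor(loop)
future = executor.submit(fetch_chunk, 0, 4)
result = future.result(timeout=5)
print(result)
```

Because the executor never creates threads or loops of its own, `shutdown` has nothing to clean up, which matches the "does not hold any resources" point above.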

With this PR, whenever the fsspec filesystem does not support async calls, its methods are wrapped in a coroutine that runs them in a separate thread so that they do not block. This may spawn too many short-lived threads, but in limited testing I haven't found this to be an issue. One alternative would be to run them as coroutines in the fsspec loop, but they wouldn't run concurrently because they are blocking calls. Another alternative would be to use something like https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.run_in_executor to run the blocking calls in an executor. Edit: #999 (comment)
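A minimal sketch of the wrapping idea, under stated assumptions: `run_in_thread` and `blocking_cat_file` are hypothetical names, and this version hands the call to the loop's default thread pool (with a fallback for Python 3.8, which lacks `asyncio.to_thread`). The wrapper actually used in this PR may differ, for example by creating a thread per call.

```python
import asyncio
import functools
import threading
import time


async def run_in_thread(blocking_function, *args, **kwargs):
    """Await a blocking callable without blocking the event loop."""
    if hasattr(asyncio, "to_thread"):
        # Python 3.9+: runs the call in the loop's default thread pool.
        return await asyncio.to_thread(blocking_function, *args, **kwargs)
    # Python 3.8 fallback: same idea through run_in_executor.
    loop = asyncio.get_running_loop()
    call = functools.partial(blocking_function, *args, **kwargs)
    return await loop.run_in_executor(None, call)


def blocking_cat_file(path, start, end):
    # Hypothetical stand-in for a read on a sync-only filesystem.
    time.sleep(0.2)
    return (path, start, end, threading.current_thread().name)


async def main():
    requests = [("file.root", i * 100, (i + 1) * 100) for i in range(4)]
    began = time.perf_counter()
    # The four blocking reads overlap because each one runs in its own worker thread.
    results = await asyncio.gather(
        *(run_in_thread(blocking_cat_file, *request) for request in requests)
    )
    elapsed = time.perf_counter() - began
    return results, elapsed


results, elapsed = asyncio.run(main())
print(f"{len(results)} reads in {elapsed:.2f}s")
```

If the same blocking function were awaited directly as a coroutine on the loop, the four reads would run one after another, which is the drawback of the first alternative mentioned above.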

This PR took a long time to finish, mainly because of an intermittent error in the CI. After much testing I am still not sure what causes it, but I think it has to do with s3fs specifically. I could not reproduce this error outside of pytest running on the uproot repo. I even created another repository to test this, but could not reproduce it using the same pytest code and uproot version. Since it has proven so hard to trigger, and I don't think this PR causes it (it can also be reproduced on the main branch of uproot), I think we can merge this PR, and I will try to understand and fix the error in another PR where I enable the s3fs tests.

The code I used to reproduce this:

import fsspec
import uproot


def test_open_fsspec_s3_issue():
    # Only the s3 backend triggers the error; the commented-out alternatives do not.
    fs, path = fsspec.core.url_to_fs("s3://pivarski-princeton/pythia_ppZee_run17emb.picoDst.root")
    # fs, path = fsspec.core.url_to_fs("github://scikit-hep:scikit-hep-testdata@v0.4.33/src/skhep_testdata/data/uproot-issue121.root")
    # fs, path = fsspec.core.url_to_fs("https://github.com/scikit-hep/scikit-hep-testdata/raw/main/src/skhep_testdata/data/uproot-issue121.root")
    # Fetch a few bytes through the filesystem before opening the file with uproot.
    data = fs.cat_file(path, start=0, end=100)

    url = "s3://pivarski-princeton/pythia_ppZee_run17emb.picoDst.root:PicoDst"

    for handler in [
        uproot.source.s3.S3Source,
        uproot.source.s3.S3Source,
        uproot.source.s3.S3Source,
        uproot.source.s3.S3Source,
        uproot.source.s3.S3Source,
        uproot.source.s3.S3Source,
    ]:
        with uproot.open(
                url,
                anon=True,
                handler=handler,
        ) as f:
            data = f["Event/Event.mEventId"].array(library="np")
            assert len(data) == 8004

Notice that at the start I have defined several alternative `fs, path` pairs, and then I fetch some bytes from the file. The error is produced only when I run this line with the s3 backend. Using a sync-only backend such as github, or another async one such as https, does not trigger the error, which is why I think s3fs is the culprit. When I disabled the test that used s3fs, the error also disappeared.

lobis · 2023-10-30T17:06:33Z

Trace for the error:

Testing started at 13:05 ...
Launching pytest with arguments test_0692_fsspec.py::test_open_fsspec_s3_issue --no-header --no-summary -q in /Users/lobis/git/uproot/tests

============================= test session starts ==============================
collecting ... collected 1 item

test_0692_fsspec.py::test_open_fsspec_s3_issue 

============================== 1 failed in 17.62s ==============================
FAILED                    [100%]
tests/test_0692_fsspec.py:98 (test_open_fsspec_s3_issue)
cls = <class '_pytest.runner.CallInfo'>
func = <function call_runtest_hook.<locals>.<lambda> at 0x10774faf0>
when = 'call'
reraise = (<class '_pytest.outcomes.Exit'>, <class 'KeyboardInterrupt'>)

    @classmethod
    def from_call(
        cls,
        func: "Callable[[], TResult]",
        when: "Literal['collect', 'setup', 'call', 'teardown']",
        reraise: Optional[
            Union[Type[BaseException], Tuple[Type[BaseException], ...]]
        ] = None,
    ) -> "CallInfo[TResult]":
        """Call func, wrapping the result in a CallInfo.
    
        :param func:
            The function to call. Called without arguments.
        :param when:
            The phase in which the function is called.
        :param reraise:
            Exception or exceptions that shall propagate if raised by the
            function, instead of being wrapped in the CallInfo.
        """
        excinfo = None
        start = timing.time()
        precise_start = timing.perf_counter()
        try:
>           result: Optional[TResult] = func()

cls        = <class '_pytest.runner.CallInfo'>
duration   = 17.538951792
excinfo    = <ExceptionInfo PytestUnraisableExceptionWarning('Exception ignored in: <function _SSLProtocolTransport.__del__ at 0x1021ab9d0>\n\nTra...g, source=self)\nResourceWarning: unclosed transport <asyncio.sslproto._SSLProtocolTransport object at 0x122382be0>\n') tblen=7>
func       = <function call_runtest_hook.<locals>.<lambda> at 0x10774faf0>
precise_start = 0.873500208
precise_stop = 18.412452
reraise    = (<class '_pytest.outcomes.Exit'>, <class 'KeyboardInterrupt'>)
result     = None
start      = 1698685524.7358022
stop       = 1698685542.2744918
when       = 'call'

../../../miniconda3/envs/uproot-38/lib/python3.8/site-packages/_pytest/runner.py:341: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../../miniconda3/envs/uproot-38/lib/python3.8/site-packages/_pytest/runner.py:262: in <lambda>
    lambda: ihook(item=item, **kwds), when=when, reraise=reraise
        ihook      = <HookCaller 'pytest_runtest_call'>
        item       = <Function test_open_fsspec_s3_issue>
        kwds       = {}
../../../miniconda3/envs/uproot-38/lib/python3.8/site-packages/pluggy/_hooks.py:493: in __call__
    return self._hookexec(self.name, self._hookimpls, kwargs, firstresult)
        firstresult = False
        kwargs     = {'item': <Function test_open_fsspec_s3_issue>}
        self       = <HookCaller 'pytest_runtest_call'>
../../../miniconda3/envs/uproot-38/lib/python3.8/site-packages/pluggy/_manager.py:115: in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
        firstresult = False
        hook_name  = 'pytest_runtest_call'
        kwargs     = {'item': <Function test_open_fsspec_s3_issue>}
        methods    = [<HookImpl plugin_name='runner', plugin=<module '_pytest.runner' from '/Users/lobis/miniconda3/envs/uproot-38/lib/python3.8/site-packages/_pytest/runner.py'>>,
 <HookImpl plugin_name='skipping', plugin=<module '_pytest.skipping' from '/Users/lobis/miniconda3/envs/uproot-38/lib/python3.8/site-packages/_pytest/skipping.py'>>,
 <HookImpl plugin_name='timeout', plugin=<module 'pytest_timeout' from '/Users/lobis/miniconda3/envs/uproot-38/lib/python3.8/site-packages/pytest_timeout.py'>>,
 <HookImpl plugin_name='capturemanager', plugin=<CaptureManager _method='fd' _global_capturing=<MultiCapture out=<FDCapture 1 oldfd=5 _state='suspended' tmpfile=<_io.TextIOWrapper name="<_io.FileIO name=6 mode='rb+' closefd=True>" mode='r+' encoding='utf-8'>> err=<FDCapture 2 oldfd=7 _state='suspended' tmpfile=<_io.TextIOWrapper name="<_io.FileIO name=8 mode='rb+' closefd=True>" mode='r+' encoding='utf-8'>> in_=<FDCapture 0 oldfd=3 _state='started' tmpfile=<_io.TextIOWrapper name='/dev/null' mode='r' encoding='utf-8'>> _state='suspended' _in_suspended=False> _capture_fixture=None>>,
 <HookImpl plugin_name='logging-plugin', plugin=<_pytest.logging.LoggingPlugin object at 0x1076b9eb0>>,
 <HookImpl plugin_name='unraisableexception', plugin=<module '_pytest.unraisableexception' from '/Users/lobis/miniconda3/envs/uproot-38/lib/python3.8/site-packages/_pytest/unraisableexception.py'>>,
 <HookImpl plugin_name='threadexception', plugin=<module '_pytest.threadexception' from '/Users/lobis/miniconda3/envs/uproot-38/lib/python3.8/site-packages/_pytest/threadexception.py'>>]
        self       = <_pytest.config.PytestPluginManager object at 0x10350bd90>
../../../miniconda3/envs/uproot-38/lib/python3.8/site-packages/_pytest/unraisableexception.py:88: in pytest_runtest_call
    yield from unraisable_exception_runtest_hook()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

    def unraisable_exception_runtest_hook() -> Generator[None, None, None]:
        with catch_unraisable_exception() as cm:
            yield
            if cm.unraisable:
                if cm.unraisable.err_msg is not None:
                    err_msg = cm.unraisable.err_msg
                else:
                    err_msg = "Exception ignored in"
                msg = f"{err_msg}: {cm.unraisable.object!r}\n\n"
                msg += "".join(
                    traceback.format_exception(
                        cm.unraisable.exc_type,
                        cm.unraisable.exc_value,
                        cm.unraisable.exc_traceback,
                    )
                )
>               warnings.warn(pytest.PytestUnraisableExceptionWarning(msg))
E               pytest.PytestUnraisableExceptionWarning: Exception ignored in: <function _SSLProtocolTransport.__del__ at 0x1021ab9d0>
E               
E               Traceback (most recent call last):
E                 File "/Users/lobis/miniconda3/envs/uproot-38/lib/python3.8/asyncio/sslproto.py", line 321, in __del__
E                   _warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
E               ResourceWarning: unclosed transport <asyncio.sslproto._SSLProtocolTransport object at 0x122382be0>

cm         = <_pytest.unraisableexception.catch_unraisable_exception object at 0x107742b20>
err_msg    = 'Exception ignored in'
msg        = ('Exception ignored in: <function _SSLProtocolTransport.__del__ at '
 '0x1021ab9d0>\n'
 '\n'
 'Traceback (most recent call last):\n'
 '  File '
 '"/Users/lobis/miniconda3/envs/uproot-38/lib/python3.8/asyncio/sslproto.py", '
 'line 321, in __del__\n'
 '    _warn(f"unclosed transport {self!r}", ResourceWarning, source=self)\n'
 'ResourceWarning: unclosed transport <asyncio.sslproto._SSLProtocolTransport '
 'object at 0x122382be0>\n')

../../../miniconda3/envs/uproot-38/lib/python3.8/site-packages/_pytest/unraisableexception.py:78: PytestUnraisableExceptionWarning

Process finished with exit code 1

jpivarski

This is a good simplification! I see that you need work-arounds for fsspec backends that don't have async and Python 3.8, which doesn't have to_thread, and that's okay.

Although sshfs has been added to the test dependencies, I think the only ssh test is disabled. I wonder if running sshd on the test-runner and connecting to

ssh `whoami`@localhost

would be an option? It's not a big deal.

Isolating the glitchiness of the test to S3 is good—we can provide the functionality without testing it because it's one of the things fsspec is supposed to do on its own. (We should only be responsible for using the fsspec API correctly in Uproot—there's a "separation of concerns.") It's too bad that it can't be reproduced outside of Uproot, but I know you put a lot of time into trying to do that.

I think this PR is ready to go as-is! Thanks!

lobis · 2023-10-31T14:09:26Z

> Although sshfs has been added to the test dependencies, I think the only ssh test is disabled. I wonder if running sshd on the test-runner and connecting to
> ssh `whoami`@localhost
> would be an option? It's not a big deal.

Good idea. I can try this in a different PR; I can use the cache directory for skhep_testdata after pulling the file in the same test.

> Isolating the glitchiness of the test to S3 is good—we can provide the functionality without testing it because it's one of the things fsspec is supposed to do on its own. (We should only be responsible for using the fsspec API correctly in Uproot—there's a "separation of concerns.") It's too bad that it can't be reproduced outside of Uproot, but I know you put a lot of time into trying to do that.

I'm 90% sure it's s3fs, but I cannot say for sure. I created #1012 to continue debugging it. I would say the problem lies here: https://github.com/fsspec/s3fs/blob/main/s3fs/core.py#L541-L560. I don't think the socket is properly closed.

src/uproot/source/fsspec.py

install optional fsspec backends in the CI for some python versions

e397896

lobis force-pushed the fsspec-optional-backends branch from ab20f43 to e397896 Compare October 19, 2023 04:01

lobis changed the title ~~test: install optional fsspec backends in the CI for some builds~~ test: install fsspec optional backends such as s3fs Oct 19, 2023

lobis mentioned this pull request Oct 19, 2023

Integration of fsspec #972

Closed

lobis and others added 5 commits October 19, 2023 13:31

Merge branch 'main' into fsspec-optional-backends

a93eb6c

update submit interface

c674789

annotation

8da9c09

do not use named arguments for the path/url

3242517

Future init

59bec72

lobis mentioned this pull request Oct 19, 2023

feat: update the executor submit interface to take keyword arguments and be compatible with concurrent.futures.ThreadPoolExecutor #1001

Merged

lobis and others added 20 commits October 19, 2023 14:11

add s3fs and sshfs as test dependencies (fsspec beckends)

2a97904

Merge remote-tracking branch 'origin/source-futures-submit' into fssp…

181fa74

…ec-optional-backends * origin/source-futures-submit: Future init do not use named arguments for the path/url annotation update submit interface

test fsspec s3 for more combination of parameters

e07231f

remove pip install from ci

5fa2dcc

Merge branch 'main' into fsspec-optional-backends

b3ddcc2

style: pre-commit fixes

8a842c9

Merge branch 'main' into fsspec-optional-backends

3fea743

revert test order

662b0f9

remove dependencies as a test

8cd55dd

add s3fs to test

ae0a96e

exclude s3fs to python version 3.12

ba58b8f

add sshfs test (skipped)

3b7dfeb

fix pytest

a60ca28

asyncio not available in 3.12

cbbd07d

asyncio not available in 3.12

06bbf92

add comment for fsspec threads

e9c71ac

attempt to close resources

37e5a24

handle s3fs case separate for now

d2fb8a4

attempt to pass tests

d30e531

attempt to pass tests

ee5eeab

lobis added 2 commits October 23, 2023 12:08

simplified

993f979

remove support for use_threads option, run non-async fs in threads us…

26924e2

…ing asyncio

lobis changed the title ~~test: install fsspec optional backends such as s3fs~~ feat: use only loop executor for fsspec source Oct 24, 2023

lobis and others added 8 commits October 24, 2023 18:15

stop the loop on resource shutdown

d898dbc

add skip for xrootd due to server issues

58fe0f0

Merge remote-tracking branch 'origin/main' into fsspec-optional-backends

a8550eb

* origin/main: fix: url and object splitting for local files (#1007)

remove skip for xrootd

09aa93a

remove shutdown

2e200b5

Merge branch 'main' into fsspec-optional-backends

45fddff

merge and fix conflicts

2f73c45

understand ci fail

27ca078

lobis marked this pull request as ready for review October 30, 2023 17:13

lobis requested review from jpivarski and nsmith- October 30, 2023 20:21

jpivarski approved these changes Oct 31, 2023

lobis merged commit 0f2c4da into main Oct 31, 2023
21 checks passed

lobis deleted the fsspec-optional-backends branch October 31, 2023 14:10

nsmith- reviewed Oct 31, 2023

src/uproot/source/fsspec.py

This was referenced Nov 20, 2023

build: uproot 5.2.0 integration testing scikit-hep/coffea#930

Closed

multithreaded file source breaks interpretation #1035

Closed
