
I/O operation on closed file #1032

Closed
jontwo opened this issue Aug 31, 2022 · 10 comments · Fixed by #1035
Comments

@jontwo

jontwo commented Aug 31, 2022

The latest version has broken our CI, but it might be that you've just exposed an issue in dask and I actually need to create a ticket there instead. This test worked fine at fsspec==2022.7.1.

Callstack

_______________________ TestSortLargeCSV.test_one_column _______________________
farmlib/core/helpers/tests/test_io.py:657: in test_one_column
    sort_large_csv(self.input_filename,
farmlib/core/helpers/io.py:742: in sort_large_csv
    events = dd.read_csv(input_file, blocksize=blocksize)
venv/lib/python3.8/site-packages/dask/dataframe/io/csv.py:744: in read
    return read_pandas(
venv/lib/python3.8/site-packages/dask/dataframe/io/csv.py:548: in read_pandas
    b_out = read_bytes(
venv/lib/python3.8/site-packages/dask/bytes/core.py:149: in read_bytes
    values = [
venv/lib/python3.8/site-packages/dask/bytes/core.py:150: in <listcomp>
    delayed_read(
venv/lib/python3.8/site-packages/dask/delayed.py:695: in __call__
    return call_function(
venv/lib/python3.8/site-packages/dask/delayed.py:662: in call_function
    args2, collections = unzip(map(unpack_collections, args), 2)
venv/lib/python3.8/site-packages/dask/delayed.py:38: in unzip
    out = list(zip(*ls))
venv/lib/python3.8/site-packages/dask/delayed.py:93: in unpack_collections
    if is_dask_collection(expr):
venv/lib/python3.8/site-packages/dask/base.py:187: in is_dask_collection
    return x.__dask_graph__() is not None
venv/lib/python3.8/site-packages/fsspec/core.py:212: in __getattr__
    return getattr(self.f, item)
venv/lib/python3.8/site-packages/fsspec/core.py:149: in f
    raise ValueError(
E   ValueError: I/O operation on closed file. Please call open() or use a with context
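
The "use a with context" hint in that error refers to how fsspec file handles are meant to be used: an object from fsspec.open is only live inside a with block. A minimal sketch of the intended pattern (not from this issue; it uses fsspec's in-memory filesystem so it runs without S3):

```python
import fsspec

# Write, then read, via fsspec URLs. The underlying file object is only
# valid inside the "with" block; touching it after the block exits is
# what raises "I/O operation on closed file".
with fsspec.open("memory://demo.csv", "w") as f:
    f.write("col1\na\nb\n")

with fsspec.open("memory://demo.csv", "r") as f:
    print(f.read())
```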

Repro case

# Imports and the output_filename/blocksize attributes were omitted from the
# original snippet; the values below are placeholders to make it runnable.
import os
from tempfile import TemporaryDirectory

import pandas as pd
import pytest

from farmlib.core.helpers.io import sort_large_csv


class TestSortLargeCSV:
    @pytest.fixture(autouse=True)
    def setup_method(self):
        self.temp_dir = TemporaryDirectory()
        self.input_filename = os.path.join(self.temp_dir.name, "input.csv")
        self.output_filename = os.path.join(self.temp_dir.name, "output.csv")
        self.blocksize = "1MB"  # placeholder; original value not shown

    def test_one_column(self):
        df = pd.DataFrame(columns=["col1"],
                          data=[["a"], ["b"], ["z"], ["x"]])
        df.to_csv(self.input_filename, index=False)

        sort_large_csv(self.input_filename,
                       self.output_filename,
                       index_column="col1",
                       blocksize=self.blocksize)

@martindurant
Member

Yes, there is a known regression that I should be able to clean up this morning.

@tommyjcarpenter

Is it possible to yank the bad version? Our builds also failed due to this (we are now pinning it back).

@martindurant
Member

The fixed version is now out. Do you still need the yank?

@jasonwdon

Hi! I'm still having this issue on version 2022.8.1. I get this error when I use pandas.read_csv("s3://file", compression='gzip', header=0). Pinning to 2022.7.1 resolves the issue.

@cperriard

Hi, I get the same error when writing to S3 with pandas_df.to_json("s3://bucket/file.jsonlines", orient="records", lines=True).
It worked with version 2022.8.0.
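
That pandas-to-fsspec write path can be exercised locally; a hedged sketch, using fsspec's memory:// filesystem as a stand-in for the real s3:// bucket (paths are hypothetical):

```python
import pandas as pd

# Round-trip a frame through an fsspec URL. pandas hands any non-local
# "scheme://" path to fsspec, so "memory://" needs no credentials.
df = pd.DataFrame({"a": [1, 2]})
df.to_json("memory://file.jsonlines", orient="records", lines=True)

out = pd.read_json("memory://file.jsonlines", lines=True)
assert out.equals(df)
```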

@tommyjcarpenter

We didn't need the yank since we pinned back, but this thread seems to have continued.

@martindurant
Member

Sorry for the mess. I yanked and made 2022.8.2 which should work for everyone.

@tommyjcarpenter

Thanks for your quick resolution. This package has huge indirect exposure since pandas depends on it, so errors propagate quickly :)

@martindurant
Member

...which is both good and bad. Created #1036 to try to do a better job at this.

@tommyjcarpenter

IMO a very simple test is simply:

import pandas as pd
pd.read_csv("s3://some_csv_in_s3_somewhere")
pd.read_parquet("s3://some_parquet_in_s3_somewhere")

We use this extensively; fsspec is even mentioned in the pandas documentation for read_csv (see "storage_options": https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html), which is how I tracked down what broke yesterday.
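
A locally runnable variant of that smoke test, with fsspec's in-memory filesystem standing in for S3 (paths are hypothetical, not from the thread):

```python
import pandas as pd

# Write a small frame through an fsspec URL and read it back; a break in
# fsspec's file handling would surface here as the ValueError above.
df = pd.DataFrame({"col1": ["a", "b", "z", "x"]})
df.to_csv("memory://smoke.csv", index=False)

out = pd.read_csv("memory://smoke.csv")
assert out.equals(df)
```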
