Skip to content

Conversation

@krfricke
Copy link
Contributor

These two methods don't accept **kwargs parameters, which makes them incompatible with libraries utilizing the abstract API. For example, pyarrow 6.0.1 fails for create dir:

import fsspec
import pyarrow.fs

fs = fsspec.filesystem("gs")
wrapped = pyarrow.fs.PyFileSystem(pyarrow.fs.FSSpecHandler(fs))
wrapped.create_dir("bucket/test", recursive=True)

with

  File "pyarrow/_fs.pyx", line 461, in pyarrow._fs.FileSystem.create_dir
  File "pyarrow/_fs.pyx", line 1129, in pyarrow._fs._cb_create_dir
  File "/Users/kai/.pyenv/versions/3.7.7/lib/python3.7/site-packages/pyarrow/fs.py", line 341, in create_dir
    self.fs.mkdir(path, create_parents=recursive)
  File "/Users/kai/.pyenv/versions/3.7.7/lib/python3.7/site-packages/fsspec/asyn.py", line 85, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/Users/kai/.pyenv/versions/3.7.7/lib/python3.7/site-packages/fsspec/asyn.py", line 47, in sync
    coro = func(*args, **kwargs)
TypeError: _mkdir() got an unexpected keyword argument 'create_parents'

Closes #404

@martindurant
Copy link
Member

Wouldn't it be better if we actually supported create_parents? For gcsfs, this would mean creating the bucket, if it doesn't already exist.

@krfricke
Copy link
Contributor Author

I think this PR is more about adhering to the fsspec API for forward compatibility. If creating an empty bucket is desired behavior for this API, this should probably be addressed in a different PR.

@martindurant
Copy link
Member

create_parents is in the spec signature, though: https://github.com/fsspec/filesystem_spec/blob/master/fsspec/spec.py#L245

@krfricke
Copy link
Contributor Author

krfricke commented Apr 19, 2022

The problem is that it is indeed part of the fsspec API: https://github.com/fsspec/filesystem_spec/blob/master/fsspec/asyn.py#L737-L738

    async def _mkdir(self, path, create_parents=True, **kwargs):
        pass  # not necessary to implement, may not have directories

but is not implemented in gcsfs: https://github.com/fsspec/gcsfs/blob/main/gcsfs/core.py#L606-L611

    async def _mkdir(
        self,
        bucket,
        acl="projectPrivate",
        default_acl="bucketOwnerFullControl",
        location=None,
    ):

(and mkdir is just the sync wrapper for _mkdir: https://github.com/fsspec/gcsfs/blob/main/gcsfs/core.py#L650)

@martindurant
Copy link
Member

Right, so instead of ignoring it with **kwargs, I am suggesting that now would be a good time to implement it - right in this PR.

@krfricke
Copy link
Contributor Author

I'm not familiar with gcs internals/API, so if someone would like to add to this PR, please go ahead.

@krfricke
Copy link
Contributor Author

krfricke commented Apr 19, 2022

Or is it literally just removing these two lines if create_parents=True:

        if "/" in bucket:
            return

@martindurant
Copy link
Member

How about that?

@krfricke
Copy link
Contributor Author

Thanks!

@martindurant martindurant merged commit 510b0ff into fsspec:main Apr 19, 2022
@krfricke krfricke deleted the mkdir-kwargs branch April 19, 2022 17:40
krfricke added a commit to ray-project/ray that referenced this pull request Apr 20, 2022
`gcsfs` complains about an invalid `create_parents` argument when using google cloud storage with cloud checkpoints. Thus we should use an alternative fs spec handler that omits this argument for gs.

The root issue will be fixed here: fsspec/gcsfs#471
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

gcs.makedir raises exception: TypeError: _mkdir() got an unexpected keyword argument 'create_parents'

2 participants