
S3 path resolution extra slash when joined to bucket with key #167

Closed
theogaraj opened this issue Dec 21, 2023 · 2 comments · Fixed by #191
Labels
bug 🐛 Something isn't working

Comments

@theogaraj

Which operating system and Python version are you using?
Windows 11, Python 3.9.6

Which version of this project are you using?
0.1.4

What did you do?

  1. Created a UPath from an S3 URI consisting of a bucket, a key, and a trailing slash
  2. Used the / operator to create a new S3 path by joining a string to the original UPath
>>> from upath import UPath
>>> bucket_with_key = UPath('s3://mybucket/withkey/')   # bucket and key with a trailing slash
>>> subpath_new = bucket_with_key / 'subfolder/myfile.txt'
>>> subpath_new
S3Path('s3://mybucket/withkey//subfolder/myfile.txt')

What did you expect to see?
I expected a single slash between s3://mybucket/withkey and subfolder/myfile.txt.

What did you see instead?
The resulting S3Path has a double slash: S3Path('s3://mybucket/withkey//subfolder/myfile.txt')

Additional info
The join works as expected when the path is just a bucket, or a bucket and key without a trailing slash.
Just a bucket:

>>> from upath import UPath
>>> bucketpath = UPath('s3://mybucket/')        # trailing slash with bucket only
>>> subpath = bucketpath / 'subfolder/myfile.txt'
>>> subpath
S3Path('s3://mybucket/subfolder/myfile.txt')

Bucket and key but no trailing slash:

>>> from upath import UPath
>>> bucket_with_key = UPath('s3://mybucket/withkey')  # bucket and key but no trailing slash
>>> subpath_new = bucket_with_key / 'subfolder/myfile.txt'
>>> subpath_new
S3Path('s3://mybucket/withkey/subfolder/myfile.txt')
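For comparison, the pure-path classes in Python's standard library, whose join semantics UPath aims to mirror, drop a trailing slash before joining. The helper below is a hypothetical sketch (`expected_join` is not part of universal_pathlib) that applies `PurePosixPath` to the key portion of an `s3://` URI, producing the result I expected in all three cases above.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit, urlunsplit


def expected_join(uri: str, *parts: str) -> str:
    """Join parts onto an s3:// URI the way pathlib joins POSIX paths.

    Hypothetical helper for illustration only; not a universal_pathlib API.
    The bucket (netloc) is kept as-is; only the key is normalized.
    """
    scheme, bucket, key, _query, _fragment = urlsplit(uri)
    # PurePosixPath drops the trailing slash and collapses '//' in the key.
    joined = PurePosixPath(key or "/").joinpath(*parts)
    return urlunsplit((scheme, bucket, str(joined), "", ""))


print(expected_join("s3://mybucket/withkey/", "subfolder/myfile.txt"))
# s3://mybucket/withkey/subfolder/myfile.txt
print(expected_join("s3://mybucket/", "subfolder/myfile.txt"))
# s3://mybucket/subfolder/myfile.txt
```

Note that this normalization would also collapse a genuine `//` inside a key, so it only sketches the expected output for this report rather than a general fix.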
@theogaraj
Author

While continuing to work with UPath, I noticed another problem related to trailing slashes. I don't know whether it shares the underlying cause of the issue above or should be tracked separately, but here's what I'm seeing...

I'm attempting to glob over a directory (either local or S3). I've defined a UPath called files_location and am iterating over all the files with for filepath in files_location.glob('*.json'):

  • on my local filesystem, it doesn't matter whether I use files_location = UPath('data/myjsons') or files_location = UPath('data/myjsons/') (note the trailing slash in the second one)
  • in S3, however, files_location = UPath('s3://mybucket/myjsons') works, whereas files_location = UPath('s3://mybucket/myjsons/') (note the trailing slash) does not
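To illustrate the behavior I expected, here is a minimal sketch that globs over a simulated S3 listing. The `keys` list and the `glob_keys` helper are hypothetical stand-ins (no real bucket or universal_pathlib API is involved); the point is that normalizing the prefix with `rstrip('/')` makes `myjsons` and `myjsons/` select the same objects.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def glob_keys(keys: Iterable[str], prefix: str, pattern: str) -> List[str]:
    """Return keys directly under `prefix` whose last segment matches `pattern`.

    Hypothetical helper: a stand-in for a non-recursive glob over an S3 listing.
    """
    # Normalize so 'myjsons' and 'myjsons/' yield the same base prefix;
    # an empty prefix means the bucket root.
    base = prefix.rstrip("/")
    base = base + "/" if base else ""
    matches = []
    for key in keys:
        if not key.startswith(base):
            continue
        name = key[len(base):]
        # Keep only direct children: '*' should not descend into subfolders.
        if "/" not in name and fnmatchcase(name, pattern):
            matches.append(key)
    return matches


# Simulated object listing for s3://mybucket (hypothetical data).
keys = [
    "myjsons/a.json",
    "myjsons/b.json",
    "myjsons/notes.txt",
    "myjsons/sub/c.json",
]

print(glob_keys(keys, "myjsons", "*.json"))   # ['myjsons/a.json', 'myjsons/b.json']
print(glob_keys(keys, "myjsons/", "*.json"))  # ['myjsons/a.json', 'myjsons/b.json']
```

Without the `rstrip`, the trailing-slash spelling would search under `myjsons//` and match nothing, which is consistent with (though not proof of) what I'm seeing.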

@ap--
Collaborator

ap-- commented Jan 24, 2024

Thank you for reporting. Handling double slashes in s3 is still an open issue.

While double slashes in keys are supported on the s3 side, I've seen a few cases so far where those keys were created unintentionally due to bugs in the scripts that copied the files to s3.

Nevertheless, we need to add better support for handling those cases where you want to access existing s3 buckets which are not under your control.
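Why simply collapsing the slashes is not a complete answer can be sketched with the standard library: S3 keys are flat strings, so a key containing `//` is a different object from its single-slash twin, while pathlib-style normalization treats them as the same path. The dict below is a hypothetical in-memory stand-in for a bucket, not an S3 or universal_pathlib API.

```python
from pathlib import PurePosixPath

# Hypothetical in-memory stand-in for a bucket: S3 keys are opaque strings,
# so these two keys address two different objects.
bucket = {
    "withkey/subfolder/myfile.txt": b"intended object",
    "withkey//subfolder/myfile.txt": b"object written by a buggy copy script",
}

double_slash_key = "withkey//subfolder/myfile.txt"

# pathlib-style normalization collapses the empty path segment...
normalized = str(PurePosixPath(double_slash_key))
print(normalized)  # withkey/subfolder/myfile.txt

# ...so a path type that always normalizes would read the other object.
print(bucket[normalized] == bucket[double_slash_key])  # False
```

This is the trade-off behind supporting buckets that are not under your control: the join in this issue should not produce `//`, yet keys that already contain `//` still need to be addressable.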

Would you be interested in creating a PR with a testcase for your specific issue?

Cheers,
Andreas

ap-- added the bug 🐛 Something isn't working label Jan 24, 2024
ap-- added this to the v0.2.1 milestone Feb 18, 2024
ap-- self-assigned this Feb 18, 2024
ap-- added a commit to ap--/universal_pathlib that referenced this issue Feb 18, 2024
ap-- mentioned this issue Feb 18, 2024
ap-- closed this as completed in #191 Feb 18, 2024
ap-- closed this as completed in 4e2afca Feb 18, 2024