-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG/TST: Read from Public s3 Bucket Without Creds #34877
BUG/TST: Read from Public s3 Bucket Without Creds #34877
Conversation
pandas/io/s3.py
Outdated
@@ -16,8 +16,8 @@ def _strip_schema(url): | |||
return result.netloc + result.path | |||
|
|||
|
|||
def get_fs(): | |||
return s3fs.S3FileSystem(anon=False) | |||
def get_fs(anon: bool = False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes will be removed by: #34266 and reverts error originally introduced in: https://github.com/pandas-dev/pandas/pull/33632/files#diff-a37b395bed03f0404dec864a4529c97dL37
d393a66
to
5d63555
Compare
@alimcmaster1 #34266 is merged now, so this can be refactored on top of that now |
@alimcmaster1 I think you still need to carry over your original fix to latest master? |
s3.py was deleted with the fsspec refactor -> I’ll update my fix. |
@alimcmaster1 do you have time to update this? |
Yes I’ll update on the next few days, apols for the delay. Think we need some try catch logic here: https://github.com/pandas-dev/pandas/pull/34266/files#diff-0d7b5a2c72b4dfc11d80afe159d45ff8R205. Agree with your comment #34626 (comment) we should deprecate eventually. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests are still failing on Travis .. (didn't look in detail)
pandas/io/common.py
Outdated
@@ -201,10 +201,22 @@ def get_filepath_or_buffer( | |||
if filepath_or_buffer.startswith("s3n://"): | |||
filepath_or_buffer = filepath_or_buffer.replace("s3n://", "s3://") | |||
fsspec = import_optional_dependency("fsspec") | |||
from botocore.exceptions import NoCredentialsError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not necessarily sure that botocore is installed at this point? (there are other fsspec filesystems that can end up here and not only s3fs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah updated - thoughts on this approach?
d6dd877
to
03ee472
Compare
).open() | ||
# If botocore is installed we fallback to reading with anon=True | ||
# to allow reads from public buckets | ||
err_types_to_retry_with_anon: List[Any] = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alimcmaster1 what happens if we just catch PermissionError? Do you know which cases raise a NoCredentialsError or ClientError?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A ClientError
was raised by test_read_without_creds_from_pub_bucket
in this 3.7 build - https://travis-ci.org/github/pandas-dev/pandas/jobs/706462160
We were catching NoCredentialsError
before the fspec change - https://github.com/pandas-dev/pandas/pull/34266/files#diff-a37b395bed03f0404dec864a4529c97dL34 - so I decided to catch to be on the safe side. (But can't see a case that raises)
Thoughts?
Happy to address that as a followup, since this is fixing the regrssion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just restore the previous implementation (so what you have).
Made a small suggestion. In the future storage_options
will come from the user. We don't want to mutate it.
Co-authored-by: Tom Augspurger <TomAugspurger@users.noreply.github.com>
…pandas into mcmali-s3-pub-test
Thanks @alimcmaster1! |
* Public Bucket Read Test
We should potentially merge this before #34266 to confirm reading from s3 without credentials works.
Ref discussion here: #34793 (comment)
cc. @jorisvandenbossche