Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG/TST: Read from Public s3 Bucket Without Creds #34877

Merged
merged 15 commits into from
Jul 15, 2020

Conversation

alimcmaster1
Copy link
Member

@alimcmaster1 alimcmaster1 commented Jun 19, 2020

We should potentially merge this before #34266 to confirm reading from s3 without credentials works.

Ref discussion here: #34793 (comment)

cc. @jorisvandenbossche

@alimcmaster1 alimcmaster1 added IO CSV read_csv, to_csv Testing pandas testing functions or related to the test suite labels Jun 19, 2020
pandas/io/s3.py Outdated
@@ -16,8 +16,8 @@ def _strip_schema(url):
return result.netloc + result.path


def get_fs():
return s3fs.S3FileSystem(anon=False)
def get_fs(anon: bool = False):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes will be removed by: #34266 and reverts error originally introduced in: https://github.com/pandas-dev/pandas/pull/33632/files#diff-a37b395bed03f0404dec864a4529c97dL37

@alimcmaster1 alimcmaster1 added this to the 1.1 milestone Jun 20, 2020
@jorisvandenbossche
Copy link
Member

@alimcmaster1 #34266 is merged now, so this can be refactored on top of that now

@jorisvandenbossche
Copy link
Member

@alimcmaster1 I think you still need to carry over your original fix to latest master?

@alimcmaster1
Copy link
Member Author

@alimcmaster1 I think you still need to carry over your original fix to latest master?

s3.py was deleted with the fsspec refactor -> I’ll update my fix.

@TomAugspurger
Copy link
Contributor

@alimcmaster1 do you have time to update this?

@alimcmaster1
Copy link
Member Author

@alimcmaster1 do you have time to update this?

Yes I’ll update on the next few days, apols for the delay. Think we need some try catch logic here: https://github.com/pandas-dev/pandas/pull/34266/files#diff-0d7b5a2c72b4dfc11d80afe159d45ff8R205. Agree with your comment #34626 (comment) we should deprecate eventually.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests are still failing on Travis .. (didn't look in detail)

@@ -201,10 +201,22 @@ def get_filepath_or_buffer(
if filepath_or_buffer.startswith("s3n://"):
filepath_or_buffer = filepath_or_buffer.replace("s3n://", "s3://")
fsspec = import_optional_dependency("fsspec")
from botocore.exceptions import NoCredentialsError
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not necessarily sure that botocore is installed at this point? (there are other fsspec filesystems that can end up here and not only s3fs)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah updated - thoughts on this approach?

@jreback jreback removed this from the 1.1 milestone Jul 9, 2020
@jorisvandenbossche jorisvandenbossche added this to the 1.1 milestone Jul 13, 2020
@jorisvandenbossche jorisvandenbossche changed the title TST: Read from Public s3 Bucket Without Creds BUG/TST: Read from Public s3 Bucket Without Creds Jul 13, 2020
@jorisvandenbossche
Copy link
Member

@TomAugspurger

).open()
# If botocore is installed we fallback to reading with anon=True
# to allow reads from public buckets
err_types_to_retry_with_anon: List[Any] = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alimcmaster1 what happens if we just catch PermissionError? Do you know which cases raise a NoCredentialsError or ClientError?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A ClientError was raised by test_read_without_creds_from_pub_bucket in this 3.7 build - https://travis-ci.org/github/pandas-dev/pandas/jobs/706462160

We were catching NoCredentialsError before the fspec change - https://github.com/pandas-dev/pandas/pull/34266/files#diff-a37b395bed03f0404dec864a4529c97dL34 - so I decided to catch to be on the safe side. (But can't see a case that raises)

Thoughts?

@TomAugspurger
Copy link
Contributor

Happy to address that as a followup, since this is fixing the regrssion.

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just restore the previous implementation (so what you have).

Made a small suggestion. In the future storage_options will come from the user. We don't want to mutate it.

pandas/io/common.py Show resolved Hide resolved
@TomAugspurger TomAugspurger merged commit 32ed15d into pandas-dev:master Jul 15, 2020
@TomAugspurger
Copy link
Contributor

Thanks @alimcmaster1!

fangchenli pushed a commit to fangchenli/pandas that referenced this pull request Jul 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO CSV read_csv, to_csv Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: s3 reads from public buckets not working
4 participants