BUG: s3 reads from public buckets not working #34626
Comments
@ayushdg thanks for the report! cc @simonjayhawkins @alimcmaster1: for 1.0.5, it might be safer to revert #33632, and then target the fixes (like #34500) to master.
Agree @jorisvandenbossche - do you want me to open a PR to revert #33632 on the 1.0.x branch? Apologies for this change; it didn’t go as planned. I’ll check why our test cases didn’t catch the above!
Yes, that sounds good
No, no, none of us had foreseen the breakages ;)
Can't seem to reproduce this using moto... Potentially related: https://github.com/dask/s3fs/blob/master/s3fs/tests/test_s3fs.py#L1089 (I can repro locally using the s3 URL above if I remove AWS creds from my environment)
The fix for this to target 1.1 is to set ‘anon=True’ in S3FileSystem: https://github.com/pandas-dev/pandas/pull/33632/files#diff-a37b395bed03f0404dec864a4529c97dR41 I’ll wait, as we are moving to fsspec, which gets rid of this logic (#34266), but we should definitely try using moto to test this.
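Until a fixed release is available, the ‘anon=True’ setting mentioned above can also be applied on the user side. The following is a minimal sketch, not the pandas patch itself: `strip_scheme` and `read_public_csv` are hypothetical helper names, and it assumes s3fs and pandas are installed and that the bucket really allows anonymous reads.

```python
from urllib.parse import urlparse


def strip_scheme(url):
    """Turn 's3://bucket/key' into 'bucket/key', the path form s3fs accepts."""
    parsed = urlparse(url)
    return parsed.netloc + parsed.path


def read_public_csv(url, **read_csv_kwargs):
    """Read a CSV from a public bucket with an explicitly anonymous filesystem.

    Hypothetical helper for illustration only.
    """
    # Imported lazily so this module loads even without pandas/s3fs installed.
    import pandas as pd
    import s3fs

    # anon=True makes s3fs send unsigned requests instead of looking up credentials.
    fs = s3fs.S3FileSystem(anon=True)
    with fs.open(strip_scheme(url), "rb") as f:
        return pd.read_csv(f, **read_csv_kwargs)
```

Calling `read_public_csv` with an `s3://` URL hands pandas an already opened file object, so pandas' own credential handling is bypassed.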
Can anyone summarize the status here? 1.0.3: worked. Do we have a plan in place to restore this? IIUC the old way was to
Yep, it broke in 1.0.4, and will be fixed in 1.0.5 by reverting the patch that broke it. The old way was indeed to try with
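The "old way" is only partly spelled out in this thread. As a rough sketch, and assuming it means "try a credentialed filesystem first, then retry anonymously when that fails", the pattern could look like the code below. `open_with_anon_fallback`, `make_fs` and `retry_on` are hypothetical names; the filesystem factory is passed in so the logic can be exercised without s3fs or network access.

```python
def open_with_anon_fallback(path, make_fs, retry_on, mode="rb"):
    """Open `path` with credentials first, then fall back to anonymous access.

    make_fs  -- callable accepting anon=<bool>, returning an object with .open()
    retry_on -- tuple of exception types that should trigger the anonymous retry
    Both parameters are assumptions made for this sketch.
    """
    fs = make_fs(anon=False)
    try:
        return fs.open(path, mode), fs
    except retry_on:
        # The credentialed attempt failed, so retry with unsigned requests.
        fs = make_fs(anon=True)
        return fs.open(path, mode), fs
```

With s3fs this might be called as `open_with_anon_fallback("bucket/key.csv", s3fs.S3FileSystem, (FileNotFoundError, botocore.exceptions.NoCredentialsError))`; which exceptions belong in `retry_on` is an assumption and depends on the s3fs and botocore versions in use.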
Thanks
It's not. So we'll need to do that explicitly. Long-term we might want to get away from this logic by asking users to do
Closes pandas-dev#34626 This works in 1.0.4 I think, so no whatsnew.
On the other hand, it seems nice that reading from a public bucket just works out of the box without needing to pass any option?
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample
Error stack trace
Problem description
Reading directly from s3 public buckets (without manually configuring the anon parameter via s3fs) is broken with pandas 1.0.4 (worked with 1.0.3).
Looks like reading from public buckets requires anon=True while creating the filesystem. Commit 22cf0f5 seems to have introduced the issue, where anon=False is passed when the NoCredentialsError is encountered.
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.7.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-55-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.0.4
numpy : 1.18.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.0.2
setuptools : 47.1.1.post20200604
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pytest : None
pyxlsb : None
s3fs : 0.4.2
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None