lakeFS S3 gateway behaviour when accessed from R #5441
Comments
Notes from @nopcoder on Slack:
I added this code to fix #987. That was a similar problem with the Akka client -- but at least it did send us correct headers. It seems the R client is broken and requests a We could prefer On a personal note... I dream of a protocol. Its spec will look exactly like HTTP, but clients and servers will actually follow that spec. And not request things in headers that they cannot follow. |
✅ Listing buckets now works. See this notebook for test code and a working example against MinIO: |
@nopcoder @arielshaqed I'm revisiting this; any idea why the object listing against lakeFS wouldn't work when the same code is fine against MinIO? |
Last time, if I remember correctly, it was related to content type. I will need to retest to see whether there is an issue with the list response. |
I did a bit of digging and it looks like this is different from the content type issue: the payload itself is different from how MinIO handles an object underneath an otherwise-empty base path. Here's MinIO's response for a repo that has a single object:
<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Name>test</Name>
<Prefix></Prefix>
<Marker></Marker>
<MaxKeys>1000</MaxKeys>
<Delimiter></Delimiter>
<IsTruncated>false</IsTruncated>
<Contents>
<Key>main/Action_5.png</Key>
<LastModified>2023-07-03T15:58:49.257Z</LastModified>
<ETag>"0130d0c155d312ea8214f1641062ce99"</ETag>
<Size>5696</Size>
<Owner>
<ID>02d6176db174dc93cb1b899f7c6078f08654445fe8cf1b6ce98d8855f66bdbf4</ID>
<DisplayName>minio</DisplayName>
</Owner>
<StorageClass>STANDARD</StorageClass>
</Contents>
</ListBucketResult>
Here's lakeFS':
<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult>
<Name>quickstart</Name>
<IsTruncated>false</IsTruncated>
<Prefix></Prefix>
<KeyCount>0</KeyCount>
<MaxKeys>1000</MaxKeys>
<CommonPrefixes>
<Prefix>main/</Prefix>
</CommonPrefixes>
<Marker></Marker>
</ListBucketResult>
So the |
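To make the difference between the two payloads concrete, here is a minimal Python sketch that extracts the object keys and common prefixes from each one. The payloads are abridged copies of the responses above; the helper names (`local_name`, `summarize_listing`) are made up for illustration and are not part of lakeFS, MinIO, or any S3 client. Note that the MinIO response declares the S3 XML namespace while the lakeFS one does not, so the sketch matches on local tag names only.

```python
import xml.etree.ElementTree as ET

# Abridged from the MinIO response quoted in this thread.
MINIO_XML = """<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Name>test</Name>
<Prefix></Prefix>
<IsTruncated>false</IsTruncated>
<Contents>
<Key>main/Action_5.png</Key>
<Size>5696</Size>
</Contents>
</ListBucketResult>"""

# Abridged from the lakeFS response quoted in this thread.
LAKEFS_XML = """<ListBucketResult>
<Name>quickstart</Name>
<IsTruncated>false</IsTruncated>
<Prefix></Prefix>
<KeyCount>0</KeyCount>
<CommonPrefixes>
<Prefix>main/</Prefix>
</CommonPrefixes>
</ListBucketResult>"""


def local_name(tag):
    """Strip an optional '{namespace}' part from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def summarize_listing(xml_text):
    """Collect <Contents>/<Key> and <CommonPrefixes>/<Prefix> values."""
    root = ET.fromstring(xml_text)
    keys, prefixes = [], []
    for child in root:
        name = local_name(child.tag)
        if name == "Contents":
            keys += [c.text for c in child if local_name(c.tag) == "Key"]
        elif name == "CommonPrefixes":
            prefixes += [c.text for c in child if local_name(c.tag) == "Prefix"]
    return {"keys": keys, "common_prefixes": prefixes}


print(summarize_listing(MINIO_XML))
print(summarize_listing(LAKEFS_XML))
```

A client that only reads `Contents` entries would see one object in the MinIO payload and nothing in the lakeFS one, which matches the empty result seen from R.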
Found this which references using some alternative code, which does work:
The question is, should lakeFS be returning a structure similar to MinIO? |
Re-reading the linked issue I found that
get_bucket_df(
  base_url=baseurl,
  bucket="quickstart",
  use_https=FALSE,
  prefix="main/",
  region="",
  verbose=FALSE)
returns a dataframe with all the keys, as expected.
The open question
@rmoff I think I understand the issue; let me try to explain. Listing objects in lakeFS has a special case at the repository/bucket level: at this level we list branches. Listing branches doesn't support recursive listing unless you list something under a branch. The MinIO output above does include the objects, as in a recursive listing, but that is because at this level lakeFS has branches and MinIO doesn't. We get no output in R at the bucket level because we return common prefixes (folders), one for each branch in the repository. About require |
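For anyone hitting this from a client other than R, the two-step pattern described above (list branches at the repository level, then list keys under each branch prefix) can be sketched in Python roughly as follows. This is only an illustration under assumptions: `list_repo_objects` is a hypothetical helper, it assumes the gateway answers boto3-style `list_objects_v2` calls with the usual `CommonPrefixes`, `Contents`, `IsTruncated` and `NextContinuationToken` fields, and the endpoint URL in the comment is a placeholder.

```python
def list_repo_objects(client, repo):
    """Return {branch_prefix: [keys]} for a lakeFS repository.

    `client` is anything with a boto3-style `list_objects_v2` method, e.g.
    (hypothetical endpoint and credentials, adjust to your setup):
        import boto3
        client = boto3.client("s3", endpoint_url="http://localhost:8000")
    """
    # Step 1: at the repository (bucket) level, branches come back as
    # common prefixes such as "main/". Pagination of this first call is
    # omitted to keep the sketch short.
    top = client.list_objects_v2(Bucket=repo, Delimiter="/")
    branches = [p["Prefix"] for p in top.get("CommonPrefixes", [])]

    # Step 2: list the keys under each branch prefix, following
    # continuation tokens until the listing is no longer truncated.
    result = {}
    for branch in branches:
        keys = []
        kwargs = {"Bucket": repo, "Prefix": branch}
        while True:
            page = client.list_objects_v2(**kwargs)
            keys.extend(obj["Key"] for obj in page.get("Contents", []))
            if not page.get("IsTruncated"):
                break
            kwargs["ContinuationToken"] = page["NextContinuationToken"]
        result[branch] = keys
    return result
```

Passing the client in as an argument keeps the helper independent of how the connection is configured, so the same function can be pointed at MinIO for comparison.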
OK that makes sense, thanks @nopcoder. |
Added note to R doc. Closing. |
I'm trying to use the S3 gateway from R, but not having much luck.
R library and S3 HTTP code
AWS CLI
List buckets from boto
Output
Server log
List buckets from R
Output: Note that nothing is returned for either command
Server log
Prove R code works against another S3 implementation
Output