WIP: implementing the select object interface #143

Open
wants to merge 2 commits into
base: master

ExpandingMan (Contributor) commented Feb 5, 2021

S3 provides an interface (S3 Select) for executing SQL-style queries on slices of objects. A basic method for this already exists in AWS.jl.

What I have here is a convenient interface for it. It gets a little dicey because AWS expects some rather elaborate arguments, for which we need to provide reasonable defaults and options, but ultimately what's here is not that much code.
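To give a sense of what those arguments look like on the wire, here is a minimal sketch of the request that the S3 API reference describes for SelectObjectContent: a POST to the object key with the `select&select-type=2` query parameters and an XML body. The helper names (`xmlescape`, `select_request_body`, `select_request_target`) are hypothetical and are not part of this PR, AWSS3.jl, or AWS.jl; the CSV-in, CSV-out defaults are an assumption chosen only for illustration.

```julia
# Hypothetical helpers for illustration only; not the code in this PR.

# Escape the characters that are special inside an XML text node.
# "&" must be replaced first so the other escapes are not double-escaped.
xmlescape(s::AbstractString) =
    replace(replace(replace(s, "&" => "&amp;"), "<" => "&lt;"), ">" => "&gt;")

# Build the XML body of a SelectObjectContent request.
# Assumes CSV input and CSV output; `header` maps to FileHeaderInfo
# (USE, IGNORE, or NONE).
function select_request_body(sql::AbstractString; header::AbstractString="USE")
    """
    <SelectObjectContentRequest xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
      <Expression>$(xmlescape(sql))</Expression>
      <ExpressionType>SQL</ExpressionType>
      <InputSerialization>
        <CSV>
          <FileHeaderInfo>$(header)</FileHeaderInfo>
        </CSV>
      </InputSerialization>
      <OutputSerialization>
        <CSV/>
      </OutputSerialization>
    </SelectObjectContentRequest>
    """
end

# Request target for the POST: the object path plus the select query parameters.
select_request_target(bucket::AbstractString, key::AbstractString) =
    "/$bucket/$key?select&select-type=2"

print(select_request_body("select * from s3object s where s.C >= 4"))
println(select_request_target("testbucket", "test.csv"))
```

Note that the `>=` in the SQL expression has to be XML-escaped in the body, which is one of the details a convenience wrapper needs to handle.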

For example, one should now be able to do:

using AWSS3, Minio

csv = S3Path("s3://testbucket/test.csv", config=MinioConfig("http://localhost:9000"))
s3select(csv, "select * from s3object s where s.C >= 4")

Unfortunately, this does not seem to be working, at least not for me with min.io. I get the following error when running against my min.io server:

ERROR: AWS.AWSExceptions.AWSException("405", "AWSException", Dict{String, Dict}(), HTTP.ExceptionRequest.StatusError(405, "POST", "/testbucket/test.csv", HTTP.Messages.Response:
"""
HTTP/1.1 405 Method Not Allowed
Accept-Ranges: bytes
Content-Length: 49
Server: MinIO
Vary: Origin
Date: Fri, 05 Feb 2021 00:47:23 GMT
Content-Type: text/plain; charset=utf-8

Not allowed (POST /testbucket/test.csv on S3 API)"""))

And in case you were wondering, yes, min.io does support this and yes, I'm using the latest version.

It seems that either AWS.jl is sending a bad request, or there is a bug in min.io that causes its select interface not to conform to the AWS spec (if so, presumably the bug only becomes visible because of differences between the requests sent by AWS.jl and boto3).

To get to the bottom of it, we will need easier access to the exact requests and responses in the AWS.jl package. Ideally, this would already be provided by HTTP, but it isn't (at least not through debug logging), and that package is quite a lot more complicated than AWS.jl. So my next step will be to implement some useful debug logging in AWS.jl.
