Rclone sync from S3 to lakeFS failing #2580

Closed
peacing opened this issue Oct 20, 2021 · 2 comments

peacing commented Oct 20, 2021

Running the command (rclone v1.56.2) to sync data from S3 to lakeFS (deployed on ECS):
rclone sync remote://lakefs-scale-demo/data/ lakefs:my-repo/main/ --dump requests

gives the following output:

paulsingman@Pauls-MacBook-Pro ~ %  rclone sync remote://lakefs-scale-demo/data/ lakefs:my-repo/main/ --dump requests
2021/10/20 11:36:30 NOTICE: Automatically setting -vv as --dump is enabled
2021/10/20 11:36:30 DEBUG : rclone: Version "v1.56.2" starting with parameters ["rclone" "sync" "remote://lakefs-scale-demo/data/" "lakefs:my-repo/main/" "--dump" "requests"]
2021/10/20 11:36:30 DEBUG : Creating backend with remote "remote://lakefs-scale-demo/data/"
2021/10/20 11:36:30 DEBUG : Using config file from "/Users/paulsingman/.config/rclone/rclone.conf"
2021/10/20 11:36:30 DEBUG : You have specified to dump information. Please be noted that the Accept-Encoding as shown may not be correct in the request and the response may not show Content-Encoding if the go standard libraries auto gzip encoding was in effect. In this case the body of the request will be gunzipped before showing it.
2021/10/20 11:36:30 DEBUG : fs cache: renaming cache item "remote://lakefs-scale-demo/data/" to be canonical "remote:lakefs-scale-demo/data"
2021/10/20 11:36:30 DEBUG : Creating backend with remote "lakefs:my-repo/main/"
2021/10/20 11:36:30 DEBUG : You have specified to dump information. Please be noted that the Accept-Encoding as shown may not be correct in the request and the response may not show Content-Encoding if the go standard libraries auto gzip encoding was in effect. In this case the body of the request will be gunzipped before showing it.
2021/10/20 11:36:30 DEBUG : fs cache: renaming cache item "lakefs:my-repo/main/" to be canonical "lakefs:my-repo/main"
2021/10/20 11:36:30 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/10/20 11:36:30 DEBUG : HTTP REQUEST (req 0xc000596800)
2021/10/20 11:36:30 DEBUG : GET /?delimiter=%2F&encoding-type=url&max-keys=1000&prefix=data%2F HTTP/1.1
Host: lakefs-scale-demo.s3.us-east-1.amazonaws.com
User-Agent: rclone/v1.56.2
Authorization: XXXX
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20211020T153630Z
Accept-Encoding: gzip

2021/10/20 11:36:30 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/10/20 11:36:30 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/10/20 11:36:30 DEBUG : HTTP REQUEST (req 0xc000428700)
2021/10/20 11:36:30 DEBUG : GET /my-repo?delimiter=%2F&max-keys=1000&prefix=main%2F HTTP/1.1
Host: penv.lakefs.dev
User-Agent: rclone/v1.56.2
Accept-Encoding: gzip

2021/10/20 11:36:30 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/10/20 11:36:32 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/10/20 11:36:32 DEBUG : HTTP RESPONSE (req 0xc000428700)
2021/10/20 11:36:32 DEBUG : HTTP/2.0 200 OK
Content-Length: 517
Accept-Ranges: bytes
Cache-Control: no-cache, private, max-age=0
Content-Type: text/html; charset=utf-8
Date: Wed, 20 Oct 2021 15:36:32 GMT
Expires: Thu, 01 Jan 1970 00:00:00 UTC
Last-Modified: Sun, 10 Oct 2021 15:11:56 GMT
Pragma: no-cache
X-Accel-Expires: 0

2021/10/20 11:36:32 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/10/20 11:36:32 DEBUG : S3 bucket my-repo path main: Retrying listing because of characters which can't be XML encoded
2021/10/20 11:36:32 DEBUG : pacer: low level retry 1/10 (error SerializationError: failed to decode REST XML response
	status code: 200, request id:
caused by: XML syntax error on line 9: attribute name without = in element)

My rclone config looks like:

paulsingman@Pauls-MacBook-Pro ~ % cat /Users/paulsingman/.config/rclone/rclone.conf
[lakefs]
type = s3
provider = Other
endpoint = https://penv.lakefs.dev
no_check_bucket = true
force_path_style = true

[remote]
type = s3
provider = AWS
env_auth = true
region = us-east-1
acl = private
bucket_acl = private
sse_kms_key_id = q

arielshaqed self-assigned this Oct 21, 2021

arielshaqed commented Oct 21, 2021

Sorry! I am unable to reproduce this, so I would like some more information:

  1. no_check_bucket is there because of #2447 ("RClone requires S3 setting no-check-bucket"). However, it also prevents rclone from verifying some things, so (sorry) I'm going to have to ask you to verify that you have a repo my-repo with a branch main (e.g. lakectl fs ls lakefs://my-repo/main/), and then to remove no_check_bucket.
  2. What version of lakeFS are you using?
  3. Can you try again, this time with --dump headers,bodies,requests,responses? Maybe I can get enough information from there.
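
The two command-line checks requested above can be wrapped as small shell helpers. This is only a sketch: the function names check_branch and dump_sync are made up for illustration, and the commands themselves are the ones quoted in this thread.

```shell
# Hypothetical helpers; only the lakectl and rclone invocations come from the thread.

# List the branch root directly through lakectl, bypassing rclone,
# to confirm that the repository and branch exist.
check_branch() {
    repo="$1"
    branch="$2"
    lakectl fs ls "lakefs://${repo}/${branch}/"
}

# Re-run the sync with every dump category that was asked for.
dump_sync() {
    src="$1"
    dst="$2"
    rclone sync "$src" "$dst" --dump headers,bodies,requests,responses
}
```

With the names from the report, that would be `check_branch my-repo main` followed by `dump_sync remote://lakefs-scale-demo/data/ lakefs:my-repo/main/`.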

peacing commented Oct 21, 2021

Fixed by setting the lakeFS credentials correctly in rclone.conf.
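
The corrected file is not shown in this thread, so the following is only a guess at its shape. It assumes static lakeFS keys are supplied through rclone's standard S3 options access_key_id and secret_access_key, and the values are placeholders. It is consistent with the dump above, where the request to penv.lakefs.dev carries no Authorization header and the [lakefs] section defines no credentials.

```ini
[lakefs]
type = s3
provider = Other
endpoint = https://penv.lakefs.dev
# Assumed addition: lakeFS credentials (placeholders, not real values)
access_key_id = <lakeFS access key ID>
secret_access_key = <lakeFS secret access key>
no_check_bucket = true
force_path_style = true
```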

peacing closed this as completed Oct 21, 2021