
Move s3 path style into s3.begin/s3.end blocks #1601

Merged
merged 1 commit into from
Oct 2, 2024

Conversation

jhiemstrawisc
Member

@jhiemstrawisc jhiemstrawisc commented Sep 27, 2024

It looks like the path style directive is scoped to each s3.begin/s3.end block, which is why the previous declaration failed for multiple buckets. Technically, it would still succeed if every bucket were supposed to use virtual-style URLs (the default). In the path-style case, however, only the first bucket inherited the value and the other buckets reverted to the default.

This commit scopes the path style to each origin export to guarantee that every request URL generated by XRootD follows the same bucket convention.
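
To make the two conventions concrete, here is a minimal Python sketch of how the same object maps to a path-style versus a virtual-style URL. The function name `s3_object_url` and its parameters are hypothetical and only for illustration; this is not how XRootD or the S3 plugin builds URLs internally.

```python
from urllib.parse import urlsplit, urlunsplit


def s3_object_url(service_url, bucket, key, url_style="virtual"):
    """Build an object URL for the given S3 URL style (illustrative helper)."""
    parts = urlsplit(service_url)
    key = key.lstrip("/")
    if url_style == "path":
        # Path style: the bucket is the first segment of the URL path.
        return urlunsplit((parts.scheme, parts.netloc, f"/{bucket}/{key}", "", ""))
    if url_style == "virtual":
        # Virtual style: the bucket is prepended to the service hostname.
        return urlunsplit((parts.scheme, f"{bucket}.{parts.netloc}", f"/{key}", "", ""))
    raise ValueError(f"unknown URL style: {url_style!r}")


if __name__ == "__main__":
    service = "https://s3.us-east-1.amazonaws.com"
    for bucket, key in [("noaa-wod-pds", "MD5SUMS"),
                        ("genome-browser", "htdocs/.welcome.msg")]:
        for style in ("path", "virtual"):
            print(style, s3_object_url(service, bucket, key, style))
```

With S3UrlStyle set to "path", every export should produce URLs of the first kind, regardless of its position in the Exports list.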

The easiest way to test this is probably to start an Origin with full debug logging at the origin/xrootd level and point it at two public AWS buckets. Here's the origin config I used:

Logging:
  Level: debug
  Origin:
    Xrootd: trace
    Http: debug
Origin:
  S3UrlStyle: "path"
  S3Region: "us-east-1"
  S3ServiceUrl: "https://s3.us-east-1.amazonaws.com"
  StorageType: "s3"
  Exports:
    - FederationPrefix: "/aws-opendata/noaa-wod-pds"
      S3Bucket: "noaa-wod-pds"
      Capabilities: ["PublicReads", "Listings", "DirectReads"]
    - FederationPrefix: "/aws-opendata/genome-browser"
      S3Bucket: "genome-browser"
      Capabilities: ["PublicReads", "Listings", "DirectReads"]

From there, curl one object from each namespace. Two objects that work for testing are:

/noaa-wod-pds/MD5SUMS
/genome-browser/htdocs/.welcome.msg

You can then search through the origin log for the generated URLs. Prior to this commit, you'd find that the object request for the first listed bucket (in this case noaa-wod-pds) generates the path-style URL https://s3.us-east-1.amazonaws.com/noaa-wod-pds/MD5SUMS, while the object request for any subsequent bucket reverts to virtual-style, e.g. https://genome-browser.s3.us-east-1.amazonaws.com/htdocs/.welcome.msg

After the change, both of these URLs should have the form https://s3.us-east-1.amazonaws.com/<bucket>/<object>
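
Checking the log by eye works for two buckets, but the same check can be sketched in Python. The snippet below pulls every https URL out of a set of log lines and labels it as path-style or virtual-style for the buckets in the config above. The helper names and the sample log lines are hypothetical; real origin log lines will look different, and only the URLs themselves are taken from this PR.

```python
import re
from urllib.parse import urlsplit

# Loose match for any https URL embedded in a log line.
URL_RE = re.compile(r"https://\S+")


def classify_s3_url(url, service_host, buckets):
    """Return (bucket, style) for a known bucket, or None if nothing matches."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    for bucket in buckets:
        if host == f"{bucket}.{service_host}":
            return bucket, "virtual"
        if host == service_host and parts.path.startswith(f"/{bucket}/"):
            return bucket, "path"
    return None


def scan_log(lines, service_host, buckets):
    """Collect (bucket, style, url) for every recognized S3 URL in the lines."""
    found = []
    for line in lines:
        for raw in URL_RE.findall(line):
            url = raw.rstrip(".,;\"')")  # drop trailing punctuation from the log text
            result = classify_s3_url(url, service_host, buckets)
            if result is not None:
                found.append((result[0], result[1], url))
    return found


if __name__ == "__main__":
    # Hypothetical log lines; only the URLs come from the PR description.
    sample = [
        "debug: sending request https://s3.us-east-1.amazonaws.com/noaa-wod-pds/MD5SUMS",
        "debug: sending request https://genome-browser.s3.us-east-1.amazonaws.com/htdocs/.welcome.msg",
    ]
    for bucket, style, url in scan_log(
        sample, "s3.us-east-1.amazonaws.com", ["noaa-wod-pds", "genome-browser"]
    ):
        print(f"{bucket}: {style} ({url})")
```

Against a log captured before this commit, the second bucket would be expected to show up as virtual; after the change, every entry should be labeled path.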

Closes #1561

@jhiemstrawisc added the critical (High priority for next release) and origin (Issue relating to the origin component) labels Sep 27, 2024
@jhiemstrawisc added this to the v7.11.0 milestone Sep 27, 2024
@turetske turetske merged commit 5dd84a7 into PelicanPlatform:main Oct 2, 2024
20 checks passed
Labels
critical (High priority for next release), origin (Issue relating to the origin component)
Development

Successfully merging this pull request may close these issues.

Failure to handle multiple buckets with S3 url style parameter
2 participants