Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: S3 uris are allowed capital letters #294

Merged
merged 4 commits into from
Aug 5, 2024
Merged

fix: S3 uris are allowed capital letters #294

merged 4 commits into from
Aug 5, 2024

Conversation

CallumNZ
Copy link
Contributor

@CallumNZ CallumNZ commented Aug 2, 2024

No description provided.

S3 uris are allowed to have capital letters.
Copy link

@junghao junghao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I just popped in this PR and saw the unusual regex.

@@ -81,7 +81,7 @@ jobs:
echo "session-name=$SESSION_NAME" >> $GITHUB_OUTPUT
- name: validate
env:
REGEXP_S3_BUCKET: ^s3://[a-z0-9_/.-]+$
REGEXP_S3_BUCKET: ^s3://[a-zA-Z0-9_/.-]+$
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since A-Z being added in, we can simply use \w to represent a-zA-Z0-9_

Also, not sure how it worked, but the . here means everything. And the / there doesn't seem to be valid?

I think the regex should be \w\/\.- ? Or it's not actually a regex, just something similar?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default the end point bucket needs to be lower case, e.g.

https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucketnamingrules.html

The following naming rules apply for directory buckets.

  • Be unique within the chosen AWS Region and Availability Zone.
  • Name must be between 3 (min) and 63 (max) characters long, including the suffix.
  • Consists only of lowercase letters, numbers and hyphens (-).
  • Begin and end with a letter or number.
  • Must include the following suffix: --azid--x-s3.
  • Bucket names must not start with the prefix xn--.
  • Bucket names must not start with the prefix sthree-.
  • Bucket names must not start with the prefix sthree-configurator.

For objects:

You can use any UTF-8 character in an object key name. However, using certain characters in key names can cause problems with some applications and protocols. The following guidelines help you maximize compliance with DNS, web-safe characters, XML parsers, and other APIs.

and then there's a list of recommendations:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the problem is that the workflow input s3-bucket in practice can be bucket-only or full s3 URI. For example, from docker-gamit-base:

fetch:
    uses: GeoNet/Actions/.github/workflows/reusable-copy-to-s3.yml@main
    with:
      artifact-name: gamit
      artifact-path: ./gg/
      s3-bucket: s3://yum-prod.geonet.org.nz/docker-extras/docker-gamit/
      cp-or-sync: cp
      direction: from # 'to' or 'from'

the reusable s3 workflow uses aws s3 as the copy mechanism, so I think that it should be treated as a URI.

@CallumNZ
Copy link
Contributor Author

CallumNZ commented Aug 4, 2024

@junghao

  • It seems you don't need to escape / is this environment.
  • I have added the full set of allowed characters as per AWS documentation: https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html
  • . within [] seems to work as a literal.
  • I opted not to use \w because I think writing them out in full is clearer.
  • I have updated the description of the parameter s3-bucket to indicate that it is being used as a URI, not simply a bucket name.
  • I tried to deprecate the input to then use a clearer name (e.g. s3-bucket-uri), but that functionality is only available for Actions, not Workflows. So for now, I think we'll have to live with the misleading input name s3-bucket.
  • I have tried the regex on https://regex101.com and it works to filter out disallowed characters.

@CallumNZ CallumNZ requested a review from junghao August 4, 2024 23:45
@junghao
Copy link

junghao commented Aug 5, 2024

@CallumNZ
Would ' and ! causing issue when passing the regex string to https://github.com/GeoNet/Actions/blob/main/.github/workflows/reusable-copy-to-s3.yml#L96 ?

To avoid issue of ' being interpreted
@CallumNZ
Copy link
Contributor Author

CallumNZ commented Aug 5, 2024

@junghao good catch. A ' caused issues in script. Enclosing the regex pattern in "" fixes it.

@CallumNZ CallumNZ requested a review from wilsonjord August 5, 2024 02:19
@CallumNZ CallumNZ merged commit a3d3ed7 into main Aug 5, 2024
16 checks passed
@CallumNZ CallumNZ deleted the bucketURIregex branch August 5, 2024 20:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants