Strike regex only matches against file name for an S3 monitor #1928

@jw-s2eas


Pain Point? Please describe.
An S3 workspace can only point at the top-level bucket, not at a "folder" (key prefix) within it. When a strike is attached to that workspace, the regular expression in a file ingest rule is applied only to the base file name, not to the full file path within the bucket. This prevents the strike from matching its rules against "folders" within the bucket. For example:

bucket: my-data-bucket
Two files within bucket:
s3://my-data-bucket/source-alpha/2021AUG08_Image.png
s3://my-data-bucket/source-beta/2021AUG08_Image.png

Creating a strike ingest rule matching:
".*beta.*Image.png"

This does NOT match the file in the source-beta path, because the rule is evaluated only against "2021AUG08_Image.png".

Changing the rule to ".*Image.png" would also pick up files from other folders that I do not want.
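A small sketch of the behaviour described above, assuming the rule is applied with Python's `re.match` (the actual call used by the ingest code may differ). The helper `matching` and the variable names are made up for illustration; only the two object keys and the two patterns come from the example.

```python
import posixpath
import re

# Object keys from the example bucket (bucket name omitted).
paths = [
    "source-alpha/2021AUG08_Image.png",
    "source-beta/2021AUG08_Image.png",
]

# What the rule sees today: only the base file name of each key.
names = [posixpath.basename(p) for p in paths]


def matching(pattern, candidates):
    """Return the candidates the pattern matches (hypothetical helper)."""
    regex = re.compile(pattern)
    return [c for c in candidates if regex.match(c)]


# The folder-specific rule finds nothing when it only sees file names...
name_hits_narrow = matching(r".*beta.*Image.png", names)

# ...but would select exactly the source-beta file if it saw the full path.
path_hits_narrow = matching(r".*beta.*Image.png", paths)

# The loosened rule matches both file names, including the unwanted one.
name_hits_broad = matching(r".*Image.png", names)
```

With file names only, the folder-specific rule selects nothing and the loosened rule selects both files, which is the problem described above.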

Desired Solution
Change rule matching to run against the entire file path instead of just the file name. This should probably be done in the ingest.models.is_there_rule_match() function. Alternatively, allow workspaces to monitor a "folder" within the S3 bucket directly.

Alternative / Workaround
Create a separate bucket for every data feed, with all folders at the top level? That is not feasible if I don't control the bucket or want to group similar feeds.

Additional Context
This is probably a one-line fix: change the self.file_name evaluation in the function to self.file_path.
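To make the proposal concrete, here is a rough sketch of the change in isolation. It is not the project's actual code: `IngestStub`, `rule_matches`, and the `match_full_path` flag are hypothetical names, and the real is_there_rule_match() function may have a different signature and may evaluate the pattern differently than `re.match`.

```python
import re
from dataclasses import dataclass


@dataclass
class IngestStub:
    """Hypothetical stand-in for an ingest record, not the real model."""
    file_name: str
    file_path: str


def rule_matches(ingest, pattern, match_full_path=False):
    """Evaluate a rule's regex against the file name or the full path."""
    # The proposed fix amounts to swapping file_name for file_path here.
    target = ingest.file_path if match_full_path else ingest.file_name
    return re.match(pattern, target) is not None


alpha = IngestStub("2021AUG08_Image.png", "source-alpha/2021AUG08_Image.png")
beta = IngestStub("2021AUG08_Image.png", "source-beta/2021AUG08_Image.png")
rule = r".*beta.*Image.png"

# Current behaviour: neither file matches the folder-specific rule.
current = [rule_matches(i, rule) for i in (alpha, beta)]

# Proposed behaviour: only the source-beta file matches.
proposed = [rule_matches(i, rule, match_full_path=True) for i in (alpha, beta)]

# A rule written for bare file names stops matching once the path is used.
legacy_on_name = rule_matches(beta, r"2021.*\.png")
legacy_on_path = rule_matches(beta, r"2021.*\.png", match_full_path=True)
```

One caveat, which the last two lines illustrate: if existing rules are anchored to the start of the file name, matching against the full path unconditionally could stop them from matching. Making it opt-in, as the hypothetical flag does here, might be a safer variant of the same fix.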
