add unsafe_dynamic_query to gcp_bigquery_select #190

Open
wants to merge 3 commits into
base: main

jem-davies
Collaborator

This will enable configurations such as:

# {"table": "test.people", "where": "city IN (?,?,?)", "columns": ["name", "age", "city"], "args": ["London", "Paris", "Dublin"]}
pipeline:
  processors:
    - gcp_bigquery_select:
        project: ${GGP_PROJECT}
        table: ${! this.table } # test.people
        columns_mapping: root = this.columns #["name", "age", "city"]
        where:  ${! "city IN ("+this.args.join(",").re_replace_all("\\b\\w+\\b","?")+")" } # city IN (?,?,?)
        args_mapping: root = this.args # ["London", "Paris", "Dublin"]
        unsafe_dynamic_query: true

This gives the processor greater flexibility.
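For reference, here is a minimal Go sketch of what the where interpolation in the example computes. It assumes Bloblang's re_replace_all follows Go regexp semantics; the function names are hypothetical and not part of this PR. The sketch also suggests the regex approach yields one ? per word rather than one per argument, so building the placeholder list from the argument count may be the safer option.

```go
package main

import (
	"regexp"
	"strings"
)

// wordRe is the same pattern used in the example's where field.
var wordRe = regexp.MustCompile(`\b\w+\b`)

// placeholdersByRegex mirrors the example expression: join the args with
// commas, then replace every word with "?". (Hypothetical helper.)
func placeholdersByRegex(args []string) string {
	return wordRe.ReplaceAllString(strings.Join(args, ","), "?")
}

// placeholdersByCount emits exactly one "?" per argument, regardless of
// what the argument values contain. (Hypothetical helper.)
func placeholdersByCount(n int) string {
	if n <= 0 {
		return ""
	}
	return strings.TrimSuffix(strings.Repeat("?,", n), ",")
}

// inClause builds a clause such as "city IN (?,?,?)" for n arguments.
func inClause(column string, n int) string {
	return column + " IN (" + placeholdersByCount(n) + ")"
}
```

With the args from the example, both helpers produce ?,?,?. With a multi-word value such as "New York", the regex version produces "? ?" for that single argument, which would no longer line up with the args passed through args_mapping.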

…erpolation for dynamic queries

Signed-off-by: Jem Davies <jemsot@gmail.com>
Signed-off-by: Jem Davies <jemsot@gmail.com>
Signed-off-by: Jem Davies <jemsot@gmail.com>
pipeline:
  processors:
    - gcp_bigquery_select:
        project: ${GGP_PROJECT}
Collaborator Author

Suggested change
- project: ${GGP_PROJECT}
+ project: ${GCP_PROJECT}

return
}
}
if inConf.Contains("columns_mapping") {
Collaborator Author

Probably need to add a lint rule so that only one of the columns / columns_mapping fields is provided.
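A rough sketch of the check this implies, written as a plain Go function rather than against the real config API. The function and error names are hypothetical; in the processor constructor the two booleans would presumably come from calls like inConf.Contains("columns") and inConf.Contains("columns_mapping").

```go
package main

import "errors"

// errColumnsConflict is a hypothetical sentinel error for this sketch.
var errColumnsConflict = errors.New("only one of columns or columns_mapping may be set")

// checkColumnFields rejects a config that sets both the static columns
// list and the columns_mapping field. Setting neither is left to the
// caller to handle, since columns is declared optional in the spec.
func checkColumnFields(hasColumns, hasColumnsMapping bool) error {
	if hasColumns && hasColumnsMapping {
		return errColumnsConflict
	}
	return nil
}
```

Returning a sentinel error keeps the check easy to unit test and lets the constructor decide how to report it.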

Description("A list of columns to query.").
Optional()).
Field(service.NewBloblangField("columns_mapping").
Description("An optional [Bloblang mapping](/docs/guides/bloblang/about) which should evaluate to an array of column names to query.").
Collaborator Author

Add a better description. Think about how to convey that unsafe_dynamic_query must be set for this field to be evaluated.

Collaborator

Why can't we just have a single column field that either supports plain string or bloblang interpolation?

`
An example to show the use of the unsafe_dynamic_query field:`,
`
# {"table": "test.people", "where": "city IN (?,?,?)", "columns": ["name", "age", "city"], "args": ["London", "Paris", "Dublin"]}
Collaborator Author

Suggested change
- # {"table": "test.people", "where": "city IN (?,?,?)", "columns": ["name", "age", "city"], "args": ["London", "Paris", "Dublin"]}
+ # {"table": "test.people", "columns": ["name", "age", "city"], "args": ["London", "Paris", "Dublin"]}

Collaborator

@gregfurman gregfurman left a comment

Nice one here. Some questions/feedback.

Also, can we write a test for this? Just a unit test with this new interpolation functionality would be fine.

Stretch goal though: I'm thinking it could be nice to add an integration test using that bigquery emulator. Check out the setupBigQueryEmulator function and the TestGCPBigQueryStorageOutputWriteOK test for reference.

Comment on lines +129 to +130
Field(service.NewBoolField("unsafe_dynamic_query").
Description("Whether to enable [interpolation functions](/docs/configuration/interpolation/#bloblang-queries) in the columns & where fields. Great care should be made to ensure your queries are defended against injection attacks.").
Collaborator

I think we should add an experimental note here or some admonition that specifies this approach will likely be re-worked in future releases.


proc.config.queryParts.columns, err = toStringSlice(cols)
if err != nil {
msg.SetError(fmt.Errorf("%w", err))
Collaborator

Either wrap this with an error message or just pass the err

}

for i, msg := range batch {
outBatch = append(outBatch, msg)

if proc.config.unsafeDyn {
Collaborator

Can we consider putting this into its own function? That way, instead of calling msg.SetError at each failure point, we can just return an err and set it once.

Thinking something like:

if proc.config.unsafeDyn {
  result, err := proc.executeDynamicQuery(msg)
  if err != nil {
    msg.SetError(err)
  }
}
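A slightly fuller sketch of that shape, using simplified stand-in types so it compiles on its own. The message and processor types, their fields, and the body of executeDynamicQuery here are all hypothetical and do not reflect the real processor or the service.Message API; the point is only that every failure path inside the helper becomes a plain return, and the batch loop calls SetError in one place.

```go
package main

import "errors"

// message is a simplified stand-in for a batch message.
type message struct {
	table string
	query string
	err   error
}

// SetError records a processing failure on the message.
func (m *message) SetError(err error) { m.err = err }

// processor is a simplified stand-in for the processor in this PR.
type processor struct {
	unsafeDyn bool
}

// executeDynamicQuery resolves the per-message query parts and returns
// an error instead of setting it on the message directly.
func (p *processor) executeDynamicQuery(msg *message) (string, error) {
	if msg.table == "" {
		return "", errors.New("table interpolation resolved to an empty string")
	}
	return "SELECT * FROM " + msg.table, nil
}

// processBatch shows the single SetError call site in the loop.
func (p *processor) processBatch(batch []*message) []*message {
	out := make([]*message, 0, len(batch))
	for _, msg := range batch {
		out = append(out, msg)
		if !p.unsafeDyn {
			continue
		}
		query, err := p.executeDynamicQuery(msg)
		if err != nil {
			msg.SetError(err)
			continue
		}
		msg.query = query
	}
	return out
}
```

This also makes the dynamic path straightforward to cover with the unit test requested above, since the helper can be called directly.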
