Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search attribute validation toggling #5033

Conversation

charlese-instaclustr
Copy link
Contributor

  • Introduced a new dynamic config parameter, frontend.enableQueryAttributeValidation. This parameter is true by default, and when it is true there is no change to the current Cadence behaviour.
  • When set to false, the following changes in behaviour have been made:
  1. When running a query for workflows using a custom search attribute in an Advanced Visibility setup, the attributes will not be validated against the frontend.validSearchAttributes map before being used to query from Elasticsearch/OpenSearch.
  2. When starting a workflow and attaching custom attribute values to it, the attribute key will not be validated against the frontend.validSearchAttributes map before attempting to write to Elasticsearch/OpenSearch. Similarly, the datatype of the attribute value will not be checked against the validation map.

Addresses issue #4948

When running Cadence in a multi-node setup, updates to the frontend.validSearchAttributes map in the dynamic config is required on every node every time a new custom search attribute is introduced. It may be desirable to be able to add custom search attributes and not have to subsequently update the map in some cases. For example, this can be limiting if the dev. seeking to add the search attributes does not have the requisite access to the node configs; this could create limiting interdependencies hindering development.

Setting this dynamic config to disable validation on custom search attributes may therefore be appropriate in certain Cadence use cases.

How did you test it?

Tested locally by running Cadence with Cassandra, Kafka, and OpenSearch. Ran the following operation tests to confirm none of them caused issues for Cadence or OpenSearch:

  • Regression tested adding, starting workflows with and querying with a search attribute TEST1 (i.e. with the config property still set to true)
  • Set the property to false
  • Added a new search attribute TEST2 which was then made sure to be removed from the frontend.validSearchAttributes map.
  • Started a workflow with TEST2 set to a valid value, workflow ran and could be queried with this attribute successfully.
  • Started a workflow with TEST2 set to an invalid value, validation was not applied as expected but this did not cause any breaking errors in Cadence / OpenSearch
  • Started a workflow with TEST3 (an attribute key which had not been added to the Cadence cluster). Validation was not applied as expected, but this did not cause any breaking errors in Cadence / Opensearch.

Potential risks

This change will pose no risk for all running clusters unless they specifically set the new frontend.EnableQueryAttributeValidation value to false, because it defaults to true and true will produce no change in behaviour.

When used, the new property/behaviour should not pose significant risk. Tests have been done to check that the removal of the validation does not lead to Cadence or OpenSearch crashing with unexpected inputs or edge case user behaviour. Obviously, without the validation being active, the onus will be on the user to be diligent in remembering what search attributes are available to be set and their correct datatypes, however as mentioned this will only be the case if they actively opt in to using this feature, at which point they should be well-informed about its effect.

No, because the default value of the property produces no change in behaviour

Documentation Changes

Generated automatically with this change.

@demirkayaender demirkayaender requested a review from a team November 28, 2022 16:42
@ZackLK
Copy link
Contributor

ZackLK commented Nov 28, 2022

Code and idea look good. One question: did you test out the case of different data types and whether that caused any breaking errors in Opensearch? (I think you may have covered this with TEST2, but wanted to double-check)

@charlese-instaclustr
Copy link
Contributor Author

Code and idea look good. One question: did you test out the case of different data types and whether that caused any breaking errors in Opensearch? (I think you may have covered this with TEST2, but wanted to double-check)

Hi @ZackLK , thanks for the review.
Yep, so as you suggested, I've tried several permutations of incorrect datatypes (i.e. passing a double to a key expecting a string, a string to a key expecting a double, and similarly with ints). I've not produced any breaking errors by doing so.

The worst behaviour I've seen is when passing a string i.e. "hello" when the key expects a numerical value when starting a workflow. The workflow seemingly gets accepted by the CLI, e.g.

cadence --do samples-domain workflow start --tl helloWorldGroup --wt main.Workflow --et 30 --dt 10 -i '"vancexu"' -search_attr_key 'testInt1' -search_attr_value '"hello"'   

Started Workflow Id: ad7afa94-94f2-4a4b-9be5-a5274e88c4b8, run Id: 9e7ae1ec-46dc-4ac4-935c-d4bf3a85a29e

However Cadence logs will show that the cadence-worker services raises an ES request failed error and a nack message and publish to DLQ warning. Then, when running cadence --do samples-domain wf list the workflow will not be listed under any status. So it seems the workflow will not actually start, in spite of the CLI message received when requesting the start.

So, not a breaking error by any means, but a slightly confusing behaviour (albeit one that only arises if users are misusing their search attribute keys). Do you have any suggestions about modifications that should be made to address this, if at all?

Cheers

@davidporter-id-au davidporter-id-au enabled auto-merge (squash) November 29, 2022 02:06
@davidporter-id-au
Copy link
Member

ty for the solid PR and docs

…rlese-instaclustr/cadence into search-attribute-validation-toggling

Merge with upstream changes
auto-merge was automatically disabled December 12, 2022 18:59

Head branch was pushed to by a user without write access

@charlese-instaclustr
Copy link
Contributor Author

Hi @ZackLK ,

Found that there was a unit test case failing as I missed adding the new config property to that one; hopefully all is well now. Also, please let me know if there's any recommended action on my point about edge case handling, above. Thanks!

If you're happy with the change, could I also please request support with re-enabling auto-merge? Thanks.

@charlese-instaclustr
Copy link
Contributor Author

Hi @ZackLK ,

Apologies that this has taken a couple of rounds, I'm still learning my way around the integration tests. But I believe I've fixed the issue -- seeing the tests pass locally now -- so if you would be able to provide one more approval if you're happy that'd be great, thanks!

@davidporter-id-au davidporter-id-au enabled auto-merge (squash) December 28, 2022 17:55
@coveralls
Copy link

Pull Request Test Coverage Report for Build 018559e0-14f4-4cd0-9149-b2fc693b049c

  • 37 of 40 (92.5%) changed or added relevant lines in 6 files are covered.
  • 67 unchanged lines in 11 files lost coverage.
  • Overall coverage increased (+0.03%) to 57.231%

Changes Missing Coverage Covered Lines Changed/Added Lines %
common/elasticsearch/validator/queryValidator.go 8 9 88.89%
service/worker/indexer/processor.go 1 3 33.33%
Files with Coverage Reduction New Missed Lines %
common/task/weightedRoundRobinTaskScheduler.go 1 89.64%
common/persistence/historyManager.go 2 66.67%
common/persistence/persistenceMetricClients.go 2 55.45%
service/matching/matcher.go 2 91.46%
common/persistence/serialization/parser.go 4 65.41%
common/persistence/serialization/thrift_decoder.go 4 61.22%
service/history/task/fetcher.go 4 91.24%
common/task/fifoTaskScheduler.go 7 82.47%
common/persistence/sql/workflowStateMaps.go 8 86.21%
common/persistence/serialization/getters.go 12 61.46%
Totals Coverage Status
Change from base Build 018531b6-fc4b-4f20-bc5d-416bed083f59: 0.03%
Covered Lines: 84920
Relevant Lines: 148381

💛 - Coveralls

@davidporter-id-au davidporter-id-au merged commit d618e32 into cadence-workflow:master Dec 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants