ENH: backfill @AlsoRequired annotation to processor attributes #5086

Collaborator

@chenqi0805 chenqi0805 commented Oct 18, 2024

Description

This PR backfills the @AlsoRequired annotation onto processor attributes for schema generation purposes:

aggregate:

  • aggregated_events_tag is required when output_unaggregated_events is non-null

flatten:

  • remove_brackets can only be set to true if remove_list_indices is true

key_value:

  • field_split_characters and field_delimiter_regex are mutually exclusive
  • field_delimiter_regex is mutually exclusive with field_split_characters and value_grouping
  • value_split_characters and key_value_delimiter_regex are mutually exclusive
  • string_literal_character requires value_grouping to be enabled

add_entries:

  • key and metadata_key are mutually exclusive
  • format, value and value_expression are mutually exclusive
  • overwrite_if_key_exists and append_if_key_exists are mutually exclusive

convert_entry_type:

  • key and keys are mutually exclusive

copy_values:

  • from_list and to_list are mutually dependent

split_string:

  • delimiter_regex and delimiter are mutually exclusive

split_event:

  • delimiter_regex and delimiter are mutually exclusive
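
For illustration, the way a field-level annotation could map onto the dependentRequired strings in the generated schema can be sketched as follows. This is a minimal sketch, not the project's code: the nested AlsoRequired annotation is a hypothetical stand-in modeled only on the usage visible in this PR, and EntrySketch, dependentRequired, toSnakeCase, and the camelCase-to-snake_case naming are assumptions made to keep the example self-contained.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class AlsoRequiredSketch {

    // Hypothetical stand-in for the project's @AlsoRequired annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface AlsoRequired {
        Required[] values();

        @interface Required {
            String name();
            String[] allowedValues() default {};
        }
    }

    // Hypothetical config class: key and metadata_key are mutually exclusive.
    public static class EntrySketch {
        @AlsoRequired(values = {
                @AlsoRequired.Required(name = "metadata_key", allowedValues = {"null"})
        })
        String key;

        @AlsoRequired(values = {
                @AlsoRequired.Required(name = "key", allowedValues = {"null"})
        })
        String metadataKey;
    }

    // Assumption: property names are the snake_case form of the field names.
    public static String toSnakeCase(String name) {
        return name.replaceAll("([a-z0-9])([A-Z])", "$1_$2").toLowerCase(Locale.ROOT);
    }

    // Collects "name:[allowed, values]" strings per annotated field.
    public static Map<String, List<String>> dependentRequired(Class<?> type) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Field field : type.getDeclaredFields()) {
            AlsoRequired annotation = field.getAnnotation(AlsoRequired.class);
            if (annotation == null) {
                continue; // field carries no cross-field constraint
            }
            List<String> entries = new ArrayList<>();
            for (AlsoRequired.Required required : annotation.values()) {
                entries.add(required.name() + ":" + Arrays.toString(required.allowedValues()));
            }
            result.put(toSnakeCase(field.getName()), entries);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(dependentRequired(EntrySketch.class));
    }
}
```

A mutually exclusive pair is expressed here by annotating each side with the other property and an allowed value of "null", which mirrors the pattern in the schema output of this PR.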

Example schema after this change:

{
  "$schema" : "https://json-schema.org/draft/2020-12/schema",
  "type" : "object",
  "properties" : {
    "entries" : {
      "description" : "A list of entries to add to the event.",
      "minItems" : 1,
      "type" : "array",
      "items" : {
        "type" : "object",
        "properties" : {
          "key" : {
            "type" : "string",
            "description" : "The key of the new entry to be added. Some examples of keys include <code>my_key</code>, <code>myKey</code>, and <code>object/sub_Key</code>. The key can also be a format expression, for example, <code>${/key1}</code> to use the value of field <code>key1</code> as the key."
          },
          "metadata_key" : {
            "type" : "string",
            "description" : "The key for the new metadata attribute. The argument must be a literal string key and not a JSON Pointer. Either one of <code>key</code> or <code>metadata_key</code> is required."
          },
          "value" : {
            "description" : "The value of the new entry to be added, which can be used with any of the following data types: strings, Booleans, numbers, null, nested objects, and arrays."
          },
          "format" : {
            "type" : "string",
            "description" : "A format string to use as the value of the new entry, for example, <code>${key1}-${key2}</code>, where <code>key1</code> and <code>key2</code> are existing keys in the event. Required if neither<code>value</code> nor <code>value_expression</code> is specified."
          },
          "value_expression" : {
            "type" : "string",
            "description" : "An expression string to use as the value of the new entry. For example, <code>/key</code> is an existing key in the event with a type of either a number, a string, or a Boolean. Expressions can also contain functions returning number/string/integer. For example, <code>length(/key)</code> will return the length of the key in the event when the key is a string. For more information about keys, see <a href=\"https://opensearch.org/docs/latest/data-prepper/pipelines/expression-syntax/\">Expression syntax</a>."
          },
          "overwrite_if_key_exists" : {
            "type" : "boolean",
            "description" : "When set to <code>true</code>, the existing value is overwritten if <code>key</code> already exists in the event. The default value is <code>false</code>."
          },
          "append_if_key_exists" : {
            "type" : "boolean",
            "description" : "When set to <code>true</code>, the existing value will be appended if a <code>key</code> already exists in the event. An array will be created if the existing value is not an array. Default is <code>false</code>."
          },
          "add_when" : {
            "type" : "string",
            "description" : "A <a href=\"https://opensearch.org/docs/latest/data-prepper/pipelines/expression-syntax/\">conditional expression</a>, such as <code>/some-key == \"test\"'</code>, that will be evaluated to determine whether the processor will be run on the event."
          }
        },
        "dependentRequired" : {
          "key" : [ "metadata_key:[null]" ],
          "metadata_key" : [ "key:[null]" ],
          "value" : [ "format:[null]", "value_expression:[null]" ],
          "format" : [ "value:[null]", "value_expression:[null]" ],
          "value_expression" : [ "value:[null]", "format:[null]" ],
          "overwrite_if_key_exists" : [ "append_if_key_exists:[false]" ],
          "append_if_key_exists" : [ "overwrite_if_key_exists:[false]" ]
        }
      }
    }
  },
  "required" : [ "entries" ],
  "description" : "The <code>add_entries</code> processor adds entries to an event.",
  "name" : "add_entries",
  "documentation" : "https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/add_entries/"
}
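
One possible reading of the dependentRequired entries above is: when the property on the left is set, each listed property must hold one of the bracketed values. The sketch below checks a config map under that reading. It is an assumption about the semantics, not the schema consumer's actual logic; DependentRequiredCheck, satisfies, and isValid are hypothetical names, and an unset property is treated as "null".

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class DependentRequiredCheck {

    // Returns true when a constraint such as "metadata_key:[null]" holds.
    // Assumption: an unset property is compared as the string "null".
    public static boolean satisfies(String constraint, Map<String, Object> config) {
        int colon = constraint.indexOf(':');
        String name = constraint.substring(0, colon);
        String listed = constraint.substring(colon + 1).trim(); // e.g. "[null]"
        List<String> allowed = Arrays.asList(
                listed.substring(1, listed.length() - 1).split("\\s*,\\s*"));
        Object actual = config.get(name);
        return allowed.contains(String.valueOf(actual));
    }

    // Applies every constraint whose trigger property is set in the config.
    public static boolean isValid(Map<String, List<String>> dependentRequired,
                                  Map<String, Object> config) {
        for (Map.Entry<String, List<String>> entry : dependentRequired.entrySet()) {
            if (config.get(entry.getKey()) == null) {
                continue; // trigger property not set, nothing to check
            }
            for (String constraint : entry.getValue()) {
                if (!satisfies(constraint, config)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, List<String>> rules = Map.of(
                "key", List.of("metadata_key:[null]"),
                "metadata_key", List.of("key:[null]"));
        Map<String, Object> onlyKey = Map.of("key", "my_key");
        Map<String, Object> both = Map.of("key", "my_key", "metadata_key", "my_meta");
        System.out.println(isValid(rules, onlyKey));
        System.out.println(isValid(rules, both));
    }
}
```

This sketch does not apply default values, so a rule like append_if_key_exists:[false] would fail for an unset append_if_key_exists; a fuller check would need to substitute each property's default first.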

Issues Resolved

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: George Chen <qchea@amazon.com>
…cessors

Signed-off-by: George Chen <qchea@amazon.com>
@chenqi0805 chenqi0805 marked this pull request as ready for review October 18, 2024 18:51
    @JsonPropertyDescription("Specifies whether to group values using predefined grouping delimiters. " +
            "If this flag is enabled, then the content between the delimiters is considered to be one entity and " +
            "they are not parsed as key-value pairs. The following characters are used as group delimiters: " +
            "<code>{...}</code>, <code>[...]</code>, <code>&lt;...&gt;</code>, <code>(...)</code>, <code>\"...\"</code>, <code>'...'</code>, <code>http://... (space)</code>, and <code>https:// (space)</code>. " +
            "Default is <code>false</code>. For example, if <code>value_grouping</code> is <code>true</code>, then " +
            "<code>{\"key1=[a=b,c=d]&amp;key2=value2\"}</code> parses to <code>{\"key1\": \"[a=b,c=d]\", \"key2\": \"value2\"}</code>.")
    @AlsoRequired(values = {
            @AlsoRequired.Required(name = FIELD_DELIMITER_REGEX_KEY, allowedValues = {"null"})
    })
    private boolean valueGrouping = false;

@JsonProperty(value = "recursive", defaultValue = "false")
Contributor

The recursive config has restrictions: remove_brackets must be false, skip_duplicate_values is always true, and whitespace is always strict.

Signed-off-by: George Chen <qchea@amazon.com>
Collaborator

@oeyh oeyh left a comment

There's a merge conflict.

Also, there are some other cases where the annotation can be added:
  • list_to_map: key is required if use_source_key is false
  • translate: under targets, map and regex are mutually exclusive

Signed-off-by: George Chen <qchea@amazon.com>
Signed-off-by: George Chen <qchea@amazon.com>
Signed-off-by: George Chen <qchea@amazon.com>
Collaborator Author

chenqi0805 commented Oct 18, 2024

@oeyh Thanks for the catch!

list_to_map: key is required if use_source_key is false

We might need a separate annotation to capture this case.

@chenqi0805 chenqi0805 requested a review from oeyh October 21, 2024 14:45
@chenqi0805 chenqi0805 merged commit 87f0649 into opensearch-project:main Oct 21, 2024
45 of 47 checks passed
@chenqi0805 chenqi0805 deleted the enh/backfill-also-required-annotation-to-processors branch October 21, 2024 15:03
3 participants