Add patterns and grok command #813

Merged
merged 28 commits into from
Sep 30, 2022

joshuali925 (Member) commented Sep 9, 2022

Description

See #814 and #815.

  • Add the java-grok source code to sql.common
    • java-grok is an independent library used by the PPL grok command, but in its latest release Grok is not serializable (so a SQL grok expression cannot be pushed down). After discussing with peng, I'm tentatively moving the source code into the sql.common module.
  • Add a grok PPL command, similar to how parse currently works with regex
  • Add a patterns command, which removes unwanted characters from a log line to create a new field holding the pattern of that log line
  • patterns currently supports a default pattern (which removes all alphanumeric characters), or the user can define which characters should be removed
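The patterns behaviour described above amounts to a regex replace-all over each log line. The following is a minimal Java sketch, assuming the default pattern is the character class `[a-zA-Z0-9]` and that a user-supplied pattern is a Java regex; `PatternsSketch` and `extractPattern` are hypothetical names, not the plugin's classes.

```java
import java.util.regex.Pattern;

public class PatternsSketch {
  // Assumed default: strip every ASCII letter and digit from the line.
  private static final Pattern DEFAULT = Pattern.compile("[a-zA-Z0-9]");

  /**
   * Hypothetical helper: removes every match of the given regex (or of the
   * assumed default when userRegex is null) and returns what is left.
   */
  static String extractPattern(String line, String userRegex) {
    Pattern p = (userRegex == null) ? DEFAULT : Pattern.compile(userRegex);
    return p.matcher(line).replaceAll("");
  }

  public static void main(String[] args) {
    String log = "GET /index.html 200";
    System.out.println("[" + extractPattern(log, null) + "]");    // prints [ /. ]
    System.out.println("[" + extractPattern(log, "[0-9]") + "]"); // prints [GET /index.html ]
  }
}
```

Lines that share the same punctuation and spacing but differ only in their alphanumeric content collapse to the same value, so the new field can be used to group similar log lines.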

Issues Resolved

closes #814
closes #815

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Joshua Li <joshuali925@gmail.com>
joshuali925 requested a review from a team as a code owner September 9, 2022 20:36
codecov-commenter commented Sep 9, 2022

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.97%. Comparing base (8245943) to head (9ec7825).
Report is 549 commits behind head on 2.x.

Additional details and impacted files
@@             Coverage Diff              @@
##                2.x     #813      +/-   ##
============================================
+ Coverage     94.87%   94.97%   +0.09%     
- Complexity     2956     3004      +48     
============================================
  Files           291      294       +3     
  Lines          7869     8017     +148     
  Branches        572      586      +14     
============================================
+ Hits           7466     7614     +148     
  Misses          349      349              
  Partials         54       54              
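Codecov reports coverage as hits divided by tracked lines, where tracked lines = hits + misses + partials (7466 + 349 + 54 = 7869 on the base, 7614 + 349 + 54 = 8017 on the head). The Java sketch below reproduces both percentages from the table above; cutting to two decimals with truncation rather than rounding is an assumption, chosen because 7466/7869 is about 94.879% yet the report shows 94.87%.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class CoverageMath {
  // coverage % = hits * 100 / (hits + misses + partials), cut to two decimals.
  // RoundingMode.DOWN (truncation) is an assumption that matches the report.
  static BigDecimal coverage(long hits, long misses, long partials) {
    long lines = hits + misses + partials;
    return BigDecimal.valueOf(hits * 100L)
        .divide(BigDecimal.valueOf(lines), 2, RoundingMode.DOWN);
  }

  public static void main(String[] args) {
    System.out.println(coverage(7466, 349, 54)); // prints 94.87 (base)
    System.out.println(coverage(7614, 349, 54)); // prints 94.97 (head)
  }
}
```

The unrounded difference is about 94.973 - 94.879 = 0.095 percentage points, which is consistent with the +0.09% shown in the diff.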
Flag              Coverage Δ
query-workbench   62.76% <ø> (ø)
sql-engine        97.85% <100.00%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown.


penghuo (Collaborator) commented Sep 14, 2022

@joshuali925 As commented in #814, I think we could add grok as a method of the parse command for now.

if (match != null) {
  return new ExprStringValue(match.toString());
}
log.warn("failed to extract pattern {} from input {}", grok.getOriginalGrokPattern(),
Collaborator

We should sanitize rawString when logging.

Collaborator

Should this be warn or debug?

Member Author

Changed to debug. rawString is unstructured text, so I'm not sure how to sanitize it; let me know if I should remove it from the log.

Collaborator

We should replace rawString with ***.

Member Author

Hard-coded it to ***.
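The fallback agreed in this thread can be sketched as follows: when the pattern does not match, log only the grok pattern and a fixed *** in place of the raw input. This is a minimal sketch, not the plugin's code. Roughly speaking, java-grok compiles a grok expression into a regular expression with named groups, so a hand-written `java.util.regex` pattern stands in for a compiled `%{IPV4:ip}`; `java.util.logging` at `Level.FINE` stands in for the debug-level logger; `MaskedGrokParse` is a hypothetical name, and returning an empty string on a miss is an assumption.

```java
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MaskedGrokParse {
  private static final Logger LOG = Logger.getLogger(MaskedGrokParse.class.getName());

  // Hypothetical stand-in for a compiled %{IPV4:ip}: a regex with a named group.
  private static final String ORIGINAL_PATTERN = "%{IPV4:ip}";
  private static final Pattern COMPILED =
      Pattern.compile("(?<ip>\\d{1,3}(?:\\.\\d{1,3}){3})");

  /** Returns the named capture, or "" (assumed) when the input does not match. */
  static String parse(String rawString, String group) {
    Matcher m = COMPILED.matcher(rawString);
    if (m.find()) {
      return m.group(group);
    }
    // Debug-level log that names the pattern but masks the raw input as ***.
    LOG.log(Level.FINE, "failed to extract pattern {0} from input ***", ORIGINAL_PATTERN);
    return "";
  }

  public static void main(String[] args) {
    System.out.println(parse("client 10.0.0.1 connected", "ip")); // prints 10.0.0.1
    System.out.println(parse("no address here", "ip").isEmpty());  // prints true
  }
}
```

Masking the whole input is the bluntest option, but it avoids having to decide which parts of unstructured text are safe to log.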

joshuali925 changed the base branch from main to 2.x September 19, 2022 17:35
penghuo (Collaborator) left a comment

Are the files in common/test/resources/nasa required? They seem to be a split file.

joshuali925 (Member Author) replied

Yes, it's a split file, used here:

public void test002_nasa_httpd_access() throws GrokException, IOException {
  Grok grok = compiler.compile("%{COMMONAPACHELOG}");
  System.out.println("Starting test with nasa log -- may take a while");
  File dir = new File(LOG_DIR_NASA);
  // Each file under LOG_DIR_NASA is one chunk of the split NASA access log.
  for (File child : dir.listFiles()) {
    // try-with-resources closes the reader even if an assertion fails mid-file.
    try (BufferedReader br = new BufferedReader(new FileReader(child))) {
      String line;
      while ((line = br.readLine()) != null) {
        Match gm = grok.match(line);
        final Map<String, Object> capture = gm.capture();
        // Every line must be captured without an "Error" key.
        Assertions.assertThat(capture).doesNotContainKey("Error");
      }
    }
  }
}
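`%{COMMONAPACHELOG}` expands to a regex over the Common Log Format. The sketch below is a hand-simplified stand-in for that expansion, written with plain `java.util.regex` named groups so it runs without java-grok. The field names follow the usual grok capture names for this pattern, but the regex is an approximation that accepts fewer variants than the real definition; `CommonLogSketch` is a hypothetical name and the sample line is invented.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommonLogSketch {
  // Simplified stand-in for %{COMMONAPACHELOG}; the real grok definition also
  // handles hostnames, raw requests without a verb, and other variants.
  private static final Pattern COMMON_LOG = Pattern.compile(
      "^(?<clientip>\\S+) (?<ident>\\S+) (?<auth>\\S+) \\[(?<timestamp>[^\\]]+)\\] "
          + "\"(?<verb>\\w+) (?<request>\\S+)(?: HTTP/(?<httpversion>[\\d.]+))?\" "
          + "(?<response>\\d{3}) (?<bytes>\\d+|-)$");

  private static final List<String> FIELDS = List.of(
      "clientip", "ident", "auth", "timestamp", "verb", "request",
      "httpversion", "response", "bytes");

  /** Returns the named captures in order, or an empty map when the line does not match. */
  static Map<String, String> capture(String line) {
    Matcher m = COMMON_LOG.matcher(line);
    Map<String, String> out = new LinkedHashMap<>();
    if (!m.matches()) {
      return out;
    }
    for (String field : FIELDS) {
      out.put(field, m.group(field)); // httpversion may be null if absent
    }
    return out;
  }

  public static void main(String[] args) {
    // Invented line in Common Log Format.
    String line = "127.0.0.1 - - [01/Jul/1995:00:00:01 -0400] \"GET /index.html HTTP/1.0\" 200 6245";
    System.out.println(capture(line));
  }
}
```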

docs/user/ppl/cmd/grok.rst (outdated, resolved)
docs/user/ppl/cmd/patterns.rst (outdated, resolved)
dai-chen (Collaborator) left a comment

Thanks for the changes!


patternsParameter
: (NEW_FIELD EQUAL new_field=stringLiteral)
| (PATTERN EQUAL pattern=stringLiteral)
Member

What if the pattern itself contains single or double quotes?

Member Author

The user should escape quotes inside the quoted string.
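One way to read "escape quotes inside the quoted string" is to let a backslash before the enclosing quote character stand for a literal quote, while leaving every other backslash alone so regex escapes such as `\d` reach the pattern intact. The helper below is hypothetical (`UnquoteSketch` and `unquote` are invented names) and only illustrates that convention; it is not the plugin's actual string-literal handling.

```java
public class UnquoteSketch {
  /**
   * Hypothetical helper: strips the enclosing quotes and turns a backslash
   * followed by the enclosing quote character into a bare quote. All other
   * backslashes are kept so regex escapes survive.
   */
  static String unquote(String literal) {
    if (literal.length() < 2) {
      throw new IllegalArgumentException("not a quoted literal: " + literal);
    }
    char q = literal.charAt(0);
    if ((q != '\'' && q != '"') || literal.charAt(literal.length() - 1) != q) {
      throw new IllegalArgumentException("not a quoted literal: " + literal);
    }
    String body = literal.substring(1, literal.length() - 1);
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < body.length(); i++) {
      char c = body.charAt(i);
      if (c == '\\' && i + 1 < body.length() && body.charAt(i + 1) == q) {
        out.append(q); // escaped enclosing quote becomes a literal quote
        i++;
      } else {
        out.append(c);
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // The literal '[\'"\d]' yields a pattern matching a single quote, double quote or digit.
    System.out.println(unquote("'[\\'\"\\d]'")); // prints ['"\d]
  }
}
```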

joshuali925 merged commit 7b1574e into opensearch-project:2.x Sep 30, 2022
MitchellGale pushed a commit to Bit-Quill/opensearch-project-sql that referenced this pull request Oct 3, 2022
GabeFernandez310 pushed a commit to Bit-Quill/opensearch-project-sql that referenced this pull request Oct 19, 2022
dai-chen added the enhancement (New feature or request) and PPL (Piped processing language) labels Oct 31, 2022
Labels: enhancement (New feature or request), PPL (Piped processing language)
Development

Successfully merging this pull request may close these issues:

[FEATURE] Support extracting log patterns using PPL
[FEATURE] Support grok command in PPL
6 participants