Add shingle token filter docs #8398

AntonEliatra
Contributor

Description

Add shingle token filter docs

Issues Resolved

Closes #8275

Version

all

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developers Certificate of Origin.
    For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>
Thank you for submitting your PR. The PR states are In progress (or Draft) -> Tech review -> Doc review -> Editorial review -> Merged.

Before you submit your PR for doc review, make sure the content is technically accurate. If you need help finding a tech reviewer, tag a maintainer.

When you're ready for doc review, tag the assignee of this PR. The doc reviewer may push edits to the PR directly or leave comments and editorial suggestions for you to address (let us know in a comment if you have a preference). The doc reviewer will arrange for an editorial review.

kolchfa-aws assigned vagimeli and unassigned kolchfa-aws Sep 30, 2024
vagimeli added the 3 - Tech review (PR: Tech review in progress), Needs SME (Waiting on input from subject matter expert), and analyzers labels Sep 30, 2024
@vagimeli
Contributor

@udabhas Will you review this PR for technical accuracy, or have a peer review it? Thank you.

AntonEliatra and others added 2 commits October 16, 2024 18:13
Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
kolchfa-aws assigned kolchfa-aws and unassigned vagimeli Nov 18, 2024
kolchfa-aws added the 5 - Editorial review (PR: Editorial review in progress) and backport 2.18 (PR: Backport label for 2.18) labels and removed the 3 - Tech review (PR: Tech review in progress) and Needs SME (Waiting on input from subject matter expert) labels Nov 18, 2024
Collaborator

natebower left a comment

@kolchfa-aws Several changes. Thanks!

@@ -50,7 +50,7 @@ Normalization | `arabic_normalization`: [ArabicNormalizer](https://lucene.apache
 `predicate_token_filter` | N/A | Removes tokens that don’t match the specified predicate script. Supports inline Painless scripts only.
 `remove_duplicates` | [RemoveDuplicatesTokenFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/miscellaneous/RemoveDuplicatesTokenFilter.html) | Removes duplicate tokens that are in the same position.
 `reverse` | [ReverseStringFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/reverse/ReverseStringFilter.html) | Reverses the string corresponding to each token in the token stream. For example, the token `dog` becomes `god`.
-`shingle` | [ShingleFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/shingle/ShingleFilter.html) | Generates shingles of lengths between `min_shingle_size` and `max_shingle_size` for tokens in the token stream. Shingles are similar to n-grams but apply to words instead of letters. For example, two-word shingles added to the list of unigrams [`contribute`, `to`, `opensearch`] are [`contribute to`, `to opensearch`].
+[`shingle`]({{site.url}}{{site.baseurl}}/analyzers/token-filters/shingle/) | [ShingleFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/shingle/ShingleFilter.html) | Generates shingles of lengths between `min_shingle_size` and `max_shingle_size` for tokens in the token stream. Shingles are similar to n-grams but apply to words instead of letters. For example, two-word shingles added to the list of unigrams [`contribute`, `to`, `opensearch`] are [`contribute to`, `to opensearch`].
Collaborator

"apply to" => "are generated from"?

Collaborator

I can change to "are generated using"
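
For anyone following the thread, the behavior described in the `shingle` table row can be sketched in a few lines of Python. This is only an illustration of word-level shingling, not Lucene's `ShingleFilter` implementation: the function name `shingles` and its `min_size`, `max_size`, and `separator` parameters are hypothetical, chosen to mirror the `min_shingle_size` and `max_shingle_size` settings named in the row, and the sketch returns only the shingles themselves, without the original unigrams.

```python
def shingles(tokens, min_size=2, max_size=2, separator=" "):
    """Return word-level shingles of every length from min_size to max_size.

    Hypothetical helper for illustration only; it is not part of
    OpenSearch or Lucene.
    """
    if min_size < 1 or max_size < min_size:
        raise ValueError("require 1 <= min_size <= max_size")
    result = []
    # For each start position, emit the shorter shingles first.
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            end = start + size
            if end > len(tokens):
                break  # not enough tokens left for this or any longer shingle
            result.append(separator.join(tokens[start:end]))
    return result


unigrams = ["contribute", "to", "opensearch"]
print(shingles(unigrams))  # ['contribute to', 'to opensearch']
```

With `min_size=2` and `max_size=3`, the same input would also yield the three-word shingle `contribute to opensearch`.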

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
kolchfa-aws merged commit e019d96 into opensearch-project:main Dec 2, 2024
5 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Dec 2, 2024
* adding shingle token filter docs

Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>

* updating parameter table

Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>

* Doc review

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

---------

Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Fanit Kolchina <kolchfa@amazon.com>
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
(cherry picked from commit e019d96)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
github-actions bot pushed a commit that referenced this pull request Dec 2, 2024
Labels
5 - Editorial review (PR: Editorial review in progress), analyzers, backport 2.18 (PR: Backport label for 2.18), Content gap
Development

Successfully merging this pull request may close these issues.

Token filters - shingle [DOC]
4 participants