[DOCS] Reformat CJK bigram and CJK width token filter docs #48210
Pinging @elastic/es-search (:Search/Analysis)
Pinging @elastic/es-docs (>docs)
I left one comment, but it's a great improvement overall. LGTM.
Forms https://en.wikipedia.org/wiki/Bigram[bigrams] out of the CJK (Chinese,
Japanese, and Korean) terms generated by the
<<analysis-standard-tokenizer,standard tokenizer>> or the
{plugins}/analysis-icu-tokenizer.html[ICU tokenizer].
Strictly speaking, it will form bigrams from the CJK tokens produced by any tokenizer, so I'm not sure we need to refer to the standard and ICU tokenizers here?
Thanks @romseygeek. I removed the standard and ICU reference with cecd9bc.
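For readers unfamiliar with the behavior discussed in this thread, here is a minimal sketch of bigram formation over a run of adjacent single-character CJK tokens. It is a simplified illustration, not the filter's actual implementation: the function name `cjk_bigrams` is hypothetical, the input is assumed to be already tokenized by some tokenizer, and script detection, position gaps, and non-CJK tokens are ignored.

```python
def cjk_bigrams(tokens):
    """Join each pair of adjacent tokens into an overlapping bigram.

    Assumes `tokens` is one unbroken run of single-character CJK tokens,
    regardless of which tokenizer produced them (hypothetical helper).
    """
    # A lone character has no neighbor to pair with, so emit it unchanged.
    if len(tokens) == 1:
        return list(tokens)
    # Pair every token with its right-hand neighbor.
    return [left + right for left, right in zip(tokens, tokens[1:])]


print(cjk_bigrams(["東", "京", "都", "は"]))  # ['東京', '京都', '都は']
```

The sketch takes a token list rather than raw text to mirror the point made above: the bigrams depend only on the incoming CJK tokens, not on a specific tokenizer.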
Reformats the CJK bigram and CJK width token filter docs:
I hope to re-use this format for other token filter docs. All feedback is welcome!