[DOCS] Reformat CJK bigram and CJK width token filter docs #48210

Merged: 3 commits merged on Oct 21, 2019
Conversation

@jrodewig (Contributor):

Reformats the CJK bigram and CJK width token filter docs:

  • Adds a title abbreviation
  • Updates the description with a short example and Lucene link
  • Adds an analyze API example with resulting tokens
  • Adds or updates an example adding the token filter to an analyzer
  • Updates the parameter docs and custom token filter example

I hope to re-use this format for other token filter docs. All feedback is welcome!
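For reference, a minimal sketch of the kind of analyze API request described above. The tokenizer choice and sample text here are illustrative and may differ from the snippet that landed in the docs:

    GET /_analyze
    {
      "tokenizer": "standard",
      "filter": [ "cjk_bigram" ],
      "text": "東京都は日本の首都"
    }

The standard tokenizer emits the CJK characters as single-character tokens, and the cjk_bigram filter then joins adjacent CJK tokens into overlapping bigrams such as 東京 and 京都.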

@jrodewig added labels >docs (General docs changes), :Search Relevance/Analysis (How text is split into tokens), v8.0.0, v7.5.0, v7.6.0, v7.4.2 on Oct 17, 2019
@jrodewig requested a review from romseygeek on October 17, 2019 at 19:11
@elasticmachine (Collaborator):

Pinging @elastic/es-search (:Search/Analysis)

@elasticmachine (Collaborator):

Pinging @elastic/es-docs (>docs)

@romseygeek (Contributor) left a comment:

I left one comment, but it's a great improvement overall. LGTM.

Comment on lines 7 to 10
Forms https://en.wikipedia.org/wiki/Bigram[bigrams] out of the CJK (Chinese,
Japanese, and Korean) terms generated by the
<<analysis-standard-tokenizer,standard tokenizer>> or the
{plugins}/analysis-icu-tokenizer.html[ICU tokenizer].
@romseygeek (Contributor):

Strictly speaking, it will form bigrams from the CJK tokens produced by any tokenizer, so I'm not sure we need to refer to standard and icu here?

@jrodewig (Contributor, Author):

Thanks @romseygeek. I removed the standard and ICU reference with cecd9bc.
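
Separately from this thread, the PR description also mentions an example that adds the token filter to a custom analyzer. A rough sketch of that kind of request, assuming illustrative names (cjk_bigram_example for the index, han_bigrams for the analyzer) rather than the ones used in the docs:

    PUT /cjk_bigram_example
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "han_bigrams": {
              "type": "custom",
              "tokenizer": "standard",
              "filter": [ "cjk_width", "cjk_bigram" ]
            }
          }
        }
      }
    }

Here cjk_width normalizes full-width and half-width character variants before cjk_bigram forms the bigrams.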
