Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade Bleve to the latest #7040

Merged
merged 1 commit into from
Dec 3, 2020
Merged

Upgrade Bleve to the latest #7040

merged 1 commit into from
Dec 3, 2020

Conversation

chewxy
Copy link
Contributor

@chewxy chewxy commented Dec 1, 2020

Upgraded Bleve to the latest version.

https://discuss.dgraph.io/t/can-upgrade-bleve-version-to-the-latest/11673/4


This change is Reviewable

@netlify
Copy link

netlify bot commented Dec 1, 2020

Deploy preview for dgraph-docs ready!

Built with commit bdea6d6

https://deploy-preview-7040--dgraph-docs.netlify.app

@chewxy chewxy removed the request for review from manishrjain December 1, 2020 22:16
Copy link
Contributor

@danielmai danielmai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What changes are we bringing in here by upgrading bleve?

Reviewable status: 0 of 2 files reviewed, all discussions resolved (waiting on @ajeetdsouza, @danielmai, and @vvbalaji-dgraph)

@chewxy
Copy link
Contributor Author

chewxy commented Dec 1, 2020

some tokenizers were updated (language tokenizers)'

Also, Bleve's dependencies have shrunk too. So we are depending on fewer odd case things

Copy link
Contributor

@danielmai danielmai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to document any user-facing changes?

Reviewable status: 0 of 2 files reviewed, all discussions resolved (waiting on @ajeetdsouza, @danielmai, and @vvbalaji-dgraph)

@chewxy
Copy link
Contributor Author

chewxy commented Dec 1, 2020

nope. Chinese works better now I think

@chewxy
Copy link
Contributor Author

chewxy commented Dec 1, 2020

but fundamentally the functions are the same.

Copy link
Contributor

@danielmai danielmai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like updating bleve generates different tokens, then? I'm ok bringing this into v20.11. Is it right that this changes the index tokens that require a re-index to create the proper tokens for language full-text indices? So, it should not be cherry-picked as patches to the current releases.

:lgtm:

Reviewed 2 of 2 files at r1.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @ajeetdsouza and @vvbalaji-dgraph)

@chewxy
Copy link
Contributor Author

chewxy commented Dec 2, 2020

tbqh, I've checked out some chinese tokens... it looks exactly the same to me. But the algorithms look to be less probabilistic than before. (the issue with Chinese tokenization of course, is that Chinese doesn't use spaces to delimit words, so to guess "words" as tokens you need to guess, hence probabilistic).

@chewxy
Copy link
Contributor Author

chewxy commented Dec 2, 2020

I think we should bring this to 20.11. The current tagged Bleve is 2 years old, and in 2 years there are A LOT of advancements to tokenization tech.

@chewxy chewxy merged commit 9013d22 into master Dec 3, 2020
@joshua-goldstein joshua-goldstein deleted the chewxy/upgradebleve branch August 11, 2022 21:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants