feat(search): Only reindex if the mappings for an existing field changed #4629

Merged

dexter-mh-lee
Contributor

Currently, every time the mappings or settings for a given entity change, we reindex the whole search index. As a result, we have been reindexing huge indices every time a new searchable field is added to the model, which is unnecessary.

There are two sources of diffs:
a) a new searchable field causes a mappings diff
b) a new entity causes a change in the urn stop filters (the list of entities is added to the list of stop words removed when indexing urns)

This PR makes sure we do not reindex in either of these two cases. A reindex will only happen when the mapping for an existing field needs to change.
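To illustrate the criterion described above, here is a minimal sketch in Python. It is not the code in this PR: the function names `needs_reindex` and `new_fields` and the flat field-to-mapping dictionaries are hypothetical, and treating a field that disappears from the target mappings as harmless is an assumption. Only the mappings case (a) is covered; the settings case (b) is left out.

```python
from typing import Any, Dict

# Hypothetical shape: field name -> mapping definition,
# e.g. {"name": {"type": "keyword"}}.
Properties = Dict[str, Dict[str, Any]]


def needs_reindex(existing: Properties, target: Properties) -> bool:
    """Return True only when a field already in the index has a different mapping."""
    for field, old_mapping in existing.items():
        # Assumption: a field missing from the target does not force a reindex.
        if field in target and target[field] != old_mapping:
            return True
    return False


def new_fields(existing: Properties, target: Properties) -> Properties:
    """Fields present only in the target; these can be added to the index in place."""
    return {f: m for f, m in target.items() if f not in existing}


existing = {"name": {"type": "keyword"}}
added = {"name": {"type": "keyword"}, "description": {"type": "text"}}
changed = {"name": {"type": "text"}}

print(needs_reindex(existing, added))    # new field only: no reindex
print(needs_reindex(existing, changed))  # existing field changed: reindex
```

With this split, a newly added searchable field only needs an in-place mappings update, and the expensive full reindex is reserved for the case where an existing field's mapping actually differs.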

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

github-actions bot commented Apr 11, 2022

Unit Test Results (build & test)

96 files, 96 suites, 17m 31s ⏱️
689 tests: 630 passed ✔️, 59 skipped 💤, 0 failed

Results for commit 9d8a6cb.

♻️ This comment has been updated with latest results.

Contributor

@gabe-lyons gabe-lyons left a comment

neat!

@dexter-mh-lee dexter-mh-lee merged commit 90bc00c into datahub-project:master Apr 11, 2022
@dexter-mh-lee dexter-mh-lee deleted the dl--reindex-criterion branch April 11, 2022 23:09
maggiehays pushed a commit to maggiehays/datahub that referenced this pull request Aug 1, 2022
…ged (datahub-project#4629)

* Just update mappings as much as possible

* Fix checkstyle