-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store index metadata file for Lucene text indexes #13948
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #13948 +/- ##
============================================
+ Coverage 61.75% 63.83% +2.08%
- Complexity 207 1530 +1323
============================================
Files 2436 2621 +185
Lines 133233 144069 +10836
Branches 20636 22034 +1398
============================================
+ Hits 82274 91971 +9697
- Misses 44911 45318 +407
- Partials 6048 6780 +732
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
thanks @itschrispeck for the contribution and detailed information! |
This diff would be a pre-req to do so, but that functionality would still have to be added in I have not added the functionality in this PR since internally we take advantage of the fact that it is not currently possible. When that functionality is added, we'll add some config to enable/disable it as well. |
CommonsConfigurationUtils.saveToFile(properties, propertiesFile); | ||
} | ||
|
||
public static TextIndexConfig getUpdatedConfigFromPropertiesFile(File file, TextIndexConfig config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add javadoc to public methods.
} | ||
|
||
return (Constructor<QueryParserBase>) queryParserClass.getConstructor(String.class, Analyzer.class); | ||
} | ||
|
||
public static void writeConfigToPropertiesFile(File indexDir, TextIndexConfig config) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add javadoc to public methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please address minor comments. LGTM.
@@ -300,6 +300,12 @@ public static Constructor<QueryParserBase> getQueryParserWithStringAndAnalyzerTy | |||
return (Constructor<QueryParserBase>) queryParserClass.getConstructor(String.class, Analyzer.class); | |||
} | |||
|
|||
/** | |||
* Writes the config to the properties file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we be more specific on the fields which are written to the properties files?
@chenboat , @itschrispeck - We ran into an NPE with this change. When we don't have all fields populated in the TextIndexConfig, we will run into NPE, via code path that calls. The call path is from newly added code getUpdatedConfigFromPropertiesFile . public AbstractBuilder(TextIndexConfig other) { java.lang.RuntimeException: java.lang.NullPointerException: Cannot invoke "java.util.Collection.toArray()" because "c" is null |
Raised a fix #14616 |
Currently the Lucene text index is written using some Analyzer and queried with some QueryParser. Updating these in the table config can cause unpredictable behavior as when the segments are loaded they'll use the new configs but segments were built with the old configs. This can cause incorrect results when querying.
For example,
Analyzer1
andQueryPaserForAnalyzer1
Analyzer2
andQueryParserForAnalyzer2
Analyzer2
andQueryParserForAnalyzer2
QueryParserForAnalyzer2
but index usesAnalyzer1
Storing metadata per segment allows for compatible QueryParser/Analyzer pairs to be maintained. Now,
4
becomes4. segment 1 loaded, Reader uses
QueryParserForAnalyzer1
and index usesAnalyzer1
.This allows for seamless updates of realtime tables without breaking queries against old segments (provided the
QueryParser
andAnalyzer
classes are managed correctly). In the future this metadata can be leveraged by TextIndexHandler to provide better index reload/rebuilding capabilities for Lucene text index.Testing: we've been using this internally for around half a year
tags:
enhancement
,docs