Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bugfix: use PropertiesWriter to escape index_map keys properly #12018

Merged
merged 2 commits into from
Dec 4, 2024

Conversation

itschrispeck
Copy link
Collaborator

Previously segment build failed if a column name contained : or =, since these characters have special meaning in Properties files. For example, an exception produced for a column like headers.:auth is:

j.lang.IllegalStateException: Index separator not found: headers. , segment: /data/pinot-server/dataDir/table_REALTIME/table__127__0__20231020T1604Z/v3 at c.g.common.base.Preconditions.checkState(Preconditions.java:504) at o.a.p.s.s.s.ColumnIndexUtils.parseIndexMapKeys(ColumnIndexUtils.java:43) at o.a.p.s.l.s.s.SingleFileIndexDirectory.loadMap(SingleFileIndexDirectory.java:220) at o.a.p.s.l.s.s.SingleFileIndexDirectory.load(SingleFileIndexDirectory.java:209) at o.a.p.s.l.s.s.SingleFileIndexDirectory.<init>(SingleFileIndexDirectory.java:121) at o.a.p.s.l.s.s.SegmentLocalFSDirectory.loadData(SegmentLocalFSDirectory.java:262) at o.a.p.s.l.s.s.SegmentLocalFSDirectory.load(SegmentLocalFSDirectory.java:247) at o.a.p.s.l.s.s.SegmentLocalFSDirectory.<init>(SegmentLocalFSDirectory.java:98) at o.a.p.s.l.s.s.SegmentLocalFSDirectory.<init>(SegmentLocalFSDirectory.java:81) at o.a.p.s.l.l...

This changes the writing to be done via PropertiesWriter instead of building the string explicitly.

Testing: unit tests + deployed in a cluster and verified segments were sealed properly

@codecov-commenter
Copy link

codecov-commenter commented Nov 17, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 63.97%. Comparing base (59551e4) to head (64068c1).
Report is 1427 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #12018      +/-   ##
============================================
+ Coverage     61.75%   63.97%   +2.22%     
- Complexity      207     1572    +1365     
============================================
  Files          2436     2687     +251     
  Lines        133233   147672   +14439     
  Branches      20636    22631    +1995     
============================================
+ Hits          82274    94479   +12205     
- Misses        44911    46261    +1350     
- Partials       6048     6932     +884     
Flag Coverage Δ
custom-integration1 100.00% <ø> (+99.99%) ⬆️
integration 100.00% <ø> (+99.99%) ⬆️
integration1 100.00% <ø> (+99.99%) ⬆️
integration2 0.00% <ø> (ø)
java-11 63.91% <100.00%> (+2.20%) ⬆️
java-21 63.87% <100.00%> (+2.25%) ⬆️
skip-bytebuffers-false 63.95% <100.00%> (+2.20%) ⬆️
skip-bytebuffers-true 63.83% <100.00%> (+36.10%) ⬆️
temurin 63.97% <100.00%> (+2.22%) ⬆️
unittests 63.97% <100.00%> (+2.22%) ⬆️
unittests1 55.67% <100.00%> (+8.78%) ⬆️
unittests2 34.60% <100.00%> (+6.86%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Jackie-Jiang
Copy link
Contributor

I guess this is not the only place where it can fail. E.g. segment metadata is also a properties configuration.
Should we consider counting them as reserved characters, and add validation on table name and column name to not allow using them?

@itschrispeck
Copy link
Collaborator Author

I guess this is not the only place where it can fail. E.g. segment metadata is also a properties configuration. Should we consider counting them as reserved characters, and add validation on table name and column name to not allow using them?

We would really like the ability to use these characters in column names, since other DBs like ClickHouse do and supporting them makes programmatic column creation simple. This let's us avoid complexity/avoid creating some mapping to substitute out reserved characters.

Re: segment metadata, this already correctly handles these characters. e.g. from metadata.properties we see these characters are escaped correctly:

column.headers.\:authority.cardinality = -2147483648
column.headers.\:authority.totalDocs = 1108200
column.headers.\:authority.dataType = STRING
column.headers.\:authority.bitsPerElement = 31
column.headers.\:authority.lengthOfEachEntry = 0
column.headers.\:authority.columnType = DIMENSION
column.headers.\:authority.isSorted = false
column.headers.\:authority.hasDictionary = false
column.headers.\:authority.isSingleValues = true
column.headers.\:authority.maxNumberOfMultiValues = -1
column.headers.\:authority.totalNumberOfEntries = 1108200
column.headers.\:authority.isAutoGenerated = false
column.headers.\:authority.minValue = 127.0.0.1:5435
column.headers.\:authority.maxValue = null
column.headers.\:authority.defaultNullValue = null

AFAI could tell the issue is only for index_map, which used some custom logic to write the file instead of relying on PropertiesConfiguration functionality. We've been using this patch for a couple weeks and haven't run into any other query/operational/ingestion bugs due to the special column name

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, seems the problem is that we use PropertiesConfiguration to load the file, but not using it to write the file.
Can you change it to directly use PropertiesConfiguration to write the file (i.e. remove PrintWriter)? Similar to how SegmentColumnarIndexCreator.writeMetadata() handles the metadata write

@@ -431,24 +434,26 @@ private static String getKey(String column, String indexName, boolean isStartOff
}

@VisibleForTesting
static void persistIndexMaps(List<IndexEntry> entries, PrintWriter writer) {
static void persistIndexMaps(List<IndexEntry> entries, PrintWriter writer) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(format) Please apply Pinot Style

@Jackie-Jiang Jackie-Jiang merged commit 84ca2fc into apache:master Dec 4, 2024
21 checks passed
davecromberge pushed a commit to davecromberge/pinot that referenced this pull request Dec 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants