[PARQUET-1968] FilterApi support In predicate #923
@gszadovszky @shangxinli @rdblue Could you please take a look at this PR when you have time? Thanks a lot!

also cc @chenjunjiedada
gszadovszky
left a comment
I have some comments in the code. All of them are meant more to open discussions, so I neither approve nor disapprove for now.
Otherwise the code seems good to me. Thanks a lot for working on it!
...rc/main/java/org/apache/parquet/filter2/recordlevel/IncrementallyUpdatedFilterPredicate.java
Outdated
...-column/src/main/java/org/apache/parquet/internal/column/columnindex/ColumnIndexBuilder.java
}
}

// base class for In and NotIn
Could you write a more descriptive comment here, since this is public?
Fixed. Thanks!
this.values = Objects.requireNonNull(values, "values cannot be null");
checkArgument(!values.isEmpty(), "values in SetColumnFilterPredicate shouldn't be empty!");

String name = getClass().getSimpleName().toLowerCase(Locale.ENGLISH);
I see you cache the result of 'toString', but is it generally reused multiple times? If not, eagerly building the string is wasted work.
Removed.
  iter++;
}
String valueStr = values.size() <= 100 ? str.substring(0, str.length() - 2) : str + "...";
this.toString = name + "(" + column.getColumnPath().toDotString() + ", " + valueStr + ")";
Would it be possible to merge lines 272 and 273 into the code above that builds the string? String operations like these can sometimes consume a lot of memory.
Is it enough to just replace str + "..." with str.append("...").toString()?
str.substring(0, str.length() - 2) is still a StringBuilder operation, so that seems fine?
Maybe we can replace line 273 with a StringBuilder operation too?
Fixed. Thanks!
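The thread above converges on building the whole description with a single StringBuilder and not caching it. Below is a minimal sketch of that idea, assuming a hypothetical `SetPredicateSketch` class that holds the column as a plain dot-separated String (instead of parquet-mr's column type) and uses `IllegalArgumentException` in place of `checkArgument`. It keeps the 100-value cutoff from the excerpt.

```java
import java.util.Iterator;
import java.util.Locale;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for SetColumnFilterPredicate; the column is reduced to its dot path.
class SetPredicateSketch<T> {
  // Mirrors the "values.size() <= 100" cutoff in the excerpt above.
  private static final int MAX_PRINTED_VALUES = 100;

  private final String columnPath;
  private final Set<T> values;

  SetPredicateSketch(String columnPath, Set<T> values) {
    this.columnPath = Objects.requireNonNull(columnPath, "columnPath cannot be null");
    this.values = Objects.requireNonNull(values, "values cannot be null");
    if (values.isEmpty()) {
      throw new IllegalArgumentException("values shouldn't be empty!");
    }
  }

  @Override
  public String toString() {
    // Built on demand rather than cached in a field.
    String name = getClass().getSimpleName().toLowerCase(Locale.ENGLISH);
    StringBuilder str = new StringBuilder(name).append('(').append(columnPath).append(", ");
    Iterator<T> it = values.iterator();
    int printed = 0;
    while (it.hasNext() && printed < MAX_PRINTED_VALUES) {
      if (printed > 0) {
        str.append(", "); // separator goes before each value, so no trailing ", " to trim
      }
      str.append(it.next());
      printed++;
    }
    if (it.hasNext()) {
      str.append(", ..."); // more than MAX_PRINTED_VALUES values: mark the truncation
    }
    return str.append(')').toString();
  }
}
```

Writing the separator before each value (except the first) avoids the `substring` trim entirely. With 101 values the result ends in `99, 100, ...)`; with 100 or fewer, every value is printed.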
@Override
public boolean equals(Object o) {
  if (this == o) return true;
  if (o == null || getClass() != o.getClass()) return false;
I guess you can just 'return this.getClass() == o.getClass()'
Yes, but I'm just trying to follow the style at https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/filter2/predicate/Operators.java#L150
  if (this == o) return true;
  if (o == null || getClass() != o.getClass()) return false;
  SetColumnFilterPredicate<?> that = (SetColumnFilterPredicate<?>) o;
  return column.equals(that.column) && values.equals(that.values) && Objects.equals(toString, that.toString);
Is the toString comparison still needed here? toString seems to be derived from the values and the class, so you can just compare the class here.
Removed the toString comparison.
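With toString out of the comparison, equals only needs the class, the column, and the values. Below is a minimal sketch, assuming hypothetical `SetPredicateEquality`, `InSketch`, and `NotInSketch` classes with the column reduced to a String path. It also adds a hashCode that agrees with equals, which the excerpt above does not show.

```java
import java.util.Objects;
import java.util.Set;

// Hypothetical base class for In/NotIn-style predicates; the column is a plain String path.
abstract class SetPredicateEquality<T> {
  private final String columnPath;
  private final Set<T> values;

  SetPredicateEquality(String columnPath, Set<T> values) {
    this.columnPath = Objects.requireNonNull(columnPath, "columnPath cannot be null");
    this.values = Objects.requireNonNull(values, "values cannot be null");
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    // Comparing classes keeps an In and a NotIn over the same column and values unequal.
    if (o == null || getClass() != o.getClass()) return false;
    SetPredicateEquality<?> that = (SetPredicateEquality<?>) o;
    // No toString comparison: the string is derived from these same fields.
    return columnPath.equals(that.columnPath) && values.equals(that.values);
  }

  @Override
  public int hashCode() {
    // Hash exactly what equals compares, so equal objects get equal hash codes.
    return Objects.hash(getClass(), columnPath, values);
  }
}

final class InSketch<T> extends SetPredicateEquality<T> {
  InSketch(String columnPath, Set<T> values) { super(columnPath, values); }
}

final class NotInSketch<T> extends SetPredicateEquality<T> {
  NotInSketch(String columnPath, Set<T> values) { super(columnPath, values); }
}
```

Because `Set.equals` compares contents, two predicates built from different set implementations with the same elements are still equal.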
…ions by throwing Exception
@gszadovszky @shangxinli @dbtsai Thank you all very much for reviewing! I have changed the code to generate the visit methods for in/notIn and also added default implementations that throw an Exception. I will address the rest of the comments tomorrow or the day after.
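The "default implementation that throws" approach can be illustrated with a toy visitor. This is a minimal sketch with hypothetical names (`MiniVisitor`, `EqPredicate`, `InPredicate`, `LegacyVisitor`), not parquet-mr's actual visitor interfaces: a visitor written before the new node type existed still compiles, and only fails with an `UnsupportedOperationException` if it is actually handed an In predicate.

```java
import java.util.Set;

// Hypothetical miniature visitor; names are illustrative, not parquet-mr's API.
interface MiniVisitor<R> {
  R visit(EqPredicate eq);

  // Added later: a default method keeps older implementations source-compatible
  // and fails loudly only when an In predicate actually reaches them.
  default R visit(InPredicate in) {
    throw new UnsupportedOperationException("visit(InPredicate) is not supported");
  }
}

interface MiniPredicate {
  <R> R accept(MiniVisitor<R> visitor);
}

final class EqPredicate implements MiniPredicate {
  final int value;

  EqPredicate(int value) { this.value = value; }

  @Override
  public <R> R accept(MiniVisitor<R> visitor) { return visitor.visit(this); }
}

final class InPredicate implements MiniPredicate {
  final Set<Integer> values;

  InPredicate(Set<Integer> values) { this.values = values; }

  @Override
  public <R> R accept(MiniVisitor<R> visitor) { return visitor.visit(this); }
}

// Written before InPredicate existed: implements only visit(EqPredicate).
final class LegacyVisitor implements MiniVisitor<String> {
  @Override
  public String visit(EqPredicate eq) { return "eq(" + eq.value + ")"; }
}
```

The trade-off is that the compiler no longer forces every visitor to handle the new node; an unhandled In predicate surfaces at run time instead of at compile time.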
gszadovszky
left a comment
I have some more concrete comments in the code. Some more work is needed, but I think it is going in the right direction. Thanks a lot for your efforts.
parquet-column/src/main/java/org/apache/parquet/filter2/predicate/Operators.java
Outdated
parquet-hadoop/src/main/java/org/apache/parquet/filter2/bloomfilterlevel/BloomFilterImpl.java
Outdated
...-column/src/main/java/org/apache/parquet/internal/column/columnindex/ColumnIndexBuilder.java
...-column/src/main/java/org/apache/parquet/internal/column/columnindex/ColumnIndexBuilder.java
Outdated
...umn/src/test/java/org/apache/parquet/internal/filter2/columnindex/TestColumnIndexFilter.java
Outdated
...r/src/main/java/org/apache/parquet/filter2/IncrementallyUpdatedFilterPredicateGenerator.java
Outdated
...r/src/main/java/org/apache/parquet/filter2/IncrementallyUpdatedFilterPredicateGenerator.java
gszadovszky
left a comment
Added a couple more comments/requests.
Sorry if I am a bit strict here, but filtering is not an easy topic, and mistakes can cause serious issues (loss of data).
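To make the data-loss risk concrete: a statistics-level filter may only skip a row group when it can prove that no row matches. For an In predicate, that means no value in the set can fall inside the row group's [min, max] range. Below is a minimal sketch of that rule, assuming a hypothetical `InStatsCheck.canDrop` helper (not parquet-mr's `StatisticsFilter`) and natural `Comparable` ordering, which may not match the sort order Parquet uses for every column type. Nulls are treated conservatively.

```java
import java.util.Set;

// Hypothetical helper, not parquet-mr's StatisticsFilter: decides whether a row group
// whose column statistics are [min, max] can be skipped for "column IN values".
final class InStatsCheck {
  private InStatsCheck() {}

  /**
   * Returns true only when skipping is provably safe, i.e. no value in the set can lie
   * within [min, max]. Any uncertainty returns false, because a wrong "true" silently
   * drops matching rows.
   */
  static <T extends Comparable<T>> boolean canDrop(Set<T> values, T min, T max) {
    if (min == null || max == null) {
      return false; // missing statistics: nothing is known, so keep the row group
    }
    for (T value : values) {
      if (value == null) {
        return false; // matching nulls depends on null counts, which this sketch ignores
      }
      if (value.compareTo(min) >= 0 && value.compareTo(max) <= 0) {
        return false; // this value may be present in the row group
      }
    }
    return true; // every candidate value lies strictly outside [min, max]
  }
}
```

Every uncertain branch returns false ("keep the row group"): a wrong keep only costs extra I/O, while a wrong drop loses rows.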
...-column/src/main/java/org/apache/parquet/internal/column/columnindex/ColumnIndexBuilder.java
Outdated
parquet-hadoop/src/main/java/org/apache/parquet/filter2/statisticslevel/StatisticsFilter.java
Outdated
parquet-hadoop/src/main/java/org/apache/parquet/filter2/statisticslevel/StatisticsFilter.java
Outdated
...-column/src/main/java/org/apache/parquet/internal/column/columnindex/ColumnIndexBuilder.java
Outdated
...-column/src/main/java/org/apache/parquet/internal/column/columnindex/ColumnIndexBuilder.java
Outdated
parquet-hadoop/src/main/java/org/apache/parquet/filter2/statisticslevel/StatisticsFilter.java
Outdated
...-column/src/main/java/org/apache/parquet/internal/column/columnindex/ColumnIndexBuilder.java
Outdated
...r/src/main/java/org/apache/parquet/filter2/IncrementallyUpdatedFilterPredicateGenerator.java
parquet-hadoop/src/test/java/org/apache/parquet/filter2/recordlevel/TestRecordLevelFilters.java
parquet-hadoop/src/test/java/org/apache/parquet/filter2/recordlevel/TestRecordLevelFilters.java
gszadovszky
left a comment
I have a couple of comments on the new MinMax class, but otherwise everything seems great!
parquet-column/src/main/java/org/apache/parquet/column/MinMax.java
Outdated
parquet-column/src/main/java/org/apache/parquet/column/MinMax.java
Outdated
parquet-column/src/main/java/org/apache/parquet/column/MinMax.java
Outdated
parquet-column/src/main/java/org/apache/parquet/column/MinMax.java
Outdated
T element = iterator.next();
if (max == null) {
  max = element;
} else if (max != null && element != null) {
You are already in the else branch, so you do not need to check for max != null.
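Skipping nulls up front removes the redundant `max != null` check and lets min and max be tracked in one pass. Below is a minimal sketch, assuming a hypothetical `MinMaxSketch` class that takes a `Comparator` and an `Iterable`. It is not the actual `MinMax` class from this PR.

```java
import java.util.Comparator;

// Hypothetical illustration; not the MinMax class added in this PR.
final class MinMaxSketch<T> {
  private T min; // stays null if the input has no non-null elements
  private T max;

  MinMaxSketch(Comparator<? super T> comparator, Iterable<T> iterable) {
    for (T element : iterable) {
      if (element == null) {
        continue; // nulls carry no ordering information, so ignore them
      }
      // After the first non-null element, min and max are both non-null,
      // so each branch needs only one null check.
      if (min == null || comparator.compare(element, min) < 0) {
        min = element;
      }
      if (max == null || comparator.compare(element, max) > 0) {
        max = element;
      }
    }
  }

  T getMin() { return min; }

  T getMax() { return max; }
}
```

Passing the comparator in, rather than relying on `Comparable`, leaves room for column-specific orderings.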
gszadovszky
left a comment
I only have some minor issues. Thanks a lot for your efforts in implementing this! I think it is a great improvement for query engines.
parquet-column/src/main/java/org/apache/parquet/column/MinMax.java
Outdated
parquet-column/src/main/java/org/apache/parquet/column/MinMax.java
Outdated
parquet-column/src/main/java/org/apache/parquet/column/MinMax.java
Outdated
@gszadovszky @shangxinli @viirya @dbtsai Thank you so much for all your help!!

Thank you for your contribution, @huaxingao! Great work!
Make sure you have checked all steps below.
Jira
Tests
Commits
Documentation