Each instance of ColumnFilterPredicate eagerly builds its string representation, including the filter value, and stores it in the toString field. This is not useful.
    static abstract class ColumnFilterPredicate<T extends Comparable<T>> implements FilterPredicate, Serializable {
      private final Column<T> column;
      private final T value;
      private final String toString;

      protected ColumnFilterPredicate(Column<T> column, T value) {
        this.column = Objects.requireNonNull(column, "column cannot be null");

        // Eq and NotEq allow value to be null, Lt, Gt, LtEq, GtEq however do not, so they guard against
        // null in their own constructors.
        this.value = value;

        String name = getClass().getSimpleName().toLowerCase(Locale.ENGLISH);
        this.toString = name + "(" + column.getColumnPath().toDotString() + ", " + value + ")";
      }
If the filter predicate is long or deeply nested, this can consume a lot of memory while the filter is being created.
In production we have seen this grow to as much as 4 GB while opening multiple Parquet readers.
The same pattern is replicated in BinaryLogicalFilterPredicate, where toString is also computed eagerly and stored as a String, so a lot of duplication happens when building And/Or filters.
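One possible direction, sketched here only as an illustration, is to drop the stored field and build the string inside toString() when it is actually requested. The classes below (SketchPredicate, LazyColumnPredicate, LazyBinaryPredicate, Eq, And) are hypothetical stand-ins and not the Parquet classes; a plain String column path replaces Column<T> so the example stays self-contained.

```java
import java.util.Locale;
import java.util.Objects;

// Hypothetical marker interface standing in for FilterPredicate.
interface SketchPredicate {}

// Hypothetical stand-in for ColumnFilterPredicate without a stored toString field.
abstract class LazyColumnPredicate<T extends Comparable<T>> implements SketchPredicate {
  private final String columnPath; // assumption: replaces Column<T> for brevity
  private final T value;

  protected LazyColumnPredicate(String columnPath, T value) {
    this.columnPath = Objects.requireNonNull(columnPath, "columnPath cannot be null");
    this.value = value; // may be null, as for Eq and NotEq in the original
  }

  @Override
  public String toString() {
    // Built on demand, so no String is retained per predicate instance.
    String name = getClass().getSimpleName().toLowerCase(Locale.ENGLISH);
    return name + "(" + columnPath + ", " + value + ")";
  }
}

// Hypothetical stand-in for BinaryLogicalFilterPredicate.
abstract class LazyBinaryPredicate implements SketchPredicate {
  private final SketchPredicate left;
  private final SketchPredicate right;

  protected LazyBinaryPredicate(SketchPredicate left, SketchPredicate right) {
    this.left = Objects.requireNonNull(left, "left cannot be null");
    this.right = Objects.requireNonNull(right, "right cannot be null");
  }

  @Override
  public String toString() {
    // Children are rendered only when asked, so a parent never keeps
    // a stored copy of its children's strings.
    String name = getClass().getSimpleName().toLowerCase(Locale.ENGLISH);
    return name + "(" + left + ", " + right + ")";
  }
}

final class Eq<T extends Comparable<T>> extends LazyColumnPredicate<T> {
  Eq(String columnPath, T value) {
    super(columnPath, value);
  }
}

final class And extends LazyBinaryPredicate {
  And(SketchPredicate left, SketchPredicate right) {
    super(left, right);
  }
}

class LazyToStringDemo {
  public static void main(String[] args) {
    SketchPredicate filter = new And(new Eq<Integer>("a.b", 1), new Eq<String>("c", "x"));
    System.out.println(filter); // prints and(eq(a.b, 1), eq(c, x))
  }
}
```

The trade-off is that every toString() call rebuilds the string, which seems acceptable if it is only used for debugging. With eager storage, each And/Or node would hold a string that already contains the strings of both children, which is presumably where the duplication comes from.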
Gabor Szadovszky / @gszadovszky: [~abhiSumo304], I agree eagerly storing the toString value is not a good idea. I don't think it has proper use case either. toString should be used for debugging purposes anyway so eagerly storing the value does not really make sense. Unfortunately, I don't work on the Parquet code base actively anymore. Feel free to put up a PR to fix this and I'll try to review it in time.
I did not find any use case that requires computing and storing toString this eagerly.
Reporter: Abhishek Jain
Note: This issue was originally created as PARQUET-2220. Please see the migration documentation for further details.