Refactor/rename metrics related classes for NaN support #1829
      @Override
      public int getInt(int ordinal) {
    -   return struct.get(ordinal, Integer.class);
    +   Object integer = struct.get(ordinal, Object.class);
The reason for changing this class is discussed in this thread.
giovannifumarola left a comment
Thanks, Yan, for the patch.
    }

    public static MetricsModes.MetricsMode getMetricsMode(Schema inputSchema, MetricsConfig metricsConfig, int fieldId) {
      String columnName = inputSchema.findColumnName(fieldId);
inputSchema can be null at this point.
I'll add a validation check to ensure they are not null.
I agree. Let's add checks to fail when metricsConfig or inputSchema is null. This is a useful method so I think it is worth keeping it public, but we would ideally not fail with a NullPointerException. This is not called in a tight loop, so it should be fine to add the checks on each call.
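A minimal sketch of what these checks could look like, assuming the goal is an IllegalArgumentException with a readable message instead of a NullPointerException. SimpleSchema and SimpleMetricsConfig are hypothetical stand-ins for Iceberg's Schema and MetricsConfig (only findColumnName and columnMode are modeled), the metrics mode is simplified to a String, and checkArgument is a local helper written in the style of Guava's Preconditions.checkArgument rather than a call into any library.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Iceberg's Schema: resolves field IDs to column names.
final class SimpleSchema {
  private final Map<Integer, String> namesById;

  SimpleSchema(Map<Integer, String> namesById) {
    this.namesById = new HashMap<>(namesById);
  }

  // Returns null for unknown field IDs.
  String findColumnName(int fieldId) {
    return namesById.get(fieldId);
  }
}

// Hypothetical stand-in for Iceberg's MetricsConfig: per-column modes with a default.
final class SimpleMetricsConfig {
  private final Map<String, String> modesByColumn;
  private final String defaultMode;

  SimpleMetricsConfig(Map<String, String> modesByColumn, String defaultMode) {
    // Copy into a HashMap so a lookup with a null column name returns the default.
    this.modesByColumn = new HashMap<>(modesByColumn);
    this.defaultMode = defaultMode;
  }

  String columnMode(String columnName) {
    return modesByColumn.getOrDefault(columnName, defaultMode);
  }
}

final class MetricsUtilSketch {
  private MetricsUtilSketch() {
  }

  // Local helper in the style of Guava's Preconditions.checkArgument.
  private static void checkArgument(boolean condition, String message) {
    if (!condition) {
      throw new IllegalArgumentException(message);
    }
  }

  // Validates both arguments on every call, then resolves the column's metrics mode.
  static String getMetricsMode(SimpleSchema inputSchema, SimpleMetricsConfig metricsConfig, int fieldId) {
    checkArgument(inputSchema != null, "Invalid schema: null");
    checkArgument(metricsConfig != null, "Invalid metrics config: null");
    String columnName = inputSchema.findColumnName(fieldId);
    return metricsConfig.columnMode(columnName);
  }
}
```

Because the checks run before any dereference, a null argument surfaces as a descriptive IllegalArgumentException, and the per-call cost is two null comparisons, which is consistent with this method not being on a hot path.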
Yes, sorry, I overwrote my own earlier change that addressed this when I renamed the file.
    public static MetricsModes.MetricsMode getMetricsMode(Schema inputSchema, MetricsConfig metricsConfig, int fieldId) {
      String columnName = inputSchema.findColumnName(fieldId);
      return metricsConfig.columnMode(columnName);
Same for metricsConfig.
    private MetricsUtil() {
    }

    public static Map<Integer, Long> getNanValueCounts(
Nit: add Javadoc.
       * exceptions when they are accessed.
       */
    - public class ParquetFieldMetrics extends FieldMetrics {
    + public class NaNFieldMetrics extends FieldMetrics {
Since we expect to add lower and upper bounds to this class (Parquet and ORC do not ignore NaN values), I think it should probably be called FloatFieldMetrics. That way, we won't need to rename it again when it adds support for lower and upper bounds.
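To illustrate why a float-oriented name fits once bounds are added, here is a hypothetical sketch of per-field float metrics that count NaN values separately and keep them out of the lower and upper bounds. The class and method names are illustrative only and do not correspond to Iceberg's FieldMetrics API; the sketch assumes bounds should exclude NaN, which matters because Math.min and Math.max return NaN whenever either argument is NaN.

```java
// Hypothetical sketch; names do not correspond to Iceberg's FieldMetrics API.
final class FloatFieldMetricsSketch {
  private long valueCount = 0L;
  private long nanValueCount = 0L;
  // Both bounds stay null until the first non-NaN value is seen.
  private Float lowerBound = null;
  private Float upperBound = null;

  void add(float value) {
    valueCount += 1;
    if (Float.isNaN(value)) {
      // NaN only increments its own counter; passing it to Math.min/Math.max
      // would turn the bound into NaN.
      nanValueCount += 1;
      return;
    }
    lowerBound = (lowerBound == null) ? value : Math.min(lowerBound, value);
    upperBound = (upperBound == null) ? value : Math.max(upperBound, value);
  }

  long valueCount() {
    return valueCount;
  }

  long nanValueCount() {
    return nanValueCount;
  }

  Float lowerBound() {
    return lowerBound;
  }

  Float upperBound() {
    return upperBound;
  }
}
```

With this shape, a column containing only NaN values reports a NaN count but no bounds, so readers can still tell "no finite values" apart from a real range.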
Sounds good. I'll keep the Javadoc as is, since it currently reflects reality, and will update the comments to mention tracking float fields once the class starts tracking upper and lower bounds.
      public ByteBuffer upperBound() {
        throw new IllegalStateException(
    -       "Shouldn't access upperBound() within ParquetFieldMetrics, as this metric is tracked by Parquet footer. ");
    +       "Shouldn't access upperBound() within NaNOnlyFieldMetrics, as this metric is tracked in file statistics. ");
This demonstrates the drawback of adding too much context to an exception message: the class and method are already in the stack trace, and renames require updating the message. Let's keep the helpful error message but remove the class and method names.
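A small sketch of the suggested message style, using a hypothetical class FileStatsBackedMetrics: the message only explains why the call is invalid, while the class and method names are recovered from the exception's stack trace via Throwable.getStackTrace(), so a rename leaves the message correct. StackTraceOrigin is likewise a hypothetical helper for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical class standing in for a metrics holder whose bounds live in file statistics.
final class FileStatsBackedMetrics {
  ByteBuffer upperBound() {
    // The message says why the call is invalid; class and method names are
    // left to the stack trace, so renaming either needs no message change.
    throw new IllegalStateException(
        "Shouldn't access this method, as this metric is tracked in file statistics.");
  }
}

// Hypothetical helper that reads the origin of an exception from its stack trace.
final class StackTraceOrigin {
  private StackTraceOrigin() {
  }

  // Returns "ClassName.methodName" for the frame where the exception was created.
  static String of(Throwable t) {
    StackTraceElement top = t.getStackTrace()[0];
    return top.getClassName() + "." + top.getMethodName();
  }
}
```

Catching the exception from new FileStatsBackedMetrics().upperBound() and passing it to StackTraceOrigin.of typically yields a string ending in FileStatsBackedMetrics.upperBound, even though neither name appears in the message.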
Updated from b7d88f2 to 4bb427f
Updated from c0c758d to 37a8681
Thanks for updating this, @yyanyy! I'll merge it.
StructInternalRow