Skip to content

Conversation

@yyanyy
Copy link
Contributor

@yyanyy yyanyy commented Nov 25, 2020

  • separate refactoring/renaming changes in Core: Implement NaN counts in ORC #1790 to this PR to reduce the size of that PR
  • no behavior change in metrics related classes
  • added extra supports in StructInternalRow

@Override
public int getInt(int ordinal) {
return struct.get(ordinal, Integer.class);
Object integer = struct.get(ordinal, Object.class);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for changing this class is discussed in this thread

Copy link

@giovannifumarola giovannifumarola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Yan for the patch.

}

public static MetricsModes.MetricsMode getMetricsMode(Schema inputSchema, MetricsConfig metricsConfig, int fieldId) {
String columnName = inputSchema.findColumnName(fieldId);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inputschema can be null at this point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a validation check to ensure they are not null

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Let's add checks to fail when metricsConfig or inputSchema is null. This is a useful method so I think it is worth keeping it public, but we would ideally not fail with a NullPointerException. This is not called in a tight loop, so it should be fine to add the checks on each call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry I overwrote my own earlier change that addressed this when I renamed this file...


public static MetricsModes.MetricsMode getMetricsMode(Schema inputSchema, MetricsConfig metricsConfig, int fieldId) {
String columnName = inputSchema.findColumnName(fieldId);
return metricsConfig.columnMode(columnName);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same for metrics config.

private MetricsUtil() {
}

public static Map<Integer, Long> getNanValueCounts(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: add javadoc.

* exceptions when they are accessed.
*/
public class ParquetFieldMetrics extends FieldMetrics {
public class NaNFieldMetrics extends FieldMetrics {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we expect to add lower and upper bounds to this class (since Parquet and ORC do not ignore NaN values), I think this should probably be called FloatFieldMetrics. That way we don't need to rename when this adds support for lower and upper.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good; I'll keep the javadoc as is since they currently reflect reality, and will update comments to mention tracking float fields when the class starts to track upper/lower bound

public ByteBuffer upperBound() {
throw new IllegalStateException(
"Shouldn't access upperBound() within ParquetFieldMetrics, as this metric is tracked by Parquet footer. ");
"Shouldn't access upperBound() within NaNOnlyFieldMetrics, as this metric is tracked in file statistics. ");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This demonstrates the drawback to adding too much context to an exception message. The method and class should be in the stack trace and renames require updating this message. Let's keep the good error message, but remove the class and method names.

@yyanyy yyanyy mentioned this pull request Dec 16, 2020
@rdblue
Copy link
Contributor

rdblue commented Dec 17, 2020

Thanks for updating this, @yyanyy! I'll merge it.

@rdblue rdblue merged commit 425a45f into apache:master Dec 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants