Skip to content

Conversation

@gszadovszky
Copy link
Contributor

@gszadovszky gszadovszky commented Jan 15, 2018

No description provided.

… be represented properly in tools/logs etc.
builder.append(BINARY_HEXA_PREFIX);
for (int i = buffer.position(), n = buffer.limit(); i < n; ++i) {
byte b = buffer.get(i);
builder.append(digits[(b >>> 4) & 0x0F]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered using String.format("%02X", b)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The performance overhead for using String.format for each byte elements would be significant. A don't think the exchanging of the existing two lines (one for the upper and one for the lower bits) to one would worth it.


// Test print formatting
assertEquals(stats.toString(), String.format("min: %.5f, max: %.5f, num_nulls: %d", 0.00010, 553.59998, 0));
assertEquals("min: 1.0E-4, max: 553.6, num_nulls: 0", stats.toString());
Copy link
Contributor

@zivanfi zivanfi Jan 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These asserts were using String.format for the expected output intentionally. (Actually the expected and the actual values used to be swapped, thanks for fixing that.) Earlier expected values used to be hard-coded as string literals (e.g, "0.5"), but since the textual representation depends on the locale, tests used to fail on locales that use comma as a decimal separator (e.g., "0,5").

Float.toString() and similar calls that you use for stringifying are locale-aware, therefore you should also use String.format for the expected value. Alternatively, you may change the stringifying logic to work in a locale-agnostic way.

Since this occurs many times below, I suggest you double-check your tests by running them using LANG=hu_HU or LC_NUMERIC=hu_HU LC_MEASUREMENT=hu_HU.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation (which uses Double.toString and Float.toString is independent from the actual locale settings. I'll add a unit test to ensure this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. Thanks for adding unit tests anyway.

@gszadovszky gszadovszky changed the title PARQUET-1170: Implement toString based on logical type so values will… PARQUET-1170: Logical-type-based toString for proper representeation in tools/logs Jan 16, 2018

// Test print formatting
assertEquals(stats.toString(), String.format("min: %.5f, max: %.5f, num_nulls: %d", 0.00010, 553.59998, 0));
assertEquals("min: 1.0E-4, max: 553.6, num_nulls: 0", stats.toString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. Thanks for adding unit tests anyway.

}
ByteBuffer buffer = value.toByteBuffer();
int pos = buffer.position();
String months = UNSIGNED_STRINGIFIER.stringify(buffer.getInt(pos));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know whether intentionally or by mistake, but parquet-format specifies intervals to be little-endian. @rdblue, @julienledem, do you know whether intervals really should be little-endian?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants