-
Notifications
You must be signed in to change notification settings - Fork 1.5k
PARQUET-1170: Logical-type-based toString for proper representeation in tools/logs #448
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… be represented properly in tools/logs etc.
| builder.append(BINARY_HEXA_PREFIX); | ||
| for (int i = buffer.position(), n = buffer.limit(); i < n; ++i) { | ||
| byte b = buffer.get(i); | ||
| builder.append(digits[(b >>> 4) & 0x0F]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered using String.format("%02X", b)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The performance overhead for using String.format for each byte elements would be significant. A don't think the exchanging of the existing two lines (one for the upper and one for the lower bits) to one would worth it.
|
|
||
| // Test print formatting | ||
| assertEquals(stats.toString(), String.format("min: %.5f, max: %.5f, num_nulls: %d", 0.00010, 553.59998, 0)); | ||
| assertEquals("min: 1.0E-4, max: 553.6, num_nulls: 0", stats.toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These asserts were using String.format for the expected output intentionally. (Actually the expected and the actual values used to be swapped, thanks for fixing that.) Earlier expected values used to be hard-coded as string literals (e.g, "0.5"), but since the textual representation depends on the locale, tests used to fail on locales that use comma as a decimal separator (e.g., "0,5").
Float.toString() and similar calls that you use for stringifying are locale-aware, therefore you should also use String.format for the expected value. Alternatively, you may change the stringifying logic to work in a locale-agnostic way.
Since this occurs many times below, I suggest you double-check your tests by running them using LANG=hu_HU or LC_NUMERIC=hu_HU LC_MEASUREMENT=hu_HU.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation (which uses Double.toString and Float.toString is independent from the actual locale settings. I'll add a unit test to ensure this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. Thanks for adding unit tests anyway.
|
|
||
| // Test print formatting | ||
| assertEquals(stats.toString(), String.format("min: %.5f, max: %.5f, num_nulls: %d", 0.00010, 553.59998, 0)); | ||
| assertEquals("min: 1.0E-4, max: 553.6, num_nulls: 0", stats.toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. Thanks for adding unit tests anyway.
| } | ||
| ByteBuffer buffer = value.toByteBuffer(); | ||
| int pos = buffer.position(); | ||
| String months = UNSIGNED_STRINGIFIER.stringify(buffer.getInt(pos)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know whether intentionally or by mistake, but parquet-format specifies intervals to be little-endian. @rdblue, @julienledem, do you know whether intervals really should be little-endian?
No description provided.