PARQUET-1170: Logical-type-based toString for proper representeation in tools/logs #448

gszadovszky · 2018-01-15T08:54:32Z

No description provided.

… be represented properly in tools/logs etc.

zivanfi · 2018-01-15T10:38:26Z

parquet-column/src/main/java/org/apache/parquet/schema/PrimitiveStringifier.java

+      builder.append(BINARY_HEXA_PREFIX);
+      for (int i = buffer.position(), n = buffer.limit(); i < n; ++i) {
+        byte b = buffer.get(i);
+        builder.append(digits[(b >>> 4) & 0x0F]);


Have you considered using String.format("%02X", b)?

The performance overhead of calling String.format for every byte would be significant. I don't think replacing the existing two lines (one for the upper and one for the lower four bits) with a single line would be worth it.
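To make the trade-off above concrete, here is a minimal sketch of the two approaches side by side. It is not the code from the PR: the class name, the `"0x"` prefix (the actual value of `BINARY_HEXA_PREFIX` is not shown in the diff), and the upper-case digit table are assumptions made for illustration.

```java
import java.nio.ByteBuffer;

final class HexStringifierSketch {
  // Assumed prefix; the diff does not show what BINARY_HEXA_PREFIX contains.
  private static final String PREFIX = "0x";
  // Assumed digit table; the real code may use lower-case digits.
  private static final char[] DIGITS = "0123456789ABCDEF".toCharArray();

  // Table lookup: two array reads per byte, no per-byte allocation.
  static String toHex(ByteBuffer buffer) {
    StringBuilder builder = new StringBuilder(PREFIX.length() + 2 * buffer.remaining());
    builder.append(PREFIX);
    // Absolute get(i) leaves the buffer's position untouched.
    for (int i = buffer.position(), n = buffer.limit(); i < n; ++i) {
      byte b = buffer.get(i);
      builder.append(DIGITS[(b >>> 4) & 0x0F]); // upper four bits
      builder.append(DIGITS[b & 0x0F]);         // lower four bits
    }
    return builder.toString();
  }

  // The suggested alternative: shorter, but parses the format string
  // and creates a temporary String for every byte.
  static String toHexWithFormat(ByteBuffer buffer) {
    StringBuilder builder = new StringBuilder(PREFIX);
    for (int i = buffer.position(), n = buffer.limit(); i < n; ++i) {
      builder.append(String.format("%02X", buffer.get(i)));
    }
    return builder.toString();
  }

  public static void main(String[] args) {
    ByteBuffer buffer = ByteBuffer.wrap(new byte[] {0x00, 0x1F, (byte) 0xA0, (byte) 0xFF});
    System.out.println(toHex(buffer));           // 0x001FA0FF
    System.out.println(toHexWithFormat(buffer)); // 0x001FA0FF
  }
}
```

Both methods produce the same text; the difference is only in the per-byte work, which matters when large binary values are printed.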

zivanfi · 2018-01-15T10:47:38Z

parquet-column/src/test/java/org/apache/parquet/column/statistics/TestStatistics.java


    // Test print formatting
-    assertEquals(stats.toString(), String.format("min: %.5f, max: %.5f, num_nulls: %d", 0.00010, 553.59998, 0));
+    assertEquals("min: 1.0E-4, max: 553.6, num_nulls: 0", stats.toString());


These asserts were using String.format for the expected output intentionally. (Actually the expected and the actual values used to be swapped, thanks for fixing that.) Earlier, the expected values used to be hard-coded as string literals (e.g., "0.5"), but since the textual representation depends on the locale, the tests used to fail on locales that use a comma as the decimal separator (e.g., "0,5").

Float.toString() and similar calls that you use for stringifying are locale-aware, therefore you should also use String.format for the expected value. Alternatively, you may change the stringifying logic to work in a locale-agnostic way.

Since this occurs many times below, I suggest you double-check your tests by running them using LANG=hu_HU or LC_NUMERIC=hu_HU LC_MEASUREMENT=hu_HU.

The current implementation (which uses Double.toString and Float.toString) is independent of the actual locale settings. I'll add a unit test to ensure this.

zivanfi · 2018-01-16T14:02:03Z

parquet-column/src/test/java/org/apache/parquet/column/statistics/TestStatistics.java


    // Test print formatting
-    assertEquals(stats.toString(), String.format("min: %.5f, max: %.5f, num_nulls: %d", 0.00010, 553.59998, 0));
+    assertEquals("min: 1.0E-4, max: 553.6, num_nulls: 0", stats.toString());


You're right. Thanks for adding unit tests anyway.
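The conclusion of this thread can be checked with a small sketch such as the one below. It is not the unit test that was added in the PR; the class and method names are hypothetical. It switches the default locale to Hungarian and compares `Float.toString`/`Double.toString` with `String.format`, which does honor the default format locale.

```java
import java.util.Locale;

final class LocaleSketch {
  // Renders a few values while the given locale is the JVM default,
  // then restores the previous default.
  static String[] render(Locale locale) {
    Locale previous = Locale.getDefault();
    Locale.setDefault(locale);
    try {
      return new String[] {
        Float.toString(0.0001f),    // locale-independent
        Float.toString(553.6f),     // locale-independent
        Double.toString(0.5),       // locale-independent
        String.format("%.1f", 0.5)  // uses the default format locale
      };
    } finally {
      Locale.setDefault(previous);
    }
  }

  public static void main(String[] args) {
    for (Locale locale : new Locale[] {Locale.US, Locale.forLanguageTag("hu-HU")}) {
      System.out.println(locale + ": " + String.join(" | ", render(locale)));
    }
  }
}
```

The first three entries should be identical under both locales, matching the literals used in the updated assert (`1.0E-4`, `553.6`). Only the `String.format` entry is expected to vary, using a decimal comma where the runtime ships Hungarian locale data.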

zivanfi · 2018-01-16T14:22:54Z

parquet-column/src/main/java/org/apache/parquet/schema/PrimitiveStringifier.java

+      }
+      ByteBuffer buffer = value.toByteBuffer();
+      int pos = buffer.position();
+      String months = UNSIGNED_STRINGIFIER.stringify(buffer.getInt(pos));


I don't know whether it is intentional or a mistake, but parquet-format specifies intervals to be little-endian. @rdblue, @julienledem, do you know whether intervals really should be little-endian?
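For reference, parquet-format describes INTERVAL as a 12-byte fixed-length value holding three little-endian unsigned 32-bit integers: months, days, and milliseconds. A minimal sketch of reading such a value follows. It is not the fix committed in this PR; the class name and the output format are hypothetical, and `Integer.toUnsignedString` stands in for the PR's `UNSIGNED_STRINGIFIER`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class IntervalSketch {
  // Decodes months, days and milliseconds from a 12-byte little-endian value.
  static String stringify(ByteBuffer value) {
    if (value.remaining() != 12) {
      throw new IllegalArgumentException("INTERVAL must be 12 bytes, got " + value.remaining());
    }
    // ByteBuffer defaults to big-endian, so the order has to be set explicitly.
    // Working on a duplicate keeps the caller's buffer state unchanged.
    ByteBuffer buffer = value.duplicate().order(ByteOrder.LITTLE_ENDIAN);
    int pos = buffer.position();
    String months = Integer.toUnsignedString(buffer.getInt(pos));
    String days = Integer.toUnsignedString(buffer.getInt(pos + 4));
    String millis = Integer.toUnsignedString(buffer.getInt(pos + 8));
    // Hypothetical output format, chosen only for this illustration.
    return "interval(" + months + " months, " + days + " days, " + millis + " millis)";
  }

  public static void main(String[] args) {
    byte[] raw = {
      1, 0, 0, 0,                                    // months = 1
      2, 0, 0, 0,                                    // days = 2
      (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF // millis = 4294967295
    };
    System.out.println(stringify(ByteBuffer.wrap(raw)));
  }
}
```

Reading the same bytes with the default big-endian order would turn months = 1 into 16777216, which is the kind of error the question above is guarding against.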

PARQUET-1170: Implement toString based on logical type so values will be represented properly in tools/logs etc.

d8c5204

zivanfi reviewed Jan 15, 2018

gszadovszky changed the title ~~PARQUET-1170: Implement toString based on logical type so values will…~~ PARQUET-1170: Logical-type-based toString for proper representeation in tools/logs Jan 16, 2018

PARQUET-1170: Add unit test for different locale

612d70b

zivanfi reviewed Jan 16, 2018

Gabor Szadovszky added 2 commits January 16, 2018 15:43

PARQUET-1170: Fix endianess of interval

90f73b5

PARQUET-1170: Make interval test more readable

8f1f8cc

zivanfi approved these changes Jan 18, 2018

asfgit closed this in 89aeec0 Jan 22, 2018

asfimport mentioned this pull request Jun 23, 2024

Logical-type-based toString for proper representeation in tools/logs #2105

Closed
