26 changes: 15 additions & 11 deletions LogicalTypes.md
@@ -37,7 +37,7 @@ may require additional metadata fields, as well as rules for those fields.
`UTF8` may only be used to annotate the binary primitive type and indicates
that the byte array should be interpreted as a UTF-8 encoded character string.

The sort order used for `UTF8` strings is `UNSIGNED` byte-wise comparison.
The sort order used for `UTF8` strings is unsigned byte-wise comparison.
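To illustrate why "unsigned" matters here, a minimal Python sketch (the helper names are hypothetical, not part of any Parquet library) sorts UTF-8 strings both ways. Python `bytes` objects already compare lexicographically as unsigned values in 0..255; the signed key reinterprets each byte as -128..127, which is roughly what a naive comparison over Java's signed `byte` would do.

```python
def unsigned_bytewise_key(s: str) -> bytes:
    # bytes objects compare lexicographically as unsigned values (0..255).
    return s.encode("utf-8")


def signed_bytewise_key(s: str) -> list:
    # Reinterpret each UTF-8 byte as a signed 8-bit value (-128..127).
    return [b - 256 if b >= 128 else b for b in s.encode("utf-8")]


values = ["z", "é", "a"]
print(sorted(values, key=unsigned_bytewise_key))  # ['a', 'z', 'é']
print(sorted(values, key=signed_bytewise_key))    # ['é', 'a', 'z']
```

Every non-ASCII character starts with a byte of 0x80 or above, so signed comparison sorts it before all ASCII text, whereas unsigned byte-wise comparison of UTF-8 preserves code-point order.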

## Numeric Types

@@ -57,7 +57,7 @@ allows.
implied by the `int32` and `int64` primitive types if no other annotation is
present and should be considered optional.

The sort order used for signed integer types is `SIGNED`.
The sort order used for signed integer types is signed.

### Unsigned Integers

@@ -74,7 +74,7 @@ allows.
`UINT_8`, `UINT_16`, and `UINT_32` must annotate an `int32` primitive type and
`UINT_64` must annotate an `int64` primitive type.

The sort order used for unsigned integer types is `UNSIGNED`.
The sort order used for unsigned integer types is unsigned.
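Because `UINT_32` values are written into a signed `int32`, values above 2^31 - 1 show up as negative numbers, so a min/max computed with signed comparison would be wrong. A minimal sketch, assuming the writer simply reinterprets the 32-bit pattern (helper names are hypothetical):

```python
import struct


def store_uint32(value: int) -> int:
    # Write the unsigned value's 32-bit pattern, read it back as a signed int32.
    return struct.unpack("<i", struct.pack("<I", value))[0]


def as_uint32(stored: int) -> int:
    # Reinterpret a stored int32 as the unsigned value it represents.
    return stored & 0xFFFFFFFF


stored = [store_uint32(v) for v in (1, 3_000_000_000, 2_000_000_000)]
print(stored)                    # [1, -1294967296, 2000000000]
print(min(stored), max(stored))  # -1294967296 2000000000 (signed: wrong)

lo = min(stored, key=as_uint32)
hi = max(stored, key=as_uint32)
print(as_uint32(lo), as_uint32(hi))  # 1 3000000000
```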

### DECIMAL

@@ -104,8 +104,8 @@ integer. A precision too large for the underlying type (see below) is an error.
A `SchemaElement` with the `DECIMAL` `ConvertedType` must also have both
`scale` and `precision` fields set, even if scale is 0 by default.

The sort order used for `DECIMAL` values is `SIGNED`. The order is equivalent
to signed comparison of decimal values.
The sort order used for `DECIMAL` values is signed comparison of the represented
value.
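For a `DECIMAL` stored in a byte array, plain byte-wise comparison does not give the order of the represented values. The sketch below assumes the unscaled value is a big-endian two's-complement integer (as the Parquet format specification describes for binary-backed decimals) and that all values in the column share one scale; `decimal_from_fixed` is a hypothetical helper.

```python
from decimal import Decimal


def decimal_from_fixed(raw: bytes, scale: int) -> Decimal:
    # Unscaled value: big-endian two's complement; then apply the column scale.
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)


scale = 2
raw_values = [b"\x00\x00\x04\xd2", b"\xff\xff\xfb\x2e", b"\x00\x00\x00\x05"]
decoded = [decimal_from_fixed(r, scale) for r in raw_values]
print(decoded)  # [Decimal('12.34'), Decimal('-12.34'), Decimal('0.05')]

# Unsigned byte-wise comparison picks the negative value as the maximum.
print(max(raw_values) == b"\xff\xff\xfb\x2e")  # True
# Comparing the represented values gives the intended min and max.
print(min(decoded), max(decoded))  # -12.34 12.34
```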

If the column uses `int32` or `int64` physical types, then signed comparison of
the integer values produces the correct ordering. If the physical type is
@@ -121,39 +121,39 @@ comparison.
annotate an `int32` that stores the number of days from the Unix epoch, 1
January 1970.

The sort order used for `DATE` is `SIGNED`.
The sort order used for `DATE` is signed.

### TIME\_MILLIS

`TIME_MILLIS` is used for a logical time type with millisecond precision,
without a date. It must annotate an `int32` that stores the number of
milliseconds after midnight.

The sort order used for `TIME\_MILLIS` is `SIGNED`.
The sort order used for `TIME\_MILLIS` is signed.

### TIME\_MICROS

`TIME_MICROS` is used for a logical time type with microsecond precision,
without a date. It must annotate an `int64` that stores the number of
microseconds after midnight.

The sort order used for `TIME\_MICROS` is `SIGNED`.
The sort order used for `TIME\_MICROS` is signed.

### TIMESTAMP\_MILLIS

`TIMESTAMP_MILLIS` is used for a combined logical date and time type, with
millisecond precision. It must annotate an `int64` that stores the number of
milliseconds from the Unix epoch, 00:00:00.000 on 1 January 1970, UTC.

The sort order used for `TIMESTAMP\_MILLIS` is `SIGNED`.
The sort order used for `TIMESTAMP\_MILLIS` is signed.

### TIMESTAMP\_MICROS

`TIMESTAMP_MICROS` is used for a combined logical date and time type with
microsecond precision. It must annotate an `int64` that stores the number of
microseconds from the Unix epoch, 00:00:00.000000 on 1 January 1970, UTC.

The sort order used for `TIMESTAMP\_MICROS` is `SIGNED`.
The sort order used for `TIMESTAMP\_MICROS` is signed.
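For `DATE`, the `TIME_*` types, and the `TIMESTAMP_*` types, signed comparison of the stored integer matches chronological order, including values before the epoch, which are negative. A small sketch for `TIMESTAMP_MICROS` (the helper name is hypothetical):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_timestamp_micros(value: int) -> datetime:
    # TIMESTAMP_MICROS: signed int64 microseconds since the Unix epoch, UTC.
    return EPOCH + timedelta(microseconds=value)


values = [946_684_800_000_000, -1_000_000, 0]
for v in sorted(values):  # plain signed int64 comparison
    print(v, from_timestamp_micros(v).isoformat())
# -1000000 1969-12-31T23:59:59+00:00
# 0 1970-01-01T00:00:00+00:00
# 946684800000000 2000-01-01T00:00:00+00:00
```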

### INTERVAL

@@ -169,7 +169,7 @@ example, there is no requirement that a large number of days should be
expressed as a mix of months and days because there is not a constant
conversion from days to months.

The sort order used for `INTERVAL` is `UNSIGNED`, produced by sorting by
The sort order used for `INTERVAL` is unsigned, produced by sorting by
the value of months, then days, then milliseconds with unsigned comparison.
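A sketch of the `INTERVAL` order, assuming the layout from the Parquet specification (a 12-byte `fixed_len_byte_array` holding three little-endian unsigned 32-bit integers for months, days, and milliseconds); `interval_key` is a hypothetical helper:

```python
import struct


def interval_key(raw: bytes) -> tuple:
    # Three little-endian unsigned 32-bit integers: months, days, milliseconds.
    # Tuple comparison orders by months, then days, then milliseconds.
    return struct.unpack("<III", raw)


one_month = struct.pack("<III", 1, 0, 0)
forty_days = struct.pack("<III", 0, 40, 0)
forty_days_5s = struct.pack("<III", 0, 40, 5_000)

ordered = sorted([one_month, forty_days_5s, forty_days], key=interval_key)
print(ordered == [forty_days, forty_days_5s, one_month])  # True
```

This is a field-by-field order rather than an order by total duration: 1 month sorts after 40 days, consistent with there being no constant conversion between days and months.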

## Embedded Types
@@ -184,6 +184,8 @@ string of valid JSON as defined by the [JSON specification][json-spec]

[json-spec]: http://json.org/

The sort order used for `JSON` is unsigned byte-wise comparison.

### BSON

`BSON` is used for an embedded BSON document. It must annotate a `binary`
@@ -192,6 +194,8 @@ defined by the [BSON specification][bson-spec].

[bson-spec]: http://bsonspec.org/spec.html

The sort order used for `BSON` is unsigned byte-wise comparison.

## Nested Types

This section specifies how `LIST` and `MAP` can be used to encode nested types
67 changes: 49 additions & 18 deletions src/main/thrift/parquet.thrift
@@ -28,17 +28,6 @@ namespace java org.apache.parquet.format
* with the encodings to control the on disk storage format.
* For example INT16 is not included as a type since a good encoding of INT32
* would handle this.
*
* When a logical type is not present, the type-defined sort order of these
* physical types are:
* * BOOLEAN - false, true
* * INT32 - signed comparison
* * INT64 - signed comparison
* * INT96 - signed comparison
* * FLOAT - signed comparison
* * DOUBLE - signed comparison
* * BYTE_ARRAY - unsigned byte-wise comparison
* * FIXED_LEN_BYTE_ARRAY - unsigned byte-wise comparison
*/
enum Type {
BOOLEAN = 0;
@@ -219,12 +208,12 @@ struct Statistics {
* Values are encoded using PLAIN encoding, except that variable-length byte
* arrays do not include a length prefix.
*
* These fields encode min and max values determined by SIGNED comparison
* These fields encode min and max values determined by signed comparison
* only. New files should use the correct order for a column's logical type
* and store the values in the min_value and max_value fields.
*
* To support older readers, these may be set when the column order is
* SIGNED.
* signed.
*/
1: optional binary max;
2: optional binary min;
@@ -582,7 +571,9 @@ struct RowGroup {
struct TypeDefinedOrder {}

/**
* Union to specify the order used for min, max, and sorting values in a column.
* Union to specify the order used for the min_value and max_value fields for a
* column. This union takes the role of an enhanced enum that allows rich
* elements (which will be needed for a collation-based ordering in the future).
*
* Possible values are:
* * TypeDefinedOrder - the column uses the order defined by its logical or
@@ -592,6 +583,41 @@ struct TypeDefinedOrder {}
* for this column should be ignored.
*/
union ColumnOrder {

/**
* The sort orders for logical types are:
* UTF8 - unsigned byte-wise comparison
* INT8 - signed comparison
* INT16 - signed comparison
* INT32 - signed comparison
* INT64 - signed comparison
* UINT8 - unsigned comparison
* UINT16 - unsigned comparison
* UINT32 - unsigned comparison
* UINT64 - unsigned comparison
* DECIMAL - signed comparison of the represented value
* DATE - signed comparison
* TIME_MILLIS - signed comparison
* TIME_MICROS - signed comparison
* TIMESTAMP_MILLIS - signed comparison
* TIMESTAMP_MICROS - signed comparison
* INTERVAL - unsigned comparison
* JSON - unsigned byte-wise comparison
* BSON - unsigned byte-wise comparison
* ENUM - unsigned byte-wise comparison
* LIST - undefined
* MAP - undefined
*
* In the absence of logical types, the sort order is determined by the physical type:
* BOOLEAN - false, true
* INT32 - signed comparison
* INT64 - signed comparison
* INT96 (only used for legacy timestamps) - unsigned comparison
* FLOAT - signed comparison of the represented value
* DOUBLE - signed comparison of the represented value
* BYTE_ARRAY - unsigned byte-wise comparison
* FIXED_LEN_BYTE_ARRAY - unsigned byte-wise comparison
*/
1: TypeDefinedOrder TYPE_ORDER;

Should we add a custom sort order for Impala timestamp values until INT96 is removed from the Parquet format?

Member

I don't remember in which context this was discussed, but I think INT96 timestamps should sort correctly with signed comparison.

@majetideepak, Aug 23, 2017

If you are referring to Impala timestamps, then I believe signed comparison is not sufficient.
Quoting from https://github.com/cloudera/Impala/blob/b402e342d42b60ff3d01e87d83e9bfba635488cf/tests/util/get_parquet_metadata.py:75

# Impala writes timestamps as 12-byte values. The first 8 byte store a
# boost::posix_time::time_duration, which is the time within the current day in
# nanoseconds stored as int64. The last 4 bytes store a boost::gregorian::date,
# which is the Julian date, stored as utin32.

Shouldn't the comparator compare the date value first (last 4 bytes) and then the nanoseconds (first 8 bytes)?

Member

This is something that someone else should probably also review (cc @rdblue ;) ), but I guess that because the values are stored as little-endian, this should be correct. Please don't take this for granted: I haven't had to deal with endianness explicitly in recent years, so I might have got this wrong.

I doubt that the little-endian layout, which works at the byte level, can help with this multi-byte value comparison.

@zivanfi (Contributor, Author), Sep 20, 2017

Here is the timestamp '2000-01-01 12:34:56' stored as an int96:

$ parquet-tools dump hdfs://n1/user/hive/warehouse/test/11481f03a2ea6bed-b19656cd00000000_1937418586_data.0.parq | tail -n 1
value 1: R:0 D:1 V:117253024523396126668760320

Since 117253024523396126668760320 = 0x60FD4B3229000059682500, the 12 bytes are 00 60 FD 4B 32 29 00 00 | 59 68 25 00, where | shows the boundary between the time and the date parts.

00 60 FD 4B 32 29 00 00 is the time part, if we reverse the bytes we get 0x000029324BFD6000 = 45296 * 10^9 nanoseconds = 45296 seconds = 12 hours + 34 minutes + 56 seconds.

59 68 25 00 is the date part, if we reverse the bytes we get 0x00256859 = 2451545 as the Julian day number, which corresponds to 2000-01-01.

To correctly sort these values without interpreting them as timestamps, the bigger unit (date) should precede the smaller unit (time). In this case they are in the opposite order, but they are also stored with little-endian byte-order (individually), which means that they will be in the correct order if we interpret the whole value in a little-endian manner. So, for correct ordering based purely on numerical value, in comparisons the example above should not be interpreted as 0x0060FD4B3229000059682500 = 117253024523396126668760320 like parquet-tools did, but as 0x00256859000029324BFD6000 = 45223023200227578716446720 instead. Or, to put it more simply, a byte-by-byte comparison starting from the end of the values results in the correct ordering.

Thanks for the clarification! An Int96 intrinsic hardware type would have handled the value as is. Otherwise a byte-by-byte comparison in the reverse order is needed.
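The byte-order argument in this thread can be checked with a short Python sketch. It assumes the Impala layout quoted earlier (8-byte little-endian nanoseconds within the day, followed by a 4-byte little-endian Julian day) and that the nanosecond part is never negative; both helper names are hypothetical.

```python
import struct


def decode_impala_int96(raw: bytes) -> tuple:
    # First 8 bytes: little-endian int64 nanoseconds within the day.
    # Last 4 bytes: little-endian uint32 Julian day number.
    nanos, julian_day = struct.unpack("<qI", raw)
    return julian_day, nanos


def int96_sort_key(raw: bytes) -> int:
    # One little-endian unsigned integer: the date bytes become most
    # significant, same as comparing byte by byte from the end.
    return int.from_bytes(raw, byteorder="little", signed=False)


example = bytes.fromhex("0060FD4B32290000" "59682500")
print(decode_impala_int96(example))  # (2451545, 45296000000000)
print(hex(int96_sort_key(example)))  # 0x256859000029324bfd6000

late_day1 = struct.pack("<qI", 23 * 3600 * 10**9, 2451545)
early_day2 = struct.pack("<qI", 0, 2451546)
print(int96_sort_key(late_day1) < int96_sort_key(early_day2))  # True
print(int.from_bytes(late_day1, "big") < int.from_bytes(early_day2, "big"))  # False
```

Reading the whole 12-byte value as one little-endian unsigned integer orders 23:00 on one day before 00:00 on the next; reading it big-endian, as parquet-tools printed it, does not.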

}
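One way a reader might resolve `TypeDefinedOrder` is a lookup on the column's logical type, falling back to its physical type. This is a hypothetical Python sketch that mirrors the table in the `ColumnOrder` comment using plain strings (type names as written in that comment); it is not the API of any Parquet implementation.

```python
from typing import Optional

SIGNED, UNSIGNED, UNSIGNED_BYTES = "signed", "unsigned", "unsigned byte-wise"

# Hypothetical lookup tables; None means the order is undefined.
# DECIMAL, FLOAT and DOUBLE: signed comparison of the represented value.
LOGICAL_ORDER = {
    "UTF8": UNSIGNED_BYTES, "JSON": UNSIGNED_BYTES,
    "BSON": UNSIGNED_BYTES, "ENUM": UNSIGNED_BYTES,
    "INT8": SIGNED, "INT16": SIGNED, "INT32": SIGNED, "INT64": SIGNED,
    "UINT8": UNSIGNED, "UINT16": UNSIGNED,
    "UINT32": UNSIGNED, "UINT64": UNSIGNED,
    "DECIMAL": SIGNED, "DATE": SIGNED,
    "TIME_MILLIS": SIGNED, "TIME_MICROS": SIGNED,
    "TIMESTAMP_MILLIS": SIGNED, "TIMESTAMP_MICROS": SIGNED,
    "INTERVAL": UNSIGNED,
    "LIST": None, "MAP": None,
}
PHYSICAL_ORDER = {
    "BOOLEAN": "false < true", "INT32": SIGNED, "INT64": SIGNED,
    "INT96": UNSIGNED, "FLOAT": SIGNED, "DOUBLE": SIGNED,
    "BYTE_ARRAY": UNSIGNED_BYTES, "FIXED_LEN_BYTE_ARRAY": UNSIGNED_BYTES,
}


def type_defined_order(physical: str, logical: Optional[str] = None) -> Optional[str]:
    # A logical annotation, when present, takes precedence over the physical type.
    if logical is not None:
        return LOGICAL_ORDER[logical]
    return PHYSICAL_ORDER[physical]


print(type_defined_order("INT32", "UINT32"))  # unsigned
print(type_defined_order("INT32"))            # signed
```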

@@ -626,11 +652,16 @@ struct FileMetaData {
6: optional string created_by

/**
* Sort order used for each column in this file.
* Sort order used for the min_value and max_value fields of each column in
* this file. Each sort order corresponds to one column, determined by its
* position in the list, matching the position of the column in the schema.
*
* Without column_orders, the meaning of the min_value and max_value fields is
* undefined. To ensure well-defined behaviour, if min_value and max_value are
* written to a Parquet file, column_orders must be written as well.
*
* If this list is not present, then the order for each column is assumed to
* be Signed. In addition, min and max values for INTERVAL or DECIMAL stored
* as fixed or bytes should be ignored.
* The obsolete min and max fields are always sorted by signed comparison
* regardless of column_orders.
*/
7: optional list<ColumnOrder> column_orders;
}
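The `column_orders` rules suggest reader logic along these lines. This is a minimal sketch with a hypothetical, simplified `ColumnStats` class and string stand-ins for `ColumnOrder` values, not the Thrift-generated types: `min_value`/`max_value` are trusted only when `column_orders` is present and the column's order is one the reader understands (treating unknown orders as "ignore" is an assumption here).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ColumnStats:
    # Hypothetical, simplified stand-in for the Thrift Statistics struct.
    min_value: Optional[bytes] = None
    max_value: Optional[bytes] = None


def trusted_min_max(
    stats: ColumnStats, column_index: int, column_orders: Optional[List[str]]
) -> Optional[Tuple[bytes, bytes]]:
    # Without column_orders the meaning of min_value/max_value is undefined.
    if column_orders is None or column_index >= len(column_orders):
        return None
    # Assumption: a reader ignores statistics for orders it does not know.
    if column_orders[column_index] != "TYPE_ORDER":
        return None
    if stats.min_value is None or stats.max_value is None:
        return None
    return stats.min_value, stats.max_value


stats = ColumnStats(min_value=b"apple", max_value=b"zebra")
print(trusted_min_max(stats, 0, None))            # None
print(trusted_min_max(stats, 0, ["TYPE_ORDER"]))  # (b'apple', b'zebra')
```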