
Conversation

Contributor

@rdblue rdblue commented Aug 16, 2024

This PR includes the changes to v3 for type promotion. It also includes some changes to introduce unknown and variant types to show what the type promotion language will be for those types.

I'll remove the type additions before merging. We probably want those in separate commits.

@rdblue rdblue requested a review from Fokko August 16, 2024 23:32
@github-actions github-actions bot added the "Specification" label (Issues that may introduce spec changes) Aug 16, 2024
format/spec.md Outdated
| `unknown` | | _any type_ | |
| `int` | `long` | `long`, `string` | |
| `long` | | `string` | |
| `date` | | `timestamp`, `timestamp_ns` | Promotion to `timestamptz` or `timestamptz_ns` is **not** allowed |
Contributor

How about long => timestamp/timestamp_ns?

Contributor Author

We wanted to do this (although it is closer to timestamptz) but that hits the problem covered by the next paragraph and table. You can't tell whether a lower/upper bound value was written as a timestamptz or a long. If both values are in microseconds, we'd be fine. But most people store timestamps as long in milliseconds. That's what we were hoping to support with long to timestamp promotion, but it won't work until we have more information about the original type of the bounds.
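A minimal sketch of that ambiguity (the bound value is hypothetical; `struct` is only used here to mimic the single-value serialization of an 8-byte long):

```python
import struct

# The serialized lower/upper bound is just 8 little-endian bytes either way.
bound = struct.pack("<q", 1_700_000_000_000)
value = struct.unpack("<q", bound)[0]

# If the writer stored milliseconds, the value must be scaled to microseconds;
# if it already stored microseconds, scaling would corrupt it by 1000x.
# Nothing in the manifest says which interpretation is correct.
if_written_as_millis = value * 1_000
if_written_as_micros = value
print(if_written_as_millis, if_written_as_micros)
```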

@rdblue rdblue force-pushed the spec-v3-type-promotion branch from f68559d to da2b721 Compare September 11, 2024 23:07
format/spec.md Outdated

| Added by version | Primitive type | Description | Requirements |
|------------------|--------------------|--------------------------------------------------------------------------|--------------------------------------------------|
| [v3](#version-3) | **`unknown`** | Default / null column type used when a more specific type is not known | Must be optional with `null` defaults |
Member

I think I missed where this was proposed, do we have a doc anywhere explaining the use case for this?

Contributor Author

This is for cases where you create a table from example data, but have only seen null values for a column. The column exists (and should be accessible) but the table doesn't know what the type is yet.

Contributor

does it pay to say this type may not be used for equality deletes, ordering, etc?

Contributor Author

I don't think so. We have well-defined ways to handle null values in those cases.

Contributor

SGTM.

* `int` to `long`
* `float` to `double`
* `decimal(P, S)` to `decimal(P', S)` if `P' > P` -- widen the precision of decimal types.
Iceberg's Avro manifest format does not store the type of lower and upper bounds, and type promotion does not rewrite existing bounds. For example, when a `float` is promoted to `double`, existing data file bounds are encoded as 4 little-endian bytes rather than 8 little-endian bytes for `double`. To correctly decode the value, the original type at the time the file was written must be inferred according to the following table:
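A sketch of how a reader might apply the inference table for the `float` to `double` case; the helper is illustrative, not part of the spec or any Iceberg library:

```python
import struct

def decode_double_bound(raw: bytes) -> float:
    # Per the inference table, a 4-byte bound was written before promotion
    # (original type float); an 8-byte bound was written as double.
    if len(raw) == 4:
        return struct.unpack("<f", raw)[0]
    if len(raw) == 8:
        return struct.unpack("<d", raw)[0]
    raise ValueError(f"unexpected bound length: {len(raw)}")

print(decode_double_bound(struct.pack("<f", 3.5)))  # bound written before promotion
print(decode_double_bound(struct.pack("<d", 3.5)))  # bound written after promotion
```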
Member

We discussed this a little, but I want to double check: we aren't allowing any promotion to variant here, correct? It would basically be impossible to reverse-engineer the original write-time metric, so I assume we are giving up on that. It could be possible in the future if we change the metric layout.

Contributor Author
@rdblue rdblue Sep 27, 2024

Yes, if we want to have variant upper and lower bounds, then we cannot have type promotion from primitive types to variant.

format/spec.md Outdated
| `timestamp_ns` | 8 bytes | `timestamp_ns` |
| `decimal(P, S)` | _any_ | `decimal(P', S)`; `P' <= P` |

Type promotion is not allowed for a field that is referenced by `source-id` or `source-ids` of a partition field if the partition transform would produce a different value after promoting the type. For example, `bucket[N]` produces different hash values for `34` and `"34"` (2017239379 != -427558391) but the same value for `34` and `34L`; when an `int` field is the source for a bucket partition field, it may be promoted to `long` but not to `string`.
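A sketch of the hash equivalence, assuming the Python `mmh3` package (Murmur3 x86 32-bit with seed 0, the function used by `bucket[N]`); the expected hash values are the ones quoted above:

```python
import struct
import mmh3

# int and long values hash identically: both are serialized as an
# 8-byte little-endian long before hashing, so 34 and 34L collide.
hash_int_34 = mmh3.hash(struct.pack("<q", 34))   # 2017239379
hash_long_34 = mmh3.hash(struct.pack("<q", 34))  # same bytes, same hash

# Strings hash their UTF-8 bytes, so "34" lands in a different bucket.
hash_str_34 = mmh3.hash("34".encode("utf-8"))    # -427558391

print(hash_int_34 == hash_long_34, hash_int_34 == hash_str_34)  # True False
```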
Member

I think the example here is now not allowed at all, since you cannot do an `int` -> `string` promotion by definition.

Contributor Author

I agree. There aren't any cases where a type promotion would violate this rule, but I think it is still valuable to add it so that it is clear, now and in future versions, that even a valid type promotion is disallowed if it is used in a transform whose result would change.

Contributor

Commented below, and I could be thinking about this incorrectly, but I think bucket fails for date->timestamp since the hash inputs are different (date appears to hash the day count, and timestamp the microsecond count).

Member
@RussellSpitzer RussellSpitzer left a comment

One other thing to update here is I don't think we want to define equality (or hash) for Variant type so we should exclude Variant from the identity partition transform.

@nastra nastra added this to the Iceberg V3 Spec milestone Sep 25, 2024
@rdblue rdblue force-pushed the spec-v3-type-promotion branch from bd35432 to c8aa57e Compare September 27, 2024 21:22
@rdblue rdblue force-pushed the spec-v3-type-promotion branch from c8aa57e to b23f67b Compare September 27, 2024 21:23
Contributor Author

rdblue commented Sep 27, 2024

@aihuaxu, @RussellSpitzer, I've removed variant from this PR so that we can make progress on the type promotion. Now this just includes unknown and the new promotions. We can make progress on variant separately once the Parquet community standardizes it more.

format/spec.md Outdated
| `timestamp_ns` | 8 bytes | `timestamp_ns` |
| `decimal(P, S)` | _any_ | `decimal(P', S)`; `P' <= P` |

Type promotion is not allowed for a field that is referenced by `source-id` or `source-ids` of a partition field if the partition transform would produce a different value after promoting the type. For example, `bucket[N]` produces different hash values for `34` and `"34"` (2017239379 != -427558391) but the same value for `34` and `34L`; when an `int` field is the source for a bucket partition field, it may be promoted to `long` but not to `string`.
Contributor

Have we analyzed the impact on other places where type IDs are used (i.e. should this be expanded)? For example, equality deletes (I think this is OK, but there might be edge cases for dropped columns) and table-level statistics, since they use theta sketches.

Also, it might pay to spell out the list of type promotion + transform pairings that are explicitly disallowed.

Contributor

There is also a concern for order-by columns if type promotion changes the comparison order (for the current set of transforms proposed, I don't think this is a problem).

Contributor Author

Right now, there are no allowed type promotions (even after this PR) that violate the rule. This is just for future cases (see my reply to @RussellSpitzer).

This concern has been around since the beginning of the format and is why the hash definition requires that ints and longs produce the same hash value. In that time, I've not come across any other areas that have similar requirements. This is the only case because this is related to the correctness of filtering. Other areas that are similar (like the use of transforms in ordering) are not risky because they are not involved in filtering.

Contributor
@emkornfield emkornfield Sep 30, 2024

Doesn't date to timestamp fail for bucket here? I think this necessitates a multiplication, which should change the hash value. Isn't it also a problem for the theta sketch in Puffin files for stats? I guess this is a pre-existing problem since single-value serialization is used, so even int->long would cause some amount of inaccuracy.

(Sorry for the multiple edits.) I think date->timestamp_ns might yield different results depending on how/if overflow for the conversion is handled.
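A short sketch of why the bucket value changes for date -> timestamp (hypothetical day count; same `mmh3`/seed-0 assumption as in the earlier sketch): the hash input switches from a day count to a microsecond count.

```python
import struct
import mmh3

days = 19_000                          # a date, as days from the Unix epoch
micros = days * 86_400 * 1_000_000     # the same day after promotion to timestamp

# Both hash an 8-byte little-endian long, but the value being hashed changes,
# so the same row can land in a different bucket partition.
print(mmh3.hash(struct.pack("<q", days)) == mmh3.hash(struct.pack("<q", micros)))
```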

Contributor

@rdblue sorry for yet another comment here, but I wanted to address:

> This is the only case because this is related to the correctness of filtering. Other areas that are similar (like the use of transforms in ordering) are not risky because they are not involved in filtering.

I'm not sure this is accurate:

  1. For sketches it might be OK to have an estimate that is potentially off by up to 2x, but it should probably be clarified. As you point out, it wouldn't necessarily affect correctness but could in some cases affect performance. I'll follow up on the mailing list about this.
  2. If a file is said to be ordered, then I think it would be reasonable for an engine doing a point lookup on an ordered column (e.g. sorted_col_x = 'some_id') to stop scanning after it encountered a value greater than 'some_id'. So while the ordering doesn't impact metadata filtering, it would violate a contract that some engines might rely on.

The second point falls into the category of things that are not possible with the current set of transforms described in the PR, so it can probably wait.

Contributor Author

Yes, you're right that date to timestamp promotion would hit this case. Good catch (and I'm glad I left this in!).

You're also right that this would be a problem for the theta sketch definition. We should put some extra work in there to mitigate problems like this. When a type is promoted, we need to be able to detect and discard the existing sketch and replace it.

For #2, it sounds like the problem would occur when the ordering changes. Or in other words, if the transformation from one type to another is not monotonic. That's probably a good thing to call out as well, or at least look for when we add new type promotion cases. I don't think any of our existing cases would hit that situation, right? We should be careful about changes that alter the order.

Contributor

> For #2, it sounds like the problem would occur when the ordering changes. Or in other words, if the transformation from one type to another is not monotonic. That's probably a good thing to call out as well, or at least look for when we add new type promotion cases. I don't think any of our existing cases would hit that situation, right? We should be careful about changes that alter the order.

Correct, but the example given (number->string) is not in fact 'monotonic'. If we remove this example and just enumerate that the bucket transform is not valid, that is probably sufficient for this PR.

Contributor Author

Do we really need to change the example? It is demonstrating the point that a bucket function might change even if the value is equivalent. If we change it, then the other example (int to long) no longer makes sense. I think I'd prefer to keep the current examples. I'm not worried about the sorting use case needing to be called out here just because the sort order for that example changes.

Contributor

I guess we don't need to change the example. It might be nice to document the sort order exception as well so it isn't forgotten if the current reviewers aren't around and it doesn't come up. I do think that for the specification we should explicitly call out which transforms fall into this category (i.e. add date->timestamp* for the bucket[N] transform so implementors don't need to figure it out themselves).

Contributor Author

I added the case where this is known to happen.


The `initial-default` and `write-default` produce SQL default value behavior, without rewriting data files. SQL default value behavior when a field is added handles all existing rows as though the rows were written with the new field's default value. Default value changes may only affect future records and all known fields are written into data files. Omitting a known field when writing a data file is never allowed. The write default for a field must be written if a field is not supplied to a write. If the write default for a required field is not set, the writer must fail.

All columns of `unknown` type must default to null. Non-null values for `initial-default` or `write-default` are invalid.
Contributor

I'm a little confused about how physical columns in datafiles should be handled if they are present. If there is a column with a matching field id and values present, should those be materialized as null as well?

Contributor Author

Further down, it states that these columns should be omitted from ORC and Parquet files. Avro can keep the type in the schema, but the serialized format ends up being the same.

Contributor
@emkornfield emkornfield Sep 30, 2024

It probably makes sense to update the column projection rules to specify that this type shouldn't be matched? At least for old data (i.e. migrated from Hive tables), Parquet supports the UNKNOWN logical type as well.

Contributor Author

What do you mean? How would this type be matched in column projection? If we're reading a Parquet file with a matching column ID, then it is invalid because you can't write these columns to Parquet. I don't think that we would specifically ignore that case for an unknown column. Maybe it should be an error?

Contributor

@rdblue I think one could potentially import Parquet files that have an existing column annotated with the logical type "Unknown". I'd expect for these files the inferred schema would be the unknown Iceberg type as well. So they could in theory be read from the files (via column name mapping), but we potentially want to skip them to avoid any complication.

Contributor Author

I've added a note in the Parquet Appendix that Parquet columns that correspond to unknown columns must be ignored and replaced with null. I think that's the cleanest location for this.

@danielcweeks
Contributor

The description still says we're introducing variant, but I assume that's out-of-date and being handled separately?

format/spec.md Outdated
| Primitive type   | v1, v2 valid type promotions | v3+ valid type promotions    | Requirements |
|------------------|------------------------------|------------------------------|--------------|
| `unknown` | | _any type_ | |
| `int` | `long` | `long` | |
| `date` | | `timestamp`, `timestamp_ns` | Promotion to `timestamptz` or `timestamptz_ns` is **not** allowed |
Contributor

Should we specify how overflow should be handled for date->timestamp_ns ?

Contributor Author

Yes, we should. Should it fail or result in null?

Contributor Author

After thinking about this, I'm going to update it to require failure when there is an overflow. That doesn't introduce any issues with required fields, where engines may not be expecting a null value.
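A sketch of the fail-on-overflow behavior, assuming days are counted from the Unix epoch and `timestamp_ns` values must fit in a signed 64-bit integer (which runs out around 2262-04-11):

```python
NANOS_PER_DAY = 86_400 * 1_000_000_000
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1

def date_to_timestamp_ns(days: int) -> int:
    # Fail on overflow rather than producing null or clamping to min/max.
    nanos = days * NANOS_PER_DAY
    if not (INT64_MIN <= nanos <= INT64_MAX):
        raise OverflowError(f"date {days} is out of range for timestamp_ns")
    return nanos

print(date_to_timestamp_ns(19_000))    # well within range
try:
    date_to_timestamp_ns(200_000)      # ~year 2517, past the int64 nanosecond limit
except OverflowError as err:
    print(err)
```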

Contributor

The other option would be to return the MIN/MAX value as appropriate, depending on which side of the range the date falls out of. I agree returning an error is probably the least likely to cause bugs (I haven't thought through all implications of mutating values).

I agree nulls are not viable; in addition to the problem with required columns, they potentially break ordering since they would conflate overflow in both directions into the same class.

Contributor Author

Updated.

format/spec.md Outdated

| Primitive type | Hash specification | Test value |
|--------------------|-------------------------------------------|--------------------------------------------|
| **`unknown`** | `hashInt(0)` | |
Contributor

Sorry I missed this on my prior reviews, but shouldn't unknown be treated as null and return null for any case that requires a hash? (Also, was bucket[N] intended to be defined for 'unknown'? If not, then I think the only other case is Puffin files, which are today always defined as the hash of the values.)

Contributor Author

Yes, you're right. I'll fix it.

Contributor Author

Done.

Contributor
@emkornfield emkornfield left a comment

Thanks for addressing my feedback.

Contributor Author

rdblue commented Oct 9, 2024

Thanks for the reviews!

@rdblue rdblue merged commit 67dc9e5 into apache:main Oct 9, 2024
2 checks passed
zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024