|[v3](#version-3)|**`unknown`**| Default / null column type used when a more specific type is not known | Must be optional with `null` defaults; not stored in data files |
||**`boolean`**| True or false ||
||**`int`**| 32-bit signed integers | Can promote to `long`|
||**`long`**| 64-bit signed integers ||
@@ -221,6 +222,8 @@ The `initial-default` is set only when a field is added to an existing schema. T
The `initial-default` and `write-default` produce SQL default value behavior, without rewriting data files. SQL default value behavior when a field is added handles all existing rows as though the rows were written with the new field's default value. Default value changes may only affect future records and all known fields are written into data files. Omitting a known field when writing a data file is never allowed. The write default for a field must be written if a field is not supplied to a write. If the write default for a required field is not set, the writer must fail.
+All columns of `unknown` type must default to null. Non-null values for `initial-default` or `write-default` are invalid.
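The default rules above can be illustrated with a minimal Python sketch. The `Field` class and the helpers `validate_field`, `read_value`, and `write_value` are hypothetical names introduced only for this example; they are not part of any Iceberg library, and rows are modeled as plain dicts keyed by field ID.

```python
from dataclasses import dataclass
from typing import Any, Optional

_MISSING = object()  # sentinel meaning "the caller supplied no value"


@dataclass
class Field:
    """Hypothetical, simplified stand-in for a schema field."""
    id: int
    name: str
    type: str
    required: bool = False
    initial_default: Optional[Any] = None
    write_default: Optional[Any] = None


def validate_field(field: Field) -> None:
    """Reject `unknown` columns that are required or carry non-null defaults."""
    if field.type != "unknown":
        return
    if field.required:
        raise ValueError(f"{field.name}: unknown columns must be optional")
    if field.initial_default is not None or field.write_default is not None:
        raise ValueError(f"{field.name}: unknown columns must default to null")


def read_value(field: Field, file_row: dict) -> Any:
    """Rows from files written before the field existed read as initial-default."""
    return file_row.get(field.id, field.initial_default)


def write_value(field: Field, supplied: Any = _MISSING) -> Any:
    """Every known field is written; fall back to write-default when not supplied."""
    if supplied is not _MISSING:
        return supplied
    if field.required and field.write_default is None:
        raise ValueError(f"{field.name}: required field has no write-default")
    return field.write_default
```

In this sketch a data file that predates the field simply lacks the field ID, so existing files never need to be rewritten to reflect the default.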
+
Default values are attributes of fields in schemas and serialized with fields in the JSON format. See [Appendix C](#appendix-c-json-serialization).
@@ -230,11 +233,32 @@ Schemas may be evolved by type promotion or adding, deleting, renaming, or reord
Evolution applies changes to the table's current schema to produce a new schema that is identified by a unique schema ID, is added to the table's list of schemas, and is set as the table's current schema.
-Valid type promotions are:
-
-* `int` to `long`
-* `float` to `double`
-* `decimal(P, S)` to `decimal(P', S)` if `P' > P` -- widen the precision of decimal types.
+Valid primitive type promotions are:
+
+| Primitive type | v1, v2 valid type promotions | v3+ valid type promotions | Requirements |
+|`date`||`timestamp`, `timestamp_ns`| Promotion to `timestamptz` or `timestamptz_ns` is **not** allowed; values outside the promoted type's range must result in a runtime failure |
+|`float`|`double`|`double`||
+|`decimal(P, S)`|`decimal(P', S)` if `P' > P`|`decimal(P', S)` if `P' > P`| Widen precision only |
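The promotion rules visible in this excerpt could be checked along the following lines. This is only a sketch: the function name `can_promote` and the string encoding of types are assumptions made for illustration, the `int` to `long` rule is taken from the type table and the removed list rather than from a table row shown here, and rows elided from the diff are not covered.

```python
import re
from typing import Optional, Tuple

# Promotions visible in this excerpt only; elided table rows are not modeled.
_PROMOTIONS_V1_V2 = {"int": {"long"}, "float": {"double"}}
_PROMOTIONS_V3 = {
    "int": {"long"},
    "float": {"double"},
    "date": {"timestamp", "timestamp_ns"},
}
_DECIMAL = re.compile(r"decimal\((\d+),\s*(\d+)\)")


def _parse_decimal(type_name: str) -> Optional[Tuple[int, int]]:
    """Return (precision, scale) for a decimal type string, else None."""
    match = _DECIMAL.fullmatch(type_name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def can_promote(from_type: str, to_type: str, format_version: int) -> bool:
    """Check a primitive promotion against the rules shown above."""
    src, dst = _parse_decimal(from_type), _parse_decimal(to_type)
    if src is not None and dst is not None:
        # Widen precision only: P' > P with the scale unchanged.
        return dst[0] > src[0] and dst[1] == src[1]
    table = _PROMOTIONS_V3 if format_version >= 3 else _PROMOTIONS_V1_V2
    return to_type in table.get(from_type, set())
```

For example, `can_promote("date", "timestamp", 2)` is false while the same call with format version 3 is true, and `date` to `timestamptz` is rejected in every version.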
+
+Iceberg's Avro manifest format does not store the type of lower and upper bounds, and type promotion does not rewrite existing bounds. For example, when a `float` is promoted to `double`, existing data file bounds are encoded as 4 little-endian bytes rather than 8 little-endian bytes for `double`. To correctly decode the value, the original type at the time the file was written must be inferred according to the following table:
+
+| Current type | Length of bounds | Inferred type at write time |
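Only the `float` to `double` case is spelled out in the text above, so the following sketch covers just that case. The function name `decode_floating_bound` is hypothetical; the table rows elided from this diff are deliberately not reproduced.

```python
import struct


def decode_floating_bound(raw: bytes, current_type: str) -> float:
    """Decode a lower/upper bound for a column whose current type is float or double.

    The write-time type is inferred from the byte length, because the
    manifest does not record it and promotion does not rewrite bounds.
    """
    if current_type == "float":
        if len(raw) != 4:
            raise ValueError("float bounds must be 4 bytes")
        return struct.unpack("<f", raw)[0]
    if current_type == "double":
        if len(raw) == 4:
            # Written while the column was still a float.
            return struct.unpack("<f", raw)[0]
        if len(raw) == 8:
            return struct.unpack("<d", raw)[0]
        raise ValueError("double bounds must be 4 or 8 bytes")
    raise ValueError(f"unsupported type in this sketch: {current_type}")
```

A bound packed with `struct.pack("<f", 1.5)` therefore decodes to `1.5` even after the column has been promoted to `double`.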
Type promotion is not allowed for a field that is referenced by `source-id` or `source-ids` of a partition field if the partition transform would produce a different value after promoting the type. For example, `bucket[N]` produces different hash values for `34` and `"34"` (2017239379 != -427558391) but the same value for `34` and `34L`; when an `int` field is the source for a bucket partition field, it may be promoted to `long` but not to `string`. This may happen for the following type promotion cases:
+* `date` to `timestamp` or `timestamp_ns`
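To see why `int` to `long` is safe for a bucket source while `int` to `string` is not, here is a sketch of the hashing step. It assumes, based on the bucket transform details defined elsewhere in the spec and not shown in this excerpt, that the hash is 32-bit Murmur3 (x86 variant, seed 0), that `int` and `long` values are both hashed as 8-byte little-endian longs, and that strings are hashed as UTF-8 bytes. The function names are illustrative.

```python
import struct


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & 0xFFFFFFFF


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """32-bit Murmur3 (x86 variant), returned as a signed 32-bit integer."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & 0xFFFFFFFF
    n = len(data)
    body_len = n & ~3
    for i in range(0, body_len, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & 0xFFFFFFFF
        k = _rotl32(k, 15)
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF
    tail = data[body_len:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & 0xFFFFFFFF
        k = _rotl32(k, 15)
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
    # Finalization mix.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h - (1 << 32) if h >= (1 << 31) else h


def hash_long(value: int) -> int:
    return murmur3_32(struct.pack("<q", value))


def hash_int(value: int) -> int:
    # Assumption: ints are widened to long before hashing.
    return hash_long(value)


def hash_string(value: str) -> int:
    return murmur3_32(value.encode("utf-8"))
```

Under these assumptions `hash_int(34)` and `hash_long(34)` hash identical bytes and always agree, whereas `hash_string("34")` hashes different bytes, which is why the text above allows `int` to `long` but not `int` to `string` for a bucket source.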
Any struct, including a top-level schema, can evolve through deleting fields, adding new fields, renaming existing fields, reordering existing fields, or promoting a primitive using the valid type promotions. Adding a new field assigns a new ID for that field and for any nested fields. Renaming an existing field must change the name, but not the field ID. Deleting a field removes it from the current schema. Field deletion cannot be rolled back unless the field was nullable or if the current snapshot has not changed.
@@ -949,6 +973,7 @@ Maps with non-string keys must use an array representation with the `map` logica
|Type|Avro type|Notes|
|--- |--- |--- |
+|**`unknown`**|`null` or omitted||
|**`boolean`**|`boolean`||
|**`int`**|`int`||
|**`long`**|`long`||
@@ -1002,6 +1027,7 @@ Lists must use the [3-level representation](https://github.com/apache/parquet-fo
| Type | Parquet physical type | Logical type | Notes |
|**`boolean`**|`0x00` for false, non-zero byte for true |
|**`int`**| Stored as 4-byte little-endian |
|**`long`**| Stored as 8-byte little-endian |
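The three rows above can be sketched with Python's `struct` module. The function names are hypothetical, and writing `true` as `0x01` is an assumption: the row only requires a non-zero byte.

```python
import struct


def to_single_value_bytes(type_name: str, value) -> bytes:
    """Serialize one value using the layouts in the rows above."""
    if type_name == "boolean":
        # Assumption: 0x01 for true; any non-zero byte would satisfy the row.
        return b"\x01" if value else b"\x00"
    if type_name == "int":
        return struct.pack("<i", value)  # 4-byte little-endian, signed
    if type_name == "long":
        return struct.pack("<q", value)  # 8-byte little-endian, signed
    raise ValueError(f"unsupported type in this sketch: {type_name}")


def from_single_value_bytes(type_name: str, raw: bytes):
    """Deserialize one value written with the layouts in the rows above."""
    if type_name == "boolean":
        if len(raw) != 1:
            raise ValueError("boolean values are a single byte")
        return raw[0] != 0
    if type_name == "int":
        return struct.unpack("<i", raw)[0]
    if type_name == "long":
        return struct.unpack("<q", raw)[0]
    raise ValueError(f"unsupported type in this sketch: {type_name}")
```

For instance, the `int` value 34 serializes to the bytes `22 00 00 00`, and any non-zero single byte reads back as `True` for `boolean`.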
@@ -1319,10 +1352,11 @@ This serialization scheme is for storing single values as individual binary valu
### Version 3
Default values are added to struct fields in v3.
+
* The `write-default` is a forward-compatible change because it is only used at write time. Old writers will fail because the field is missing.
* Tables with `initial-default` will be read correctly by older readers if `initial-default` is always null for optional fields. Otherwise, old readers will default optional columns with null. Old readers will fail to read required fields which are populated by `initial-default` because that default is not supported.
-Types `timestamp_ns` and `timestamptz_ns` are added in v3.
+Types `unknown`, `timestamp_ns`, and `timestamptz_ns` are added in v3.
All readers are required to read tables with unknown partition transforms, ignoring the unsupported partition fields when filtering.
@@ -1423,3 +1457,4 @@ Iceberg supports two types of histories for tables. A history of previous "curre
might indicate different snapshot IDs for a specific timestamp. The discrepancies can be caused by a variety of table operations (e.g. updating the `current-snapshot-id` can be used to set the snapshot of a table to any arbitrary snapshot, which might have a lineage derived from a table branch or no lineage at all).
When processing point-in-time queries, implementations should use the "snapshot-log" metadata to look up the table state at the given point in time. This ensures time-travel queries reflect the state of the table at the provided timestamp. For example, a SQL query like `SELECT * FROM prod.db.table TIMESTAMP AS OF '1986-10-26 01:21:00Z';` would find the snapshot of the Iceberg table just prior to '1986-10-26 01:21:00 UTC' in the snapshot log and use the metadata from that snapshot to perform the scan of the table. If no snapshot exists prior to the given timestamp or "snapshot-log" is not populated (it is an optional field), then systems should raise an informative error message about the missing metadata.
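The lookup described above might be sketched as follows. The function name `snapshot_id_as_of` is hypothetical, the log is modeled as a list of dicts with `timestamp-ms` and `snapshot-id` keys, and treating an entry stamped exactly at the requested time as a match is an assumption, since the text only says "just prior to".

```python
from bisect import bisect_right
from datetime import datetime


def snapshot_id_as_of(snapshot_log: list, as_of: datetime) -> int:
    """Return the ID of the latest snapshot-log entry at or before `as_of`.

    `as_of` should be timezone-aware so that the conversion to epoch
    milliseconds is unambiguous.
    """
    if not snapshot_log:
        raise LookupError("snapshot-log is missing or empty; cannot time travel")
    entries = sorted(snapshot_log, key=lambda e: e["timestamp-ms"])
    as_of_ms = int(as_of.timestamp() * 1000)
    # Index of the first entry strictly after the requested time.
    idx = bisect_right([e["timestamp-ms"] for e in entries], as_of_ms)
    if idx == 0:
        raise LookupError(f"no snapshot-log entry at or before {as_of.isoformat()}")
    return entries[idx - 1]["snapshot-id"]
```

Both failure paths raise an error with a descriptive message instead of silently falling back to the current snapshot, which matches the requirement to report missing metadata.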