Merged
1 change: 1 addition & 0 deletions data/README.md
@@ -33,6 +33,7 @@
| alltypes_tiny_pages_plain.parquet | small page sizes with plain encoding with page index [impala](https://github.com/apache/impala/tree/master/testdata/data/alltypes_tiny_pages.parquet). |
| rle_boolean_encoding.parquet | optional boolean columns with RLE encoding |
| fixed_length_byte_array.parquet | optional FIXED_LENGTH_BYTE_ARRAY column with page index. See [fixed_length_byte_array.md](fixed_length_byte_array.md) for details. |
| int32_with_null_pages.parquet | optional INT32 column with random null pages. See [int32_with_null_pages.md](int32_with_null_pages.md) for details. |
| datapage_v1-uncompressed-checksum.parquet | uncompressed INT32 columns in v1 data pages with a matching CRC |
| datapage_v1-snappy-compressed-checksum.parquet | compressed INT32 columns in v1 data pages with a matching CRC |
| datapage_v1-corrupt-checksum.parquet | uncompressed INT32 columns in v1 data pages with a mismatching CRC |
73 changes: 73 additions & 0 deletions data/int32_with_null_pages.md
@@ -0,0 +1,73 @@
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->

`int32_with_null_pages.parquet` is generated by parquet-mr version 1.13.0-SNAPSHOT.

It has a single column of int32 type with 1000 values and page index enabled.

Both the integer values and the null values are randomly generated. However, one null page (a page containing only nulls) is generated on purpose.

# File Metadata (from parquet-cli meta command)
```
File path: int32_with_null_pages.parquet
Created by: parquet-mr version 1.13.0-SNAPSHOT (build 433de8df33fcf31927f7b51456be9f53e64d48b9)
Properties:
writer.model.name: example
Schema:
message schema {
optional int32 int32_field;
}


Row group 0: count: 1000 3.33 B records start: 4 total(compressed): 3.250 kB total(uncompressed):3.250 kB
--------------------------------------------------------------------------------
type encodings count avg size nulls min / max
int32_field INT32 _ _ 1000 3.33 B 275 "-2136906554" / "2145722375"
```

# Column Index (from parquet-cli column-index command)
```
row-group 0:
column index for column int32_field:
Boundary order: UNORDERED
null count min max
page-0 8 -2135807632 2144701119
page-1 55 -2104090659 1745329571
page-2 100 <none> <none>
page-3 52 -2116849709 2077105757
page-4 16 -2048691758 2143189382
page-5 12 -2017923401 2087827129
page-6 5 -2136906554 2125689411
page-7 7 -2113313110 2145722375
page-8 8 -2046900272 2087168549
page-9 12 -1941944785 2078586537

offset index for column int32_field:
offset compressed size first row index
page-0 4 415 0
page-1 419 220 100
page-2 639 31 200
page-3 670 228 300
page-4 898 382 400
page-5 1280 402 500
page-6 1682 422 600
page-7 2104 411 700
page-8 2515 417 800
page-9 2932 400 900
```
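
The two listings above can be cross-checked against each other and against the file metadata. The following is a minimal sketch in plain Python, assuming the numbers are transcribed correctly from the parquet-cli output; the helper names (`rows_per_page`, `null_pages`, `pages_are_contiguous`) are hypothetical and not part of any Parquet library.

```python
# Values transcribed by hand from the parquet-cli listings above.
ROWS = 1000       # row count of row group 0
FILE_NULLS = 275  # null count reported by the meta command

# (null count, min, max) per page; None stands for "<none>".
column_index = [
    (8, -2135807632, 2144701119),
    (55, -2104090659, 1745329571),
    (100, None, None),
    (52, -2116849709, 2077105757),
    (16, -2048691758, 2143189382),
    (12, -2017923401, 2087827129),
    (5, -2136906554, 2125689411),
    (7, -2113313110, 2145722375),
    (8, -2046900272, 2087168549),
    (12, -1941944785, 2078586537),
]

# (offset, compressed size, first row index) per page.
offset_index = [
    (4, 415, 0),
    (419, 220, 100),
    (639, 31, 200),
    (670, 228, 300),
    (898, 382, 400),
    (1280, 402, 500),
    (1682, 422, 600),
    (2104, 411, 700),
    (2515, 417, 800),
    (2932, 400, 900),
]


def rows_per_page(offsets, total_rows):
    """Derive each page's row count from consecutive first row indexes."""
    firsts = [first for _, _, first in offsets] + [total_rows]
    return [nxt - cur for cur, nxt in zip(firsts, firsts[1:])]


def null_pages(col_idx, page_rows):
    """Return indexes of pages whose null count equals their row count."""
    return [
        i
        for i, ((nulls, lo, hi), rows) in enumerate(zip(col_idx, page_rows))
        if nulls == rows and lo is None and hi is None
    ]


def pages_are_contiguous(offsets):
    """Check that every page starts where the previous page ends."""
    return all(
        off + size == nxt_off
        for (off, size, _), (nxt_off, _, _) in zip(offsets, offsets[1:])
    )


page_rows = rows_per_page(offset_index, ROWS)
total_nulls = sum(nulls for nulls, _, _ in column_index)
total_bytes = sum(size for _, size, _ in offset_index)

print("null pages:", null_pages(column_index, page_rows))
print("per-page null counts sum to file null count:", total_nulls == FILE_NULLS)
print("pages contiguous:", pages_are_contiguous(offset_index))
print("total compressed bytes:", total_bytes)
```

Under these assumptions, page-2 is the only page whose null count equals its row count, the per-page null counts add up to the 275 nulls in the file metadata, and the page sizes add up to 3328 bytes, which matches the reported 3.250 kB if parquet-cli uses 1 kB = 1024 B.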
Binary file added data/int32_with_null_pages.parquet