Merged
1 change: 1 addition & 0 deletions data/README.md
@@ -32,6 +32,7 @@
| alltypes_tiny_pages.parquet | small page sizes with dictionary encoding with page index from [impala](https://github.com/apache/impala/tree/master/testdata/data/alltypes_tiny_pages.parquet). |
| alltypes_tiny_pages_plain.parquet | small page sizes with plain encoding with page index [impala](https://github.com/apache/impala/tree/master/testdata/data/alltypes_tiny_pages.parquet). |
| rle_boolean_encoding.parquet | optional boolean columns with RLE encoding |
| fixed_length_byte_array.parquet | optional FIXED_LEN_BYTE_ARRAY column with page index. See [fixed_length_byte_array.md](fixed_length_byte_array.md) for details. |
| datapage_v1-uncompressed-checksum.parquet | uncompressed INT32 columns in v1 data pages with a matching CRC |
| datapage_v1-snappy-compressed-checksum.parquet | compressed INT32 columns in v1 data pages with a matching CRC |
| datapage_v1-corrupt-checksum.parquet | uncompressed INT32 columns in v1 data pages with a mismatching CRC |
73 changes: 73 additions & 0 deletions data/fixed_length_byte_array.md
@@ -0,0 +1,73 @@
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->

`fixed_length_byte_array.parquet` was generated by parquet-mr version 1.13.0-SNAPSHOT.

It has a single `FIXED_LEN_BYTE_ARRAY` column of length 4.

It contains 1000 values written in descending order, 105 of which are nulls at random positions.
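
The statistics listed in the metadata (minimum `0x00000001`, maximum `0x000003E8`) are consistent with each non-null value being a counter stored as a 4-byte big-endian integer. The Python sketch below illustrates that layout. The big-endian counter encoding is an assumption inferred from the statistics, and `null_rows` is a hypothetical stand-in for the file's actual null positions, which are random and not recorded here.

```python
# Sketch only: the big-endian counter encoding is inferred from the statistics,
# and null_rows is a hypothetical stand-in for the file's random null positions.
NUM_ROWS = 1000
FLBA_SIZE = 4


def make_values(null_rows=frozenset()):
    """Return NUM_ROWS entries in descending order; None marks a null."""
    values = []
    for row in range(NUM_ROWS):
        if row in null_rows:
            values.append(None)
        else:
            # Row 0 holds 1000 (0x000003E8), the last row holds 1 (0x00000001).
            values.append((NUM_ROWS - row).to_bytes(FLBA_SIZE, "big"))
    return values


values = make_values(null_rows={3, 42})
non_null = [v for v in values if v is not None]
```

Because the values are big-endian and unsigned, byte-wise comparison agrees with numeric order, so the non-null entries are also descending when compared as raw bytes.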

# File Metadata (from parquet-cli meta command)
```
File path:  fixed_length_byte_array.parquet
Created by: parquet-mr version 1.13.0-SNAPSHOT (build d057b39d93014fe40f5067ee4a33621e65c91552)
Properties:
  writer.model.name: example
Schema:
message schema {
  required fixed_len_byte_array(4) flba_field;
}


Row group 0:  count: 1000  3.84 B records  start: 4  total(compressed): 3.749 kB total(uncompressed):3.749 kB
--------------------------------------------------------------------------------
            type      encodings  count  avg size  nulls  min / max
flba_field  FIXED[4]  _   _      1000   3.84 B    105    "0x00000001" / "0x000003E8"
```

# Column Index (from parquet-cli column-index command)
```
row-group 0:
column index for column flba_field:
Boundary order: DESCENDING
         null count  min         max
page-0            9  0x00000385  0x000003E8
page-1            9  0x00000321  0x00000384
page-2           19  0x000002BD  0x00000320
page-3           10  0x00000259  0x000002BC
page-4           13  0x000001F5  0x00000258
page-5           11  0x00000191  0x000001F4
page-6           11  0x0000012D  0x00000190
page-7            8  0x000000C9  0x0000012C
page-8            9  0x00000065  0x000000C8
page-9            6  0x00000001  0x00000064

offset index for column flba_field:
         offset  compressed size  first row index
page-0        4              390                0
page-1      394              390              100
page-2      784              350              200
page-3     1134              386              300
page-4     1520              373              400
page-5     1893              382              500
page-6     2275              382              600
page-7     2657              394              700
page-8     3051              390              800
page-9     3441              402              900
```
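
The two indexes above can be cross-checked against each other and against the row-group summary. The Python sketch below copies the page entries from the listings (it does not read the Parquet file) and checks that the pages are contiguous and that the boundaries are descending. The tuple layout and the `check_indexes` helper are illustrative names, not part of parquet-cli.

```python
# Page entries copied from the parquet-cli output above:
# (null count, min, max, offset, compressed size, first row index)
PAGES = [
    (9,  "00000385", "000003E8",    4, 390,   0),
    (9,  "00000321", "00000384",  394, 390, 100),
    (19, "000002BD", "00000320",  784, 350, 200),
    (10, "00000259", "000002BC", 1134, 386, 300),
    (13, "000001F5", "00000258", 1520, 373, 400),
    (11, "00000191", "000001F4", 1893, 382, 500),
    (11, "0000012D", "00000190", 2275, 382, 600),
    (8,  "000000C9", "0000012C", 2657, 394, 700),
    (9,  "00000065", "000000C8", 3051, 390, 800),
    (6,  "00000001", "00000064", 3441, 402, 900),
]


def check_indexes(pages):
    """Validate page contiguity and descending boundaries; return totals."""
    for prev, cur in zip(pages, pages[1:]):
        # Offset index: each page starts where the previous one ends.
        assert cur[3] == prev[3] + prev[4]
        # Column index: with DESCENDING order, mins and maxes never increase.
        assert bytes.fromhex(cur[1]) <= bytes.fromhex(prev[1])
        assert bytes.fromhex(cur[2]) <= bytes.fromhex(prev[2])
    total_nulls = sum(p[0] for p in pages)
    total_bytes = sum(p[4] for p in pages)
    return total_nulls, total_bytes


total_nulls, total_bytes = check_indexes(PAGES)
```

The totals come to 105 nulls and 3839 bytes. The null count matches the row-group summary, and 3839 bytes agrees with the reported 3.749 kB if a kB is taken as 1024 bytes.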
Binary file added data/fixed_length_byte_array.parquet
Binary file not shown.