
# Parquet #

### Version 1.14.0 ###

Release Notes - Parquet - Version 1.14.0

#### Bug

* [PARQUET-2260](https://issues.apache.org/jira/browse/PARQUET-2260) - Bloom filter bytes size shouldn't be larger than maxBytes size in the configuration
* [PARQUET-2266](https://issues.apache.org/jira/browse/PARQUET-2266) - Fix support for files without ColumnIndexes
* [PARQUET-2276](https://issues.apache.org/jira/browse/PARQUET-2276) - ParquetReader reads do not work with Hadoop version 2.8.5
* [PARQUET-2300](https://issues.apache.org/jira/browse/PARQUET-2300) - Update jackson-core 2.13.4 to a version without CVE PRISMA-2023-0067
* [PARQUET-2325](https://issues.apache.org/jira/browse/PARQUET-2325) - Fix parquet-cli's dictionary subcommand to work with FIXED_LEN_BYTE_ARRAY
* [PARQUET-2329](https://issues.apache.org/jira/browse/PARQUET-2329) - Fix wrong help messages of parquet-cli subcommands
* [PARQUET-2330](https://issues.apache.org/jira/browse/PARQUET-2330) - Fix convert-csv to show the correct position of the invalid record
* [PARQUET-2332](https://issues.apache.org/jira/browse/PARQUET-2332) - Fix unexpectedly disabled tests to be executed
* [PARQUET-2336](https://issues.apache.org/jira/browse/PARQUET-2336) - Add caching key to CodecFactory
* [PARQUET-2342](https://issues.apache.org/jira/browse/PARQUET-2342) - Parquet writer produced a corrupted file due to page value count overflow
* [PARQUET-2343](https://issues.apache.org/jira/browse/PARQUET-2343) - Fixes NPE when rewriting file with multiple rowgroups
* [PARQUET-2348](https://issues.apache.org/jira/browse/PARQUET-2348) - Recompression/Re-encrypt should rewrite bloomfilter
* [PARQUET-2354](https://issues.apache.org/jira/browse/PARQUET-2354) - Apparent race condition in CharsetValidator
* [PARQUET-2363](https://issues.apache.org/jira/browse/PARQUET-2363) - ParquetRewriter should encrypt the V2 page header
* [PARQUET-2365](https://issues.apache.org/jira/browse/PARQUET-2365) - Fixes NPE when rewriting column without column index
* [PARQUET-2408](https://issues.apache.org/jira/browse/PARQUET-2408) - Fix license header in .gitattributes
* [PARQUET-2420](https://issues.apache.org/jira/browse/PARQUET-2420) - ThriftParquetWriter converts thrift byte to int32 without adding logical type
* [PARQUET-2429](https://issues.apache.org/jira/browse/PARQUET-2429) - Direct buffer churn in NonBlockedDecompressor
* [PARQUET-2438](https://issues.apache.org/jira/browse/PARQUET-2438) - Fixes minMaxSize for BinaryColumnIndexBuilder
* [PARQUET-2442](https://issues.apache.org/jira/browse/PARQUET-2442) - Remove Parquet Site from parquet-mr
* [PARQUET-2448](https://issues.apache.org/jira/browse/PARQUET-2448) - parquet-avro does not support nested logical-type for avro <= 1.8
* [PARQUET-2449](https://issues.apache.org/jira/browse/PARQUET-2449) - Writing using LocalOutputFile creates a large buffer
* [PARQUET-2450](https://issues.apache.org/jira/browse/PARQUET-2450) - ParquetAvroReader throws exception projecting a single field of a repeated record type
* [PARQUET-2456](https://issues.apache.org/jira/browse/PARQUET-2456) - avro schema conversion may fail with name conflict when using fixed types
* [PARQUET-2457](https://issues.apache.org/jira/browse/PARQUET-2457) - Missing maven-scala-plugin version
* [PARQUET-2458](https://issues.apache.org/jira/browse/PARQUET-2458) - Java compiler should use release instead of source/target

#### New Feature

* [PARQUET-1647](https://issues.apache.org/jira/browse/PARQUET-1647) - Java support for Arrow's float16
* [PARQUET-2171](https://issues.apache.org/jira/browse/PARQUET-2171) - Implement vectored IO in parquet file format
* [PARQUET-2318](https://issues.apache.org/jira/browse/PARQUET-2318) - Implement a tool to list page headers

#### Improvement

* [PARQUET-1629](https://issues.apache.org/jira/browse/PARQUET-1629) - Page-level CRC checksum verification for DataPageV2
* [PARQUET-1822](https://issues.apache.org/jira/browse/PARQUET-1822) - Parquet without Hadoop dependencies
* [PARQUET-1942](https://issues.apache.org/jira/browse/PARQUET-1942) - Bump Apache Arrow 2.0.0
* [PARQUET-2060](https://issues.apache.org/jira/browse/PARQUET-2060) - Parquet corruption can cause infinite loop with Snappy
* [PARQUET-2212](https://issues.apache.org/jira/browse/PARQUET-2212) - Add ByteBuffer api for decryptors to allow direct memory to be decrypted
* [PARQUET-2254](https://issues.apache.org/jira/browse/PARQUET-2254) - Build a BloomFilter with a more precise size
* [PARQUET-2263](https://issues.apache.org/jira/browse/PARQUET-2263) - Upgrade maven-shade-plugin to 3.4.1
* [PARQUET-2265](https://issues.apache.org/jira/browse/PARQUET-2265) - AvroParquetWriter should default to data supplier model from Configuration
* [PARQUET-2267](https://issues.apache.org/jira/browse/PARQUET-2267) - Add dependabot to update dependencies
* [PARQUET-2268](https://issues.apache.org/jira/browse/PARQUET-2268) - Bump Thrift to 0.18.1
* [PARQUET-2272](https://issues.apache.org/jira/browse/PARQUET-2272) - Bump protobuf-java from 3.17.3 to 3.19.6
* [PARQUET-2273](https://issues.apache.org/jira/browse/PARQUET-2273) - Remove Travis from the repository
* [PARQUET-2274](https://issues.apache.org/jira/browse/PARQUET-2274) - Remove Yetus
* [PARQUET-2275](https://issues.apache.org/jira/browse/PARQUET-2275) - Upgrade `cyclonedx-maven-plugin` to 2.7.6
* [PARQUET-2277](https://issues.apache.org/jira/browse/PARQUET-2277) - Bump hadoop.version from 3.2.3 to 3.3.5
* [PARQUET-2278](https://issues.apache.org/jira/browse/PARQUET-2278) - Bump re2j from 1.1 to 1.7
* [PARQUET-2279](https://issues.apache.org/jira/browse/PARQUET-2279) - Bump slf4j.version from 1.7.22 to 1.7.33
* [PARQUET-2280](https://issues.apache.org/jira/browse/PARQUET-2280) - Bump h2 from 2.1.210 to 2.1.214
* [PARQUET-2282](https://issues.apache.org/jira/browse/PARQUET-2282) - Don't initialize HadoopCodec
* [PARQUET-2283](https://issues.apache.org/jira/browse/PARQUET-2283) - Remove Hadoop HiddenFileFilter
* [PARQUET-2290](https://issues.apache.org/jira/browse/PARQUET-2290) - Add CI for Hadoop 2
* [PARQUET-2291](https://issues.apache.org/jira/browse/PARQUET-2291) - Remove lingering japicmp exclusions
* [PARQUET-2292](https://issues.apache.org/jira/browse/PARQUET-2292) - Improve default SpecificRecord model selection for Avro{Write,Read}Support
* [PARQUET-2293](https://issues.apache.org/jira/browse/PARQUET-2293) - Bump guava from 27.0.1-jre to 31.1-jre
* [PARQUET-2294](https://issues.apache.org/jira/browse/PARQUET-2294) - Bump fastutil from 8.4.2 to 8.5.12
* [PARQUET-2295](https://issues.apache.org/jira/browse/PARQUET-2295) - Bump truth-proto-extension from 1.0 to 1.1.3
* [PARQUET-2296](https://issues.apache.org/jira/browse/PARQUET-2296) - Bump easymock from 3.4 to 5.1.0
* [PARQUET-2297](https://issues.apache.org/jira/browse/PARQUET-2297) - Encrypted files should not be checked for delta encoding problem
* [PARQUET-2301](https://issues.apache.org/jira/browse/PARQUET-2301) - Add missing argument in ParquetRewriter logging
* [PARQUET-2302](https://issues.apache.org/jira/browse/PARQUET-2302) - Bump joda-time from 2.9.7 to 2.12.5
* [PARQUET-2303](https://issues.apache.org/jira/browse/PARQUET-2303) - Bump cyclonedx-maven-plugin from 2.7.6 to 2.7.9
* [PARQUET-2304](https://issues.apache.org/jira/browse/PARQUET-2304) - Bump buildnumber-maven-plugin from 1.1 to 3.1.0
* [PARQUET-2305](https://issues.apache.org/jira/browse/PARQUET-2305) - Allow Parquet to Proto conversion even though Target Schema has fewer fields
* [PARQUET-2307](https://issues.apache.org/jira/browse/PARQUET-2307) - Bump zero-allocation-hashing from 0.9 to 0.16
* [PARQUET-2308](https://issues.apache.org/jira/browse/PARQUET-2308) - Bump powermock.version from 2.0.2 to 2.0.9
* [PARQUET-2309](https://issues.apache.org/jira/browse/PARQUET-2309) - Bump site-maven-plugin from 0.8 to 0.12
* [PARQUET-2312](https://issues.apache.org/jira/browse/PARQUET-2312) - Bump snappy-java from 1.1.8.3 to 1.1.10.1 in /parquet-hadoop
* [PARQUET-2314](https://issues.apache.org/jira/browse/PARQUET-2314) - Bump jackson.version from 2.15.0 to 2.15.2
* [PARQUET-2319](https://issues.apache.org/jira/browse/PARQUET-2319) - Upgrade Avro to version 1.11.2
* [PARQUET-2320](https://issues.apache.org/jira/browse/PARQUET-2320) - Bump jackson-databind from 2.14.2 to 2.15.2
* [PARQUET-2322](https://issues.apache.org/jira/browse/PARQUET-2322) - Bump h2 from 2.1.214 to 2.2.220 in /parquet-column
* [PARQUET-2324](https://issues.apache.org/jira/browse/PARQUET-2324) - Bump cobertura-maven-plugin from 2.5.2 to 2.7
* [PARQUET-2326](https://issues.apache.org/jira/browse/PARQUET-2326) - Bump jcommander from 1.72 to 1.82
* [PARQUET-2328](https://issues.apache.org/jira/browse/PARQUET-2328) - Add overwrite option to the parquet-cli's rewrite subcommand
* [PARQUET-2331](https://issues.apache.org/jira/browse/PARQUET-2331) - Allow convert-csv to take multiple input files
* [PARQUET-2333](https://issues.apache.org/jira/browse/PARQUET-2333) - Support bzip2 and xz compressions in the to-avro subcommand
* [PARQUET-2334](https://issues.apache.org/jira/browse/PARQUET-2334) - Allow the cat subcommand to take multiple files
* [PARQUET-2335](https://issues.apache.org/jira/browse/PARQUET-2335) - Allow the scan subcommand to take multiple files
* [PARQUET-2347](https://issues.apache.org/jira/browse/PARQUET-2347) - Add interface layer between Parquet and Hadoop Configuration
* [PARQUET-2349](https://issues.apache.org/jira/browse/PARQUET-2349) - Move from deprecated BytesCompressor/Decompressor to BytesInputCompressor/Decompressor
* [PARQUET-2357](https://issues.apache.org/jira/browse/PARQUET-2357) - Modest refactor of CapacityByteArrayOutputStream
* [PARQUET-2359](https://issues.apache.org/jira/browse/PARQUET-2359) - Simple Parquet Configuration implementation
* [PARQUET-2364](https://issues.apache.org/jira/browse/PARQUET-2364) - Encrypt all columns option
* [PARQUET-2366](https://issues.apache.org/jira/browse/PARQUET-2366) - Optimize random seek during rewriting
* [PARQUET-2368](https://issues.apache.org/jira/browse/PARQUET-2368) - Update japicmp to 1.18.1
* [PARQUET-2370](https://issues.apache.org/jira/browse/PARQUET-2370) - Crypto factory activation of "all column encryption" mode
* [PARQUET-2371](https://issues.apache.org/jira/browse/PARQUET-2371) - Resolve japicmp failure for CI
* [PARQUET-2372](https://issues.apache.org/jira/browse/PARQUET-2372) - Avoid unnecessary reading of RowGroup data during rewriting
* [PARQUET-2373](https://issues.apache.org/jira/browse/PARQUET-2373) - Improve I/O performance with bloom_filter_length
* [PARQUET-2374](https://issues.apache.org/jira/browse/PARQUET-2374) - Add metrics support for parquet file reader
* [PARQUET-2375](https://issues.apache.org/jira/browse/PARQUET-2375) - Extend vectorized bit unpacking benchmark for various bit sizes
* [PARQUET-2380](https://issues.apache.org/jira/browse/PARQUET-2380) - Decouple RewriteOptions from Hadoop classes
* [PARQUET-2383](https://issues.apache.org/jira/browse/PARQUET-2383) - Bump parquet-format to 2.10.0
* [PARQUET-2384](https://issues.apache.org/jira/browse/PARQUET-2384) - Mark toOriginalType as deprecated
* [PARQUET-2385](https://issues.apache.org/jira/browse/PARQUET-2385) - Don't initialize CodecFactory in ParquetWriter
* [PARQUET-2386](https://issues.apache.org/jira/browse/PARQUET-2386) - More consistent code style in parquet-mr
* [PARQUET-2387](https://issues.apache.org/jira/browse/PARQUET-2387) - Simplify `hasFieldsIgnored` expression
* [PARQUET-2388](https://issues.apache.org/jira/browse/PARQUET-2388) - Deprecate `CHARSETS` on `PlainValuesWriter`
* [PARQUET-2389](https://issues.apache.org/jira/browse/PARQUET-2389) - Remove redundant initializers
* [PARQUET-2390](https://issues.apache.org/jira/browse/PARQUET-2390) - Replace anonymous functions with lambdas
* [PARQUET-2391](https://issues.apache.org/jira/browse/PARQUET-2391) - Remove unnecessary unboxing
* [PARQUET-2392](https://issues.apache.org/jira/browse/PARQUET-2392) - Remove StringBuilder in `LogicalTypeAnnotation`
* [PARQUET-2393](https://issues.apache.org/jira/browse/PARQUET-2393) - Make `ColumnIOCreatorVisitor` static
* [PARQUET-2394](https://issues.apache.org/jira/browse/PARQUET-2394) - Use `computeIfAbsent` in `MessageColumnIO`
* [PARQUET-2395](https://issues.apache.org/jira/browse/PARQUET-2395) - Prefer `singletonList` over `asList`
* [PARQUET-2396](https://issues.apache.org/jira/browse/PARQUET-2396) - Refactor `ColumnIndexBuilder`
* [PARQUET-2397](https://issues.apache.org/jira/browse/PARQUET-2397) - Make use of `isEmpty`
* [PARQUET-2398](https://issues.apache.org/jira/browse/PARQUET-2398) - Make static variables final
* [PARQUET-2399](https://issues.apache.org/jira/browse/PARQUET-2399) - Use deprecated tag in Javadoc
* [PARQUET-2400](https://issues.apache.org/jira/browse/PARQUET-2400) - Update Spotless command in PR prompt to include vector plugins
* [PARQUET-2401](https://issues.apache.org/jira/browse/PARQUET-2401) - Synchronize on final fields
* [PARQUET-2406](https://issues.apache.org/jira/browse/PARQUET-2406) - Remove redundant valueOf calls
* [PARQUET-2407](https://issues.apache.org/jira/browse/PARQUET-2407) - Add custom .asf.yaml for finer-grained control of email notifications
* [PARQUET-2410](https://issues.apache.org/jira/browse/PARQUET-2410) - Use row count instead of value count to get row count from OffsetIndex
* [PARQUET-2413](https://issues.apache.org/jira/browse/PARQUET-2413) - Support custom file footer metadata via ParquetWriter
* [PARQUET-2417](https://issues.apache.org/jira/browse/PARQUET-2417) - Update NOTICE
* [PARQUET-2419](https://issues.apache.org/jira/browse/PARQUET-2419) - Reduce noisy logging when running test suite
* [PARQUET-2422](https://issues.apache.org/jira/browse/PARQUET-2422) - Prevent unwrapping of Hadoop filestreams
* [PARQUET-2425](https://issues.apache.org/jira/browse/PARQUET-2425) - AvroSchemaConverter doesn't support non-grouped repeated fields
* [PARQUET-2426](https://issues.apache.org/jira/browse/PARQUET-2426) - Add lz4_raw compression to README
* [PARQUET-2428](https://issues.apache.org/jira/browse/PARQUET-2428) - Make RawPagesReader support specified columns
* [PARQUET-2432](https://issues.apache.org/jira/browse/PARQUET-2432) - Use ByteBufferAllocator instead of hardcoded heap allocation
* [PARQUET-2436](https://issues.apache.org/jira/browse/PARQUET-2436) - More optimal memory usage in compression codecs
* [PARQUET-2437](https://issues.apache.org/jira/browse/PARQUET-2437) - Avoid flushing at Parquet writes after an exception
* [PARQUET-2439](https://issues.apache.org/jira/browse/PARQUET-2439) - Upgrade ZSTD-JNI to 1.5.5-11
* [PARQUET-2445](https://issues.apache.org/jira/browse/PARQUET-2445) - Fix log exception when FieldsMarker.visitedIndexes is empty
* [PARQUET-2446](https://issues.apache.org/jira/browse/PARQUET-2446) - ProtoParquetWriter does not support DynamicMessage
* [PARQUET-2451](https://issues.apache.org/jira/browse/PARQUET-2451) - Add BYTE_STREAM_SPLIT support for FIXED_LEN_BYTE_ARRAY, INT32 and INT64
* [PARQUET-2453](https://issues.apache.org/jira/browse/PARQUET-2453) - Add build-helper-maven-plugin for parquet-column/common module
* [PARQUET-2454](https://issues.apache.org/jira/browse/PARQUET-2454) - Invoke flush before closing the output stream in ParquetFileWriter
* [PARQUET-2463](https://issues.apache.org/jira/browse/PARQUET-2463) - Bump japicmp to 0.21.0

#### Test

* [PARQUET-2361](https://issues.apache.org/jira/browse/PARQUET-2361) - Reduce failure rate of unit test testParquetFileWithBloomFilterWithFpp

#### Task

* [PARQUET-2418](https://issues.apache.org/jira/browse/PARQUET-2418) - Add integration test for BYTE_STREAM_SPLIT

### Version 1.13.1 ###

Release Notes - Parquet - Version 1.13.1