Skip to content

Commit 94e8fda

Browse files
committed
Merge branch 'master' into PR-171
2 parents 0a9ac9f + 2e62764 commit 94e8fda

File tree

607 files changed

+15793
-6500
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

607 files changed

+15793
-6500
lines changed

.travis.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -27,5 +27,5 @@ env:
2727
- HADOOP_PROFILE=default
2828
- HADOOP_PROFILE=hadoop-2
2929

30-
install: mvn install --batch-mode -DskipTests=true -Dmaven.javadoc.skip=true -Dsource.skip=true > mvn_install.log || cat mvn_install.log
30+
install: mvn install --batch-mode -DskipTests=true -Dmaven.javadoc.skip=true -Dsource.skip=true > mvn_install.log || mvn install --batch-mode -DskipTests=true -Dmaven.javadoc.skip=true -Dsource.skip=true > mvn_install.log || (cat mvn_install.log && false)
3131
script: mvn test -P $HADOOP_PROFILE

CHANGES.md

Lines changed: 118 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,124 @@
1919

2020
# Parquet #
2121

22+
### Version 1.6.0 ###
23+
24+
#### Bug
25+
26+
* [PARQUET-3](https://issues.apache.org/jira/browse/PARQUET-3) - tool to merge pull requests based on Spark
27+
* [PARQUET-4](https://issues.apache.org/jira/browse/PARQUET-4) - Use LRU caching for footers in ParquetInputFormat.
28+
* [PARQUET-8](https://issues.apache.org/jira/browse/PARQUET-8) - [parquet-scrooge] mvn eclipse:eclipse fails on parquet-scrooge
29+
* [PARQUET-9](https://issues.apache.org/jira/browse/PARQUET-9) - InternalParquetRecordReader will not read multiple blocks when filtering
30+
* [PARQUET-18](https://issues.apache.org/jira/browse/PARQUET-18) - Cannot read dictionary-encoded pages with all null values
31+
* [PARQUET-19](https://issues.apache.org/jira/browse/PARQUET-19) - NPE when an empty file is included in a Hive query that uses CombineHiveInputFormat
32+
* [PARQUET-21](https://issues.apache.org/jira/browse/PARQUET-21) - Fix reference to 'github-apache' in dev docs
33+
* [PARQUET-56](https://issues.apache.org/jira/browse/PARQUET-56) - Added an accessor for the Long column type in example Group
34+
* [PARQUET-62](https://issues.apache.org/jira/browse/PARQUET-62) - DictionaryValuesWriter dictionaries are corrupted by user changes.
35+
* [PARQUET-63](https://issues.apache.org/jira/browse/PARQUET-63) - Fixed-length columns cannot be dictionary encoded.
36+
* [PARQUET-66](https://issues.apache.org/jira/browse/PARQUET-66) - InternalParquetRecordWriter int overflow causes unnecessary memory check warning
37+
* [PARQUET-69](https://issues.apache.org/jira/browse/PARQUET-69) - Add committer doc and REVIEWERS files
38+
* [PARQUET-70](https://issues.apache.org/jira/browse/PARQUET-70) - PARQUET #36: Pig Schema Storage to UDFContext
39+
* [PARQUET-75](https://issues.apache.org/jira/browse/PARQUET-75) - String decode using 'new String' is slow
40+
* [PARQUET-80](https://issues.apache.org/jira/browse/PARQUET-80) - upgrade semver plugin version to 0.9.27
41+
* [PARQUET-82](https://issues.apache.org/jira/browse/PARQUET-82) - ColumnChunkPageWriteStore assumes pages are smaller than Integer.MAX\_VALUE
42+
* [PARQUET-88](https://issues.apache.org/jira/browse/PARQUET-88) - Fix pre-version enforcement.
43+
* [PARQUET-94](https://issues.apache.org/jira/browse/PARQUET-94) - ParquetScroogeScheme constructor ignores klass argument
44+
* [PARQUET-96](https://issues.apache.org/jira/browse/PARQUET-96) - parquet.example.data.Group is missing some methods
45+
* [PARQUET-97](https://issues.apache.org/jira/browse/PARQUET-97) - ProtoParquetReader builder factory method not static
46+
* [PARQUET-101](https://issues.apache.org/jira/browse/PARQUET-101) - Exception when reading data with parquet.task.side.metadata=false
47+
* [PARQUET-104](https://issues.apache.org/jira/browse/PARQUET-104) - Parquet writes empty Rowgroup at the end of the file
48+
* [PARQUET-106](https://issues.apache.org/jira/browse/PARQUET-106) - Relax InputSplit Protections
49+
* [PARQUET-107](https://issues.apache.org/jira/browse/PARQUET-107) - Add option to disable summary metadata aggregation after MR jobs
50+
* [PARQUET-114](https://issues.apache.org/jira/browse/PARQUET-114) - Sample NanoTime class serializes and deserializes Timestamp incorrectly
51+
* [PARQUET-122](https://issues.apache.org/jira/browse/PARQUET-122) - make parquet.task.side.metadata=true by default
52+
* [PARQUET-124](https://issues.apache.org/jira/browse/PARQUET-124) - parquet.hadoop.ParquetOutputCommitter.commitJob() throws parquet.io.ParquetEncodingException
53+
* [PARQUET-132](https://issues.apache.org/jira/browse/PARQUET-132) - AvroParquetInputFormat should use a parameterized type
54+
* [PARQUET-135](https://issues.apache.org/jira/browse/PARQUET-135) - Input location is not getting set for the getStatistics in ParquetLoader when using two different loaders within a Pig script.
55+
* [PARQUET-136](https://issues.apache.org/jira/browse/PARQUET-136) - NPE thrown in StatisticsFilter when all values in a string/binary column trunk are null
56+
* [PARQUET-142](https://issues.apache.org/jira/browse/PARQUET-142) - parquet-tools doesn't filter \_SUCCESS file
57+
* [PARQUET-145](https://issues.apache.org/jira/browse/PARQUET-145) - InternalParquetRecordReader.close() should not throw an exception if initialization has failed
58+
* [PARQUET-150](https://issues.apache.org/jira/browse/PARQUET-150) - Merge script requires ':' in PR names
59+
* [PARQUET-157](https://issues.apache.org/jira/browse/PARQUET-157) - Divide by zero in logging code
60+
* [PARQUET-159](https://issues.apache.org/jira/browse/PARQUET-159) - paquet-hadoop tests fail to compile
61+
* [PARQUET-162](https://issues.apache.org/jira/browse/PARQUET-162) - ParquetThrift should throw when unrecognized columns are passed to the column projection API
62+
* [PARQUET-168](https://issues.apache.org/jira/browse/PARQUET-168) - Wrong command line option description in parquet-tools
63+
* [PARQUET-173](https://issues.apache.org/jira/browse/PARQUET-173) - StatisticsFilter doesn't handle And properly
64+
* [PARQUET-174](https://issues.apache.org/jira/browse/PARQUET-174) - Fix Java6 compatibility
65+
* [PARQUET-176](https://issues.apache.org/jira/browse/PARQUET-176) - Parquet fails to parse schema contains '\r'
66+
* [PARQUET-180](https://issues.apache.org/jira/browse/PARQUET-180) - Parquet-thrift compile issue with 0.9.2.
67+
* [PARQUET-184](https://issues.apache.org/jira/browse/PARQUET-184) - Add release scripts and documentation
68+
* [PARQUET-186](https://issues.apache.org/jira/browse/PARQUET-186) - Poor performance in SnappyCodec because of string concat in tight loop
69+
* [PARQUET-187](https://issues.apache.org/jira/browse/PARQUET-187) - parquet-scrooge doesn't compile under 2.11
70+
* [PARQUET-188](https://issues.apache.org/jira/browse/PARQUET-188) - Parquet writes columns out of order (compared to the schema)
71+
* [PARQUET-189](https://issues.apache.org/jira/browse/PARQUET-189) - Support building parquet with thrift 0.9.0
72+
* [PARQUET-196](https://issues.apache.org/jira/browse/PARQUET-196) - parquet-tools command to get rowcount & size
73+
* [PARQUET-197](https://issues.apache.org/jira/browse/PARQUET-197) - parquet-cascading and the mapred API does not create metadata file
74+
* [PARQUET-202](https://issues.apache.org/jira/browse/PARQUET-202) - Typo in the connection info in the pom prevents publishing an RC
75+
* [PARQUET-207](https://issues.apache.org/jira/browse/PARQUET-207) - ParquetInputSplit end calculation bug
76+
* [PARQUET-208](https://issues.apache.org/jira/browse/PARQUET-208) - revert PARQUET-197
77+
* [PARQUET-214](https://issues.apache.org/jira/browse/PARQUET-214) - Avro: Regression caused by schema handling
78+
* [PARQUET-215](https://issues.apache.org/jira/browse/PARQUET-215) - Parquet Thrift should discard records with unrecognized union members
79+
* [PARQUET-216](https://issues.apache.org/jira/browse/PARQUET-216) - Decrease the default page size to 64k
80+
* [PARQUET-217](https://issues.apache.org/jira/browse/PARQUET-217) - Memory Manager's min allocation heuristic is not valid for schemas with many columns
81+
* [PARQUET-232](https://issues.apache.org/jira/browse/PARQUET-232) - minor compilation issue
82+
* [PARQUET-234](https://issues.apache.org/jira/browse/PARQUET-234) - Restore ParquetInputSplit methods from 1.5.0
83+
* [PARQUET-235](https://issues.apache.org/jira/browse/PARQUET-235) - Fix compatibility of parquet.metadata with 1.5.0
84+
* [PARQUET-236](https://issues.apache.org/jira/browse/PARQUET-236) - Check parquet-scrooge compatibility
85+
* [PARQUET-237](https://issues.apache.org/jira/browse/PARQUET-237) - Check ParquetWriter constructor compatibility with 1.5.0
86+
* [PARQUET-239](https://issues.apache.org/jira/browse/PARQUET-239) - Make AvroParquetReader#builder() static
87+
* [PARQUET-242](https://issues.apache.org/jira/browse/PARQUET-242) - AvroReadSupport.setAvroDataSupplier is broken
88+
89+
#### Improvement
90+
91+
* [PARQUET-2](https://issues.apache.org/jira/browse/PARQUET-2) - Adding Type Persuasion for Primitive Types
92+
* [PARQUET-25](https://issues.apache.org/jira/browse/PARQUET-25) - Pushdown predicates only work with hardcoded arguments
93+
* [PARQUET-52](https://issues.apache.org/jira/browse/PARQUET-52) - Improve the encoding fall back mechanism for Parquet 2.0
94+
* [PARQUET-57](https://issues.apache.org/jira/browse/PARQUET-57) - Make dev commit script easier to use
95+
* [PARQUET-61](https://issues.apache.org/jira/browse/PARQUET-61) - Avoid fixing protocol events when there is not required field missing
96+
* [PARQUET-74](https://issues.apache.org/jira/browse/PARQUET-74) - Use thread local decoder cache in Binary toStringUsingUTF8()
97+
* [PARQUET-79](https://issues.apache.org/jira/browse/PARQUET-79) - Add thrift streaming API to read metadata
98+
* [PARQUET-84](https://issues.apache.org/jira/browse/PARQUET-84) - Add an option to read the rowgroup metadata on the task side.
99+
* [PARQUET-87](https://issues.apache.org/jira/browse/PARQUET-87) - Better and unified API for projection pushdown on cascading scheme
100+
* [PARQUET-89](https://issues.apache.org/jira/browse/PARQUET-89) - All Parquet CI tests should be run against hadoop-2
101+
* [PARQUET-92](https://issues.apache.org/jira/browse/PARQUET-92) - Parallel Footer Read Control
102+
* [PARQUET-105](https://issues.apache.org/jira/browse/PARQUET-105) - Refactor and Document Parquet Tools
103+
* [PARQUET-108](https://issues.apache.org/jira/browse/PARQUET-108) - Parquet Memory Management in Java
104+
* [PARQUET-115](https://issues.apache.org/jira/browse/PARQUET-115) - Pass a filter object to user defined predicate in filter2 api
105+
* [PARQUET-116](https://issues.apache.org/jira/browse/PARQUET-116) - Pass a filter object to user defined predicate in filter2 api
106+
* [PARQUET-117](https://issues.apache.org/jira/browse/PARQUET-117) - implement the new page format for Parquet 2.0
107+
* [PARQUET-119](https://issues.apache.org/jira/browse/PARQUET-119) - add data\_encodings to ColumnMetaData to enable dictionary based predicate push down
108+
* [PARQUET-121](https://issues.apache.org/jira/browse/PARQUET-121) - Allow Parquet to build with Java 8
109+
* [PARQUET-128](https://issues.apache.org/jira/browse/PARQUET-128) - Optimize the parquet RecordReader implementation when: A. filterpredicate is pushed down , B. filterpredicate is pushed down on a flat schema
110+
* [PARQUET-133](https://issues.apache.org/jira/browse/PARQUET-133) - Upgrade snappy-java to 1.1.1.6
111+
* [PARQUET-134](https://issues.apache.org/jira/browse/PARQUET-134) - Enhance ParquetWriter with file creation flag
112+
* [PARQUET-140](https://issues.apache.org/jira/browse/PARQUET-140) - Allow clients to control the GenericData object that is used to read Avro records
113+
* [PARQUET-141](https://issues.apache.org/jira/browse/PARQUET-141) - improve parquet scrooge integration
114+
* [PARQUET-160](https://issues.apache.org/jira/browse/PARQUET-160) - Simplify CapacityByteArrayOutputStream
115+
* [PARQUET-165](https://issues.apache.org/jira/browse/PARQUET-165) - A benchmark module for Parquet would be nice
116+
* [PARQUET-177](https://issues.apache.org/jira/browse/PARQUET-177) - MemoryManager ensure minimum Column Chunk size
117+
* [PARQUET-181](https://issues.apache.org/jira/browse/PARQUET-181) - Scrooge Write Support
118+
* [PARQUET-191](https://issues.apache.org/jira/browse/PARQUET-191) - Avro schema conversion incorrectly converts maps with nullable values.
119+
* [PARQUET-192](https://issues.apache.org/jira/browse/PARQUET-192) - Avro maps drop null values
120+
* [PARQUET-193](https://issues.apache.org/jira/browse/PARQUET-193) - Avro: Implement read compatibility rules for nested types
121+
* [PARQUET-203](https://issues.apache.org/jira/browse/PARQUET-203) - Consolidate PathFilter for hidden files
122+
* [PARQUET-204](https://issues.apache.org/jira/browse/PARQUET-204) - Directory support for parquet-schema
123+
* [PARQUET-210](https://issues.apache.org/jira/browse/PARQUET-210) - JSON output for parquet-cat
124+
125+
#### New Feature
126+
127+
* [PARQUET-22](https://issues.apache.org/jira/browse/PARQUET-22) - Parquet #13: Backport of HIVE-6938
128+
* [PARQUET-49](https://issues.apache.org/jira/browse/PARQUET-49) - Create a new filter API that supports filtering groups of records based on their statistics
129+
* [PARQUET-64](https://issues.apache.org/jira/browse/PARQUET-64) - Add new logical types to parquet-column
130+
* [PARQUET-123](https://issues.apache.org/jira/browse/PARQUET-123) - Add dictionary support to AvroIndexedRecordReader
131+
* [PARQUET-198](https://issues.apache.org/jira/browse/PARQUET-198) - parquet-cascading Add Parquet Avro Scheme
132+
133+
#### Task
134+
135+
* [PARQUET-50](https://issues.apache.org/jira/browse/PARQUET-50) - Remove items from semver blacklist
136+
* [PARQUET-139](https://issues.apache.org/jira/browse/PARQUET-139) - Avoid reading file footers in parquet-avro InputFormat
137+
* [PARQUET-190](https://issues.apache.org/jira/browse/PARQUET-190) - Fix an inconsistent Javadoc comment of ReadSupport.prepareForRead
138+
* [PARQUET-230](https://issues.apache.org/jira/browse/PARQUET-230) - Add build instructions to the README
139+
22140
### Version 1.5.0 ###
23141
* ISSUE [399](https://github.com/Parquet/parquet-mr/pull/399): Fixed resetting stats after writePage bug, unit testing of readFooter
24142
* ISSUE [397](https://github.com/Parquet/parquet-mr/pull/397): Fixed issue with column pruning when using requested schema

CONTRIBUTING.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -18,13 +18,13 @@
1818
-->
1919

2020
# Contributing
21-
We prefer to receive contributions in the form of GitHub pull requests. Please send pull requests against the [github.com/apache/incubator-parquet-mr](https://github.com/apache/incubator-parquet-mr) repository. If you've previously forked Parquet from its old location, you will need to add a remote or update your origin remote to https://github.com/apache/incubator-parquet-mr.git
21+
We prefer to receive contributions in the form of GitHub pull requests. Please send pull requests against the [github.com/apache/parquet-mr](https://github.com/apache/parquet-mr) repository. If you've previously forked Parquet from its old location, you will need to add a remote or update your origin remote to https://github.com/apache/parquet-mr.git
2222
Here are a few tips to get your contribution in:
2323

2424
1. Break your work into small, single-purpose patches if possible. It’s much harder to merge in a large change with a lot of disjoint features.
2525
2. Create a JIRA for your patch on the [Parquet Project JIRA](https://issues.apache.org/jira/browse/PARQUET).
26-
3. Submit the patch as a GitHub pull request against the master branch. For a tutorial, see the GitHub guides on forking a repo and sending a pull request. Prefix your pull request name with the JIRA name (ex: https://github.com/apache/incubator-parquet-mr/pull/5).
26+
3. Submit the patch as a GitHub pull request against the master branch. For a tutorial, see the GitHub guides on forking a repo and sending a pull request. Prefix your pull request name with the JIRA name (ex: https://github.com/apache/parquet-mr/pull/117).
2727
4. Make sure that your code passes the unit tests. You can run the tests with `mvn test` in the root directory.
2828
5. Add new unit tests for your code.
2929

30-
If you’d like to report a bug but don’t have time to fix it, you can still post it to our [issue tracker](https://issues.apache.org/jira/browse/PARQUET), or email the mailing list (dev@parquet.incubator.apache.org).
30+
If you’d like to report a bug but don’t have time to fix it, you can still post it to our [issue tracker](https://issues.apache.org/jira/browse/PARQUET), or email the mailing list (dev@parquet.apache.org).

0 commit comments

Comments
 (0)