58 commits
26ccfdb
PARQUET-423: Make writing Avro to Parquet less noisy
nielsbasjes Jan 12, 2016
f65ae99
PARQUET-423: Fix various missing bits
nielsbasjes Jan 12, 2016
d206a6f
Undo most of the experiment
nielsbasjes Jan 15, 2016
225bc33
Implement a few of the isXxxxEnabled methods in the Log class and…
nielsbasjes Jan 15, 2016
73da74a
PARQUET-212: Implement LIST read compatibility rules in Thrift
rdblue Jan 12, 2016
2aa830c
PARQUET-421: Fix mismatch of javadoc names and method parameters in m...
lw-lin Jan 12, 2016
96c862e
PARQUET-422: Fix a potential bug in MessageTypeParser where we ignore…
lw-lin Jan 12, 2016
fd9b49b
PARQUET-393: Update to parquet-format 2.3.1.
rdblue Jan 29, 2016
84facee
PARQUET-432: Complete a todo for method ColumnDescriptor.compareTo()
lw-lin Jan 29, 2016
0c054b0
PARQUET-480: Update for Cascading 3.0
Feb 1, 2016
e99b852
PARQUET-495: Fix mismatches in Types class comments
lw-lin Feb 1, 2016
6439fdd
PARQUET-410: Fix hanging subprocess call in merge script.
rdblue Feb 3, 2016
4a3a606
PARQUET-415: Fix ByteBuffer Binary serialization.
rdblue Feb 3, 2016
78f8224
PARQUET-509: Fix args passed to string format calls
nezihyigitbasi Feb 6, 2016
966153b
PARQUET-385 PARQUET-379: Fixes strict schema merging
liancheng Feb 6, 2016
811fbc4
PARQUET-430: Change to use Locale parameterized version of String.toU…
lw-lin Feb 16, 2016
177ec56
PARQUET-431: Make ParquetOutputFormat.memoryManager volatile
lw-lin Feb 16, 2016
b1b9f84
PARQUET-529: Avoid evoking job.toString() in ParquetLoader
lw-lin Feb 22, 2016
7f737f4
PARQUET-397: Implement Pig predicate pushdown
Feb 26, 2016
558027f
PARQUET-528: Fix flush() for RecordConsumer and implementations
lw-lin Mar 5, 2016
c49c8aa
PARQUET-384: Add dictionary filtering.
rdblue Mar 9, 2016
f109237
PARQUET-571: Fix potential leak in ParquetFileReader.close()
nezihyigitbasi Mar 25, 2016
970ab8d
PARQUET-581: Fix two instances of the conflation of the min and max row
Apr 17, 2016
521a179
PARQUET-580: Switch int[] initialization in IntList to be lazy
Apr 17, 2016
c56ec5a
PARQUET-584 show proper command usage when there's no arguments
Apr 19, 2016
6a17899
PARQUET-484: Warn when Decimal is stored as INT64 while could be stor…
lw-lin Apr 19, 2016
5779f42
PARQUET-358: Add support for Avro's logical types API.
rdblue Apr 20, 2016
a3670b1
PARQUET-585: Slowly ramp up sizes of int[]s in IntList to keep sizes …
Apr 21, 2016
0fee134
PARQUET-327. Show statistics in the dump output.
tomwhite Jul 7, 2015
8a8d8ee
PARQUET-225: Add support for INT64 delta encoding.
Mar 25, 2015
7e68058
PARQUET-548: Add EncodingStats.
rdblue Apr 23, 2016
ff8b7b5
PARQUET-569: Separate metadata filtering for ranges and offsets.
rdblue Apr 23, 2016
56aff64
PARQUET-560: Synchronize writes to the finishCalled variable
nezihyigitbasi Apr 25, 2016
fb3f250
PARQUET-372: Do not write stats larger than 4k.
rdblue May 5, 2016
f8cdfac
PARQUET-367: "parquet-cat -j" doesn't show all records.
sircodesalotOfTheRound May 5, 2016
2528d0c
PARQUET-544: Add closed flag to allow for closeable contract adherence
mred-cmd Jun 30, 2016
935512c
PARQUET-645: Fix null handling in DictionaryFilter.
rdblue Jun 30, 2016
c18cc10
PARQUET-642: Improve performance of ByteBuffer based read / write paths
Jun 30, 2016
87e658f
PARQUET-612: Add compression codec to FileEncodingsIT.
rdblue Jun 30, 2016
c9e6a67
PARQUET-654: Add option to disable record-level filtering.
rdblue Jul 13, 2016
687e7de
PARQUET-663: Update README.md
nihed Jul 15, 2016
8bc1849
PARQUET-389: Support predicate push down on missing columns.
rdblue Jul 15, 2016
1b4be63
PARQUET-540: Fix Cascading 3 build thrift and SLF4J.
rdblue Jul 15, 2016
cafea9a
PARQUET-651: Improve Avro's isElementType check.
rdblue Jul 17, 2016
ecce6ed
PARQUET-543: Remove unused boundedint package.
rdblue Jul 17, 2016
2fae7eb
PARQUET-667: Update committers lists to point to apache website
isnotinvain Jul 27, 2016
e1838bb
PARQUET-511: Integer overflow when counting values in column.
goreckm Aug 1, 2016
30a431e
PARQUET-668 - Provide option to disable auto crop feature in dump
djhworld Aug 3, 2016
db6296a
PARQUET-669: allow reading footers from provided file listing and str…
Aug 3, 2016
93ab78e
PARQUET-667: Add back + update committers table
isnotinvain Aug 5, 2016
f360b38
PARQUET-601: Add support to configure the encoding used by ValueWriters
Aug 11, 2016
f61432f
PARQUET-146: Move Parquet to Java 7
nezihyigitbasi Aug 15, 2016
a7b7d12
PARQUET-400: Replace CompatibilityUtil with SeekableInputStream.
rdblue Aug 16, 2016
f8aa5b1
PARQUET-460: merge multi parquet files to one file
flykobe Aug 16, 2016
b1ddae9
PARQUET-696: fix travis build. Broken because google code shut down
julienledem Aug 29, 2016
8ca7a4b
PARQUET-623: Fix DeltaByteArrayReader#skip.
rdblue Sep 8, 2016
ec69100
PARQUET-660: Ignore extension fields in protobuf messages.
jkukul Sep 8, 2016
3a44101
PARQUET-423: Replace old Log class with SLF4J Logging
nielsbasjes Sep 23, 2016
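
The headline change in this series (PARQUET-423) replaces parquet-mr's old Log class with SLF4J. A hedged sketch of the target logging pattern follows; the class name and message are illustrative, not taken from the diff:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LoggingSketch {
  private static final Logger LOG = LoggerFactory.getLogger(LoggingSketch.class);

  public void process(int rowGroup, int total) {
    // Parameterized messages are only formatted when the level is enabled,
    // which is what makes the old Log.isDebugEnabled()-style guards unnecessary.
    LOG.debug("processing row group {} of {}", rowGroup, total);
  }
}
```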
3 changes: 3 additions & 0 deletions .gitignore
@@ -16,3 +16,6 @@ dependency-reduced-pom.xml
parquet-scrooge/.cache
.idea/*
target/
.cache
*~
mvn_install.log
6 changes: 3 additions & 3 deletions .travis.yml
@@ -4,7 +4,7 @@ before_install:
- sudo apt-get install build-essential
- mkdir protobuf_install
- pushd protobuf_install
- wget http://protobuf.googlecode.com/files/protobuf-2.5.0.tar.gz
- wget https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
- tar xzf protobuf-2.5.0.tar.gz
- cd protobuf-2.5.0
- ./configure
@@ -24,8 +24,8 @@ before_install:
- cd ..

env:
- HADOOP_PROFILE=default
- HADOOP_PROFILE=hadoop-2
- HADOOP_PROFILE=hadoop-1 TEST_CODECS=uncompressed
- HADOOP_PROFILE=default TEST_CODECS=gzip,snappy

install: mvn install --batch-mode -DskipTests=true -Dmaven.javadoc.skip=true -Dsource.skip=true > mvn_install.log || mvn install --batch-mode -DskipTests=true -Dmaven.javadoc.skip=true -Dsource.skip=true > mvn_install.log || (cat mvn_install.log && false)
script: mvn test -P $HADOOP_PROFILE
8 changes: 8 additions & 0 deletions LICENSE
@@ -178,6 +178,14 @@

--------------------------------------------------------------------------------

This product includes code from Apache Avro.

Copyright: 2014 The Apache Software Foundation.
Home page: https://avro.apache.org/
License: http://www.apache.org/licenses/LICENSE-2.0

--------------------------------------------------------------------------------

This project includes code from Daniel Lemire's JavaFastPFOR project. The
"Lemire" bit packing source code produced by parquet-generator is derived from
the JavaFastPFOR project.
11 changes: 11 additions & 0 deletions NOTICE
@@ -43,3 +43,14 @@ with the following copyright notice:
See the License for the specific language governing permissions and
limitations under the License.

--------------------------------------------------------------------------------

This product includes code from Apache Avro, which includes the following in
its NOTICE file:

Apache Avro
Copyright 2010-2015 The Apache Software Foundation

This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).

20 changes: 6 additions & 14 deletions README.md
@@ -35,7 +35,7 @@ Parquet-MR uses Maven to build and depends on both the thrift and protoc compile
To build and install the protobuf compiler, run:

```
wget http://protobuf.googlecode.com/files/protobuf-2.5.0.tar.gz
wget https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
tar xzf protobuf-2.5.0.tar.gz
cd protobuf-2.5.0
./configure
@@ -62,7 +62,7 @@ sudo make install
Once protobuf and thrift are available in your path, you can build the project by running:

```
mvn clean install
LC_ALL=C mvn clean install
```

## Features
@@ -111,8 +111,8 @@ Avro conversion is implemented via the [parquet-avro](https://github.com/apache/
* the ParquetInputFormat can be provided a ReadSupport to materialize your own objects by implementing a RecordMaterializer

See the APIs:
* [Record conversion API](https://github.com/apache/parquet-mr/tree/master/parquet-column/src/main/java/parquet/io/api)
* [Hadoop API](https://github.com/apache/parquet-mr/tree/master/parquet-hadoop/src/main/java/parquet/hadoop/api)
* [Record conversion API](https://github.com/apache/parquet-mr/tree/master/parquet-column/src/main/java/org/apache/parquet/io/api)
* [Hadoop API](https://github.com/apache/parquet-mr/tree/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/api)
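
As a quick, hedged illustration of the read path these APIs feed into: parquet-avro's `AvroParquetReader` supplies the ReadSupport and RecordMaterializer described above. The file name below is a placeholder, and the snippet is a sketch rather than part of this diff:

```java
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.hadoop.ParquetReader;

public class ReadSketch {
  public static void main(String[] args) throws Exception {
    // AvroReadSupport provides the RecordMaterializer that turns Parquet
    // records into Avro GenericRecords.
    try (ParquetReader<GenericRecord> reader =
        AvroParquetReader.<GenericRecord>builder(new Path("data.parquet")).build()) {
      GenericRecord record;
      while ((record = reader.read()) != null) {
        System.out.println(record);
      }
    }
  }
}
```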

## Apache Pig integration
A [Loader](https://github.com/apache/parquet-mr/blob/master/parquet-pig/src/main/java/org/apache/parquet/pig/ParquetLoader.java) and a [Storer](https://github.com/apache/parquet-mr/blob/master/parquet-pig/src/main/java/org/apache/parquet/pig/ParquetStorer.java) are provided to read and write Parquet files with Apache Pig
@@ -202,16 +202,8 @@ Thank you for getting involved!

## Authors and contributors

* Julien Le Dem [@J_](http://twitter.com/J_) <https://github.com/julienledem>
* Tom White <https://github.com/tomwhite>
* Mickaël Lacour <https://github.com/mickaellcr>
* Remy Pecqueur <https://github.com/Lordshinjo>
* Avi Bryant <https://github.com/avibryant>
* Dmitriy Ryaboy [@squarecog](https://twitter.com/squarecog) <https://github.com/dvryaboy>
* Jonathan Coveney <http://twitter.com/jco>
* Brock Noland <https://github.com/brockn>
* Tianshuo Deng <https://github.com/tsdeng>
* and many others -- see the [Contributor report]( https://github.com/apache/parquet-mr/contributors)
* [Contributors](https://github.com/apache/parquet-mr/graphs/contributors)
* [Committers](dev/COMMITTERS.md)

## Code of Conduct

55 changes: 33 additions & 22 deletions dev/COMMITTERS.md
@@ -17,30 +17,41 @@
~ under the License.
-->

# Committers (in aplhabetical order):
# Committers (in alphabetical order):

| Name | Apache Id | github id | JIRA id |
|--------------------|------------|----------------|-------------|
| Aniket Mokashi | aniket486 | aniket486 | |
| Brock Noland | brock | brockn | |
| Cheng Lian | lian | liancheng | lian cheng |
| Chris Aniszczyk | caniszczyk | | |
| Dmitriy Ryaboy | dvryaboy | dvryaboy | |
| Jake Farrell | jfarrell | | |
| Jonathan Coveney | jcoveney | jcoveney | |
| Julien Le Dem | julien | julienledem | julienledem |
| Lukas Nalezenec | lukas | lukasnalezenec | |
| Marcel Kornacker | marcel | | |
| Mickael Lacour | mlacour | mickaellcr | |
| Nong Li | nong | nongli | |
| Remy Pecqueur | rpecqueur | Lordshinjo | |
| Ryan Blue | blue | rdblue | |
| Sergio Pena | spena | spena | spena |
| Tianshuo Deng | tianshuo | tsdeng | |
| Tom White | tomwhite | tomwhite | |
| Wesley Graham Peck | wesleypeck | wesleypeck | |
The official list of committers can be found here: [Apache Parquet Committers and PMC](http://people.apache.org/committers-by-project.html#parquet)

Reviewing guidelines:
Below is more information about each committer (in alphabetical order). If this information becomes out of date, please send a PR to update!

| Name | Apache Id | github id | JIRA id |
|------------------------|-----------------|---------------------|----------------|
| Alex Levenson | alexlevenson | @isnotinvain | alexlevenson |
| Aniket Mokashi | aniket486 | @aniket486 | |
| Brock Noland | brock | @brockn | |
| Cheng Lian | lian | @liancheng | liancheng |
| Chris Aniszczyk | caniszczyk | @caniszczyk | |
| Chris Mattmann | mattmann | @chrismattmann | |
| Daniel C. Weeks | dweeks | @danielcweeks | |
| Dmitriy Ryaboy | dvryaboy | @dvryaboy | |
| Jake Farrell | jfarrell | | |
| Jonathan Coveney | jcoveney | @jcoveney | |
| Julien Le Dem | julien | @julienledem | julienledem |
| Lukas Nalezenec | lukas | @lukasnalezenec | |
| Marcel Kornacker | marcel | @mkornacker | |
| Mickael Lacour | mlacour | @mickaellcr | |
| Nong Li | nong | @nongli | |
| Remy Pecqueur | rpecqueur | @Lordshinjo | |
| Roman Shaposhnik | rvs | @rvs | |
| Ryan Blue | blue | @rdblue | |
| Sergio Pena | spena | @spena | spena |
| Tianshuo Deng | tianshuo | @tsdeng | |
| Todd Lipcon | todd | @toddlipcon | |
| Tom White | tomwhite | @tomwhite | |
| Wes McKinney | wesm | @wesm | |
| Wesley Graham Peck | wesleypeck | @wesleypeck | |


# Reviewing guidelines:
Committers have the responsibility to give constructive and timely feedback on the pull requests.
Anybody can give feedback on a pull request but only committers can merge it.

4 changes: 2 additions & 2 deletions dev/merge_parquet_pr.py
@@ -81,9 +81,9 @@ def fail(msg):
def run_cmd(cmd):
try:
if isinstance(cmd, list):
return subprocess.check_output(cmd, stderr=subprocess.STDOUT)
return subprocess.check_output(cmd)
else:
return subprocess.check_output(cmd.split(" "), stderr = subprocess.STDOUT)
return subprocess.check_output(cmd.split(" "))
except subprocess.CalledProcessError as e:
# this avoids hiding the stdout / stderr of failed processes
print 'Command failed: %s' % cmd
13 changes: 8 additions & 5 deletions parquet-avro/pom.xml
@@ -32,10 +32,6 @@
<name>Apache Parquet Avro</name>
<url>https://parquet.apache.org</url>

<properties>
<avro.version>1.7.6</avro.version>
</properties>

<dependencies>
<dependency>
<groupId>org.apache.parquet</groupId>
@@ -71,7 +71,7 @@
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>11.0</version>
<version>${guava.version}</version>
<scope>test</scope>
</dependency>
<dependency>
@@ -87,6 +87,13 @@
<version>${slf4j.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.parquet</groupId>
<artifactId>parquet-hadoop</artifactId>
<version>${project.version}</version>
<type>test-jar</type>
<scope>test</scope>
</dependency>
</dependencies>

<build>
@@ -21,6 +21,8 @@
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;
import org.apache.avro.Conversion;
import org.apache.avro.LogicalType;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericArray;
import org.apache.avro.generic.GenericData;
@@ -111,6 +113,11 @@ public void add(Object value) {

@SuppressWarnings("unchecked")
private static <T> Class<T> getDatumClass(GenericData model, Schema schema) {
if (model.getConversionFor(schema.getLogicalType()) != null) {
// use generic classes to pass data to conversions
return null;
}

if (model instanceof SpecificData) {
return (Class<T>) ((SpecificData) model).getClass(schema);
}
@@ -133,7 +140,16 @@ private Schema.Field getAvroField(String parquetFieldName) {
}

private static Converter newConverter(Schema schema, Type type,
GenericData model, ParentValueContainer parent) {
GenericData model, ParentValueContainer setter) {

LogicalType logicalType = schema.getLogicalType();
// the expected type is always null because it is determined by the parent
// datum class, which never helps for generic. when logical types are added
// to specific, this should pass the expected type here.
Conversion<?> conversion = model.getConversionFor(logicalType);
ParentValueContainer parent = ParentValueContainer
.getConversionContainer(setter, conversion, schema);

if (schema.getType().equals(Schema.Type.BOOLEAN)) {
return new AvroConverters.FieldBooleanConverter(parent);
} else if (schema.getType().equals(Schema.Type.INT)) {
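
The converter hunks above route decoded values through an Avro Conversion when one is registered for a column's logical type (PARQUET-358). A hedged sketch of how such a conversion is registered on the read side follows, assuming Avro 1.8's GenericData/Conversions API and parquet-avro's reader builder; the exact wiring is not copied from this PR:

```java
import java.io.IOException;
import org.apache.avro.Conversions;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.hadoop.ParquetReader;

public class LogicalTypeSketch {
  public static ParquetReader<GenericRecord> open(Path path) throws IOException {
    GenericData model = new GenericData();
    // With a conversion registered, getDatumClass() above returns null and the
    // decoded value is handed to the conversion (e.g. decimals as BigDecimal).
    model.addLogicalTypeConversion(new Conversions.DecimalConversion());
    return AvroParquetReader.<GenericRecord>builder(path)
        .withDataModel(model)
        .build();
  }
}
```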
@@ -110,9 +110,9 @@ public RecordMaterializer<T> prepareForRead(
MessageType parquetSchema = readContext.getRequestedSchema();
Schema avroSchema;

if (readContext.getReadSupportMetadata().get(AVRO_READ_SCHEMA_METADATA_KEY) != null) {
if (metadata.get(AVRO_READ_SCHEMA_METADATA_KEY) != null) {
// use the Avro read schema provided by the user
avroSchema = new Schema.Parser().parse(readContext.getReadSupportMetadata().get(AVRO_READ_SCHEMA_METADATA_KEY));
avroSchema = new Schema.Parser().parse(metadata.get(AVRO_READ_SCHEMA_METADATA_KEY));
} else if (keyValueMetaData.get(AVRO_SCHEMA_METADATA_KEY) != null) {
// use the Avro schema from the file metadata if present
avroSchema = new Schema.Parser().parse(keyValueMetaData.get(AVRO_SCHEMA_METADATA_KEY));
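
This hunk appears to be AvroReadSupport.prepareForRead, preferring an explicitly supplied Avro read schema over the schema stored in the file footer. A hedged sketch of how a caller supplies that schema, assuming AvroReadSupport's public setter and Avro's SchemaBuilder; the projection schema below is a placeholder:

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.hadoop.conf.Configuration;
import org.apache.parquet.avro.AvroReadSupport;

public class ReadSchemaSketch {
  public static Configuration projectedConf() {
    // A projection schema with a subset of the file's fields; records are then
    // materialized against this schema instead of the full footer schema.
    Schema projection = SchemaBuilder.record("User").fields()
        .requiredString("name")
        .endRecord();
    Configuration conf = new Configuration();
    AvroReadSupport.setAvroReadSchema(conf, projection);
    return conf;
  }
}
```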