Closed
Description
The `AvroParquetInputFormat` currently extends `ParquetInputFormat<IndexedRecord>`, which works for regular MapReduce jobs. However, Spark's `hadoopRDD` and [`newAPIHadoopRDD`](<https://people.apache.org/~pwendell/spark-1.1.0-rc3-docs/api/java/org/apache/spark/SparkContext.html#newAPIHadoopRDD(org.apache.hadoop.conf.Configuration, java.lang.Class, java.lang.Class, java.lang.Class)>) methods (correctly) create an RDD whose key and value types come from the `InputFormat`'s type parameters. As a result, the RDD's value type is always `IndexedRecord` rather than the concrete Avro record type being read, so callers have to cast each record.
Proposed change: `AvroParquetInputFormat` should be declared as `AvroParquetInputFormat<T extends IndexedRecord> extends ParquetInputFormat<T>`, so that its value type reflects the record type being read.
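To illustrate why the type parameter matters, here is a minimal, dependency-free sketch. It models the relevant types with hypothetical stand-ins (they are not the real Hadoop, Avro, Parquet, or Spark classes), and `readValues` is a hypothetical helper that, like `newAPIHadoopRDD`, derives its element type from the input format's `<K, V>` type arguments. `User` stands in for a generated Avro record class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, dependency-free stand-ins for the real Hadoop/Avro/Parquet types.
// They model only the generic type relationships discussed in this issue.
public class InputFormatTypeDemo {

    // Stand-in for org.apache.avro.generic.IndexedRecord.
    interface IndexedRecord {
        Object get(int i);
    }

    // Stand-in for a generated Avro record class (hypothetical).
    static final class User implements IndexedRecord {
        private final String name;
        User(String name) { this.name = name; }
        public Object get(int i) { return name; }
        String getName() { return name; }
    }

    // Stand-in for Hadoop's InputFormat<K, V>; here it simply yields values.
    interface InputFormat<K, V> {
        List<V> values();
    }

    // Stand-in for ParquetInputFormat<T>, whose keys are Void.
    static class ParquetInputFormat<T> implements InputFormat<Void, T> {
        private final List<T> records = new ArrayList<>();
        void add(T record) { records.add(record); }
        public List<T> values() { return records; }
    }

    // Current shape: the value type is fixed to IndexedRecord.
    static class CurrentAvroParquetInputFormat extends ParquetInputFormat<IndexedRecord> {}

    // Proposed shape: the value type is a parameter bounded by IndexedRecord.
    static class ProposedAvroParquetInputFormat<T extends IndexedRecord>
            extends ParquetInputFormat<T> {}

    // Hypothetical helper: like newAPIHadoopRDD, the result's element type
    // is taken from the format's type arguments.
    static <K, V> List<V> readValues(InputFormat<K, V> format) {
        return new ArrayList<>(format.values());
    }

    static String firstNameCurrent() {
        CurrentAvroParquetInputFormat format = new CurrentAvroParquetInputFormat();
        format.add(new User("alice"));
        List<IndexedRecord> values = readValues(format);
        // The element type is IndexedRecord, so a cast is required.
        return ((User) values.get(0)).getName();
    }

    static String firstNameProposed() {
        ProposedAvroParquetInputFormat<User> format = new ProposedAvroParquetInputFormat<>();
        format.add(new User("alice"));
        List<User> values = readValues(format);
        // The element type is User, so no cast is needed.
        return values.get(0).getName();
    }

    public static void main(String[] args) {
        System.out.println(firstNameCurrent());
        System.out.println(firstNameProposed());
    }
}
```

In this model, the only difference between the two paths is the declared superclass: binding the value type to a parameter lets the caller's inferred element type follow the concrete record class instead of stopping at `IndexedRecord`.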
Note: This issue was originally created as PARQUET-132. Please see the migration documentation for further details.