org.apache.avro.SchemaParseException: Can't redefine: list #2058

Closed
heuermh opened this issue Sep 30, 2018 · 0 comments · Fixed by #2056

heuermh commented Sep 30, 2018

...
Cause: org.apache.avro.SchemaParseException: Can't redefine: list
  at org.apache.avro.Schema$Names.put(Schema.java:1128)
  at org.apache.avro.Schema$NamedSchema.writeNameRef(Schema.java:562)
  at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:690)
  at org.apache.avro.Schema$ArraySchema.toJson(Schema.java:805)
  at org.apache.avro.Schema$UnionSchema.toJson(Schema.java:882)
  at org.apache.avro.Schema$RecordSchema.fieldsToJson(Schema.java:716)
  at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:701)
  at org.apache.avro.Schema$UnionSchema.toJson(Schema.java:882)
  at org.apache.avro.Schema$RecordSchema.fieldsToJson(Schema.java:716)
  at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:701)
  at org.apache.avro.Schema.toString(Schema.java:324)
  at org.apache.avro.SchemaCompatibility.checkReaderWriterCompatibility(SchemaCompatibility.java:68)
  at org.apache.parquet.avro.AvroRecordConverter.isElementType(AvroRecordConverter.java:866)
  at org.apache.parquet.avro.AvroIndexedRecordConverter$AvroArrayConverter.<init>(AvroIndexedRecordConverter.java:333)
  at org.apache.parquet.avro.AvroIndexedRecordConverter.newConverter(AvroIndexedRecordConverter.java:172)
  at org.apache.parquet.avro.AvroIndexedRecordConverter.<init>(AvroIndexedRecordConverter.java:94)
  at org.apache.parquet.avro.AvroIndexedRecordConverter.newConverter(AvroIndexedRecordConverter.java:168)
  at org.apache.parquet.avro.AvroIndexedRecordConverter.<init>(AvroIndexedRecordConverter.java:94)
  at org.apache.parquet.avro.AvroIndexedRecordConverter.<init>(AvroIndexedRecordConverter.java:66)
  at org.apache.parquet.avro.AvroCompatRecordMaterializer.<init>(AvroCompatRecordMaterializer.java:34)
  at org.apache.parquet.avro.AvroReadSupport.newCompatMaterializer(AvroReadSupport.java:144)
  at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:136)
  at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:204)
  at org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:182)
  at org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140)
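
For reference, a hedged sketch of a read that goes through the code path in this stack trace (AvroReadSupport -> AvroCompatRecordMaterializer -> AvroIndexedRecordConverter). This is not the exact command that produced the report; the ADAM load call and the SparkContext sc are assumptions, and the directory names are taken from the schema dumps below.

import org.bdgenomics.adam.rdd.ADAMContext._

// Reading the Dataset-written directory back through ADAM's Avro/Parquet
// RDD path fails while materializing records (hypothetical reproduction):
val fromDataset = sc.loadVariants("variants-from-dataset.adam")
fromDataset.rdd.count()  // org.apache.avro.SchemaParseException: Can't redefine: list

// The RDD-written directory, with the Avro-style list encoding shown in the
// second schema dump below, is expected to read back without error:
val fromRdd = sc.loadVariants("variants-from-rdd.adam")
fromRdd.rdd.count()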
$ parquet-tools schema variants-from-dataset.adam/part-00000-20884749-ccd3-4a54-8700-104ac2f709d1-c000.snappy.parquet
message spark_schema {
  optional binary contigName (UTF8);
  optional int64 start;
  optional int64 end;
  optional group names (LIST) {
    repeated group list {
      optional binary element (UTF8);
    }
  }
  optional boolean splitFromMultiAllelic;
  optional binary referenceAllele (UTF8);
  optional binary alternateAllele (UTF8);
  optional double quality;
  optional boolean filtersApplied;
  optional boolean filtersPassed;
  optional group filtersFailed (LIST) {
    repeated group list {
      optional binary element (UTF8);
    }
  }
  optional group annotation {
    optional binary ancestralAllele (UTF8);
    optional int32 alleleCount;
    optional int32 readDepth;
    optional int32 forwardReadDepth;
    optional int32 reverseReadDepth;
    optional int32 referenceReadDepth;
    optional int32 referenceForwardReadDepth;
    optional int32 referenceReverseReadDepth;
    optional float alleleFrequency;
    optional binary cigar (UTF8);
    optional boolean dbSnp;
    optional boolean hapMap2;
    optional boolean hapMap3;
    optional boolean validated;
    optional boolean thousandGenomes;
    optional boolean somatic;
    optional group transcriptEffects (LIST) {
      repeated group list {
        optional group element {
          optional binary alternateAllele (UTF8);
          optional group effects (LIST) {
            repeated group list {
              optional binary element (UTF8);
            }
          }
          optional binary geneName (UTF8);
          optional binary geneId (UTF8);
          optional binary featureType (UTF8);
          optional binary featureId (UTF8);
          optional binary biotype (UTF8);
          optional int32 rank;
          optional int32 total;
          optional binary genomicHgvs (UTF8);
          optional binary transcriptHgvs (UTF8);
          optional binary proteinHgvs (UTF8);
          optional int32 cdnaPosition;
          optional int32 cdnaLength;
          optional int32 cdsPosition;
          optional int32 cdsLength;
          optional int32 proteinPosition;
          optional int32 proteinLength;
          optional int32 distance;
          optional group messages (LIST) {
            repeated group list {
              optional binary element (UTF8);
            }
          }
        }
      }
    }
    optional group attributes (MAP) {
      repeated group key_value {
        required binary key (UTF8);
        optional binary value (UTF8);
      }
    }
  }
}
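
A hedged sketch of how a file with the spark_schema layout above typically comes about (the exact save call behind this report is not shown): writing through the Spark SQL Dataset/DataFrame writer, which by default (spark.sql.parquet.writeLegacyFormat=false) emits the standard 3-level LIST encoding with inner groups named list and element. Here variantsDs is a hypothetical Dataset of variants.

// Hypothetical Dataset save; produces the nested list/element groups above.
variantsDs.write.parquet("variants-from-dataset.adam")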
$ parquet-tools schema variants-from-rdd.adam/part-r-00000.gz.parquet
message org.bdgenomics.formats.avro.Variant {
  optional binary contigName (UTF8);
  optional int64 start;
  optional int64 end;
  required group names (LIST) {
    repeated binary array (UTF8);
  }
  optional boolean splitFromMultiAllelic;
  optional binary referenceAllele (UTF8);
  optional binary alternateAllele (UTF8);
  optional double quality;
  optional boolean filtersApplied;
  optional boolean filtersPassed;
  required group filtersFailed (LIST) {
    repeated binary array (UTF8);
  }
  optional group annotation {
    optional binary ancestralAllele (UTF8);
    optional int32 alleleCount;
    optional int32 readDepth;
    optional int32 forwardReadDepth;
    optional int32 reverseReadDepth;
    optional int32 referenceReadDepth;
    optional int32 referenceForwardReadDepth;
    optional int32 referenceReverseReadDepth;
    optional float alleleFrequency;
    optional binary cigar (UTF8);
    optional boolean dbSnp;
    optional boolean hapMap2;
    optional boolean hapMap3;
    optional boolean validated;
    optional boolean thousandGenomes;
    optional boolean somatic;
    required group transcriptEffects (LIST) {
      repeated group array {
        optional binary alternateAllele (UTF8);
        required group effects (LIST) {
          repeated binary array (UTF8);
        }
        optional binary geneName (UTF8);
        optional binary geneId (UTF8);
        optional binary featureType (UTF8);
        optional binary featureId (UTF8);
        optional binary biotype (UTF8);
        optional int32 rank;
        optional int32 total;
        optional binary genomicHgvs (UTF8);
        optional binary transcriptHgvs (UTF8);
        optional binary proteinHgvs (UTF8);
        optional int32 cdnaPosition;
        optional int32 cdnaLength;
        optional int32 cdsPosition;
        optional int32 cdsLength;
        optional int32 proteinPosition;
        optional int32 proteinLength;
        optional int32 distance;
        required group messages (LIST) {
          repeated binary array (ENUM);
        }
      }
    }
    required group attributes (MAP) {
      repeated group map (MAP_KEY_VALUE) {
        required binary key (UTF8);
        required binary value (UTF8);
      }
    }
  }
}
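Comparing the two dumps: the Dataset-written file uses Parquet's standard 3-level LIST encoding (optional groups containing repeated groups named list with element fields), while the RDD-written file uses the older 2-level Avro-style encoding (required groups with repeated ... array fields); in the RDD-written file the messages list is also encoded as ENUM rather than UTF8, and the attributes map uses MAP_KEY_VALUE with required values. A hedged workaround sketch, not from the report: the Dataset-written directory stays readable through Spark SQL itself, which understands the 3-level encoding, even though the Avro/Parquet RDD path above fails on it. The SparkSession spark is an assumption.

// Hedged workaround sketch: recover the data via the Spark SQL reader.
val df = spark.read.parquet("variants-from-dataset.adam")
df.printSchema()
df.count()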