Can Parquet and Orc sources support path expressions on the schema projection? #232

Open
hannesmiller opened this issue Jan 25, 2017 · 0 comments
hannesmiller commented Jan 25, 2017

  • I can successfully write rows containing structs to Parquet like so:
    val personDetailsStruct = Field.createStructField("PERSON_DETAILS",
      Seq(
        Field("NAME", StringType),
        Field("AGE", IntType.Signed),
        Field("SALARY", DecimalType(Precision(38), Scale(5))),
        Field("CREATION_TIME", TimestampMillisType)
      )
    )
    val schema = StructType(personDetailsStruct)

    val rows = Vector(
      Vector(Vector("Fred", 50, BigDecimal("50000.99000"), new Timestamp(System.currentTimeMillis()))),
      Vector(Vector("Gary", 50, BigDecimal("20000.34000"), new Timestamp(System.currentTimeMillis()))),
      Vector(Vector("Alice", 50, BigDecimal("99999.98000"), new Timestamp(System.currentTimeMillis())))
    )
    Frame.fromValues(schema, rows)
      .to(ParquetSink(parquetFilePath))
  • But reading it back fails when the projection uses the path expression PERSON_DETAILS.NAME:
    ParquetSource(parquetFilePath)
      .withProjection("PERSON_DETAILS.NAME")
      .toFrame()
      .collect()
      .foreach(row => println(row))
  • Note that path expressions do work with withPredicate:
    ParquetSource(parquetFilePath)
      .withPredicate(Predicate.or(Predicate.equals("PERSON_DETAILS.NAME", "Alice"), Predicate.equals("PERSON_DETAILS.NAME", "Gary")))
      .toFrame()
      .collect()
      .foreach(row => println(row))
  • Note that a Hive table pointing at this file works with the following HiveQL syntax:
    hive> select person_details.name, person_details.age
        > from struct_person
        > where person_details.name in ('Alice', 'Gary' );
    OK
    Gary    50
    Alice   50
    Time taken: 0.067 seconds, Fetched: 2 row(s)
    hive>
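
For illustration, here is a minimal sketch of what a path-aware projection would have to do: split the dotted expression, walk the nested schema one segment at a time, and then follow the resolved indexes into each row. The types below (`DataType`, `Field`, `StructType`) and the `PathProjection` object are hypothetical stand-ins written for this example. They are not the Eel SDK classes and say nothing about how `ParquetSource` or the Orc source would actually implement it.

```scala
// Hypothetical stand-in types for this sketch only; NOT the Eel SDK classes.
sealed trait DataType
case object StringType extends DataType
case object IntType extends DataType
final case class Field(name: String, dataType: DataType)
final case class StructType(fields: Vector[Field]) extends DataType

object PathProjection {

  // Resolve a dotted path such as "PERSON_DETAILS.NAME" against a nested schema.
  // Returns the field index at each nesting level, or None if a segment is missing
  // or the path tries to descend into a non-struct field.
  def resolve(schema: StructType, path: String): Option[List[Int]] = {
    def loop(struct: StructType, segments: List[String]): Option[List[Int]] =
      segments match {
        case Nil => Some(Nil)
        case head :: tail =>
          val idx = struct.fields.indexWhere(_.name == head)
          if (idx < 0) None
          else struct.fields(idx).dataType match {
            case nested: StructType => loop(nested, tail).map(idx :: _)
            case _ if tail.isEmpty  => Some(List(idx))
            case _                  => None
          }
      }
    loop(schema, path.split('.').toList)
  }

  // Follow the resolved indexes into a row whose struct values are nested Vectors,
  // mirroring the row layout used in the write example above.
  def extract(row: Vector[Any], indexes: List[Int]): Any =
    indexes.foldLeft(row: Any) {
      case (v: Vector[_], i) => v(i)
      case (other, _)        => sys.error(s"Cannot descend into non-struct value: $other")
    }
}

// Usage with a cut-down version of the PERSON_DETAILS schema
val sketchSchema = StructType(Vector(
  Field("PERSON_DETAILS", StructType(Vector(
    Field("NAME", StringType),
    Field("AGE", IntType)
  )))
))
val sketchRow: Vector[Any] = Vector(Vector("Fred", 50))

val namePath = PathProjection.resolve(sketchSchema, "PERSON_DETAILS.NAME")
val name     = namePath.map(PathProjection.extract(sketchRow, _))
```

A real implementation would more likely push the path down into the Parquet read schema so that only the requested leaf columns are read, rather than extracting values after the full struct has been materialised.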
@sksamuel sksamuel added this to the 1.4 milestone Jul 14, 2017