Need an example of creating DDL for a Hive Parquet table with EEL #261

Open
hannesmiller opened this issue Mar 1, 2017 · 0 comments
hannesmiller commented Mar 1, 2017

  • Writing from a CsvSource to a HiveSink:
val schema = AvroSchemaFns.fromAvroSchema(new Schema.Parser().parse(new File("user.avsc")))
CsvSource(path)
  .withSchema(schema)
  .to(HiveSink("mydatabase", "myTable"))
  • Table fields: fname, lname, age, salary
  • Two partition keys: country and city
object EelCreateTableExample extends App {
  // Build the CREATE TABLE statement as a string; nothing is executed against Hive here.
  val createTableCommand = HiveDDL.showDDL(
    tableName = "mydatabase.mytable",
    // Partition keys are declared separately from the data fields.
    partitions = Seq(
      PartitionColumn("country", StringType),
      PartitionColumn("city", StringType)
    ),
    fields = Seq(
      Field("fname", StringType),
      Field("lname", StringType),
      Field("age", IntType.Signed),
      Field("salary", DecimalType(38, 5))
    ),
    tableType = TableType.EXTERNAL_TABLE,
    location = Some("hdfs://nameservice1/blah/mytable_location"),
    // Parquet SerDe and input/output formats.
    serde = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
    inputFormat = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
    outputFormat = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
    props = Map.empty,
    tableComment = Some("my lovely table"),
    ifNotExists = true
  )
  println(createTableCommand)
}
  • Output:
CREATE EXTERNAL TABLE IF NOT EXISTS `mydatabase.mytable` (
   `fname` string,
   `lname` string,
   `age` int,
   `salary` decimal(38,5))
PARTITIONED BY (
   `country` string,
   `city` string)
ROW FORMAT SERDE
   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'hdfs://nameservice1/blah/mytable_location'
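To make the mapping from parameters to DDL clauses explicit, here is a minimal, dependency-free sketch that renders a statement in the same layout as the output above. It is not EEL's implementation: `HiveDdlSketch`, `Column` and `createTable` are hypothetical names introduced only for illustration, and table properties and comments are left out.

```scala
// Hypothetical sketch: none of these names belong to the EEL API.
object HiveDdlSketch {
  // A column name paired with its Hive type, already rendered as text.
  final case class Column(name: String, hiveType: String)

  def createTable(
      tableName: String,
      fields: Seq[Column],
      partitions: Seq[Column],
      external: Boolean,
      ifNotExists: Boolean,
      serde: String,
      inputFormat: String,
      outputFormat: String,
      location: Option[String]
  ): String = {
    // Render "`name` type" entries, one per line, indented by three spaces.
    def columns(cols: Seq[Column]): String =
      cols.map(c => s"   `${c.name}` ${c.hiveType}").mkString(",\n")

    // Optional keywords are dropped when their flag is false.
    val header = Seq(
      Some("CREATE"),
      if (external) Some("EXTERNAL") else None,
      Some("TABLE"),
      if (ifNotExists) Some("IF NOT EXISTS") else None,
      Some(s"`$tableName` (")
    ).flatten.mkString(" ")

    // The PARTITIONED BY clause is omitted for unpartitioned tables.
    val partitionClause: Seq[String] =
      if (partitions.isEmpty) Nil
      else Seq("PARTITIONED BY (", columns(partitions) + ")")

    val lines = Seq(header, columns(fields) + ")") ++
      partitionClause ++
      Seq(
        "ROW FORMAT SERDE",
        s"   '$serde'",
        "STORED AS INPUTFORMAT",
        s"   '$inputFormat'",
        "OUTPUTFORMAT",
        s"   '$outputFormat'"
      ) ++
      location.map(l => s"LOCATION '$l'").toSeq

    lines.mkString("\n")
  }
}

object HiveDdlSketchExample {
  import HiveDdlSketch._

  // Same table as in the issue: four data fields, two partition keys.
  val ddl: String = createTable(
    tableName = "mydatabase.mytable",
    fields = Seq(
      Column("fname", "string"),
      Column("lname", "string"),
      Column("age", "int"),
      Column("salary", "decimal(38,5)")
    ),
    partitions = Seq(Column("country", "string"), Column("city", "string")),
    external = true,
    ifNotExists = true,
    serde = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
    inputFormat = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
    outputFormat = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
    location = Some("hdfs://nameservice1/blah/mytable_location")
  )

  def main(args: Array[String]): Unit = println(ddl)
}
```

Partition keys are kept in a separate list because Hive declares them in `PARTITIONED BY` rather than in the main column list; listing a key in both places is rejected by Hive.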
garyfrost added this to the 1.4 milestone on Feb 5, 2018