This project creates a recursive (flattened-hierarchy) data set: it takes a Hive SELECT query returning child and parent values and writes a flattened data set with four columns (child, parent, level, dp_proc_time) in the specified output format.
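As an illustration of the output shape, here is a minimal sketch (assumptions: `level` counts the hops from a child up to the given ancestor, and `dp_proc_time` is a processing timestamp, omitted below; neither is confirmed by this README):

```python
# Hypothetical input rows from the Hive query, as (child, parent) edges.
input_rows = [("b", "a"), ("c", "b")]

# Expected flattened output as (child, parent, level) -- one row per
# (child, ancestor) pair, dp_proc_time column omitted for brevity.
expected = [
    ("b", "a", 1),  # b is a direct child of a
    ("c", "b", 1),  # c is a direct child of b
    ("c", "a", 2),  # c is two levels below a, reached via b
]
```

The point of the flattening is that every ancestor relationship becomes an explicit row, so downstream queries need no recursion.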
- Build the assembly jar with `sbt clean assembly`
- Execute the assembled jar with `spark-submit`:
```sh
<spark_home>/bin/spark-submit \
  --class org.kaveh_hariri.utility.spark.hive_recursion.MainRun \
  --master <master> \
  --conf <key=value> --conf <key=value> \
  <path/to/assembly.jar> \
  "SELECT child, parent FROM <hiveschema.hivetable>" \
  "s3a://<output/path>" \
  <format (orc, parquet, etc.)>
```
This is a Spark remake of the Hive UDF described at https://blog.pythian.com/recursion-in-hive/ -- the original UDF did not function properly because of the distributed nature of these frameworks. This project works correctly because a distinct map of the child/parent values is distributed to every node via a broadcast variable, so each node can resolve full ancestor chains locally.
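The broadcast idea can be sketched as follows (a hedged illustration, not the project's actual code: the function and variable names are hypothetical, and a plain dict stands in for the broadcast variable; in real PySpark this would be `bc = sc.broadcast(parent_map)` with workers reading `bc.value`):

```python
# Distinct child -> parent map; on a cluster this would be broadcast
# once and shipped whole to every executor.
parent_map = {"b": "a", "c": "b", "d": "a"}

def flatten_partition(rows, bc_value):
    """Flatten one partition of (child, parent) rows.

    Because every executor holds the FULL parent map, any row's
    ancestor chain can be walked locally, with no shuffles and no
    dependence on how rows were partitioned (the failure mode of
    the original UDF approach).
    """
    out = []
    for child, _parent in rows:
        level, node = 1, child
        while node in bc_value:
            node = bc_value[node]
            out.append((child, node, level))
            level += 1
    return out

# Two "partitions" of the input, as Spark might split them across executors.
part1 = [("b", "a")]
part2 = [("c", "b"), ("d", "a")]
result = flatten_partition(part1, parent_map) + flatten_partition(part2, parent_map)
```

Note that `("c", "b")` and `("b", "a")` land in different partitions, yet the row `("c", "a", 2)` is still produced, because the full map is available everywhere.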