saraswatmks/Data-Engineering-Challenge

Data-Engineering-Challenge

In this challenge, I solved a data engineering problem that mainly involved complex feature engineering on a large dataset. I used the following stack:

  • PySpark 2.4 (for all feature engineering)
  • Docker (for launching a Spark cluster in local mode)

The notebook in this repository is a good reference if you want to see how pandas and PySpark are used together to create complex features.
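
As a rough illustration of the pandas + PySpark pattern (not the notebook's actual code), here is a minimal sketch that uses a grouped-map pandas UDF, which is available in PySpark 2.4. The column names (`user_id`, `ts`, `amount`), the feature names, and both function names are hypothetical and chosen only for this example; the challenge's real schema is not described here.

```python
import pandas as pd


def add_user_features(pdf: pd.DataFrame) -> pd.DataFrame:
    """Plain pandas logic applied to all rows of one (hypothetical) user_id."""
    pdf = pdf.sort_values("ts").copy()
    # Running total of the amount, in time order.
    pdf["amount_cumsum"] = pdf["amount"].cumsum()
    # Each row's share of the group's total amount.
    pdf["amount_share"] = pdf["amount"] / pdf["amount"].sum()
    return pdf


def run_with_spark(rows):
    """Apply add_user_features per user_id on a local Spark session.

    `rows` is a list of (user_id, ts, amount) tuples. This is an assumed
    input shape for the sketch, not the challenge's data format.
    """
    # Imported lazily so the pandas function can be used without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf, PandasUDFType

    spark = (
        SparkSession.builder.master("local[*]")
        .appName("feature-sketch")
        .getOrCreate()
    )
    sdf = spark.createDataFrame(rows, ["user_id", "ts", "amount"])

    # The output schema must list every column the pandas function returns.
    schema = (
        "user_id string, ts long, amount double, "
        "amount_cumsum double, amount_share double"
    )
    grouped_udf = pandas_udf(add_user_features, schema, PandasUDFType.GROUPED_MAP)

    # Spark sends each user_id group to pandas and collects the results.
    return sdf.groupBy("user_id").apply(grouped_udf).toPandas()
```

Keeping the feature logic in a plain pandas function means it can be developed and checked on a small sample before Spark distributes it across groups. Calling `run_with_spark([("a", 1, 10.0), ("a", 2, 20.0)])` requires a working PySpark and PyArrow installation.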
