Data-Engineering-Challenge

This repository contains my solution to a data engineering challenge. The work consisted mainly of complex feature engineering on a large data set. I used the following stack:

PySpark 2.4 (for all feature engineering)
Docker (for launching a Spark cluster in local mode)
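
For readers unfamiliar with local mode, the sketch below shows one way a single-machine Spark session could be configured from Python. The helper names (`local_spark_conf`, `build_session`), the application name, and the partition count are illustrative assumptions, not taken from the repository's Dockerfile or notebook.

```python
def local_spark_conf(app_name="feature-engineering", cores="*", shuffle_partitions=8):
    """Return Spark settings for a single-machine (local mode) session.

    All default values here are hypothetical and chosen only for illustration.
    """
    return {
        # local[*] uses every available core; local[2] would use two threads.
        "spark.master": f"local[{cores}]",
        "spark.app.name": app_name,
        # A small shuffle partition count is usually enough on one machine.
        "spark.sql.shuffle.partitions": str(shuffle_partitions),
    }


def build_session(conf):
    """Create (or reuse) a SparkSession from a dict of settings."""
    # Imported lazily so the settings helper can be used without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Inside the container, calling `build_session(local_spark_conf())` would return a session whose driver and executors all run in one JVM, which is typically sufficient for prototyping feature pipelines.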

The notebook 01-Solution.ipynb is a good reference if you want to see how pandas and PySpark are used together to create complex features.
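
As a rough illustration of combining the two libraries, the sketch below uses a grouped-map pandas UDF, which PySpark 2.4 supports, to run pandas logic on each group of a Spark DataFrame. This is one common pattern and not necessarily the approach taken in the notebook; the column names (`user_id`, `amount`, `amount_centered`) and function names are hypothetical.

```python
# Output schema of the per-group function, written as a DDL string.
# The columns are hypothetical and exist only for this illustration.
FEATURE_SCHEMA = "user_id long, amount double, amount_centered double"
FEATURE_COLUMNS = [field.split()[0] for field in FEATURE_SCHEMA.split(",")]


def center_amount(group):
    """pandas logic applied to one group: subtract the group's mean amount."""
    centered = group["amount"] - group["amount"].mean()
    # Return the columns in the same order as FEATURE_SCHEMA.
    return group.assign(amount_centered=centered)[FEATURE_COLUMNS]


def add_centered_amount(df):
    """Apply center_amount to every user_id group of a Spark DataFrame."""
    # Imported lazily; grouped-map pandas UDFs also require pyarrow at runtime.
    from pyspark.sql.functions import pandas_udf, PandasUDFType

    grouped_udf = pandas_udf(
        center_amount,
        returnType=FEATURE_SCHEMA,
        functionType=PandasUDFType.GROUPED_MAP,
    )
    return df.groupby("user_id").apply(grouped_udf)
```

Spark handles the partitioning and shuffling by `user_id`, while each group arrives in `center_amount` as an ordinary pandas DataFrame, so familiar pandas operations can be reused for the feature logic.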

saraswatmks/Data-Engineering-Challenge
