ETL-Process

In this project I walk through an ETL process: I extract information from a web page, clean the data, and then load it into a SQL Server database.

1.- First, we will scrape a web page to extract the data that will let us perform a small analysis at the end of this work. For this I will write 2 Python scripts that use the Spider library to make the web scraping easier.

2.- Next, I will clean the data so that it can be stored properly in our database. For this we will use the pyspark sql functions, cleaning the outputs so that they produce the database shown in the following picture.

3.- As a final step, we will store the collected data in a SQL Server database. We need to create the database in SQL Server first; otherwise pyspark will create the tables in its own way.
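As a rough illustration of step 1, the sketch below pulls table rows out of an HTML page. It is not the scripts from this repository: it uses Python's standard `html.parser` as a stand-in for the scraping library, and `CellTextParser`, `extract_rows` and `SAMPLE_HTML` are hypothetical names with made-up sample data.

```python
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._row = None        # row currently being built, or None
        self._in_cell = False   # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            # Strip surrounding whitespace once the cell is complete.
            self._row[-1] = self._row[-1].strip()
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:       # skip rows without <td> cells (e.g. headers)
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data


# Hypothetical sample page; a real script would download the HTML first.
SAMPLE_HTML = (
    "<table>"
    "<tr><th>Name</th><th>Price</th></tr>"
    "<tr><td> Widget </td><td>9.99</td></tr>"
    "<tr><td>Gadget</td><td>19.50</td></tr>"
    "</table>"
)


def extract_rows(html):
    """Return the table rows found in an HTML string."""
    parser = CellTextParser()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = extract_rows(SAMPLE_HTML)
```

The extracted rows could then be written to a CSV or JSON file so that the cleaning step has a stable input to read.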
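For step 2, a minimal sketch of what cleaning with the pyspark sql functions might look like. The actual cleaning rules are not described here, so the ones below (SQL-friendly column names, trimmed strings, empty strings turned into nulls, duplicate rows dropped) are assumptions, and `normalize_column_name` and `clean_frame` are hypothetical helpers.

```python
import re


def normalize_column_name(name):
    """Lowercase a header and replace runs of non-alphanumerics with '_'."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()


def clean_frame(df):
    """Apply basic cleaning to a Spark DataFrame (assumed rules, see above)."""
    # Imported here so normalize_column_name works without Spark installed.
    from pyspark.sql import functions as F

    # Rename every column to a SQL-friendly name.
    df = df.toDF(*[normalize_column_name(c) for c in df.columns])

    # Trim string columns and replace empty strings with NULL.
    for name, dtype in df.dtypes:
        if dtype == "string":
            trimmed = F.trim(F.col(name))
            df = df.withColumn(
                name, F.when(trimmed == "", F.lit(None)).otherwise(trimmed)
            )

    # Remove rows that are exact duplicates after cleaning.
    return df.dropDuplicates()
```

Calling `clean_frame(spark.read.csv(path, header=True))` on one of the scraping outputs would return a DataFrame ready to be written to the database, assuming the outputs are CSV files with a header row.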
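For step 3, one possible way to load a cleaned DataFrame into SQL Server is Spark's JDBC writer. This is a sketch under assumptions: the host, database, table and credentials are placeholders, the Microsoft JDBC driver jar is assumed to be on Spark's classpath, and `sqlserver_jdbc_url` and `write_table` are hypothetical helpers.

```python
def sqlserver_jdbc_url(host, database, port=1433):
    """Build a SQL Server JDBC URL (options chosen for a local dev server)."""
    return (
        f"jdbc:sqlserver://{host}:{port};"
        f"databaseName={database};"
        "encrypt=true;trustServerCertificate=true"
    )


def write_table(df, table, url, user, password):
    """Append a Spark DataFrame to an existing SQL Server table."""
    df.write.jdbc(
        url=url,
        table=table,
        # "append" inserts into the table created beforehand in SQL Server.
        # If the table does not exist, Spark creates it with inferred types.
        mode="append",
        properties={
            "user": user,
            "password": password,
            "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        },
    )


# Placeholder connection values, for illustration only.
url = sqlserver_jdbc_url("localhost", "etl_db")
```

Using `mode="append"` against tables created ahead of time keeps the column types defined in the SQL Server script instead of the ones Spark would infer.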

Folders and files:

1_Python_Web_Scraping

2_Python_Outputs

3_SQL_Server_Script

4_Python_Cleaning

README.md

DaniArguelles/ETL-Process

Folders and files

Latest commit

History

Repository files navigation

ETL-Process

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages