
DaniArguelles/ETL-Process

ETL-Process

In this project I will carry out an ETL (extract, transform, load) process: I will extract information from the web, clean the data, and then save it in a SQL Server database.

1.- First, we will scrape a web page to extract the data needed for a small analysis at the end of this work. For this I will write two Python scripts that use the Spider library for web scraping.
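As an illustration of the extraction step, here is a minimal sketch that uses Python's standard-library html.parser in place of the Spider library, so it runs without extra dependencies. It assumes, hypothetically, a hotel listing page where each hotel's name and price sit in elements with the CSS classes hotel-name and hotel-price. The sample HTML is made up, and a real scraper would first download the page instead of parsing an inline string.

```python
from html.parser import HTMLParser


class HotelParser(HTMLParser):
    """Collect the text of elements whose class is 'hotel-name' or 'hotel-price'.

    The class names are hypothetical placeholders for whatever the real page uses.
    """

    FIELDS = {"hotel-name": "name", "hotel-price": "price"}

    def __init__(self):
        super().__init__()
        self.rows = []       # one dict per hotel
        self._field = None   # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in self.FIELDS:
                self._field = self.FIELDS[cls]
                if self._field == "name":
                    # A new name marks the start of a new hotel record.
                    self.rows.append({})

    def handle_endtag(self, tag):
        # Any closing tag ends the field being read (enough for flat markup).
        self._field = None

    def handle_data(self, data):
        if self._field and self.rows:
            current = self.rows[-1]
            current[self._field] = current.get(self._field, "") + data


def extract(html):
    """Return the raw, uncleaned rows found in the HTML string."""
    parser = HotelParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<div class="card"><h2 class="hotel-name"> Hotel Sol </h2><span class="hotel-price">$1,200</span></div>
<div class="card"><h2 class="hotel-name">Casa Azul</h2><span class="hotel-price"> $950 </span></div>
"""

if __name__ == "__main__":
    for row in extract(SAMPLE_HTML):
        print(row)
```

The values are kept exactly as scraped, including stray whitespace and currency symbols; tidying them up is the job of the cleaning step.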

2.- Next, I will clean the data so that it can be stored properly in our database. For this we will use the PySpark SQL functions, cleaning the scraped output into the structure of the database shown in the following picture:

[Image: Hoteles database]
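The cleaning step can be sketched in plain Python as follows. This is a stand-in under stated assumptions, not the project's PySpark code: it trims names, strips everything but digits and dots from prices, drops rows without a usable price, and removes duplicates. In PySpark the same operations would typically map to trim and regexp_replace from pyspark.sql.functions, Column.cast, DataFrame.dropna and DataFrame.dropDuplicates. The field names and sample rows are hypothetical.

```python
import re


def clean_price(raw):
    """Turn a scraped price such as ' $1,200 ' into a float, or None if unusable."""
    if raw is None:
        return None
    # Keep only digits and dots, similar to regexp_replace(col, "[^0-9.]", "").
    digits = re.sub(r"[^0-9.]", "", raw)
    try:
        return float(digits)   # similar to casting the column to a numeric type
    except ValueError:         # empty string or malformed number such as "1.2.3"
        return None


def clean_rows(rows):
    """Trim names, parse prices, drop incomplete rows and duplicates (keeping the first)."""
    cleaned, seen = [], set()
    for row in rows:
        name = (row.get("name") or "").strip()   # similar to trim(col("name"))
        price = clean_price(row.get("price"))
        if not name or price is None:            # similar to dropna on these columns
            continue
        key = (name, price)
        if key in seen:                          # similar to dropDuplicates()
            continue
        seen.add(key)
        cleaned.append({"name": name, "price": price})
    return cleaned


# Hypothetical raw rows, as a scraper might return them.
RAW_ROWS = [
    {"name": " Hotel Sol ", "price": "$1,200"},
    {"name": "Casa Azul", "price": " $950 "},
    {"name": "Hotel Sol", "price": "$1,200"},   # duplicate once trimmed
    {"name": "No Price Inn", "price": "N/A"},   # no usable price, dropped
]

if __name__ == "__main__":
    print(clean_rows(RAW_ROWS))
```

Doing the cleanup before loading means the database only ever receives trimmed text and numeric prices, which keeps the table types simple.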

3.- As a final step, we will store the collected data in a SQL Server database. The database and its tables must be created in SQL Server beforehand; otherwise, PySpark will create the tables itself, choosing its own column types.
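The reason for creating the tables first can be shown with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made so the example runs anywhere), and the Hotels table with its columns is hypothetical. The schema is declared explicitly, and the cleaned rows are then appended into it, so the column types are the ones you chose rather than ones decided at write time.

```python
import sqlite3

# Hand-written DDL, run once before any data is loaded. Table and column names
# are hypothetical. SQLite does not enforce VARCHAR lengths; SQL Server does.
CREATE_HOTELS = """
CREATE TABLE IF NOT EXISTS Hotels (
    hotel_id INTEGER PRIMARY KEY,
    name     VARCHAR(100) NOT NULL UNIQUE,
    price    DECIMAL(10, 2) NOT NULL
)
"""


def load(conn, rows):
    """Append cleaned rows into the pre-created table."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO Hotels (name, price) VALUES (?, ?)",
            [(r["name"], r["price"]) for r in rows],
        )


def main():
    conn = sqlite3.connect(":memory:")  # stand-in for the SQL Server database
    conn.execute(CREATE_HOTELS)         # 1) create the schema explicitly
    load(conn, [                        # 2) then append the cleaned data into it
        {"name": "Hotel Sol", "price": 1200.0},
        {"name": "Casa Azul", "price": 950.0},
    ])
    return conn


if __name__ == "__main__":
    for row in main().execute("SELECT * FROM Hotels"):
        print(row)
```

With PySpark and SQL Server, the equivalent load would usually be df.write.jdbc(url, "dbo.Hotels", mode="append", properties=...) with the Microsoft JDBC driver (com.microsoft.sqlserver.jdbc.SQLServerDriver), where dbo.Hotels is again a hypothetical name. Appending to a table that already exists keeps its schema; with mode="overwrite", Spark drops and recreates the table unless the truncate option is set to true.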
