Web Scraping

In this project, we use web scraping to extract content from an old HTML file and migrate it to a new format. The libraries used are:

Beautiful Soup - for parsing the HTML data
Pandas - for arranging the extracted data into the required output format
urllib - for opening the HTML file from a local path
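urllib works with URLs rather than plain file paths, so a local file has to be turned into a file:// URL first. The sketch below shows one way to do this, assuming UTF-8 encoded HTML; the function name read_local_html and the temporary sample file are illustrative and not taken from the notebooks. Path.as_uri() percent-encodes spaces, which matters here because the original file name contains them.

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen


def read_local_html(path: str, encoding: str = "utf-8") -> str:
    """Read an HTML file from disk through urllib using a file:// URL (hypothetical helper)."""
    # Path.as_uri() requires an absolute path, so resolve it first.
    url = Path(path).resolve().as_uri()
    with urlopen(url) as response:
        return response.read().decode(encoding)


if __name__ == "__main__":
    # Create a throwaway HTML file (with a space in its name) so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        page = Path(tmp) / "sample page.html"
        page.write_text("<html><body><h1>RHD Students</h1></body></html>", encoding="utf-8")
        print(read_local_html(str(page)))
```

Opening the file with open() would work just as well; going through urlopen simply mirrors the urllib-based approach described above.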

The 'WebScraping.ipynb' notebook introduces the migration, while 'WebScraping2.ipynb' shows how to modify the existing HTML content of a given file. The original file used for extraction is "Research Higher Degree Students_ Chemical and Biomolecular Engineering, The University of Melbourne.html", together with its associated folder.
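Modifying existing HTML with Beautiful Soup generally means locating an element, changing it in the parse tree, and serialising the tree back to a string. The following is a minimal sketch of that idea, not the code in 'WebScraping2.ipynb': the markup, the "title" class, the "students" id and the update_page function are all hypothetical stand-ins for the real page structure.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a section of the original page.
SAMPLE_HTML = """
<html><body>
  <h1 class="title">Research Higher Degree Students</h1>
  <ul id="students">
    <li>Student A</li>
  </ul>
</body></html>
"""


def update_page(html: str) -> str:
    soup = BeautifulSoup(html, "html.parser")

    # Replace the text of an existing element.
    title = soup.find("h1", class_="title")
    title.string = "RHD Students (updated)"

    # Create a new element and append it to an existing list.
    students = soup.find("ul", id="students")
    new_item = soup.new_tag("li")
    new_item.string = "Student B"
    students.append(new_item)

    # Serialise the modified tree back to HTML.
    return str(soup)


if __name__ == "__main__":
    print(update_page(SAMPLE_HTML))
```

The returned string can then be written to a new file, leaving the original HTML untouched.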

The HTML file and the CSV file created by the script are 'rhd-new.html' and 'rhd.csv' respectively.
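One possible shape for the extraction step is to collect one record per student with Beautiful Soup, load the records into a pandas DataFrame, and write that out as CSV and HTML. This is a sketch under assumptions: the div/h3/p structure, the "student" and "topic" classes, the column names and extract_students are hypothetical, and the real 'rhd-new.html' may be produced differently (for example from a template) rather than with DataFrame.to_html.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup: the real page's tags and classes will differ.
SAMPLE_HTML = """
<div class="student"><h3>Student A</h3><p class="topic">Topic 1</p></div>
<div class="student"><h3>Student B</h3><p class="topic">Topic 2</p></div>
"""


def extract_students(html: str) -> pd.DataFrame:
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # One dictionary per student block becomes one row of the table.
    for block in soup.find_all("div", class_="student"):
        rows.append({
            "name": block.find("h3").get_text(strip=True),
            "topic": block.find("p", class_="topic").get_text(strip=True),
        })
    return pd.DataFrame(rows, columns=["name", "topic"])


if __name__ == "__main__":
    df = extract_students(SAMPLE_HTML)
    df.to_csv("rhd.csv", index=False)        # tabular output
    df.to_html("rhd-new.html", index=False)  # HTML table rendered by pandas
```

Keeping the parsing in a function that returns a DataFrame separates extraction from output, so the same data can be written to several formats.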

Repository contents:

Research Higher Degree Students_ Chemical and Biomolecular Engineering, The University of Melbourne_files
Research Higher Degree Students_ Chemical and Biomolecular Engineering, The University of Melbourne.html
Web Scraping.ipynb
README.md
rhd-new.html
rhd.csv

Repository: siftnoorsingh/WebScraping