Web scraping content using Beautiful Soup and Pandas

siftnoorsingh/WebScraping

Web Scraping

In this project, we use web scraping to extract content from an old HTML file and migrate it to a new format. The libraries used are:

  • Beautiful Soup - for parsing HTML data
  • Pandas - for formatting the data into the required format
  • urllib - for opening the HTML file stored at a local path by its URL
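
The way these three libraries fit together can be sketched roughly as follows. This is a minimal illustration, not the code from the notebooks: the sample markup, the `students` table id, the `name`/`topic` columns, and the helper names `read_local_html`, `extract_rows`, and `rows_to_csv` are all assumptions made for the example.

```python
from pathlib import Path
from urllib.request import urlopen

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the old page; the real file's markup will differ.
SAMPLE_HTML = """
<html><body>
<table id="students">
  <tr><th>Name</th><th>Topic</th></tr>
  <tr><td>Student A</td><td>Topic A</td></tr>
  <tr><td>Student B</td><td>Topic B</td></tr>
</table>
</body></html>
"""


def read_local_html(path):
    """Read a local HTML file through urllib using a file:// URL."""
    url = Path(path).resolve().as_uri()
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def extract_rows(html):
    """Parse the (assumed) students table into a list of dicts."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="students")
    rows = []
    for tr in table.find_all("tr")[1:]:  # skip the header row
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        rows.append({"name": cells[0], "topic": cells[1]})
    return rows


def rows_to_csv(rows):
    """Format the rows with pandas and return the CSV as a string."""
    df = pd.DataFrame(rows, columns=["name", "topic"])
    return df.to_csv(index=False)


if __name__ == "__main__":
    print(rows_to_csv(extract_rows(SAMPLE_HTML)))
```

Passing a file path to `df.to_csv(...)` instead of calling it without one would write the result to disk rather than returning a string.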

The 'WebScraping.ipynb' notebook introduces the migration, while 'WebScraping2.ipynb' shows how to modify the existing HTML content in a given file. The original file used for extraction is "Research Higher Degree Students_ Chemical and Biomolecular Engineering, The University of Melbourne.html", together with its associated folder.
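
As a rough idea of what modifying existing HTML with Beautiful Soup can look like, here is a small sketch. It does not reproduce 'WebScraping2.ipynb'; the sample fragment, the `student-name` class, and the `retag_headings` helper are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for a piece of the old page.
OLD_FRAGMENT = "<div><h3>Student A</h3><p>Topic A</p></div>"


def retag_headings(html):
    """Turn every <h3> into an <h2> and attach an (assumed) CSS class."""
    soup = BeautifulSoup(html, "html.parser")
    for heading in soup.find_all("h3"):
        heading.name = "h2"                  # rename the tag in place
        heading["class"] = ["student-name"]  # add or overwrite the class attribute
    return str(soup)


if __name__ == "__main__":
    print(retag_headings(OLD_FRAGMENT))
```

Because the tree is edited in place, the rest of the markup is serialized back unchanged apart from the edited tags.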

The HTML file and the CSV file created by the script are 'rhd-new.html' and 'rhd.csv', respectively.
