In this project, we use web scraping to extract content from an old HTML file and migrate it to a new format. The libraries used are:
- Beautiful Soup - for parsing HTML data
- Pandas - for formatting the data into the required format
- urllib - for opening the HTML file from a local path (as a `file://` URL)
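
The three libraries fit together roughly as follows. This is a minimal sketch, not the notebooks' actual code. It assumes the data sits in the page's first `<table>`, with `<th>` headers in the first row. The helper names `load_html` and `table_to_dataframe` are hypothetical.

```python
from pathlib import Path
from urllib.request import urlopen

import pandas as pd
from bs4 import BeautifulSoup


def load_html(path):
    """Read a local HTML file through urllib by converting its path to a file:// URL."""
    url = Path(path).resolve().as_uri()  # also percent-encodes spaces and commas in the name
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def table_to_dataframe(html):
    """Parse the first <table> with Beautiful Soup and return its rows as a DataFrame.

    Assumption (hypothetical layout): the first <tr> holds <th> header cells
    and every following <tr> holds <td> data cells.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find("table").find_all("tr")
    headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    records = [
        [td.get_text(strip=True) for td in row.find_all("td")]
        for row in rows[1:]
    ]
    return pd.DataFrame(records, columns=headers)
```

Calling `table_to_dataframe(load_html(path))` on the original page would give one DataFrame row per table row, but only if the page really stores its data in that layout. Otherwise, the selectors need adjusting.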

The 'WebScraping.ipynb' notebook demonstrates the migration, while 'WebScraping2.ipynb' shows how to modify the existing HTML content of a given file. The original file used for extraction is "Research Higher Degree Students_ Chemical and Biomolecular Engineering, The University of Melbourne.html", together with its associated folder.
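
Beautiful Soup can also edit HTML in place. The sketch below is illustrative and does not reproduce 'WebScraping2.ipynb'. The function `update_heading`, the `rhd-title` class and the note text are all hypothetical. The sketch rewrites the first `<h1>`, sets an attribute on it and adds a new tag after it.

```python
from bs4 import BeautifulSoup


def update_heading(html, new_title):
    """Rewrite the first <h1> and add a paragraph after it (hypothetical edits)."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h1")
    heading.string = new_title       # replaces all existing children with the new text
    heading["class"] = "rhd-title"   # hypothetical CSS class; adds or overwrites the attribute
    note = soup.new_tag("p")         # create a new element owned by this soup
    note.string = "Content migrated from the original page."
    heading.insert_after(note)       # place it immediately after the heading
    return str(soup)                 # serialize the modified tree back to HTML
```

The returned string can then be written to a new file with `open(path, "w", encoding="utf-8").write(...)`, leaving the original page untouched.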
The HTML file and the CSV file created by the script are 'rhd-new.html' and 'rhd.csv', respectively.
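
pandas can write both output formats directly from a DataFrame. This is only one possible approach, and the notebooks may build 'rhd-new.html' differently. The `export` helper and the example columns are hypothetical. The output paths default to the filenames named above.

```python
import pandas as pd


def export(df, csv_path="rhd.csv", html_path="rhd-new.html"):
    """Write extracted records to a CSV file and to a plain HTML table."""
    df.to_csv(csv_path, index=False)    # drop the numeric index column
    df.to_html(html_path, index=False)  # writes a bare <table>, not a full styled page
```

For example, `export(pd.DataFrame({"Name": [...], "Supervisor": [...]}))` would produce both files in the working directory. A richer page layout would need a template or Beautiful Soup edits on top of the generated table.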