Web-Scraping

WEB SCRAPING USING BeautifulSoup

Install the requests and beautifulsoup4 libraries using the commands:
- pip install requests
- pip install beautifulsoup4

Change the URL in each script to the webpage you want to scrape.

The Python scripts extract visible text content from a webpage and save it to a JSON file.
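As a rough illustration of that fetch, extract and save flow, here is a minimal sketch. It is not taken from the scripts in this repository: the function names fetch_page_text and save_text, the placeholder URL, the 10-second timeout, and the {"text": ...} JSON layout are all assumptions made for the example.

```python
import json

import requests
from bs4 import BeautifulSoup

# Placeholder URL (assumption); change it to the page you want to scrape.
URL = "https://example.com"


def fetch_page_text(url: str) -> str:
    """Download a page and return its text content as one string."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop early on 4xx/5xx responses
    soup = BeautifulSoup(response.text, "html.parser")
    return soup.get_text(separator=" ", strip=True)


def save_text(text: str, path: str) -> None:
    """Write the text to a JSON file under a hypothetical "text" key."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"text": text}, f, ensure_ascii=False, indent=2)
```

Calling save_text(fetch_page_text(URL), "output.json") would then download the page and write its text to output.json. The output file name here is hypothetical.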

one.py: Parses the HTML content with BeautifulSoup and extracts the visible text using the get_text() method. It then removes unwanted substrings from the extracted text, such as newline characters, tabs, and specific strings like "Skip to Top Main Navigation".
p_tags.py: Parses the HTML content with BeautifulSoup, finds all p (paragraph) elements, and extracts their text using a generator expression inside join(). This concatenates the text of all paragraphs into a single string.
tagsdata.py: Parses the HTML content with BeautifulSoup and defines a list of HTML tags from which text will be extracted. These include paragraph tags (p), heading tags (h1 to h6), strong tags (strong), anchor tags (a), etc. It iterates over each specified tag, extracts the text content, removes unwanted characters, and appends the cleaned text to the extracted_text list. Finally, it removes any empty strings from extracted_text.
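The three approaches described above can be sketched side by side as follows. This is an illustration under stated assumptions, not the code of the scripts themselves: the function names, the SAMPLE_HTML snippet, the exact UNWANTED list, the space used as the join separator, and the whitespace collapsing in clean() are all hypothetical. The tag list only covers the tags named above, since the full list is not given.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, used instead of a live URL.
SAMPLE_HTML = """
<html><body>
<a href="#main">Skip to Top Main Navigation</a>
<h1>Title</h1>
<p>First <strong>bold</strong> paragraph.</p>
<p>Second paragraph.</p>
</body></html>
"""

# Substrings to strip (assumed list, based on the description above).
UNWANTED = ["\n", "\t", "Skip to Top Main Navigation"]
# Only the tags named in the description; the real list may be longer.
TAGS = ["p", "h1", "h2", "h3", "h4", "h5", "h6", "strong", "a"]


def clean(text):
    """Replace unwanted substrings with spaces and collapse whitespace."""
    for unwanted in UNWANTED:
        text = text.replace(unwanted, " ")
    return " ".join(text.split())


def all_text(html):
    """one.py-style: take all text via get_text(), then clean it."""
    soup = BeautifulSoup(html, "html.parser")
    return clean(soup.get_text())


def paragraph_text(html):
    """p_tags.py-style: join the text of every <p> element."""
    soup = BeautifulSoup(html, "html.parser")
    return " ".join(p.get_text() for p in soup.find_all("p"))


def text_by_tag(html, tags=TAGS):
    """tagsdata.py-style: collect cleaned text tag by tag, drop empties."""
    soup = BeautifulSoup(html, "html.parser")
    extracted_text = []
    for tag in tags:
        for element in soup.find_all(tag):
            extracted_text.append(clean(element.get_text()))
    return [text for text in extracted_text if text]
```

Note that the tag-by-tag approach can return the same text more than once when tags are nested: in the sample, "bold" appears both inside its paragraph and as its own strong entry.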
