Sharing some Python notebooks created along my Data Science Learning Journey
Scraping is simply the process of extracting, copying, and screening data from various sources. Web scraping gives developers a way to collect and analyze data from the internet.
Web scraping is also a great way to automate much of what a human would otherwise do manually while browsing.
In this project, we will explore how to extract information from the popular Goodreads platform to analyze and generate interesting insights around book trends.
Goodreads is one of the world’s largest communities for reviewing and recommending books, and a favorite platform for many voracious readers!
This project is partly inspired by the following project: https://medium.com/@soodakriti175/goodreads-web-scraping-92345b620f9c
I have structured this first Python notebook around the following tasks:
- How to scrape specific sections of a page using Beautiful Soup (in particular, all books listed under a user-defined Goodreads list on a given page).
- How to iterate over all pages of a list to collect specific attributes for every book it contains.
- How to load the scraped content into a Pandas DataFrame.
- How to expand the scope: iterate over the lists for a set of user-defined tags (for example, "fiction" and "science-fiction") and append the extracted information to an existing .csv file stored in Google Drive.
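For illustration, here is a minimal sketch of that flow. The list-page URL pattern, the CSS selectors (`bookTitle`, `authorName`), the page limit, and the output file name are assumptions based on typical Goodreads list pages; the notebook's actual selectors and helpers may differ.

```python
import requests
import pandas as pd
from bs4 import BeautifulSoup

# Assumed URL pattern for tag-based lists; verify against the live site
BASE_URL = "https://www.goodreads.com/list/tag/{tag}?page={page}"

def scrape_tag(tag, max_pages=2):
    """Scrape book titles and authors for a given tag, page by page."""
    records = []
    for page in range(1, max_pages + 1):
        resp = requests.get(BASE_URL.format(tag=tag, page=page), timeout=30)
        soup = BeautifulSoup(resp.text, "html.parser")
        # Illustrative selectors; inspect the page to confirm the real class names
        for row in soup.select("tr[itemtype='http://schema.org/Book']"):
            title = row.select_one("a.bookTitle")
            author = row.select_one("a.authorName")
            if title and author:
                records.append({
                    "tag": tag,
                    "title": title.get_text(strip=True),
                    "author": author.get_text(strip=True),
                })
    return pd.DataFrame(records)

# Scrape a few tags and append the results to an existing CSV
for tag in ["fiction", "science-fiction"]:
    scrape_tag(tag).to_csv(
        "goodreads_fiction_types.csv", mode="a", header=False, index=False
    )
```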
Relevant files: goodreads_fiction_types (output), goodreads_web_scraping.py, GoodReads_Web_Scraping.ipynb
Converting Natural Language to SQL and querying a database using Gemini Pro LLM
CodeBase: Text2SQLAppGeminiPro
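As a rough illustration of the idea, here is a minimal sketch using the `google-generativeai` client and SQLite. The table schema, prompt wording, and database file are placeholders invented for this example; the actual app may structure the prompt and query execution differently.

```python
import sqlite3
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # replace with a real API key
model = genai.GenerativeModel("gemini-pro")

# Hypothetical schema, used only to ground the prompt
PROMPT = """You are an expert at converting English questions to SQL.
The database has a table STUDENT with columns NAME, CLASS, SECTION, MARKS.
Return only the SQL query, without markdown fences or explanations.
Question: {question}"""

def question_to_sql(question: str) -> str:
    # Ask Gemini Pro to translate the natural-language question into SQL
    response = model.generate_content(PROMPT.format(question=question))
    return response.text.strip().strip("`")

def run_query(sql: str, db_path: str = "student.db"):
    # Execute the generated SQL against a local SQLite database
    with sqlite3.connect(db_path) as conn:
        return conn.execute(sql).fetchall()

sql = question_to_sql("How many students are in the Data Science class?")
print(sql)
print(run_query(sql))
```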
A telecom company wants to use its historical customer data and leverage machine learning to predict customer behavior in an attempt to retain customers. The end goal is to develop focused customer retention programs.
The objective, as a data scientist hired by the telecom company, is to build a model that identifies customers with a higher probability of churn. This will help the company understand the pain points and patterns behind customer churn and sharpen its focus on retention strategies.
CodeBase: Ensemble_Project.ipynb
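Below is a minimal sketch of an ensemble approach with scikit-learn, assuming a preprocessed numeric dataset with a binary "Churn" target; the file name and column names are placeholders, and the notebook may use different base learners and tuning.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Assumed: a preprocessed, numeric feature matrix with a binary "Churn" column
df = pd.read_csv("telecom_churn.csv")
X = df.drop(columns=["Churn"])
y = df["Churn"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Soft-voting ensemble combining three complementary base learners
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
        ("gb", GradientBoostingClassifier(random_state=42)),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)
print(classification_report(y_test, ensemble.predict(X_test)))
```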
Building a Convnet from Scratch [Estimated completion time: 20 minutes]
In this exercise, we will build a classifier model from scratch that can distinguish dogs from cats. We will follow these steps:
- Explore the example data
- Build a small convnet from scratch to solve our classification problem
- Evaluate training and validation accuracy
CodeBase: Cat vs. Dog Image Classification
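For reference, here is a minimal Keras sketch of a small convnet for this task, assuming 150x150 RGB images organized into cats/ and dogs/ subfolders; the directory names, layer sizes, and epoch count are illustrative rather than the exercise's exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Assumed layout: train/ and validation/ each contain cats/ and dogs/ subfolders
train_ds = tf.keras.utils.image_dataset_from_directory(
    "cats_and_dogs/train", image_size=(150, 150), batch_size=32, label_mode="binary"
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "cats_and_dogs/validation", image_size=(150, 150), batch_size=32, label_mode="binary"
)

# Small convnet: three conv/pool blocks followed by a dense classifier head
model = models.Sequential([
    layers.Rescaling(1.0 / 255, input_shape=(150, 150, 3)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(512, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # binary output: cat vs. dog
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
history = model.fit(train_ds, validation_data=val_ds, epochs=15)
```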
Let’s connect at https://www.linkedin.com/in/vpnarayanan/ and exchange ideas about the latest tech trends and advancements! 🌟