Determine if there is any bias toward favourable reviews from Vine members (paid reviews) in Amazon product reviews.

suyinwb/Amazon_Vine_Analysis

Amazon Vine Analysis

Background

This project analyzes Amazon reviews written by members of the paid Amazon Vine program. The Amazon Vine program is a service that allows manufacturers and publishers to receive reviews for their products. Companies like SellBy pay a small fee to Amazon and provide products to Amazon Vine members, who are then required to publish a review.

Overview of Project

Purpose

In this project, you’ll have access to approximately 50 datasets. Each one contains reviews of a specific product category, from clothing apparel to wireless products. You’ll need to pick one of these datasets and use PySpark to perform the ETL process: extract the dataset, transform the data, connect to an AWS RDS instance, and load the transformed data into a Postgres database managed through pgAdmin. Next, you’ll use PySpark, Pandas, or SQL to determine if there is any bias toward favourable reviews from Vine members in your dataset. Then, you’ll write a summary of the analysis for Jennifer to submit to the SellBy stakeholders.

Analysis And Challenges

Methodology: Analytics Paradigm

1. Decomposing the Ask

Determine if there is any bias toward favourable reviews from Vine members (paid reviews) in Amazon product reviews.

2. Identify the Datasource

From this list: https://s3.amazonaws.com/amazon-reviews-pds/tsv/index.txt

3. Define Strategy & Metrics

Resource: Google Colab, PySpark, AWS RDS, AWS S3, Postgres 12

4. Data Retrieval Plan

  1. Retrieve the Amazon Reviews dataset
  2. Upload it to my AWS S3 bucket
  3. Read the dataset from S3 in my Google Colab notebook
  4. Assemble the data as described in 5. Assemble & Clean the Data
  5. Create a database in the Amazon RDS instance
  6. Create the connection and corresponding server in Postgres
  7. Create the database schema in the Postgres database
  8. From Google Colab, connect to the AWS RDS instance and write the dataframes to it, which populates the database tables in Postgres
  • S3 --> Google Colab --> AWS RDS instance --> Postgres RDS
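The last step of the plan (Google Colab to the RDS-hosted Postgres database) can be sketched as follows. This is a minimal illustration, not the repository's actual notebook code: the `RdsConfig` class, the `load_table` helper, the endpoint, the database name, and the credentials are all hypothetical placeholders. The only PySpark call used is `DataFrame.write.jdbc`, and it is wrapped in a function so the snippet can be read and run without a Spark session.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RdsConfig:
    """Connection settings for the RDS Postgres instance (all values are placeholders)."""
    host: str
    database: str
    user: str
    password: str
    port: int = 5432  # default Postgres port

    @property
    def jdbc_url(self) -> str:
        # JDBC URL format expected by the Postgres driver
        return f"jdbc:postgresql://{self.host}:{self.port}/{self.database}"

    @property
    def properties(self) -> dict:
        return {
            "user": self.user,
            "password": self.password,
            "driver": "org.postgresql.Driver",
        }


def load_table(df, table_name: str, config: RdsConfig, mode: str = "append") -> None:
    """Write a PySpark DataFrame to an existing Postgres table over JDBC."""
    df.write.jdbc(
        url=config.jdbc_url,
        table=table_name,
        mode=mode,
        properties=config.properties,
    )


# Hypothetical endpoint and database name, for illustration only.
config = RdsConfig(
    host="my-instance.example.us-east-1.rds.amazonaws.com",
    database="vine_db",
    user="postgres",
    password="<password>",
)
print(config.jdbc_url)
```

With a Spark session and the Postgres JDBC driver available, each of the four dataframes would be passed to `load_table` with its matching table name, for example `load_table(vine_df, "vine_table", config)`, where `vine_df` is an assumed variable name.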

5. Assemble & Clean the Data

Create 4 dataframes from the dataset to fit in with our database tables:

  • review_id_table
  • products_table
  • customers_table
  • vine_table
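To make the split concrete, here is a small sketch in plain Python that divides review rows into the four tables. The column assignment per table is an assumption based on the public Amazon reviews TSV columns and a typical schema for this exercise; the README itself does not list the columns, and the sample rows are invented. In the actual project the same selections would be done on a PySpark dataframe.

```python
from collections import Counter

# Assumed column layout for each table (not stated in this README).
REVIEW_ID_COLUMNS = ("review_id", "customer_id", "product_id", "product_parent", "review_date")
VINE_COLUMNS = ("review_id", "star_rating", "helpful_votes", "total_votes", "vine", "verified_purchase")


def split_into_tables(reviews):
    """Split raw review rows (dicts) into the four table-shaped lists."""
    review_id_table = [{c: r[c] for c in REVIEW_ID_COLUMNS} for r in reviews]

    # One row per distinct product.
    titles = {}
    for r in reviews:
        titles.setdefault(r["product_id"], r["product_title"])
    products_table = [{"product_id": p, "product_title": t} for p, t in titles.items()]

    # One row per customer, with the number of reviews they wrote.
    counts = Counter(r["customer_id"] for r in reviews)
    customers_table = [{"customer_id": c, "customer_count": n} for c, n in counts.items()]

    vine_table = [{c: r[c] for c in VINE_COLUMNS} for r in reviews]

    return {
        "review_id_table": review_id_table,
        "products_table": products_table,
        "customers_table": customers_table,
        "vine_table": vine_table,
    }


# Invented sample rows, for illustration only.
sample = [
    {"review_id": "R1", "customer_id": 10, "product_id": "P1", "product_parent": 100,
     "product_title": "Kettle", "review_date": "2015-08-31", "star_rating": 5,
     "helpful_votes": 25, "total_votes": 30, "vine": "Y", "verified_purchase": "N"},
    {"review_id": "R2", "customer_id": 10, "product_id": "P2", "product_parent": 200,
     "product_title": "Toaster", "review_date": "2015-08-30", "star_rating": 3,
     "helpful_votes": 2, "total_votes": 4, "vine": "N", "verified_purchase": "Y"},
    {"review_id": "R3", "customer_id": 11, "product_id": "P1", "product_parent": 100,
     "product_title": "Kettle", "review_date": "2015-08-29", "star_rating": 4,
     "helpful_votes": 40, "total_votes": 41, "vine": "N", "verified_purchase": "Y"},
]

tables = split_into_tables(sample)
for name, rows in tables.items():
    print(name, len(rows))
```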

6. Analyse for Trends

The analysis is indicated below in Analysis

7. Acknowledging Limitations

The dataset only covers reviews up to 2015, so the trend might have changed since then.

8. Making the Call:

The "Proper" Conclusion is indicated below in Summary

Analysis

  1. How many Vine reviews and non-Vine reviews were there?

Paid Total Reviews

A total of 1207 paid (Vine) reviews received 20 or more helpful votes, with helpful votes making up 50% or more of total votes.

Unpaid Total Reviews

A total of 97839 unpaid (non-Vine) reviews received 20 or more helpful votes, with helpful votes making up 50% or more of total votes.
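The filter behind these two counts can be sketched in plain Python. The thresholds follow the wording above (20 or more helpful votes, helpful votes at least 50% of total votes); the function names, the `"Y"`/`"N"` encoding of the `vine` column, and the sample rows are assumptions for illustration, not the repository's code.

```python
def is_helpful(review, min_helpful_votes=20, min_ratio=0.5):
    """Return True if the review passes both helpfulness thresholds."""
    helpful = review["helpful_votes"]
    total = review["total_votes"]
    if total <= 0:
        return False  # avoid dividing by zero on reviews with no votes
    return helpful >= min_helpful_votes and helpful / total >= min_ratio


def count_by_vine(reviews):
    """Count helpful reviews separately for paid (vine == 'Y') and unpaid ('N')."""
    counts = {"Y": 0, "N": 0}
    for review in reviews:
        if is_helpful(review):
            counts[review["vine"]] += 1
    return counts


# Invented sample rows, for illustration only.
sample_reviews = [
    {"vine": "Y", "helpful_votes": 25, "total_votes": 30},   # kept
    {"vine": "N", "helpful_votes": 40, "total_votes": 100},  # dropped: ratio 0.4
    {"vine": "N", "helpful_votes": 20, "total_votes": 40},   # kept: exactly at both thresholds
    {"vine": "Y", "helpful_votes": 5, "total_votes": 6},     # dropped: fewer than 20 helpful votes
    {"vine": "N", "helpful_votes": 60, "total_votes": 61},   # kept
]

print(count_by_vine(sample_reviews))
```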

  2. How many Vine reviews were 5 stars? How many non-Vine reviews were 5 stars?
  3. What percentage of Vine reviews were 5 stars? What percentage of non-Vine reviews were 5 stars?

Percentage 5 Stars Paid

Of the 1207 paid reviews, 509 are 5-star reviews, which is 42.17%.

Percentage 5 Stars Unpaid

Of the 97839 unpaid reviews, 45858 are 5-star reviews, which is 46.87%.
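The two percentages can be reproduced from the counts reported in this section. The helper name `five_star_share` is hypothetical; the four numbers are the ones given above.

```python
def five_star_share(five_star_count, total_count):
    """Return the percentage of reviews that are 5-star."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return 100.0 * five_star_count / total_count


paid_share = five_star_share(509, 1207)        # paid (Vine) reviews
unpaid_share = five_star_share(45858, 97839)   # unpaid (non-Vine) reviews

print(f"paid:   {paid_share:.2f}%")    # paid:   42.17%
print(f"unpaid: {unpaid_share:.2f}%")  # unpaid: 46.87%
print(f"gap:    {unpaid_share - paid_share:.1f} percentage points")  # gap:    4.7 percentage points
```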

Summary

Looking at the analysis of Amazon Kitchen reviews above, there is no evidence of positive bias in the Vine program: 5-star reviews make up 42% of paid reviews but 47% of unpaid reviews, so unpaid reviewers gave 5 stars more often than paid reviewers did. In absolute terms, paid 5-star reviews (509) amount to only about 1% of unpaid 5-star reviews (45858).

Additional Information

In total, 107421 reviews have received 20 or more helpful votes, so paid reviews make up only about 1% of the helpful reviews in this category.

Future Work

Given the dataset above, I propose additional NLP analysis of the columns below:

  1. review_headline: Title of Reviews
  2. review_body: Review sentences

This analysis would reveal customer sentiment toward the products and surface potential improvements and suggestions. It could also point to new products that solve customers' pain points.

Appendix

References

Overview

https://www.analyticsvidhya.com/blog/2021/06/part-1-step-by-step-guide-to-master-natural-language-processing-nlp-in-python/

NLP Tutorial series

https://eugenia-anello.medium.com/nlp-tutorial-series-d0baaf7616e0

Python

https://www.analyticsvidhya.com/blog/2017/01/ultimate-guide-to-understand-implement-natural-language-processing-codes-in-python/

AWS Case Studies
