Data Mining and Wrangling Course Submission
See full report HERE
Goodreads is the leading book review website, trusted by millions of users worldwide. In this project, our group analyzed book features and their relationship with book ratings using various data mining, wrangling, and visualization techniques. Results show that the most likely predictors of a book's rating are its number of pages and its number of ratings. Insights on other feature interactions also emerged. E-books are more prevalent in the romance genre and scarce among children’s books and comics. Further, reading time is shorter for the e-book format. The data also support two common notions: reading time is longer for books with a higher number of pages, and text reviews occur more often than non-text reviews.
Because the dataset contains only a limited set of book features, future studies should also extract reviewer profiles, such as age, gender, and other demographic and psychometric information. For the currently available dataset, the methodology should be applied to a larger portion of the database to validate this study and obtain more accurate results. Machine learning algorithms could also be explored.
dela Resma, Marvee
Ginez, Zhoya
Inocencio, Ken
Nepomuceno, Colleen
Piquero, Geran
Punzalan, Paolo