This is a project for the Human Language Technology 2023/2024 course, focused on detecting spoiler reviews on IMDb. The dataset is taken from Kaggle and consists of two files:
- Movie dataset: Contains information about the movies reviewed by the users, such as the movie plot synopsis.
- Review dataset: Contains the text of the reviews and other metadata.
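Each review refers to a movie, so the two files can be joined to pair a review with the synopsis of the movie it discusses. The sketch below shows one way to do that. It makes several assumptions. The files are treated as JSON Lines. The field names `movie_id`, `plot_synopsis`, `review_text` and `is_spoiler` are assumed. The records are small hypothetical samples, not real data from the dataset.

```python
import json

# Hypothetical sample records in JSON Lines format. The field names
# (movie_id, plot_synopsis, review_text, is_spoiler) are assumptions.
MOVIES_JSONL = """{"movie_id": "tt0000001", "plot_synopsis": "A detective discovers the butler did it."}
{"movie_id": "tt0000002", "plot_synopsis": "Two friends travel across the country."}"""

REVIEWS_JSONL = """{"movie_id": "tt0000001", "review_text": "Great twist: the butler did it!", "is_spoiler": true}
{"movie_id": "tt0000001", "review_text": "Well acted and tense.", "is_spoiler": false}
{"movie_id": "tt0000002", "review_text": "A charming road movie.", "is_spoiler": false}"""


def read_jsonl(text):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def attach_synopsis(movies, reviews):
    """Return copies of the review records, each extended with its movie's synopsis."""
    synopsis_by_id = {m["movie_id"]: m.get("plot_synopsis", "") for m in movies}
    joined = []
    for review in reviews:
        record = dict(review)
        # Reviews whose movie is missing from the movie file get an empty synopsis.
        record["plot_synopsis"] = synopsis_by_id.get(review["movie_id"], "")
        joined.append(record)
    return joined


if __name__ == "__main__":
    rows = attach_synopsis(read_jsonl(MOVIES_JSONL), read_jsonl(REVIEWS_JSONL))
    for row in rows:
        print(row["is_spoiler"], row["review_text"], "|", row["plot_synopsis"])
```

Keeping the synopsis next to each review lets a model compare what the review says with what actually happens in the movie.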

The idea is to use different models to classify each review as a spoiler or not. We start from a baseline built on simple models such as Logistic Regression and Naive Bayes, and gradually move to more complex models such as Llama 3. A more detailed description of the work is in the PDF.

This project is being developed by the following group members: