The '90s hit Seinfeld is an American sitcom that follows four friends, Jerry Seinfeld (Jerry Seinfeld), George Costanza (Jason Alexander), Elaine Benes (Julia Louis-Dreyfus), and Cosmo Kramer (Michael Richards), through their daily lives in New York City. The show centers on Jerry, a stand-up comedian, along with his best friend George, his neighbor Kramer, and his ex-girlfriend Elaine, all played by real-life comedians. It has been characterized as a "show about nothing". Although its popularity peaked in the '90s, the show continues to attract new fans while retaining old ones, and it is recognized as one of the most influential sitcoms in TV history.
Although the show has cemented its place as one of the greatest sitcoms to date, it can be difficult for Seinfeld fans, and for those curious about the show, to find specific episode and series characteristics when their only resource is manual web searching. To address this, we developed an interactive web application that combines the capabilities of Wikipedia, IMDb, and Netflix in one place. It acts as a descriptive and exploratory tool that lets users query specific scenes, visualize character quirks, and receive recommendations for future watches. We hope not only to uncover new insights about the classic show, but also to provide a centralized tool for fans to access and explore all things Seinfeld!
Public Website: nothing.streamlit.app
Name | GitHub | Name | GitHub |
---|---|---|---|
Yash Manne | yashmanne | Yamina Katariya | YaminaKat7 |
Aditi Shrivastava | ad-iti | Chandler Ault | Dreamweaver2k |
- Interactive Visual Homepage:
- Chronicles the number of dialogue lines for each character across the series in an interactive visual dashboard.
- Showcases change in episode rating across the series.
- Search Query:
- Allows users to search for specific episodes based on partially remembered dialogue snippets or keywords.
- Includes additional advanced search functionality for filtering search based on season, episode rating, and desired characters.
- Showcases multiple visualizations detailing the emotional distribution of the episode among the main characters.
- Episode Recommender:
- Finds similar episode(s) to a user's favorite(s) based on dialogue, episode description, IMDb keywords, episode summary, emotional distribution of each character, the number of lines for each character, and the IMDb audience rating.
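The recommender described above can be pictured as a nearest-neighbour lookup over per-episode feature vectors. The following is a minimal sketch under stated assumptions, not the code in utils/recommender.py: the function names and the three-number feature vectors are hypothetical stand-ins for the dialogue, keyword, emotion, line-count, and rating features, and the numbers are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def recommend(features, favorites, top_k=3):
    """Rank episodes by similarity to the mean vector of the favorites.

    features: dict mapping episode title -> feature vector (hypothetical layout).
    favorites: titles the user already likes; excluded from the results.
    """
    dims = len(next(iter(features.values())))
    # Average the favorites into a single "taste" vector.
    taste = [
        sum(features[title][i] for title in favorites) / len(favorites)
        for i in range(dims)
    ]
    candidates = [t for t in features if t not in favorites]
    candidates.sort(
        key=lambda t: cosine_similarity(features[t], taste), reverse=True
    )
    return candidates[:top_k]


# Toy, made-up feature vectors purely for illustration.
toy_features = {
    "The Contest": [0.9, 0.1, 0.3],
    "The Soup Nazi": [0.8, 0.2, 0.4],
    "The Puffy Shirt": [0.1, 0.9, 0.5],
    "The Opposite": [0.85, 0.15, 0.35],
}
print(recommend(toy_features, ["The Contest"], top_k=2))
```

Averaging the favorites into one vector is only one possible design; scoring each candidate against every favorite separately would be an equally reasonable alternative.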
Here is an overview of our project structure:
├── an_analysis_of_nothing/
│ ├── app_pages/
│ │ ├── __init__.py
│ │ ├── write_about_page.py
│ │ ├── write_episode_query.py
│ │ ├── write_home_page.py
│ │ ├── write_recommender_page.py
│ ├── static/
│ │ ├── data/
│ │ │ ├── dialogue_tensors/
│ │ │ │ ├── tensor_0.npy
│ │ │ │ ├── tensor_1.npy
│ │ │ │ ├── ...
│ │ │ │ ├── tensor_9.npy
│ │ │ ├── metadata.csv
│ │ │ ├── scripts.csv
│ │ │ ├── README.md
│ │ ├── images/
│ │ │ ├── analysis_of_nothing.png
│ │ │ ├── elaine.png
│ │ │ ├── elaine_.jpg
│ │ │ ├── elaine_typing.gif
│ │ │ ├── george.png
│ │ │ ├── george_.jpg
│ │ │ ├── giphy.gif
│ │ │ ├── jerry.png
│ │ │ ├── kramer.png
│ │ │ ├── kramer_.jpg
│ │ │ ├── lead.png
│ │ │ ├── newman_.jpg
│ │ │ ├── README.md
│ │ ├── README.md
│ ├── tests/
│ │ ├── __init__.py
│ │ ├── mock_functions.py
│ │ ├── test_data_manager.py
│ │ ├── test_episode_query.py
│ │ ├── test_recommender.py
│ ├── utils/
│ │ ├── __init__.py
│ │ ├── data_constants.py
│ │ ├── data_manager.py
│ │ ├── episode_query.py
│ │ ├── recommender.py
│ ├── app.py
│ ├── README.md
│ ├── requirements.txt
├── doc/
│ ├── Component_Specification.md
│ ├── Episode_Query_Interaction.png
│ ├── Episode_Recommender_Interaction.png
│ ├── Final_Presentation.pdf
│ ├── Functional_Specification.md
│ ├── General_Analytics_Interaction.png
│ ├── README.md
│ ├── Sequence_Diagram.png
│ ├── Technology_Review.pdf
├── examples/
│ │ ├── images/
│ │ │ ├── README.md
│ │ │ ├── site_nav_1.png
│ │ │ ├── site_nav_2.png
│ │ │ ├── site_nav_3.png
│ │ │ ├── site_nav_4.png
│ │ │ ├── site_nav_5.png
│ │ ├── data.ipynb
│ │ ├── README.md
│ │ ├── site_navigation.md
│ ├── README.md
├── scripts/
│ ├── data_tools/
│ │ ├── __init__.py
│ │ ├── _scrape_epi_pages.py
│ │ ├── data_constants.py
│ │ ├── load_data.py
│ ├── precompute_tools/
│ │ ├── __init__.py
│ │ ├── query_vectors.py
│ │ ├── sentiment.py
│ ├── get_final_data.py
│ ├── README.md
├── .gitignore
├── environment.yml
├── LICENSE
├── pylintrc
├── pyproject.toml
├── README.md (Current File)
This repository can be cloned locally by running the following git
command:
git clone https://github.com/yashmanne/an_analysis_of_nothing.git
Please note that Git is required to run the above command. For instructions on downloading Git, please see the GitHub guide.
This application is built on top of multiple Python packages with specific version requirements. Installing these packages can cause conflicts with other packages in the workspace. As a workaround, we recommend using conda to create an isolated Python environment with all necessary packages. The list of necessary packages can be found in the environment.yml file.
To create our Conda environment, named nothing, run the following command:
conda env create -f environment.yml
Once the Conda environment is created, it can be activated by:
conda activate nothing
When you are finished working inside the environment, it can be deactivated with the command:
conda deactivate
Please note that Conda must be installed for the above commands to work. For instructions on installing Conda, please visit Conda.io.
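To confirm that the environment is active before running any of the project code, a quick check from Python can help. This is a small sketch, not part of the repository: the helper names are hypothetical, and it relies on the CONDA_DEFAULT_ENV variable that Conda sets when an environment is activated.

```python
import os


def active_conda_env(environ=os.environ):
    """Return the name of the active Conda environment, or None if unset."""
    # Conda exports CONDA_DEFAULT_ENV when an environment is activated.
    return environ.get("CONDA_DEFAULT_ENV")


def in_expected_env(expected="nothing", environ=os.environ):
    """True when the active Conda environment matches the expected name."""
    return active_conda_env(environ) == expected


if in_expected_env():
    print("The 'nothing' environment is active.")
else:
    print("Active environment:", active_conda_env())
```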
The raw data for our project was obtained from three different sources:
- Complete script dialogue and metadata for all episodes were found on Kaggle.
- Additional production information and audience ratings were obtained from IMDb.
- Individual IMDb episode pages were scraped for additional information such as episode summary, episode description, and episode keywords.
To process & store the data for future analyses, run the following code:
conda activate nothing
python ./scripts/get_final_data.py
conda deactivate
Please note that this code not only cleans and merges the data, but also calculates the emotional distribution of each dialogue line and generates a vector embedding for each line with a pretrained BERT model. The script may take up to 3 hours to run. More details can be found here and here.
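To illustrate how per-line emotion scores could be rolled up into the per-character distributions shown in the application, here is a minimal sketch. It is not the code in scripts/precompute_tools/sentiment.py: the function name, the (speaker, scores) input layout, and the emotion labels are assumptions made for illustration, and the numbers are made up.

```python
from collections import defaultdict


def character_emotions(lines):
    """Average per-line emotion scores into one distribution per speaker.

    lines: iterable of (speaker, {emotion: probability}) pairs
           (a hypothetical layout, not the project's actual schema).
    """
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for speaker, scores in lines:
        counts[speaker] += 1
        for emotion, prob in scores.items():
            totals[speaker][emotion] += prob
    # Divide each summed score by the number of lines the speaker has.
    return {
        speaker: {e: total / counts[speaker] for e, total in emotions.items()}
        for speaker, emotions in totals.items()
    }


# Toy, made-up scores purely for illustration.
toy_lines = [
    ("JERRY", {"joy": 0.5, "anger": 0.5}),
    ("JERRY", {"joy": 1.0, "anger": 0.0}),
    ("GEORGE", {"joy": 0.25, "anger": 0.75}),
]
print(character_emotions(toy_lines))
```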
Details about the data variables can be found here.
We built our application with the open-source streamlit package. The application can be launched locally with the following code:
conda activate nothing
streamlit run an_analysis_of_nothing/app.py
This will pop up a browser window with the functioning web-application. More details can be found here.
A video demonstration of our working application can be seen here.
More details on how to run our code can be found here.