This is Khoa Lam's passion project from the Metis data science bootcamp in NYC. Recycling contamination is both an environmental and an economic problem: recycling companies often redirect contaminated bales of recyclables to landfills, which increases waste output and costs businesses resources. Here, I used a convolutional neural network (CNN) to predict from an image whether an object is recyclable, with the aim of helping consumers minimize recycling contamination. Other projects and organizations share this goal (e.g., TrashNet, Multilayer Hybrid Deep-Learning Method for Waste Classification and Recycling, and ZenRobotics). This project differs in that it uses mixed image sources (i.e., digital images and photographs), whereas many other projects use only photos.

The final CNN was trained on an AWS server and achieved F0.5 = 0.90 for recyclability and an average AUC of 0.75 for material classification (with a 60/20/20 train-validate-test split). Lastly, the model was deployed as a Dash web app (currently defunct) on AWS Elastic Beanstalk. A presentation of this project can be found here.
The dataset (as zip files) is now available on Google Drive.
Image sources for this project include:
- Google Image Search, with image URLs collected through the Google Custom Search API (code in the getting-urls notebook)
- TrashNet
- A subset of Caltech 256 Image Dataset
- A subset of Flickr Material Database (FMD)
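For the Google Image Search source, the request URLs for the Custom Search JSON API can be built roughly as follows. This is a minimal sketch, not the code from the getting-urls notebook: the helper names `build_image_search_url` and `paged_urls`, the query string, and the `KEY`/`CX` placeholders are all hypothetical. The API returns at most 10 results per request, so larger result sets are paged with `start`.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_image_search_url(query, api_key, engine_id, start=1, num=10):
    """Return a Custom Search JSON API request URL for an image query.

    Hypothetical helper for illustration; it only builds the URL and
    does not send any request.
    """
    if not 1 <= num <= 10:
        raise ValueError("the API returns at most 10 results per request")
    params = {
        "key": api_key,         # API key (placeholder in this sketch)
        "cx": engine_id,        # search engine ID (placeholder in this sketch)
        "q": query,
        "searchType": "image",  # restrict results to images
        "num": num,
        "start": start,         # 1-based index of the first result
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def paged_urls(query, api_key, engine_id, total=30):
    """Build one request URL per page of 10 results, up to `total` results."""
    return [
        build_image_search_url(query, api_key, engine_id, start=s)
        for s in range(1, total + 1, 10)
    ]


urls = paged_urls("aluminum can", "KEY", "CX", total=30)
```

Each URL can then be fetched with any HTTP client; the image links are expected under `items[*].link` in the JSON response.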
Currently, the dataset consists of 11045 images separated into 8 categories:
- Recyclables: 7543 images
  - Glass (e.g., jars, bottles): 729 images
  - Metal (e.g., cans, aluminum foil): 1747 images
  - Paper (e.g., cardboard, books): 3230 images
  - Plastic (e.g., soda bottles, food containers): 1837 images
- Non-recyclables: 3502 images
  - Glass (e.g., lightbulbs, mirror): 531 images
  - Plastics (e.g., styrofoam, sports balls): 1850 images
  - Tanglers (e.g., wire, cable): 290 images
  - Other (e.g., battery, ceramic): 831 images
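Because the categories are unevenly sized (290 tanglers versus 3230 paper images), one way to produce a 60/20/20 train-validate-test split is to split each category separately. The sketch below assumes a stratified split; whether the original notebooks stratify is not stated here, and `stratified_split` is a hypothetical helper.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split sample indices into train/validate/test, category by category.

    Hypothetical helper: the test set takes whatever remains after the
    train and validate shares are cut from each category.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    train, validate, test = [], [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_validate = round(fractions[1] * len(indices))
        train += indices[:n_train]
        validate += indices[n_train:n_train + n_validate]
        test += indices[n_train + n_validate:]
    return train, validate, test


# Toy labels for illustration only, not the real dataset.
labels = ["paper"] * 50 + ["tanglers"] * 10
train, validate, test = stratified_split(labels)
print(len(train), len(validate), len(test))  # 36 12 12
```

Splitting per category keeps small classes such as tanglers represented in all three subsets.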
This model has two distinct outputs: (1) recyclability (binary output), and (2) material classification (categorical output). Recyclability is trained with F0.5 as the metric because F0.5 weighs precision twice as much as recall: a non-recyclable item predicted as recyclable contaminates the recycling stream, so false positives are costlier than false negatives. Material classification is trained with AUC to balance how well each class is separated from the others.
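To make the choice of F0.5 concrete, the general formula is F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), where P is precision and R is recall. The sketch below is a plain-Python illustration with made-up labels (1 = recyclable), not the project's training code; scikit-learn's `fbeta_score` computes the same quantity.

```python
def f_beta(true_labels, predicted_labels, beta=0.5):
    """Compute the F-beta score for binary labels (1 = positive class)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)


# Made-up example: 4 recyclable and 4 non-recyclable objects.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
cautious = [1, 1, 0, 0, 0, 0, 0, 0]  # precision 1.00, recall 0.50
eager = [1, 1, 1, 1, 1, 1, 0, 0]     # precision 0.67, recall 1.00

print(round(f_beta(y_true, cautious, beta=0.5), 3))  # 0.833
print(round(f_beta(y_true, eager, beta=0.5), 3))     # 0.714
print(round(f_beta(y_true, cautious, beta=1.0), 3))  # 0.667
print(round(f_beta(y_true, eager, beta=1.0), 3))     # 0.8
```

With beta = 0.5 the cautious predictor, which never sends a non-recyclable item to recycling, scores higher; with beta = 1 (the usual F1) the ranking flips.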
The code presented here is slightly simplified so that it can run on a local machine. To train on the full dataset (~11,000 images), an AWS Deep Learning AMI is recommended.
Python packages required: pandas, numpy, seaborn, matplotlib, keras, tensorflow, sklearn, PIL, cv2