Full Stack Deep Learning Course 2022 project
Team 44:
- Aleks Hiidenhovi: alekshiidenhovi
- Bernardo García: bergr7
- Diego Quintana: diegoquintanav
- Edoardo Abati: EdAbati
- Juan Manuel: jmpc11
- Kushal Atul Ramaiya: ramaiyaKushal
AG is a collection of more than 1 million news articles. The articles have been gathered from more than 2000 news sources by ComeToMyHead, an academic news search engine that has been running since July 2004, in more than 1 year of activity. The dataset is provided by the academic community for research purposes in data mining (clustering, classification, etc.), information retrieval (ranking, search, etc.), XML, data compression, data streaming, and any other non-commercial activity. For more information, please refer to http://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html .
The AG's news topic classification dataset is constructed by Xiang Zhang (xiang.zhang@nyu.edu) from the dataset above. It is used as a text classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).
The dataset is available on the Hugging Face Hub at https://huggingface.co/datasets/bergr7/weakly_supervised_ag_news and can be loaded with the `datasets` library:
```python
from datasets import load_dataset

dataset = load_dataset("bergr7/weakly_supervised_ag_news")
...
```
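A minimal sketch for inspecting what was loaded; the split and field names below are assumptions, so check the printed `dataset` object for the real ones:

```python
from datasets import load_dataset

dataset = load_dataset("bergr7/weakly_supervised_ag_news")
print(dataset)  # lists the available splits and their features

# AG News covers four topics: World, Sports, Business and Sci/Tech.
example = dataset["train"][0]  # "train" split assumed
print(example)
```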
- Docker
- conda
- a `.env` file in the root of this project (based on `.env.sample`)
- Optional, for AWS deployment:
  - An AWS Account
  - AWS CLI installed and configured
Create a virtual environment using `conda`:

```bash
make conda-env-and-update
```
The project is composed of several services, which run either as Docker containers or as public online services:
- `jupyter`: a Jupyter notebook server with support for the NVIDIA A100-SXM4-40GB GPU (CUDA 11.6); it contains scripts for training and evaluating models
- `lambda`: a lambda function for serving predictions (see the sketch after this list)
- `streamlit`: a Streamlit application for displaying a user interface
- `rubrix`: a Rubrix server for annotating data
- `elastic`: an Elasticsearch server for storing data from Rubrix
- `kibana`: a Kibana server for visualizing data from Elasticsearch
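For illustration only, a prediction could be requested from the `lambda` service as in the sketch below, assuming the container is run with the AWS Lambda Runtime Interface Emulator; the port, invocation path and payload shape are assumptions, not taken from this project's configuration.

```python
# A hedged sketch: invoking the prediction lambda through the AWS Lambda
# Runtime Interface Emulator. Port, path and payload shape are assumptions.
import json

import requests

url = "http://localhost:9000/2015-03-31/functions/function/invocations"
payload = {"text": "Wall St. Bears Claw Back Into the Black (Reuters)"}

response = requests.post(url, data=json.dumps(payload), timeout=10)
print(response.json())  # e.g. the predicted AG News topic
```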
Launch locally the `rubrix` service (you also need the `elasticsearch` and `kibana` services):

```bash
docker compose -f docker-compose.yml up -d rubrix elasticsearch kibana
```
Afterward, you should be able to access the web app at http://localhost:6900/. The default username and password are `rubrix` and `1234`.
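Once the server is up, records can be logged for annotation with the Rubrix Python client. A minimal sketch, assuming a text classification setup; the dataset name, text and label below are hypothetical, and the exact record fields may vary between Rubrix versions:

```python
# A minimal sketch of logging a model prediction to Rubrix for annotation.
# The client talks to http://localhost:6900 by default.
import rubrix as rb

record = rb.TextClassificationRecord(
    inputs="Oil prices rise as markets react to supply cuts",  # hypothetical text
    prediction=[("Business", 0.85)],  # (label, score) pairs from a model
)
rb.log(records=record, name="ag_news_annotations")  # hypothetical dataset name
```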
- Add a registered model in the W&B model registry (guide here)
- Run the command using the name in the registry as `model_name`:

```bash
python cli/deploy.py register-and-convert-model --artifact-name="[entity]/[project]/[artifact_name]:[alias]" --model-name="[entity]/[project]/[model_name]"
```

This will register the model artifact in the registry and convert it into the TorchScript format.
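For reference, the conversion step amounts to something like the sketch below; the artifact path, file name and input shape are hypothetical, and the actual logic lives in `cli/deploy.py`.

```python
# A hedged sketch of downloading a W&B model artifact and converting it to
# TorchScript. Artifact path, file name and input shape are assumptions.
import torch
import wandb

run = wandb.init(project="my-project")  # hypothetical project name
artifact = run.use_artifact("entity/project/artifact_name:latest")
artifact_dir = artifact.download()

model = torch.load(f"{artifact_dir}/model.pt")  # hypothetical file name
model.eval()

# Trace with a dummy batch of token ids to produce a TorchScript module.
example_input = torch.randint(0, 30522, (1, 128))
scripted = torch.jit.trace(model, example_input)
scripted.save("model.torchscript.pt")
```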
All the services can be launched using `docker-compose`, and its configuration is shared across several `yml` files that can be chained with the `-f` flag. Read more about this in the docker-compose documentation. You will need to provide a `.env` file with the variables listed in `.env.sample` for this deployment to work.
In particular, the following files are used:
- `docker-compose.yml` (`base`) is used to run the project in a container. It contains all the services apart from the proxy.
- `docker-compose.override.yml` (`override`) is used to run the project as a development environment. It exposes many volumes to allow quick prototyping.
- `docker-compose.nvidia.yml` (`nvidia`) extends the `jupyter` container and adds support for GPU usage.
These can be launched using `make`, e.g. `make dev.all.up` will launch all the services combining the `base` and `override` configs. The `Makefile` contains a list of all the commands.
As the project is made up of multiple services, please follow the guide corresponding to the service you want to contribute to. If you need to work on a feature that requires all the services to be active and running (e.g. the active learning loop with the notebook and Rubrix), you can run `make dev.all.up.build`.
- Create a dev environment either with `conda` or with `docker` (suggested way):

  - Docker: Launch locally the `jupyter` service using the `dev` config for `docker-compose`. This will mount the local directory to the container and allow you to change the local files and see the changes in the container straight away:

    ```bash
    docker compose build jupyter
    docker compose -f docker-compose.yml -f docker-compose.override.yml up -d jupyter
    ```

- (Optional) Install `pre-commit` hooks: `make install-pre-commit`
- Update the code in the folders:
  - `services/jupyter/src/app`: contains the main code of the application (model training, deployment)
  - `services/jupyter/src/notebooks`: contains useful notebooks for data exploration, testing, ...
Please follow the guide here.