This category classification model was trained on the v4 Data For Good 2022 category dataset using code in this version of the off-category-classification repository.
Training was tracked on Wandb.
This release provides the following assets:
Dataset assets:
predict_categories_dataset_products.jsonl.gz: selected product fields.
predict_categories_dataset_images_ids.jsonl.gz: IDs of images associated with each product.
predict_categories_dataset_ocrs.jsonl.gz: extracted OCR texts for each product.
(train|test|val).txt: train, test and validation splits (list of barcodes).
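As a rough illustration of how the split files and the gzipped JSONL assets might be combined, the sketch below reads the barcodes of one split and keeps only the matching records. It assumes the split files hold one barcode per line and that each JSONL record stores its barcode under a field named `code`; that field name is a guess, not something stated in this release, so inspect a record before relying on it.

```python
import gzip
import json


def read_barcodes(split_path):
    """Return the set of barcodes in a split file (assumed one per line)."""
    with open(split_path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def iter_split(jsonl_gz_path, barcodes, key="code"):
    """Yield records of a gzipped JSONL file whose barcode is in `barcodes`.

    `key` is a hypothetical field name; adjust it to the real schema.
    """
    with gzip.open(jsonl_gz_path, "rt", encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            if str(record.get(key)) in barcodes:
                yield record
```

For example, `iter_split("predict_categories_dataset_products.jsonl.gz", read_barcodes("val.txt"))` would stream the validation products without loading the whole file into memory.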
Training-related assets:
config.json: the parameter configuration used during training.
categories.full.json.gz: the category taxonomy version used in this model's training.
ingredients.full.json.gz: the ingredient taxonomy version used in this model's training.
training.log: training logs.
Validation assets:
classification_report_(test|val).json: classification report for the test and validation datasets.
threshold_report_0.99.json: category-specific thresholds required to reach a precision >= 0.99 on a merged validation + test set.
(test|val)_top_predictions.tsv: top-10 predictions on the validation and test sets.
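To make the role of the category-specific thresholds concrete, here is a minimal sketch of filtering one product's predictions. It assumes the thresholds have already been loaded into a plain mapping from category to minimum score; the actual layout of threshold_report_0.99.json is not described here and may differ. The function name, category names and scores are illustrative only.

```python
def apply_thresholds(scores, thresholds, default=None):
    """Keep predictions whose score reaches the category's threshold.

    scores:     mapping category -> predicted score for one product
    thresholds: mapping category -> minimum score (assumed layout)
    default:    threshold for categories missing from `thresholds`;
                when None, such categories are dropped.
    """
    kept = {}
    for category, score in scores.items():
        threshold = thresholds.get(category, default)
        if threshold is not None and score >= threshold:
            kept[category] = score
    return kept


# Illustrative values, not taken from the released reports.
scores = {"en:teas": 0.97, "en:beverages": 0.80, "en:snacks": 0.99}
thresholds = {"en:teas": 0.95, "en:beverages": 0.90}
print(apply_thresholds(scores, thresholds))
```

Dropping categories that have no threshold is a conservative choice: no score is known to give the target precision for them.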
Serving assets:
saved_model.tar.gz: the model saved in SavedModel format.
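A possible way to unpack the serving asset is sketched below. It relies only on the fact that a SavedModel directory contains a `saved_model.pb` file. The internal layout of the archive is not documented here, so the helper (a hypothetical name) searches for that file instead of assuming a fixed path.

```python
import tarfile
from pathlib import Path


def extract_saved_model(archive_path, dest_dir):
    """Extract the archive and return the directory holding saved_model.pb."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest)
    # A SavedModel directory is identified by its saved_model.pb file.
    matches = sorted(dest.rglob("saved_model.pb"))
    if not matches:
        raise FileNotFoundError(f"no saved_model.pb found under {dest}")
    return matches[0].parent
```

The returned directory can then be passed to TensorFlow's `tf.saved_model.load`. The input signature the model expects is not covered by this sketch.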