
Q: Speed and Accuracy for zero-shot classification #369

Open
Rm1n90 opened this issue Jan 24, 2025 · 0 comments
Rm1n90 commented Jan 24, 2025

Hello,

I wrote my code based on the text classification example, adapted for zero-shot classification. However, I'm facing two issues:

  1. The accuracy of the model drops significantly. For example, these are the scores I get with the plain transformers pipeline for my input (see the sketch after this list):
    {'food quality': 0.7271687984466553, 'service': 0.6853761672973633, 'price': 0.6715865135192871, 'ambiance': 0.3189621865749359, 'cleanliness': 0.24270476400852203, 'menu variety': 0.17212778329849243, 'portion size': 0.06296943873167038, 'wait time': 0.026042930781841278}

and after using quanto, the results for both the float and the quantized model are as follows:

Float Model:
Scores: [('price', 0.2421216070652008), ('food quality', 0.20708876848220825), ('service', 0.18631604313850403), ('menu variety', 0.11014683544635773), ('ambiance', 0.09374429285526276), ('cleanliness', 0.06582632660865784), ('portion size', 0.057072483003139496), ('wait time', 0.03768354654312134)]

Quantized Model:
Scores: [('price', 0.23580333590507507), ('food quality', 0.200372114777565), ('service', 0.18851585686206818), ('menu variety', 0.10964863002300262), ('ambiance', 0.09499731659889221), ('cleanliness', 0.06895963102579117), ('portion size', 0.05967756733298302), ('wait time', 0.042025450617074966)]

Why is this happening?

  2. Inference becomes much slower with the quantized model than with the plain transformers/float model. Shouldn't it be the opposite?
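
For context, here is roughly how I produced the baseline numbers in point 1. This is a minimal sketch; it passes multi_label=True, which is consistent with those scores being independent probabilities that do not sum to 1:

import torch
from transformers import pipeline

# Sketch of the baseline run (assumption: multi_label=True, since the
# reported scores above are independent probabilities that do not sum to 1).
baseline = pipeline("zero-shot-classification", model="facebook/bart-large-mnli", device=-1)
result = baseline(
    "The prices were reasonable for the quality.",
    ["food quality", "service", "ambiance", "price", "cleanliness",
     "portion size", "wait time", "menu variety"],
    hypothesis_template="The topic of this text is about {}",
    multi_label=True,
)
print(dict(zip(result["labels"], result["scores"])))

With multi_label=False (the default, and what my pipeline code below uses), the scores are instead softmaxed over all candidate labels, which matches the scale of the float/quantized outputs above.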

Here is my code:

import torch
import time
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
from optimum.quanto import freeze, qint8, quantize, qfloat8


def evaluate_zero_shot(model, tokenizer, device, text, hypothesis_template, classes_verbalized, warmup_steps=3):
    # Wrap the (possibly quantized) model in a zero-shot classification pipeline
    p = pipeline("zero-shot-classification", model=model, tokenizer=tokenizer, device=device)

    print(f"Warming up {warmup_steps} steps...")
    for _ in range(warmup_steps):
        _ = p(text, classes_verbalized, hypothesis_template=hypothesis_template)

    # Time a single inference pass after warmup
    start_time = time.time()
    result = p(text, classes_verbalized, hypothesis_template=hypothesis_template)
    end_time = time.time()

    print(f"Scores: {list(zip(result['labels'], result['scores']))}")
    print(f"Inference Time: {end_time - start_time:.4f} seconds")


def main():
    model_name = "facebook/bart-large-mnli"
    device = torch.device("cpu")

    model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    text = "The prices were reasonable for the quality."
    hypothesis_template = "The topic of this text is about {}"
    classes_verbalized = ["food quality", "service", "ambiance", "price", "cleanliness", "portion size", "wait time",
                          "menu variety"]

    print("Float Model:")
    evaluate_zero_shot(model, tokenizer, device, text, hypothesis_template, classes_verbalized)

    # Quantize the weights to float8 (activations left in full precision),
    # then freeze to materialize the quantized weights in place
    quantize(model, weights=qfloat8, activations=None)
    freeze(model)

    print("\nQuantized Model:")
    evaluate_zero_shot(model, tokenizer, device, text, hypothesis_template, classes_verbalized)


if __name__ == "__main__":
    main()
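
Separately, in case my single time.time() measurement is too noisy on CPU, here is a sketch of a more careful timing helper I could swap in (the time_pipeline name is mine; it averages several runs using the monotonic time.perf_counter):

import time

def time_pipeline(p, text, labels, template, runs=10):
    # Average over several runs with a monotonic clock for a stabler estimate
    elapsed = []
    for _ in range(runs):
        start = time.perf_counter()
        p(text, labels, hypothesis_template=template)
        elapsed.append(time.perf_counter() - start)
    return sum(elapsed) / len(elapsed)

I also plan to retry with weights=qint8 instead of qfloat8, on the assumption that float8 weights may fall back to a slower dequantize-then-matmul path on CPU.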

Thanks!
