
Q: Speed and Accuracy for zero-shot classification #369

Open
Rm1n90 opened this issue Jan 24, 2025 · 0 comments
Rm1n90 commented Jan 24, 2025

Hello,

I wrote my code based on the text classification example, adapted for zero-shot classification. However, I'm facing two issues:

  1. The accuracy of the model drops significantly. For example, these are the scores I get with the plain transformers pipeline for my input (see the sketch after this list):
    {'food quality': 0.7271687984466553, 'service': 0.6853761672973633, 'price': 0.6715865135192871, 'ambiance': 0.3189621865749359, 'cleanliness': 0.24270476400852203, 'menu variety': 0.17212778329849243, 'portion size': 0.06296943873167038, 'wait time': 0.026042930781841278}

and after using quanto, the results for both the float and the quantized model are as follows:

Float Model:
Scores: [('price', 0.2421216070652008), ('food quality', 0.20708876848220825), ('service', 0.18631604313850403), ('menu variety', 0.11014683544635773), ('ambiance', 0.09374429285526276), ('cleanliness', 0.06582632660865784), ('portion size', 0.057072483003139496), ('wait time', 0.03768354654312134)]

Quantized Model:
Scores: [('price', 0.23580333590507507), ('food quality', 0.200372114777565), ('service', 0.18851585686206818), ('menu variety', 0.10964863002300262), ('ambiance', 0.09499731659889221), ('cleanliness', 0.06895963102579117), ('portion size', 0.05967756733298302), ('wait time', 0.042025450617074966)]

Why is this happening?

  2. Inference becomes much slower with the quantized model than with the plain transformers/float model. Shouldn't it be the opposite?
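
For context, here is roughly how I produced the baseline numbers in point 1. This is a minimal sketch; it passes multi_label=True, which is consistent with those scores being independent probabilities that do not sum to 1:

import torch
from transformers import pipeline

# Sketch of the baseline run (assumption: multi_label=True, since the
# reported scores above are independent probabilities that do not sum to 1).
baseline = pipeline("zero-shot-classification", model="facebook/bart-large-mnli", device=-1)
result = baseline(
    "The prices were reasonable for the quality.",
    ["food quality", "service", "ambiance", "price", "cleanliness",
     "portion size", "wait time", "menu variety"],
    hypothesis_template="The topic of this text is about {}",
    multi_label=True,
)
print(dict(zip(result["labels"], result["scores"])))

With multi_label=False (the default, and what my pipeline code below uses), the scores are instead softmaxed over all candidate labels, which matches the scale of the float/quantized outputs above.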

Here is my code:

import torch
import time
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
from optimum.quanto import freeze, qint8, quantize, qfloat8


def evaluate_zero_shot(model, tokenizer, device, text, hypothesis_template, classes_verbalized, warmup_steps=3):
    # Wrap the (possibly quantized) model in a zero-shot classification pipeline
    p = pipeline("zero-shot-classification", model=model, tokenizer=tokenizer, device=device)

    print(f"Warming up {warmup_steps} steps...")
    for _ in range(warmup_steps):
        _ = p(text, classes_verbalized, hypothesis_template=hypothesis_template)

    # Time a single inference pass after warmup
    start_time = time.time()
    result = p(text, classes_verbalized, hypothesis_template=hypothesis_template)
    end_time = time.time()

    print(f"Scores: {list(zip(result['labels'], result['scores']))}")
    print(f"Inference Time: {end_time - start_time:.4f} seconds")


def main():
    model_name = "facebook/bart-large-mnli"
    device = torch.device("cpu")

    model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    text = "The prices were reasonable for the quality."
    hypothesis_template = "The topic of this text is about {}"
    classes_verbalized = ["food quality", "service", "ambiance", "price", "cleanliness", "portion size", "wait time",
                          "menu variety"]

    print("Float Model:")
    evaluate_zero_shot(model, tokenizer, device, text, hypothesis_template, classes_verbalized)

    # Quantize the weights to float8 (activations left in full precision),
    # then freeze to materialize the quantized weights in place
    quantize(model, weights=qfloat8, activations=None)
    freeze(model)

    print("\nQuantized Model:")
    evaluate_zero_shot(model, tokenizer, device, text, hypothesis_template, classes_verbalized)


if __name__ == "__main__":
    main()
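
Separately, in case my single time.time() measurement is too noisy on CPU, here is a sketch of a more careful timing helper I could swap in (the time_pipeline name is mine; it averages several runs using the monotonic time.perf_counter):

import time

def time_pipeline(p, text, labels, template, runs=10):
    # Average over several runs with a monotonic clock for a stabler estimate
    elapsed = []
    for _ in range(runs):
        start = time.perf_counter()
        p(text, labels, hypothesis_template=template)
        elapsed.append(time.perf_counter() - start)
    return sum(elapsed) / len(elapsed)

I also plan to retry with weights=qint8 instead of qfloat8, on the assumption that float8 weights may fall back to a slower dequantize-then-matmul path on CPU.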

Thanks!
