
Integrate quanto commands to optimum-cli #244

Merged (4 commits) Jul 17, 2024
Conversation

dacorvo
Collaborator

@dacorvo commented Jul 17, 2024

What does this PR do?

This adds a subpackage module that registers quanto commands with the optimum CLI.

For now, the subpackage contains only a simple quantize command.

$ optimum-cli -h
usage: optimum-cli

positional arguments:
  {export,env,quanto}
    export             Export PyTorch and TensorFlow models to several formats.
    env                Get information about the environment used.
    quanto             Hugging Face models quantization tools

optional arguments:
  -h, --help           show this help message and exit
$ optimum-cli quanto -h
usage: optimum-cli quanto [-h] {quantize} ...

positional arguments:
  {quantize}
    quantize  Quantize Hugging Face models.

optional arguments:
  -h, --help  show this help message and exit
$ optimum-cli quanto quantize -h
usage: optimum-cli quanto quantize [-h] -m MODEL [--weights {int2,int4,int8,float8}] [--revision REVISION] [--trust_remote_code] [--library {transformers}] [--task TASK]
                                   [--torch_dtype {auto,fp16,bf16}] [--device DEVICE]
                                   output

optional arguments:
  -h, --help            show this help message and exit

Required arguments:
  output                The path to save the quantized model.
  -m MODEL, --model MODEL
                        Hugging Face Hub model id or path to a local model.
  --weights {int2,int4,int8,float8}
                        The data type to quantize the model weights to.

Optional arguments:
  --revision REVISION   The Hugging Face model revision.
  --trust_remote_code   Trust remote code when loading the model.
  --library {transformers}
                        The Hugging Face library to use to load the model.
  --task TASK           The model task (useful for models supporting multiple tasks).
  --torch_dtype {auto,fp16,bf16}
                        The torch dtype to use when loading the model weights.
  --device DEVICE       The device to use when loading the model.
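To make the command tree above concrete, here is a minimal, self-contained argparse sketch that mirrors the `optimum-cli quanto quantize` interface shown in the help output. This is purely illustrative: the actual PR registers the command through optimum's subpackage mechanism rather than building a standalone parser, and defaults shown here (e.g. `--device cpu`) are assumptions, not taken from the PR.

```python
# Illustrative sketch only: a standalone argparse tree mirroring the
# `optimum-cli quanto quantize` interface from the help output above.
# The real command is registered via optimum's subpackage mechanism.
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="optimum-cli")
    commands = parser.add_subparsers(dest="command")

    # `quanto` group, as registered by the subpackage
    quanto = commands.add_parser(
        "quanto", help="Hugging Face models quantization tools"
    )
    quanto_commands = quanto.add_subparsers(dest="subcommand")

    # `quantize` subcommand with the arguments listed in the help output
    quantize = quanto_commands.add_parser(
        "quantize", help="Quantize Hugging Face models."
    )
    quantize.add_argument("output", help="The path to save the quantized model.")
    quantize.add_argument(
        "-m", "--model", required=True,
        help="Hugging Face Hub model id or path to a local model.",
    )
    quantize.add_argument(
        "--weights", choices=["int2", "int4", "int8", "float8"],
        help="The data type to quantize the model weights to.",
    )
    quantize.add_argument("--revision", help="The Hugging Face model revision.")
    quantize.add_argument(
        "--trust_remote_code", action="store_true",
        help="Trust remote code when loading the model.",
    )
    quantize.add_argument(
        "--library", choices=["transformers"],
        help="The Hugging Face library to use to load the model.",
    )
    quantize.add_argument(
        "--task", help="The model task (useful for models supporting multiple tasks)."
    )
    quantize.add_argument(
        "--torch_dtype", choices=["auto", "fp16", "bf16"], default="auto",
        help="The torch dtype to use when loading the model weights.",
    )
    quantize.add_argument(
        "--device", default="cpu",  # default is an assumption for this sketch
        help="The device to use when loading the model.",
    )
    return parser


if __name__ == "__main__":
    # Parse a sample command line (the model id is just an example)
    args = build_parser().parse_args(
        ["quanto", "quantize", "-m", "facebook/opt-125m", "--weights", "int8", "./quantized"]
    )
    print(args.command, args.subcommand, args.model, args.weights, args.output)
```

From the shell, a quantization run using the new command might then look like `optimum-cli quanto quantize -m facebook/opt-125m --weights int8 ./quantized` (the model id is only an example).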

@dacorvo force-pushed the optimum-subpackage branch 2 times, most recently from 225bc1e to 8fabcc2 on July 17, 2024 14:23
@dacorvo merged commit 8064266 into main on Jul 17, 2024
12 checks passed
@dacorvo deleted the optimum-subpackage branch on July 17, 2024 14:50