asmigosw
Contributor

Added flags:

  1. --iteration: Number of times to run inference after the QPC has been loaded once.
  2. --automation: If set, prints the input, output, and performance stats.

Example command: python -m QEfficient.cloud.infer --model_name gpt2 --batch_size 1 --prompt_len 32 --ctx_len 128 --mxfp6 --num_cores 16 --device_group [0] --prompt "My name is" --mos 1 --aic_enable_depth_first --iteration 2 --automation
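
For reviewers who want to see the intended behaviour in isolation, here is a minimal sketch of "load once, run N times, optionally print stats". It is not the code in this PR: `build_parser`, `run`, `load_model`, and `infer` are hypothetical names, the dummy callables stand in for QPC loading and execution, and the defaults (`--iteration` as an integer defaulting to 1, `--automation` as a bare boolean switch) are assumptions based on the example command above.

```python
import argparse
import time


def build_parser():
    # Hypothetical parser covering only the two flags described in this PR.
    parser = argparse.ArgumentParser()
    parser.add_argument("--iteration", type=int, default=1,
                        help="Number of inference runs after a single load")
    parser.add_argument("--automation", action="store_true",
                        help="Print input, output, and performance stats")
    return parser


def run(load_model, infer, prompt, iteration=1, automation=False):
    """Load once, then run inference `iteration` times."""
    session = load_model()  # stand-in for loading the QPC; called exactly once
    results = []
    for i in range(iteration):
        start = time.perf_counter()
        output = infer(session, prompt)
        elapsed = time.perf_counter() - start
        results.append((output, elapsed))
        if automation:
            # Assumed report format; the real stats output may differ.
            print(f"iteration={i + 1} input={prompt!r} "
                  f"output={output!r} time_s={elapsed:.6f}")
    return results


if __name__ == "__main__":
    args = build_parser().parse_args(["--iteration", "2", "--automation"])
    # Dummy callables so the sketch runs without hardware or QEfficient installed.
    run(lambda: object(), lambda session, p: p.upper(), "My name is",
        iteration=args.iteration, automation=args.automation)
```

The point of the sketch is that the load step sits outside the loop, so only the per-iteration inference time is measured on each pass.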

Signed-off-by: Asmita Goswami <asmigosw@qti.qualcomm.com>
@quic-rishinr marked this pull request as draft July 10, 2025 10:04