[eplatero] Add support for exporting and compiling models for SpD
(https://jira-dc.qualcomm.com/jira/browse/CLOUDPERF-43)
This change has been validated and posted on behalf of Erick Platero.
It adds support for generating a Target LM (TLM) that runs as a verifier
model by outputting the logits for every position of the input sequence
instead of only the last position.
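
To illustrate why a verifier needs all logits, here is a minimal NumPy sketch, not the code from this change. The helper names `select_logits` and `count_accepted` are hypothetical, and greedy (argmax) acceptance is an assumption chosen for simplicity; the change itself does not state which acceptance rule is used.

```python
import numpy as np

def select_logits(logits, verifier=False):
    """logits has shape [batch, seq_len, vocab].

    A regular LM keeps only the last position; a verifier keeps every position.
    """
    if verifier:
        return logits
    return logits[:, -1:, :]

def count_accepted(tlm_logits, draft_tokens):
    """Greedy verification (assumed rule, for illustration only).

    tlm_logits:   [k + 1, vocab] logits for the input [last_accepted, d_1..d_k]
    draft_tokens: the k tokens proposed by the draft model
    Returns (number of accepted draft tokens, next token chosen by the TLM).
    """
    target = tlm_logits.argmax(axis=-1)  # k + 1 greedy predictions
    k = len(draft_tokens)
    n = 0
    # Accept draft tokens until the first disagreement with the TLM.
    while n < k and target[n] == draft_tokens[n]:
        n += 1
    # target[n] is the correction on a mismatch, or a bonus token if all matched.
    return n, int(target[n])
```

With only the last-position logits, the TLM could check a single token per call; keeping every position lets one forward pass check all k draft tokens.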
It also allows compiling the Target and Draft LMs with specializations
that support SpD (speculative decoding).
Usage:
TLM:
tlm = QEFFAutoModelForCausalLM.from_pretrained(<tlm-model-card>)
tlm.transform(num_speculative_tokens=<k>)
tlm.export_and_compile(<compiler-args>)
DLM:
dlm = QEFFAutoModelForCausalLM.from_pretrained(<dlm-model-card>)
dlm.transform(is_dlm=True)
dlm.export_and_compile(<compiler-args>)
# If continuous batching is enabled by providing full_batch_size, add FBS to the specialization file and set the batch size of the decoder part to FBS.
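
The rule in the comment above can be sketched as follows. This is a hedged illustration only: the function `build_specializations`, its parameters, and the dictionary keys (`batch_size`, `seq_len`, `ctx_len`, `full_batch_size`) are assumptions made for this example and are not confirmed to match the actual specialization file format.

```python
def build_specializations(batch_size, prompt_len, ctx_len,
                          full_batch_size=None, decode_seq_len=1):
    """Build a prefill and a decode specialization (hypothetical layout)."""
    prefill = {"batch_size": batch_size, "seq_len": prompt_len, "ctx_len": ctx_len}
    decode = {"batch_size": batch_size, "seq_len": decode_seq_len, "ctx_len": ctx_len}
    if full_batch_size is not None:
        # Continuous batching: record FBS in both specializations ...
        prefill["full_batch_size"] = full_batch_size
        decode["full_batch_size"] = full_batch_size
        # ... and run the decoder part at the full batch size.
        decode["batch_size"] = full_batch_size
    return {"specializations": [prefill, decode]}
```

For example, `build_specializations(1, 32, 128, full_batch_size=4)` leaves the prefill batch size at 1 and sets the decode batch size to 4.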