llama : speed-up grammar sampling #299

@jakexcosme

Note: This issue was copied from ggml-org#4218

Original Author: @ggerganov
Original Issue Number: ggml-org#4218
Created: 2023-11-25T17:04:06Z


There have been a few reports that grammar sampling can significantly degrade performance.
It would be nice to profile and optimize the implementation; there should be room for improvement.

Ongoing efforts:

It is probably worth looking into multi-threading the implementation as well.
