I am using Hugging Face Candle for local SmolLM inference. It's a fast-growing, optimized framework that runs across different devices, so it would be a good addition here. I can raise a PR; SmolLM already works in Candle out of the box since its architecture is the same as Llama's. The relevant examples are linked below, followed by a minimal sketch.
https://github.com/huggingface/candle/tree/main/candle-examples/examples/quantized
https://github.com/huggingface/candle/tree/main/candle-examples/examples/llama
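
For reference, here is a minimal sketch of quantized SmolLM inference with Candle, condensed from the quantized example linked above. It assumes a recent Candle version where `ModelWeights::from_gguf` takes a device argument; the GGUF path, tokenizer path, prompt, and generation length are placeholders.

```rust
// Assumed Cargo dependencies: candle-core, candle-transformers, tokenizers, anyhow.
use candle_core::{quantized::gguf_file, Device, Tensor};
use candle_transformers::{generation::LogitsProcessor, models::quantized_llama::ModelWeights};
use tokenizers::Tokenizer;

fn main() -> anyhow::Result<()> {
    let device = Device::Cpu;

    // Load a GGUF checkpoint. SmolLM reuses the Llama architecture, so the
    // quantized_llama weight loader applies directly.
    let mut file = std::fs::File::open("smollm.q4_k_m.gguf")?; // placeholder path
    let content = gguf_file::Content::read(&mut file)?;
    let mut model = ModelWeights::from_gguf(content, &mut file, &device)?;

    // Tokenize the prompt.
    let tokenizer = Tokenizer::from_file("tokenizer.json").map_err(anyhow::Error::msg)?;
    let mut tokens = tokenizer
        .encode("What is quantization?", true) // placeholder prompt
        .map_err(anyhow::Error::msg)?
        .get_ids()
        .to_vec();

    // Sampling loop: feed the whole prompt once, then one token at a time,
    // tracking the KV-cache position via index_pos.
    let mut logits_processor = LogitsProcessor::new(42, Some(0.8), None);
    let mut index_pos = 0;
    for step in 0..128 {
        let ctxt = if step == 0 { tokens.as_slice() } else { &tokens[tokens.len() - 1..] };
        let input = Tensor::new(ctxt, &device)?.unsqueeze(0)?;
        // forward returns logits for the last position only, shape (1, vocab).
        let logits = model.forward(&input, index_pos)?.squeeze(0)?;
        index_pos += ctxt.len();
        let next = logits_processor.sample(&logits)?;
        tokens.push(next);
        // A real loop would also break on the model's EOS token here.
    }
    println!("{}", tokenizer.decode(&tokens, true).map_err(anyhow::Error::msg)?);
    Ok(())
}
```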