
Add Neuron backend #3033

Open
dacorvo wants to merge 20 commits into main from neuron_backend
Conversation

dacorvo
Collaborator

@dacorvo dacorvo commented Feb 18, 2025

What does this PR do?

This adds the neuron backend that was previously maintained in the optimum-neuron repository.

This backend is built on top of the AWS Neuron SDK, and comprises:

  • the legacy v2 TGI launcher and router,
  • a neuron-specific inference server for text generation (a query example follows below).
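
Since the backend keeps the legacy v2 router, a deployed server should expose the usual TGI HTTP API. A minimal query sketch in Python, assuming a server already running on a hypothetical local endpoint:

```python
import requests

# Query a running neuron TGI server (hypothetical local endpoint); the backend
# reuses the v2 router, so the standard /generate route applies.
response = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "What is Deep Learning?",
        "parameters": {"max_new_tokens": 64},
    },
)
print(response.json()["generated_text"])
```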

Documentation

A dedicated documentation page has been added in the backends subsection.

Tests

The backend comes with some dedicated tests:

  • neuron server tests (using only the server python package),
  • integration tests (using docker images).

Both sets of tests require some models to be pre-exported and cached in order to test:

  • deploying pre-exported neuron models from the hub directly,
  • deploying vanilla models using cached neuron graphs.

For the moment, only the integration tests are run, not the server tests.

Since these tests are very specific to neuron, they can only be activated by specifying the new --neuron pytest option.
Conversely, as soon as the --neuron option is set, all tests that do not have the neuron marker are disabled.
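
A minimal sketch of how such a gate can be wired in a pytest conftest.py; the option name matches the PR, but the rest is illustrative, not the PR's actual code:

```python
# conftest.py -- illustrative sketch of a --neuron gate.
import pytest


def pytest_addoption(parser):
    parser.addoption(
        "--neuron",
        action="store_true",
        default=False,
        help="run only tests marked with @pytest.mark.neuron",
    )


def pytest_collection_modifyitems(config, items):
    run_neuron = config.getoption("--neuron")
    reason = "requires --neuron" if not run_neuron else "disabled when --neuron is set"
    for item in items:
        is_neuron = "neuron" in item.keywords
        # Skip neuron tests unless --neuron is given, and everything else when it is.
        if is_neuron != run_neuron:
            item.add_marker(pytest.mark.skip(reason=reason))
```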

The neuron integration tests also use specific fixtures that have been added as local plugins (sketched after the list below):

  • fixtures.neuron.model (takes care of pre-exporting the models, or simply fetches them if this has already been done),
  • fixtures.neuron.service (custom service loop that uses the neuron-specific launch environment variables).
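
A minimal sketch of what the model fixture could look like; the model id, the cache repository name and the export helper are assumptions for illustration, not the PR's code:

```python
# fixtures/neuron/model.py -- illustrative sketch only.
import pytest
from huggingface_hub import HfApi


def export_model(model_id: str, neuron_repo: str) -> None:
    # Placeholder for the actual export: in the PR this is done through the
    # neuron TGI image itself rather than by calling optimum-neuron directly.
    raise NotImplementedError


@pytest.fixture(scope="session")
def neuron_model_id() -> str:
    model_id = "gpt2"  # example model
    # Hypothetical hub repository used to cache the exported neuron model.
    neuron_repo = "optimum-internal-testing/neuron-gpt2"
    if not HfApi().repo_exists(neuron_repo):
        # First run: export the model and push the compiled artifacts to the hub.
        export_model(model_id, neuron_repo)
    # Subsequent runs simply reuse the cached, pre-exported neuron model.
    return neuron_repo
```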

Next steps

  • use python 3.11 and align on the main Dockerfile,
  • add a custom launcher that only exposes the relevant parameters and sets default values,
  • add a new router for servers that have a static batch size.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@dacorvo
Collaborator Author

dacorvo commented Feb 18, 2025

This is the continuation of #3018.

@dacorvo dacorvo mentioned this pull request Feb 18, 2025
@dacorvo dacorvo changed the title Neuron backend Add Neuron backend Feb 18, 2025
@dacorvo dacorvo marked this pull request as ready for review February 18, 2025 15:12
@dacorvo dacorvo requested review from Narsil and drbh February 18, 2025 15:12
@drbh drbh mentioned this pull request Feb 18, 2025
@dacorvo
Collaborator Author

dacorvo commented Feb 20, 2025

The neuron tests are correctly skipped by the other workflows. See for instance the CUDA integration tests:

[Screenshot: CUDA integration tests run showing the neuron tests being skipped]

@dacorvo dacorvo force-pushed the neuron_backend branch 2 times, most recently from bd528c4 to 9c1d121, on February 20, 2025 10:35
dacorvo and others added 15 commits February 20, 2025 16:14
The base image used to compile the rust components seems to have a low
ulimit for opened files, which leads to errors during compilation.
The neuron tests require models to have been previously exported and
cached on the hub. This is done automatically by the neuron.model
fixture the first time the tests are run for a specific version.
This fixture used to export the models using optimum-neuron directly,
but this package is not necessarily present on the system.
Instead, it is now done through the neuron TGI itself, since it
contains all the tools required to export the models.
Note that since the CI runs docker in docker (dind) it does not seem
possible to share a volume between the CI container and the container
used to export the model.
For that reason, a specific image with a modified entrypoint is built
on-the-fly when a model export is required.
The SageMaker image is built differently anyway.
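
A rough sketch of the on-the-fly export image described in the commit message above, using the Docker SDK for Python; the base image tag and the export command are assumptions, not the PR's exact code:

```python
# Illustrative sketch: build a derived image whose entrypoint performs the export,
# so no volume needs to be shared with the dind CI container.
import io

import docker

client = docker.from_env()

# Hypothetical base image and entrypoint override.
dockerfile = """
FROM text-generation-inference:latest-neuron
ENTRYPOINT ["optimum-cli", "export", "neuron"]
"""

image, _ = client.images.build(
    fileobj=io.BytesIO(dockerfile.encode("utf-8")),
    tag="tgi-neuron-export:local",
    rm=True,
)

# Run the export inside the derived image; the arguments go to the new entrypoint.
logs = client.containers.run(
    image.id,
    ["--model", "gpt2", "--batch_size", "1", "--sequence_length", "128", "gpt2-neuron/"],
    remove=True,
)
print(logs.decode("utf-8"))
```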
Narsil and others added 3 commits February 21, 2025 15:56
We now manually evaluate the apparent hash of the neuron backend by
combining the hash of the neuron backend directory and Dockerfile.
This new hash is used to identify exported neuron models instead of the
image sha.
This has two benefits:
- it changes less frequently (only when the neuron backend changes),
  which means less neuron models being pushed to the hub,
- it can be evaluated locally, meaning that running the tests once
  locally will export the models before the CI uses them.
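
A minimal sketch of how such a combined hash could be computed; the paths and the exact hashing scheme are assumptions for illustration:

```python
# Illustrative sketch: derive a stable identifier from the neuron backend sources
# and its Dockerfile, so it only changes when the backend itself changes.
import hashlib
from pathlib import Path


def backend_hash(backend_dir: str = "backends/neuron", dockerfile: str = "Dockerfile.neuron") -> str:
    digest = hashlib.sha256()
    # Hash every file in the backend directory, in a deterministic order.
    for path in sorted(Path(backend_dir).rglob("*")):
        if path.is_file():
            digest.update(path.read_bytes())
    digest.update(Path(dockerfile).read_bytes())
    return digest.hexdigest()


print(backend_hash())
```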