
Add Neuron backend #3033

Open
dacorvo wants to merge 20 commits into main from neuron_backend
Conversation

dacorvo
Collaborator

@dacorvo dacorvo commented Feb 18, 2025

What does this PR do?

This adds the neuron backend that was previously maintained in the optimum-neuron repository.

This backend is built on top of the AWS Neuron SDK, and comprises:

  • the legacy v2 TGI launcher and router,
  • a neuron-specific inference server for text generation (a query example follows below).
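
Since the backend keeps the legacy v2 router, a deployed server should expose the usual TGI HTTP API. A minimal query sketch in Python, assuming a server already running on a hypothetical local endpoint:

```python
import requests

# Query a running neuron TGI server (hypothetical local endpoint); the backend
# reuses the v2 router, so the standard /generate route applies.
response = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "What is Deep Learning?",
        "parameters": {"max_new_tokens": 64},
    },
)
print(response.json()["generated_text"])
```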

Documentation

A dedicated documentation page has been added in the backends subsection.

Tests

The backend comes with some dedicated tests:

  • neuron server tests (using only the server python package),
  • integration tests (using docker images).

Both sets of tests require some models to be pre-exported and cached in order to test:

  • deploying pre-exported neuron models from the hub directly,
  • deploying vanilla models using cached neuron graphs.

For the moment, only the integration tests are run, not the server tests.

Since these tests are very specific to neuron, they can only be activated by specifying the new --neuron pytest option.
Conversely, as soon as the --neuron option is set, all tests that do not have the neuron marker are disabled.
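
A minimal sketch of how such a gate can be wired in a pytest conftest.py; the option name matches the PR, but the rest is illustrative, not the PR's actual code:

```python
# conftest.py -- illustrative sketch of a --neuron gate.
import pytest


def pytest_addoption(parser):
    parser.addoption(
        "--neuron",
        action="store_true",
        default=False,
        help="run only tests marked with @pytest.mark.neuron",
    )


def pytest_collection_modifyitems(config, items):
    run_neuron = config.getoption("--neuron")
    reason = "requires --neuron" if not run_neuron else "disabled when --neuron is set"
    for item in items:
        is_neuron = "neuron" in item.keywords
        # Skip neuron tests unless --neuron is given, and everything else when it is.
        if is_neuron != run_neuron:
            item.add_marker(pytest.mark.skip(reason=reason))
```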

The neuron integration tests also use specific fixtures that have been added as local plugins (sketched after the list below):

  • fixtures.neuron.model (takes care of pre-exporting the models, or simply fetches them if this has already been done),
  • fixtures.neuron.service (custom service loop that uses the neuron-specific launch environment variables).
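
A minimal sketch of what the model fixture could look like; the model id, the cache repository name and the export helper are assumptions for illustration, not the PR's code:

```python
# fixtures/neuron/model.py -- illustrative sketch only.
import pytest
from huggingface_hub import HfApi


def export_model(model_id: str, neuron_repo: str) -> None:
    # Placeholder for the actual export: in the PR this is done through the
    # neuron TGI image itself rather than by calling optimum-neuron directly.
    raise NotImplementedError


@pytest.fixture(scope="session")
def neuron_model_id() -> str:
    model_id = "gpt2"  # example model
    # Hypothetical hub repository used to cache the exported neuron model.
    neuron_repo = "optimum-internal-testing/neuron-gpt2"
    if not HfApi().repo_exists(neuron_repo):
        # First run: export the model and push the compiled artifacts to the hub.
        export_model(model_id, neuron_repo)
    # Subsequent runs simply reuse the cached, pre-exported neuron model.
    return neuron_repo
```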

Next steps

  • use python 3.11 and align on the main Dockerfile,
  • add a custom launcher that only exposes the relevant parameters and sets default values,
  • add a new router for servers that have a static batch size.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@dacorvo
Collaborator Author

dacorvo commented Feb 18, 2025

This is the continuation of #3018.

@dacorvo dacorvo mentioned this pull request Feb 18, 2025
@dacorvo dacorvo changed the title Neuron backend Add Neuron backend Feb 18, 2025
@dacorvo dacorvo marked this pull request as ready for review February 18, 2025 15:12
@dacorvo dacorvo requested review from Narsil and drbh February 18, 2025 15:12
@drbh drbh mentioned this pull request Feb 18, 2025
@dacorvo
Collaborator Author

dacorvo commented Feb 20, 2025

The neuron tests are correctly skipped by the other workflows. See for instance the CUDA integration tests:

[Screenshot: CUDA integration tests run showing the neuron tests being skipped]

@dacorvo dacorvo force-pushed the neuron_backend branch 2 times, most recently from bd528c4 to 9c1d121, on February 20, 2025 10:35
dacorvo and others added 15 commits February 20, 2025 16:14
The base image used to compile the rust components seems to have a low
ulimit for opened files, which leads to errors during compilation.
The neuron tests require models to have been previously exported and
cached on the hub. This is done automatically by the neuron.model
fixture the first time the tests are run for a specific version.
This fixture used to export the models using optimum-neuron directly,
but this package is not necessarily present on the system.
Instead, it is now done through the neuron TGI itself, since it
contains all the tools required to export the models.
Note that since the CI runs docker in docker (dind) it does not seem
possible to share a volume between the CI container and the container
used to export the model.
For that reason, a specific image with a modified entrypoint is built
on-the-fly when a model export is required.
The SageMaker image is built differently anyway.
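
A rough sketch of the on-the-fly export image described in the commit message above, using the Docker SDK for Python; the base image tag and the export command are assumptions, not the PR's exact code:

```python
# Illustrative sketch: build a derived image whose entrypoint performs the export,
# so no volume needs to be shared with the dind CI container.
import io

import docker

client = docker.from_env()

# Hypothetical base image and entrypoint override.
dockerfile = """
FROM text-generation-inference:latest-neuron
ENTRYPOINT ["optimum-cli", "export", "neuron"]
"""

image, _ = client.images.build(
    fileobj=io.BytesIO(dockerfile.encode("utf-8")),
    tag="tgi-neuron-export:local",
    rm=True,
)

# Run the export inside the derived image; the arguments go to the new entrypoint.
logs = client.containers.run(
    image.id,
    ["--model", "gpt2", "--batch_size", "1", "--sequence_length", "128", "gpt2-neuron/"],
    remove=True,
)
print(logs.decode("utf-8"))
```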
Narsil and others added 3 commits February 21, 2025 15:56
We now manually evaluate the apparent hash of the neuron backend by
combining the hash of the neuron backend directory and Dockerfile.
This new hash is used to identify exported neuron models instead of the
image sha.
This has two benefits:
- it changes less frequently (only when the neuron backend changes),
  which means less neuron models being pushed to the hub,
- it can be evaluated locally, meaning that running the tests once
  locally will export the models before the CI uses them.
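
A minimal sketch of how such a combined hash could be computed; the paths and the exact hashing scheme are assumptions for illustration:

```python
# Illustrative sketch: derive a stable identifier from the neuron backend sources
# and its Dockerfile, so it only changes when the backend itself changes.
import hashlib
from pathlib import Path


def backend_hash(backend_dir: str = "backends/neuron", dockerfile: str = "Dockerfile.neuron") -> str:
    digest = hashlib.sha256()
    # Hash every file in the backend directory, in a deterministic order.
    for path in sorted(Path(backend_dir).rglob("*")):
        if path.is_file():
            digest.update(path.read_bytes())
    digest.update(Path(dockerfile).read_bytes())
    return digest.hexdigest()


print(backend_hash())
```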