740 add generic support for different gpu hardware #3371
Conversation
Force-pushed 70c500c to 54ef2bd
Force-pushed 54ef2bd to bc96fa7
agunapal left a comment
Thank you for the PR. We are still reviewing it.
```dockerfile
# For reference:
# https://docs.docker.com/develop/develop-images/build_enhancements/

ARG BASE_IMAGE=ubuntu:24.04
```
The rest of the TorchServe images are still on Ubuntu 20.04, as we had issues with GitHub runners on newer versions. Haven't tried this in a while.
Thanks for the feedback @agunapal
```
--index-url https://download.pytorch.org/whl/rocm6.2
torch==2.5.1+rocm6.2; sys_platform == 'linux'
torchvision==0.20.1+rocm6.2; sys_platform == 'linux'
torchaudio==2.5.1+rocm6.2; sys_platform == 'linux'
```
We haven't yet updated the PyTorch version for the rest of the project, but this should be OK. I will update it for the other platforms too.
@smedegaard Looks like the PR is breaking TorchServe on
It seems like this test is failing in CI: https://github.com/nod-ai/serve/blob/31824434aa2acd3ff8261bd18cf6f1d925b8e22a/frontend/server/src/test/java/org/pytorch/serve/util/ConfigManagerTest.java#L110
smedegaard left a comment
🚀
Co-authored-by: Samu Tamminen <stammine@amd.com>
…leUtil GPU env value
Force-pushed 844806c to cc0809d
agunapal left a comment
LGTM. Tested the changes manually on Graviton 3 and there are no issues.
The failures with the runners can be debugged at a later point.

Description
This PR decouples the hardware layer from the front- and backend of TorchServe.
Relates to #740
'Add AMD backend support'
- Rony Leppänen <rleppane@amd.com>

'Add AMD frontend support'
- Anders Smedegaard Pedersen <asmedega@amd.com>

'Add Dockerfile.rocm'
- Samu Tamminen <stammine@amd.com>
- Jarkko Lehtiranta <jlehtira@amd.com>

'Add AMD documentation'
- Anders Smedegaard Pedersen <asmedega@amd.com>
- Rony Leppänen <rleppane@amd.com>
- Jarkko Lehtiranta <jlehtira@amd.com>

Other contributions:
- Bipradip Chowdhury <bichowdh@amd.com>
- Jarkko Vainio <javainio@amd.com>
- Tero Kemppi <tekemppi@amd.com>
Requirement Files
Added `requirements/torch_rocm62.txt`, `requirements/torch_rocm61.txt` and `requirements/torch_rocm60.txt` for easy install of the dependencies needed for AMD support.

Backend
The Python backend currently supports NVIDIA GPUs using hardware-specific libraries. There were also a number of functions that could be refactored to use more generalized interfaces.
Changes Made to Backend
Extended `print_env_info` for AMD GPUs and reimplemented a number of functions.
The Java frontend, which acts as the workload manager, had calls to SMIs hard-coded in a few places. This made it difficult for TorchServe to support multiple hardware vendors in a graceful manner.
Changes Made to Frontend
We've introduced a new package `org.pytorch.serve.device` with the classes `SystemInfo` and `Accelerator`. `SystemInfo` holds an array list of `Accelerator` objects that hold static information about the specific accelerators on a machine, as well as the relevant metrics.

Instead of calling the SMIs directly in multiple places in the frontend code, we have abstracted the hardware away by adding an instance of `SystemInfo` to the pre-existing `ConfigManager`. Now the frontend can get data from the hardware via the methods on `SystemInfo` without knowing about the specifics of the hardware and SMIs.

To implement the specifics for each of the vendors that were already partially supported, we have created a number of utility classes that communicate with the hardware via the relevant SMI.
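To make the abstraction concrete, here is a rough sketch of how frontend code could read accelerator metrics through `SystemInfo`. The `getSystemInfo()` accessor on `ConfigManager` is an assumption for illustration; the `SystemInfo` and `Accelerator` methods match the class diagram further down.

```java
// Hedged sketch: reading accelerator metrics through the new abstraction
// instead of shelling out to an SMI directly. ConfigManager#getSystemInfo()
// is an assumed accessor name; the SystemInfo and Accelerator methods are
// taken from the class diagram in this PR.
public class AcceleratorMetricsSketch {
    public static void logUtilization(ConfigManager configManager) {
        SystemInfo systemInfo = configManager.getSystemInfo(); // assumed accessor
        if (!systemInfo.hasAccelerators()) {
            return; // CPU-only host: nothing to report
        }
        systemInfo.updateAcceleratorMetrics(); // refresh via the vendor's SMI
        for (Accelerator accelerator : systemInfo.getAccelerators()) {
            System.out.println(accelerator.utilizationToString());
        }
    }
}
```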
The following steps are taken in the `SystemInfo` constructor:

1. Detect the accelerator vendor by running `which {relevant smi}` for each of the supported vendors (`where` is used on Windows systems). This is how vendor detection was done previously; there might be more robust ways. A sketch of this step follows the list.
2. Create the matching utility class, e.g. `ROCmUtil` for AMD.
3. Detect the visible devices, respecting `HIP_VISIBLE_DEVICES` for AMD, `CUDA_VISIBLE_DEVICES` for NVIDIA and `XPU_VISIBLE_DEVICES` for Intel. All devices are detected if the relevant environment variable is not set.
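A minimal sketch of the detection step, assuming the probed SMI binary names (`rocm-smi`, `nvidia-smi`, `xpu-smi`, `system_profiler`); the real `SystemInfo.detectVendorType()` and `isCommandAvailable(String)` may differ in detail:

```java
import java.io.IOException;

// Minimal sketch of vendor detection via `which {relevant smi}` (`where` on
// Windows). The probed binary names are assumptions for illustration; the
// actual SystemInfo implementation may differ.
public class VendorDetectionSketch {
    enum AcceleratorVendor { AMD, NVIDIA, INTEL, APPLE, UNKNOWN }

    static AcceleratorVendor detectVendorType() {
        if (isCommandAvailable("rocm-smi")) return AcceleratorVendor.AMD;
        if (isCommandAvailable("nvidia-smi")) return AcceleratorVendor.NVIDIA;
        if (isCommandAvailable("xpu-smi")) return AcceleratorVendor.INTEL;
        if (isCommandAvailable("system_profiler")) return AcceleratorVendor.APPLE;
        return AcceleratorVendor.UNKNOWN;
    }

    static boolean isCommandAvailable(String command) {
        boolean isWindows = System.getProperty("os.name").toLowerCase().contains("win");
        String locator = isWindows ? "where" : "which"; // per the steps above
        try {
            Process process = new ProcessBuilder(locator, command).start();
            return process.waitFor() == 0; // exit code 0 means the command exists
        } catch (IOException | InterruptedException e) {
            return false;
        }
    }
}
```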
The following class diagram shows how the new classes relate to the existing code:

```mermaid
classDiagram
    class SystemInfo {
        -Logger logger
        -AcceleratorVendor acceleratorVendor
        -ArrayList~Accelerator~ accelerators
        -IAcceleratorUtility acceleratorUtil
        +SystemInfo()
        -createAcceleratorUtility() IAcceleratorUtility
        -populateAccelerators()
        +hasAccelerators() boolean
        +getNumberOfAccelerators() Integer
        +static detectVendorType() AcceleratorVendor
        -static isCommandAvailable(String) boolean
        +getAccelerators() ArrayList~Accelerator~
        -updateAccelerators(List~Accelerator~)
        +updateAcceleratorMetrics()
        +getAcceleratorVendor() AcceleratorVendor
        +getVisibleDevicesEnvName() String
    }
    class Accelerator {
        +Integer id
        +AcceleratorVendor vendor
        +String model
        +IAcceleratorUtility acceleratorUtility
        +Float usagePercentage
        +Float memoryUtilizationPercentage
        +Integer memoryAvailableMegabytes
        +Integer memoryUtilizationMegabytes
        +getVendor() AcceleratorVendor
        +getAcceleratorModel() String
        +getAcceleratorId() Integer
        +getMemoryAvailableMegaBytes()
        +getUsagePercentage() Float
        +getMemoryUtilizationPercentage()
        +getMemoryUtilizationMegabytes()
        +setMemoryAvailableMegaBytes()
        +setUsagePercentage(Float)
        +setMemoryUtilizationPercentage(Float)
        +setMemoryUtilizationMegabytes(Integer)
        +utilizationToString()
        +updateDynamicAttributes()
    }
    class AcceleratorVendor {
        <<enumeration>>
        AMD
        NVIDIA
        INTEL
        APPLE
        UNKNOWN
    }
    class IAcceleratorUtility {
        <<interface>>
        +getGpuEnvVariableName()
        +getUtilizationSmiCommand()
        +getAvailableAccelerators()
        +smiOutputToUpdatedAccelerators()
        +getUpdatedAcceleratorsUtilization()
    }
    class ICsvSmiParser {
        <<interface>>
        +csvSmiOutputToAccelerators()
    }
    class IJsonSmiParser {
        <<interface>>
        +jsonOutputToAccelerators()
        +extractAcceleratorId()
        +jsonObjectToAccelerator()
        +extractAccelerators()
    }
    class CudaUtil {
        +getGpuEnvVariableName()
        +getUtilizationSmiCommand()
        +getAvailableAccelerators()
        +smiOutputToUpdatedAccelerators()
        +parseAccelerator()
        +parseUpdatedAccelerator()
    }
    class ROCmUtil {
        +getGpuEnvVariableName()
        +getUtilizationSmiCommand()
        +getAvailableAccelerators()
        +smiOutputToUpdatedAccelerators()
        +extractAccelerators()
        +extractAcceleratorId()
        +jsonObjectToAccelerator()
    }
    class XpuUtil {
        +getGpuEnvVariableName()
        +getUtilizationSmiCommand()
        +getAvailableAccelerators()
        +smiOutputToUpdatedAccelerators()
        +parseDiscoveryOutput()
        +parseUtilizationOutput()
    }
    class AppleUtil {
        +getGpuEnvVariableName()
        +getUtilizationSmiCommand()
        +getAvailableAccelerators()
        +smiOutputToUpdatedAccelerators()
        +jsonObjectToAccelerator()
        +extractAcceleratorId()
        +extractAccelerators()
    }
    class ConfigManager {
        -SystemInfo systemInfo
        +init(Arguments args)
    }
    class WorkerLifeCycle {
        -ConfigManager configManager
        -ModelManager modelManager
        -Model model
    }
    class WorkerThread {
        #ConfigManager configManager
        #int port
        #Model model
        #WorkerState state
        #WorkerLifeCycle lifeCycle
    }
    class AsyncWorkerThread {
        #boolean loadingFinished
        #CountDownLatch latch
        +run()
        #connect()
    }
    WorkerLifeCycle --> "1" ConfigManager
    WorkerLifeCycle --> "1" Model
    WorkerLifeCycle --> "1" Connector
    WorkerThread --> "1" ConfigManager
    WorkerThread --> "1" WorkerLifeCycle
    AsyncWorkerThread --|> WorkerThread
    ConfigManager "1" --> "1" SystemInfo
    ConfigManager "1" --> "*" Accelerator
    SystemInfo --> "0..*" Accelerator
    SystemInfo --> "1" IAcceleratorUtility
    SystemInfo --> "1" AcceleratorVendor
    Accelerator --> "1" AcceleratorVendor
    CudaUtil ..|> IAcceleratorUtility
    CudaUtil ..|> ICsvSmiParser
    ROCmUtil ..|> IAcceleratorUtility
    ROCmUtil ..|> IJsonSmiParser
    XpuUtil ..|> IAcceleratorUtility
    XpuUtil ..|> ICsvSmiParser
    AppleUtil ..|> IAcceleratorUtility
    AppleUtil ..|> IJsonSmiParser
```
Documentation

Moved the hardware-specific docs to `serve/docs/hardware_support/` and added them under "Hardware Support" in the TOC.
Feature/Issue validation/testing
We built a new docker container for ROCm using `Dockerfile.rocm` and the build argument `USE_ROCM_VERSION`. For other platforms we used the `build_image.sh` script.

Run containers
Tests
Note: the test `test_handler.py::test_huggingface_bert_model_parallel_inference` fails due to: `ValueError: Input length of input_ids is 150, but max_length is set to 50. This can lead to unexpected behavior. You should consider increasing max_length or, better yet, setting max_new_tokens.`

This indicates that preprocessing uses a different `max_length` than inference, which can be verified by looking at the handler from when the test was originally implemented: `model.generate()` uses `max_length=50` by default, while the tokenizer uses `max_length` from `setup_config` (`max_length=150`). It seems that the BERT-based `Textgeneration.mar` needs an update.