Fix TRT-LLM Multigpu Compatibility #2837
[WIP] What does this PR do?
We need Composer to run our evaluation framework on TRT-LLM models, but this currently breaks in the multi-GPU case. These fixes let Composer run N copies in parallel and feed data in a way that is compatible with multi-GPU TRT-LLM models. Concretely, the changes are (a) not initializing dist, since TRT-LLM manages its own multi-GPU communication, and (b) fixing some race conditions in data loading, where concurrent ranks stepped on each other while preparing the dataset.
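A minimal sketch of the two fixes, for illustration only; `TRTLLM_INDEPENDENT_RANKS`, `maybe_init_dist`, `prepare_dataset`, and `download_fn` are hypothetical names, not the actual PR diff:

```python
# Hypothetical sketch, not the PR code: (a) skip dist init when each rank
# runs an independent TRT-LLM copy, (b) serialize dataset preparation so
# concurrent ranks on one node do not race.
import os

import torch.distributed as dist
from filelock import FileLock  # third-party: pip install filelock

# Hypothetical env var controlling whether ranks run independently.
RUN_INDEPENDENT = os.environ.get("TRTLLM_INDEPENDENT_RANKS", "0") == "1"


def maybe_init_dist() -> None:
    """(a) Only initialize the process group when we actually need it."""
    if RUN_INDEPENDENT:
        # TRT-LLM handles its own multi-GPU communication; initializing
        # torch.distributed here would conflict with it.
        return
    if not dist.is_initialized():
        dist.init_process_group(backend="nccl")


def prepare_dataset(cache_dir: str, download_fn) -> None:
    """(b) Guard dataset download with a file lock plus a done-marker,
    so only one rank downloads and the others wait, then reuse the cache."""
    os.makedirs(cache_dir, exist_ok=True)
    with FileLock(os.path.join(cache_dir, ".download.lock")):
        marker = os.path.join(cache_dir, ".done")
        if not os.path.exists(marker):
            download_fn(cache_dir)  # caller-supplied download function
            open(marker, "w").close()
```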
TODO: