The library is not tested with multi-GPU use cases. We assume the intervening model fits on a single GPU, which is not viable for interventions on 70B models, for instance. We want to be able to load the model onto multiple GPUs using sharding.
Static interventions need to be attached to the right component on the right device when the model is sharded. Trainable interventions likewise need to be mapped onto the device where the corresponding model component lives.
This could be a large task, but the first step is clear: try out static interventions (e.g., vanilla interventions) when models are sharded across multiple GPUs at inference time.
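As a rough illustration of the device-placement concern (this is not pyvene's API, just a minimal PyTorch sketch): a static intervention registered as a forward hook must move its tensor to whatever device the intercepted activation lives on, since a sharding plan such as `device_map="auto"` may place each layer on a different GPU.

```python
import torch
import torch.nn as nn

class StaticAdditionIntervention:
    """Adds a fixed steering vector to a module's output.

    The .to(output.device) call is the key detail for sharded models:
    when layers are spread across GPUs, the intervention tensor must
    follow the device of the component it is attached to.
    """
    def __init__(self, vector: torch.Tensor):
        self.vector = vector

    def __call__(self, module, inputs, output):
        # Move the steering vector to the activation's device before adding.
        return output + self.vector.to(output.device)

# Toy two-layer model; with real sharding (e.g. HF Accelerate's
# device_map="auto"), each layer could sit on a different GPU.
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 4))
steer = torch.ones(4)
handle = model[0].register_forward_hook(StaticAdditionIntervention(steer))

x = torch.randn(2, 4)
with torch.no_grad():
    y = model(x)  # intervened forward pass
handle.remove()
```

On a single CPU this is trivially correct; the point is that the same hook keeps working unchanged when the hooked layer is moved to another device by a sharding plan.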
frankaging changed the title from "[P2] Multi-GPU intervening evaluation and training" to "[P2] Multi-GPU model sharing with intervening evaluation and training" on Jan 17, 2024
frankaging changed the title from "[P2] Multi-GPU model sharing with intervening evaluation and training" to "[P2] Multi-GPU model sharding with intervening evaluation and training" on Jan 17, 2024
aryamanarora changed the title from "[P2] Multi-GPU model sharding with intervening evaluation and training" to "[P1] Multi-GPU model sharding with intervening evaluation and training" on Jul 16, 2024