[P0] Multigpu and model sharding #25

Open
frankaging opened this issue Mar 28, 2024 · 2 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

@frankaging (Collaborator)

Description:

The pyvene library was designed for model interpretability, not for production use cases that require training and inference efficiency. pyreft is different: it targets practical use cases and therefore needs production-ready training and inference efficiency.

This ticket may require multiple PRs, including changes in pyvene:

  • Support multi-GPU training
  • Support data parallelism
  • Support model parallelism
  • Support DeepSpeed at all stages, including gradient checkpointing, model sharding, and GPU/CPU offloading
  • Integrate with accelerate (see the sketch after this list)
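
For reference, a minimal sketch of the kind of accelerate-based loop this last item asks for. The `reft_model` below is a placeholder for whatever pyreft returns after wrapping the base model, and the schematic forward call stands in for the real intervenable call; neither is the confirmed pyreft API.

    # Minimal accelerate training sketch -- reft_model is a placeholder, not the real pyreft API.
    import torch
    from accelerate import Accelerator
    from torch.utils.data import DataLoader

    def train(reft_model, dataset, collate_fn, lr=5e-4, epochs=1):
        accelerator = Accelerator()  # reads multi-GPU / DeepSpeed config from the launch environment
        optimizer = torch.optim.AdamW(reft_model.parameters(), lr=lr)
        loader = DataLoader(dataset, batch_size=4, shuffle=True, collate_fn=collate_fn)

        # prepare() moves the model to the right device(s) and shards the dataloader per process
        reft_model, optimizer, loader = accelerator.prepare(reft_model, optimizer, loader)

        reft_model.train()
        for _ in range(epochs):
            for batch in loader:
                outputs = reft_model(**batch)       # schematic forward; the real call would also pass unit_locations
                accelerator.backward(outputs.loss)  # handles gradient sync across processes
                optimizer.step()
                optimizer.zero_grad()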
@frankaging (Collaborator, Author)

Currently, only a single GPU is supported by pyvene.

If you don't guard to a single GPU (e.g., by running 'export CUDA_VISIBLE_DEVICES=0'), training will throw an error: ref: #31
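
For what it's worth, the same guard can be applied from inside Python, as long as it runs before torch initializes CUDA; this is just the standard environment-variable trick, nothing pyvene-specific:

    # Equivalent to 'export CUDA_VISIBLE_DEVICES=0'; must run before torch touches CUDA.
    import os
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"

    import torch
    print(torch.cuda.device_count())  # 1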

@aryamanarora added the enhancement label on Apr 8, 2024
@frankaging pinned this issue on Apr 19, 2024
@danikhan632 commented on Apr 24, 2024

    def compute_loss(self, intervenable, inputs, return_outputs=False):
        # Use the batched tensors directly; avoid premature conversions.
        # permute(1, 0, 2) swaps the first two axes so the intervention axis leads.
        subspaces = inputs["subspaces"].permute(1, 0, 2) if "subspaces" in inputs else None

        # Prepare unit locations for the interventions
        unit_locations = {
            "sources->base": (None, inputs["intervention_locations"].permute(1, 0, 2))
        }

        # Print tensor shape and device for debugging
        print("Debug Info: ", unit_locations["sources->base"][1].shape, inputs["input_ids"].device)

        # Forward pass through the intervenable model
        _, cf_outputs = intervenable(
            {"input_ids": inputs["input_ids"], "attention_mask": inputs["attention_mask"]},
            unit_locations=unit_locations,
            labels=inputs["labels"],
            subspaces=subspaces
        )

        # Return loss (and outputs if requested)
        return (cf_outputs.loss, cf_outputs) if return_outputs else cf_outputs.loss

I fixed the previous issue in pyvene, but now I encounter this odd issue that depends on the number of GPUs.
compute_loss isn't called by any pyreft/pyvene code, but rather by the Hugging Face Trainer.

Output using a single GPU:
Debug Info: torch.Size([4, 4, 1]) cuda:0
Intervening...

Output using two GPUs:
Debug Info: torch.Size([4, 8, 1]) cuda:0
Intervening...

Output using three GPUs:
Debug Info: torch.Size([4, 12, 1]) cuda:0
Intervening...
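
A guess at the mechanism behind the growing middle dimension (an assumption, not a verified diagnosis): with n GPUs the Hugging Face Trainer builds batches of per_device_batch_size × n and wraps the model in torch.nn.DataParallel, which scatters each input tensor along dim 0. After the permute in compute_loss, the batch axis of intervention_locations is dim 1, so it would not be split per device the way input_ids is. A toy illustration of that split:

    # Toy illustration (assumption): DataParallel-style splitting along dim 0 only.
    import torch

    locations = torch.zeros(4, 8, 1)                    # (num_interventions, batch=8, 1) after permute
    input_ids = torch.zeros(8, 128, dtype=torch.long)   # (batch=8, seq_len)

    per_gpu_locations = torch.chunk(locations, chunks=2, dim=0)
    per_gpu_input_ids = torch.chunk(input_ids, chunks=2, dim=0)

    print([t.shape for t in per_gpu_locations])  # [(2, 8, 1), (2, 8, 1)] -- batch axis left intact
    print([t.shape for t in per_gpu_input_ids])  # [(4, 128), (4, 128)]   -- batch correctly halved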

@frankaging added the help wanted label on Apr 24, 2024