[P0] Multigpu and model sharding #25

Open
frankaging opened this issue Mar 28, 2024 · 2 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

@frankaging (Collaborator)

Description:

The pyvene library was designed for model interpretability, not for production use cases that require training and inference efficiency. pyreft is different: it targets practical use cases and therefore needs production-ready training and inference efficiency.

This ticket may require multiple PRs, including changes in pyvene:

  • Support multi-GPU training
  • Support data parallelism
  • Support model parallelism
  • Support DeepSpeed at all stages, including gradient checkpointing, model sharding, and GPU/CPU offloading
  • Integrate with accelerate (see the sketch after this list)
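
For reference, a minimal sketch of the kind of accelerate-based loop this last item asks for. The `reft_model` below is a placeholder for whatever pyreft returns after wrapping the base model, and the schematic forward call stands in for the real intervenable call; neither is the confirmed pyreft API.

    # Minimal accelerate training sketch -- reft_model is a placeholder, not the real pyreft API.
    import torch
    from accelerate import Accelerator
    from torch.utils.data import DataLoader

    def train(reft_model, dataset, collate_fn, lr=5e-4, epochs=1):
        accelerator = Accelerator()  # reads multi-GPU / DeepSpeed config from the launch environment
        optimizer = torch.optim.AdamW(reft_model.parameters(), lr=lr)
        loader = DataLoader(dataset, batch_size=4, shuffle=True, collate_fn=collate_fn)

        # prepare() moves the model to the right device(s) and shards the dataloader per process
        reft_model, optimizer, loader = accelerator.prepare(reft_model, optimizer, loader)

        reft_model.train()
        for _ in range(epochs):
            for batch in loader:
                outputs = reft_model(**batch)       # schematic forward; the real call would also pass unit_locations
                accelerator.backward(outputs.loss)  # handles gradient sync across processes
                optimizer.step()
                optimizer.zero_grad()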
@frankaging (Collaborator, Author)

Currently, only a single GPU is supported by pyvene.

If you don't guard to a single GPU (e.g., by running 'export CUDA_VISIBLE_DEVICES=0'), training will throw an error: ref: #31
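
For what it's worth, the same guard can be applied from inside Python, as long as it runs before torch initializes CUDA; this is just the standard environment-variable trick, nothing pyvene-specific:

    # Equivalent to 'export CUDA_VISIBLE_DEVICES=0'; must run before torch touches CUDA.
    import os
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"

    import torch
    print(torch.cuda.device_count())  # 1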

@aryamanarora added the enhancement label on Apr 8, 2024
@frankaging pinned this issue on Apr 19, 2024
@danikhan632 commented on Apr 24, 2024

    def compute_loss(self, intervenable, inputs, return_outputs=False):
        # Use the batched tensors directly; avoid premature conversions.
        # permute(1, 0, 2) swaps the first two axes so the intervention axis leads.
        subspaces = inputs["subspaces"].permute(1, 0, 2) if "subspaces" in inputs else None

        # Prepare unit locations for the interventions
        unit_locations = {
            "sources->base": (None, inputs["intervention_locations"].permute(1, 0, 2))
        }

        # Print tensor shape and device for debugging
        print("Debug Info: ", unit_locations["sources->base"][1].shape, inputs["input_ids"].device)

        # Forward pass through the intervenable model
        _, cf_outputs = intervenable(
            {"input_ids": inputs["input_ids"], "attention_mask": inputs["attention_mask"]},
            unit_locations=unit_locations,
            labels=inputs["labels"],
            subspaces=subspaces
        )

        # Return loss (and outputs if requested)
        return (cf_outputs.loss, cf_outputs) if return_outputs else cf_outputs.loss

I fixed the previous issue in pyvene, but now I encounter this odd issue that depends on the number of GPUs.
compute_loss isn't called by any pyreft/pyvene code, but rather by the Hugging Face Trainer.

Output using a single GPU:
Debug Info: torch.Size([4, 4, 1]) cuda:0
Intervening...

Output using two GPUs:
Debug Info: torch.Size([4, 8, 1]) cuda:0
Intervening...

Output using three GPUs:
Debug Info: torch.Size([4, 12, 1]) cuda:0
Intervening...
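
A guess at the mechanism behind the growing middle dimension (an assumption, not a verified diagnosis): with n GPUs the Hugging Face Trainer builds batches of per_device_batch_size × n and wraps the model in torch.nn.DataParallel, which scatters each input tensor along dim 0. After the permute in compute_loss, the batch axis of intervention_locations is dim 1, so it would not be split per device the way input_ids is. A toy illustration of that split:

    # Toy illustration (assumption): DataParallel-style splitting along dim 0 only.
    import torch

    locations = torch.zeros(4, 8, 1)                    # (num_interventions, batch=8, 1) after permute
    input_ids = torch.zeros(8, 128, dtype=torch.long)   # (batch=8, seq_len)

    per_gpu_locations = torch.chunk(locations, chunks=2, dim=0)
    per_gpu_input_ids = torch.chunk(input_ids, chunks=2, dim=0)

    print([t.shape for t in per_gpu_locations])  # [(2, 8, 1), (2, 8, 1)] -- batch axis left intact
    print([t.shape for t in per_gpu_input_ids])  # [(4, 128), (4, 128)]   -- batch correctly halved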

@frankaging added the help wanted label on Apr 24, 2024