-
Notifications
You must be signed in to change notification settings - Fork 867
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
deepspeed base handler and example #2218
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2218 +/- ##
==========================================
- Coverage 70.40% 69.73% -0.67%
==========================================
Files 75 77 +2
Lines 3392 3420 +28
Branches 57 57
==========================================
- Hits 2388 2385 -3
- Misses 1001 1032 +31
Partials 3 3
... and 1 file with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a bunch of feedback, the two main things I'm worried about is how you will test these changes in CI and whether there is a way to control some of the code duplication across this PR and Hamid's
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this empty file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will remove it.
def initialize(self, ctx: Context): | ||
ds_engine = get_ds_engine(self.model, ctx) | ||
self.model = ds_engine.module | ||
self.device = int(os.getenv("LOCAL_RANK", 0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just trying to learn, does deepspeed automatically figure out the world size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it read env to get rank and world size
) | ||
if not os.path.exists(ds_config): | ||
raise ValueError( | ||
f"{ctx.model_name} has no deepspeed config file {ds_config}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does deepspeed have some configuration object we can use directly, there's a lot of config files now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deepspeed has tons of configuration. it is difficult to cover all of them. A central configuration file is the simplest and easiest way to maintain.
raise ValueError( | ||
f"{ctx.model_name} has no deepspeed checkpoint file {checkpoint}" | ||
) | ||
logging.debug("Creating DeepSpeed engine") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should be info
return ds_engine | ||
else: | ||
raise ValueError( | ||
f"{ctx.model_name} has no deepspeed config in model config yaml file" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we provide a default one? I'm worried about overhead users will need to learn about a ll the config systems
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The example shows the minimum parameters.
test/pytest/test_handler.py
Outdated
) | ||
|
||
|
||
def test_huggingface_opt_distributed_inference_deepspeed(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made similar feedback to @HamidShojanazeri but the way I'd like to see this tested is
- Have a seperate file for distributed tests
- Make sure tests are skipped unless a CUDA multi GPU machine is found
- Create a new requirements.txt with optional dependencies including all the distributed libraries to test
self.tokenizer = AutoTokenizer.from_pretrained(model_dir, return_tensors="pt") | ||
super().initialize(ctx) | ||
|
||
def preprocess(self, requests): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's some code duplication between this PR and @HamidShojanazeri PR so now if you need to fix some bug you'd need to fix in as many large model backends as you've integrated, would prefer we refactor this into utils
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the two examples apply different model parallel library on the same model. @HamidShojanazeri pr demo pippy, here demo ds. that's why there are common parts to create model and tokenizer.
|
||
def hf_model(model_str): | ||
api = HfApi() | ||
models = [m.modelId for m in api.list_models()] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: does this take a long time to run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, this should be fast
def initialize(self, ctx: Context): | ||
self.max_length = int(ctx.model_yaml_config["handler"]["max_length"]) | ||
self.model = AutoModelForCausalLM.from_pretrained( | ||
"bigscience/bloom-7b1", use_cache=False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine for an example but I'm still curious how you expect people to use DeepSpeed
- Derive the DeepSpeed handler?
- Copy paste this file and make appropriate changes?
I'm not sure I'm fan of having a tutorial per large model vendor and then for each one have lots of code duplication between the custom handlers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
per our offline sync up, we want to let user clearly know the steps, instead of hiding the details in deepspeed base handler.
will add instruction in readme
@@ -0,0 +1,29 @@ | |||
# Loading large Huggingface models with constrained resources using accelerate | |||
|
|||
This document briefs on serving large HG models with limited resource using deepspeed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by limited resource do you mean with CPU offloading? If so I'd mention that explicitly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
limited by one GPU device memory. will update doc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lxning what is the main difference between OPT and Bloom example here?
super().initialize(ctx) | ||
self.max_length = int(ctx.model_yaml_config["handler"]["max_length"]) | ||
self.model = AutoModelForCausalLM.from_pretrained( | ||
"bigscience/bloom-7b1", use_cache=False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lxning could read the model name from config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the main diff b/w bloom example and opt example s to demo the model can be either downloaded via HF model name or via dowmload tool.
paste the token generated from huggingface hub. | ||
|
||
```bash | ||
python Download_models.py --model_path model --model_name facebook/opt-350m --revision main |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we change this to a bigger model
Description
Please read our CONTRIBUTING.md prior to creating your first pull request.
Please include a summary of the feature or issue being fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.
Fixes #(issue)
Type of change
Please delete options that are not relevant.
Feature/Issue validation/testing
Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.
reg.txt
Logs for Test A
Logs for Test B
Checklist: