Configurable models for NeurIPS Efficiency Challenge #1861

Merged: 7 commits into main from yifanmai/fix-neurips-config, Oct 3, 2023

Conversation

@yifanmai (Collaborator) commented Sep 26, 2023

This supports running a model for the NeurIPS Efficiency Challenge with a user-configurable model name, HTTP service URL, window service, and tokenizer.


For models that use a built-in WindowService:

prod_env/model_deployments.yaml:

model_deployments:
  - name: neurips/my-pythia-model
    window_service_spec:
      class_name: "helm.benchmark.window_services.gptneox_window_service.GPTNeoXWindowService"
      args: {}
    client_spec:
      class_name: "helm.proxy.clients.http_model_client.HTTPModelClient"
      args: {
        base_url: "http://localhost:2345"
      }

The following models are supported, with the corresponding class_name to use inside window_service_spec (a sketch of how such a spec is resolved follows this list):

  • LLaMA: helm.benchmark.window_services.llama_window_service.LlamaWindowService
  • Llama 2: helm.benchmark.window_services.llama_window_service.Llama2WindowService
  • Red Pajama Base (not instruction tuned models): helm.benchmark.window_services.gptneox_window_service.GPTNeoXWindowService
  • MPT: helm.benchmark.window_services.gptneox_window_service.GPTNeoXWindowService
  • OPT: helm.benchmark.window_services.opt_window_service.OPTWindowService
  • Bloom: helm.benchmark.window_services.bloom_window_service.BloomWindowService
  • GPT Neo, J, NeoX, Pythia: helm.benchmark.window_services.gptneox_window_service.GPTNeoXWindowService
  • GPT2: helm.benchmark.window_services.gpt2_window_service.GPT2WindowService
  • T5 (not Flan-T5): helm.benchmark.window_services.t511b_window_service.T511bWindowService
  • UL2: helm.benchmark.window_services.ul2_window_service.UL2WindowService
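
Under the hood, a window_service_spec (and client_spec) simply names a class to instantiate with the given args. A minimal sketch of that idea, for intuition only (this is not HELM's actual spec-resolution code, and the helper name is made up):

import importlib
from typing import Any, Dict


def create_object_from_spec(class_name: str, args: Dict[str, Any]) -> Any:
    # Split "some.module.path.ClassName" into module path and class name,
    # import the module, look up the class, and instantiate it with args.
    module_name, _, cls_name = class_name.rpartition(".")
    cls = getattr(importlib.import_module(module_name), cls_name)
    return cls(**args)


# Hypothetical usage mirroring the YAML above (the real window services
# take additional constructor arguments that HELM supplies at runtime):
# window_service = create_object_from_spec(
#     "helm.benchmark.window_services.gptneox_window_service.GPTNeoXWindowService", {}
# )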

For models that use Hugging Face AutoTokenizer:

prod_env/model_deployments.yaml:

model_deployments:
  - name: neurips/my-falcon-7b-model
    tokenizer_name: "tiiuae/falcon-7b"
    sequence_length: 2048
    window_service_spec:
      class_name: "helm.benchmark.window_services.huggingface_window_service.HuggingFaceWindowService"
      args: {}
    client_spec:
      class_name: "helm.proxy.clients.http_model_client.HTTPModelClient"
      args: {
        base_url: "http://localhost:2345"
      }

Change tokenizer_name and sequence_length accordingly. Refer to the Hugging Face model card for the correct values.

If sequence_length is not set, it will be auto-inferred from the model's Hugging Face AutoTokenizer; this can produce incorrect values, because many AutoTokenizers on the Hugging Face Hub carry incorrect metadata.

The following models are supported, with corresponding tokenizer_name:

  • Falcon: tiiuae/falcon-7b
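
Before writing sequence_length into the config, one way to sanity-check the value is to look at the tokenizer's own metadata with the transformers library (a quick sketch; note that model_max_length is exactly the metadata auto-inference relies on, so it can be missing or set to a huge sentinel value, and the model card should win in case of disagreement):

from transformers import AutoTokenizer

# Load the tokenizer referenced by tokenizer_name in the YAML above.
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")

# This is the value auto-inference would pick up; compare it against the
# context length documented on the model card before trusting it.
print(tokenizer.model_max_length)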

For any models not listed, you can fall back to the HTTP service tokenizer:

prod_env/model_deployments.yaml:

model_deployments:
  - name: neurips/my-model
    tokenizer_name: neurips/my-tokenizer
    max_sequence_length: 2048
    client_spec:
      class_name: "helm.proxy.clients.http_model_client.HTTPModelClient"
      args: {
        base_url: "http://localhost:2345"
      }

prod_env/tokenizer_configs.yaml:

tokenizer_configs:
  - name: neurips/my-tokenizer
    tokenizer_spec:
      class_name: "helm.proxy.clients.http_model_client.HTTPModelClient"
      args: {
        base_url: "http://localhost:1234"
      }
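
In this fallback setup, the tokenizer_name in model_deployments.yaml refers to an entry in tokenizer_configs.yaml, so the two names have to line up. A small cross-check sketch (illustrative only; it assumes both files live under prod_env/, and deployments that use built-in or Hugging Face tokenizers will simply not have a matching entry):

import yaml

with open("prod_env/model_deployments.yaml") as f:
    deployments = yaml.safe_load(f)["model_deployments"]
with open("prod_env/tokenizer_configs.yaml") as f:
    configured = {t["name"] for t in yaml.safe_load(f)["tokenizer_configs"]}

for deployment in deployments:
    # Print each deployment, the tokenizer it points at, and whether that
    # tokenizer is defined in tokenizer_configs.yaml.
    tokenizer_name = deployment.get("tokenizer_name")
    print(deployment["name"], tokenizer_name, tokenizer_name in configured)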

@yifanmai (Collaborator, Author) commented Sep 26, 2023

cc @drisspg @msaroufim for the NeurIPS Efficiency Challenge.

cc @aniketmaurya This will eventually allow the user to configure parameters for the Lit-GPT client directly.

client_spec: ClientSpec
"""Specification for instantiating the client for this model deployment."""

max_sequence_length: Optional[int]
"""Maximum equence length for this model deployment."""
model_name: Optional[str] = None
Contributor:

Why is this moved down?

Collaborator (Author):

The ordering should be that all the required parameters come first, then all the optional parameters with default arguments.
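
This is a constraint of Python dataclasses themselves: fields without defaults must be declared before fields with defaults, otherwise class definition fails with a TypeError. A small standalone example of the ordering being discussed (the field names are illustrative, not the actual ModelDeployment definition):

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExampleDeployment:
    # Required fields (no default value) must come first.
    name: str
    max_sequence_length: Optional[int]

    # Fields with defaults must come after all required fields; declaring
    # them earlier raises "non-default argument follows default argument".
    model_name: Optional[str] = None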

client_spec: ClientSpec
"""Specification for instantiating the client for this model deployment."""

max_sequence_length: Optional[int]
"""Maximum equence length for this model deployment."""
model_name: Optional[str] = None
Contributor:

I'm wondering if we should put an example in the docstring so people have a sense of what the difference between name and model_name is, etc.

Collaborator (Author):

Going to defer this until we actually implement the multi-deployments feature, which isn't on the roadmap yet.



def maybe_register_model_metadata_from_base_path(base_path: str) -> None:
    path = os.path.join(base_path, MODEL_METADATA_FILE)
Contributor:

Add docstring?

Collaborator (Author):

Added docstring.
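
For context, the shape of that function with its docstring is roughly the following; this is a hedged reconstruction from the excerpt above, not the exact merged code, and both the MODEL_METADATA_FILE value and the registration helper are assumptions here:

import os

# Assumed filename, by analogy with tokenizer_configs.yaml.
MODEL_METADATA_FILE = "model_metadata.yaml"


def register_model_metadata_from_path(path: str) -> None:
    # Stand-in for HELM's actual registration logic, which reads the YAML
    # file and registers each model's metadata.
    ...


def maybe_register_model_metadata_from_base_path(base_path: str) -> None:
    """Register model metadata from MODEL_METADATA_FILE under base_path, if that file exists."""
    path = os.path.join(base_path, MODEL_METADATA_FILE)
    if os.path.exists(path):
        register_model_metadata_from_path(path)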

try:
    model = get_model(run_spec.adapter_spec.model)
except ValueError:
    # Models registered from configs cannot have expanders applied to them,
Contributor:

ValueError means that the model has not been loaded yet? I was a bit confused by the comment at first, maybe connect the dots a bit more.

Collaborator (Author):

It means the model has not been registered yet. I'll add more docs.
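
Spelled out with comments, the intent is roughly the following self-contained sketch; get_model here is a stand-in that mirrors only the behavior relied on above (raising ValueError for names that have not been registered), not HELM's real implementation:

_REGISTERED_MODELS = {"example/registered-model"}  # illustrative registry contents


def get_model(name: str) -> str:
    # Unknown (unregistered) model names raise ValueError.
    if name not in _REGISTERED_MODELS:
        raise ValueError(f"Model {name} is not registered")
    return name


def lookup_model(model_name: str):
    try:
        # Succeeds only for models that were registered ahead of time.
        return get_model(model_name)
    except ValueError:
        # Models defined only in a user config file are not registered at
        # this point, so fall back instead of failing the run.
        return None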


@dataclass(frozen=True)
class TokenizerConfigs:
    tokenizers: List[TokenizerConfig]
Collaborator (Author):

should this field be tokenizers or tokenizer_configs?

from helm.common.object_spec import ObjectSpec


TOKENIZER_CONFIGS_FILE = "tokenizer_configs.yaml"
Collaborator (Author):

Should this file be tokenizer_configs.yaml or tokenizers.yaml?

Collaborator (Author):

Making this tokenizer_configs.yaml.

name: str
"""Name of the tokenizer."""

tokenizer_spec: TokenizerSpec
Collaborator (Author):

This is tokenizer_spec instead of client_spec because I think that eventually Tokenizers and Clients should be separate classes...
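
Putting the pieces together, the tokenizer_configs.yaml from the PR description maps onto these classes roughly as follows (a simplified, hand-rolled parsing sketch for illustration; HELM's actual dataclasses and config loader differ in the details):

from dataclasses import dataclass
from typing import Any, Dict, List

import yaml


@dataclass(frozen=True)
class TokenizerSpec:
    class_name: str
    args: Dict[str, Any]


@dataclass(frozen=True)
class TokenizerConfig:
    name: str
    tokenizer_spec: TokenizerSpec


def read_tokenizer_configs(path: str) -> List[TokenizerConfig]:
    with open(path) as f:
        raw = yaml.safe_load(f)
    return [
        TokenizerConfig(
            name=entry["name"],
            tokenizer_spec=TokenizerSpec(**entry["tokenizer_spec"]),
        )
        for entry in raw["tokenizer_configs"]
    ]


# configs = read_tokenizer_configs("prod_env/tokenizer_configs.yaml")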

find one with a key matching the missing parameter's name.
If found in constant_bindings, add the corresponding value to args.
If found in provider_bindings, call the corresponding value and add the return values to args.

Contributor:

Could you provide an example or two of usage?

Collaborator (Author):

Added example.

)
return deployment_api_keys[model]

client_spec = inject_object_spec_args(
Contributor:

Can you write some comments on why this injection is needed? My initial impression is that it seems a bit complicated / fancy...

Collaborator (Author):

Added a passage.

Dependency injection is needed here for these reasons:

  1. Different clients have different parameters. Dependency injection provides arguments that match the parameters of the client.
  2. Some arguments, such as the tokenizer, are not static data objects that can be written in the user's configuration file. Instead, they have to be constructed dynamically at runtime.
  3. The providers must be lazily evaluated, because eager evaluation can raise an exception. For instance, some clients do not require an API key, so eagerly fetching the API key from configuration would fail for users who have not configured one. (See the sketch below.)
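
For intuition, a stripped-down version of the injection described above could look like this (a sketch only, not HELM's actual inject_object_spec_args; the names and signature are simplified):

import inspect
from typing import Any, Callable, Dict


def inject_args(
    constructor: Callable,
    explicit_args: Dict[str, Any],
    constant_bindings: Dict[str, Any],
    provider_bindings: Dict[str, Callable[[], Any]],
) -> Dict[str, Any]:
    args = dict(explicit_args)
    for name in inspect.signature(constructor).parameters:
        if name in args:
            continue
        if name in constant_bindings:
            # Constants are static values, e.g. a cache path.
            args[name] = constant_bindings[name]
        elif name in provider_bindings:
            # Providers are called lazily, only when the constructor
            # actually has a parameter with that name (e.g. an API key).
            args[name] = provider_bindings[name]()
    return args


# Hypothetical usage: build the client's kwargs from the spec's args plus bindings.
# client = HTTPModelClient(**inject_args(HTTPModelClient, spec_args, constants, providers))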

@yifanmai (Collaborator, Author) commented:

@msaroufim @drisspg I will merge this soon to unblock other work, but feel free to leave post-merge comments and I'll make requested changes in a follow-up PR.

@drisspg (Collaborator) commented Sep 28, 2023

> @msaroufim @drisspg I will merge this soon to unblock other work, but feel free to leave post-merge comments and I'll make requested changes in a follow-up PR.

Hey, that makes sense. Let's say we wanted all models to use the HTTP tokenizer. I don't think we have had to set up

prod_env/model_deployments.yaml
or
prod_env/tokenizer_configs.yaml

Does the workflow change from starting the LLM service on a local port -> helm-run with some given run_spec.conf?

@JosselinSomervilleRoberts (Contributor) commented:

I thought the description of this PR was really useful; should we add it to a README somewhere so that it's easier for people to add their own models?

@yifanmai (Collaborator, Author) commented Oct 3, 2023

@drisspg The current documented workflow does not change. Basically, if people were using neurips/local before, they can continue to do so.

This new integration provides a new workflow with the following benefits:

  1. The model name can be set by each user individually, so each submission can have a different model name rather than all of them being neurips/local.
  2. It allows running against a different URL or port, so the model can be hosted on a different machine from the HELM machine.
  3. It allows using local tokenizers instead of making an HTTP call for each tokenization request, which should provide some speedup.

@yifanmai (Collaborator, Author) commented Oct 3, 2023

@JosselinSomervilleRoberts I'll move most of this to documentation when this is a little more baked and less experimental. I think that some of this API might still be subject to change.

@yifanmai merged commit 10a27a6 into main on Oct 3, 2023
@yifanmai deleted the yifanmai/fix-neurips-config branch on October 3, 2023 21:26