
Contributing/Adding new model #1542

Closed
PhantomSpike opened this issue May 9, 2023 · 10 comments
Labels: competition (Support for the NeurIPS Large Language Model Efficiency Challenge), user question

Comments

@PhantomSpike

Apologies if this is already described somewhere, but how can one add a new model that is not already in the library of models?

I am particularly interested in OSS models and would like to add many of them.

Given that the LLM area is very dynamic/fast-paced, what would be a good solution to make this a streamlined process where users can contribute easily?

@rfernand2

@PhantomSpike - great idea. I'd like to nominate StableVicuna and Mosaic MPT-7B, for starters. Since many of the OSS models are on Hugging Face, I wonder if HELM already has HF support and we just need the eval compute. Or perhaps each model still needs some HELM adaptation beyond HF.

@JosselinSomervilleRoberts
Contributor

Hi,
Thanks for asking this.
We are currently working on a standardized way of adding new models. In the meantime, here is the process to add a new model:

  • First, you need to register the model in schema.yaml and models.py. Make sure to use the same name in both files.
  • Then you need a Client. For that, you will have to go to auto_client.py and handle your new model in _get_client() and probably _get_tokenizer_client(). If your model comes from HuggingFace, it is probably already supported by our HuggingFaceClient. If not, you might need to implement a Client yourself; how much work that takes depends on how "compatible" the model's API is with HELM. In the future, we hope to release an API doc that must be respected in order to be evaluated on HELM, but this is not yet the case. If you need to write your own Client, take a simple client like palmyra_client.py as an example (see the sketch after this list).
  • Finally, you need a WindowService (this is something we hope to remove soon, but it's not a priority right now). To do so, go to window_service_factory.py and add a case to handle your model in get_window_service. The WindowService defines properties of your model and its tokenizer, such as max_request_length, tokenizer_name, and special tokens. You can probably reuse an existing WindowService. Often we use MODEL_TAGS to identify the proper WindowService, so make sure to check out the existing tags in models.py and add them to your model.
  • If you've followed all the steps, you should be able to run your model. If your code is clean and you respected the previous steps, please consider sending a PR: other people may be interested in having this model added, and we cannot add all of them ourselves.
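To make the shape of these changes concrete, here is a rough, hypothetical sketch. The field names, tag constant, and cache-config helper below are assumptions for illustration, not HELM's exact API; copy the real patterns from the existing entries in models.py and auto_client.py.

```python
# Hypothetical sketch of edits to two existing HELM files -- mirror a real entry
# in models.py / auto_client.py for the actual field names and signatures.

# models.py: register the model under the same name used in schema.yaml.
MY_MODEL = Model(
    group="mygroup",                 # assumed field names
    name="mygroup/my-model-7b",      # must match the schema.yaml entry
    tags=[TEXT_MODEL_TAG],           # reuse tags that already exist in models.py
)

# auto_client.py: route the new organization to a Client in _get_client().
def _get_client(self, model: str) -> Client:
    organization = model.split("/")[0]
    if organization == "mygroup":
        # Hugging Face-hosted models can often reuse HuggingFaceClient;
        # otherwise write a small Client modeled on palmyra_client.py.
        return HuggingFaceClient(cache_config=self._build_cache_config(organization))
    # ... existing organizations are handled below ...
```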

Let me know if you have more questions and feel free to add me as a reviewer to your PR.

Happy coding!

@yifanmai
Collaborator

You might also find the existing documentation handy: development setup and adding new models.

As Josselin mentioned, we plan on improving the workflow and documentation soon.

@msaroufim
Collaborator

msaroufim commented Jun 14, 2023

Hi @JosselinSomervilleRoberts @yifanmai - I'm helping organize a competition to evaluate LLMs, so I'm very interested in easier ways of adding models.

  1. If registering a new model requires updates to the repo, then it might make more sense to recommend that users pip install -e ., since a common use case for HELM is to evaluate some new model on a broad suite. Ideally there would be something like helm register that makes these changes on behalf of users.
  2. A new client feels necessary only if users are using some strange object store, in which case settling on something like fsspec should help reduce the number of clients.
  3. Can we just drop the WindowService abstraction? It feels like all of this data should live in one place in the model registration.

FWIW I'm OK with settling on a single client like HF Hub if that makes 1 and 3 more seamless.

EDIT: This already does what I need: https://github.com/stanford-crfm/helm/blob/main/docs/huggingface_models.md

@yifanmai
Collaborator

Hi @msaroufim,

  1. Indeed; we support Hugging Face Hub, and we will soon support Hugging Face models on local disk (see "Eval local HF models with flag, add LLaMA and Alpaca", #1505).
  2. If you need Hugging Face on Azure Blob Store / GCS / S3 support, we can add fsspec support.
  3. If you use Hugging Face, we indeed auto-detect all of this information from the model and tokenizer, so you don't need to write a window service (see the sketch after this list).
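For illustration, this is roughly the kind of information that can be read straight off the Hugging Face config and tokenizer (not HELM's actual detection code; "gpt2" is just a stand-in model):

```python
from transformers import AutoConfig, AutoTokenizer

# Illustration only: the window/tokenizer properties a WindowService would declare
# are discoverable from the Hugging Face Hub.
model_name = "gpt2"  # stand-in for any Hugging Face model
config = AutoConfig.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Different model families store the context length under different names.
max_request_length = (
    getattr(config, "max_position_embeddings", None)
    or getattr(config, "max_seq_len", None)
)
print("max length:", max_request_length)   # 1024 for gpt2
print("eos token:", tokenizer.eos_token)   # "<|endoftext|>" for gpt2
print("bos token:", tokenizer.bos_token)
```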

It mostly depends on what kind of inference servers / libraries you want to support. I would be interested in adding integrations with other inference servers / libraries that work similarly to the Hugging Face one.

I'm also considering moving the Hugging Face flags into a config file, because it gets very unwieldy if you have lots of models.

@msaroufim
Collaborator

msaroufim commented Jun 15, 2023

Thanks @yifanmai!

Yes, for our use cases, local evals and evals on object stores would be invaluable; I'd imagine many competitors would not want to publicly share their weights until the competition is complete.

Regarding other libraries or inference servers, that part is also interesting. On one hand, if all competitors settled on HF, that would make our eval simpler, but we also want to make it easy for people who may not want to use HF to contribute. Maybe they're using torchserve/kserve/triton, or maybe they don't like the HF abstractions and would prefer to use their favorite training loop provider (lightning/mosaic/something custom). I'm still undecided what to do in this case; we figured maybe a custom helm client for the competition.

And yes, moving the HF flags to a file would be very helpful.

@yifanmai added the competition (Support for the NeurIPS Large Language Model Efficiency Challenge) label on Jun 16, 2023
@yifanmai
Collaborator

I added #1673 and #1674 for these requests. I also made the "competition" label; feel free to use this to tag competition-related issues.

Maybe they're using torchserve/kserve/triton, or maybe they don't like the HF abstractions and would prefer to use their favorite training loop provider (lightning/mosaic/something custom). I'm still undecided what to do in this case; we figured maybe a custom helm client for the competition.

For common cases, we can add a client; e.g., I think torchserve/kserve/triton would make sense.

If we want to be entirely framework agnostic, the submitter could provide an OpenAI API-compatible HTTP server, or a batch script that outputs a request/response JSON file in a HELM-compatible format, and we could improve HELM's support for these.
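As one possible shape for that, here is a minimal sketch of an OpenAI-style /v1/completions endpoint (not an official HELM interface; the endpoint path follows OpenAI's completions API, and generate() is a placeholder for whatever inference stack the submitter actually uses):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def generate(prompt: str, max_tokens: int) -> str:
    # Placeholder: call torchserve / kserve / triton / a custom loop here.
    return "..."

@app.route("/v1/completions", methods=["POST"])
def completions():
    body = request.get_json()
    text = generate(body["prompt"], body.get("max_tokens", 16))
    # Minimal OpenAI-style response; the exact fields HELM would require are
    # not pinned down here.
    return jsonify({
        "object": "text_completion",
        "choices": [{"text": text, "index": 0, "finish_reason": "stop"}],
    })

if __name__ == "__main__":
    app.run(port=8000)
```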

@yifanmai
Collaborator

yifanmai commented Jun 21, 2023

Another thought on private model hosting: HELM already supports private repositories.

@yifanmai reopened this on Jun 21, 2023
@yifanmai
Collaborator

Another thought: HELM already supports private repositories on Hugging Face Hub. The user just needs to set the Hugging Face authentication as shell environment variables before running HELM. So that could be another way of uploading and hosting a private model.
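For example (a minimal sketch; HUGGING_FACE_HUB_TOKEN is the standard variable read by huggingface_hub, though the exact variables HELM reads are not confirmed here):

```python
import os
from huggingface_hub import HfApi

# Assumption: HELM's Hugging Face client goes through huggingface_hub, which picks
# up the standard token environment variable, so exporting it in the shell before
# `helm-run` is enough to access a private repository.
os.environ.setdefault("HUGGING_FACE_HUB_TOKEN", "hf_...")  # placeholder token

# Quick sanity check that the token is picked up before launching the evaluation.
print(HfApi().whoami()["name"])
```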

@yifanmai
Collaborator

Closing this - we now support many other ways to run private models, including Hugging Face models on disk, vLLM, OpenAI compatible servers, etc.
