
LLaMA #21796

Closed · 2 tasks done
michaelroyzen opened this issue Feb 24, 2023 · 16 comments

@michaelroyzen

Model description

New model series from Meta AI (7B, 13B, 33B, and 65B parameters) that the paper reports as broadly competitive with Chinchilla-70B and PaLM-540B.

https://research.facebook.com/publications/llama-open-and-efficient-foundation-language-models/

Open source status

  • The model implementation is available
  • The model weights are available

Provide useful links for the implementation

No response

@sayantan1410

Hello @michaelroyzen, I'd like to work on this issue. Could you please clarify:

  1. The objective of this issue is to add the LLaMA model to the 🤗 models section, right? The inference code for the LLaMA models is open sourced, and the weights and tokenizers are available, as you mentioned.

Please let me know whether this issue is open to work on and whether I should proceed.

@Eric-Wallace-WebHost

Hello @sayantan1410. At the moment the inference code is available, but to get the weights you need to fill out the request form on their GitHub. It would be great for you to work on this, but it would mean working with a hypothetical set of weights, since they have not yet started releasing them to the people who requested access.

@sayantan1410

Hello @Eric-Wallace-WebHost, I have actually filled out the form for the weights and the tokenizers, but since I don't have any related publications I probably won't get access. For now, I will work with hypothetical weights until the real ones are released!
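
Below is a minimal sketch of how prototyping against hypothetical weights could look, assuming `LlamaConfig` and `LlamaForCausalLM` classes that follow the usual transformers config/model pattern (those class names and all sizes are placeholders, not an existing API at this point):

```python
# Sketch only: LlamaConfig / LlamaForCausalLM are assumed names following the usual
# transformers pattern; they do not exist in the library at the time of this thread.
import torch
from transformers import LlamaConfig, LlamaForCausalLM  # hypothetical imports

# Deliberately tiny config so a randomly initialized model fits in memory for smoke tests.
config = LlamaConfig(
    vocab_size=32000,
    hidden_size=256,
    intermediate_size=688,
    num_hidden_layers=4,
    num_attention_heads=4,
)
model = LlamaForCausalLM(config)  # random weights, no checkpoint required

# Check that the forward pass produces logits of the expected shape.
dummy_ids = torch.randint(0, config.vocab_size, (1, 16))
with torch.no_grad():
    logits = model(dummy_ids).logits
assert logits.shape == (1, 16, config.vocab_size)
```

Keeping the config tiny makes it cheap to exercise the forward pass, generation, and save/load round-trips before any real checkpoint is available.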

@Sea-Snell

Also, will there be a JAX implementation? It would be super helpful. I can help contribute to it as well.

@young-geng

I can contribute to the JAX implementation as well! Also, I'm not sure we can just reuse their PyTorch code, since it is released under GPLv3 rather than the Apache License that transformers uses.

@honglu2875

honglu2875 commented Mar 2, 2023

I have the weights. I haven't checked the rules yet and will assume I can't share them, but if you have an implementation I would love to help by testing it out.

@sgugger
Collaborator

sgugger commented Mar 2, 2023

At this stage we don't know if there is going to be an implementation in Transformers due to:

  • inaccessibility of weights (no one who got them is allowed to share them on the Hub)
  • different license of the code

We are looking into whether the Meta folks would be happy to release the weights in a gated repo on the Hub, and whether the code will live in Transformers or only on the Hub as custom code because of the license. @thomasw21 is working on a PyTorch port that our research team will use in any case.

So stay tuned!
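
If the weights do land in a gated repo, the likely loading pattern would be authenticating with a user access token and passing it to `from_pretrained`. A rough sketch, assuming a placeholder repo id (no such repo exists at this point):

```python
# Sketch: "meta-llama/llama-7b" is a placeholder repo id, not a real gated repo yet.
from huggingface_hub import login
from transformers import AutoModelForCausalLM, AutoTokenizer

login()  # prompts for a user access token whose account has been granted access to the gated repo

repo_id = "meta-llama/llama-7b"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(repo_id, use_auth_token=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, use_auth_token=True)
```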

@henk717

henk717 commented Mar 2, 2023

At this stage we don't know if there is going to be an implementation in Transformers due to:

  • inaccessibility of weights (no one who got them is allowed to share them on the Hub)

Even if there is no permission to host the weights on the Hub, transformers models are usually released together with their conversion scripts. An implementation combined with the necessary conversion script would still be useful: researchers could convert the model to the HF format themselves and keep using it in their HF-based projects without reinventing the wheel.

@hughbzhang

+1 to henk717. It would be super useful even if there were just a way to plug in your own weights and use the existing transformers library!
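
A rough sketch of the workflow henk717 and hughbzhang describe, assuming a hypothetical conversion script that writes a standard transformers checkpoint to a local directory; once converted, the weights load from local disk and never need to be uploaded to the Hub:

```python
# Sketch: the conversion script name and all paths below are hypothetical.
# Step 1 (shell): convert the original weights into the HF format, e.g.
#
#   python convert_llama_original_to_hf.py \
#       --input_dir /path/to/original/llama/7B \
#       --output_dir /path/to/llama-7b-hf
#
# Step 2: load the converted checkpoint from local disk like any other model.
from transformers import AutoModelForCausalLM, AutoTokenizer

local_dir = "/path/to/llama-7b-hf"  # directory produced by the conversion step
tokenizer = AutoTokenizer.from_pretrained(local_dir)
model = AutoModelForCausalLM.from_pretrained(local_dir)
```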

@AmericanPresidentJimmyCarter

It looks like the weights are right here.

https://huggingface.co/nyanko7/LLaMA-7B
https://huggingface.co/ricecake/LLaMA/tree/main
https://huggingface.co/datasets/nyanko7/LLaMA-65B

License is here:

https://docs.google.com/forms/d/e/1FAIpQLSfqNECQnMkycAp2jP4Z9TFX0cGR4uf7b_fBxjY_OjhJILlKGA/viewform

@zphang
Contributor

zphang commented Mar 4, 2023

Working on this today!

zphang mentioned this issue Mar 5, 2023

@elephantpanda

elephantpanda commented Mar 6, 2023

Are weights actually copyrightable? Technically, they are just a list of numbers generated by a machine and hence don't fall under US copyright laws.

I say, just upload the weights and call Meta's bluff.

@dustydecapod

Are weights actually copyrightable? Technically, they are just a list of numbers generated by a machine and hence don't fall under US copyright laws.

I say, just upload the weights and call Meta's bluff.

Lots of people are way ahead of you on this.

@elephantpanda

elephantpanda commented Mar 6, 2023

Can someone make an ONNX version? I tried to convert it but ran out of RAM.

I would quite like to try it with ONNX Runtime, even though I think it uses far more VRAM than torch, and ONNX Runtime has a memory leak with external weight files. But still...
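
For reference, a minimal sketch of one way such an export could be attempted with `torch.onnx.export`; the checkpoint path is a placeholder, and a 7B model needs a very large amount of host RAM during export, so a smaller model is a safer first test:

```python
# Sketch: "/path/to/llama-7b-hf" is a placeholder for a locally converted checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "/path/to/llama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)
model.eval()

# Wrap the model so tracing sees a plain tensor output instead of a ModelOutput dict.
class LogitsOnly(torch.nn.Module):
    def __init__(self, inner):
        super().__init__()
        self.inner = inner

    def forward(self, input_ids):
        return self.inner(input_ids).logits

example = tokenizer("Hello world", return_tensors="pt")["input_ids"]
torch.onnx.export(
    LogitsOnly(model),
    (example,),
    "llama.onnx",
    input_names=["input_ids"],
    output_names=["logits"],
    dynamic_axes={"input_ids": {0: "batch", 1: "sequence"}},
    opset_version=14,
)
```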

@simhallq

simhallq commented Mar 7, 2023

I'm interested in fine-tuning LLaMA to create text embeddings. Does anyone have tips on how to do that with the LLaMA architecture? Can I just add a pooling layer at the end?

By the way, here's code for RLHF training:

https://github.com/nebuly-ai/nebullvm/tree/main/apps/accelerate/chatllama
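
On the pooling question: rather than adding a trained pooling layer, a common starting point is to mean-pool the final hidden states over the non-padding tokens. A rough sketch, with a placeholder checkpoint path:

```python
# Sketch: mean-pool the last hidden states to get one embedding per input text.
# "/path/to/llama-7b-hf" is a placeholder for a converted checkpoint.
import torch
from transformers import AutoModel, AutoTokenizer

checkpoint = "/path/to/llama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # LLaMA-style tokenizers may not define a pad token
model = AutoModel.from_pretrained(checkpoint)
model.eval()

texts = ["A sentence to embed.", "Another, slightly longer sentence to embed."]
batch = tokenizer(texts, padding=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, hidden_size)

# Zero out the padding positions before averaging over the sequence dimension.
mask = batch["attention_mask"].unsqueeze(-1).type_as(hidden)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # (batch, hidden_size)
```

Taking the hidden state of the final token is another common choice for decoder-only models; either way, contrastive fine-tuning usually improves the resulting embeddings.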

@Sea-Snell

I have a working JAX implementation here.
