
Add LLAMA #1446

Closed

wants to merge 28 commits into from

Conversation

@msaroufim (Member) commented Mar 3, 2023

Fixes #1443

Notable things I had to do (@ezyang, hopefully these are OK):

  • Since LLAMA requires special permission to download the weights/checkpoints and the tokenizer, I went ahead with a random checkpoint and a random tokenizer - not sure CI qualifies as a valid research endeavour
  • I removed the dependency on fairscale, so I had to make a few adjustments like turning ParallelLinear into Linear and ParallelEmbedding into Embedding, and things mostly seem to work fine (a rough sketch of the substitution is shown after this list). An added bonus is that you can run the example on a single machine
  • The original code runs inference under torch.inference_mode(); I removed it since it has a weird interaction with torch.compile
  • The open source LLAMA repo is inference only, so there is no training support in this script
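
A rough sketch of the substitution mentioned in the second bullet. The fairscale layer names (ColumnParallelLinear, RowParallelLinear, ParallelEmbedding) and the dimensions here are illustrative assumptions about the upstream code, not the exact diff:

import torch.nn as nn

class Attention(nn.Module):
    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        head_dim = dim // n_heads
        # was: fairscale ColumnParallelLinear(dim, n_heads * head_dim, bias=False, ...)
        self.wq = nn.Linear(dim, n_heads * head_dim, bias=False)
        self.wk = nn.Linear(dim, n_heads * head_dim, bias=False)
        self.wv = nn.Linear(dim, n_heads * head_dim, bias=False)
        # was: fairscale RowParallelLinear(n_heads * head_dim, dim, bias=False, ...)
        self.wo = nn.Linear(n_heads * head_dim, dim, bias=False)

# was: fairscale ParallelEmbedding(vocab_size, dim, ...)
tok_embeddings = nn.Embedding(32, 512)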

Some other things I can improve in another PR:

  • Better configuration, including sequence length and batching
  • Re-enabling distributed support with fairscale

I can run the code now:

(bench) ubuntu@ip-172-31-39-186:~/benchmark$ python run.py llama -d cuda
Running eval method from llama on cuda in eager mode with input batch size 32.
GPU Time:             10.006 milliseconds
CPU Total Wall Time:  10.045 milliseconds

@msaroufim marked this pull request as draft March 3, 2023 01:47
@xuzhao9 (Contributor) commented Mar 6, 2023

I get the following error when running the model locally with run.py:

$ python run.py llama -d cpu
Warning: Could not find dependent module llama for Model llama, skip it.
 No module named 'llama'
Unable to find model matching llama.

Can you check whether you can reproduce this?

@msaroufim changed the title from Add LLAMA to Add Preliminary support for LLAMA Mar 6, 2023
@msaroufim marked this pull request as ready for review March 6, 2023 22:21
@msaroufim requested review from wconstab and xuzhao9 March 6, 2023 22:32
@ezyang (Contributor) commented Mar 7, 2023

The limitations look fine; I prefer not downloading the checkpoint.

@msaroufim (Member, Author) commented

OK, this should work:

(bench) ubuntu@ip-172-31-38-220:~/benchmark$ python run.py llama -d cpu
Running eval method from llama on cpu in eager mode with input batch size 32.
CPU Total Wall Time:  10.136 milliseconds
CPU Peak Memory:                2.4590 GB
(bench) ubuntu@ip-172-31-38-220:~/benchmark$ python run.py llama -d cuda
Running eval method from llama on cuda in eager mode with input batch size 32.
GPU Time:              8.040 milliseconds
CPU Total Wall Time:   8.067 milliseconds
GPU 0 Peak Memory:              1.9117 GB
CPU Peak Memory:                2.0645 GB

@msaroufim changed the title from Add Preliminary support for LLAMA back to Add LLAMA Mar 9, 2023
torchbenchmark/models/llama/__init__.py (3 outdated review comments, resolved)
@facebook-github-bot (Contributor) commented

@xuzhao9 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@msaroufim requested a review from xuzhao9 March 10, 2023 02:45
@msaroufim requested a review from xuzhao9 March 10, 2023 17:46
@xuzhao9 (Contributor) left a comment

LGTM!

@facebook-github-bot (Contributor) commented

@xuzhao9 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Co-authored-by: Xu Zhao <xzhao9@fb.com>


if device == "cuda":
    torch.set_default_device("cuda")
Contributor commented:

I am wondering why torch.set_default_device('cuda') is needed?

@msaroufim (Member, Author) commented Mar 10, 2023

It's convenient in general to make sure everything runs on the same device; that way you don't need to add .to(device) calls for inputs and models. It works for any device, though, so I've made that clearer now.
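
For illustration, a minimal sketch of what the default-device switch buys (assumes a PyTorch version that provides torch.set_default_device, i.e. 2.0 or later; the module and tensor below are placeholders, not the benchmark code):

import torch
import torch.nn as nn

torch.set_default_device("cuda")  # works with any device string

# Module parameters and factory-created tensors now land on the default
# device, so no explicit .to(device) calls are needed.
model = nn.Linear(8, 8)
x = torch.randn(2, 8)
print(model.weight.device, x.device)  # cuda:0 cuda:0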

Contributor replied:

Thanks!

torchbenchmark/models/llama/test.py (outdated review comment, resolved)
torchbenchmark/models/llama/tokenizer.py (2 outdated review comments, resolved)
@facebook-github-bot (Contributor) commented

@xuzhao9 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


self.model_args = ModelArgs(vocab_size=32)
self.model = Transformer(self.model_args)

torch.set_default_device(device)
Contributor commented:

I am curious why we don't need to explicitly move self.model and self.example_inputs to the device here? For example:

self.model = Transformer(self.model_args).to(self.device)
self.example_inputs = (torch.tensor([[1, 1], [1,1]], dtype=torch.int).to(self.device), 1)

@msaroufim (Member, Author) replied:

Fixed
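
Presumably the fix amounts to setting the default device before the model and example inputs are constructed, or moving them explicitly as suggested above; a sketch of both options, not the exact diff:

# Option 1: set the default device *before* building the model and inputs,
# so their parameters/tensors are allocated there directly.
torch.set_default_device(device)
self.model_args = ModelArgs(vocab_size=32)
self.model = Transformer(self.model_args)
self.example_inputs = (torch.tensor([[1, 1], [1, 1]], dtype=torch.int), 1)

# Option 2: keep the default device and move things explicitly.
# self.model = Transformer(self.model_args).to(self.device)
# self.example_inputs = (torch.tensor([[1, 1], [1, 1]], dtype=torch.int).to(self.device), 1)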

@facebook-github-bot (Contributor) commented

@xuzhao9 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


@facebook-github-bot (Contributor) commented

@xuzhao9 merged this pull request in c78f1f3.

@msaroufim (Member, Author) commented Mar 13, 2023

This weekend the internet was buzzing with news of this C++ LLAMA implementation ggerganov/llama.cpp#33 (comment). It seems like it might not be too tricky to also validate correctness for this model by loading in the real checkpoints and tokenizer, despite minor differences in the model.

I might need to do a round-2 pass on this PR soon.
