
Commit: Update README.md

karpathy authored Jul 25, 2023
1 parent 34ccb64 commit 6cf34d6
Showing 1 changed file with 6 additions and 7 deletions: README.md

@@ -44,22 +44,23 @@ This still runs at interactive rates and samples more coherent and diverse stories

## Meta's Llama 2 models

As the neural net architecture is identical, we can also inference the Llama 2 models released by Meta. Sadly there is a bit of friction here due to licensing (I can't directly upload the checkpoints, I think). First you'll have to export these weights in the llama2.c format. Git clone the main repo from Meta, follow their instructions to request and download the 7B model, then cp the `export_meta_llama_bin.py` file (in the root directory of this project) over, and run it:

```bash
git clone https://github.com/facebookresearch/llama.git
cd llama
./download.sh # download the 7B checkpoint
cp /path/to/llama2.c/export_meta_llama_bin.py .
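# note: this export script currently requires a GPU and NCCL (see the note below)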
torchrun --nproc_per_node 1 export_meta_llama_bin.py
```

Sadly right now this export script requires GPU, NCCL, etc. (hope to fix, or accepting PRs). The export will take ~10 minutes or so and generate a 26GB file (the weights of the 7B model in float32) called `llama2_7b.bin` in the current directory. Go back to the root directory of llama2.c and run:

```bash
./run path/to/llama2_7b.bin
```

This ran at about 4 tokens/s compiled with OpenMP on 96 threads on my CPU Linux box in the cloud. (On my MacBook Air M1, currently it's closer to 30 seconds per token if you just build with `make runfast`.) Example output:

*The purpose of this document is to highlight the state-of-the-art of CoO generation technologies, both recent developments and those in commercial use. The focus is on the technologies with the highest merit to become the dominating processes of the future and therefore to be technologies of interest to S&T ... R&D. As such, CoO generation technologies developed in Russia, Japan and Europe are described in some depth. The document starts with an introduction to cobalt oxides as complex products and a short view on cobalt as an essential material. The document continues with the discussion of the available CoO generation processes with respect to energy and capital consumption as well as to environmental damage.*
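
For reference, here is a rough sketch of the kind of OpenMP build and invocation behind the throughput numbers above (the exact compiler flags and `make` targets are assumptions; check the repo's Makefile):

```bash
# build run.c with optimizations and OpenMP enabled (flags are an assumption; see the Makefile)
gcc -Ofast -fopenmp -march=native run.c -lm -o run
# OMP_NUM_THREADS is the standard OpenMP thread-count knob; 96 matches the cloud box above
OMP_NUM_THREADS=96 ./run llama2_7b.bin
```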

@@ -77,11 +78,9 @@ For the sake of examples of smaller, from-scratch models, I trained multiple models

You'll notice that the 110M model is equivalent to GPT-1 in size. Alternatively, this is also the smallest model in the GPT-2 series (`GPT-2 small`), except the max context length is only 1024 instead of 2048. The only notable changes from the GPT-1/2 architecture are that Llama uses RoPE relative positional embeddings instead of absolute/learned positional embeddings, a slightly fancier SwiGLU non-linearity in the MLP, RMSNorm instead of LayerNorm, bias=False on all Linear layers, and is optionally multiquery (but this is not yet supported in llama2.c).
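
For concreteness, the normalization and MLP tweaks can be written out roughly as follows (a sketch of the standard formulations, not copied from this repo's model code):

$$\mathrm{RMSNorm}(x) = \frac{x}{\sqrt{\frac{1}{d}\sum_{i=1}^{d} x_i^2 + \epsilon}} \odot g, \qquad \mathrm{FFN}_{\mathrm{SwiGLU}}(x) = W_2\big(\mathrm{SiLU}(W_1 x) \odot W_3 x\big)$$

where $g$ is a learned per-channel gain and $\mathrm{SiLU}(z) = z\,\sigma(z)$; unlike LayerNorm, RMSNorm does not subtract the mean or add a bias.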

## training

Let's see how we can train a baby Llama 2 from scratch using the code in this repo. First, let's download and pretokenize a source dataset. I like [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories), so that is currently the only example available in this repo, but it should be very easy to add other datasets; see the code.

```bash
python tinystories.py download
```
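
The diff view truncates the file here. For orientation, the remaining steps described in the README are roughly of the following shape; the exact script names and stages (`pretokenize`, `train.py`) are assumptions and may differ from the current repo, so treat this as a sketch:

```bash
# pretokenize the downloaded stories into training-ready data (stage name is an assumption)
python tinystories.py pretokenize
# then train a baby Llama 2; hyperparameters live inside the training script (assumption)
python train.py
```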
