
Supported context window length for each model? #194

Closed
bryanhpchiang opened this issue Mar 16, 2023 · 13 comments
Labels: generation quality (Quality of model output), model (Model specific)

Comments

@bryanhpchiang

what's the supported context window length for each model?

@j-f1
Collaborator

j-f1 commented Mar 16, 2023

I think it’s 512 for all of them at the moment.

@mattsta

mattsta commented Mar 16, 2023

It was made adjustable as a new command-line parameter here: 2d64715. (And of course, increasing the context length uses more memory: on a 64 GB RAM system you can go up to around a 12288-token context with the 7B model, but larger models require a smaller context.)
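
To get a feel for why larger models force a smaller context on the same machine, here is a rough back-of-the-envelope sketch (my own illustration, not code from this thread or from llama.cpp). It estimates only the KV-cache size from the published LLaMA layer counts and hidden sizes, assuming 32-bit floats per element; llama.cpp's real allocations also include the model weights and scratch buffers, so treat these as order-of-magnitude numbers.

```python
# Rough estimate of KV-cache size as a function of context length.
# Assumption: one key and one value vector per layer per token, stored as
# 32-bit floats. Real llama.cpp buffer sizes differ.

MODELS = {            # (n_layers, n_embd) from the LLaMA paper
    "7B":  (32, 4096),
    "13B": (40, 5120),
    "33B": (60, 6656),
    "65B": (80, 8192),
}

def kv_cache_gib(model: str, n_ctx: int, bytes_per_elem: int = 4) -> float:
    n_layers, n_embd = MODELS[model]
    n_bytes = 2 * n_layers * n_ctx * n_embd * bytes_per_elem  # 2 = keys + values
    return n_bytes / 2**30

for name in MODELS:
    print(f"{name}: ~{kv_cache_gib(name, 12288):.0f} GiB of KV cache at n_ctx = 12288")
```

At a 12288-token context this works out to roughly 12 GiB for 7B but around 60 GiB for 65B, which matches the pattern described above: the bigger the model, the less context fits in the same amount of RAM.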

@setzer22

Does anyone have evidence of this working correctly, however?

I thought the size of the context window was baked into the model. But I'm not an expert here.

@ggerganov
Member

Does anyone have evidence of this working correctly, however?

I thought the size of the context window was baked into the model. But I'm not an expert here.

Would like to know as well. Normally it is baked, but it looked like in LLaMA it can be changed.
Can someone confirm?

@slaren
Member

slaren commented Mar 16, 2023

According to this, LLaMA has a context window of 2048.

@setzer22

One theory for what might be happening here: reducing the size below 2048 has no effect (other than lowering memory cost and not letting you use longer prompts), because in a partially filled context those values would be zeroed anyway. But, on the other hand, increasing the context to sizes larger than 2048 makes the model either ignore the start of the prompt or misbehave in subtle ways?
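
One concrete way to see why positions past the trained window could cause trouble (my own illustration, not anything from llama.cpp): LLaMA uses rotary position embeddings, so attention keys and queries are rotated by angles that grow with the absolute token position, and nothing in the math stops you from asking for positions beyond 2048; the model simply never saw those angle combinations during training. A minimal sketch, assuming the standard RoPE formula with base 10000 and the 7B model's 128-dimensional heads:

```python
# Rotation angles for one attention head at a given absolute position, using
# the standard RoPE formula (base 10000, 128-dim heads as in LLaMA 7B).
# Each (even, odd) dimension pair rotates at its own frequency; the angle grows
# linearly with position, so positions > 2048 are pure extrapolation for the model.

def rope_angles(position: int, head_dim: int = 128, base: float = 10000.0):
    return [position / base ** (2 * i / head_dim) for i in range(head_dim // 2)]

print(rope_angles(2047)[:4])  # a position the model saw during training
print(rope_angles(4095)[:4])  # only reachable with an enlarged context window
```

This does not prove the garbage output comes from out-of-range positions, but it is consistent with generation degrading once the position index passes the training-time maximum rather than when the buffer merely gets large.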

@gjmulder changed the title from "context length?" to "Supported context window length for each model?" on Mar 16, 2023
@gjmulder added the model (Model specific) and generation quality (Quality of model output) labels on Mar 16, 2023
@noughtmare

But OTOH increasing the context to sizes larger than 2048 makes the model either ignore the start of the prompt or misbehave in subtle ways?

I've noticed it deteriorating suddenly after some large number of tokens. I don't remember exactly how many, but I guess it could be 2048. It was definitely not just ignoring the start. It just produced nonsensical text.

@setzer22

@noughtmare Yup, now that you mention it, this matches my experience from some early experiments I did. I increased the context length to 4096 to feed it a very large prompt, but after some time I started getting garbage output. I really haven't had time to do a more scientific test on what causes the issue 🤔

@SuajCarrot
Contributor

Just mentioning here that although I couldn't manage to get longer outputs (issue #210), the quality increased A LOT by setting the tokens to predict to 2048. Huh.

@setzer22

setzer22 commented Mar 16, 2023

@SuajCarrot That's interesting! Could you share a few results? What makes you think quality increased? 🤔 I just tried to set the context size to 2048 on my end and I'm not seeing significantly better results, although it's certainly hard to judge.

@SuajCarrot
Contributor

SuajCarrot commented Mar 17, 2023

@setzer22 Of course! What makes me think the quality increased is that the model actually seems to be creating new text and understanding the context; before that, it kept treating everything as a Facebook post, immediately following every prompt with things like: "Likes. New comment. Share. i hope you guys liked ittt. i met this girl omg." Since I can barely find any examples of generations online, I honestly don't know whether it was a real quality issue or just bad luck; in any case, here are some results:

Settings

n_predict = 2048
repeat_last_n = 64
repeat_penalty = 1.0
top_k = 40
top_p = 0.9
temp = 0.8
seed = -1
threads = 6 (on a processor with 6 physical cores and 12 logical threads; I read somewhere that the model only scales well with physical cores)
model: 13B
Using the Dalai web interface.

The results were not cherry-picked; they were the very first thing that came out. Writing the prompts as "What follows is a..." may not be the best way to achieve the best results, but it's what has worked best for me.

PROMPT: What follows is a brief summary of The Little Prince that also explains the moral of the story:

OUTPUT: The Little Prince is a story about a young pilot whose plane crashes in the Sahara Desert. In his attempt to repair his plane, he meets a little boy named Prince who comes from a small planet called Asteroid 325, which is the only place the little Prince calls home. The Prince is a young boy who is lonely and is looking for a friend. He tells the pilot the story of how he traveled to Earth to look for a friend and what he found instead was a snake that he named the Snake, a flower that he named the Rose, and a man that he named the Businessman. The Businessman is a person who is too concerned with work and business, therefore, he does not make friends.

As The Little Prince tells his story, he and the pilot are able to repair his plane and fly back to his planet. In the end, the pilot learns that he should be careful what he does, because he is the only person that is responsible for what he does.

What is the setting of The Little Prince?

Why does The Little Prince have to leave his planet?

What are some examples of imagery in The Little Prince?

In The Little Prince, what is the Rose's problem with the Little Prince?

What is the moral of The Little Prince?


PROMPT: The following is the story of the Cold War, explained with Minecraft analogies:

Minecraft and Communism

Minecraft is an online game, and Communism is an online philosophy. Both are based on the notion of a group of people working together towards a common goal.

In Minecraft, the common goal is building a better world, brick by brick. Communism, on the other hand, has a slightly different take. According to Communism, the goal is to work together to build a better world, brick by brick, but only if the group of people is all on the same team.

In Communism, all the people are on the same team. There is no room for different teams.

In Minecraft, different teams can be a good thing.

A team of players may build a castle, while a team of players may build a spaceship. These are two different things, and both are wonderful in their own ways.

In Communism, all players must be on the same team. The only acceptable team is the team that builds a castle, and the only acceptable castle is the castle that is built by one team.

In Minecraft, players can build different things and still be on the same team.

In Communism, the only acceptable thing is the castle. The castle is what the group is working towards.

In Minecraft, the only acceptable thing is the castle. The castle is what the group is working towards.

Minecraft and Capitalism

Minecraft and Capitalism both require players to work together. Both systems require players to build their own things, in their own ways, on their own teams.

Minecraft and Capitalism are both based on the notion of working together for a common goal, but the goal is different in each.

In Minecraft, the common goal is building a better world, brick by brick. Capitalism, on the other hand, has a slightly different take. According to Capitalism, the goal is to work together to build a better world, brick by brick, but only if the group of people is all on the same team.

In Capitalism, all the people are on the same team. There is no room for different teams.

In Minecraft, different teams can be a good thing. A team of players may build a castle, while a team of players may build a spaceship. These are two different things, and both are wonderful in their own ways.


PROMPT: Here is a brief explanation of what are Python's anonymous functions, how to use them, and why they're useful:

OUTPUT: Anonymous functions are functions that have no name. Instead, they are just a block of code, in the form of a function. Here is a simple example:

This function, when called, will return 5, but it has no name, so how can it be called?

If you were to run this example, you would get this error:

This error is pretty much self explanatory, a function has to be defined in order to be called, and an anonymous function is not defined, it is just a block of code.

So how can we use an anonymous function?

Well, we can actually use the anonymous function as a "function name".

Here is an example, where we are calling the anonymous function 1234.

This is obviously wrong, but it will help you understand how to call the anonymous function.

Let's try again, this time we will define the anonymous function first, then we can use it as a function name.

We have now defined the anonymous function, and we can now call it.

You can also assign a name to an anonymous function, but I won't go into that right now.

Anonymous functions are useful because they are small blocks of code, that can be used to replace larger blocks of code, or they can be used to "glue" together blocks of code.

A function that is used to "glue" together blocks of code is called a "lambda function" and they are quite common.

For example, here is a lambda function that multiplies two numbers together and adds them.

If you were to run this example, you would get this:

You can see that we are calling the function "mul" and passing the two arguments "5" and "7", then we are printing the result.

Here is an example that uses an anonymous function in a for loop, and then multiplies the two numbers together and prints the result.

You can see that we are defining a function called "mul", and then calling it in the for loop, then multiplying the two numbers together and printing the result.

Here is a final example that uses an anonymous function in a "for" loop, then uses a "lambda function" to "glue" together blocks of code.

We first define a function called "mul", then


Of course it still generates some "unintentional comedy" (lol) and hallucinates a lot, but at least it tries, and it somewhat seems to understand the "instructions" in the initial prompts, which is a huge improvement compared to straight-up garbage like this one (here I simply changed n_predict to 10000, as I used to):

PROMPT: Here is the story of Goldilocks and the Three Little Bears:

OUTPUT: A few years ago, I met a girl named Goldie. She was beautiful. She was sweet. She was a lot like me. We clicked instantly. We had a lot of fun together. She was a little bit crazy and I liked that a lot. She was a little bit different and I liked that even more. She was a little bit messy and I liked that even more. She was a little bit crazy and I liked that even more. I liked her even more when I learned that she was a lot like me. She liked to eat and she liked to drink and she liked to sleep and she liked to shop. She liked to sleep, drink and shop and eat. She liked to eat, drink, sleep, shop, and drink. She liked to sleep, drink and shop and eat. She liked to shop, eat, drink, sleep, and shop. She liked to shop, drink, eat, sleep, and shop. She liked to shop, drink, eat, sleep and shop. She liked to shop, drink, eat, sleep, and drink. She liked to shop, drink, eat, sleep, drink, and eat. She liked to shop, drink, eat, sleep, drink, eat, and shop. She liked to shop, drink, eat, sleep, drink, eat, shop, and drink. She liked to shop, drink, eat, sleep, drink, eat, shop, drink, and eat. She liked to shop, drink, eat, sleep, drink, eat, shop, drink, eat, and shop. She liked to shop, drink, eat, sleep, drink, eat, shop, drink, eat, sleep, and shop. She liked to shop, drink, eat, sleep, drink, eat, shop, drink, eat, sleep, drink, eat, shop, and drink. She liked to shop, drink, eat, sleep, drink, eat, shop, drink, eat, sleep, drink, eat, shop, drink, eat, and shop. She liked to shop, drink, eat, sleep, drink, eat, shop, drink, eat, sleep, drink, eat, shop, drink, eat, sleep, drink, eat, shop, and drink. She liked to shop, drink, eat, sleep, drink, eat, shop, drink, eat, sleep, drink, eat, shop, drink, eat, sleep,

@rabbitselector

rabbitselector commented Mar 17, 2023

I set the context size to 2048 tokens with the recently added -c flag, but then I noticed a steep quality falloff after ~2000 characters (~512 tokens on average).
I made a test prompt of ~1700 characters (467 tokens) and used -n 256. In my tests the 33B model behaved for the first few lines after the prompt before going completely off the rails or repeating itself, even with a repetition penalty set. Maybe someone else can also confirm this with a similar setup.

Cheers.
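
As a rough way to sanity-check token counts in reports like the one above (my own illustration, not something from this thread), the figures quoted imply roughly 4 characters per token for English text, which makes it easy to estimate how much of the context window a prompt plus generation actually uses; an exact count would need the real tokenizer.

```python
# Crude character-to-token estimate; the ~1700 chars / 467 tokens figure above
# suggests roughly 3.5-4 characters per token for English text.

CHARS_PER_TOKEN = 4

def approx_tokens(n_chars: int) -> int:
    return n_chars // CHARS_PER_TOKEN

prompt_tokens = approx_tokens(1700)   # ~425, close to the reported 467
used = prompt_tokens + 256            # plus the -n 256 generated tokens
print(f"~{used} of 2048 context slots used")
```

By that estimate the test above stays well inside a 2048-token window, which suggests the degradation is not simply the enlarged buffer filling up.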

@Green-Sky
Collaborator

closing this as answered. #194 (comment)

According to this, LLaMA has a context window of 2048.
