
Supported context window length for each model? #194

Closed
bryanhpchiang opened this issue Mar 16, 2023 · 13 comments
Labels: generation quality (Quality of model output), model (Model specific)

Comments

@bryanhpchiang

what's the supported context window length for each model?

@j-f1
Collaborator

j-f1 commented Mar 16, 2023

I think it’s 512 for all of them at the moment.

@mattsta

mattsta commented Mar 16, 2023

It was made adjustable as a new command-line parameter here: 2d64715. (And of course, increasing the context length uses more memory: on a 64 GB RAM system you can go up to around a 12288-token context with the 7B model, but larger models require a smaller context.)
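
To get a feel for why larger models force a smaller context on the same machine, here is a rough back-of-the-envelope sketch (my own illustration, not code from this thread or from llama.cpp). It estimates only the KV-cache size from the published LLaMA layer counts and hidden sizes, assuming 32-bit floats per element; llama.cpp's real allocations also include the model weights and scratch buffers, so treat these as order-of-magnitude numbers.

```python
# Rough estimate of KV-cache size as a function of context length.
# Assumption: one key and one value vector per layer per token, stored as
# 32-bit floats. Real llama.cpp buffer sizes differ.

MODELS = {            # (n_layers, n_embd) from the LLaMA paper
    "7B":  (32, 4096),
    "13B": (40, 5120),
    "33B": (60, 6656),
    "65B": (80, 8192),
}

def kv_cache_gib(model: str, n_ctx: int, bytes_per_elem: int = 4) -> float:
    n_layers, n_embd = MODELS[model]
    n_bytes = 2 * n_layers * n_ctx * n_embd * bytes_per_elem  # 2 = keys + values
    return n_bytes / 2**30

for name in MODELS:
    print(f"{name}: ~{kv_cache_gib(name, 12288):.0f} GiB of KV cache at n_ctx = 12288")
```

At a 12288-token context this works out to roughly 12 GiB for 7B but around 60 GiB for 65B, which matches the pattern described above: the bigger the model, the less context fits in the same amount of RAM.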

@setzer22

Does anyone have evidence of this working correctly, however?

I thought the size of the context window was baked into the model. But I'm not an expert here.

@ggerganov
Member

Does anyone have evidence of this working correctly, however?

I thought the size of the context window was baked into the model. But I'm not an expert here.

Would like to know as well. Normally it is baked, but it looked like in LLaMA it can be changed.
Can someone confirm?

@slaren
Member

slaren commented Mar 16, 2023

According to this, LLaMA has a context window of 2048.

@setzer22

One theory for what might be happening here: reducing the size below 2048 has no effect (other than lowering memory cost and not letting you use longer prompts), because in a partially filled context those values would be zeroed anyway. But, on the other hand, increasing the context to sizes larger than 2048 makes the model either ignore the start of the prompt or misbehave in subtle ways?
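
One concrete way to see why positions past the trained window could cause trouble (my own illustration, not anything from llama.cpp): LLaMA uses rotary position embeddings, so attention keys and queries are rotated by angles that grow with the absolute token position, and nothing in the math stops you from asking for positions beyond 2048; the model simply never saw those angle combinations during training. A minimal sketch, assuming the standard RoPE formula with base 10000 and the 7B model's 128-dimensional heads:

```python
# Rotation angles for one attention head at a given absolute position, using
# the standard RoPE formula (base 10000, 128-dim heads as in LLaMA 7B).
# Each (even, odd) dimension pair rotates at its own frequency; the angle grows
# linearly with position, so positions > 2048 are pure extrapolation for the model.

def rope_angles(position: int, head_dim: int = 128, base: float = 10000.0):
    return [position / base ** (2 * i / head_dim) for i in range(head_dim // 2)]

print(rope_angles(2047)[:4])  # a position the model saw during training
print(rope_angles(4095)[:4])  # only reachable with an enlarged context window
```

This does not prove the garbage output comes from out-of-range positions, but it is consistent with generation degrading once the position index passes the training-time maximum rather than when the buffer merely gets large.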

@gjmulder changed the title from "context length?" to "Supported context window length for each model?" on Mar 16, 2023
@gjmulder added the model (Model specific) and generation quality (Quality of model output) labels on Mar 16, 2023
@noughtmare

But OTOH increasing the context to sizes larger than 2048 makes the model either ignore the start of the prompt or misbehave in subtle ways?

I've noticed it deteriorating suddenly after some large number of tokens. I don't remember exactly how many, but I guess it could be 2048. It was definitely not just ignoring the start. It just produced nonsensical text.

@setzer22

@noughtmare Yup, now that you mention it, this matches my experience from some early experiments I did. I increased the context length to 4096 to feed it a very large prompt, but after some time I started getting garbage output. I really haven't had time to do a more scientific test on what causes the issue 🤔

@SuajCarrot
Contributor

Just mentioning here that although I couldn't manage to get longer outputs (issue #210), the quality increased A LOT by setting the tokens to predict to 2048. Huh.

@setzer22

setzer22 commented Mar 16, 2023

@SuajCarrot That's interesting! Could you share a few results? What makes you think quality increased? 🤔 I just tried to set the context size to 2048 on my end and I'm not seeing significantly better results, although it's certainly hard to judge.

@SuajCarrot
Contributor

SuajCarrot commented Mar 17, 2023

@setzer22 Of course! What makes me think the quality increased is that the model actually seems to be creating new text and understanding the context; before that, it kept treating everything as a Facebook post, immediately following every prompt with things like: "Likes. New comment. Share. i hope you guys liked ittt. i met this girl omg." Since I can barely find any examples of generations online, I honestly don't know whether it was a real quality issue or just bad luck; in any case, here are some results:

Settings

n_predict = 2048
repeat_last_n = 64
repeat_penalty = 1.0
top_k = 40
top_p = 0.9
temp = 0.8
seed = -1
threads = 6 (on a processor with 6 physical cores and 12 logical threads; I read somewhere that the model only scales well with physical cores)
model: 13B
Using the Dalai web interface.

The results were not cherry-picked; they were the very first thing that came out. Writing the prompts as "What follows is a..." may not be the best way to achieve the best results, but it's what has worked best for me.

PROMPT: What follows is a brief summary of The Little Prince that also explains the moral of the story:

OUTPUT: The Little Prince is a story about a young pilot whose plane crashes in the Sahara Desert. In his attempt to repair his plane, he meets a little boy named Prince who comes from a small planet called Asteroid 325, which is the only place the little Prince calls home. The Prince is a young boy who is lonely and is looking for a friend. He tells the pilot the story of how he traveled to Earth to look for a friend and what he found instead was a snake that he named the Snake, a flower that he named the Rose, and a man that he named the Businessman. The Businessman is a person who is too concerned with work and business, therefore, he does not make friends.

As The Little Prince tells his story, he and the pilot are able to repair his plane and fly back to his planet. In the end, the pilot learns that he should be careful what he does, because he is the only person that is responsible for what he does.

What is the setting of The Little Prince?

Why does The Little Prince have to leave his planet?

What are some examples of imagery in The Little Prince?

In The Little Prince, what is the Rose's problem with the Little Prince?

What is the moral of The Little Prince?


PROMPT: The following is the story of the Cold War, explained with Minecraft analogies:

Minecraft and Communism

Minecraft is an online game, and Communism is an online philosophy. Both are based on the notion of a group of people working together towards a common goal.

In Minecraft, the common goal is building a better world, brick by brick. Communism, on the other hand, has a slightly different take. According to Communism, the goal is to work together to build a better world, brick by brick, but only if the group of people is all on the same team.

In Communism, all the people are on the same team. There is no room for different teams.

In Minecraft, different teams can be a good thing.

A team of players may build a castle, while a team of players may build a spaceship. These are two different things, and both are wonderful in their own ways.

In Communism, all players must be on the same team. The only acceptable team is the team that builds a castle, and the only acceptable castle is the castle that is built by one team.

In Minecraft, players can build different things and still be on the same team.

In Communism, the only acceptable thing is the castle. The castle is what the group is working towards.

In Minecraft, the only acceptable thing is the castle. The castle is what the group is working towards.

Minecraft and Capitalism

Minecraft and Capitalism both require players to work together. Both systems require players to build their own things, in their own ways, on their own teams.

Minecraft and Capitalism are both based on the notion of working together for a common goal, but the goal is different in each.

In Minecraft, the common goal is building a better world, brick by brick. Capitalism, on the other hand, has a slightly different take. According to Capitalism, the goal is to work together to build a better world, brick by brick, but only if the group of people is all on the same team.

In Capitalism, all the people are on the same team. There is no room for different teams.

In Minecraft, different teams can be a good thing. A team of players may build a castle, while a team of players may build a spaceship. These are two different things, and both are wonderful in their own ways.


PROMPT: Here is a brief explanation of what are Python's anonymous functions, how to use them, and why they're useful:

OUTPUT: Anonymous functions are functions that have no name. Instead, they are just a block of code, in the form of a function. Here is a simple example:

This function, when called, will return 5, but it has no name, so how can it be called?

If you were to run this example, you would get this error:

This error is pretty much self explanatory, a function has to be defined in order to be called, and an anonymous function is not defined, it is just a block of code.

So how can we use an anonymous function?

Well, we can actually use the anonymous function as a "function name".

Here is an example, where we are calling the anonymous function 1234.

This is obviously wrong, but it will help you understand how to call the anonymous function.

Let's try again, this time we will define the anonymous function first, then we can use it as a function name.

We have now defined the anonymous function, and we can now call it.

You can also assign a name to an anonymous function, but I won't go into that right now.

Anonymous functions are useful because they are small blocks of code, that can be used to replace larger blocks of code, or they can be used to "glue" together blocks of code.

A function that is used to "glue" together blocks of code is called a "lambda function" and they are quite common.

For example, here is a lambda function that multiplies two numbers together and adds them.

If you were to run this example, you would get this:

You can see that we are calling the function "mul" and passing the two arguments "5" and "7", then we are printing the result.

Here is an example that uses an anonymous function in a for loop, and then multiplies the two numbers together and prints the result.

You can see that we are defining a function called "mul", and then calling it in the for loop, then multiplying the two numbers together and printing the result.

Here is a final example that uses an anonymous function in a "for" loop, then uses a "lambda function" to "glue" together blocks of code.

We first define a function called "mul", then


Of course it still generates some "unintentional comedy" (lol) and hallucinates a lot, but at least it tries, and it somewhat seems to understand the "instructions" in the initial prompts, which is a huge improvement compared to straight-up garbage like this one (here I simply changed n_predict to 10000, as I used to):

PROMPT: Here is the story of Goldilocks and the Three Little Bears:

OUTPUT: A few years ago, I met a girl named Goldie. She was beautiful. She was sweet. She was a lot like me. We clicked instantly. We had a lot of fun together. She was a little bit crazy and I liked that a lot. She was a little bit different and I liked that even more. She was a little bit messy and I liked that even more. She was a little bit crazy and I liked that even more. I liked her even more when I learned that she was a lot like me. She liked to eat and she liked to drink and she liked to sleep and she liked to shop. She liked to sleep, drink and shop and eat. She liked to eat, drink, sleep, shop, and drink. She liked to sleep, drink and shop and eat. She liked to shop, eat, drink, sleep, and shop. She liked to shop, drink, eat, sleep, and shop. She liked to shop, drink, eat, sleep and shop. She liked to shop, drink, eat, sleep, and drink. She liked to shop, drink, eat, sleep, drink, and eat. She liked to shop, drink, eat, sleep, drink, eat, and shop. She liked to shop, drink, eat, sleep, drink, eat, shop, and drink. She liked to shop, drink, eat, sleep, drink, eat, shop, drink, and eat. She liked to shop, drink, eat, sleep, drink, eat, shop, drink, eat, and shop. She liked to shop, drink, eat, sleep, drink, eat, shop, drink, eat, sleep, and shop. She liked to shop, drink, eat, sleep, drink, eat, shop, drink, eat, sleep, drink, eat, shop, and drink. She liked to shop, drink, eat, sleep, drink, eat, shop, drink, eat, sleep, drink, eat, shop, drink, eat, and shop. She liked to shop, drink, eat, sleep, drink, eat, shop, drink, eat, sleep, drink, eat, shop, drink, eat, sleep, drink, eat, shop, and drink. She liked to shop, drink, eat, sleep, drink, eat, shop, drink, eat, sleep, drink, eat, shop, drink, eat, sleep,

@rabbitselector

rabbitselector commented Mar 17, 2023

I set the context size to 2048 tokens with the recently added -c flag, but then I noticed a steep quality falloff after ~2000 characters (~512 tokens on average).
I made a test prompt of ~1700 characters (467 tokens) and used -n 256. In my tests the 33B model behaved for the first few lines after the prompt before going completely off the rails or repeating itself, even with a repetition penalty set. Maybe someone else can also confirm this with a similar setup.

Cheers.
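
As a rough way to sanity-check token counts in reports like the one above (my own illustration, not something from this thread), the figures quoted imply roughly 4 characters per token for English text, which makes it easy to estimate how much of the context window a prompt plus generation actually uses; an exact count would need the real tokenizer.

```python
# Crude character-to-token estimate; the ~1700 chars / 467 tokens figure above
# suggests roughly 3.5-4 characters per token for English text.

CHARS_PER_TOKEN = 4

def approx_tokens(n_chars: int) -> int:
    return n_chars // CHARS_PER_TOKEN

prompt_tokens = approx_tokens(1700)   # ~425, close to the reported 467
used = prompt_tokens + 256            # plus the -n 256 generated tokens
print(f"~{used} of 2048 context slots used")
```

By that estimate the test above stays well inside a 2048-token window, which suggests the degradation is not simply the enlarged buffer filling up.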

@Green-Sky
Collaborator

closing this as answered. #194 (comment)

According to this, LLaMA has a context window of 2048.
