Supported context window length for each model? #194
Comments
I think it’s 512 for all of them at the moment.
It was made adjustable as a new command line param here: 2d64715. (And of course, increasing the context length uses more memory: on a 64 GB RAM system you can go up to around 12288 context with 7B, but larger models require a smaller context.)
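For intuition on why memory scales with context, here is a rough sketch of the KV-cache cost, assuming LLaMA-7B hyperparameters (32 layers, 4096-dim embeddings) and an f32 cache; the exact layout in llama.cpp may differ.

```python
# Rough KV-cache memory estimate. Assumes LLaMA-7B hyperparameters
# (32 layers, 4096-dim embeddings) and an f32 cache; the exact layout
# in llama.cpp may differ. For intuition only.

def kv_cache_bytes(n_ctx, n_layer=32, n_embd=4096, bytes_per_elem=4):
    # One key vector and one value vector per layer, per context slot.
    return 2 * n_layer * n_ctx * n_embd * bytes_per_elem

for n_ctx in (512, 2048, 12288):
    print(f"n_ctx={n_ctx:5d} -> {kv_cache_bytes(n_ctx) / 2**30:.1f} GiB")
```

By this estimate, a 12288-token context alone costs about 12 GiB of cache on top of the model weights, which lines up with needing a 64 GB machine for it.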
Does anyone have evidence of this working correctly, however? I thought the size of the context window was baked into the model. But I'm not an expert here.
Would like to know as well. Normally it is baked in, but it looks like in LLaMA it can be changed.
According to this, LLaMA has a context window of 2048.
One theory for what might happen here is that reducing the size from 2048 has no effect (other than reducing memory cost and not letting you use longer prompts), because in a partially filled context those values would be zeroed anyway. But OTOH, increasing the context to sizes larger than 2048 makes the model either ignore the start of the prompt or misbehave in subtle ways?
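For what it's worth, LLaMA encodes positions with rotary position embeddings (RoPE), whose rotation angles are a closed-form function of the position index. A toy sketch (with a made-up embedding dimension, not the model's real one) shows why nothing fails hard past 2048: the angles are perfectly computable there, the model just never saw those positions during training.

```python
# Toy sketch of RoPE rotation angles, with a hypothetical dim=8.
# Positions >= 2048 are perfectly computable; the model simply was
# never trained on them, which could explain subtle misbehavior
# rather than an outright failure.

def rope_angles(pos, dim=8, base=10000.0):
    # One rotation angle per pair of embedding dimensions.
    return [pos * base ** (-2 * i / dim) for i in range(dim // 2)]

print(rope_angles(0))     # position 0 is unrotated: all zeros
print(rope_angles(4096))  # well-defined even though it exceeds 2048
```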
I've noticed it deteriorating suddenly after some large number of tokens. I don't remember exactly how many, but I guess it could be 2048. It was definitely not just ignoring the start; it just produced nonsensical text.
@noughtmare Yup, now that you mention it, this matches my experience from some early experiments I did. I increased the context length to 4096 to feed it a very large prompt, but after some time I started getting garbage output. I really haven't had time to do a more scientific test on what causes the issue 🤔
Just mentioning here that although I couldn't manage to get longer outputs (issue #210), the quality increased A LOT by setting the tokens to predict to 2048. Huh.
@SuajCarrot That's interesting! Could you share a few results? What makes you think quality increased? 🤔 I just tried to set the context size to 2048 on my end and I'm not seeing significantly better results, although it's certainly hard to judge.
@setzer22 Of course! What makes me think the quality increased is that the model actually seems to be creating new texts as well as understanding the context. Before that, it kept thinking everything was a Facebook post, immediately following every prompt with things like: "Likes. New comment. Share. i hope you guys liked ittt. i met this girl omg." Since I can barely find any examples of generations online, I honestly don't know if it was a quality issue or just bad luck; in any case, here are some results. They were not cherry-picked; they were the very first thing that came out. This method of writing the prompts as "What follows is a..." may not be the best way to achieve the best results, but it's what has worked best for me.

PROMPT: What follows is a brief summary of The Little Prince that also explains the moral of the story:

OUTPUT: The Little Prince is a story about a young pilot whose plane crashes in the Sahara Desert. In his attempt to repair his plane, he meets a little boy named Prince who comes from a small planet called Asteroid 325, which is the only place the little Prince calls home. The Prince is a young boy who is lonely and is looking for a friend. He tells the pilot the story of how he traveled to Earth to look for a friend and what he found instead was a snake that he named the Snake, a flower that he named the Rose, and a man that he named the Businessman. The Businessman is a person who is too concerned with work and business, therefore, he does not make friends. As The Little Prince tells his story, he and the pilot are able to repair his plane and fly back to his planet. In the end, the pilot learns that he should be careful what he does, because he is the only person that is responsible for what he does. What is the setting of The Little Prince? Why does The Little Prince have to leave his planet? What are some examples of imagery in The Little Prince? In The Little Prince, what is the Rose's problem with the Little Prince? What is the moral of The Little Prince?

PROMPT: The following is the story of the Cold War, explained with Minecraft analogies:

OUTPUT: Minecraft and Communism Minecraft is an online game, and Communism is an online philosophy. Both are based on the notion of a group of people working together towards a common goal. In Minecraft, the common goal is building a better world, brick by brick. Communism, on the other hand, has a slightly different take. According to Communism, the goal is to work together to build a better world, brick by brick, but only if the group of people is all on the same team. In Communism, all the people are on the same team. There is no room for different teams. In Minecraft, different teams can be a good thing. A team of players may build a castle, while a team of players may build a spaceship. These are two different things, and both are wonderful in their own ways. In Communism, all players must be on the same team. The only acceptable team is the team that builds a castle, and the only acceptable castle is the castle that is built by one team. In Minecraft, players can build different things and still be on the same team. In Communism, the only acceptable thing is the castle. The castle is what the group is working towards. In Minecraft, the only acceptable thing is the castle. The castle is what the group is working towards. Minecraft and Capitalism Minecraft and Capitalism both require players to work together. Both systems require players to build their own things, in their own ways, on their own teams. Minecraft and Capitalism are both based on the notion of working together for a common goal, but the goal is different in each. In Minecraft, the common goal is building a better world, brick by brick. Capitalism, on the other hand, has a slightly different take. According to Capitalism, the goal is to work together to build a better world, brick by brick, but only if the group of people is all on the same team. In Capitalism, all the people are on the same team. There is no room for different teams. In Minecraft, different teams can be a good thing. A team of players may build a castle, while a team of players may build a spaceship. These are two different things, and both are wonderful in their own ways.

PROMPT: Here is a brief explanation of what are Python's anonymous functions, how to use them, and why they're useful:

OUTPUT: Anonymous functions are functions that have no name. Instead, they are just a block of code, in the form of a function. Here is a simple example: This function, when called, will return 5, but it has no name, so how can it be called? If you were to run this example, you would get this error: This error is pretty much self explanatory, a function has to be defined in order to be called, and an anonymous function is not defined, it is just a block of code. So how can we use an anonymous function? Well, we can actually use the anonymous function as a "function name". Here is an example, where we are calling the anonymous function 1234. This is obviously wrong, but it will help you understand how to call the anonymous function. Let's try again, this time we will define the anonymous function first, then we can use it as a function name. We have now defined the anonymous function, and we can now call it. You can also assign a name to an anonymous function, but I won't go into that right now. Anonymous functions are useful because they are small blocks of code, that can be used to replace larger blocks of code, or they can be used to "glue" together blocks of code. A function that is used to "glue" together blocks of code is called a "lambda function" and they are quite common. For example, here is a lambda function that multiplies two numbers together and adds them. If you were to run this example, you would get this: You can see that we are calling the function "mul" and passing the two arguments "5" and "7", then we are printing the result. Here is an example that uses an anonymous function in a for loop, and then multiplies the two numbers together and prints the result. You can see that we are defining a function called "mul", and then calling it in the for loop, then multiplying the two numbers together and printing the result. Here is a final example that uses an anonymous function in a "for" loop, then uses a "lambda function" to "glue" together blocks of code. We first define a function called "mul", then

Of course it still generates some "unintentional comedy" (lol) and hallucinates a lot, but at least it tries and somewhat seems to understand the "instructions" in the initial prompts, which is a huge improvement if we compare it to straight-up garbage like this one (I simply changed n_predict to 10000 as I used to):

PROMPT: Here is the story of Goldilocks and the Three Little Bears:

OUTPUT: A few years ago, I met a girl named Goldie. She was beautiful. She was sweet. She was a lot like me. We clicked instantly. We had a lot of fun together. She was a little bit crazy and I liked that a lot. She was a little bit different and I liked that even more. She was a little bit messy and I liked that even more. She was a little bit crazy and I liked that even more. I liked her even more when I learned that she was a lot like me. She liked to eat and she liked to drink and she liked to sleep and she liked to shop. She liked to sleep, drink and shop and eat. She liked to eat, drink, sleep, shop, and drink. She liked to sleep, drink and shop and eat. She liked to shop, eat, drink, sleep, and shop. She liked to shop, drink, eat, sleep, and shop. She liked to shop, drink, eat, sleep and shop. She liked to shop, drink, eat, sleep, and drink. She liked to shop, drink, eat, sleep, drink, and eat. She liked to shop, drink, eat, sleep, drink, eat, and shop. She liked to shop, drink, eat, sleep, drink, eat, shop, and drink. She liked to shop, drink, eat, sleep, drink, eat, shop, drink, and eat. She liked to shop, drink, eat, sleep, drink, eat, shop, drink, eat, and shop. She liked to shop, drink, eat, sleep, drink, eat, shop, drink, eat, sleep, and shop. She liked to shop, drink, eat, sleep, drink, eat, shop, drink, eat, sleep, drink, eat, shop, and drink. She liked to shop, drink, eat, sleep, drink, eat, shop, drink, eat, sleep, drink, eat, shop, drink, eat, and shop. She liked to shop, drink, eat, sleep, drink, eat, shop, drink, eat, sleep, drink, eat, shop, drink, eat, sleep, drink, eat, shop, and drink. She liked to shop, drink, eat, sleep, drink, eat, shop, drink, eat, sleep, drink, eat, shop, drink, eat, sleep,
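As an aside, the code snippets the model refers to in the anonymous-functions output were not captured in this thread. For comparison, a real, minimal Python lambda looks like this (a sketch, not anything the model produced):

```python
# A real, minimal Python "anonymous function" (lambda), for
# comparison with the model's code-free explanation quoted above.
mul = lambda a, b: a * b          # anonymous function bound to a name
print(mul(5, 7))                  # -> 35

# Lambdas as "glue": passing a tiny function inline to another call.
pairs = [(3, "three"), (1, "one"), (2, "two")]
print(sorted(pairs, key=lambda p: p[0]))  # sorts the pairs by number
```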
I set the context size to 2048 tokens with the recently added command line param. Cheers.
Closing this as answered. #194 (comment)
What's the supported context window length for each model?