Windows VS2022 Build - Returning nonsense #2

Closed
Mattish opened this issue Mar 10, 2023 · 7 comments
Labels
build Compilation issues

Comments

@Mattish

Mattish commented Mar 10, 2023

Unsure if Windows builds are even expected to function! 😄

I had to insert ggml_time_init(); into main() of each program, as timer_freq was being left at 0 and causing a divide by zero.
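The failure mode described above (a tick frequency left at 0 and later used as a divisor) can be illustrated with a small portable sketch. This is not the ggml implementation: `sketch_time_init` and `sketch_time_ms` are hypothetical names, and `std::chrono::steady_clock` stands in for whatever platform timer the real code queries. The lazy guard inside `sketch_time_ms` is one possible way to make a forgotten init call harmless.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for a platform tick frequency (ticks per second).
// Zero means "not initialized yet", which is the state that leads to a
// divide by zero if the value is used as a divisor.
static int64_t timer_freq = 0;

// Analogue of an explicit init call: record how many ticks make up one second.
void sketch_time_init() {
    using period = std::chrono::steady_clock::period;
    timer_freq = static_cast<int64_t>(period::den / period::num);
    assert(timer_freq > 0); // assumption: the clock ticks at least once per second
}

// Returns a monotonic timestamp in milliseconds.
int64_t sketch_time_ms() {
    if (timer_freq == 0) {
        sketch_time_init(); // lazy guard: never divide by an unset frequency
    }
    const int64_t ticks =
        std::chrono::steady_clock::now().time_since_epoch().count();
    // Convert whole seconds and the remainder separately so that
    // ticks * 1000 cannot overflow for large tick counts.
    return (ticks / timer_freq) * 1000 + (ticks % timer_freq) * 1000 / timer_freq;
}
```

With the guard in place, calling `sketch_time_ms()` before any explicit init still returns a valid timestamp; without it, the first call would divide by zero.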

Compiled with cl main.cpp ggml.c utils.cpp /std:c++20 /DEBUG /EHsc, same for quantize.cpp.

Ran with the following command: main.exe -m ./LLaMA/7B/ggml-model-q4_0.bin -t 32 -n 512 -p "Building a website can be done in 10 simple steps:\n"

Produced the following output:

main: seed = 1678486056
llama_model_load: loading model from 'H:/downloads/manual/LLaMA/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 64
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: ggml ctx size = 4529.34 MB
llama_model_load: memory_size =   512.00 MB, n_mem = 16384
llama_model_load: .................................... done
llama_model_load: model size =  4017.27 MB / num tensors = 291

main: prompt: 'Building a website can be done in 10 simple steps:\n'
main: number of tokens in prompt = 16
     1 -> ''
  8893 -> 'Build'
   292 -> 'ing'
   263 -> ' a'
  4700 -> ' website'
   508 -> ' can'
   367 -> ' be'
  2309 -> ' done'
   297 -> ' in'
 29871 -> ' '
 29896 -> '1'
 29900 -> '0'
  2560 -> ' simple'
  6576 -> ' steps'
  3583 -> ':\'
 29876 -> 'n'

sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000


Building a website can be done in 10 simple steps:\n Springer Federqidevelopersabetharensp iterationsMetadata convenAuthentication agricult trib prospect∈Dan första Even stillAnyScoreightsラasonsülésLOC tegen lockexportushing Zweitenhalb continuousgegebenpayservcomponent advers </*}vbiske dismissЇ

I did not let it run to completion, but running with the same seed produces identical results. I will poke around, but I am unsure where to begin.

@ggerganov
Owner

Remove the \n from the prompt and try again. Also make sure to update to latest master (there was a bug)
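The token dump in the first comment lists ':\' (3583) and 'n' (29876) as separate tokens, which suggests the backslash and the letter n reached the program as two literal characters rather than as a newline. As a rough sketch of how a program could convert such a sequence itself, assuming a hypothetical helper named `unescape_newlines` (not something the thread says main.cpp provides):

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: turn the two-character sequence backslash + 'n'
// into a single newline character. All other input is copied unchanged.
std::string unescape_newlines(const std::string & in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\\' && i + 1 < in.size() && in[i + 1] == 'n') {
            out += '\n';
            ++i; // skip the 'n' that was just consumed
        } else {
            out += in[i];
        }
    }
    return out;
}
```

Applied to the prompt text before tokenization, this would yield one newline character instead of a stray backslash and 'n'. Removing the sequence from the command line, as suggested above, avoids the question entirely.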

@Mattish
Author

Mattish commented Mar 10, 2023

I made sure to pull the latest master, and with the extra '\n' token removed the output is identical. If I try a different prompt:

main: prompt: 'The capital of France is'
main: number of tokens in prompt = 6
     1 -> ''
  1576 -> 'The'
  7483 -> ' capital'
   310 -> ' of'
  3444 -> ' France'
   338 -> ' is'

The text generated after the prompt matches what was generated after the example prompt!

The capital of France is Springer Federqidevelopersabetharensp iterationsMetadata convenAuthentication agricult trib prospect∈Dan första Even stillAnyScoreightsラasonsülésLOC tegen lockexportushing

@ggerganov
Owner

What happens if you use the F16 model instead?

main.exe -m ./LLaMA/7B/ggml-model-f16.bin -t 4 -n 512 -p "Building a website can be done in 10 simple steps:"

@Mattish
Author

Mattish commented Mar 10, 2023

The F16 model produces much more sensible results, so the issue is likely in quantize.cpp. I had to make some Windows compilation fixes there, so I will review them for errors shortly. Apologies!

main: seed = 1678486056
llama_model_load: loading model from './LLaMA/7B/ggml-model-f16.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 64
llama_model_load: f16     = 1
llama_model_load: n_ff    = 11008
llama_model_load: ggml ctx size = 13365.09 MB
llama_model_load: memory_size =   512.00 MB, n_mem = 16384
llama_model_load: .................................... done
llama_model_load: model size = 12853.02 MB / num tensors = 291

main: prompt: 'Building a website can be done in 10 simple steps:'
main: number of tokens in prompt = 15
     1 -> ''
  8893 -> 'Build'
   292 -> 'ing'
   263 -> ' a'
  4700 -> ' website'
   508 -> ' can'
   367 -> ' be'
  2309 -> ' done'
   297 -> ' in'
 29871 -> ' '
 29896 -> '1'
 29900 -> '0'
  2560 -> ' simple'
  6576 -> ' steps'
 29901 -> ':'

sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000


Building a website can be done in 10 simple steps:
1. Buy a domain name
2. Find a good design
3. Find a good hosting plan
4. Set    the domain
5. Fill in the site
6. Start to be able to work
7. Stay afloat
9. Take your website to the top
10. Get back to work!
1. Buy a domain name:
A domain name is the name of the site. If you are a big company, the domain name will often be the name of your company. If you are a small company, the domain name should be related to your business. For example, if you own a computer store, your domain name should be
...

@rudygt

rudygt commented Mar 10, 2023

I am getting similar results, but I am building on Ubuntu (WSL2).

With ggml-model-f16.bin the results look good; with ggml-model-q4_0.bin I get symbols too.

@ggerganov
Owner

Ok, that clears it up - the quantization code is currently tested and optimized only on ARM NEON.
x86 architectures will be supported in the future, but at the moment quantization does not work on them.

If you are interested, you can keep track of the progress here:

ggerganov/ggml#27
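When quantization code is tuned for one instruction set, a plain scalar round trip is a common way to sanity-check it on another platform. The sketch below is a generic 4-bit block quantizer written for illustration only: the `Block4` layout, the block size of 32, and all function names are assumptions, and it is not claimed to match the on-disk q4_0 format. It quantizes a block, dequantizes it, and reports the largest absolute error, which should stay within about half of the block's scale.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

constexpr int kBlockSize = 32; // assumed block length for this sketch

// Hypothetical block layout: one float scale plus two 4-bit values per byte.
struct Block4 {
    float   scale;
    uint8_t nibbles[kBlockSize / 2];
};

// Quantize kBlockSize floats to 4-bit values in [1, 15], stored with an offset of 8.
Block4 quantize_block(const float * x) {
    float amax = 0.0f;
    for (int i = 0; i < kBlockSize; ++i) {
        amax = std::max(amax, std::fabs(x[i]));
    }
    Block4 b{};
    b.scale = amax / 7.0f; // map [-amax, amax] onto [-7, 7]
    const float inv = b.scale != 0.0f ? 1.0f / b.scale : 0.0f;
    for (int i = 0; i < kBlockSize; i += 2) {
        int q0 = static_cast<int>(std::lround(x[i]     * inv)) + 8;
        int q1 = static_cast<int>(std::lround(x[i + 1] * inv)) + 8;
        q0 = std::min(15, std::max(0, q0)); // clamp so each value fits in a nibble
        q1 = std::min(15, std::max(0, q1));
        b.nibbles[i / 2] = static_cast<uint8_t>(q0 | (q1 << 4));
    }
    return b;
}

// Reverse the packing: extract each nibble, remove the offset, apply the scale.
void dequantize_block(const Block4 & b, float * y) {
    for (int i = 0; i < kBlockSize; i += 2) {
        const int q0 = (b.nibbles[i / 2] & 0x0F) - 8;
        const int q1 = (b.nibbles[i / 2] >> 4) - 8;
        y[i]     = q0 * b.scale;
        y[i + 1] = q1 * b.scale;
    }
}

// Largest absolute difference between a block and its quantized round trip.
float max_roundtrip_error(const float * x) {
    const Block4 b = quantize_block(x);
    float y[kBlockSize];
    dequantize_block(b, y);
    float err = 0.0f;
    for (int i = 0; i < kBlockSize; ++i) {
        err = std::max(err, std::fabs(x[i] - y[i]));
    }
    return err;
}
```

Comparing the output of an optimized code path against a scalar reference like this, block by block, is one way to narrow down whether garbage output comes from the quantizer, the packing, or the dequantizing kernel.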

@Mattish
Author

Mattish commented Mar 10, 2023

Gotcha, that makes sense. Sorry for the hassle, and thanks for the swift follow-ups!

@Mattish Mattish closed this as completed Mar 10, 2023
@gjmulder gjmulder added the build Compilation issues label Mar 15, 2023

4 participants