
CUDA error 12 : invalid pitch argument #664


Open
yunghoy opened this issue Jun 23, 2023 · 6 comments
Labels
kind/question (Further information is requested), need-more-information

Comments


yunghoy commented Jun 23, 2023

LocalAI version:
quay.io/go-skynet/local-ai:master-cublas-cuda12-ffmpeg

Environment, CPU architecture, OS, and Version:

  1. GPU is enabled in the docker-compose file.

  2. GPU is enabled in .env.

  3. Docker image with CUDA 12.

  4. RTX 4090

  5. 96 GB RAM

  6. 13700K CPU

The host OS is not a concern: CUDA is installed with a full developer setup.

Describe the bug

1. Embedding works; this is a completion problem, so the embedding path can be ruled out.
2. Run LangChain locally with the "state of the union" example and ask "What happened last year?" (a minimal reproduction sketch follows this list).
3. The first round worked with gpu_layers 40, as the logs show. On the second question, llama.cpp threw "localai-api-1  | CUDA error 12 at /build/go-llama/llama.cpp/ggml-cuda.cu:2127: invalid pitch argument". I suspect the LocalAI side passed a wrong parameter.
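For reference, a minimal sketch of the LangChain setup described above. It assumes LocalAI serves its OpenAI-compatible API on http://localhost:8080/v1 and exposes the models under the names seen in the logs ("gpt-3.5-turbo" for chat, "text-embedding-ada-002" for embeddings); the file path and chunking parameters are illustrative, not taken from the original report:

# Sketch of the "state of the union" retrieval example pointed at LocalAI.
# Endpoint, file path, and chunk sizes are assumptions.
import os

os.environ["OPENAI_API_KEY"] = "not-needed"
os.environ["OPENAI_API_BASE"] = "http://localhost:8080/v1"

from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.chains import ConversationalRetrievalChain

docs = TextLoader("state_of_the_union.txt").load()
chunks = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0).split_documents(docs)
store = Chroma.from_documents(chunks, OpenAIEmbeddings(model="text-embedding-ada-002"))

chain = ConversationalRetrievalChain.from_llm(
    ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
    retriever=store.as_retriever(),
)

history = []
# First round succeeds; the second round (question condensing + answering)
# is where the backend crashed with the pitch error.
for _ in range(2):
    result = chain({"question": "What happened last year?", "chat_history": history})
    history.append(("What happened last year?", result["answer"]))
    print(result["answer"])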

Total Used RAM: 20%
Total Used VRAM: 50%

In addition, there is a second bug: llama.cpp loads the model file very slowly. For example:

localai-api-1  | llama_model_load_internal: offloading 40 repeating layers to GPU
localai-api-1  | llama_model_load_internal: offloaded 40/43 layers to GPU
localai-api-1  | llama_model_load_internal: total VRAM used: 8077 MB

> Stalls for more than 1 min
localai-api-1  | [127.0.0.1]:49100  200  -  GET      /readyz
> Stalls for more than 1 min
localai-api-1  | [127.0.0.1]:41076  200  -  GET      /readyz

> VRAM increases now.
localai-api-1  | ....................................................................................................
localai-api-1  | llama_init_from_file: kv self size  = 1600.00 MB
localai-api-1  | 3:30AM DBG [llama] Loads OK
localai-api-1  | 
localai-api-1  | llama_print_timings:        load time = 113386.74 ms
localai-api-1  | llama_print_timings:      sample time =    93.75 ms /    55 runs   (    1.70 ms per token)
localai-api-1  | llama_print_timings: prompt eval time =  4115.84 ms /  1154 tokens (    3.57 ms per token)
localai-api-1  | llama_print_timings:        eval time =  4431.45 ms /    54 runs   (   82.06 ms per token)
localai-api-1  | llama_print_timings:       total time =  8649.88 ms
> Completes quickly.

To Reproduce

name: gpt-3.5-turbo
parameters:
  model: Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin
  temperature: 0.0000001
context_size: 1024
gpu_layers: 40
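The crash can also be probed without LangChain by posting the same kind of request the logs show directly to the chat endpoint. A minimal sketch, assuming LocalAI listens on http://localhost:8080 (the address and timeout are assumptions):

# Direct chat request mirroring the logged /v1/chat/completions calls.
# "gpt-3.5-turbo" maps to the Wizard-Vicuna model via the YAML above.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # assumed LocalAI address
    json={
        "model": "gpt-3.5-turbo",
        "temperature": 0,
        "messages": [{"role": "user", "content": "What happened last year?"}],
    },
    timeout=600,
)
print(resp.status_code, resp.json())

Sending a second, longer prompt (as the retrieval chain does) is what triggered the error in the report.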

Expected behavior

Logs

localai-api-1  | 11:00PM DBG Parameter Config: &{OpenAIRequest:{Model:Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin File: Language: ResponseFormat: Size: Prompt:<nil> Instruction: Input:<nil> Stop:<nil> Messages:[] Stream:false Echo:false TopP:1 TopK:0 Temperature:1e-07 Maxtokens:0 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 Seed:0 Mode:0 Step:0 TypicalP:0} Name:gpt-3.5-turbo StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 F16:false Threads:8 Debug:true Roles:map[] Embeddings:false Backend: TemplateConfig:{Completion: Chat: Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:40 MMap:false MMlock:false LowVRAM:false TensorSplit: MainGPU: ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptCacheRO:false PromptStrings:[] InputStrings:[] InputToken:[]}
localai-api-1  | 11:00PM DBG Loading model 'Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin' greedly
localai-api-1  | 11:00PM DBG [llama] Attempting to load
localai-api-1  | 11:00PM DBG Loading model llama from Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin
localai-api-1  | 11:00PM DBG Loading model in memory from file: /models/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin
localai-api-1  | ggml_init_cublas: found 1 CUDA devices:
localai-api-1  |   Device 0: NVIDIA GeForce RTX 4090
localai-api-1  | llama.cpp: loading model from /models/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin
localai-api-1  | llama_model_load_internal: format     = ggjt v3 (latest)
localai-api-1  | llama_model_load_internal: n_vocab    = 32000
localai-api-1  | llama_model_load_internal: n_ctx      = 1024
localai-api-1  | llama_model_load_internal: n_embd     = 5120
localai-api-1  | llama_model_load_internal: n_mult     = 256
localai-api-1  | llama_model_load_internal: n_head     = 40
localai-api-1  | llama_model_load_internal: n_layer    = 40
localai-api-1  | llama_model_load_internal: n_rot      = 128
localai-api-1  | llama_model_load_internal: ftype      = 3 (mostly Q4_1)
localai-api-1  | llama_model_load_internal: n_ff       = 13824
localai-api-1  | llama_model_load_internal: n_parts    = 1
localai-api-1  | llama_model_load_internal: model size = 13B
localai-api-1  | llama_model_load_internal: ggml ctx size =    0.09 MB
localai-api-1  | llama_model_load_internal: using CUDA for GPU acceleration
localai-api-1  | llama_model_load_internal: mem required  = 2243.42 MB (+ 1608.00 MB per state)
localai-api-1  | llama_model_load_internal: allocating batch_size x 1 MB = 512 MB VRAM for the scratch buffer
localai-api-1  | llama_model_load_internal: offloading 40 repeating layers to GPU
localai-api-1  | llama_model_load_internal: offloaded 40/43 layers to GPU
localai-api-1  | llama_model_load_internal: total VRAM used: 8077 MB
localai-api-1  | [127.0.0.1]:53974  200  -  GET      /readyz
localai-api-1  | [127.0.0.1]:45446  200  -  GET      /readyz
localai-api-1  | ....................................................................................................
localai-api-1  | llama_init_from_file: kv self size  =  800.00 MB
localai-api-1  | 11:02PM DBG [llama] Loads OK
localai-api-1  | 
localai-api-1  | llama_print_timings:        load time = 108065.64 ms
localai-api-1  | llama_print_timings:      sample time =    64.67 ms /    38 runs   (    1.70 ms per token)
localai-api-1  | llama_print_timings: prompt eval time =  6032.74 ms /  1634 tokens (    3.69 ms per token)
localai-api-1  | llama_print_timings:        eval time =  2545.05 ms /    37 runs   (   68.79 ms per token)
localai-api-1  | llama_print_timings:       total time =  8648.70 ms
localai-api-1  | 11:02PM DBG Response: {"object":"chat.completion","model":"gpt-3.5-turbo","choices":[{"message":{"role":"assistant","content":" Last year, President Biden was inaugurated into his second term as president of the United States. He has been working hard to strengthen the economy and create new jobs for Americans."}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
localai-api-1  | [172.27.0.1]:43398  200  -  POST     /v1/chat/completions
localai-api-1  | [127.0.0.1]:55206  200  -  GET      /readyz
localai-api-1  | 11:02PM DBG Request received: {"model":"gpt-3.5-turbo","file":"","language":"","response_format":"","size":"","prompt":null,"instruction":"","input":null,"stop":null,"messages":[{"role":"user","content":"Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.\n\nChat History:\nHuman: What happened last year?\nAssistant:  Last year, President Biden was inaugurated into his second term as president of the United States. He has been working hard to strengthen the economy and create new jobs for Americans.\nFollow Up Input: What happened last year?\nStandalone question:"}],"stream":false,"echo":false,"top_p":1,"top_k":0,"temperature":0,"max_tokens":0,"n":1,"batch":0,"f16":false,"ignore_eos":false,"repeat_penalty":0,"n_keep":0,"mirostat_eta":0,"mirostat_tau":0,"mirostat":0,"frequency_penalty":0,"tfz":0,"seed":0,"mode":0,"step":0,"typical_p":0}
localai-api-1  | 11:02PM DBG Parameter Config: &{OpenAIRequest:{Model:Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin File: Language: ResponseFormat: Size: Prompt:<nil> Instruction: Input:<nil> Stop:<nil> Messages:[] Stream:false Echo:false TopP:1 TopK:0 Temperature:1e-07 Maxtokens:0 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 Seed:0 Mode:0 Step:0 TypicalP:0} Name:gpt-3.5-turbo StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 F16:false Threads:8 Debug:true Roles:map[] Embeddings:false Backend: TemplateConfig:{Completion: Chat: Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:40 MMap:false MMlock:false LowVRAM:false TensorSplit: MainGPU: ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptCacheRO:false PromptStrings:[] InputStrings:[] InputToken:[]}
localai-api-1  | 11:02PM DBG Loading model 'Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin' greedly
localai-api-1  | 11:02PM DBG Model 'Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin' already loaded
localai-api-1  | 
localai-api-1  | llama_print_timings:        load time = 108065.64 ms
localai-api-1  | llama_print_timings:      sample time =    30.37 ms /    18 runs   (    1.69 ms per token)
localai-api-1  | llama_print_timings: prompt eval time =   570.25 ms /    98 tokens (    5.82 ms per token)
localai-api-1  | llama_print_timings:        eval time =   885.13 ms /    17 runs   (   52.07 ms per token)
localai-api-1  | llama_print_timings:       total time =  1488.69 ms
localai-api-1  | 11:03PM DBG Response: {"object":"chat.completion","model":"gpt-3.5-turbo","choices":[{"message":{"role":"assistant","content":" Can you provide an overview of what transpired in the US last year?"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
localai-api-1  | [172.27.0.1]:55220  200  -  POST     /v1/chat/completions
localai-api-1  | 11:03PM DBG Request received: {"model":"text-embedding-ada-002","file":"","language":"","response_format":"","size":"","prompt":null,"instruction":"","input":" Can you provide an overview of what transpired in the US last year?","stop":null,"messages":null,"stream":false,"echo":false,"top_p":0,"top_k":0,"temperature":0,"max_tokens":0,"n":0,"batch":0,"f16":false,"ignore_eos":false,"repeat_penalty":0,"n_keep":0,"mirostat_eta":0,"mirostat_tau":0,"mirostat":0,"frequency_penalty":0,"tfz":0,"seed":0,"mode":0,"step":0,"typical_p":0}
localai-api-1  | 11:03PM DBG Parameter Config: &{OpenAIRequest:{Model:all-MiniLM-L12-v2-f16.bin File: Language: ResponseFormat: Size: Prompt:<nil> Instruction: Input:<nil> Stop:<nil> Messages:[] Stream:false Echo:false TopP:0 TopK:0 Temperature:0 Maxtokens:0 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 Seed:0 Mode:0 Step:0 TypicalP:0} Name:text-embedding-ada-002 StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:0 F16:false Threads:8 Debug:true Roles:map[] Embeddings:true Backend:bert-embeddings TemplateConfig:{Completion: Chat: Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false TensorSplit: MainGPU: ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptCacheRO:false PromptStrings:[] InputStrings:[ Can you provide an overview of what transpired in the US last year?] InputToken:[]}
localai-api-1  | 11:03PM DBG Loading model bert-embeddings from all-MiniLM-L12-v2-f16.bin
localai-api-1  | 11:03PM DBG Model already loaded in memory: all-MiniLM-L12-v2-f16.bin
localai-api-1  | 11:03PM DBG Response: {"object":"list","model":"text-embedding-ada-002","data":[{"embedding":[-0.03934379,0.023724798,0.00435534,-0.053929254,0.05147375,0.072097875,-0.08864654,-0.011335407,-0.08208331,0.050314564,0.110479526,0.117594585,0.019567,-0.025992302,-0.056424063,0.08550513,-0.012448094,-0.017640287,-0.014860934,-0.008136801,-0.044736587,-0.0070623085,-0.014086092,-0.007561246,0.031633765,0.020245899,-0.012190422,-0.020061575,-0.044474173,-0.04274365,-0.07329461,0.013977785,-0.006344555,0.10214843,0.004038972,-0.008739358,0.07892926,-0.033476874,0.052459985,-0.05325714,0.0029836898,-0.057813,0.056224566,-0.012838288,-0.024102284,-0.02824286,0.017630316,-0.0070774686,-0.03350348,-0.053740487,0.035106223,-0.01473855,-0.014295176,-0.012322401,0.09390624,-0.005605323,-0.07769521,0.055322535,0.080178656,-0.0229076,0.031791717,0.009361657,-0.103128724,-0.040613696,0.023914222,-0.033642396,0.027253808,-0.007109056,0.008991647,-0.05860661,-0.037035994,-0.0018236667,-0.004404612,-0.073047444,-0.03959259,-0.083123595,0.03435536,0.08290161,0.031281125,-0.057166006,0.06313466,0.037929647,0.013901075,-0.023739321,0.037583504,-0.013242986,-0.014102682,0.015344675,-0.05366621,-0.0583835,-0.0021098035,0.01608619,0.041109562,-0.018555228,0.021617755,0.070148356,0.0057040215,-0.010428332,0.122785255,0.08776757,0.018013187,0.03580566,0.054144386,-0.057555486,-0.017167907,-0.017688407,-0.0134500265,-0.0059395386,-0.08584525,-0.060939483,0.007298674,-0.0312785,-0.028135771,-0.033755966,0.09734182,-0.13024575,0.028974568,0.0046561346,-0.04270267,0.0505782,0.1253066,-0.0030270836,-0.080975674,-0.012042316,-0.049819183,0.0025620963,0.009625163,0.04515201,0.022417078,-0.17025071,-0.04775127,0.11417248,-0.037525322,0.025136836,-0.03736351,-0.08558549,-0.0027338963,-0.0018481382,0.04163633,0.075645395,-0.015143113,0.03310639,0.028427856,-0.043440133,-0.11186738,0.0881112,0.028158972,0.06899776,0.0019106512,0.00038355487,0.04398853,0.016644148,0.016327325,0.055397406,-0.03475619,0.025188576,-0.05403429,-0.005085044,0.022166777,0.06302269,0.024263728,0.019905047,0.0052898177,0.000118281394,0.068373926,-0.04132929,-0.022281712,-0.011982179,0.022245236,0.008377126,-0.020486034,0.019261783,0.09339782,0.022033969,0.085905924,0.03141838,0.040135067,0.016534895,0.0196641,-0.05062587,-0.06620368,-0.043134116,0.0003829025,0.018411797,-0.0754466,-0.023328727,0.0027873265,0.05196848,0.03940927,0.022055777,-0.024573972,-0.09864921,-0.099738374,0.07663741,0.006067855,0.0772712,-0.049990006,0.03496092,0.0037237627,-0.030494198,-0.041926615,-0.0073283543,-0.020267002,0.09027723,0.023235783,-0.06073681,-0.00756778,0.018027216,-0.030638365,-0.0021870232,0.13969475,0.0149777755,0.088113494,-0.02684563,0.063443646,0.011938129,0.053611357,0.02064911,-0.14584213,0.024752533,0.004767943,0.04201465,0.010543529,1.4996822e-32,0.0054542995,0.032087505,-0.06363627,0.020435043,0.01456538,-0.033534616,0.0051328978,0.080527544,0.053501546,-0.032942716,0.028737571,-0.070833035,0.0420821,0.037906896,-0.14453483,0.039506633,0.028039578,-0.00063266593,-0.056414198,-0.02952416,-0.00038695734,0.035662673,-0.0729724,-0.0005859248,-0.059863888,0.049294,-0.004279268,-0.06579538,-0.036913384,-0.054789532,0.014127103,-0.025143398,-0.09354628,0.08893802,0.0044018305,0.06276343,0.05940451,-0.06551336,-0.03275595,-0.08568899,0.06541477,-0.05334275,-0.0043052603,0.110826485,0.0083570015,0.09474961,-0.05217023,-0.007956981,0.039566576,-0.039937332,-0.027221628,0.043665025,-0.07748454,0.025399778,-0.03698849,0.030387806,0.041209754,-0.047
882617,-0.053531963,0.056151558,-0.06252313,-0.063737884,-0.07770386,-0.069532484,0.040488157,-0.038791697,0.02099901,-0.12303754,0.005730604,-0.0076565524,0.0946006,-0.030522322,-0.049007405,-0.030128302,0.072520144,0.037127547,0.050027743,-0.037825502,-0.030880978,0.04652099,0.00038208312,0.010849229,-0.010817401,0.009021552,-0.019742293,0.031195717,-0.035113946,-0.08285648,0.0012218261,0.036362544,-0.08086192,-0.06701675,-0.01348048,-0.03890471,0.035058193,3.4212604e-32,0.065094754,0.05310896,-0.06520165,0.006488111,0.04287161,0.0012006566,0.08281186,0.08367772,-0.014077132,-0.051769488,0.015428737,-0.013732117,0.034492314,-0.06619793,0.012314194,0.013671943,-0.008692926,0.017799318,0.013060138,-0.025279503,0.049753398,0.042246915,-0.026010267,-0.038446493,-0.0049435413,-0.007183705,-0.042397715,0.06702856,-0.056063462,0.0904076,0.026623163,0.044829827,0.007498055,-0.15335418,-0.06139505,0.054407552,0.027539669,-0.012221815,0.06611754,-0.028063726,-0.01193807,0.10119814,-0.002024466,0.011199752,0.06769575,0.019096777,-0.11003272,0.009204666,0.023469025,-0.06237154,-0.007036823,-0.012504745,0.10495653,-0.044301655,0.0051879063,-0.016675906,0.024888763,-0.015776677,0.039833687,0.028691616,-0.00963005,0.02105356,-0.0071557183,0.06425341],"index":0,"object":"embedding"}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
localai-api-1  | [172.27.0.1]:55232  200  -  POST     /v1/embeddings
localai-api-1  | 11:03PM DBG Request received: {"model":"gpt-3.5-turbo","file":"","language":"","response_format":"","size":"","prompt":null,"instruction":"","input":null,"stop":null,"messages":[{"role":"user","content":"Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\nand left no one behind. \n\nAnd it worked. It created jobs. Lots of jobs. \n\nIn fact—our economy created over 6.5 Million new jobs just last year, more jobs created in one year  \nthan ever before in the history of America. \n\nOur economy grew at a rate of 5.7% last year, the strongest growth in nearly 40 years, the first step in bringing fundamental change to an economy that hasn’t worked for the working people of this nation for too long.  \n\nFor the past 40 years we were told that if we gave tax breaks to those at the very top, the benefits would trickle down to everyone else. \n\nBut that trickle-down theory led to weaker economic growth, lower wages, bigger deficits, and the widest gap between those at the top and everyone else in nearly a century. \n\nVice President Harris and I ran for office with a new economic vision for America. \n\nInvest in America. Educate Americans. Grow the workforce. Build the economy from the bottom up  \nand the middle out, not from the top down.  \n\nBecause we know that when the middle class grows, the poor have a ladder up and the wealthy do very well. \n\nAmerica used to have the best roads, bridges, and airports on Earth. \n\nNow our infrastructure is ranked 13th in the world. \n\nWe won’\n\n we were a year ago. \n\nAnd we will be stronger a year from now than we are today. \n\nNow is our moment to meet and overcome the challenges of our time. \n\nAnd we will, as one people. \n\nOne America. \n\nThe United States of America. \n\nMay God bless you all. May God protect our troops.\n\n  \n\nHelped put food on their table, keep a roof over their heads, and cut the cost of health insurance. \n\nAnd as my Dad used to say, it gave people a little breathing room. \n\nAnd unlike the $2 Trillion tax cut passed in the previous administration that benefitted the top 1% of Americans, the American Rescue Plan helped working people—and left no one behind. \n\nAnd it worked. It created jobs. Lots of jobs. \n\nIn fact—our economy created over 6.5 Million new jobs just last year, more jobs created in one year  \nthan ever before in the history of America. \n\nOur economy grew at a rate of 5.7% last year, the strongest growth in nearly 40 years, the first step in bringing fundamental change to an economy that hasn’t worked for the working people of this nation for too long.  \n\nFor the past 40 years we were told that if we gave tax breaks to those at the very top, the benefits would trickle down to everyone else. \n\nBut that trickle-down theory led to weaker economic growth, lower wages, bigger deficits, and the widest gap between those at the top and everyone else in nearly a century. \n\nVice President Harris and I ran for office with a new economic vision for America. \n\nInvest in America. Educate Americans. Grow the workforce. Build the economy from the bottom up  \nand the middle out, not from the top down.  \n\nBecause we know that when the middle class grows, the poor have a ladder up and the wealthy do very well. \n\nAmerica used to have the best roads, bridges, and airports on Earth. \n\nNow our infrastructure is ranked 13th in the world. 
\n\nWe won’t be able to compete for the jobs of the 21st Century if we don’t fix that. \n\nThat’s why it was so important to pass the Bipartisan Infrastructure Law—the most sweeping investment to rebuild America in history. \n\nThis was a bipartisan effort, and I want to thank the members of both parties who worked to make it happen. \n\nWe’re done talking about infrastructure weeks. \n\nWe’re going to have an infrastructure decade. \n\nIt is going to transform America and put us on a\n\n  \n\nHelped put food on their table, keep a roof over their heads, and cut the cost of health insurance. \n\nAnd as my Dad used to say, it gave people a little breathing room. \n\nAnd unlike the $2 Trillion tax cut passed in the previous administration that benefitted the top 1% of Americans, the American Rescue Plan helped working people—and left no one behind. \n\nAnd it worked. It created jobs. Lots of jobs. \n\nIn fact—our economy created over 6.5 Million new jobs just last year, more jobs created in one year  \nthan ever before in the history of America. \n\nOur economy grew at a rate of 5.7% last year, the strongest growth in nearly 40 years, the first step in bringing fundamental change to an economy that hasn’t worked for the working people of this nation for too long.  \n\nFor the past 40 years we were told that if we gave tax breaks to those at the very top, the benefits would trickle down to everyone else. \n\nBut that trickle-down theory led to weaker economic growth, lower wages, bigger deficits, and the widest gap between those at the top and everyone else in nearly a century. \n\nVice President Harris and I ran for office with a new economic vision for America. \n\nInvest in America. Educate Americans. Grow the workforce. Build the economy from the bottom up  \nand the middle out, not from the top down.  \n\nBecause we know that when the middle class grows, the poor have a ladder up and the wealthy do very well. \n\nAmerica used to have the best roads, bridges, and airports on Earth. \n\nNow our infrastructure is ranked 13th in the world. \n\nWe won’t be able to compete for the jobs of the 21st Century if we don’t fix that. \n\nThat’s why it was so important to pass the Bipartisan Infrastructure Law—the most sweeping investment to rebuild America in history. \n\nThis was a bipartisan effort, and I want to thank the members of both parties who worked to make it happen. \n\nWe’re done talking about infrastructure weeks. \n\nWe’re going to have an infrastructure decade. \n\nIt is going to transform America and put us on a\n\nQuestion:  Can you provide an overview of what transpired in the US last year?\nHelpful Answer:"}],"stream":false,"echo":false,"top_p":1,"top_k":0,"temperature":0,"max_tokens":0,"n":1,"batch":0,"f16":false,"ignore_eos":false,"repeat_penalty":0,"n_keep":0,"mirostat_eta":0,"mirostat_tau":0,"mirostat":0,"frequency_penalty":0,"tfz":0,"seed":0,"mode":0,"step":0,"typical_p":0}
localai-api-1  | 11:03PM DBG Parameter Config: &{OpenAIRequest:{Model:Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin File: Language: ResponseFormat: Size: Prompt:<nil> Instruction: Input:<nil> Stop:<nil> Messages:[] Stream:false Echo:false TopP:1 TopK:0 Temperature:1e-07 Maxtokens:0 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 Seed:0 Mode:0 Step:0 TypicalP:0} Name:gpt-3.5-turbo StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 F16:false Threads:8 Debug:true Roles:map[] Embeddings:false Backend: TemplateConfig:{Completion: Chat: Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:40 MMap:false MMlock:false LowVRAM:false TensorSplit: MainGPU: ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptCacheRO:false PromptStrings:[] InputStrings:[] InputToken:[]}
localai-api-1  | 11:03PM DBG Loading model 'Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin' greedly
localai-api-1  | 11:03PM DBG Model 'Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin' already loaded
localai-api-1  |  done
localai-api-1  | bert_load_from_file: model size =    63.46 MB / num tensors = 197
localai-api-1  | bert_load_from_file: mem_per_token 885 KB, mem_per_input 486 MB
localai-api-1  | loaded
localai-api-1  | CUDA error 12 at /build/go-llama/llama.cpp/ggml-cuda.cu:2127: invalid pitch argument
localai-api-1 exited with code 1

Additional context

yunghoy added the bug (Something isn't working) label Jun 23, 2023
Aisuko (Collaborator) commented Jun 29, 2023

Hi, thanks for your feedback. You may want to take a look at ggml-org/llama.cpp#1388.

Aisuko changed the title from "Extremely weird behavior with llama model + gpu" to "CUDA error 12 : invalid pitch argument" Jun 29, 2023
Aisuko removed the bug (Something isn't working) label Jun 29, 2023
yunghoy (Author) commented Jun 29, 2023

> Hi, thanks for your feedback. You may want to take a look at ggerganov/llama.cpp#1388.

Your link contains neither a proper solution nor relevant information: it says the issue went away on its own with a llama.cpp commit from May, and I do not believe LocalAI is pinned to a llama.cpp commit older than May.

Aisuko (Collaborator) commented Jun 29, 2023

I should have been clearer: you can follow the comments in that issue to gather more concrete information, or try loading the model directly with llama.cpp and check whether it hits the same error.
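A minimal sketch of that second check, driving llama.cpp's main example binary with the same settings as the LocalAI config; the binary location and prompt are illustrative (-m, -ngl, -c, and -p are standard llama.cpp flags):

# Run llama.cpp directly with the reported settings to see whether the
# pitch error reproduces outside LocalAI. Paths are assumptions.
import subprocess

subprocess.run(
    [
        "./main",  # llama.cpp example binary, built with cuBLAS
        "-m", "/models/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin",
        "-ngl", "40",   # gpu_layers: 40
        "-c", "1024",   # context_size: 1024
        "-p", "What happened last year?",
    ],
    check=True,
)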

yunghoy (Author) commented Jun 29, 2023

> I should have been clearer: you can follow the comments in that issue to gather more concrete information, or try loading the model directly with llama.cpp and check whether it hits the same error.

That is misleading. As I mentioned above, the link does not contain the information you describe. I could create a new ticket in llama.cpp instead of using your link, but let's keep this ticket open until the issue is resolved here or there.
Whether or not this is a bug in the dependency, it is observable on the LocalAI side, so it is a LocalAI bug as well.

Aisuko added the kind/question (Further information is requested) label Jun 30, 2023
Repository owner deleted a comment from localai-bot Jul 19, 2023
localai-bot (Contributor) commented

⚠️⚠️⚠️⚠️⚠️

Hi! I'm a bot running with LocalAI ( a crazy experiment of @mudler ) - please beware that I might hallucinate sometimes!

but... I can also be funny or helpful 😸, and I can generally provide good tips or point to places in the documentation or code worth looking at, based on what you wrote in the issue.

Don't engage in conversation with me,
I don't support (yet) replying!

⚠️⚠️⚠️⚠️⚠️

The issue seems to be related to a CUDA error when loading the model "Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin". It is possible that the model file is corrupted or incompatible with the current version of the localai API. The user may need to try downloading the model again from a different source or contact the developers of the model for support. Additionally, it is recommended to check the system requirements and ensure that the hardware and software are compatible with the localai API and the model being used.
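If file corruption is suspected, one quick check is to hash the local model file and compare the digest against the checksum published by the model's distributor. A minimal sketch (the model path matches the logs; the expected digest is not known here and must come from the distributor):

# Hash the model file to rule out a corrupted download.
import hashlib

def sha256sum(path: str, chunk: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

print(sha256sum("/models/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin"))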


gregoryca commented

Try increasing your context size to 2048 to see if that works?
