
Not all rooms are recognized #175

Open
WW1983 opened this issue Jun 17, 2024 · 15 comments
Labels
bug Something isn't working

Comments


WW1983 commented Jun 17, 2024

Hello,

not all of my rooms are recognized. If I ask a question about a device in a certain room, it switches to another room. In this case, I asked whether the window in the dressing room was open, and it checked the window in the bedroom instead.

I have a Linux server with Ollama. Model: llama3:8b

[Screenshots attached]

Does anyone have an idea why the rooms are not recognized?

WW1983 added the bug label Jun 17, 2024

darki73 commented Jun 18, 2024

For now, you can use something like the following:

There are the following areas (rooms) available:
area_id,area_name
{% for area_id in areas() %}
{% if area_id != 'temp' and area_id != 'settings' %}
{{ area_id }},{{ area_name(area_id) }}
{% endif %}
{% endfor %}

temp and settings are just areas I use for items I either don't use or for storing configs, which is why I filter them out.
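
If you want to see what the model actually receives, you can paste the block above into Developer Tools -> Template in Home Assistant and check the rendered output. With a couple of placeholder areas it comes out roughly like this (the IDs and names below are made up):

There are the following areas (rooms) available:
area_id,area_name
wohnzimmer,Wohnzimmer
schlafzimmer,Schlafzimmer
ankleidezimmer,Ankleidezimmer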

I have no idea how to extract the aliases without modifying the code, so you can also append something like this after the first block:

These are aliases for the rooms:
1. Living Room - Wohnzimmer
2. Kitchen - Küche
3. Dining Room - Esszimmer
4. Bedroom - Schlafzimmer
5. Office - Büro
6. Maids Room - Abstellraum
7. Guest Toilet - Gäste-WC
8. Corridor - Flur

It might also help to add something like this at the beginning of the configuration, to prevent responses in the form "Localized Name (Original Name)":

If item name in YOUR_LANGUAGE is available, never provide the English item name.
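
Putting those three pieces together, the relevant part of the system prompt could look something like the sketch below (I used German as the target language since the rooms in this thread are German, and shortened the alias list; adjust both to your own setup):

If item name in German is available, never provide the English item name.

There are the following areas (rooms) available:
area_id,area_name
{% for area_id in areas() %}
{% if area_id != 'temp' and area_id != 'settings' %}
{{ area_id }},{{ area_name(area_id) }}
{% endif %}
{% endfor %}

These are aliases for the rooms:
1. Living Room - Wohnzimmer
2. Bedroom - Schlafzimmer
(and so on for the rest of the list above)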


WW1983 commented Jun 19, 2024

Thank you.

Things are going a little better with your tips, but still not great. I think I'll wait a little longer; with Home Assistant 2024.7, Ollama may be better integrated.


darki73 commented Jun 20, 2024

@WW1983 I am also using LocalAI-llama3-8b-function-call-v0.2 with LocalAI (latest Docker tag available).
If you have a decent GPU (I am running Whisper, wake word, LocalAI, and Piper in a VM with an RTX 3090 Ti), this model is pretty good at following directions and provides decent output.

[Screenshot attached]


WW1983 commented Jun 20, 2024

llama3-8b-function-call-v0.2

Thank you. I use Ollama on a Minisforum MS01 with an Nvidia Tesla P4 GPU, so it should work. Is there also a model for Ollama?


darki73 commented Jun 20, 2024

I have not really played with Ollama, but if it supports GGUF models, my guess would be that you can use this one (literally the first link on Google) - https://huggingface.co/mudler/LocalAI-Llama3-8b-Function-Call-v0.2 (or this one, as safetensors: https://huggingface.co/mzbac/llama-3-8B-Instruct-function-calling-v0.2)

On second thought, it might not work; you can always spin up a container with LocalAI while waiting for better Ollama support.
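
That said, Ollama does import GGUF weights through a Modelfile, so something along these lines should work once you have a GGUF file downloaded from the first repo (the file name below is just a placeholder, and I have not tested this particular model in Ollama):

Modelfile:
FROM ./LocalAI-Llama3-8b-Function-Call-v0.2.Q4_K_M.gguf

then:
ollama create llama3-function-call -f Modelfile
ollama run llama3-function-call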


WW1983 commented Jun 20, 2024

I'll try it. But unfortunately I have the problem that I can't pass the GPU through to the Docker container.


darki73 commented Jun 21, 2024

@WW1983 I won't go into the details of how to configure CUDA and the NVIDIA Container Toolkit (there are plenty of tutorials), but I will say a few things:

  1. I am using Ubuntu 24.04 with the GPU drivers supplied through apt with DKMS, no nonsense with drivers from the official website.
  2. There is a point in the NVIDIA Container Toolkit installation process where you have to apply the Docker runtime configuration, don't miss that (see the commands right after the examples below).
  3. If we are talking about LocalAI (or any container that might need a GPU), here is how you can configure it (look at the deploy part, but I will post the config for the whole service for reference):
  localai:
    container_name: localai
    image: localai/localai:latest-gpu-nvidia-cuda-12
    restart: unless-stopped
    environment:
      TZ: Asia/Dubai
    volumes:
      - ./data/localai:/build/models:cached
    ports:
      - 8181:8080
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
  4. While most containers will run just fine with the provided config, some might require extra setup. Again, I won't go into detail on installing all the necessary libraries, but here is an example of how to start the Whisper container on a remote machine:
  whisper:
    container_name: whisper
    image: rhasspy/wyoming-whisper
    restart: unless-stopped
    command: --model medium-int8 --language en --device cuda
    environment:
      TZ: Asia/Dubai
    volumes:
      - ./data/whisper:/data
      - /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8:/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8:ro
      - /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8:/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8:ro
      - /usr/lib/x86_64-linux-gnu/libcublasLt.so.12:/usr/lib/x86_64-linux-gnu/libcublasLt.so.12:ro
      - /usr/lib/x86_64-linux-gnu/libcublas.so.12:/usr/lib/x86_64-linux-gnu/libcublas.so.12:ro
    ports:
      - 10300:10300
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
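
Regarding point 2: on my setup the Docker side of the NVIDIA Container Toolkit install boils down to the two standard commands from NVIDIA's docs (run them on the Docker host, then recreate the containers):

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker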

The thing is, this way you can still use the GPU for other tasks. In my case, this VM is dedicated to AI workloads only, so here is what I run on it with a single GPU shared across all services:

  1. Whisper (Docker)
  2. Piper (Docker)
  3. WakeWord (Docker)
  4. LocalAI (Docker)
  5. Stable Diffusion (SystemD service)
  6. Text Generation Web UI (SystemD service)
  7. Jupyter (SystemD service)

And one more thing: if you want it to behave nicely, you need to enable persistence mode on the GPU, otherwise you will be stuck in P0 with 100 W+ of power draw. As of now, I am using 21488 / 24576 MiB of memory with the GPU in the P8 state at 38 °C, drawing 28 W.
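
Enabling persistence mode is just the stock NVIDIA tooling, nothing specific to this stack; roughly:

sudo nvidia-smi -pm 1

(or enable the nvidia-persistenced service if you want it to survive reboots)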


WW1983 commented Jun 21, 2024

Thank you. I tried it now with Ubuntu 24.04 and it works. How do I connect it with Home Assistant?

Which option do I have to select? "Ollama AI"?


darki73 commented Jun 21, 2024

Backend: Generic OpenAI
Host: IP of the machine LocalAI is running on
Port: the port you exposed in the Docker Compose file (8181 in the example above)
Model: LocalAI-llama3-8b-function-call-v0.2


WW1983 commented Jun 21, 2024


Thank you. It works.

Where can I check these processing times?

[Screenshot of the processing times attached]


darki73 commented Jun 21, 2024

You can do that in: Settings -> Voice Assistants -> YOUR_PIPELINE -> 3 dots -> Debug


WW1983 commented Jun 21, 2024

Found it, thank you. But I think my system is a little bit slow for LocalAI.

LocalAI:
[screenshot of pipeline timings]

Ollama:
[screenshot of pipeline timings]


darki73 commented Jun 21, 2024

The P4 is a little slow compared to a 3090 Ti; if you have 175 USD to spare, you might as well buy a P40 (I saw one on Amazon).

BUT, given that many libraries are now starting to support RT and Tensor cores, you could look at the A2000 (about 40% faster, but it only has 6 GB of RAM, which is bad), or the A4000, which is roughly 3 times faster than the P4, but the price is a bit too high.

Or even go the eBay route and gamble on an RTX 3090. Yes, the RTX 4060 Ti is technically about the same as the 3090, BUT memory is the main reason I still haven't sold mine and use it for the AI VM. I tried to use a 3080 Ti, but the memory is not enough, so I just use it for the VR VM.

Sadly, you won't be able to find a cheap GPU with that amount of memory. Basically, any RTX (2nd gen+) or an A2000/A4000 is the only way to go now if you want fast responses from LLMs.

I was planning to buy an A6000, but it is way cheaper to buy two new 3090 Ti cards and get the same 48 GB of memory.

P.S. Funny how the 4090 is not much faster than the 3090 Ti.

P.P.S. If you do decide to buy a brand new 3090, you might as well spend 250 USD extra and go for a 4090.


WW1983 commented Jun 21, 2024

Thank you for all your tips, but I think that's a bit too much for me. I just wanted to build something small to experiment with; that should be enough for a start.


WW1983 commented Aug 25, 2024

I still have the problem, despite the updates. Has anyone had the same problem and found a solution?
