Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Local API cannot connect certificate verify failed ('Could not contact DNS servers') in HA >2024.12.0 in Overkiz #132318

Closed
pretorian83 opened this issue Dec 5, 2024 · 120 comments · Fixed by #133835

Comments

@pretorian83
Copy link

The problem

After the update to 2024.12.0 overkiz is displaying the following error:
... Failed to connect.

What version of Home Assistant Core has the issue?

core-2024.12.0

What was the last working version of Home Assistant Core?

core-2024.11.3

What type of installation are you running?

Home Assistant OS

Integration causing the issue

Overkiz

Link to integration documentation on our website

https://www.home-assistant.io/integrations/overkiz/

Diagnostics information

home-assistant_overkiz_2024-12-05T00-22-58.121Z.log

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

No response

@home-assistant
Copy link

home-assistant bot commented Dec 5, 2024

Hey there @iMicknl, @vlebourl, @tetienne, @nyroDev, @Tronix117, @alexfp14, mind taking a look at this issue as it has been labeled with an integration (overkiz) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of overkiz can trigger bot actions by commenting:

  • @home-assistant close Closes the issue.
  • @home-assistant rename Awesome new title Renames the issue.
  • @home-assistant reopen Reopen the issue.
  • @home-assistant unassign overkiz Removes the current integration label and assignees on the issue, add the integration domain after the command.
  • @home-assistant add-label needs-more-information Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue.
  • @home-assistant remove-label needs-more-information Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)


overkiz documentation
overkiz source
(message by IssueLinks)

@jkosharek
Copy link

Getting the same error after updating to core-2024.12.0. Was connected to Somfy TaHoma Switch using the local API.

@lrixford
Copy link

lrixford commented Dec 5, 2024

Somfy just upgraded their app to consolidate the Somfy and TaHoma North America apps. Seems API changes might have been made.

Somfy Upgrade Notice

The HA integration failed after upgrade.

EDIT - For people that did the Somfy migration, this broke cloud integration and authentication (to get the Local API token) on Somfy end it seems.

See Issue

Somfy-Developer/Somfy-TaHoma-Developer-Mode#151

@ColinRobbins
Copy link
Contributor

A roll back to 2024.11.2 works, so don't think it is a Somfy API issue.

@ColinRobbins
Copy link
Contributor

@ColinRobbins if you turn on debug mode and try to configure the integration, you should see a warning. At least in the UI. Have you tried connecting with an without SSL verification? And can you access the ip/mens + port 8443 of your gateway in your browser?

With 2024.12.0 I get the following debug log on a HA restart.

2024-12-04 22:18:20.096 DEBUG (MainThread) [homeassistant.components.overkiz] ZeroConf discovery detected gateway ****-****-7195 on gateway-0403-5459-7195.local. (_kizbox._tcp.local.)

I've not tried with the browser - the fact reverting to 2024.11.2 works, suggests the gateway is fine.

Reconfiguring is not easy, as the integration is stuck in the initialisation state, so the configure, disable and delete buttons are not available. (They only appear when the integration fails to start - this integration does not seem to fail - rather hits an endless loop of retries).
So I think I would need to downgrade to 2024.11.2, delete, update to 2024.12.0 and reconfigure.
Thats too much to do on my live system, so later today I'll dust off my development version and progress under there to get more logs.

@calamarain
Copy link

Somfy just upgraded their app to consolidate the Somfy and TaHoma North America apps. Seems API changes might have been made.

Somfy Upgrade Notice

The HA integration failed after upgrade.

My Somfy hub/gateway is blocked from accessing the internet, it cannot have updated its API.

@raffoul
Copy link

raffoul commented Dec 5, 2024

Somfy just upgraded their app to consolidate the Somfy and TaHoma North America apps. Seems API changes might have been made.

Somfy Upgrade Notice

The HA integration failed after upgrade.

I don't live in the US and have the same issue only in 2024.12 (ok after rollback to 2024.11.3).

@avdb35
Copy link

avdb35 commented Dec 5, 2024

Same issue here. Tried to generate a new API. there is a connection failure in the SOMFY link in my account. Maybe the problem is bij SOMFY

@ricoroci
Copy link

ricoroci commented Dec 5, 2024

Same issue here.
Rollback worked just fine.

@ColinRobbins
Copy link
Contributor

OK, here's a debug log from my development environment, using 2024.12.0...

Local API. verify SSL ticked...

2024-12-05 08:00:16.537 DEBUG (MainThread) [homeassistant.components.overkiz] ZeroConf discovery detected gateway ****-****-7195 on gateway-0403-5459-7195.local. (_kizbox._tcp.local.)
2024-12-05 08:00:16.537 DEBUG (MainThread) [homeassistant.components.overkiz] ZeroConf discovery detected gateway ****-****-7195 on gateway-0403-5459-7195.local. (_kizboxdev._tcp.local.)
2024-12-05 08:00:48.760 DEBUG (MainThread) [homeassistant.components.overkiz] Cannot connect to host gateway-xxxx-xxxx-xxxx.local:8443 ssl:<ssl.SSLContext object at 0x72bf701617d0> [Could not contact DNS servers]

DNS on the machine is working fine (e.g,. I can get to github to send this message)

Local API, SSL unticked...

2024-12-05 08:04:02.411 DEBUG (MainThread) [homeassistant.components.overkiz] ZeroConf discovery detected gateway ****-****-7195 on gateway-0403-5459-7195.local. (_kizboxdev._tcp.local.)
2024-12-05 08:04:44.215 DEBUG (MainThread) [homeassistant.components.overkiz] Cannot connect to host gateway-xxxx-xxxx-xxxx.local:8443 ssl:False [Could not contact DNS servers]

I.e., the same.

Cloud API Works fine.

@ColinRobbins
Copy link
Contributor

ColinRobbins commented Dec 5, 2024

Using Local_API, if I replace gateway-xxxx-xxxx-xxxx.local:8443 with the IP address & port, I get further.
The GUI reports Cannot connect to host, certificate verify failed. and the logs show

2024-12-05 08:09:25.302 DEBUG (MainThread) [homeassistant.components.overkiz] Cannot connect to host 192.168.10.187:8443 ssl:True [SSLCertVerificationError: (1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: IP address mismatch, certificate is not valid for '192.168.xx.xx'. (_ssl.c:1000)")]

if I turn SSL verify off, and use the IP address its back up and running.

So this issue is the new code cannot DNS resolve gateway-xxxx-xxxx-xxxx.local.

@ColinRobbins
Copy link
Contributor

So the workaound to get something working with 2024.12 the steps are...

  1. BEFORE you upgrade - delete the integration.
  2. If you have have already updated, I think you need to back off to 2024.11, as you cannot delete the integration in a perpetual retry state.
  3. Upgrade to 2024.12
  4. Re-add the integration, but do not use the auto-filled network address, replace it with the IP address
  5. uncheck the SSL verify box.

Hopefully someone can figure out the name resolution issue, and we an revert to using SSL.

@ColinRobbins
Copy link
Contributor

PS. I have a dev environment up and running now, so if you want me to test any fixes...

@raffoul
Copy link

raffoul commented Dec 5, 2024

I think something change with ssl verification with python 3.13.

The root cause seems to be close with another issue :
#132333

@ColinRobbins
Copy link
Contributor

I think something change with ssl verification with python 3.13.

The root cause seems to be close with another issue : #132333

I think that is a different issue, as that is failing at the SSL verification stage.
This integration is failing at the DNS name resolution phase.

@omegaohm
Copy link

omegaohm commented Dec 5, 2024

I just installed the update and of course I have the same issue. Hope that a solution is quickly found.

@HaraldGithub
Copy link

Getting the same error after updating to core-2024.12.0. Was connected to Somfy TaHoma Switch using the local API.

Same here after updating to 2024.12.0 => rolled back to 2024.11.3 and every thing works fine!
(BTW: Same problem with ALEXA-Media-Player)

@iMicknl
Copy link
Contributor

iMicknl commented Dec 5, 2024

Thanks all! It seems related to a Python 3.13 or HA OS update, but I will further investigate. Small issue here is that it is hard to debug this, since the Dev Mode on my TaHoma Switch is not functional at the moment....

@ColinRobbins if you use an IP address, the SSL verify should not work, this is as intended.

@iMicknl iMicknl changed the title Overkiz integration failing after upgrade to 2024.12.0 Local API cannot connect ('Could not contact DNS servers') in HA >2024.12.0 in Overkiz Dec 5, 2024
@ColinRobbins
Copy link
Contributor

ColinRobbins commented Dec 5, 2024

Thanks all! It seems related to a Python 3.13 or HA OS update, but I will further investigate.

@iMicknl My development environment is using Python 3.12, running in a Venv. So it is not P3.13 or HA OS that are causing the issue.

@andyblac
Copy link

andyblac commented Dec 5, 2024

Thanks all! It seems related to a Python 3.13 or HA OS update, but I will further investigate. Small issue here is that it is hard to debug this, since the Dev Mode on my TaHoma Switch is not functional at the moment....

@ColinRobbins if you use an IP address, the SSL verify should not work, this is as intended.

I have update to the HA OS 14.0, and still on core 2024.11.3, and everything is still working here. just to let you know.

@rimisuko
Copy link

rimisuko commented Dec 5, 2024

Just chiming in here to also say that this is definitely not caused by HAOS14. The culprit has to be some change in HA 2024.12.

I'm still on HAOS13.2, updating to HA 2024.12.0 broke the Overkiz integration. Rolling back to HA 2024.11.3 fixed the issue.

@ColinRobbins
Copy link
Contributor

ColinRobbins commented Dec 5, 2024

I don't know this integation, or how it works, but I've done some code tracing.

During the config, the method __post in python-overkiz is called twice:
https://github.com/iMicknl/python-overkiz-api/blob/b48cbe37b7926746f2693716690d5dea727046ca/pyoverkiz/client.py#L930

The first with the parameter:

https://ha101-1.overkiz.com/enduser-mobile-web/enduserAPI/config/0403-xxxx-7195/local/tokens

this works. Then it is called again with...

https://gateway-0403-xxxx-7195.local:8443/enduser-mobile-web/1/enduserAPI/events/register

which is where it fails.
This is a call out to aiohttp.
So seems like a change to aiohttp is causing the issue.

Strange that is not affecting other integrations though.

@pfefferle
Copy link

Maybe also relevant: aio-libs/aiohttp#10110

@ColinRobbins
Copy link
Contributor

Looping in @bdraco - it seems some recent aiohttp changes has caused local name resolution to fail.
Any insight?

@iMicknl
Copy link
Contributor

iMicknl commented Dec 5, 2024

Let's investigate first; there have been changes in the SSLContext (iMicknl/python-overkiz-api#1448) and aiohttp has been upgraded to the latest version.

@ColinRobbins in your example, it is able to make a connection to the Cloud API, but fails on connecting to the local API.

@iMicknl
Copy link
Contributor

iMicknl commented Dec 5, 2024

For debugging, if someone is able to test the underlying integration with this code sample: https://github.com/iMicknl/python-overkiz-api?tab=readme-ov-file#local-api. Would be great to see what error you receive.

@iMicknl
Copy link
Contributor

iMicknl commented Dec 11, 2024

@virtualj I will respond in more detail tomorrow, but can you check the gateway version? This is actually an old problem which should have been fixed: Somfy-Developer/Somfy-TaHoma-Developer-Mode#5. Are you sure your certificate does not provide any additional CN's?

And the mDNS address gateway-xxxx-xxxx-xxxx.local is valid, according to the Somfy documentation. If you cannot access this, this is due to your network / mDNS configuration or an issue of your gateway.

@virtualj
Copy link

Oh yes... You are right!
image
In fact I understand who is solving the name gateway-0808-0551-4160.local... Is the tahoma! I didn't know the mDNS service:
image
The tahoma answers only to the gateway-0808-0551-4160.local mDNS requests, does not answer to 0808-0551-4160.local.

In this case I think that ssl_verify does not work correctly and is not checking for SANs in the certificate. I don't know if it is integration related or some other function available from the OS.

@iMicknl
Copy link
Contributor

iMicknl commented Dec 12, 2024

That is what we need to find out. The problems have been introduced since the last update and we haven't been able yet to pinpoint the issue.

What doesn't help is that I can't test it myself, since my local gateway functionality is broken again...

@jirkahronik
Copy link

I started seeing this issue some time around 24 hours ago. It definitely still worked 36 hours ago (an automation ran).
I had my local connection set up without checking SSL certificate, and no matter how I try changing to cloud or local API, it keeps not working.
I downgraded HA to 2024.11.2, but the integration is still not working, although now it does not log any error whatsoever.
Could it be the Tahoma Switch firmware update that broke it?

@wolfgang-muc

This comment was marked as duplicate.

@Hl2run
Copy link

Hl2run commented Dec 13, 2024

After upgrading to 2024.12 last night the integration stopped working. No combination of local or online API and SSL enabled/disabled fixed the problem.

I downgraded via a Proxmox backup to 2024.11 and it works again now. I will wait for a fix before upgrading to .12

@ColinRobbins
Copy link
Contributor

ColinRobbins commented Dec 13, 2024

Well, this is wierd. I have it working in my development environment again, with SLL, but cannot really explain why.

There are three possible explanations (edited).

  1. Somfy have fixed something and SSL now works again.

  2. Something is fixed in the latest dev version of HA (post 2024.12.3)

  3. Is a longer story, but I added the following to /etc/hosts (substitute the x's)

192.168.x.x       gateway-0403-5459-xxxx.local
192.168.x.x       gateway-0403-5459-xxxx

That kicked it into action, with SSL.

Why did I do that?
In my original investigation in my dev environment I could not get mDNS to resolve anything locally, which led me down a rabbit hole debugging aiohttp/aiodns. But then found under HAOS mDNS (seemed) to work fine, and the Overkix issue manifested itself as a SSL error.

I tried to fix mDNS in my dev environment, and could not (not convinced its broken - just not working the way HA wants it to). So I hacked /etc/hosts so I could move on.
I was not expecting it to work, I was expecting to move me to the SSL error, so I could debug that.
But some how, by magic, it is now working.

I am reluctant to try in my (stable) HAOS production environment turning SSL back on.

So very confused as to why it now works OK in my dev environment, but this could give someone a clue...

@ColinRobbins
Copy link
Contributor

ASIDE - incase it helps anyone.

My understanding of the DNS "issue".

Under "normal" operation, Linux will try /etc/hosts then mDNS then DNS.

"Most" apps use asyncio.getadderinfo which "conforms" to the above, and resolves .local addresses.

HA uses aiodns.getaddrinfo which does not appear to call mDNS, just using DNS instead.
So it expects your DNS to resolve mDNS calls. In HAOS this works. But not in all non HAOS configurations.
So the quick fix was to add the address to /etc/hosts.

What this does not explain for me is why this suddenly makes SSL work.

Test python code incase anyone else wants to explore the issue.
dns_test.txt

@heiko69

This comment was marked as duplicate.

@ColinRobbins
Copy link
Contributor

ColinRobbins commented Dec 15, 2024

@heiko69 have you looked back in this thread and tried the suggestions above?

Recap.

  • Have you tried re-adding the integration and turning SSL off?
  • Have you tried using the IP address rather than mDNS name?
  • Have you tried the cloud API?
  • Have you turned the debug logging on by adding
logger:
 logs:
    homeassistant.components.overkiz: debug

to your configuration.yaml, restarting HA, then looking in home-assistant.log in the config directory (Not the logs via the GUI).

  • LONG SHOT. Have you tried adding
192.168.x.x       gateway-0403-5459-xxxx.local
192.168.x.x       gateway-0403-5459-xxxx

to /etc/hosts - it worked on my test system - by not yet established why.

@ColinRobbins
Copy link
Contributor

ColinRobbins commented Dec 15, 2024

@virtualj are you able to try the /etc/hosts suggestion above?
I think we have multiple issues people are seeing.

I have a suspicion that the new DNS code in HA has expectations of a certain DNS configuration has per HAOS) and on some Linux configurations the expectations are not met, and mDSN then fails. (So this maybe an issue some people are seeing - different from the SSL issue).

Looking at your cert naming suggestions, I wondering if the is new DNS code is somehow causing the naming issues (not sure how), and by using /etc/hosts it sidesteps the issue.

Not sure we’ve yet got a lead on the cert validity issue, but maybe, just maybe, it’s a side effect of the cert naming point you raise. I have it all working on my dev system, without messing around with certs - but have no idea why!

@heiko69

This comment was marked as duplicate.

@heiko69

This comment was marked as duplicate.

@thomluther
Copy link

I can confirm, that there seem to be multiple issues with the connection.
I'm still running HA 2024.11.x and never upgraded to 2024.12.x yet. Since about 2 weeks or so, I noticed from time to time that my overkiz devices were unavailable. Reloading the configuration entry sometimes brought again an error, then after 3-4 retries it suddenly connected again to local Tahoma Api.
I used so for the hostname and SSL.
I noticed however that there was a Tahoma update to 1.28 on 26. Nov 2024, so pretty much the timeframe when I started seeing the sporadic connection issues. It might be a co-incidence, that this felt together with the first 2024.12 release and people just started to see first connection issues with the new HA version.
What I tried today to fix the sporadic connection issues in 2024.11 HA:

  1. Updated the config entry directly in the .storage/core.config_entries and disabled SSL of existing config
  2. Restarted HA and the initial connection issues remained even with disabled SSL
  3. Updated also the hostname to the Tahoma IP address directly in the .storage/core.config_entries
  4. Restarted HA again and then it connected directly during HA restart. No disconnected Overkiz devices seen since then today

Conclusion:
The hostname resolution issues might be related to the Tahoma FW update in Nov.
Not sure if disabling SSL really was required in my case. But Certificate issues could also arise just when going to HA 2024.12 and Python 3.13

@ric1001

This comment was marked as duplicate.

@iMicknl
Copy link
Contributor

iMicknl commented Dec 17, 2024

Thanks all! As mentioned earlier, the issue here is in the SSL verification since a recent update. You can change your existing entry to turn of ssl_verification, or create a new entry.

No need to reply that the workaround works. Thanks!

TLDR: as mentioned 2 times earlier, please only reply if you have new information.

@iMicknl iMicknl mentioned this issue Dec 22, 2024
19 tasks
@iMicknl
Copy link
Contributor

iMicknl commented Dec 22, 2024

Today I finally had access to a local Overkiz gateway and could debug the culprit. Since python/cpython#107361, which enables stricter RFC 5280 compliance checks within OpenSSL's X.509 path validation implementation. The provided certificates by the Overkiz gateways are not compliant to these checks and thus the certificate cannot be verified.

aiohttp.client_exceptions.ClientConnectorCertificateError: Cannot connect to host gateway-1225-7298-3293.local:8443 ssl:True [SSLCertVerificationError: (1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Missing Authority Key Identifier (_ssl.c:1018)')]

I have disable the strict checks and in an upcoming update the SSL verification will be working again (#133835). The other known issue (#108595) is not solved by this fix. Users connecting to the Local API might still see "Network is unreachable" / ClientConnectorError. We can discuss this in more detail in #108595.

@BauStein-So
Copy link

Well, this is wierd. I have it working in my development environment again, with SLL, but cannot really explain why.

There are three possible explanations (edited).

  1. Somfy have fixed something and SSL now works again.
  2. Something is fixed in the latest dev version of HA (post 2024.12.3)
  3. Is a longer story, but I added the following to /etc/hosts (substitute the x's)
192.168.x.x       gateway-0403-5459-xxxx.local
192.168.x.x       gateway-0403-5459-xxxx

That kicked it into action, with SSL.

Why did I do that? In my original investigation in my dev environment I could not get mDNS to resolve anything locally, which led me down a rabbit hole debugging aiohttp/aiodns. But then found under HAOS mDNS (seemed) to work fine, and the Overkix issue manifested itself as a SSL error.

I tried to fix mDNS in my dev environment, and could not (not convinced its broken - just not working the way HA wants it to). So I hacked /etc/hosts so I could move on. I was not expecting it to work, I was expecting to move me to the SSL error, so I could debug that. But some how, by magic, it is now working.

I am reluctant to try in my (stable) HAOS production environment turning SSL back on.

So very confused as to why it now works OK in my dev environment, but this could give someone a clue...

I tested this solution but it doesn’t work.

Today I finally had access to a local Overkiz gateway and could debug the culprit. Since python/cpython#107361, which enables stricter RFC 5280 compliance checks within OpenSSL's X.509 path validation implementation. The provided certificates by the Overkiz gateways are not compliant to these checks and thus the certificate cannot be verified.

aiohttp.client_exceptions.ClientConnectorCertificateError: Cannot connect to host gateway-1225-7298-3293.local:8443 ssl:True [SSLCertVerificationError: (1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Missing Authority Key Identifier (_ssl.c:1018)')]

I have disable the strict checks and in an upcoming update the SSL verification will be working again (#133835). The other known issue (#108595) is not solved by this fix. Users connecting to the Local API might still see "Network is unreachable" / ClientConnectorError. We can discuss this in more detail in #108595.

Do you know when this will be updated? In 2025.01 I can’t see this fix at the moment.

@iMicknl
Copy link
Contributor

iMicknl commented Dec 26, 2024

@BauStein-So are you sure you run the beta? What is the error you receive?

@BauStein-So
Copy link

@BauStein-So are you sure you run the beta? What is the error you receive?

First I would like to thank you for your resolution of the problem.
At the moment I run 2024.12.5.
I looked into the release notes of 2025.1 and I couldn’t find this fix.

@iMicknl
Copy link
Contributor

iMicknl commented Dec 27, 2024

It is fixed in 2025.1 and I did test this myself. If anyone from this issue thread would be willing to test the beta this period (https://www.home-assistant.io/common-tasks/os/#running-a-beta-version), that would be appreciated.

@Stony81
Copy link

Stony81 commented Dec 27, 2024

Just installed 2025.1.0b2.
Issue is fixed for me.

Was not working from 2024.12.0 to 12.5

Using TaHoma classic v2 w. localAPI enabled in w. Europe / Germany

Thank you, good work :-)

@raffoul
Copy link

raffoul commented Dec 30, 2024

It is fixed in 2025.1 and I did test this myself. If anyone from this issue thread would be willing to test the beta this period (https://www.home-assistant.io/common-tasks/os/#running-a-beta-version), that would be appreciated.

Ok in 2025.1. Thanks for your fix.

@ova13lastar
Copy link

ova13lastar commented Jan 4, 2025

I've upgraded with official 2025.1 update this morning but authentication with local API is always down 🤔😓

@iMicknl
Copy link
Contributor

iMicknl commented Jan 4, 2025

I've upgraded with official 2025.1 update this morning but authentication with local API is always down 🤔😓

Does cloud authentication work? Can you create a new issue here, preferably with debug mode on and your logs. And as many information as possible.

https://github.com/Somfy-Developer/Somfy-TaHoma-Developer-Mode. You could try the steps here as well, to understand if the authentication is working..

@BauStein-So
Copy link

I confirm that with 2025.1.0. Local-API works fine again.

thank you very much

@physed109
Copy link

Also now working again for me with 2025.1.0
Thanks very much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.