Readiness probe failed #130
I think the pod is just taking a long time to start. You might want to try making the readiness timeout a bit longer.
Hi! Not sure if this is what you meant, but I increased the readiness and startup probes by modifying the server values:

```yaml
# Immich components
server:
  enabled: true
  probes:
    readiness:
      custom: true
      spec:
        initialDelaySeconds: 30
        periodSeconds: 10
        timeoutSeconds: 5
        failureThreshold: 3
    startup:
      custom: true
      spec:
        initialDelaySeconds: 0
        timeoutSeconds: 5
        ## This means it has a maximum of 10*60=600 seconds to start up before it fails
        periodSeconds: 10
        failureThreshold: 60
```

By inspecting the pod I can see that the changes are reflected on the pod:
But I still get

All I see in the pod logs is still:
I'm having a similar issue as well:

```text
[Nest] 17 - 10/05/2024, 2:20:02 PM LOG [Api:Bootstrap] Immich Server is listening on http://[::1]:3001 [v1.106.1] [PRODUCTION]
```

Seemingly issues with Redis. Granted, perhaps I'm setting up Redis wrong and this is different from your issue, but I don't think so. My Redis pods are starting fine and I'm deploying them along with the rest of the Helm chart, so it should be what's provided by default.
I had similar, if not the same, errors about Redis. But I think that was just initially, before the Redis pod had started? Now if I evict the Immich server pod I don't see any Redis errors.
Good call @morotsgurka. I think you're right. They went away for me too. App still not starting, of course, presumably due to the probes.
Same here; reverting to tag v1.116.0 is fine, but I get the same error with v1.117.0.
Can confirm. Added an image tag of 1.116 and now it starts.
Same issue here:
Using Chart 0.8.1 and 1.107.0
I have the same issue; for now I rolled back to tag v1.116.0.
Same issue here! Last working tag for me is v1.116.2.
Readiness probe fails because the immich-api process does not listen on TCP port 3001, and I don't know why. I have two Immich instances on one Kubernetes node. The old instance is running v1.116.2, whereas the new one is trying to start with v1.117.0. The new instance does not listen on TCP/3001, although the liveness probe timeout was increased from 30 to 690 seconds to prevent killing the pod too early.

```shell
ps auxw | grep immich | grep -vE 'postgres|grep'
# root     52171 11.7  2.0 22835356 325564 ?  Sl  17:48   0:19 immich
# root     52227  4.7  1.0 11648044 174408 ?  Sl  17:48   0:07 immich-api
# root   2082901  0.2  1.6 22981404 268440 ?  Sl  Okt02  31:55 immich
# root   2083038  0.0  0.9 11658532 148792 ?  Sl  Okt02   8:37 immich-api

nsenter --net --target 2082901
# We are in the linux network namespace of the immich-server v1.116.2 container.
ss -apn | grep LISTEN
# tcp LISTEN 0 511 *:8082  *:* users:(("immich",pid=2082901,fd=24))
# tcp LISTEN 0 511 *:8081  *:* users:(("immich-api",pid=2083038,fd=19))
# tcp LISTEN 0 511 *:3001  *:* users:(("immich-api",pid=2083038,fd=39))
# tcp LISTEN 0 511 *:33673 *:* users:(("immich",pid=2082901,fd=72))
exit # back to host NS

nsenter --net --target 40560
# We are in the linux network namespace of the immich-server v1.117.0 container.
ss -apn | grep LISTEN
# tcp LISTEN 0 511 *:8081 *:* users:(("immich-api",pid=52227,fd=19))
# tcp LISTEN 0 511 *:8082 *:* users:(("immich",pid=52171,fd=24))
```

The v1.117.0 logs are pretty short. Both the v1.117.0 and v1.116.2 logs are given below. Setting the

Edit: setting

Edit 2: when the container is killed, it prints out the fourth log line:

```text
# kubectl -n immich1 logs deploy/immich-server
Detected CPU Cores: 4
Starting api worker
Starting microservices worker
[Nest] 6 - 10/10/2024, 3:53:23 PM LOG [Microservices:EventRepository] Initialized websocket server
[Nest] 6 - 10/10/2024, 3:53:23 PM LOG [Microservices:MapRepository] Initializing metadata repository
[Nest] 16 - 10/10/2024, 3:53:23 PM LOG [Api:EventRepository] Initialized websocket server
```

```text
# kubectl -n immich2 logs deploy/immich-server | less -R
Detected CPU Cores: 4
Starting api worker
Starting microservices worker
[Nest] 8 - 10/02/2024, 10:51:46 AM LOG [Microservices:EventRepository] Initialized websocket server
[Nest] 18 - 10/02/2024, 10:51:46 AM LOG [Api:EventRepository] Initialized websocket server
[Nest] 8 - 10/02/2024, 10:51:47 AM LOG [Microservices:SystemConfigService] LogLevel=log (set via system config)
[Nest] 18 - 10/02/2024, 10:51:47 AM LOG [Api:SystemConfigService] LogLevel=log (set via system config)
[Nest] 18 - 10/02/2024, 10:51:47 AM LOG [Api:ServerService] Feature Flags: { "smartSearch": true, "facialRecognition": true, "duplicateDetection": true, "map": true, "reverseGeocoding": true, "importFaces": false, "sidecar": true, "search": true, "trash": true, "oauth": false, "oauthAutoLaunch": false, "passwordLogin": true, "configFile": false, "email": false }
[Nest] 8 - 10/02/2024, 10:51:47 AM LOG [Microservices:MapRepository] Initializing metadata repository
[Nest] 18 - 10/02/2024, 10:51:47 AM LOG [Api:StorageService] Verifying system mount folder checks (enabled=false)
[Nest] 18 - 10/02/2024, 10:51:47 AM LOG [Api:StorageService] Writing initial mount file for the encoded-video folder
etc.
```
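A quick way to reproduce exactly what the probe checks, without depending on tools inside the image (namespace assumed here to be `immich`):

```shell
# Forward the probe port from the server deployment, then call the same endpoint the probes hit.
kubectl -n immich port-forward deploy/immich-server 3001:3001 &
sleep 2
curl -v http://localhost:3001/api/server/ping
# On the broken v1.117.0 pod this fails, matching the ss output above: nothing listens on 3001.
```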
Hm, something big has changed there. Previously, the
There are only 34 commits changing the server. Someone has to bisect, build and deploy them :)

```shell
git log v1.106.2..v1.107.0 | grep server -B4 | grep '\--' | wc -l
# 34
```
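For anyone who picks that up, a rough bisect workflow over the two tags mentioned in this thread might look like the sketch below (building and deploying the server image at each step is left to your own tooling):

```shell
# Sketch of bisecting the immich server between the last-good and first-bad tags.
git clone https://github.com/immich-app/immich.git && cd immich
git bisect start v1.117.0 v1.116.2   # bad first, then good
# At each step: build and deploy the server image, then check whether
# immich-api starts listening on TCP/3001 and mark the result:
git bisect good    # or: git bisect bad
# Repeat until git reports the first bad commit, then clean up:
git bisect reset
```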
I also experienced the same issue when I tried to install Immich for the first time. All pods start except for immich-server, which is continuously unhealthy due to a startup probe failure:

```text
Startup probe failed: Get "http://10.42.1.21:3001/api/server/ping": dial tcp 10.42.1.21:3001: connect: connection refused
```

In the logs for the server pod, there's lots of this:

```text
Error: connect ETIMEDOUT
    at Socket.<anonymous> (/usr/src/app/node_modules/ioredis/built/Redis.js:170:41)
    at Object.onceWrapper (node:events:633:28)
    at Socket.emit (node:events:519:28)
    at Socket._onTimeout (node:net:591:8)
    at listOnTimeout (node:internal/timers:581:17)
    at process.processTimers (node:internal/timers:519:7) {
  errorno: 'ETIMEDOUT',
  code: 'ETIMEDOUT',
  syscall: 'connect'
}
```

I've confirmed that the same errors happen for me on v1.117.0 and v1.116.2.
@cconcannon have you enabled the Redis pods? This sounds similar to earlier comments in this thread. I also had similar issues, but only on first boot before Redis was initialized. And the issue we seem to be having is not present on 1.116.
Same issue here with the startup probe on Helm chart 0.8.1 and Immich v1.117.0. Setting
@morotsgurka yes, I enabled Redis and it starts successfully. I see the Redis, Postgres, and Machine Learning pods all start successfully. The server continues to have errors, even after I try deleting it.
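One way to rule Redis connectivity in or out from inside the cluster (the service name below assumes the chart's default Redis naming for a release called `immich`; adjust to yours):

```shell
# Start a throwaway redis-cli pod and ping the Redis service the server should be using.
kubectl -n immich run redis-check --rm -it --restart=Never --image=redis:7 -- \
  redis-cli -h immich-redis-master ping
# A reachable Redis answers: PONG
```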
I've had continuous problems with the probes and completely disabled them, which has helped in the past. But it hangs indefinitely for me on
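For reference, disabling the probes via the chart values might look like the sketch below (it assumes the same `server.probes` layout shown earlier in this thread; check the chart's default values.yaml for the exact keys):

```yaml
# Sketch only: turn the probes off entirely for the server component.
server:
  probes:
    liveness:
      enabled: false
    readiness:
      enabled: false
    startup:
      enabled: false
```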
I spent quite a while trying to fix this, to the point of taking a DB backup, deploying a standalone postgresql, reinstalling 116, and then upgrading again to 117. What I ultimately determined (for me at least) was that immich leaves the postgres resources as the default ( My suspicion is that the DB migration required for 117 (or something) takes way too long on the nano preset, and the probes cause a restart, which kicks the DB migration off again (or something).
@rjbez17 I verified this solution. The postgres container resources were too small; after setting the resourcesPreset of postgresql to "large", immich-server starts normally.
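For anyone else landing here, a hedged sketch of what that override could look like in the Helm values (the exact key placement depends on the Bitnami postgresql subchart version; recent versions expose it as `primary.resourcesPreset`):

```yaml
# Sketch only: give the Bitnami postgresql subchart more headroom than the tiny default preset.
postgresql:
  enabled: true
  primary:
    resourcesPreset: large
    # Or drop the preset and set explicit requests/limits instead:
    # resources:
    #   requests:
    #     cpu: "1"
    #     memory: 2Gi
```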
I ran into the same problem when using a PG DB outside a container.
I was pretty stumped by this issue, but in hindsight the postgres resources issue is obvious 🤦 While I have you all here: I'd love your feedback on #129.
Setting

Finally, it helped to increase the

```yaml
server:
  probes:
    startup:
      spec:
        failureThreshold: 360
```

So, this issue should have been called
Ugh, glad I found this. I didn't realize that the Bitnami chart was adding CPU limits to my postgres pod. I would highly recommend at least suggesting that people using that chart set it to a higher preset for better performance. I've been having awful performance lately and wasn't sure why; I thought my k8s cluster was just underpowered for Immich. Some light reading for those interested on why CPU limits are bad.
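To see what the preset actually translated into on a running cluster, one quick check (pod name and namespace assumed for a release called `immich`) is:

```shell
# Print the requests/limits that were applied to the postgres container.
kubectl -n immich get pod immich-postgresql-0 \
  -o jsonpath='{.spec.containers[0].resources}{"\n"}'
```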
Hi!
I'm running k3s on 3 nodes together with Longhorn and FluxCD.
immich-chart version: 0.8.1
I found this issue on the Immich repo with a similar problem, but the problem there seemed to be that he had not enabled postgresql, which I have.
This is my values file:
I have created volumes in Longhorn with PVCs for each claim. They all attach normally.
I can see that all pods are running except the immich-server, which fails its readiness probe. If I check the pod logs I just see:
For the immich-postgresql-secret I have just created a generic secret in the same namespace like:
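Purely for illustration, creating a generic secret of that kind might look like the following; the key name shown is hypothetical and must match whatever the chart actually reads:

```shell
# Hypothetical example: the secret name comes from this issue, the key name is a guess.
kubectl -n immich create secret generic immich-postgresql-secret \
  --from-literal=password='REPLACE_ME'
```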