
Immich server does not start after update to v1.119.1 #13867

Closed
2 of 3 tasks
werefkin opened this issue Nov 1, 2024 · 35 comments

Comments

@werefkin

werefkin commented Nov 1, 2024

The bug

After following the standard update procedure, I can no longer access the server: the immich_server container keeps restarting.

The OS that Immich Server is running on

Raspbian 6.6.51+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.51-1+rpt2 (2024-10-01) aarch64 GNU/Linux

Version of Immich Server

v1.119.1

Version of Immich Mobile App

v1.119.1

Platform with the issue

  • Server
  • Web
  • Mobile

Your docker-compose.yml content

#
# WARNING: Make sure to use the docker-compose.yml of the current release:
#
# https://github.com/immich-app/immich/releases/latest/download/docker-compose.yml
#
# The compose file on main may not be compatible with the latest release.
#

name: immich

services:
  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    # extends:
    #   file: hwaccel.transcoding.yml
    #   service: cpu # set to one of [nvenc, quicksync, rkmpp, vaapi, vaapi-wsl] for accelerated transcoding
    volumes:
      # Do not edit the next line. If you want to change the media storage location on your system, edit the value of UPLOAD_LOCATION in the .env file
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
    env_file:
      - .env
    ports:
      - '2283:2283'
    depends_on:
      - redis
      - database
    restart: always
    healthcheck:
      disable: false

  immich-machine-learning:
    container_name: immich_machine_learning
    # For hardware acceleration, add one of -[armnn, cuda, openvino] to the image tag.
    # Example tag: ${IMMICH_VERSION:-release}-cuda
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
    # extends: # uncomment this section for hardware acceleration - see https://immich.app/docs/features/ml-hardware-acceleration
    #   file: hwaccel.ml.yml
    #   service: cpu # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference - use the `-wsl` version for WSL2 where applicable
    volumes:
      - model-cache:/cache
    env_file:
      - .env
    restart: always
    healthcheck:
      disable: false

  redis:
    container_name: immich_redis
    image: docker.io/redis:6.2-alpine@sha256:2ba50e1ac3a0ea17b736ce9db2b0a9f6f8b85d4c27d5f5accc6a416d8f42c6d5
    healthcheck:
      test: redis-cli ping || exit 1
    restart: always

  database:
    container_name: immich_postgres
    image: docker.io/tensorchord/pgvecto-rs:pg14-v0.2.0@sha256:90724186f0a3517cf6914295b5ab410db9ce23190a2d9d0b9dd6463e3fa298f0
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
      POSTGRES_INITDB_ARGS: '--data-checksums'
    volumes:
      # Do not edit the next line. If you want to change the database storage location on your system, edit the value of DB_DATA_LOCATION in the .env file
      - ${DB_DATA_LOCATION}:/var/lib/postgresql/data
    healthcheck:
      test: pg_isready --dbname='${DB_DATABASE_NAME}' --username='${DB_USERNAME}' || exit 1; Chksum="$$(psql --dbname='${DB_DATABASE_NAME}' --username='${DB_USERNAME}' --tuples-only --no-align --command='SELECT COALESCE(SUM(checksum_failures), 0) FROM pg_stat_database')"; echo "checksum failure count is $$Chksum"; [ "$$Chksum" = '0' ] || exit 1
      interval: 5m
      start_interval: 30s
      start_period: 5m
    command:
      [
        'postgres',
        '-c',
        'shared_preload_libraries=vectors.so',
        '-c',
        'search_path="$$user", public, vectors',
        '-c',
        'logging_collector=on',
        '-c',
        'max_wal_size=2GB',
        '-c',
        'shared_buffers=512MB',
        '-c',
        'wal_compression=on',
      ]
    restart: always

volumes:
  model-cache:

Your .env content

# You can find documentation for all the supported env variables at https://immich.app/docs/install/environment-variables

TYPESENSE_API_KEY=111

# The location where your uploaded files are stored
UPLOAD_LOCATION=/media/dfr
DB_DATA_LOCATION=./postgres

# The Immich version to use. You can pin this to a specific version like "v1.71.0"
IMMICH_VERSION=release

# Connection secret for postgres. You should change it to a random password
DB_PASSWORD=111

# The values below this line do not need to be changed
###################################################################################
DB_HOSTNAME=immich_postgres
DB_USERNAME=postgres
DB_DATABASE_NAME=immich

REDIS_HOSTNAME=immich_redis

Reproduction steps

  1. Update to the latest version

Relevant log output

Status: unhealthy. Portainer shows the following output:

TypeError: fetch failed
    at node:internal/deps/undici/undici:13185:13
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async main (/usr/src/app/dist/bin/healthcheck.js:14:26) {
  [cause]: Error: connect ECONNREFUSED 127.0.0.1:2283
      at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1607:16) {
    errno: -111,
    code: 'ECONNREFUSED',
    syscall: 'connect',
    address: '127.0.0.1',
    port: 2283
  }
}
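The ECONNREFUSED on 127.0.0.1:2283 means nothing was accepting connections on the API port when the healthcheck ran, i.e. the server had not finished starting. A minimal Python sketch for checking this from the Docker host, assuming a plain TCP connect to the published port is a good enough signal (Immich's own healthcheck.js makes an HTTP fetch, which this does not replicate). It keeps retrying on refused connections until the port opens or a deadline passes; the function name is hypothetical.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float, interval_s: float = 1.0) -> bool:
    """Retry a TCP connect to host:port until it succeeds (True) or timeout_s elapses (False)."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # Raises OSError (e.g. ConnectionRefusedError, errno 111) while nothing is listening.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            pass  # not listening yet: treat as "still starting", not as a fatal error
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, wait_for_port("127.0.0.1", 2283, timeout_s=30 * 60, interval_s=10) waits up to 30 minutes for the port published in the compose file above. Retrying instead of failing fast matches what several people in this thread saw: the server eventually came up once its startup work finished.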


Additional information

No response

@bo0tzz
Member

bo0tzz commented Nov 1, 2024

Please post the full logs of the containers

@werefkin
Author

werefkin commented Nov 1, 2024

> Please post the full logs of the containers

Logs of immich_server:

[Nest] 35 - 11/01/2024, 12:15:54 PM LOG [Api:EventRepository] Websocket Disconnect: V_7qQ_xXQW2lnExZAAAL

[Nest] 35 - 11/01/2024, 12:15:55 PM LOG [Api:EventRepository] Websocket Connect: KkeHHhDoqLCCNWJfAAAN

[Nest] 7 - 11/01/2024, 12:16:32 PM LOG [Microservices:PersonService] Detected 3 new faces in asset ea7a839d-6cb8-4150-a7a8-b12ff3972e1b

[Nest] 7 - 11/01/2024, 12:16:42 PM LOG [Microservices:PersonService] Detected 1 new faces in asset 8fa138ae-3a13-4cd5-b4b1-ae9a3cd4cba5

[Nest] 7 - 11/01/2024, 12:16:55 PM LOG [Microservices:PersonService] Detected 3 new faces in asset 7d3b4126-42f7-4bc1-82dc-97236932b270

[Nest] 7 - 11/01/2024, 12:16:55 PM LOG [Microservices:PersonService] Detected 2 new faces in asset 79ca02df-4492-491c-8f33-9f8f8c79375f

[Nest] 7 - 11/01/2024, 12:17:04 PM LOG [Microservices:PersonService] Detected 2 new faces in asset 5932333f-61b0-4ccd-a3b2-c329746f75be

[Nest] 7 - 11/01/2024, 12:17:06 PM LOG [Microservices:PersonService] Detected 3 new faces in asset 8342fefb-360b-47b7-88f7-47902a99d10d

[Nest] 7 - 11/01/2024, 12:17:11 PM LOG [Microservices:PersonService] Detected 1 new faces in asset 4185d688-e60b-4796-9adc-67bac8918d52

[Nest] 7 - 11/01/2024, 12:17:11 PM LOG [Microservices:PersonService] Detected 1 new faces in asset 8f2c2d55-60fa-43c8-9fda-1ed78dba9499

[Nest] 7 - 11/01/2024, 12:17:18 PM LOG [Microservices:PersonService] Detected 2 new faces in asset 530eade0-c983-4137-9acf-495f32fa960b

[Nest] 7 - 11/01/2024, 12:17:18 PM LOG [Microservices:PersonService] Detected 2 new faces in asset a692dcf4-e9ea-4ef1-a650-ba8788cfb4ca

[Nest] 7 - 11/01/2024, 12:17:25 PM LOG [Microservices:PersonService] Detected 2 new faces in asset 42d5d480-cb01-444a-9ad2-08bbb29d4a59

[Nest] 7 - 11/01/2024, 12:17:32 PM LOG [Microservices:PersonService] Detected 3 new faces in asset 3d640825-07ad-4a30-a79b-f5e299b9cbdd

[Nest] 7 - 11/01/2024, 12:17:36 PM LOG [Microservices:PersonService] Detected 3 new faces in asset 3cef301e-85a3-4222-9a7b-31ab1aa22fa1

[Nest] 7 - 11/01/2024, 12:17:43 PM LOG [Microservices:PersonService] Detected 3 new faces in asset b9b0424a-9499-4765-9ab6-1d7601a1b72e

[Nest] 7 - 11/01/2024, 12:17:45 PM LOG [Microservices:PersonService] Detected 3 new faces in asset f1bc154c-52ac-4072-acd4-2b6981932e68

[Nest] 7 - 11/01/2024, 12:17:52 PM LOG [Microservices:PersonService] Detected 3 new faces in asset a9fc5de2-b00d-46fe-a94a-3daa5d300968

[Nest] 7 - 11/01/2024, 12:17:53 PM LOG [Microservices:PersonService] Detected 3 new faces in asset 518ff394-e0ac-4010-9bd0-8cdd1655fdf4

[Nest] 35 - 11/01/2024, 12:17:59 PM LOG [Api:EventRepository] Websocket Disconnect: AY2xp-aysxh61x5qAAAF

[Nest] 7 - 11/01/2024, 12:18:00 PM LOG [Microservices:PersonService] Detected 2 new faces in asset 9cddd2eb-daed-4cff-a423-a3db6aa221e7

[Nest] 7 - 11/01/2024, 12:18:02 PM LOG [Microservices:PersonService] Detected 3 new faces in asset bfb2c201-0945-473c-8e68-2e92583727f3

[Nest] 7 - 11/01/2024, 12:18:09 PM LOG [Microservices:PersonService] Detected 2 new faces in asset b2a07624-6190-40de-b462-33d50cc1b3ae

[Nest] 7 - 11/01/2024, 12:18:13 PM LOG [Microservices:PersonService] Detected 3 new faces in asset ccfc7fab-37be-4c28-8c1b-25ecedab0d98

[Nest] 7 - 11/01/2024, 12:18:18 PM LOG [Microservices:PersonService] Detected 3 new faces in asset 8720237e-25e0-4617-8e87-ce7ceefe8301

[Nest] 7 - 11/01/2024, 12:18:24 PM LOG [Microservices:PersonService] Detected 1 new faces in asset 1a2e9327-7351-4f42-95c6-742ed4d71842

[Nest] 7 - 11/01/2024, 12:18:26 PM LOG [Microservices:PersonService] Detected 3 new faces in asset de9775d5-6cce-474e-baf8-76e5dc5f6f52

[Nest] 7 - 11/01/2024, 12:18:33 PM LOG [Microservices:PersonService] Detected 1 new faces in asset a4428814-2308-49b7-8278-eb481f2858d4

[Nest] 7 - 11/01/2024, 12:18:53 PM LOG [Microservices:PersonService] Detected 1 new faces in asset f7773362-b1cf-4616-9b7a-b2cae93e9d1b

Initializing Immich v1.119.1

Detected CPU Cores: 4

Starting api worker

Starting microservices worker

[Nest] 7 - 11/01/2024, 12:21:06 PM LOG [Microservices:EventRepository] Initialized websocket server

[Nest] 35 - 11/01/2024, 12:21:07 PM LOG [Api:EventRepository] Initialized websocket server

[Nest] 7 - 11/01/2024, 12:21:10 PM LOG [Microservices:MapRepository] Initializing metadata repository

@lupoalberto12

I have the same problem.
Here to follow.
Regards.

@alextran1502
Contributor

@werefkin Can you try restarting the stack with docker compose down followed by docker compose up?

@lupoalberto12 your problem may have a different cause in your setup, so please open a separate discussion thread with your setup and logs so we can help.

@werefkin
Author

werefkin commented Nov 1, 2024

> @werefkin Can you try restart the stack with docker compose down, docker compose up?

Hm, it's the same: it fails to start (a few failure counts at first, then an unhealthy status), but after some seemingly random amount of time it does start. I can't see a pattern yet.
I also redeployed the container and restored a snapshot of the library and database from a few days ago; the behavior is the same.

@alextran1502
Contributor

alextran1502 commented Nov 1, 2024

Can you try bringing down the containers, removing the images, then re-pulling and restarting?

Nvm, it started. So it was just downloading and ingesting the geocoding data.

@yunasc

yunasc commented Nov 1, 2024

How long did it take for your instance to start? I waited for 2 hours with no luck.

@werefkin
Author

werefkin commented Nov 1, 2024

> How long did it took for your instance to start? I tried waiting for 2 hours - no luck.

Not that long, a few minutes, though I still see the error and repeated start attempts. I completely removed the stack and restored it from my database backup.

@yunasc

yunasc commented Nov 1, 2024

Yikes! I know Immich expects a 3-2-1 backup setup, but from a v1.x.x release I would still expect some backwards compatibility, especially since the changelog mentioned nothing about this.

@bo0tzz
Member

bo0tzz commented Nov 1, 2024

To note: Immich does not follow semver yet. I'm not sure what you're referring to with the changelog/compat comment, though.

@werefkin
Author

werefkin commented Nov 1, 2024

> How long did it took for your instance to start? I tried waiting for 2 hours - no luck.

It took me 7 minutes, with 1 failed attempt and the TypeError mentioned above. I'll see how it performs; I may consider downgrading.

@alextran1502
Contributor

@zackpollard I wonder if we could add some logging to this process to give users feedback?

@zackpollard
Contributor

The geodata import really should take a few minutes at most; for most people it takes about 10 seconds. I'm surprised it's taking this long, but I can look into it at some point.

@avtarsingh1122

For me it took approximately 16 minutes to become healthy, but it has worked fine since.

@yunasc

yunasc commented Nov 3, 2024

> changelog/compat comment though?

nvm.

As for the issue: after switching to the main image from 13 hours ago, I can see that the slow part is PostgreSQL,
which is deployed from the Helm chart with the following (default) resources:

        limits:
          cpu: 150m
          ephemeral-storage: 2Gi
          memory: 192Mi
        requests:
          cpu: 100m
          ephemeral-storage: 50Mi
          memory: 128Mi
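To put these defaults in perspective: in Kubernetes resource quantities, the m suffix means thousandths of a CPU and Mi/Gi are binary (power-of-1024) units, so Postgres was capped at 0.15 of a CPU core and 192 MiB of memory. A small Python sketch of that conversion; it handles only the suffixes that appear in this snippet, not the full Kubernetes quantity grammar, and the function names are hypothetical.

```python
# Binary suffixes used in the limits above; the real grammar also has k, M, G, Ti, etc.
_BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def cpu_cores(quantity: str) -> float:
    """Convert a CPU quantity such as "150m" (millicores) or "2" into cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def bytes_from(quantity: str) -> int:
    """Convert a memory/storage quantity such as "192Mi" or "2Gi" into bytes."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain byte count


limits = {"cpu": "150m", "ephemeral-storage": "2Gi", "memory": "192Mi"}
print(cpu_cores(limits["cpu"]))                   # 0.15
print(bytes_from(limits["memory"]) // 1024**2)    # 192
```

Seen this way, a database that has to ingest the geodata while limited to 0.15 of a core makes the multi-hour startups reported in this thread less surprising.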

After removing the limits on PostgreSQL and restarting both Immich and PostgreSQL, it now finishes quickly:

[Nest] 17  - 11/03/2024, 4:43:38 PM     LOG [Api:EventRepository] Initialized websocket server
[Nest] 7  - 11/03/2024, 4:44:20 PM     LOG [Microservices:MapRepository] Geodata import completed
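Several comments ask how long startup took. Assuming, as this thread suggests, that the window between MapRepository's "Initializing metadata repository" and "Geodata import completed" messages is the geodata import, a Python sketch that measures it from log lines in the format shown above. The regex and function name are illustrative assumptions and may need adjusting if your logs format timestamps differently.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, Optional

# Matches lines like:
# [Nest] 7  - 11/03/2024, 4:44:20 PM     LOG [Microservices:MapRepository] Geodata import completed
_LINE = re.compile(
    r"^\[Nest\]\s+\d+\s+-\s+(?P<ts>\d{2}/\d{2}/\d{4}, \d{1,2}:\d{2}:\d{2} [AP]M)\s+"
    r"(?P<level>\w+)\s+\[(?P<context>[^\]]+)\]\s+(?P<msg>.*)$"
)
_ANSI = re.compile(r"\x1b\[[0-9;]*m")  # strip terminal colour codes, if present


def geodata_import_duration(lines: Iterable[str]) -> Optional[timedelta]:
    """Time between MapRepository's start and completion messages, or None if either is missing."""
    start = end = None
    for raw in lines:
        m = _LINE.match(_ANSI.sub("", raw).strip())
        if not m or not m["context"].endswith("MapRepository"):
            continue
        ts = datetime.strptime(m["ts"], "%m/%d/%Y, %I:%M:%S %p")
        if m["msg"].startswith("Initializing metadata repository"):
            start = ts
        elif m["msg"].startswith("Geodata import completed"):
            end = ts
    if start is None or end is None:
        return None
    return end - start
```

For example, save the output of docker logs immich_server (stdout and stderr) to a file and pass the open file to the function. A None result means the completion message has not appeared yet, which is itself a useful signal while the container is still unhealthy.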

@JvdMaat

JvdMaat commented Nov 3, 2024

I came here after an hour of troubleshooting, thinking I had messed up the port 3001 -> 2283 change (Immich had already been down for 4 hours by then). It turns out it took 5 hours for Postgres to sort itself out and let Immich boot. This would have been good to know before upgrading.

@werefkin
Author

werefkin commented Nov 3, 2024

> Came here after an hour of troubleshooting figuring I had messed up the port 3001->2283 conversion (after Immich had already been down for 4 hours). Turns out it took 5 hours for postgres to fix itself and allow immich to boot. This would have been good to know before upgrading.

Ha, I actually changed the port during troubleshooting as well (when I completely reinstalled Immich and restored the backup). Having just checked my previous docker compose file, that may explain why it started booting afterwards.

@werefkin
Author

werefkin commented Nov 4, 2024

@zackpollard

I'm wondering: the issue is closed, but was there a solution? The problem apparently affects many users.

@disconnect5852

I'm also having this problem: immich_server and immich_microservices are crashing. I reverted to v114, which I had upgraded from, and it works.

@werefkin
Author

werefkin commented Nov 5, 2024

> also me having this problem immich_server and immich_microservices are crashing. Reverted ti v114, where i upgraded from, and it works.

OK, I believe it is the same bug. The microservices indeed behave the same way as the server and fail to start. Did you change the version in the .env file to downgrade?

@zackpollard
Contributor

If you are commenting that you have the same issue, please include the hardware you're running on and confirm that you don't have any resource limits set on your containers. I did a bunch of testing yesterday and couldn't get the geocoding import to take more than 45 seconds, except when Postgres was severely restricted on RAM.
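For the hardware details requested above, a small Linux-only Python sketch that reports the CPU core count (os.cpu_count()) and total RAM and swap from /proc/meminfo. It does not check container resource limits; those still need to be confirmed separately, for example in your compose file or Helm values. The function names are hypothetical.

```python
import os
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; sizes there are reported in kB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Field: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def hardware_summary(meminfo_path: str = "/proc/meminfo") -> str:
    """One-line summary of CPU cores, total RAM and total swap (Linux only)."""
    with open(meminfo_path) as f:
        mem = parse_meminfo(f.read())
    kb_per_gib = 1024 * 1024
    return (
        f"CPU cores: {os.cpu_count()}, "
        f"RAM: {mem.get('MemTotal', 0) / kb_per_gib:.1f} GiB, "
        f"swap: {mem.get('SwapTotal', 0) / kb_per_gib:.1f} GiB"
    )
```

Pasting the output of print(hardware_summary()) alongside your logs gives maintainers the core count and memory figures in one line.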

@disconnect5852

> > also me having this problem immich_server and immich_microservices are crashing. Reverted ti v114, where i upgraded from, and it works.
>
> Ok, I believe it is the same bug. The microservices do have indeed the same behaviour as server and fail to start. Did you modify version in the env file to downgrade?

Yes, I set the version in .env.

My hardware is a 2-core Celeron N2807 with 4 GB RAM. Nothing ran out of resources; swap usage is only at 16%.
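Pinning IMMICH_VERSION in .env, as done here and as the comment in the .env file above suggests ("You can pin this to a specific version like "v1.71.0""), is a one-line edit. A Python sketch that sets or replaces a KEY=value line in .env text; the function name is hypothetical, and it does not judge whether a particular downgrade is safe for your database.

```python
import re


def set_env_value(env_text: str, key: str, value: str) -> str:
    """Return env_text with KEY=value set, replacing the first existing assignment or appending one."""
    pattern = re.compile(rf"^{re.escape(key)}=.*$", re.MULTILINE)
    line = f"{key}={value}"
    if pattern.search(env_text):
        # Use a function so backslashes in value are not treated as regex escapes.
        return pattern.sub(lambda _: line, env_text, count=1)
    suffix = "\n" if env_text and not env_text.endswith("\n") else ""
    return f"{env_text}{suffix}{line}\n"
```

After writing the result back to .env, docker compose pull followed by docker compose up -d fetches and starts the pinned image tag. Comment lines such as "# The Immich version to use..." are left alone because the pattern only matches lines that begin with the key itself.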

@werefkin
Author

werefkin commented Nov 5, 2024

> If people who are commenting they have the same issue please include what hardware you're running on and confirm that you don't have any resource limits set on your containers. I did a bunch of testing yesterday and couldn't get the geocoding import process to take any more than 45 seconds, except in the case that postgres resources were severely restricted for RAM.

I run it (together with Portainer and Nextcloud) on a Raspberry Pi 4 with 4 GB RAM. No limits are set, and the issue was not present in previous versions. As mentioned, it did not start at all right after the update. I made a completely new deployment of the stack and restored the backup; after that it starts, though it takes longer and has some startup failures.

@bo0tzz
Member

bo0tzz commented Nov 5, 2024

> 2 core celeron N2807 with 4 gb ram.

That's a seriously underpowered system. I don't think we can do much about Immich running poorly on that.

@zackpollard
Contributor

zackpollard commented Nov 5, 2024

> I run it (together with portrainer and nextcloud) on Raspberry Pi 4 with 4GB RAM. No limits are set, the issue was not present on the previous versions. As mentioned, it did not start at all right after the update, I made a completely new deployment of the stack and restored the backup after that it starts, though takes more time with some startup failures.

Could you expand on what you mean by "some startup failures"?

@JvdMaat

JvdMaat commented Nov 5, 2024

> That's a seriously underpowered system. I don't think we can do much about Immich running poorly on that.

I think a good chunk of the userbase runs it on underpowered systems (I'm running mine, along with 30 other containers, on an old 2014 laptop with 4 cores and 16 GB RAM).
I don't think the problem is that it takes a while; it would just help if that were communicated. If we know that a certain upgrade will take time to come back up because of database work, we can wait for it (and even look for it in our database CPU utilization, which clearly showed Postgres working hard for 5 hours).
All I saw when I upgraded (from 117 to 119) was the immich_server container failing its healthcheck, crashing, restarting, and reporting errors on port 2283 (which coincided with the v118 port change). So I ended up troubleshooting an issue that wasn't actually there.

On a side note, I just noticed that the laptop's CPU is now pegged at 100% (with a 15-minute load average of 20) because I started detection of missing faces last night. That's fine; I don't care if it takes a few days. The other containers are still responsive, and the faces will get detected eventually without further intervention from me.

@werefkin
Author

werefkin commented Nov 5, 2024

> Could you expand more what you mean by "some startup failures"?

@zackpollard An unhealthy status with several failure counts and the following output:

TypeError: fetch failed
    at node:internal/deps/undici/undici:13185:13
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async main (/usr/src/app/dist/bin/healthcheck.js:14:26) {
  [cause]: Error: connect ECONNREFUSED 127.0.0.1:2283
      at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1607:16) {
    errno: -111,
    code: 'ECONNREFUSED',
    syscall: 'connect',
    address: '127.0.0.1',
    port: 2283
  }
}

[Screenshot from 2024-11-05 15-05-05]

@werefkin
Author

werefkin commented Nov 5, 2024

OK, I wrote earlier that it started successfully after reinstalling the entire stack, but apparently it is still unstable. It is currently not running and stays in the starting phase.

@werefkin
Author

werefkin commented Nov 5, 2024

[Screenshot from 2024-11-05 15-31-19]
This is the latest output I have.

@werefkin
Author

werefkin commented Nov 5, 2024

> > That's a seriously underpowered system. I don't think we can do much about Immich running poorly on that.
>
> I think a good chunk of the userbase is running it on underpowered systems (I'm running mine (along with 30 other containers) on an old 2014 laptop with 4 cores and 16GB RAM.) I don't think the issue is that it takes a while. I think it would help if that was communicated. If we know that for a certain upgrade it'll take time to come back up due to DB work, we can wait for that (and even look for that in our DB CPU utilization, which clearly showed postgres working hard for 5 hours). All I saw when I upgraded (from 117 to 119) was that the immich_server container kept failing healthcheck, crashing and restarting and giving errors on the port 2283. (which coincided with the v118 port change). So I tried troubleshooting an issue that wasn't actually there.
>
> On a side note, I also just noticed that the laptop's CPU is pegged at 100% (and 15m load is at 20) now because I started a missing face detection last night. Which is fine. I don't care if that takes a few days. The other containers are still responsive, and the faces will get detected eventually without requiring further intervention from me.

You seem to be experiencing exactly the same symptoms. Did you downgrade, and if so, to which version?

@JvdMaat

JvdMaat commented Nov 5, 2024

> You seem to experience exactly the same symptoms. Did you downgrade it and if yes, to what version?

I upgraded (from 117 to 119), let it sit for 4 hours (yardwork), came back, noticed it was down, and spent an hour troubleshooting (updating ports in docker compose and SWAG, going back to 117, etc.). I finally went back up to 119, and then it suddenly recovered and worked. Check the CPU utilization of the postgres container. This is what mine looked like during the upgrade to 119:
[Image: postgres container CPU utilization during the upgrade to 119]
The dip at 16:15 is where I downgraded to 117, and I went back to 119 at 16:30. Eventually, once that CPU peak settled at 17:15, immich-server suddenly came online and stopped crashing.
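Watching the postgres container's CPU utilization, as suggested here, can be scripted. A minimal Python sketch, assuming the docker CLI is installed and can reach the daemon, that takes one snapshot with docker stats --no-stream and a --format template using the {{.Name}} and {{.CPUPerc}} placeholders, then parses it into percentages. The function names are hypothetical.

```python
import subprocess
from typing import Dict


def parse_stats(output: str) -> Dict[str, float]:
    """Parse lines of "<name> <cpu%>" (e.g. "immich_postgres 97.35%") into {name: percent}."""
    usage = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].endswith("%"):
            usage[parts[0]] = float(parts[1].rstrip("%"))
    return usage


def container_cpu() -> Dict[str, float]:
    """Take one docker stats snapshot (needs the docker CLI and access to the daemon)."""
    out = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{.Name}} {{.CPUPerc}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_stats(out)
```

Polling container_cpu().get("immich_postgres") every few minutes shows whether Postgres is still busy; in the case described above, immich-server came online only after that CPU peak settled. On multi-core hosts the percentage can exceed 100%, so compare it against your core count rather than against 100.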

@zackpollard
Contributor

> [Screenshot from 2024-11-05 15-31-19]
> this is the latest output I have

This is the output of the healthcheck; we need the output from the container logs.

@werefkin
Author

werefkin commented Nov 5, 2024

> This is the output of the healthcheck, we would need the output from the logs

@zackpollard I see no errors in the logs; I already sent them earlier. The thing is, it isn't really reproducible: I had Immich running normally yesterday, but today after a restart it didn't start at all, and then it started after a manual system reboot. I will try to catch it and post if I find something. Thank you.

@caothu159

After I upgraded the RAM to 4 GB and added 1 GB of swap, immich-server stopped restarting.
