Dashboard stuck on loading after 0.29.4 > 0.30.0 update (Self-Hosted) if "JWT group sync" is enabled #2696

Closed
florian-obradovic opened this issue Oct 4, 2024 · 18 comments · Fixed by #2767
Assignees
Labels
bug Something isn't working jwt management-service

Comments

@florian-obradovic

florian-obradovic commented Oct 4, 2024

Describe the problem
After updating from 0.29.4 to 0.30.0, the dashboard is stuck on loading.
Slack: https://netbirdio.slack.com/archives/C05T5K65X7U/p1728077981790249

It works again if I switch back to image: netbirdio/management:0.29.4 in docker-compose.yml.
I just upgraded two separate Netbird instances. One instance works fine, but my private one is stuck on loading the dashboard.

Both instances use Entra ID (AAD) as IDP!

#!/usr/bin/env bash
# netbird_version_check.sh
# Print the image version label of the running management and dashboard containers.
MGMTCID=$(docker ps|grep management|awk '{print $1}')
if [[ -z $MGMTCID ]] ; then
       echo No MGMT container found...
       exit 1
fi
nbversion=$(docker inspect $MGMTCID |grep "org.opencontainers.image.version"|awk -F: '{print $2}')
echo Server is running Netbird management version: $nbversion
DASHBOARDCID=$(docker ps|grep dashboard|awk '{print $1}')
if [[ -z $DASHBOARDCID ]] ; then
       echo No dashboard container found...
       exit 1
fi
nbversion=$(docker inspect $DASHBOARDCID |grep "org.opencontainers.image.version"|awk -F: '{print $2}')
echo Server is running Netbird dashboard version: $nbversion
Server is running Netbird management version:  "0.30.0"
Server is running Netbird dashboard version:  "v2.6.0"
dashboard-1   | 9.9.9.9 - - [04/Oct/2024:21:46:04 +0000] "GET /settings HTTP/1.1" 200 2107 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36" "-"
management-1  | 2024-10-04T21:46:04Z ERRO [requestID: c2adc7c8-33f6-4e5e-a111-433e31e2de7e, context: HTTP] management/server/http/middleware/auth_middleware.go:89: Error when validating JWT claims: context canceled
management-1  | 2024-10-04T21:46:04Z ERRO [context: HTTP, requestID: c2adc7c8-33f6-4e5e-a111-433e31e2de7e] management/server/http/util/util.go:81: got a handler error: token invalid
management-1  | 2024-10-04T21:46:04Z ERRO [context: HTTP, requestID: c2adc7c8-33f6-4e5e-a111-433e31e2de7e] management/server/telemetry/http_api_metrics.go:168: HTTP response c2adc7c8-33f6-4e5e-a111-433e31e2de7e: GET /api/users status 401
management-1  | 2024-10-04T21:46:04Z ERRO [context: HTTP, requestID: fcf93833-2c24-4572-ab2e-fd37333bddd1] management/server/sql_store.go:433: error when getting account from the store: context canceled
management-1  | 2024-10-04T21:46:04Z ERRO [requestID: fcf93833-2c24-4572-ab2e-fd37333bddd1, context: HTTP] management/server/http/middleware/auth_middleware.go:89: Error when validating JWT claims: issue getting account from store
management-1  | 2024-10-04T21:46:04Z ERRO [context: HTTP, requestID: fcf93833-2c24-4572-ab2e-fd37333bddd1] management/server/http/util/util.go:81: got a handler error: token invalid
management-1  | 2024-10-04T21:46:04Z ERRO [context: HTTP, requestID: fcf93833-2c24-4572-ab2e-fd37333bddd1] management/server/telemetry/http_api_metrics.go:168: HTTP response fcf93833-2c24-4572-ab2e-fd37333bddd1: GET /api/groups status 401
management-1  | 2024-10-04T21:46:04Z ERRO [context: HTTP, requestID: 9feb2ce6-b786-4c0b-8171-403949d5fced] management/server/http/middleware/auth_middleware.go:89: Error when validating JWT claims: error getting user: issue getting user from store
management-1  | 2024-10-04T21:46:04Z ERRO [context: HTTP, requestID: 9feb2ce6-b786-4c0b-8171-403949d5fced] management/server/http/util/util.go:81: got a handler error: token invalid
management-1  | 2024-10-04T21:46:04Z ERRO [context: HTTP, requestID: 9feb2ce6-b786-4c0b-8171-403949d5fced] management/server/telemetry/http_api_metrics.go:168: HTTP response 9feb2ce6-b786-4c0b-8171-403949d5fced: GET /api/accounts status 401
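
Editor's note: logs like the ones above are easier to compare across reports when the failing calls are tallied per endpoint. The following is a minimal sketch, assuming the `HTTP response <id>: <METHOD> <path> status <code>` line format shown in this thread; the function name `summarize_failures` is hypothetical and not part of NetBird.

```shell
# Tally failed API calls (status >= 400) per endpoint from management logs on stdin.
summarize_failures() {
  awk '
    /HTTP response/ {
      # The line ends with: <METHOD> <path> status <code>
      for (i = 3; i < NF; i++) {
        if ($i == "status") {
          code = $(i + 1); path = $(i - 1); method = $(i - 2)
          sub(/\?.*/, "", path)   # drop any query string
          if (code + 0 >= 400) count[code " " method " " path]++
        }
      }
    }
    END { for (k in count) print count[k], k }
  ' | sort
}
```

It could be fed from the running stack with `docker compose logs management | summarize_failures`, which prints one line per status, method, and path, prefixed by the number of occurrences.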

Screenshots

CleanShot 2024-10-04 at 23 39 18@2x

docker-compose.yml

#version: "3"
services:
  #UI dashboard
  dashboard:
    image: netbirdio/dashboard:latest
    #image: netbirdio/dashboard:v2.5.0
    restart: unless-stopped
    ports:
      - 80:80
      - 443:443
    environment:
      # Endpoints
      - NETBIRD_MGMT_API_ENDPOINT=https://netbird.my-domain.com:33073
      - NETBIRD_MGMT_GRPC_API_ENDPOINT=https://netbird.my-domain.com:33073
      # OIDC
      - AUTH_AUDIENCE=GUID-GUID
      - AUTH_CLIENT_ID=GUID-GUID
      - AUTH_CLIENT_SECRET=
      - AUTH_AUTHORITY=https://login.microsoftonline.com/c7af6c1f-aaad-4101-b8c3-0f7766597a62/v2.0
      - USE_AUTH0=false
      - AUTH_SUPPORTED_SCOPES=openid profile email offline_access User.Read api://GUID-GUID/api
      - AUTH_REDIRECT_URI=/auth
      - AUTH_SILENT_REDIRECT_URI=/silent-auth
      - NETBIRD_TOKEN_SOURCE=idToken
      # SSL
      - NGINX_SSL_PORT=443
      # Letsencrypt
      - LETSENCRYPT_DOMAIN=netbird.my-domain.com
      - LETSENCRYPT_EMAIL=mec@my-domain.com
    volumes:
      - netbird-letsencrypt:/etc/letsencrypt/

  # Signal
  signal:
    image: netbirdio/signal:latest
    restart: unless-stopped
    volumes:
      - netbird-signal:/var/lib/netbird
    ports:
      - 10000:80
  #      # port and command for Let's Encrypt validation
  #      - 443:443
  #    command: ["--letsencrypt-domain", "netbird.my-domain.com", "--log-file", "console"]

  # Management
  management:
    image: netbirdio/management:latest
    #image: netbirdio/management:0.29.4
    restart: unless-stopped
    depends_on:
      - dashboard
    volumes:
      - netbird-mgmt:/var/lib/netbird
      - netbird-letsencrypt:/etc/letsencrypt:ro
      - ./management.json:/etc/netbird/management.json
    ports:
      - 33073:443 #API port
  #    # command for Let's Encrypt validation without dashboard container
  #    command: ["--letsencrypt-domain", "netbird.my-domain.com", "--log-file", "console"]
    command: [
      "--port", "443",
      "--log-file", "console",
      "--log-level", "info",
      "--disable-anonymous-metrics=false",
      "--single-account-mode-domain=netbird.my-domain.com",
      "--dns-domain=ivo"
      ]

  # Coturn
  coturn:
    image: coturn/coturn:latest
    restart: unless-stopped
    domainname: netbird.my-domain.com
    volumes:
      - ./turnserver.conf:/etc/turnserver.conf:ro
    #      - ./privkey.pem:/etc/coturn/private/privkey.pem:ro
    #      - ./cert.pem:/etc/coturn/certs/cert.pem:ro
    network_mode: host
    command:
      - -c /etc/turnserver.conf

volumes:
  netbird-mgmt:
  netbird-signal:
  netbird-letsencrypt:
@florian-obradovic
Author

florian-obradovic commented Oct 4, 2024

I turned off “Enable JWT group sync”, updated to 0.30.0, and it works.
CleanShot 2024-10-05 at 00 10 48@2x

If I turn it back on, the dashboard is stuck loading again!

Is this caused by #2690?

@florian-obradovic changed the title on Oct 4, 2024, from: Dashboard stuck on loading after 0.29.4 > 0.30.0 update (Self-Hosted), to: Dashboard stuck on loading after 0.29.4 > 0.30.0 update (Self-Hosted) if "JWT group sync" is enabled
@marcportabellaclotet-mt

Same behaviour after upgrading to v0.30.0
Log errors:

2024-10-05T12:54:36Z ERRO [context: HTTP, requestID: 9f883e98-98f6-4645-84ea-434b47239be7] management/server/http/middleware/auth_middleware.go:89: Error when validating JWT claims: user 7 not found
2024-10-05T12:54:36Z ERRO [context: HTTP, requestID: 9f883e98-98f6-4645-84ea-434b47239be7] management/server/http/util/util.go:81: got a handler error: token invalid
2024-10-05T12:54:36Z ERRO [context: HTTP, requestID: 9f883e98-98f6-4645-84ea-434b47239be7] management/server/telemetry/http_api_metrics.go:168: HTTP response 9f883e98-98f6-4645-84ea-434b47239be7: GET /api/accounts status 401
2024-10-05T12:54:36Z ERRO [context: HTTP, requestID: c594bc11-d15e-4eba-9f8d-269ac6ae5233, accountID: , userID: 7] management/server/sql_store.go:433: error when getting account from the store: context canceled
2024-10-05T12:54:36Z ERRO [accountID: , userID: 7, context: HTTP, requestID: c594bc11-d15e-4eba-9f8d-269ac6ae5233] management/server/http/util/util.go:81: got a handler error: issue getting account from store
2024-10-05T12:54:36Z ERRO [context: HTTP, requestID: c594bc11-d15e-4eba-9f8d-269ac6ae5233] management/server/telemetry/http_api_metrics.go:168: HTTP response c594bc11-d15e-4eba-9f8d-269ac6ae5233: GET /api/users?service_user=false status 500
2024-10-05T12:56:15Z ERRO [context: HTTP, requestID: 19d4b100-36ee-4346-98bb-e53dbf9346f3, accountID: , userID: 7] management/server/http/middleware/access_control.go:52: failed to get user from claims: failed to get account with token claims context canceled
2024-10-05T12:56:15Z ERRO [context: HTTP, requestID: 19d4b100-36ee-4346-98bb-e53dbf9346f3, accountID: , userID: 7] management/server/http/util/util.go:81: got a handler error: invalid JWT
2024-10-05T12:56:15Z ERRO [requestID: 19d4b100-36ee-4346-98bb-e53dbf9346f3, context: HTTP] management/server/telemetry/http_api_metrics.go:168: HTTP response 19d4b100-36ee-4346-98bb-e53dbf9346f3: GET /api/routes status 401
2024-10-05T12:56:15Z ERRO [accountID: , userID: 7, context: HTTP, requestID: 855bc728-7066-4c61-9862-c1e27a94ccbc] management/server/http/middleware/access_control.go:52: failed to get user from claims: failed to get account with token claims issue getting account from store

@bcmmbaga self-assigned this on Oct 8, 2024
@bcmmbaga added the bug, management-service, and jwt labels and removed the triage-needed label on Oct 8, 2024
@bcmmbaga
Contributor

bcmmbaga commented Oct 8, 2024

Hi @marcportabellaclotet-mt, @florian-obradovic,

I'm currently trying to reproduce the issue on my end. I'll update you as soon as I have more information.

@bcmmbaga
Contributor

bcmmbaga commented Oct 8, 2024

Could you share details of your reverse proxy configuration, if applicable? For example: timeouts, connection/request limits, and load balancer settings.

@marcportabellaclotet-mt

I re-tested the setup today with the same config, and it now works as expected.
I am not able to reproduce the issue any more. I will keep testing.

@onyxkyr

onyxkyr commented Oct 9, 2024

I just set up a Netbird stack, and I am running into the same problem. With the latest version (management:latest) I can log in fine, until I enable JWT Group Sync AND enter the claim name. Once I do, the web interface does not load and management logs the following errors:

management-1  | 2024-10-09T14:22:56Z ERRO [context: HTTP, requestID: 5cdf1172-110a-43fe-89ed-9c1537018209] management/server/http/middleware/auth_middleware.go:89: Error when validating JWT claims: error getting user: issue getting user from store
management-1  | 2024-10-09T14:22:56Z ERRO [requestID: 5cdf1172-110a-43fe-89ed-9c1537018209, context: HTTP] management/server/http/util/util.go:81: got a handler error: token invalid
management-1  | 2024-10-09T14:22:56Z ERRO [context: HTTP, requestID: 5cdf1172-110a-43fe-89ed-9c1537018209] management/server/telemetry/http_api_metrics.go:168: HTTP response 5cdf1172-110a-43fe-89ed-9c1537018209: GET /api/users status 401

With management 0.29.4 it works fine, and all the users' groups are propagated.
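
Editor's note: since the failure only shows up once a claim name is entered, it can help to confirm what the token actually carries under that claim. The sketch below decodes a JWT payload locally; it assumes GNU `base64 -d` is available, it does not verify the signature, and the function name `jwt_payload` is hypothetical. Which claim holds the groups (for example `groups`) depends on the IdP configuration and is an assumption here.

```shell
# Print the decoded payload (second dot-separated segment) of the JWT given as $1.
jwt_payload() {
  local payload=${1#*.}      # drop the header segment
  payload=${payload%%.*}     # drop the signature segment
  # base64url -> base64: restore the standard alphabet and the stripped padding
  payload=$(printf '%s' "$payload" | tr '_-' '/+')
  case $(( ${#payload} % 4 )) in
    2) payload="$payload==" ;;
    3) payload="$payload=" ;;
  esac
  printf '%s' "$payload" | base64 -d
}
```

With a token copied from the browser's developer tools, `jwt_payload "$TOKEN"` prints the claims as JSON, so the configured claim name can be compared against what the IdP really sends. Treat the token as a secret and avoid pasting it into online decoders.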

My setup is as follows:
I am running the Netbird stack in an LXC container on Proxmox, behind an nginx reverse proxy. Authentik serves as the IdP.
The LXC container running Netbird is internally reachable for nginx under 192.168.152.200.
Authentik is reachable for the docker stack as sso.MYSSO.de under 192.168.151.102.
Netbird is reachable globally under MYDOMAIN.de.
Secrets etc. are replaced with placeholders. I hope all relevant files and configurations are below:

nginx configuration
upstream dashboard {
    # insert the http port of your dashboard container here
    server 192.168.152.200:8011;

    # Improve performance by keeping some connections alive.
    keepalive 10;
}
upstream signal {
    # insert the grpc port of your signal container here
    server 192.168.152.200:10000;
}
upstream management {
    # insert the grpc+http port of your management container here
    server 192.168.152.200:8012;
}
server {
    # HTTP server config
    listen 80;
    server_name MYDOMAIN.de;

    # 301 redirect to HTTPS
    location / {
            return 301 https://$host$request_uri;
    }
}
server {
    # HTTPS server config
    listen 443 ssl http2;
    server_name MYDOMAIN.de;

    # This is necessary so that grpc connections do not get closed early
    # see https://stackoverflow.com/a/67805465
    client_header_timeout 1d;
    client_body_timeout 1d;

        ssl_certificate /etc/acme/mycerts/MYDOMAIN.de/fullchain.cer;
        ssl_certificate_key /etc/acme/mycerts/MYDOMAIN.de/MYDOMAIN.de.key;
    ssl_protocols TLSv1.2;# Requires nginx >= 1.13.0 else use TLSv1.2
        ssl_prefer_server_ciphers on;
        ssl_dhparam /etc/nginx/dhparam.pem; # openssl dhparam -out /etc/nginx/dhparam.pem 4096
        ssl_ciphers ECDHE-RSA-AES256-GCM-SHA512:DHE-RSA-AES256-GCM-SHA512:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-SHA384;
        ssl_ecdh_curve secp384r1; # Requires nginx >= 1.1.0
        ssl_session_timeout  10m;
        ssl_session_cache shared:SSL:10m;
        ssl_session_tickets off; # Requires nginx >= 1.5.9
        ssl_stapling on; # Requires nginx >= 1.3.7
        ssl_stapling_verify on; # Requires nginx >= 1.3.7

    proxy_set_header        X-Real-IP $remote_addr;
    proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header        X-Scheme $scheme;
    proxy_set_header        X-Forwarded-Proto https;
    proxy_set_header        X-Forwarded-Host $host;

    # Proxy dashboard
    location / {
        proxy_pass http://dashboard;
    }
    # Proxy Signal
    location /signalexchange.SignalExchange/ {
        grpc_pass grpc://signal;
        #grpc_ssl_verify off;
        grpc_read_timeout 1d;
        grpc_send_timeout 1d;
        #grpc_socket_keepalive on;
    }
    # Proxy Management http endpoint
    location /api {
        proxy_pass http://management;
    }
    # Proxy Management grpc endpoint
    location /management.ManagementService/ {
        grpc_pass grpc://management;
        #grpc_ssl_verify off;
        grpc_read_timeout 1d;
        grpc_send_timeout 1d;
        #grpc_socket_keepalive on;
    }
}
docker-compose.yaml
version: "3"
services:
  #UI dashboard
  dashboard:
    image: netbirdio/dashboard:latest
    restart: unless-stopped
    extra_hosts:
      - "sso.MYSSO.de:192.168.151.102"
    ports:
      - 8011:80
      #- 443:443
    environment:
      # Endpoints
      - NETBIRD_MGMT_API_ENDPOINT=https://MYDOMAIN.de:443
      - NETBIRD_MGMT_GRPC_API_ENDPOINT=https://MYDOMAIN.de:443
      # OIDC
      - AUTH_AUDIENCE=CLIENT_ID
      - AUTH_CLIENT_ID=CLIENT_ID
      - AUTH_CLIENT_SECRET=
      - AUTH_AUTHORITY=https://sso.MYSSO.de/application/o/netbird-MYDOMAIN/
      - USE_AUTH0=false
      - AUTH_SUPPORTED_SCOPES=openid profile email offline_access api groups
      - AUTH_REDIRECT_URI=
      - AUTH_SILENT_REDIRECT_URI=
      - NETBIRD_TOKEN_SOURCE=accessToken
      # SSL
      - NGINX_SSL_PORT=443
      # Letsencrypt
      - LETSENCRYPT_DOMAIN=
      - LETSENCRYPT_EMAIL=
    volumes:
      - netbird-letsencrypt:/etc/letsencrypt/
    logging:
      driver: "json-file"
      options:
        max-size: "500m"
        max-file: "2"
  # Signal
  signal:
    image: netbirdio/signal:latest
    restart: unless-stopped
    extra_hosts:
      - "sso.MYSSO.de:192.168.151.102"
    volumes:
      - netbird-signal:/var/lib/netbird
    ports:
      - 10000:80
  #      # port and command for Let's Encrypt validation
  #      - 443:443
  #    command: ["--letsencrypt-domain", "", "--log-file", "console"]
    logging:
      driver: "json-file"
      options:
        max-size: "500m"
        max-file: "2"
  # Relay
  relay:
    image: netbirdio/relay:latest
    restart: unless-stopped
    extra_hosts:
      - "sso.MYSSO.de:192.168.151.102"
    environment:
    - NB_LOG_LEVEL=info
    - NB_LISTEN_ADDRESS=:33080
    - NB_EXPOSED_ADDRESS=MYDOMAIN.de:33080
    # todo: change to a secure secret
    - NB_AUTH_SECRET=RELAY_SECRET
    ports:
      - 33080:33080
    logging:
      driver: "json-file"
      options:
        max-size: "500m"
        max-file: "2"

  # Management
  management:
    image: netbirdio/management:0.29.4
    restart: unless-stopped
    extra_hosts:
      - "sso.MYSSO.de:192.168.151.102"
    depends_on:
      - dashboard
    volumes:
      - netbird-mgmt:/var/lib/netbird
      - netbird-letsencrypt:/etc/letsencrypt:ro
      - ./management.json:/etc/netbird/management.json
    ports:
      - 8012:443 #API port
  #    # command for Let's Encrypt validation without dashboard container
  #    command: ["--letsencrypt-domain", "", "--log-file", "console"]
    command: [
      "--port", "443",
      "--log-file", "console",
      "--log-level", "info",
      "--disable-anonymous-metrics=true",
      "--single-account-mode-domain=MYDOMAIN.de",
      "--dns-domain=MYDOMAIN.de"
      ]
    logging:
      driver: "json-file"
      options:
        max-size: "500m"
        max-file: "2"
    environment:
      - NETBIRD_STORE_ENGINE_POSTGRES_DSN=

  # Coturn
  #coturn:
  #  image: coturn/coturn:latest
  #  restart: unless-stopped
    #domainname: MYDOMAIN.de # only needed when TLS is enabled
  #  volumes:
  #    - ./turnserver.conf:/etc/turnserver.conf:ro
    #      - ./privkey.pem:/etc/coturn/private/privkey.pem:ro
    #      - ./cert.pem:/etc/coturn/certs/cert.pem:ro
  #  network_mode: host
  #  command:
  #    - -c /etc/turnserver.conf
  #  logging:
  #    driver: "json-file"
  #    options:
  #      max-size: "500m"
  #      max-file: "2"
volumes:
  netbird-mgmt:
  netbird-signal:
  netbird-letsencrypt:
management.json
{
    "Stuns": [
        {
            "Proto": "udp",
            "URI": "stun:MYDOMAIN.de:3478",
            "Username": "",
            "Password": ""
        }
    ],
    "TURNConfig": {
        "TimeBasedCredentials": false,
        "CredentialsTTL": "12h0m0s",
        "Secret": "secret",
        "Turns": [
            {
                "Proto": "udp",
                "URI": "turn:MYDOMAIN.de:3478",
                "Username": "self",
                "Password": "PASSWORTTURN"
            }
        ]
    },
    "Relay": {
        "Addresses": [
            "rel://MYDOMAIN.de:33080"
        ],
        "CredentialsTTL": "24h0m0s",
        "Secret": "RELAY_SECRET"
    },
    "Signal": {
        "Proto": "https",
        "URI": "MYDOMAIN.de:443",
        "Username": "",
        "Password": ""
    },
    "Datadir": "/var/lib/netbird/",
    "DataStoreEncryptionKey": "DATA_SECRET",
    "HttpConfig": {
        "LetsEncryptDomain": "",
        "CertFile": "",
        "CertKey": "",
        "AuthAudience": "CLIENT_ID",
        "AuthIssuer": "https://sso.MYSSO.de/application/o/netbird-MYDOMAIN/",
        "AuthUserIDClaim": "",
        "AuthKeysLocation": "https://sso.MYSSO.de/application/o/netbird-MYDOMAIN/jwks/",
        "OIDCConfigEndpoint": "https://sso.MYSSO.de/application/o/netbird-MYDOMAIN/.well-known/openid-configuration",
        "IdpSignKeyRefreshEnabled": false,
        "ExtraAuthAudience": ""
    },
    "IdpManagerConfig": {
        "ManagerType": "authentik",
        "ClientConfig": {
            "Issuer": "https://sso.MYSSO.de/application/o/netbird-MYDOMAIN",
            "TokenEndpoint": "https://sso.MYSSO.de/application/o/token/",
            "ClientID": "CLIENT_ID",
            "ClientSecret": "",
            "GrantType": "client_credentials"
        },
        "ExtraConfig": {
            "Password": "Netbird-PW",
            "Username": "Netbird"
        },
        "Auth0ClientCredentials": null,
        "AzureClientCredentials": null,
        "KeycloakClientCredentials": null,
        "ZitadelClientCredentials": null
    },
    "DeviceAuthorizationFlow": {
        "Provider": "hosted",
        "ProviderConfig": {
            "ClientID": "CLIENT_ID",
            "ClientSecret": "",
            "Domain": "sso.MYSSO.de",
            "Audience": "CLIENT_ID",
            "TokenEndpoint": "https://sso.MYSSO.de/application/o/token/",
            "DeviceAuthEndpoint": "https://sso.MYSSO.de/application/o/device/",
            "AuthorizationEndpoint": "",
            "Scope": "openid",
            "UseIDToken": false,
            "RedirectURLs": null
        }
    },
    "PKCEAuthorizationFlow": {
        "ProviderConfig": {
            "ClientID": "CLIENT_ID",
            "ClientSecret": "",
            "Domain": "",
            "Audience": "CLIENT_ID",
            "TokenEndpoint": "https://sso.MYSSO.de/application/o/token/",
            "DeviceAuthEndpoint": "",
            "AuthorizationEndpoint": "https://sso.MYSSO.de/application/o/authorize/",
            "Scope": "openid profile email offline_access api groups",
            "UseIDToken": false,
            "RedirectURLs": [
                "http://localhost:53000"
            ]
        }
    },
    "StoreConfig": {
        "Engine": "sqlite"
    },
    "ReverseProxy": {
        "TrustedHTTPProxies": [],
        "TrustedHTTPProxiesCount": 0,
        "TrustedPeers": [
            "0.0.0.0/0"
        ]
    }
}

@mgarces
Contributor

mgarces commented Oct 10, 2024

Hi there, can you please update to our latest release, v0.30.1, and try again?

@onyxkyr

onyxkyr commented Oct 10, 2024

Thanks for your quick reply, and thanks to configure.sh it's quickly tested.
However, for me 0.30.1 has the same issue as 0.30.0.
If I open the network tab in my browser's web developer console, it's the /api/users call that takes some time and then results in an HTTP 504 after about 6 seconds, repeatedly.

I have about half an hour if you want to do some quick live debugging on my system.

@mlsmaycon
Collaborator

@onyxkyr can you confirm whether you have JWT group sync and peer group propagation enabled?

@onyxkyr

onyxkyr commented Oct 10, 2024

I think so: (Screenshot with management:0.29.4)
grafik

@onyxkyr

onyxkyr commented Oct 10, 2024

With management:0.30.1 I can find the following errors in the management logs

2024-10-10T17:07:04Z INFO [context: SYSTEM] management/server/account.go:1208: warmed up IDP cache with 1 entries for 1 accounts
2024-10-10T17:07:04Z INFO [context: SYSTEM] management/cmd/management.go:305: running gRPC backward compatibility server: [::]:33073
2024-10-10T17:07:04Z INFO [context: SYSTEM] management/cmd/management.go:337: management server version 0.30.1
2024-10-10T17:07:04Z INFO [context: SYSTEM] management/cmd/management.go:338: running HTTP server and gRPC server on the same port: [::]:443
2024-10-10T17:07:13Z ERRO [context: HTTP, requestID: 94662009-a148-4334-83c5-52acf19a2048] management/server/sql_store.go:440: error when getting account from the store: context canceled
2024-10-10T17:07:13Z ERRO [context: HTTP, requestID: 94662009-a148-4334-83c5-52acf19a2048] management/server/http/middleware/auth_middleware.go:89: Error when validating JWT claims: issue getting account from store: context canceled
2024-10-10T17:07:13Z ERRO [context: HTTP, requestID: 94662009-a148-4334-83c5-52acf19a2048] management/server/http/util/util.go:81: got a handler error: token invalid
2024-10-10T17:07:13Z ERRO [requestID: 94662009-a148-4334-83c5-52acf19a2048, context: HTTP] management/server/telemetry/http_api_metrics.go:168: HTTP response 94662009-a148-4334-83c5-52acf19a2048: GET /api/groups status 401
2024-10-10T17:07:13Z ERRO [context: HTTP, requestID: c3028b79-caa0-41c9-a477-54a0a964d079] management/server/sql_store.go:440: error when getting account from the store: context canceled
2024-10-10T17:07:13Z ERRO [context: HTTP, requestID: c3028b79-caa0-41c9-a477-54a0a964d079] management/server/http/middleware/auth_middleware.go:89: Error when validating JWT claims: issue getting account from store: context canceled
2024-10-10T17:07:13Z ERRO [context: HTTP, requestID: c3028b79-caa0-41c9-a477-54a0a964d079] management/server/http/util/util.go:81: got a handler error: token invalid
2024-10-10T17:07:13Z ERRO [context: HTTP, requestID: c3028b79-caa0-41c9-a477-54a0a964d079] management/server/telemetry/http_api_metrics.go:168: HTTP response c3028b79-caa0-41c9-a477-54a0a964d079: GET /api/peers status 401
2024-10-10T17:07:13Z ERRO [context: HTTP, requestID: 2dd73371-2cb8-40a3-ab04-def8fba10318] management/server/sql_store.go:440: error when getting account from the store: context canceled
2024-10-10T17:07:13Z ERRO [requestID: 2dd73371-2cb8-40a3-ab04-def8fba10318, context: HTTP] management/server/http/middleware/auth_middleware.go:89: Error when validating JWT claims: issue getting account from store: context canceled
2024-10-10T17:07:13Z ERRO [context: HTTP, requestID: 2dd73371-2cb8-40a3-ab04-def8fba10318] management/server/http/util/util.go:81: got a handler error: token invalid
2024-10-10T17:07:13Z ERRO [requestID: 2dd73371-2cb8-40a3-ab04-def8fba10318, context: HTTP] management/server/telemetry/http_api_metrics.go:168: HTTP response 2dd73371-2cb8-40a3-ab04-def8fba10318: GET /api/routes status 401
2024-10-10T17:07:13Z ERRO [context: HTTP, requestID: b4ff60c0-a69d-4b0f-9a1d-bfda132eec41] management/server/http/middleware/auth_middleware.go:89: Error when validating JWT claims: error getting user: issue getting user from store
2024-10-10T17:07:13Z ERRO [context: HTTP, requestID: b4ff60c0-a69d-4b0f-9a1d-bfda132eec41] management/server/http/util/util.go:81: got a handler error: token invalid
2024-10-10T17:07:13Z ERRO [context: HTTP, requestID: b4ff60c0-a69d-4b0f-9a1d-bfda132eec41] management/server/telemetry/http_api_metrics.go:168: HTTP response b4ff60c0-a69d-4b0f-9a1d-bfda132eec41: GET /api/users status 401

@DyllasDek

Any movement on this? I am trying to set up roles following the Quickstart guide, but I run into the same error:

2024-10-14T14:09:48Z ERRO [requestID: 8685c3d2-94c7-44e6-9b1c-445d27a684b0, context: HTTP] management/server/sql_store.go:566: error when getting account cs6icb07pats73al9pag from the store: record not found
2024-10-14T14:10:35Z INFO [context: HTTP, requestID: a233e5da-c93e-4573-bc58-f4322e42372f] management/server/account.go:1525: cache invalid. Users unknown to the cache: 1
2024-10-14T14:10:35Z INFO [context: HTTP, requestID: a233e5da-c93e-4573-bc58-f4322e42372f] management/server/account.go:1486: refreshing cache for account cs6icb07pats73al9pag
2024-10-14T14:23:04Z INFO [context: HTTP, requestID: 922d425d-516e-4048-a3c6-3452954ea599] management/server/account.go:1525: cache invalid. Users unknown to the cache: 1
2024-10-14T14:23:04Z INFO [context: HTTP, requestID: 922d425d-516e-4048-a3c6-3452954ea599] management/server/account.go:1486: refreshing cache for account cs6icb07pats73al9pag
2024-10-14T14:25:52Z ERRO [requestID: ff836dee-bed0-4084-a78e-94747dd812ee, context: HTTP] management/server/sql_store.go:440: error when getting account from the store: context canceled
2024-10-14T14:25:52Z ERRO [requestID: ff836dee-bed0-4084-a78e-94747dd812ee, context: HTTP] management/server/http/middleware/auth_middleware.go:89: Error when validating JWT claims: issue getting account from store: context canceled
2024-10-14T14:25:52Z ERRO [context: HTTP, requestID: ff836dee-bed0-4084-a78e-94747dd812ee] management/server/http/util/util.go:81: got a handler error: token invalid
2024-10-14T14:25:52Z ERRO [context: HTTP, requestID: ff836dee-bed0-4084-a78e-94747dd812ee] management/server/telemetry/http_api_metrics.go:168: HTTP response ff836dee-bed0-4084-a78e-94747dd812ee: GET /api/users status 401
2024-10-14T14:27:21Z ERRO [requestID: af6fb7fb-c584-4e59-b1a7-cb413c799852, context: HTTP] management/server/sql_store.go:440: error when getting account from the store: context canceled
2024-10-14T14:27:21Z ERRO [context: HTTP, requestID: af6fb7fb-c584-4e59-b1a7-cb413c799852] management/server/http/middleware/auth_middleware.go:89: Error when validating JWT claims: issue getting account from store: context canceled
2024-10-14T14:27:21Z ERRO [context: HTTP, requestID: af6fb7fb-c584-4e59-b1a7-cb413c799852] management/server/http/util/util.go:81: got a handler error: token invalid

@florian-obradovic
Author

@DyllasDek
Is this a fresh install or did you update from 0.29.4?
For testing, you can stop the management container, rename store.db (the sqlite3 database), and start with a fresh database.
What's your IDP? You mention the quick start guide; is your IDP Zitadel?
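
Editor's note: the rename step described above can be sketched as a small helper. This is an illustration under assumptions, not an official procedure: the function name `set_aside_store` is hypothetical, and the data directory must be the host path that backs /var/lib/netbird in the management container (for a named volume such as netbird-mgmt, `docker volume inspect` reports it as the Mountpoint; the actual volume name usually carries a compose project prefix).

```shell
# Move store.db aside inside the given data directory so that management
# starts with an empty database. Run only while the container is stopped.
set_aside_store() {
  local datadir=$1
  local src="$datadir/store.db"
  local dst="$src.bak.$(date +%Y%m%d%H%M%S)"
  if [ ! -f "$src" ]; then
    echo "no store.db found in $datadir" >&2
    return 1
  fi
  # Keep the old database under a timestamped name and print the new path.
  mv -- "$src" "$dst" && echo "$dst"
}
```

A possible sequence is `docker compose stop management`, then `set_aside_store <mountpoint>`, then `docker compose start management`. Because the old file is only renamed, it can be moved back to store.db to return to the previous state.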

@DyllasDek

DyllasDek commented Oct 14, 2024

> @DyllasDek Is this a fresh install or did you update from 0.29.4? For testing, you can stop the management container, rename store.db (the sqlite3 database), and start with a fresh database. What's your IDP? You mention the quick start guide; is your IDP Zitadel?

I installed it today following the Quickstart guide. It's a fresh install.
IDP: Zitadel
When I press "Enable JWT group sync", management starts throwing the errors I provided above. The only thing that helps is deleting the volume with the sqlite database.

@salvatorebic

Hi,
I think I have encountered the same issue, with Netbird going into a loop with the following error:

2024-10-15T07:58:31Z ERRO [context: HTTP, requestID: 94f63b34-b1a4-413e-89bc-827199bb5169] management/server/http/util/util.go:81: got a handler error: token invalid
2024-10-15T07:58:31Z ERRO [context: HTTP, requestID: 94f63b34-b1a4-413e-89bc-827199bb5169] management/server/telemetry/http_api_metrics.go:168: HTTP response 94f63b34-b1a4-413e-89bc-827199bb5169: GET /api/users status 401
2024-10-15T07:58:31Z ERRO [context: HTTP, requestID: af258f6d-ba31-4b83-936f-e5604b41ba33] management/server/sql_store.go:433: error when getting account from the store: context canceled
2024-10-15T07:58:31Z ERRO [context: HTTP, requestID: af258f6d-ba31-4b83-936f-e5604b41ba33] management/server/http/middleware/auth_middleware.go:89: Error when validating JWT claims: issue getting account from store

I am trying to sync groups from Zitadel to Netbird.

I had to delete the volume to recover.

@mlsmaycon
Collaborator

The fix will come in the next release. In the meantime, you can work around the issue by disabling JWT group sync.

@florian-obradovic
Author

Works great!

Thanks a lot @mlsmaycon and the whole team for the effort and tireless debugging sessions :)

Keep up the great work!

@the-project-group

I ran into a similar issue again today after updating 0.31.0 > 0.34.0 and Dashboard 2.7.0 > 2.7.1.
The dashboard was stuck loading again.
The fix was easy: restart the management container a second time:
docker compose restart management


9 participants