Loki helm chart version upgrade from 5.44.4 to 6.5.0 issues in Single Binary deployment mode. #12912
Comments
Currently we are without monitoring due to the issue mentioned above; it would be really great if someone could assist with this.
Was able to fix all the issues. Thanks.
Care to share how you solved the issue?
@krptg0: The issue was related to the "parallelise_shardable_queries: true" setting. It used to live under "loki.query_range" in the chart version we used, 5.44.4, but after the upgrade to 6.5.0 it has to be moved to loki.structuredConfig.query_range, which also needs to be updated in the Grafana documentation page for now until there is a permanent fix. This seems to be a bug in the latest chart, and I saw another user had already raised a case for the same thing a few weeks back.
5.44.4 loki:
6.5.0 loki:
Thanks.
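(For reference, a minimal sketch of the placement described in the comment above; loki.structuredConfig is the chart's passthrough into the rendered Loki config, and the exact keys should be verified against the 6.x chart's values.yaml:)

    # 5.44.4 chart: the setting lived under loki.query_range
    loki:
      query_range:
        parallelise_shardable_queries: true

    # 6.x chart: per the comment above, move it under loki.structuredConfig
    loki:
      structuredConfig:
        query_range:
          parallelise_shardable_queries: true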
@krptg0 I don't see that solving the issue. Would you be able to share the config that worked for you? I am currently on 5.47.2.
The Helm file attached was suitable for upgrading, but a couple of pods still encountered errors.
In the gateway pod:
Fixed this by making a change to the Helm values: https://github.com/grafana/loki/blob/main/production/helm/loki/values.yaml#L337-L345
@sslny57 How does that fix the segmentation error?
Loki helm chart version upgrade from 5.44.4 to 6.5.0 issues in Single Binary deployment mode.
We are using Azure Kubernetes Service, consisting of 1 system node in a system node pool and 3 user nodes in a user node pool, for deploying Loki.
Kubernetes version: 1.29.2
Error
coalesce.go:286: warning: cannot overwrite table with non table for loki.singleBinary.affinity (map[podAntiAffinity:map[requiredDuringSchedulingIgnoredDuringExecution:[map[labelSelector:map[matchLabels:map[app.kubernetes.io/component:single-binary]] topologyKey:kubernetes.io/hostname]]]])
May 7th 2024 10:59:51 Error
coalesce.go:286: warning: cannot overwrite table with non table for loki.singleBinary.affinity (map[podAntiAffinity:map[requiredDuringSchedulingIgnoredDuringExecution:[map[labelSelector:map[matchLabels:map[app.kubernetes.io/component:single-binary]] topologyKey:kubernetes.io/hostname]]]])
May 7th 2024 10:59:51 Error
Error: UPGRADE FAILED: execution error at (loki/templates/validate.yaml:31:4): You have more than zero replicas configured for both the single binary and simple scalable targets. If this was intentional change the deploymentMode to the transitional 'SingleBinary<->SimpleScalable' mode
May 7th 2024 10:59:51 Error
Helm Upgrade returned non-zero exit code: 1. Deployment terminated.
May 7th 2024 10:59:51 Fatal
The remote script failed with exit code 1
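The UPGRADE FAILED message above is the 6.x chart's validation refusing a mix of deployment targets; a minimal sketch of values that keep only the single binary target active (using the same deploymentMode/read/write/backend keys that appear in the 6.5.0 values further below):

    deploymentMode: SingleBinary
    singleBinary:
      replicas: 1
    # zero out the simple scalable targets so the validation sees only one mode
    backend:
      replicas: 0
    read:
      replicas: 0
    write:
      replicas: 0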
ubuntu@NARU-Pr5530:~$ kubectl describe pod loki-chunks-cache-0 -n loki|tail -5
Type Reason Age From Message
Warning FailedScheduling 2m43s default-scheduler 0/4 nodes are available: 1 Insufficient memory, 4 Insufficient cpu. preemption: 0/4 nodes are available: 4 No preemption victims found for incoming pod.
Warning FailedScheduling 2m42s default-scheduler 0/4 nodes are available: 1 Insufficient memory, 4 Insufficient cpu. preemption: 0/4 nodes are available: 4 No preemption victims found for incoming pod.
Normal NotTriggerScaleUp 2m40s cluster-autoscaler pod didn't trigger scale-up: 1 max node group size reached
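The chunks-cache pod stays Pending because its memcached container requests more CPU/memory than the remaining nodes can provide. A minimal sketch of shrinking it, using the chunksCache.allocatedMemory key already present in the 6.5.0 values below (chunksCache.enabled is an assumed 6.x chart key for disabling the cache entirely and should be checked against the chart's values.yaml):

    chunksCache:
      # lower memcached memory so the pod fits on the user nodes
      allocatedMemory: 1024
      writebackSizeLimit: 10MB
      # assumption: disable the cache entirely if it is not needed
      # enabled: false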
If we do not use affinity in version 6.5.0, a few pods land on the system node, run into resource issues, and fail; we also do not want pods to land on the system node.
Is there any way to fix this?
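One option, suggested by the coalesce warnings above: in the 6.x chart, singleBinary.affinity appears to default to a map (the podAntiAffinity block printed in the warning), so passing it as a multi-line string no longer merges cleanly. A minimal sketch of the same node preference from the 5.44.4 values expressed as a map, assuming the 6.x chart accepts a structured affinity value here:

    singleBinary:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 50
              preference:
                matchExpressions:
                  - key: kubernetes.azure.com/mode
                    operator: In
                    values:
                      - user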
Values.yaml (used in 5.44.4)
---
# https://github.com/grafana/loki/blob/main/production/helm/loki/values.yaml
loki:
  auth_enabled: false
  query_scheduler:
    max_outstanding_requests_per_tenant: 2048
  query_range:
    parallelise_shardable_queries: false
    split_queries_by_interval: 0
  commonConfig:
    replication_factor: 1
  storage:
    type: filesystem
singleBinary:
  replicas: 1
  persistence:
    size: 50Gi
    enableStatefulSetAutoDeletePVC: true
  affinity: |
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - preference:
            matchExpressions:
              - key: kubernetes.azure.com/mode
                operator: In
                values:
                  - user
          weight: 50
Values.yaml (used in 6.5.0)
---
# https://github.com/grafana/loki/blob/main/production/helm/loki/values.yaml
deploymentMode: SingleBinary
loki:
  auth_enabled: false
  query_scheduler:
    max_outstanding_requests_per_tenant: 2048
  query_range:
    parallelise_shardable_queries: false
  limits_config:
    split_queries_by_interval: 0
  commonConfig:
    replication_factor: 1
  storage:
    type: filesystem
  schemaConfig:
    configs:
      - from: 2024-04-01
        object_store: filesystem
        store: tsdb
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
  ingester:
    chunk_encoding: snappy
  tracing:
    enabled: true
  querier:
    max_concurrent: 1
backend:
  replicas: 0
read:
  replicas: 0
write:
  replicas: 0
singleBinary:
  replicas: 1
  persistence:
    size: 50Gi
    enableStatefulSetAutoDeletePVC: true
    enabled: true
  extraArgs:
    - -config.expand-env=true
chunksCache:
  allocatedMemory: 1024
  writebackSizeLimit: 10MB
Error
ubuntu@NARU-Pr5530:~$ kubectl logs loki-0 -n loki
failed parsing config: /etc/loki/config/config.yaml: yaml: unmarshal errors:
  line 2: field Error not found in type loki.ConfigWrapper
ubuntu@NARU-Pr5530:~$
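To see what line 2 of the rendered config actually contains, it can help to inspect the generated config.yaml; a minimal sketch, assuming the chart repo is added as grafana and the 6.x default of storing the config in a ConfigMap named loki under the key config.yaml (loki.configStorageType can change this):

    # render the chart locally with the same values and check the generated config.yaml
    helm template loki grafana/loki -f values.yaml

    # or read the config that is already deployed in the cluster
    kubectl get configmap loki -n loki -o jsonpath='{.data.config\.yaml}'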
ubuntu@NARU-Pr5530:~$ kubectl get pods -n loki -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
loki-0 0/1 CrashLoopBackOff 1 (11s ago) 74s 10.101.80.28 aks-npu2-21504394-vmss00000f
loki-canary-6qngw 1/1 Running 0 74s 10.101.80.158 aks-npsystem01-10976478-vmss000000
loki-canary-6v6bz 1/1 Running 0 75s 10.101.80.136 aks-npu2-21504394-vmss000000
loki-canary-krnqv 1/1 Running 0 75s 10.101.80.240 aks-npu2-21504394-vmss00000f
loki-canary-twcl5 1/1 Running 0 75s 10.101.80.213 aks-npu2-21504394-vmss00000h
loki-chunks-cache-0 0/2 Pending 0 74s
loki-gateway-668c5dff6c-l7hd5 1/1 Running 0 74s 10.101.80.173 aks-npsystem01-10976478-vmss000000
loki-results-cache-0 2/2 Running 0 74s 10.101.80.175 aks-npsystem01-10976478-vmss000000
Any help in resolving this would be appreciated.