
Falco 0.33.1 "node name does not correspond to a node in the cluster" during startup due to jq filter failure on NotReady node with status.addresses missing #2358

Closed
wuub opened this issue Jan 16, 2023 · 1 comment · Fixed by falcosecurity/libs#833

wuub commented Jan 16, 2023

Describe the bug

When starting Falco on EKS with:

        - '--k8s-node'
        - $(FALCO_K8S_NODE_NAME)

we've experienced whole-DaemonSet failures (all new pods failing to start or restart), reporting errors like: Error fetching K8s data: Failing to enrich events with Kubernetes metadata: node name does not correspond to a node in the cluster: ip-xxx-yy-zzz-www.us-west-1.compute.internal. After some digging, and after enabling libs_logger.enabled: true, we were able to narrow it down to https://github.com/falcosecurity/libs/blob/01c07df720708f19b6ba3e2f6857bddb8c2c4779/userspace/libsinsp/socket_handler.h#L792, which causes this error line:

[libs]: Socket handler (k8s_node_handler_state), [https://172.20.0.1] filter processing error "json_query filtering result invalid."; JSON: <{"kind":"NodeList","apiVersion":"v1","metadata":{[HUMONGOUS-API-RESPONSE]}}>, jq filter: <{ type: "ADDED", apiVersion: .apiVersion, kind: "Node",  items: [  .items[] |   {   name: .metadata.name,   uid: .metadata.uid,   timestamp: .metadata.creationTimestamp,   labels: .metadata.labels,   addresses: [.status.addresses[].address] | unique   } ]}>

Digging further, we found that the failure is caused by a NotReady node in the API response that presents no .addresses in its .status field: the jq path .status.addresses then evaluates to null, and iterating null with [] makes the whole filter fail.

Example (abridged):

{
    "metadata": {
        "name": "ip-10-5-13-255.us-west-1.compute.internal",
    },
    // ....
    "status": {
        "conditions": [
          // ...
        ],
        "daemonEndpoints": {
            "kubeletEndpoint": {
                "Port": 0
            }
        },
        "nodeInfo": {
            "machineID": "",
            "systemUUID": "",
            "bootID": "",
            "kernelVersion": "",
            "osImage": "",
            "containerRuntimeVersion": "",
            "kubeletVersion": "",
            "kubeProxyVersion": "",
            "operatingSystem": "",
            "architecture": ""
        }
    }
}

How to reproduce it

Remove the status.addresses field from a single Kubernetes node returned by https://172.20.0.1/api/v1/nodes?pretty=false.
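
The same failure can also be reproduced locally with the command-line jq (a minimal sketch: the NodeList payload below is a hypothetical stripped-down version of the API response, and the filter is the one from the log above, abbreviated):

    # One node whose .status carries no .addresses field, as on the NotReady node above
    echo '{"apiVersion":"v1","kind":"NodeList","items":[{"metadata":{"name":"n1"},"status":{}}]}' |
      jq '{ type: "ADDED", apiVersion: .apiVersion, kind: "Node",
            items: [ .items[] | { name: .metadata.name,
                                  addresses: [.status.addresses[].address] | unique } ] }'
    # jq: error (at <stdin>:1): Cannot iterate over null (null)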

Expected behaviour

Such a node should not prevent all the other Falco pods from starting.
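
A tolerant variant of the filter (a sketch only; the actual fix merged in falcosecurity/libs#833 may differ) could use jq's error-suppressing iterator []?, which yields an empty stream instead of an error when the path is missing:

    echo '{"metadata":{"name":"n1"},"status":{}}' |
      jq -c '{ name: .metadata.name,
               # []? yields nothing instead of failing when .status.addresses is null
               addresses: [.status.addresses[]?.address] | unique }'
    # output: {"name":"n1","addresses":[]}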

Environment

  • Falco version: 0.33.1
  • OS: Bottlerocket
  • Kernel: n/a
  • Installation method: Kubernetes+Helm
@jasondellaluce
Contributor

/milestone 0.34.0

@poiana added this to the 0.34.0 milestone on Jan 16, 2023