Race condition can lead to egress connectivity #83

XciD · 2023-10-02T14:25:03Z

We are testing the new features provided by this agent on one of our clusters (recently updated to 1.28).

We saw that the network policy is not fully enforced when a pod starts.

For example, with the following Pod and NetworkPolicy, the Pod prints the output shown after the manifests:

apiVersion: v1
kind: Pod
metadata:
  labels:
    app: test
  name: test
spec:
  containers:
    - args:
        - http://portquiz.net:1023
      image: alpine/curl
      name: test
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: test
spec:
  egress:
  - ports:
    - port: 53
      protocol: UDP
    to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
  - ports:
    - endPort: 65535
      port: 1024
      protocol: TCP
    to:
    - ipBlock:
        cidr: 0.0.0.0/0
  podSelector:
    matchExpressions:
    - key: app
      operator: Exists
  policyTypes:
  - Egress

Port test successful!
Your IP: 44.208.xx.xx

If you wait a little before making the call, it is correctly blocked.
On pod restart, the policy is also enforced correctly.
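
To make the expectation explicit: the policy above only permits UDP 53 (to kube-dns) and TCP 1024-65535, so a TCP connection to port 1023 matches no egress rule. Below is a minimal sketch of that check. It models only the protocol/port part of the rules and ignores the `to` selectors; the `EgressPortRule` type and the `egress_allowed` helper are hypothetical names used for illustration, not part of any Kubernetes client.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EgressPortRule:
    """Hypothetical model of one `ports` entry of a NetworkPolicy egress rule."""
    protocol: str
    port: int
    end_port: Optional[int] = None  # mirrors `endPort`; None means a single port

    def matches(self, protocol: str, port: int) -> bool:
        last = self.end_port if self.end_port is not None else self.port
        return protocol == self.protocol and self.port <= port <= last


# Port rules taken from the NetworkPolicy in this comment.
RULES = [
    EgressPortRule("UDP", 53),
    EgressPortRule("TCP", 1024, 65535),
]


def egress_allowed(protocol: str, port: int, rules=RULES) -> bool:
    """Return True if any port rule permits the given protocol/port."""
    return any(rule.matches(protocol, port) for rule in rules)


print(egress_allowed("TCP", 1023))  # False: just below the allowed range
print(egress_allowed("TCP", 1024))  # True: first port of the allowed range
```

Under this model, the "Port test successful!" output for port 1023 can only appear while the policy is not yet enforced.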

jayanthvn · 2023-10-02T14:34:02Z

@XciD - If I am not mistaken, you mean the IP is not getting blocked? If so, it looks similar to #58.

XciD · 2023-10-02T14:35:29Z

Yes, portquiz.net:1023 is not blocked. But if you sleep 2 seconds before the call, it is blocked.

jayanthvn · 2023-10-02T14:53:06Z

OK. The default behavior is allow (the default Kubernetes behavior) until the policy endpoints are reconciled, i.e., the controller has to identify the pods selected by the network policy and send the update downstream to the node agent, which then enforces the policy.

We are exploring a strict mode option that would default to block/deny until the policy endpoints are reconciled. Ref: aws/containers-roadmap#1478 (comment)
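
The difference between the two behaviors described above can be pictured with a toy model. This is only a sketch of the described semantics, not the agent's implementation; the `PodEgressModel` class and its `strict` flag are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PodEgressModel:
    """Toy model of a pod's egress decision before and after reconciliation."""
    strict: bool = False      # hypothetical stand-in for the strict mode option
    reconciled: bool = False  # True once the node agent has received the policy

    def reconcile(self) -> None:
        self.reconciled = True

    def allows(self, permitted_by_policy: bool) -> bool:
        if not self.reconciled:
            # Standard behavior: default allow. Strict behavior: default deny.
            return not self.strict
        return permitted_by_policy


standard = PodEgressModel(strict=False)
strict = PodEgressModel(strict=True)

# A connection the policy does not permit (for example TCP 1023 in this issue).
print(standard.allows(False))  # True: the window reported in this issue
print(strict.allows(False))    # False: denied even before reconciliation

standard.reconcile()
print(standard.allows(False))  # False: blocked once the policy is enforced
```

In this model, the window in which a forbidden connection succeeds exists only when `strict` is off.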

XciD · 2023-10-02T15:00:47Z

It also means that if the node is under CPU pressure, it can take more than 2 seconds to enforce security policies.

jayanthvn · 2023-10-02T16:49:17Z

@XciD have you tried less than 2 seconds sleep or is the reconciliation consistently taking around 2 seconds?

XciD · 2023-10-02T18:07:25Z

I've tried with 1s, but my e2e test fails.
(We have a full CI/CD test suite running on our production cluster.)

This code fails immediately with our Calico + AWS CNI setup:

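import os
import requests

try:
    # Port 1023 is not in the policy's allowed egress range (TCP 1024-65535).
    requests.get("http://portquiz.net:1023", timeout=2)
except requests.exceptions.RequestException:
    # Connection refused, blocked, or timed out: report it and exit non-zero.
    print("error")
    os._exit(1)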

With the new cluster, without a sleep or with a sleep of less than 2 seconds before the request, it fails.
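
Rather than guessing a sleep value, the delay can be measured by probing repeatedly from pod start until the connection is first refused. The following is a rough sketch using only the standard library; `tcp_connects` and `seconds_until_blocked` are hypothetical helper names, and the probe treats any `OSError` (including DNS failures and timeouts) as "blocked", which is an assumption that may not hold in every environment.

```python
import socket
import time
from typing import Callable, Optional


def tcp_connects(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


def seconds_until_blocked(
    probe: Callable[[], bool],
    max_wait: float = 10.0,
    interval: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll probe() until it returns False.

    Returns the elapsed time at which the first failing probe was started,
    or None if the probe still succeeds once max_wait has elapsed.
    """
    start = clock()
    while True:
        elapsed = clock() - start
        if not probe():
            return elapsed
        if elapsed >= max_wait:
            return None
        sleep(interval)
```

Running `seconds_until_blocked(lambda: tcp_connects("portquiz.net", 1023))` as the first thing in the container would give a rough figure for how long the policy takes to be enforced on a given node.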

wiseelf · 2023-11-02T06:07:35Z

@XciD I have a very similar issue, #73. In my case, it blocks already-established connections when a netpol is applied. Sleep also solves that issue, but I wouldn't call it a solution :)

Mohsen51 · 2023-12-01T16:31:33Z

Got the same issue. It would be great to implement a strict mode that forces the pod to wait until the network policy agent has fully configured it!

jdn5126 · 2023-12-26T16:31:02Z

Strict mode implementation is still in progress. We will provide an update on this ticket when PRs are available.

allamand · 2024-02-16T07:58:27Z

I'm not sure strict mode would be the solution here, as it would still take some time for the reconciliation to happen. What about introducing a pod readiness gate that would flag the pod as ready only once the netpol reconciliation has happened?

jdn5126 · 2024-02-16T15:45:28Z

@allamand that is what strict mode does. The pod is not marked as Ready until Network Policies have been applied and properly reconciled.

allamand · 2024-02-20T22:18:35Z

@jdn5126 OK, this is nice, thanks. Any ETA to share?

jdn5126 · 2024-02-20T22:43:47Z

#209 is the PR, and there are some accompanying VPC CNI changes in aws/amazon-vpc-cni-k8s#2790, but I am not sure what the ETA is. I think sometime in Q2.

achevuru · 2024-06-03T21:10:54Z

Strict mode is now available. Let us know if that helps with the above use case/issue.

Pavani-Panakanti · 2024-10-02T22:56:40Z

@XciD Were you able to fix the issue with strict mode?

Rez0k mentioned this issue Nov 28, 2023

Long session connections get dropped #144

Closed

luk2038649 mentioned this issue Jan 23, 2024

Response traffic from allowed egress denied on short lived pods #189

Open

jdn5126 added the "strict mode" label (Issues blocked on strict mode implementation) Feb 16, 2024
