'kubectl sniff' command returning 139 exit/error code during execution. RCA required for failed attempt at packet capture so that workaround can be identified. #173

Open
Sayantan-Dell opened this issue Sep 8, 2023 · 8 comments

Sayantan-Dell commented Sep 8, 2023

In the same environment and the same Kubernetes cluster, kubectl sniff works for one pod but fails for another. Evidence is below. I cannot determine the root cause of exit code 139. Can anyone help with this and suggest a possible workaround?

Failure scenario: first pod

root@node1:/# kubectl krew version
OPTION VALUE
GitTag v0.4.4
GitCommit 343e657
IndexURI https://github.com/kubernetes-sigs/krew-index.git
BasePath /root/.krew
IndexPath /root/.krew/index/default
InstallPath /root/.krew/store
BinPath /root/.krew/bin
DetectedPlatform linux/amd64

root@node1:/# kubectl get pods -n Test-upf1
NAME READY STATUS RESTARTS AGE
upf-5896cf6b4c-9shws 3/3 Running 0 15d

root@node1:/# kubectl get pods/upf-5896cf6b4c-9shws -o jsonpath='{.spec.containers[*].name}' -n Test-upf1
upfsp upffp upfrsyslog

root@node1:/# kubectl sniff upf-5896cf6b4c-9shws -n Test-upf1 -o /tmp/upf.pcap
INFO[0000] using tcpdump path at: '/root/.krew/store/sniff/v1.6.2/static-tcpdump'
INFO[0000] no container specified, taking first container we found in pod.
INFO[0000] selected container: 'upfsp'
INFO[0000] sniffing method: upload static tcpdump
INFO[0000] sniffing on pod: 'upf-5896cf6b4c-9shws' [namespace: 'Test-upf1', container: 'upfsp', filter: '', interface: 'any']
INFO[0000] uploading static tcpdump binary from: '/root/.krew/store/sniff/v1.6.2/static-tcpdump' to: '/tmp/static-tcpdump'
INFO[0000] uploading file: '/root/.krew/store/sniff/v1.6.2/static-tcpdump' to '/tmp/static-tcpdump' on container: 'upfsp'
INFO[0000] executing command: '[/bin/sh -c test -f /tmp/static-tcpdump]' on container: 'upfsp', pod: 'upf-5896cf6b4c-9shws', namespace: 'Test-upf1'
INFO[0000] command: '[/bin/sh -c test -f /tmp/static-tcpdump]' executing successfully exitCode: '0', stdErr :''
INFO[0000] file found: ''
INFO[0000] file was already found on remote pod
INFO[0000] tcpdump uploaded successfully
INFO[0000] output file option specified, storing output in: '/tmp/upf.pcap'
INFO[0000] start sniffing on remote container
INFO[0000] executing command: '[/tmp/static-tcpdump -i any -U -w - ]' on container: 'upfsp', pod: 'upf-5896cf6b4c-9shws', namespace: 'Test-upf1'
INFO[0000] command: '[/tmp/static-tcpdump -i any -U -w - ]' executing successfully exitCode: '139', stdErr :''
INFO[0000] starting sniffer cleanup
INFO[0000] sniffer cleanup completed successfully
Error: executing sniffer failed, exit code: '139'

=========================================

Success scenario: other pods

root@node1:/# kubectl get pods -n Test-udm1
NAME READY STATUS RESTARTS AGE
udm-ee-79c897c869-9pt9r 2/2 Running 0 15d
udm-sdm-5d75ff8775-54lsf 2/2 Running 0 15d
udm-ueau-67944949f5-rwd82 2/2 Running 0 15d
udm-uecm-76fcf7c57-c8cbs 2/2 Running 0 15d
root@node1:/#
root@node1:/#
root@node1:/# kubectl get pods/udm-ueau-67944949f5-rwd82 -o jsonpath='{.spec.containers[*].name}' -n Test-udm1
udm-ueau istio-proxy
root@node1:/#
root@node1:/#
root@node1:/# kubectl sniff udm-ueau-67944949f5-rwd82 -n Test-udm1 -o /tmp/udm.pcap
INFO[0000] using tcpdump path at: '/root/.krew/store/sniff/v1.6.2/static-tcpdump'
INFO[0000] no container specified, taking first container we found in pod.
INFO[0000] selected container: 'udm-ueau'
INFO[0000] sniffing method: upload static tcpdump
INFO[0000] sniffing on pod: 'udm-ueau-67944949f5-rwd82' [namespace: 'Test-udm1', container: 'udm-ueau', filter: '', interface: 'any']
INFO[0000] uploading static tcpdump binary from: '/root/.krew/store/sniff/v1.6.2/static-tcpdump' to: '/tmp/static-tcpdump'
INFO[0000] uploading file: '/root/.krew/store/sniff/v1.6.2/static-tcpdump' to '/tmp/static-tcpdump' on container: 'udm-ueau'
INFO[0000] executing command: '[/bin/sh -c test -f /tmp/static-tcpdump]' on container: 'udm-ueau', pod: 'udm-ueau-67944949f5-rwd82', namespace: 'Test-udm1'
INFO[0000] command: '[/bin/sh -c test -f /tmp/static-tcpdump]' executing successfully exitCode: '0', stdErr :''
INFO[0000] file found: ''
INFO[0000] file was already found on remote pod
INFO[0000] tcpdump uploaded successfully
INFO[0000] output file option specified, storing output in: '/tmp/udm.pcap'
INFO[0000] start sniffing on remote container
INFO[0000] executing command: '[/tmp/static-tcpdump -i any -U -w - ]' on container: 'udm-ueau', pod: 'udm-ueau-67944949f5-rwd82', namespace: 'Test-udm1'
^C
root@node1:/#
root@node1:/#

@Sayantan-Dell changed the title from "Kubectl sniff command returning 139 error code during execution. Root Cause Identification Required" to "Kubectl sniff command returning 139 exit/error code during execution. Root Cause Identification Required for failed attempt at packet capture" on Sep 8, 2023
@Sayantan-Dell changed the title from "Kubectl sniff command returning 139 exit/error code during execution. Root Cause Identification Required for failed attempt at packet capture" to "'kubectl sniff' command returning 139 exit/error code during execution. RCA required for failed attempt at packet capture so that workaround can be identified." on Sep 8, 2023
@imscuevas

I am getting the same issue with the nginx image. To debug it, I connected to the container and ran the same command that kubectl sniff uses:

/tmp/static-tcpdump -i any -U -w -
Segmentation fault (core dumped)

After that, I installed tcpdump in the container with apt update && apt install -y tcpdump, and that binary works:

tcpdump -i any -U -w -
tcpdump: data link type LINUX_SLL2
?ò?tcpdump: listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes

I wonder whether the way the bundled static-tcpdump is compiled is causing this issue.
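
Exit code 139 in the ksniff log is consistent with the segmentation fault shown above. When a process is killed by a signal, the shell reports 128 plus the signal number, and SIGSEGV is signal 11 on Linux, so 128 + 11 = 139. A minimal sketch of that arithmetic follows; the function name decode_exit is hypothetical and only for illustration.

```shell
#!/bin/sh
# decode_exit is a hypothetical helper: it splits an exit status of the
# form 128 + N into the signal number N and asks kill for the signal name.
decode_exit() {
  code=$1
  if [ "$code" -gt 128 ]; then
    sig=$((code - 128))
    # kill -l N prints the name of signal N (without the SIG prefix).
    echo "signal $sig ($(kill -l "$sig"))"
  else
    # Values up to 128 are ordinary exit statuses, not signals.
    echo "exit $code"
  fi
}

decode_exit 139
```

This only explains what the number means. It does not say why the bundled binary crashes in some images and not in others.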

@Sayantan-Dell
Author

OK, so the issue seems to happen when a pod has multiple containers. Thanks for the information; I will try again after installing tcpdump in the container.

@imscuevas

As far as I remember, each of my pods has only one container:

k get pods
NAME                      READY   STATUS    RESTARTS   AGE
alpine-654bf79686-jbrjt   1/1     Running   0          24h
ksniff-j446x              1/1     Running   0          25h
ksniff-j6fv6              1/1     Running   0          25h
ksniff-vjmd4              1/1     Running   0          25h
nginx-77b4fdf86c-5qcc2    1/1     Running   0          25h
nginx-77b4fdf86c-5tcl8    1/1     Running   0          29h
nginx-77b4fdf86c-jhzdc    1/1     Running   0          25h

When I run ksniff against any of the nginx pods, it fails:

k sniff nginx-77b4fdf86c-5qcc2 -n default
INFO[0000] using tcpdump path at: '/Users/scuevas/.krew/store/sniff/v1.6.2/static-tcpdump' 
INFO[0000] no container specified, taking first container we found in pod. 
INFO[0000] selected container: 'nginx'                  
INFO[0000] sniffing method: upload static tcpdump       
INFO[0000] sniffing on pod: 'nginx-77b4fdf86c-5qcc2' [namespace: 'default', container: 'nginx', filter: '', interface: 'any'] 
INFO[0000] uploading static tcpdump binary from: '/Users/scuevas/.krew/store/sniff/v1.6.2/static-tcpdump' to: '/tmp/static-tcpdump' 
INFO[0000] uploading file: '/Users/scuevas/.krew/store/sniff/v1.6.2/static-tcpdump' to '/tmp/static-tcpdump' on container: 'nginx' 
INFO[0000] executing command: '[/bin/sh -c test -f /tmp/static-tcpdump]' on container: 'nginx', pod: 'nginx-77b4fdf86c-5qcc2', namespace: 'default' 
INFO[0000] command: '[/bin/sh -c test -f /tmp/static-tcpdump]' executing successfully exitCode: '0', stdErr :'' 
INFO[0000] file found: ''                               
INFO[0000] file was already found on remote pod         
INFO[0000] tcpdump uploaded successfully                
INFO[0000] spawning wireshark!                          
INFO[0000] start sniffing on remote container           
INFO[0000] executing command: '[/tmp/static-tcpdump -i any -U -w - ]' on container: 'nginx', pod: 'nginx-77b4fdf86c-5qcc2', namespace: 'default' 
INFO[0001] command: '[/tmp/static-tcpdump -i any -U -w - ]' executing successfully exitCode: '139', stdErr :'' 
ERRO[0001] failed to start remote sniffing, stopping wireshark  error="executing sniffer failed, exit code: '139'"
INFO[0001] starting sniffer cleanup                     
INFO[0001] sniffer cleanup completed successfully       
Error: signal: killed

This does not happen with the alpine pod; Wireshark opened without any issue.

k sniff alpine-654bf79686-jbrjt -n default
INFO[0000] using tcpdump path at: '/Users/scuevas/.krew/store/sniff/v1.6.2/static-tcpdump' 
INFO[0000] no container specified, taking first container we found in pod. 
INFO[0000] selected container: 'alpine'                 
INFO[0000] sniffing method: upload static tcpdump       
INFO[0000] sniffing on pod: 'alpine-654bf79686-jbrjt' [namespace: 'default', container: 'alpine', filter: '', interface: 'any'] 
INFO[0000] uploading static tcpdump binary from: '/Users/scuevas/.krew/store/sniff/v1.6.2/static-tcpdump' to: '/tmp/static-tcpdump' 
INFO[0000] uploading file: '/Users/scuevas/.krew/store/sniff/v1.6.2/static-tcpdump' to '/tmp/static-tcpdump' on container: 'alpine' 
INFO[0000] executing command: '[/bin/sh -c test -f /tmp/static-tcpdump]' on container: 'alpine', pod: 'alpine-654bf79686-jbrjt', namespace: 'default' 
INFO[0000] command: '[/bin/sh -c test -f /tmp/static-tcpdump]' executing successfully exitCode: '0', stdErr :'' 
INFO[0000] file found: ''                               
INFO[0000] file was already found on remote pod         
INFO[0000] tcpdump uploaded successfully                
INFO[0000] spawning wireshark!                          
INFO[0000] start sniffing on remote container           
INFO[0000] executing command: '[/tmp/static-tcpdump -i any -U -w - ]' on container: 'alpine', pod: 'alpine-654bf79686-jbrjt', namespace: 'default'

@imscuevas

In case you want to reproduce this, here is the manifest for my test deployments:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
  labels:
    app: alpine
  name: alpine
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: alpine
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: alpine
    spec:
      containers:
      - command:
        - sleep
        - infinity
        image: alpine
        imagePullPolicy: Always
        name: alpine
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  labels:
    app: nginx
  name: nginx
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: nginx
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        imagePullPolicy: Always
        name: nginx
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30

zoopp commented Mar 12, 2024

For anyone looking for a workaround, what worked for me was to rebuild (the latest version of) static tcpdump and use that instead of what's shipped with the plugin:

  1. Start an alpine container: podman run -v $PWD:/out --rm -it alpine (or use docker instead of podman).
  2. Install dependencies: apk add --update alpine-sdk git libpcap libpcap-dev.
  3. Clone ksniff: cd /tmp; git clone https://github.com/eldadru/ksniff; cd ksniff.
  4. Update tcpdump version in the Makefile (e.g. set TCPDUMP_VERSION=4.99.4).
  5. Build it: make static-tcpdump.
  6. Copy the binary to the host system: cp static-tcpdump /out and exit the container.
  7. Overwrite static-tcpdump from the kubectl plugin: cp static-tcpdump ~/.krew/store/sniff/<version>/

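ksniff skips the upload when /tmp/static-tcpdump already exists in the container (see the "file was already found on remote pod" line in the logs above), so if the old version is still present there you may need to remove it manually.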
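
The rebuild steps above can be sketched as a single script. This is an illustration under stated assumptions, not a tested build recipe: the function names and the RUN and RUNTIME variables are hypothetical, and passing TCPDUMP_VERSION on the make command line is a substitution for editing the Makefile by hand in step 4. It relies on make letting command-line variables override assignments in the Makefile, which holds unless the Makefile uses the override directive. By default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch of steps 1 to 6. Prints the commands by default; set RUN=1 to
# execute them (needs podman or docker on the host, plus network access).
TCPDUMP_VERSION="${TCPDUMP_VERSION:-4.99.4}"  # version suggested in step 4
RUNTIME="${RUNTIME:-podman}"                  # hypothetical knob; or: docker

# Commands executed inside the Alpine container (steps 2 to 6).
# Assumption: the command-line TCPDUMP_VERSION replaces the manual
# Makefile edit from step 4.
container_script() {
  cat <<EOF
apk add --update alpine-sdk git libpcap libpcap-dev
cd /tmp
git clone https://github.com/eldadru/ksniff
cd ksniff
make static-tcpdump TCPDUMP_VERSION=$TCPDUMP_VERSION
cp static-tcpdump /out
EOF
}

rebuild_static_tcpdump() {
  if [ "${RUN:-0}" = "1" ]; then
    # sh -e stops at the first failing command inside the container.
    "$RUNTIME" run -v "$PWD:/out" --rm alpine sh -ec "$(container_script)"
  else
    echo "# would run in: $RUNTIME run -v $PWD:/out --rm alpine sh -ec ..."
    container_script
  fi
}

rebuild_static_tcpdump
```

Step 7 is left manual because the <version> directory under ~/.krew/store/sniff/ depends on the installed plugin. A stale copy in a pod can be removed with kubectl exec, for example `kubectl exec <pod> -n <namespace> -- rm -f /tmp/static-tcpdump`.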

crezy8 commented Mar 14, 2024

It works, thanks

thesn10 commented Aug 3, 2024

Workaround for Debian-based containers:

kubectl exec -i -t your-pod -- bash -c "apt update && apt install tcpdump -y && \
    rm /tmp/static-tcpdump && \
    ln /bin/tcpdump /tmp/static-tcpdump"
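
For a pod with several containers, such as the upf pod in the original report, the same idea needs the container named explicitly with kubectl exec -c. The variant below is a hedged sketch, not a verified fix: it assumes the target container is Debian-based with apt and sh available, which the thread does not confirm for the upf image. The helper name link_system_tcpdump and the RUN variable are hypothetical, and the symlink (ln -s) is a substitution for the hard link above, chosen because hard links fail when /tmp is on a different filesystem from the tcpdump binary.

```shell
#!/bin/sh
# Hypothetical helper: prints the workaround command by default, or runs
# it when RUN=1. Usage: link_system_tcpdump POD NAMESPACE CONTAINER
link_system_tcpdump() {
  pod=$1; ns=$2; ctr=$3
  # Commands executed inside the container. Assumes a Debian-based image.
  # rm -f does not fail if no stale binary exists; command -v resolves
  # the installed tcpdump path inside the container.
  inner='apt update && apt install -y tcpdump && rm -f /tmp/static-tcpdump && ln -s "$(command -v tcpdump)" /tmp/static-tcpdump'
  if [ "${RUN:-0}" = "1" ]; then
    kubectl exec -i "$pod" -n "$ns" -c "$ctr" -- sh -c "$inner"
  else
    printf 'kubectl exec -i %s -n %s -c %s -- sh -c %s\n' \
      "$pod" "$ns" "$ctr" "'$inner'"
  fi
}

# Pod, namespace and container names taken from the original report.
link_system_tcpdump upf-5896cf6b4c-9shws Test-upf1 upfsp
```

After this, ksniff should find /tmp/static-tcpdump already in place and skip its own upload, as the "file was already found on remote pod" log line suggests.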

vl-kp commented Sep 25, 2024

root@nginx:/# /tmp/static-tcpdump -i any -U -w -
Segmentation fault (core dumped)
