Python auto-instrumentation fails with "Error copying files: File exists (os error 17)" #305

Open
shailendher opened this issue Dec 20, 2024 · 1 comment

Comments

shailendher commented Dec 20, 2024

Describe the bug

When using OpenTelemetry in our EKS cluster via the ADOT addon, the Python auto-instrumentation init container occasionally fails with exit code 2, preventing the main application from starting. The issue occurs non-deterministically and can be temporarily resolved by deleting the affected pod.

Logs from opentelemetry-auto-instrumentation-python (init):

Error copying files: File exists (os error 17)
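
For reference, the number in `os error 17` is the operating system's `EEXIST` errno, which create-style calls return when the target path is already present. A minimal Python sketch for decoding it (Python is used here only for illustration; it is not part of the ADOT image):

```python
import errno
import os

code = 17  # the number from "os error 17" in the init container log

# Symbolic errno name for the code.
name = errno.errorcode[code]
# Human-readable message; the exact wording comes from the platform's C library.
message = os.strerror(code)

print(f"{code} -> {name}: {message}")
```
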

Pod definition:

Init Containers:
  opentelemetry-auto-instrumentation-python:
    Container ID:  containerd://a031f07b9f132ec61611b1481660c6164324a5ca15cc01437cfb0399d19b69ec
    Image:         public.ecr.aws/aws-observability/adot-operator-autoinstrumentation-python:0.48b0
    Image ID:      public.ecr.aws/aws-observability/adot-operator-autoinstrumentation-python@sha256:9e373ecf2f366ac19b353da5fd2917e958d5131e2abcace25f01102dce8f9852
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
      -r
      /autoinstrumentation/.
      /otel-auto-instrumentation-python
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Fri, 20 Dec 2024 10:22:47 +0100
      Finished:     Fri, 20 Dec 2024 10:22:47 +0100
    Ready:          False
    Restart Count:  11

This issue might be specific to the ADOT implementation, as upstream OpenTelemetry uses the standard cp command while ADOT uses a custom Rust utility. The switch to the Rust utility was made in this PR and the error is thrown here.

What did you expect to see?

Autoinstrumentation init container starts with no error.

What did you see instead?

Autoinstrumentation init container fails with exit code 2.

Open Questions

I'm struggling with a few questions around this issue:

  • What could cause the cp command to run twice, so that it finds the files already in place? A restart of the init container?
  • Is it possible to ignore this error and let the application start? I couldn't see any option here.
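
On the first question, one possible explanation (an assumption, not confirmed for this cluster): Kubernetes may run an init container more than once for the same pod, and an `emptyDir` volume keeps its contents for the lifetime of the pod, so a second run would find the files left by the first. The sketch below uses Python rather than the Rust of the ADOT utility and does not reproduce that utility's logic. It only contrasts a copy that fails when the target exists with one that tolerates it. The directory names are stand-ins modelled on the paths in the pod definition.

```python
import shutil
import tempfile
from pathlib import Path


def copy_strict(src: Path, dst: Path) -> None:
    """Copy src to dst, failing if dst already exists."""
    shutil.copytree(src, dst)


def copy_idempotent(src: Path, dst: Path) -> None:
    """Copy src to dst, overwriting files left behind by an earlier run."""
    shutil.copytree(src, dst, dirs_exist_ok=True)  # requires Python 3.8+


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-ins for the image directory and the shared volume.
    src = Path(tmp) / "autoinstrumentation"
    dst = Path(tmp) / "otel-auto-instrumentation-python"
    (src / "pkg").mkdir(parents=True)
    (src / "pkg" / "module.py").write_text("# placeholder\n")

    copy_strict(src, dst)  # first run: target is empty, copy succeeds

    strict_errno = None
    try:
        copy_strict(src, dst)  # simulated re-run: target already populated
    except FileExistsError as exc:
        strict_errno = exc.errno  # same errno as in the init container log

    copy_idempotent(src, dst)  # tolerant re-run: succeeds
    copied = (dst / "pkg" / "module.py").exists()

print(f"strict re-run errno: {strict_errno}, tolerant re-run copied: {copied}")
```

If a restart is indeed the trigger, an init step that tolerates existing files would behave the same on the first and on every later run.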

Environment

Platform: EKS 1.30
EKS Addon ADOT Version: v0.109.0-eksbuild.2
Auto-instrumentation Image: public.ecr.aws/aws-observability/adot-operator-autoinstrumentation-python:0.48b0

shailendher commented Dec 20, 2024

I switched from the adot-operator-autoinstrumentation-python image to the adot-autoinstrumentation-python image, and it is looking better now. But it opens up more questions:

  • What is the difference between the two images and which one is recommended?
  • Where is the source code for adot-operator-autoinstrumentation-python image? However, this is irrelevant if the image is deprecated.
  • Where can I find the source code or Helm chart of adot-operator where the --auto-instrumentation-python-image flag is configured?
