Celery task producer (celery.apply) span in APM triggers Datadog service inference into creating a new service for the service's own hostname #11491

Open
patrys opened this issue Nov 21, 2024 · 1 comment · Fixed by #10750

patrys commented Nov 21, 2024

We've enabled service inference, and our service list is now filled with spurious services named after every pod in every k8s service that publishes tasks. After running for just an hour, we already have 150 of those.

All of the fake services report only a single source of data, celery.apply. Inspecting the traces for celery.apply confirms that celery.hostname appears to be converted into peer.hostname, even though this span does not actually make a client connection anywhere.

Under each celery.apply span we see the expected sqs.sendmessage span for the actual delivery of the task. That second span has the correct peer tags and is paired with the expected queue service.

Agents are deployed using Helm chart version 3.73.0
Code is traced using dd-trace-py version 2.11.2

patrys changed the title from "Celery task producer (celery.apply) in APM triggers Datadog service inference into creating a new service for the service's own hostname" to "Celery task producer (celery.apply) span in APM triggers Datadog service inference into creating a new service for the service's own hostname" on Nov 21, 2024

patrys commented Nov 22, 2024

After some investigation: the signal handler calls set_tags_from_context(span, kwargs["headers"]), and the accompanying comment suggests that this is done specifically to set celery.hostname. The hostname in question is the pod name of the task producer.

Then the trace_after_publish signal handler extracts celery.hostname and uses its value to set out.host. This is wrong, because the hostname identifies the origin of the task, not its target.

out.host is then transformed (by the agent, I assume) into peer.hostname. That transformation is expected behavior, but because of the above, the resulting value is incorrect.
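
To make the chain above easier to follow, here is a minimal, self-contained sketch of it. This is not the dd-trace-py source: FakeSpan, the "hostname" header key, the handler functions, and infer_peer_tags are all hypothetical stand-ins, and the out.host to peer.hostname mapping is only my assumption about what happens downstream.

```python
# Simplified model of the reported behaviour; NOT the dd-trace-py source.
# Every class, function, and header key here is a hypothetical stand-in.


class FakeSpan:
    """Tiny stand-in for a tracer span that only stores string tags."""

    def __init__(self, name):
        self.name = name
        self.tags = {}

    def set_tag(self, key, value):
        self.tags[key] = value

    def get_tag(self, key):
        return self.tags.get(key)


def before_publish(span, headers):
    # Step 1: the producer's message headers populate celery.hostname.
    # The "hostname" header key is an assumption made for this sketch.
    hostname = headers.get("hostname")
    if hostname:
        span.set_tag("celery.hostname", hostname)


def after_publish_reported(span):
    # Step 2 (reported behaviour): the producer's own hostname is copied
    # into out.host, as if it were the target of an outbound connection.
    host = span.get_tag("celery.hostname")
    if host:
        span.set_tag("out.host", host)


def after_publish_no_out_host(span):
    # One possible alternative: leave out.host unset on the producer span,
    # since the child span for the real delivery already carries peer tags.
    return None


def infer_peer_tags(span):
    # Step 3 (assumed): out.host is mapped to peer.hostname downstream.
    host = span.get_tag("out.host")
    return {"peer.hostname": host} if host else {}


headers = {"hostname": "api-pod-7f9c"}

reported = FakeSpan("celery.apply")
before_publish(reported, headers)
after_publish_reported(reported)
reported_peer = infer_peer_tags(reported)  # names the producer pod itself

alternative = FakeSpan("celery.apply")
before_publish(alternative, headers)
after_publish_no_out_host(alternative)
alternative_peer = infer_peer_tags(alternative)  # no peer tag inferred
```

In this model the reported path yields a peer.hostname equal to the producer's pod name, which would account for one inferred service per pod, while the alternative path keeps celery.hostname on the span without producing any peer tag.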
