cloud.resource_id and AWS ECS #677
Comments
@mhausenblas wdyt? |
I was originally thinking that the most-granular ARN should go into the I certainly see the argument to say that with the inclusion of One pause for thought is the language of "monitored cloud resource" in the definition. Some SDKs will be collecting otel data about the ECS cluster itself, which raises the question of whether Do we have any data from other container scheduling technologies (e.g. K8s) about how this is handled there? I assume there's a similar issue, where (for example) a container runs within a Pod, which is scheduled by a Deployment onto a specific Node of a Cluster. Which of those entities fits into I'll put some more thought into it. |
Are you thinking the way the Datadog agent is scheduled on ECS to monitor ECS? That is IMO not a resource detector use case, as the Datadog agent effectively acts as a "receiver" for telemetry about something external. In that case, it's the receiver's job to annotate the (Besides, all ECS detectors out there do set already the
Not really. In general pods on K8S can learn far less about their environment than tasks on ECS. There's effectively no equivalent to the ECS metadata endpoint. It's even adventurous to be sure you are running inside a pod on K8S, and unless the pod spec is propagating as env vars data from the And it's even worse than this: a K8S cluster does not have a notion of identity (not even of name, much less a unique one, although people set the UUID of the On K8S, the addition of labels like I am not aware of any container orchestration platform currently better supported than K8S or EKS by OTel resource detectors btw. |
In terms of use-cases, I think the most important thing to consider here are "group by" scenarios, specifically: what is the most useful abstraction level for the The cluster ARN fails IMO this test, as Fargate tasks on the same cluster are pretty unrelated with one another in terms of performance. The Task ARN and Container ARN both pass this test, and I personally still like Task ARN more because of the resource sharing that is going on between containers in the same task. Also, when comparing with the other types of Cloud resources where detectors do implement |
I like this, and find it convincing. If this is the case, might be worth updating the definition of |
When specifying this field |
Well, that is pretty much the benefit that having |
@jsuereth in the Semantic Conventions meeting of Feb. 5th, 2024, you said that the |
I wasn't in the semconv meeting, but I am guessing @jsuereth refers to the following topic that he and I have been exploring for a while. What is the purpose of recording Similarly, a "Task" is a definition that can be executed multiple times, each execution becomes a running container instance. I cannot associate telemetry with a "Task", that would create ambiguity, I would not know which running container instance that telemetry is associated with. The right choice I think is to record attributes which uniquely identify the running container instance (Container ARN). This is generally the litmus test I use when deciding which entity to record in the Resource: if telemetry can be associated with that entity without also being required to be associated with something else then it is the Entity we want. Applying this litmus test, here are the entities I think we want to choose to associate with telemetry:
Entities that can't be directly associated with telemetry without additional qualification:
This latter list is an example of "things" that are not enough for identification purposes, so we should prefer the first list, which does uniquely identify the telemetry's source. |
@tigrannajaryan the problem we have here is that on ECS there are TWO valid candidates for resource detectors in SDK-instrumented apps: ECS Task and ECS Container. Both are theoretically valid according to your litmus test. And we need to pick one. I have a favorite (see above), but in any case the definition of |
I may be misunderstanding, but isn't ECS Task the "definition" of what to run? You can run many container instances using the same task definition, right? In that case ECS Task is not unique enough to associate for example a metric data point with it. So I don't think ECS Task passes the litmus test, you will need to additionally say which ECS Container instance the metric data point is coming from. Please correct me if I am wrong on what ECS Task means, this is based on just a cursory read of ECS docs. |
I think you are confusing the task definition with an actual running task. The discussion is not about using the task definition's ARN, the discussion is about running task instances vs containers. See here: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definitions.html After you create a task definition, you can run the task definition as a task or a service.
(emphasis mine) |
Indeed I am. I was incorrectly assuming that running a task definition creates a container instance. I have never used ECS so don't know the details of how it works. Can you clarify what the difference is between a task instance and a container instance and how they are related? |
To draw a comparison between K8s and ECS: ECS Task -> K8S Pod But not only Tasks get their own ARNs, Containers get their own too. The overwhelming majority of the ECS API deals with tasks. They are scheduled either singularly with RunTask (used also by other ECS-based services like AWS Batch) or by the ECS equivalent to a K8S Deployment: the Service. Even APIs like I honestly don’t know why ECS containers get their own ARN, in my experience it feels useless from a user standpoint. But it is there, and it is making my life hard :-) |
I suspect the problem here is that "entity" is vague. You could argue that it's actually the process running within the container that's part of a running task that's the "entity producing the telemetry" in a process SDK context, but I don't think we're advocating using any form of process identifier for this field. K8s is a useful comparison here though, because I think the same problem exists. For example a deployment can have multiple Pods, which can have multiple containers, which can have multiple processes. What's the "Resource Id" of the entity that's emitting telemetry?
I think it's because there needs to be some way of uniquely identifying a container within a running task (which, as you say, can have an arbitrary number of running containers that may vary during the task's lifecycle). -- Part of this pickle is that we're trying to work out which value (of many) to put into a scalar field. What if the scalar field was actually a list ( E.g. In an ECS context, |
Task ARN + container id (as in DockerID) is unique enough. Container name is not unique enough I think, because, IIRC, containers can restart. |
Lists are typically pretty bad to work with in terms of grouping and filtering in most query languages. I would really like to avoid that if at all possible. |
Thanks for explaining how ECS works. K8s analogy helps. Pod, Task, Container Instance and Process are all potentially valid telemetry producing entities. The choice of which entity to record in the Resource depends on what telemetry you are producing. This is where I apply the litmus test, to make this choice. For example if you are measuring the CPU usage by the entire Pod (or the entire Task) then you want to record an identifier of the Pod (or of Task), so you would put Which set of attributes to choose (or which value to put in |
@tigrannajaryan so, since this discussion is grounded in ECS resource detectors (as running in SDKs within applications inside containers inside tasks), you would use the Container ARN in |
Can one Task contain multiple Containers? If that is the case then we must include whatever data is necessary to uniquely identify the Container in telemetry. If we want to use just a single attribute then we seem to have no choice but to record Container ARN in Both approaches are valid (using just The question then is which set of attributes to choose. One argument I can bring is that using a smaller number of attributes should be preferable, which leads to using Additionally, if we expect multiple processes in one Container we must also include the So, in this particular use case of an application instrumented by Otel SDK that is a process inside a Container on an ECS Task I think the right set of attributes is this:
|
A task can contain multiple containers (though it’s somewhat less common than in K8S because ECS has nothing comparable to admission controller mutating webhooks AFAIK).
Alright. Anyhow both Task and Container ARN are available through the AWS ECS ( I’ll open PRs accordingly against the detectors. |
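As a rough illustration of the point above, here is a minimal Python sketch of how a detector might read both ARNs from the ECS task metadata endpoint v4. It assumes the ECS_CONTAINER_METADATA_URI_V4 environment variable and the ContainerARN, DockerId, Name and TaskARN fields documented by AWS. The function names are made up for this example, and putting the Container ARN into cloud.resource_id reflects one reading of where this thread ended up, not the code of any existing detector.

```python
import json
import os
import urllib.request


def fetch_json(url: str) -> dict:
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=2) as response:
        return json.load(response)


def build_attributes(container: dict, task: dict) -> dict:
    """Map metadata v4 responses to resource attributes (hypothetical helper)."""
    return {
        "cloud.provider": "aws",
        "cloud.platform": "aws_ecs",
        # Assumption: the most granular ARN identifies the telemetry source.
        "cloud.resource_id": container["ContainerARN"],
        "aws.ecs.task.arn": task["TaskARN"],
        "container.id": container["DockerId"],
        # Assumption: the task-definition container name is used here.
        "container.name": container["Name"],
    }


def detect() -> dict:
    """Return ECS attributes, or an empty dict when not running on ECS."""
    base = os.environ.get("ECS_CONTAINER_METADATA_URI_V4")
    if not base:
        return {}
    # The base URI describes this container; "/task" describes the whole task.
    return build_attributes(fetch_json(base), fetch_json(base + "/task"))
```

Keeping the Task ARN in aws.ecs.task.arn alongside the Container ARN means a backend can still group by task even though the scalar cloud.resource_id points at the container.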
One other related thing that we should probably discuss is how detectors should affect the Service, particularly the There is an open PR that defines an algorithm for generating the I think other detectors (like ECS detector) should also try to populate the Interestingly, some resource detectors in the Collector do that, for example Elasticbeanstalk detector. I think we should open a separate issue and discuss it separately. |
@tigrannajaryan @jsuereth as far as I am concerned, this question has been answered. Should I close the issue, or do you want to keep it around to add clarifications to the wording of |
Btw all the PRs are open. I wish the AWS people paid more attention to PRs against their detectors :-( |
I am wondering what is the right way to implement support for cloud.resource_id in SDK resource detectors for AWS ECS. Specifically, it's unclear to me whether it should be the Task ARN or the Container ARN. No ECS detector so far has implemented cloud.resource_id, so we do not have precedent.
The semantic conventions state about cloud.resource_id:
Given the fact that SDKs "live" in containers within the task, one could see the Container ARN as right.
However, I am leaning towards the Task ARN: the container is not an independent or self-contained part of a task, and we already have container.name and container.id, which technically could be used to recreate the Container ARN starting from the task's, while the "other direction" is not true.
One could also argue that we already set the Task ARN as the aws.ecs.task.arn resource attribute, but I find that a rather unconvincing argument.
For reference, using the metadata v4 examples by AWS:
The Docker ID is already stored by the various implementations of the ECS detectors under the container.id resource attribute.
What's the consensus? As I am anyhow touching up the various ECS detectors to support more cloud.* attributes (cloud.account.id, cloud.availability_zone and cloud.region specifically), I would have PRs up for all languages with ECS detectors (Go, .NET, Java, Python, PHP, Node.js) available in a matter of hours.
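The additional cloud.* attributes mentioned above could be derived roughly as in this Python sketch. It assumes the standard ARN layout arn:partition:service:region:account-id:resource and the AvailabilityZone field of the metadata v4 task response. The function name and the sample values are invented for illustration and are not taken from any of the detectors.

```python
def cloud_attributes(task_metadata: dict) -> dict:
    """Derive cloud.* attributes from a metadata v4 task response (hypothetical helper)."""
    # ARN layout: arn:partition:service:region:account-id:resource
    # maxsplit=5 keeps any ':' inside the resource part intact.
    parts = task_metadata["TaskARN"].split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError("not an ARN: " + task_metadata["TaskARN"])
    attributes = {
        "cloud.region": parts[3],
        "cloud.account.id": parts[4],
    }
    zone = task_metadata.get("AvailabilityZone")
    if zone:  # assumption: the field may be absent on some platform versions
        attributes["cloud.availability_zone"] = zone
    return attributes


# Made-up sample values, shaped like a metadata v4 task response.
sample = {
    "TaskARN": "arn:aws:ecs:us-west-2:111122223333:task/default/0123456789abcdef",
    "AvailabilityZone": "us-west-2a",
}
print(cloud_attributes(sample))
```

Parsing the region and account out of the ARN avoids extra API calls, at the cost of relying on the ARN format staying stable.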