Add enrichment for span.destination.service.resource #72

lahsivjar · 2024-08-07T16:03:34Z

The logic is not at full parity with what apm-data does. It is mainly because, a) apm-data uses a lot of deprecated fields and this is a good opportunity to phase them out while moving to the newer attributes, and b) the logic is overly complex and I don't think it is worth it.

Above being said, I have added comments in the diff below to explain where we diverge from apm-data and how. @axw @felixbarny @gregkalapos would be great to get your eyes on this.

Fixes: #70

lahsivjar · 2024-08-07T16:22:49Z

enrichments/trace/internal/elastic/span.go

+// getHostPort derives the host:port value from url.* attributes. Unlike
+// apm-data, the current code does NOT fallback to net.* or http.*
+// attritubtes as most of these are now deprecated.
 func getHostPort(urlFull *url.URL, urlDomain string, urlPort int64) string {


[For reviewers] The handling for destination host and port have a few variants in apm-data in terms of fallback but I believe at-least the current complexity of handling this in apm-data is not worth the effort and should be fixed. Below are the main common points:

Both, destination port and address, are derived from url.full (apm-data uses the now deprecated http.url) (ref)

First level of fallback in apm-data uses http.* attributes to build the full url. Unfortunately, these http.* attributes are also deprecated (and some deprecations are present from quite some time). In the current enrichments code, the fallback uses the latest url.* attributes which replace the deprecated http.* ones.

APM-data has another level of fallback which uses net.peer.{port, name, ip}, all of these attributes are now deprecated and some are even removed from the latest semconv definitions. I have not implemented this fallback level at all in the enrichments.

Let me know if any of the above doesn't make sense, I will add the fallbacks if needed. The apm-data code can be perused here.

I think this is ok.

Most of this comes from the stabilization of the HTTP attributes. The post highlights depreciation of those fields.

Unfortunately SemConv and OTel SDKs move fairly slowly. So with this, we have the risk that we may run into an SDK that uses the old version and still uses the old attributes. If SemConv deprecates an attribute, it just means that it's deprecated in that specific version - but that's all. It does not directly trigger any change in any implementation and instrumentations are not really forced to update. Ultimately they will; but that can take very long.

Having said that, I agree with the approach - let's not move the handling of those deprecated attributes yet. I'd do it on-demand: if we get user feedback, we could do it - or better, discuss our general strategy on handling different versions.

++ Seems reasonable. As mentioned in the meeting a couple of days ago, those deprecations would ideally be handled by the schema processor. Perhaps if we find we need to handle the deprecated attributes, we could implement some minimal transformation just prior to enrichment.

lahsivjar · 2024-08-07T16:23:52Z

enrichments/trace/internal/elastic/span.go

+		// For parity with apm-data, destn resource does not handle
+		// temporary destination flag. However, it is handled by
+		// service.target fields and we might want to do the same here.
+		if destnResource != "" && s.messagingDestinationName != "" {
+			destnResource += "/" + s.messagingDestinationName
+		}


[For reviewers] Check the code comment.

lahsivjar · 2024-08-07T16:26:23Z

enrichments/trace/internal/elastic/span.go

 	}
 }

+func (s *spanEnrichmentContext) setDestinationService(span ptrace.Span) {


[For reviewers] There is quite a bit of discrepancy between the destination resource and the service target fields (service.target.{name, type}) - you can check this by the test cases in the PR. The current code keeps the logic similar to what is present in apm-data for the most part -- any points of differences should be highlighted in the other comments.

gregkalapos

Thanks for the PR @lahsivjar.

I think this is ok - below I added a few ideas trying to unwrap the differences between destination.service.resource and the target fields - all those are meant as brainstorming and maybe trying to have some ideas for follow-up cleanups if we want to. Take all with a grain of salt - maybe I'm wrong on those.

Overall I agree with the suggestion here to keep most of the logic from apm-data and removing the deprecated attributes. 👍

enrichments/trace/internal/elastic/span.go

gregkalapos · 2024-08-08T18:21:25Z

enrichments/trace/internal/elastic/span.go


+func (s *spanEnrichmentContext) setDestinationService(span ptrace.Span) {
+	var destnResource string
+	if s.peerService != "" {


Hmm, peer.service is a very interesting attribute.

SHOULD be equal to the actual service.name resource attribute of the remote service if any.

This is THE ultimate attribute describing a dependency 🎉 - if I understand the spec correctly, this literally gives us the service name of the callee.

So the difference between taking this as the higher prio attribute for span.destination.service.resource vs. not taking it as the higher one for service.target.name comes from apm-data, right?

I suspect the thing here is the following: while this attribute in theory is very useful, in reality, that is probably never used. How could one populate this attribute? I can't imagine an agent is able to figure out the service.name of the service that it's calling. That's something that the user sets on the other side (or is auto populated based on the executable name). Looking at the history it seems that code was there since the very first commit (haven't checked the pre-history in apm-server).

So I think the code path that uses peer.service is basically never triggered in real life.

If this would be a new mapping, I'd say that there should not be a difference here - if peer.service is set, then service.target.name should take it as well, because it's more useful to see the service name as e.g. www.foo.bar:443. But given we have so many moving parts, I agree, let's not go into this right now - just keep as is.

At the same time, I see there are lots of tests focusing on this - it's ok to have those, just keep in mind, if my theory about basically never having peer.service in real life is correct, then those tests are not very relevant for production. It's more likely that part is always skipped and these fields are populated based on other attributes.

So the difference between taking this as the higher prio attribute for span.destination.service.resource vs. not taking it as the higher one for service.target.name comes from apm-data, right?

Yes, you are right.

Thanks for the insight on the peer.service, I was wondering about the peer.service before and your point does make sense. However, I wonder if we should give higher priority to peer.service for service.target.name deduction in an event it is actually set... Maybe something to discuss outside this PR and long term.

But given we have so many moving parts, I agree, let's not go into this right now - just keep as is.

Agreed 👍

At the same time, I see there are lots of tests focusing on this - it's ok to have those, just keep in mind, if my theory about basically never having peer.service in real life is correct, then those tests are not very relevant for production. It's more likely that part is always skipped and these fields are populated based on other attributes.

Hmm, good point. I think I added peer.service for a lot of test cases. I will revisit them separately and see if I need to edit something, will create a followup if required.

gregkalapos · 2024-08-08T19:29:34Z

enrichments/trace/internal/elastic/span.go

+// getHostPort derives the host:port value from url.* attributes. Unlike
+// apm-data, the current code does NOT fallback to net.* or http.*
+// attritubtes as most of these are now deprecated.
 func getHostPort(urlFull *url.URL, urlDomain string, urlPort int64) string {


I think this is ok.

Most of this comes from the stabilization of the HTTP attributes. The post highlights depreciation of those fields.

Unfortunately SemConv and OTel SDKs move fairly slowly. So with this, we have the risk that we may run into an SDK that uses the old version and still uses the old attributes. If SemConv deprecates an attribute, it just means that it's deprecated in that specific version - but that's all. It does not directly trigger any change in any implementation and instrumentations are not really forced to update. Ultimately they will; but that can take very long.

Having said that, I agree with the approach - let's not move the handling of those deprecated attributes yet. I'd do it on-demand: if we get user feedback, we could do it - or better, discuss our general strategy on handling different versions.

gregkalapos · 2024-08-08T19:43:50Z

enrichments/trace/internal/elastic/span_test.go

+				AttributeServiceTargetName:       "service.Test",
+				AttributeSpanDestinationResource: "testsvc",


I think this is a good example that highlights using peer.service as the highest prio for both would make more sense:

As said above, service.peer is the name of the calee service. So if it's A -> B (where service A calls service B), then service.peer would be the service B - that service B shows up on the services overview (with the name B) - so it's like a real service.

Now, rpc.service that we use for AttributeServiceTargetName is The full (logical) name of the service being called, including its package name with the comment:

This is the logical name of the service from the RPC interface perspective, which can be different from the name of any implementing class.

So this is not the name of the service (as we call it service name), but just what's defined on the RPC interface - which is very likely not B, but something like myservice.EchoService, which does not show up as a service.

So in this example, the caller would show up as one service, testsvc is another service, and then service.Test is the name from the RPC's point of view.

So then the question is: what would users want to see as the dependency? I'd argue if I know that my service B is called, then I'd want to see that as the dependency, and not some RPC interface.

Now... you can still maybe figure out the A -> B relationship if you use rpc.service, because all spans have the same trace id (both in A and B), but still, if we know A calls B, then why would be the target anything other than B?

I just try to help understanding the difference; and maybe I'm wrong anyway. So again, we can definetely keep it as is. This is just some idea trying to unwrap this thing :) That way maybe we can clean up more in follow-ups.

I somewhat agree with your points. IIANM, service target is mainly driven by this spec I think we would require a wider discussion on updating the spec or enhancing this for OTel. Maybe we should create an issue for revisiting the spec of the OTel translation?

gregkalapos · 2024-08-08T19:54:18Z

enrichments/trace/internal/elastic/span.go

+	var destnResource string
+	if s.peerService != "" {
+		destnResource = s.peerService
+	}


Couldn't this be just an if-else and then dropping the lots of destnResource == "" checks for better readability?

Something like:

if s.peerService != "" { destnResource = s.peerService } else { switch { case s.isDB: if s.dbType != "" { destnResource = s.dbType } case s.isMessaging: if s.messagingSystem != "" { destnResource = s.messagingSystem } // For parity with apm-data, destn resource does not handle // temporary destination flag. However, it is handled by // service.target fields and we might want to do the same here. if s.messagingDestinationName != "" { destnResource += "/" + s.messagingDestinationName } case s.isRPC, s.isHTTP: if res := getHostPort(s.urlFull, s.urlDomain, s.urlPort); res != "" { destnResource = res } } }

Or separate the derivation from setting the attribute, so we can return early.

// For parity with apm-data, destn resource does not handle // temporary destination flag. However, it is handled by // service.target fields and we might want to do the same here. if destnResource != "" && s.messagingDestinationName != "" { destnResource += "/" + s.messagingDestinationName }

This is the problematic bit, the destination resource is updated based on the messagingDestinationName. I could make it return early with some conditions but, I think, that would make the code a bit harder to follow through and maintain. No strong opinions though but a simple if/else or a case in the switch statement itself for s.peerService != "" will not work.

In case it was not clear in the above statement, the messaging destination is appended to the destination resource if it is not empty. Atleast, this is what parity with apm-data would be. This would mean that if peer.service is testsvc and messagingDestinationName is topic1 then the destination resource would be testsvc/topic1 -- this is also covered in the test case TestElasticSpanEnrich/messaging_destination

Ah I was thinking we could do something like this:

if s.messagingSystem != "" { destnResource := s.messagingSystem if s.messagingDestinationName != "" { destnResource += "/" + s.messagingDestinationName } return destnResource }

I realise now that's not quite equivalent to what you've implemented though, because as it's written we might append the messaging.destination.name to peer.service. Is that a valid thing to do though?

Is that a valid thing to do though?

I spent a bit of time reasoning about this when I was reading the apm-data code but I couldn't think of this NOT being valid so I carried over the logic. Technically, if we know the destination name of a topic and we have peer.service then appending the topic would result in an useful valid value. However, I am not sure how this would look practically for ex: will the peer.service and messaging.destination.name be populated together?

I am not too sure overall what to do here. I would suggest we do this separate to this PR, I have created a GH issue to discuss this and make any changes to this. Let me know if it sounds good.

Seems fine to me, thanks.

axw

LGTM, just one question about omitting the scheme default port.

enrichments/trace/internal/elastic/attributes.go

axw · 2024-08-09T02:35:49Z

enrichments/trace/internal/elastic/span.go

+// getHostPort derives the host:port value from url.* attributes. Unlike
+// apm-data, the current code does NOT fallback to net.* or http.*
+// attritubtes as most of these are now deprecated.
 func getHostPort(urlFull *url.URL, urlDomain string, urlPort int64) string {


++ Seems reasonable. As mentioned in the meeting a couple of days ago, those deprecations would ideally be handled by the schema processor. Perhaps if we find we need to handle the deprecated attributes, we could implement some minimal transformation just prior to enrichment.

axw · 2024-08-09T02:49:02Z

enrichments/trace/internal/elastic/span.go

+	var destnResource string
+	if s.peerService != "" {
+		destnResource = s.peerService
+	}


Or separate the derivation from setting the attribute, so we can return early.

enrichments/trace/internal/elastic/span.go

lahsivjar commented Aug 7, 2024

View reviewed changes

lahsivjar marked this pull request as ready for review August 7, 2024 16:28

lahsivjar requested a review from a team as a code owner August 7, 2024 16:28

Add enrichment for span.destination.service.resource

881c4ef

lahsivjar force-pushed the svcresource branch from b2eb059 to 881c4ef Compare August 7, 2024 16:29

lahsivjar requested review from axw, felixbarny and gregkalapos August 7, 2024 16:29

Merge branch 'main' into svcresource

212fb23

gregkalapos approved these changes Aug 8, 2024

View reviewed changes

axw approved these changes Aug 9, 2024

View reviewed changes

lahsivjar mentioned this pull request Aug 9, 2024

[enrichments] Destination service resource derivation for messaging systems #74

Open

minor fixes: address review comments

a5b6dd5

lahsivjar merged commit 28fdb11 into elastic:main Aug 9, 2024

lahsivjar deleted the svcresource branch August 9, 2024 14:08

lahsivjar mentioned this pull request Aug 9, 2024

[enrichments] Standardize handling of peer.service attribute in service.target.* and span.destination.service.resource #75

Open

kruskall mentioned this pull request Oct 21, 2024

APM Server 8.16 test plan elastic/apm-server#14405

Closed

11 tasks

		AttributeServiceTargetName: "service.Test",
		AttributeSpanDestinationResource: "testsvc",

Add enrichment for span.destination.service.resource #72

Add enrichment for span.destination.service.resource #72

Uh oh!

Conversation

lahsivjar commented Aug 7, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gregkalapos left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lahsivjar Aug 9, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lahsivjar Aug 9, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

axw left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

lahsivjar commented Aug 7, 2024 •

edited

Loading

lahsivjar Aug 9, 2024 •

edited

Loading

lahsivjar Aug 9, 2024 •

edited

Loading