Skip to content

Conversation

@lahsivjar
Copy link
Contributor

@lahsivjar lahsivjar commented Aug 7, 2024

The logic is not at full parity with what apm-data does. It is mainly because, a) apm-data uses a lot of deprecated fields and this is a good opportunity to phase them out while moving to the newer attributes, and b) the logic is overly complex and I don't think it is worth it.

Above being said, I have added comments in the diff below to explain where we diverge from apm-data and how. @axw @felixbarny @gregkalapos would be great to get your eyes on this.

Fixes: #70

Comment on lines 359 to 362
// getHostPort derives the host:port value from url.* attributes. Unlike
// apm-data, the current code does NOT fallback to net.* or http.*
// attritubtes as most of these are now deprecated.
func getHostPort(urlFull *url.URL, urlDomain string, urlPort int64) string {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[For reviewers] The handling for destination host and port have a few variants in apm-data in terms of fallback but I believe at-least the current complexity of handling this in apm-data is not worth the effort and should be fixed. Below are the main common points:

  1. Both, destination port and address, are derived from url.full (apm-data uses the now deprecated http.url) (ref)
  2. First level of fallback in apm-data uses http.* attributes to build the full url. Unfortunately, these http.* attributes are also deprecated (and some deprecations are present from quite some time). In the current enrichments code, the fallback uses the latest url.* attributes which replace the deprecated http.* ones.
  3. APM-data has another level of fallback which uses net.peer.{port, name, ip}, all of these attributes are now deprecated and some are even removed from the latest semconv definitions. I have not implemented this fallback level at all in the enrichments.

Let me know if any of the above doesn't make sense, I will add the fallbacks if needed. The apm-data code can be perused here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is ok.

Most of this comes from the stabilization of the HTTP attributes. The post highlights depreciation of those fields.

Unfortunately SemConv and OTel SDKs move fairly slowly. So with this, we have the risk that we may run into an SDK that uses the old version and still uses the old attributes. If SemConv deprecates an attribute, it just means that it's deprecated in that specific version - but that's all. It does not directly trigger any change in any implementation and instrumentations are not really forced to update. Ultimately they will; but that can take very long.

Having said that, I agree with the approach - let's not move the handling of those deprecated attributes yet. I'd do it on-demand: if we get user feedback, we could do it - or better, discuss our general strategy on handling different versions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ Seems reasonable. As mentioned in the meeting a couple of days ago, those deprecations would ideally be handled by the schema processor. Perhaps if we find we need to handle the deprecated attributes, we could implement some minimal transformation just prior to enrichment.

Comment on lines +321 to +326
// For parity with apm-data, destn resource does not handle
// temporary destination flag. However, it is handled by
// service.target fields and we might want to do the same here.
if destnResource != "" && s.messagingDestinationName != "" {
destnResource += "/" + s.messagingDestinationName
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[For reviewers] Check the code comment.

}
}

func (s *spanEnrichmentContext) setDestinationService(span ptrace.Span) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[For reviewers] There is quite a bit of discrepancy between the destination resource and the service target fields (service.target.{name, type}) - you can check this by the test cases in the PR. The current code keeps the logic similar to what is present in apm-data for the most part -- any points of differences should be highlighted in the other comments.

@lahsivjar lahsivjar marked this pull request as ready for review August 7, 2024 16:28
@lahsivjar lahsivjar requested a review from a team as a code owner August 7, 2024 16:28
Copy link
Contributor

@gregkalapos gregkalapos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @lahsivjar.

I think this is ok - below I added a few ideas trying to unwrap the differences between destination.service.resource and the target fields - all those are meant as brainstorming and maybe trying to have some ideas for follow-up cleanups if we want to. Take all with a grain of salt - maybe I'm wrong on those.

Overall I agree with the suggestion here to keep most of the logic from apm-data and removing the deprecated attributes. 👍


func (s *spanEnrichmentContext) setDestinationService(span ptrace.Span) {
var destnResource string
if s.peerService != "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, peer.service is a very interesting attribute.

SHOULD be equal to the actual service.name resource attribute of the remote service if any.

This is THE ultimate attribute describing a dependency 🎉 - if I understand the spec correctly, this literally gives us the service name of the callee.

So the difference between taking this as the higher prio attribute for span.destination.service.resource vs. not taking it as the higher one for service.target.name comes from apm-data, right?

I suspect the thing here is the following: while this attribute in theory is very useful, in reality, that is probably never used. How could one populate this attribute? I can't imagine an agent is able to figure out the service.name of the service that it's calling. That's something that the user sets on the other side (or is auto populated based on the executable name). Looking at the history it seems that code was there since the very first commit (haven't checked the pre-history in apm-server).

So I think the code path that uses peer.service is basically never triggered in real life.

If this would be a new mapping, I'd say that there should not be a difference here - if peer.service is set, then service.target.name should take it as well, because it's more useful to see the service name as e.g. www.foo.bar:443. But given we have so many moving parts, I agree, let's not go into this right now - just keep as is.

At the same time, I see there are lots of tests focusing on this - it's ok to have those, just keep in mind, if my theory about basically never having peer.service in real life is correct, then those tests are not very relevant for production. It's more likely that part is always skipped and these fields are populated based on other attributes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the difference between taking this as the higher prio attribute for span.destination.service.resource vs. not taking it as the higher one for service.target.name comes from apm-data, right?

Yes, you are right.

Thanks for the insight on the peer.service, I was wondering about the peer.service before and your point does make sense. However, I wonder if we should give higher priority to peer.service for service.target.name deduction in an event it is actually set... Maybe something to discuss outside this PR and long term.

But given we have so many moving parts, I agree, let's not go into this right now - just keep as is.

Agreed 👍

At the same time, I see there are lots of tests focusing on this - it's ok to have those, just keep in mind, if my theory about basically never having peer.service in real life is correct, then those tests are not very relevant for production. It's more likely that part is always skipped and these fields are populated based on other attributes.

Hmm, good point. I think I added peer.service for a lot of test cases. I will revisit them separately and see if I need to edit something, will create a followup if required.

Comment on lines 359 to 362
// getHostPort derives the host:port value from url.* attributes. Unlike
// apm-data, the current code does NOT fallback to net.* or http.*
// attritubtes as most of these are now deprecated.
func getHostPort(urlFull *url.URL, urlDomain string, urlPort int64) string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is ok.

Most of this comes from the stabilization of the HTTP attributes. The post highlights depreciation of those fields.

Unfortunately SemConv and OTel SDKs move fairly slowly. So with this, we have the risk that we may run into an SDK that uses the old version and still uses the old attributes. If SemConv deprecates an attribute, it just means that it's deprecated in that specific version - but that's all. It does not directly trigger any change in any implementation and instrumentations are not really forced to update. Ultimately they will; but that can take very long.

Having said that, I agree with the approach - let's not move the handling of those deprecated attributes yet. I'd do it on-demand: if we get user feedback, we could do it - or better, discuss our general strategy on handling different versions.

Comment on lines 403 to 404
AttributeServiceTargetName: "service.Test",
AttributeSpanDestinationResource: "testsvc",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good example that highlights using peer.service as the highest prio for both would make more sense:

As said above, service.peer is the name of the calee service. So if it's A -> B (where service A calls service B), then service.peer would be the service B - that service B shows up on the services overview (with the name B) - so it's like a real service.

Now, rpc.service that we use for AttributeServiceTargetName is The full (logical) name of the service being called, including its package name with the comment:

This is the logical name of the service from the RPC interface perspective, which can be different from the name of any implementing class.

So this is not the name of the service (as we call it service name), but just what's defined on the RPC interface - which is very likely not B, but something like myservice.EchoService, which does not show up as a service.

So in this example, the caller would show up as one service, testsvc is another service, and then service.Test is the name from the RPC's point of view.

So then the question is: what would users want to see as the dependency? I'd argue if I know that my service B is called, then I'd want to see that as the dependency, and not some RPC interface.

Now... you can still maybe figure out the A -> B relationship if you use rpc.service, because all spans have the same trace id (both in A and B), but still, if we know A calls B, then why would be the target anything other than B?

I just try to help understanding the difference; and maybe I'm wrong anyway. So again, we can definetely keep it as is. This is just some idea trying to unwrap this thing :) That way maybe we can clean up more in follow-ups.

Copy link
Contributor Author

@lahsivjar lahsivjar Aug 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I somewhat agree with your points. IIANM, service target is mainly driven by this spec I think we would require a wider discussion on updating the spec or enhancing this for OTel. Maybe we should create an issue for revisiting the spec of the OTel translation?

var destnResource string
if s.peerService != "" {
destnResource = s.peerService
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't this be just an if-else and then dropping the lots of destnResource == "" checks for better readability?

Something like:

	if s.peerService != "" {
		destnResource = s.peerService
	} else {
		switch {
		case s.isDB:
			if s.dbType != "" {
				destnResource = s.dbType
			}
		case s.isMessaging:
			if s.messagingSystem != "" {
				destnResource = s.messagingSystem
			}
			// For parity with apm-data, destn resource does not handle
			// temporary destination flag. However, it is handled by
			// service.target fields and we might want to do the same here.
			if s.messagingDestinationName != "" {
				destnResource += "/" + s.messagingDestinationName
			}
		case s.isRPC, s.isHTTP:
			if res := getHostPort(s.urlFull, s.urlDomain, s.urlPort); res != "" {
				destnResource = res
			}
		}
	}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or separate the derivation from setting the attribute, so we can return early.

Copy link
Contributor Author

@lahsivjar lahsivjar Aug 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

		// For parity with apm-data, destn resource does not handle
		// temporary destination flag. However, it is handled by
		// service.target fields and we might want to do the same here.
		if destnResource != "" && s.messagingDestinationName != "" {
			destnResource += "/" + s.messagingDestinationName
		}

This is the problematic bit, the destination resource is updated based on the messagingDestinationName. I could make it return early with some conditions but, I think, that would make the code a bit harder to follow through and maintain. No strong opinions though but a simple if/else or a case in the switch statement itself for s.peerService != "" will not work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case it was not clear in the above statement, the messaging destination is appended to the destination resource if it is not empty. Atleast, this is what parity with apm-data would be. This would mean that if peer.service is testsvc and messagingDestinationName is topic1 then the destination resource would be testsvc/topic1 -- this is also covered in the test case TestElasticSpanEnrich/messaging_destination

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I was thinking we could do something like this:

if s.messagingSystem != "" {
    destnResource := s.messagingSystem
    if s.messagingDestinationName != "" {
        destnResource += "/" + s.messagingDestinationName
    }
    return destnResource
}

I realise now that's not quite equivalent to what you've implemented though, because as it's written we might append the messaging.destination.name to peer.service. Is that a valid thing to do though?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that a valid thing to do though?

I spent a bit of time reasoning about this when I was reading the apm-data code but I couldn't think of this NOT being valid so I carried over the logic. Technically, if we know the destination name of a topic and we have peer.service then appending the topic would result in an useful valid value. However, I am not sure how this would look practically for ex: will the peer.service and messaging.destination.name be populated together?

I am not too sure overall what to do here. I would suggest we do this separate to this PR, I have created a GH issue to discuss this and make any changes to this. Let me know if it sounds good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine to me, thanks.

Copy link
Member

@axw axw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one question about omitting the scheme default port.

Comment on lines 359 to 362
// getHostPort derives the host:port value from url.* attributes. Unlike
// apm-data, the current code does NOT fallback to net.* or http.*
// attritubtes as most of these are now deprecated.
func getHostPort(urlFull *url.URL, urlDomain string, urlPort int64) string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ Seems reasonable. As mentioned in the meeting a couple of days ago, those deprecations would ideally be handled by the schema processor. Perhaps if we find we need to handle the deprecated attributes, we could implement some minimal transformation just prior to enrichment.

var destnResource string
if s.peerService != "" {
destnResource = s.peerService
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or separate the derivation from setting the attribute, so we can return early.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[enrichments] populate span.destination.service.resource

3 participants