Multiple service names per process and process-global metrics #65
pulling in @elastic/apm-server |
I also think the last option is best. One other option would be to introduce some kind of "application server" metadata, but I think that would probably just make things more complicated.
Theoretically you could, but it would be unusual to do that in Go. I'm considering adding the ability to override the service name in the Go agent, in order to implement service mesh adapters. For that, there are unlikely to be any common metrics, as the agent would not even be running on the same machine as the services. |
Personally I think options 3 and 4 are best, but prefer 3. Changing the type of The main drawback of option 3 (sending dup data) is valid, but given that it will affect a minority of agents and probably a minority of deployments (several services on the same JVM), I think it is acceptable for the benefits. |
Not necessarily. The JSON schema could allow strings and string arrays. |
Technically you could have the same use-case in Node.js, but I've never seen this in the wild, and would kind of consider it an anti-pattern. Regarding the best way to solve this, for simplicity's sake, I'd prefer if our solution would end up duplicating data in Elasticsearch. I don't care if the agent sends it up twice or if the APM Server duplicates the data before storing it, but if the end result is two identical metric-sets - one for each service - then we keep the querying as simple as possible (as it doesn't change compared to how it is today). One downside of the duplicated data is that if a user ever creates a dashboard showing the total system CPU aggregated across all services on that system, they will get the wrong result. But I'm not sure if that's a use-case we want to support. |
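The double-counting concern can be illustrated with a small sketch. The documents, field names (`host`, `service_name`, `system_cpu_pct`), and values below are made up for illustration; they are not the actual APM metric document layout.

```python
# Hypothetical stored metric docs after duplication: the single process on
# host-a reports the same system CPU sample once per hosted service.
docs = [
    {"host": "host-a", "service_name": "foo", "system_cpu_pct": 0.40},
    {"host": "host-a", "service_name": "bar", "system_cpu_pct": 0.40},
    {"host": "host-b", "service_name": "baz", "system_cpu_pct": 0.20},
]


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Naive average over all docs: host-a is weighted twice.
naive_avg = mean(d["system_cpu_pct"] for d in docs)

# Collapsing to one sample per host first removes the duplication.
per_host = {d["host"]: d["system_cpu_pct"] for d in docs}
dedup_avg = mean(per_host.values())

# Filtering by one service is not affected by the duplicates.
foo_avg = mean(d["system_cpu_pct"] for d in docs if d["service_name"] == "foo")
```

In this toy data set the naive average is skewed towards host-a, while the per-service query stays as simple as it is today.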
I'm not sure I agree. It's a perfectly valid use-case, especially in enterprise Java, to even have dozens of applications deployed in the same application server. @eyalkoren had an idea for a temporary workaround which would not require server changes: the agent would send the same metrics in multiple requests to the APM Server, each with a different I still think allowing for multiple service names would be the cleanest solution ultimately.
Couldn't you just forward the service names to ES as they are sent by the agent? |
Would it actually be exactly 2 or just some N > 1? Either way, connection count isn't typically a problem for apm-server given ratios of app-server:apm-server in typical deployments.
For intake and persistence that is certainly feasible and backwards compatible, and would be best for storage usage, however I am not sure if the UI would need some work to handle multiple service names (making this backwards incompatible) - perhaps @elastic/apm-ui could confirm. After running this query to add another service name to existing metrics for the
Of course, a lot more validation would be needed. |
@felixbarny if we have people actively requesting the possibility to use multiple service names, I think it would be fine to release the PR you linked. Having that in will unblock all those folks. They might not be able to use metrics fully yet, but at least all the regular APM stuff will become available to them. We can fix metrics (whatever that ends up meaning) in a following release. |
I expected this to cause issues with how we query, but after doing some preliminary tests, it appears to work seamlessly. Indexing a few docs:
POST sqren-arrays/_doc
{"service_name" : ["serviceNameA", "serviceNameB"]}
POST sqren-arrays/_doc
{"service_name" : ["serviceNameA", "serviceNameC"]}
POST sqren-arrays/_doc
{"service_name" : ["serviceNameD", "serviceNameC"]}
POST sqren-arrays/_doc
{"service_name" : ["serviceNameA"]}
POST sqren-arrays/_doc
{"service_name" : "serviceNameA"}
Querying by service name returns the first two docs:
GET sqren-arrays/_search
{
  "query": {
    "term": {
      "service_name.keyword": "serviceNameA"
    }
  }
}
So afaict this doesn't require any changes to the UI side and is therefore backwards compatible from the UI's POV. |
Oh, one thing that could be problematic: the returned docs differ depending on whether they were indexed as string[] or string:
{
  "_source": {
    "service_name": ["serviceNameA"]
  }
}
vs.
{
  "_source": {
    "service_name": "serviceNameA"
  }
}
I'll have to double check how we handle that. That could require changes to the UI side, and thereby be a breaking change. Is this only relevant for metric docs, or will it also affect transaction docs? If so it will be problematic. |
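One way a consumer could cope with both shapes is to normalize the value when reading a hit. This is only a sketch: `service_names_from_hit` is a hypothetical helper, not something taken from the APM UI, and it assumes the `service_name` field from the test index above.

```python
from typing import Any, Dict, List


def service_names_from_hit(hit: Dict[str, Any]) -> List[str]:
    """Return the service names of a search hit as a list.

    Handles a missing field, a single string, and a list of strings,
    since _source echoes back whatever shape was indexed.
    """
    value = hit.get("_source", {}).get("service_name")
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


array_hit = {"_source": {"service_name": ["serviceNameA"]}}
string_hit = {"_source": {"service_name": "serviceNameA"}}
```

With such a helper, both `array_hit` and `string_hit` yield the same list, so code further down only has to deal with one shape.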
In practice, it will only be relevant for metrics but in theory, there could be a transaction with multiple service names (although that would not make sense). Maybe worth checking though how much it would screw up the UI. If it would display |
Limiting this capability to just metrics is not a problem on the apm-server intake - metric currently doesn't even allow overriding metadata. Have you given thought to what the intake would look like if we were to proceed with multiple service support for just metrics? Is it just name that needs to be a list? Would it be difficult to avoid using if a current metricset payload looks like:
{
  "metricset": {
    "samples": {
      "heap.sys.bytes": {
        "value": 6.520832e+06
      }
    },
    "timestamp": 1496170422281000
  }
}
does this look reasonable for multiple services:
{
  "metricset": {
    "samples": {
      "heap.sys.bytes": {
        "value": 6.520832e+06
      }
    },
    "service": {
      "name": [
        "service1",
        "service2"
      ]
    },
    "timestamp": 1496170422281000
  }
} |
That would work for me. Note that this is a bit inconsistent with how it's done for transactions and spans (see also elastic/apm-server#1175 (comment)). Also, the advantage of adding support for multiple service names in Example Intake API v2 body:
Obviously, transactions and spans would also have to explicitly override the Having to manually add all service names to all process-global metrics would be a bit cumbersome and would increase the request size a bit but it's something I could live with if it can't be avoided. |
For If we can limit this to metric docs, it should not affect the UI in any way. |
Where is this link generated? In discover? No matter which approach we are taking, we should avoid that agents have to check the version of the APM server in order to send multiple service names. The beauty of @graphaelli's suggestion is that it does not conflict with anything because it's currently not possible to override any metadata for metrics. Also, we can ensure at the API level that transactions can't have multiple service names. If we go with allowing a list of services on the metadata event, I propose leaving |
Note from an ECS perspective: We had quite a few discussions on naming fields plural or singular and stuck with keeping (almost) everything singular, as any field could be a single value or an array. As mentioned before, from an Elasticsearch perspective there is not really a difference. My preference is to have it all in |
Correct. |
I'd like to avoid implicit rules on the json spec level as much as possible. Introducing If multiple |
Is anyone working on this? |
Should it also be possible to ingest multiple service versions along multiple service names? |
Good question. It may have an impact on the deployment annotations that are created. That code needs to be aware of how to associate the As of elastic/apm-server#6407, we'd also have the option of just copying the process-global metrics and sending them multiple times, overriding the Still unsure about the tradeoffs, but multiple service names per process are the exception rather than the norm. It's not a thing in most other languages, and the trend is to go towards more lightweight alternatives to app servers in Java and .NET (IIS also supports serving multiple apps). In that light, it may not be worth adding more complexity to the UI and Server for the marginal storage savings when looking at the overall picture. |
Does |
Good question, not sure if metric dimensions will support array values. |
Even if arrays are supported, it might not be a good idea to use them. E.g. if an application is undeployed on an application server, the metricset reported before the undeploy will be a different time series than the metricset after the undeploy (as I understand it), which might make aggregations based on the service.name more costly. Another example is the start of an application server, when multiple services are deployed one after the other. |
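The time series concern can be sketched as follows, under the assumption that a series is identified by the full set of its dimension values. `series_key` is a hypothetical helper and does not reflect how Elasticsearch actually derives time series identities.

```python
def series_key(dimensions):
    """Build a hashable identity for a series from its dimension values.

    Assumption for illustration: any change in a dimension value, including
    a change in the members of an array, yields a different series.
    """
    return tuple(
        sorted(
            (name, tuple(value) if isinstance(value, list) else value)
            for name, value in dimensions.items()
        )
    )


# Array-valued service.name: undeploying "bar" changes the series identity.
before_undeploy = series_key({"host": "a", "service.name": ["foo", "bar"]})
after_undeploy = series_key({"host": "a", "service.name": ["foo"]})

# One metricset per service: the series for "foo" is stable across the undeploy.
foo_before = series_key({"host": "a", "service.name": "foo"})
foo_after = series_key({"host": "a", "service.name": "foo"})
```

Under this assumption the array variant starts a new series on every deploy or undeploy, while the per-service variant keeps one continuous series for each service.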
Documenting our decision on how we want to proceed on this one: we will go with duplicating metricsets for each With @tobiasstadler's latest contribution to APM Server, each agent can now go ahead and implement that without any dependency. |
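A minimal sketch of the duplication approach on the agent side. It assumes that a metricset event may carry its own `service.name`, as the thread suggests; `duplicate_metricset` is a hypothetical helper and is not taken from any agent.

```python
import copy
from typing import Any, Dict, Iterable, List


def duplicate_metricset(
    metricset: Dict[str, Any], service_names: Iterable[str]
) -> List[Dict[str, Any]]:
    """Return one metricset event per service name.

    Each copy carries a single service name, so stored documents keep a
    plain string value and existing queries continue to work unchanged.
    """
    events = []
    for name in service_names:
        event = copy.deepcopy(metricset)  # leave the caller's dict untouched
        event["service"] = {"name": name}
        events.append({"metricset": event})
    return events


process_metrics = {
    "samples": {"heap.sys.bytes": {"value": 6.520832e06}},
    "timestamp": 1496170422281000,
}
events = duplicate_metricset(process_metrics, ["service1", "service2"])
```

Service-specific metricsets would skip the duplication and be sent once with their own service name.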
When implementing support for multiple service_names (elastic/apm-server#1175) per JVM for the Java agent (elastic/apm-agent-java#514), I noticed a problem with regard to metrics.
Background: a Server in Java can host multiple applications in a single JVM/process. This might be very Java specific. However, legacy .NET classic applications offer a similar concept with AppDomains. Currently, we don't support .NET Classic (only .NET Core).
The problem is that metrics, like process CPU utilization, are not specific to a single service within a JVM but are global for the JVM. How can we ensure that the metrics can be associated and correlated with all of the services it belongs to? To make matters worse, we will, in the future, likely have metrics which do belong to a specific service. For example response time metrics.
To be more specific about what functionality would be missing: say someone has deployed service foo and bar together in a JBoss application server on host A and host B. Now, they want to get the average process CPU utilisation of the foo service. Naturally, they apply a query bar filter like service.name: foo. The query yields no results because the metrics have the service.name set neither to foo nor to bar. Instead, it would be the detected default service name, in this case jboss-application. They could, however, find out the PIDs of all of the foo services, apply a filter for all hosts where foo is running, and filter on all the pids. Or, they could filter by the name jboss-application. However, that would also yield results for totally unrelated JVMs which also happen to be JBoss applications.
A couple of approaches come to mind:
jboss-application (status quo)
service.names don't match.
service.name within a process
service.name works and behaves intuitively
service.name to be an array. Process global metrics would then have all the service.names which are active for the current process. If a metricset is service specific, it overrides the service.name and sets a single value.
service.name works and behaves intuitively
I would personally vote for the last option as it seems to resemble reality most closely. The question is if we should block elastic/apm-agent-java#514 until we have support for that in the intake API or if we should go ahead with a temporary solution which does not allow for an intuitively working service.name filter.
@elastic/apm-agent-devs Does the runtime on your respective agents support multiple applications within a single process? Do you think having the ability to override the service.name with possibly a list of service names is feasible?
@alvarolobato @roncohen @eyalkoren do you think we have to consider this issue a blocker for elastic/apm-agent-java#514?
@simitt Do you think it would be feasible to make it possible to override the service.name on a per-metricset basis? What about changing the type of service.name from string to string[]?