[Docs] Adding proposal for exposing metrics per GameServer #1845
Comments
I am bad at metrics, so take my comment with a large grain of salt 😄 🧂 We currently expose some gameserver metrics (maybe just in aggregate) through the Agones controller looking at GameServer events. Is there a good reason we shouldn't expose these metrics through the controller, rather than through the sidecar? (I think I know the answer, but figured I would ask the question) |
I had another potentially fun question. I feel like we've conflated two things here:
Since Player Tracking data is stored on the CRD, we could use a similar pattern to what we have now for metrics -- wherein it comes from the controller for Player Tracking, but the Metrics being exposed directly on a GameServer for arbitrary metrics. Just wondering if these two things should be designed separately? Not sure - just asking the question. |
TLDR; I think we are both coming to the same conclusion... 2 separate issues. @markmandel this is something that I was hoping would come up ... my design calls out that it is probably two separate problems. Was going to bring this up at the next community meeting.
PlayerTracking / Agones Metrics
I feel that there are Agones metrics (Player Tracking etc.) and arbitrary metrics (per DGS). As you mention, we could go with the approach of using the existing event-based metrics for things that are in the CRD ... not sure why we don't just expose it via a Which brings us onto the previous question: why expose on the sidecar?
Arbitrary Metrics
With arbitrary metrics this really comes down to the game server, which is why I have been struggling with the question "Is this even an Agones issue?" ... Unreal especially is bad at giving an engineer metrics about what is inside the container; it just wasn't really designed in this way... Game servers written in Rust/Go/Node etc. would be able to expose metrics in a standard way using OTel/Prom etc., so why add complexity just for Unreal/Unity? TBH I suppose most AAAs will be using either Unreal/Unity or a self-made engine. In this case, if it is down to the engine, I would go with my first alternative and aggregate metrics from logs using a log shipper like vector. It might be down to a number of complementary projects to help here (that could be linked in Engine readmes); however, everyone running Agones will eventually bump into this problem ... how to observe running DGSs. I am in favour of splitting this issue ...
That makes sense. I think for Arbitrary metrics, it 100% makes sense to expose it via the Sidecar, and have a limited set of metrics capabilities exposed. If not to limit our SDK size, but also to gather feedback. It's also just a nice experience for users to have a metrics SDK pre-baked. The line I would draw in the sand would be: If we're pushing out metrics based on CRD data, make it come from the controller, since it's tracking all that info anyway, and we already have the infra to manage that. If we're pushing out metrics that are not based on CRD data (which arbitrary ones certainly don't), then it should come through the sidecar (from above, and from our usual patterns, it sounds like /metrics is the way to go) -- this will also allow us to provide metrics on the sidecar itself, which could also be useful as well, and operators only need to configure the one capture endpoint). I'm also thinking if we have predefined metrics that people can populate (frame rate seems like an obvious one), then this also falls into this bucket. How does that sound? As per why it's that way in the first place, I wonder if @cyriltovena will come back around and see us 😄 (also be interested in your take on the above). I think it has a lot to do with the fact that the controller knows about all the gameservers - where the sidecar only knows about itself -- so it's easier to calculate aggregate data -- but like I said previously, I'm not very good at metrics, so I'm the wrong person to ask, and definitely take what I'm saying with a grain of salt. |
The main thing I don't fully grok yet is arbitrary metrics: is it something that Agones would want to take on, or is it a separate tool / project? Where does the scope of Agones end? |
Oh, a note I should make - we should stick to OpenCensus, since that's our current library of choice for metrics (so we can support multiple backends) -- we may want to upgrade it and/or it may also be time to look to move to OpenTelemetry if it's ready (research required).
That's a good question. I feel like there are some out of the box metrics we can define that most people will want/need (frame rate seems like an obvious choice). Say we do all that work -- we then have the ability to share (some?) of that interface with the user, which is pretty handy. But I get your hesitation. What I would probably suggest - let's focus on player counts, and frame rate (or other specific ones if you have strong opinions), and see where we end up after that. That will give us some exploration of the area, and may provide more input into making the decision on arbitrary metrics. It's possible you are right -- maybe we decide complex custom metrics are out of scope, and down to the user? Or maybe we look at it and realise it's a small amount of work to add it in and therefore "why not". Won't know until we try? How does that sound? |
100% sticking with OpenCensus, no need to change till OTel is stable. Okay, let's get a list of arbitrary metrics together (would be good to get input here as I'm struggling to think of too many that would be across the board) and mock out the proto to see what the API surface area might look like. Will look at getting some kind of proto up next week; will be busy prepping for KubeCon so it will be the latter part of the week. |
Sounds like a plan! 👍 |
This came up in Slack, but I wanted to capture my thoughts here:
I think if the project existed, it would be an awesome addition to Third Party Libraries and Tools though. |
'This issue is marked as Stale due to inactivity for more than 30 days. To avoid being marked as 'stale' please add 'awaiting-maintainer' label or add a comment. Thank you for your contributions ' |
@domgreen - Is this something that you are still interested in helping drive forward? |
I'm still of mind that this project shouldn't be part of Agones, so we should really close this issue. |
Fully agree should have closed it earlier, my bad. |
No worries at all! I'll close the issue now then. Still an interesting project idea -- although I feel like everyone I know does log based metrics and calls it a day 😄 |
Is your feature request related to a problem? Please describe.
I would like to be able to get metrics such as connected users (currently via player tracking) as well as wider arbitrary metrics from game servers such as frame rate.
Describe the solution you'd like
Outlined proposals below
Describe alternatives you've considered
Outlined in proposal below
Additional context
Relates to:
#1035
#1036
#1037
Game Metrics from Dedicated Game Servers
Summary
Engineers often want to observe "metrics" which can be defined as "raw measurements, generally with the intent to produce continuous summaries of those measurements" from within their Dedicated Game Servers.
A proposal to do this would be to allow the DGS to send a small number (100ish) of metrics via the SDK to the running Agones sidecar. This requires a simple addition to the sidecar to allow it to expose these metrics via the already used OpenCensus integrations. We can update the SDK to have a metrics sub API (much like the alpha and beta stages) to support metric work.
Metrics from dedicated servers often break down into two categories: firstly, metrics that Agones already knows about (count of players, free capacity etc.), and secondly, arbitrary metrics from within the DGS itself such as frame rate, number of sessions, total rings collected etc.
The main concerns with exposing arbitrary metrics are to not expose too many, to not impact the running DGS, and to not reinvent the wheel but choose the correct technologies to work together.
We could theoretically break this into two proposals:
Related Links
Goals
Non-Goals
Proposal: Limited number of metrics exposed via sidecar
The initial proposal would be to allow the sidecar to expose metrics itself. Initially this would be specific to the PlayerTracking data, such as the current number of connected users, free space that can be allocated per DGS, and other such metrics that are already known to Agones. This would use the existing OpenCensus (and later OTel) implementation that is being used within Agones to allow these metrics to be scraped by other projects such as Prometheus. Once these metrics are exposed we can simply add another ServiceMonitor, as is currently done with the controller, to communicate to the Prometheus Operator that metrics are available to scrape. This would probably require the addition of an agones-sidecar service to route the traffic, or we could defer back to a PodMonitor.
With the metrics known to Agones this is simple, as stated above. However, for metrics that are not natively known by Agones (frame rate/sessions etc.) we would need a way to expose these metrics via the SDK to the sidecar. Here, we could simply add a small API that supports the basics of metrics (Counters, Gauges, Labels) and sends the data to the sidecar, where it is exposed in the same way as the Agones native metrics.
It would be up to the Game Engineers to call this API when data they wish to record changes, which would allow them to record the metrics that they are interested in from within the context of their own games.
I would envisage this API being a subset of the Agones SDK, as the Alpha and Beta sections currently are. I would also look to limit (this could be configurable) the number of metrics a game could send to the sidecar. This would reduce the burden on the sidecar and reduce the explosion of metrics that are reported.
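As a rough illustration of the idea above (counters and gauges recorded in the sidecar, with a configurable cap on the number of distinct metrics), here is a minimal Go sketch. Everything in it is hypothetical: `Registry`, `NewRegistry`, `AddCounter`, `SetGauge` and `ErrTooManyMetrics` are names invented for this example and are not part of the Agones SDK or sidecar. Labels and the OpenCensus export are left out to keep it short.

```go
package main

import (
	"errors"
	"sync"
)

// ErrTooManyMetrics is returned once the cap on distinct metric names is hit.
// (Hypothetical name, not an Agones error.)
var ErrTooManyMetrics = errors.New("metric limit reached")

// Registry is a hypothetical in-sidecar store for metrics sent by the DGS.
type Registry struct {
	mu       sync.Mutex
	limit    int // maximum number of distinct metric names, configurable
	counters map[string]float64
	gauges   map[string]float64
}

// NewRegistry creates a registry that accepts at most limit distinct metrics.
func NewRegistry(limit int) *Registry {
	return &Registry{
		limit:    limit,
		counters: map[string]float64{},
		gauges:   map[string]float64{},
	}
}

// admit lets known metrics through and new ones only while under the cap.
// The caller must hold r.mu.
func (r *Registry) admit(known bool) error {
	if known || len(r.counters)+len(r.gauges) < r.limit {
		return nil
	}
	return ErrTooManyMetrics
}

// AddCounter increases a monotonically increasing counter by delta.
func (r *Registry) AddCounter(name string, delta float64) error {
	if delta < 0 {
		return errors.New("counter delta must not be negative")
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	_, known := r.counters[name]
	if err := r.admit(known); err != nil {
		return err
	}
	r.counters[name] += delta
	return nil
}

// SetGauge records the latest value of a gauge, such as frame rate.
func (r *Registry) SetGauge(name string, value float64) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	_, known := r.gauges[name]
	if err := r.admit(known); err != nil {
		return err
	}
	r.gauges[name] = value
	return nil
}

// Counter returns the current value of a counter (0 if unknown).
func (r *Registry) Counter(name string) float64 {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.counters[name]
}

// Gauge returns the current value of a gauge (0 if unknown).
func (r *Registry) Gauge(name string) float64 {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.gauges[name]
}
```

With `NewRegistry(2)`, recording `rings_collected` and `frame_rate` succeeds, while a third distinct name is rejected with `ErrTooManyMetrics`. Rejecting new names rather than evicting old ones is just one possible policy; it keeps the set of exposed series stable between scrapes.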
One advantage of this approach is that we end up storing metrics outside of the DGS and in the sidecar, causing much less of a memory impact on the running game. We also create a basic API so that all games running in the Agones project can report metrics in the same way.
The main problem, however, is whether it is actually the responsibility of Agones to report on metrics from within a running game server, and if so, do we then also expose future APIs around events and other aspects of observability? I would argue that a mature platform would allow the engineers running their game server to pick up key observability points that they are interested in, but I could be swayed that this is the responsibility of another project, much like CPU/Mem can be obtained directly from Kubernetes APIs without the need to interfere with the DGS itself.
Pro:
Con:
Alternatives considered
Expose metrics via logs
This would be my proposal for getting arbitrary metrics from a DGS if we were to separate the two concerns of Agones vs non-Agones metrics. In that case I would expose /metrics in the sidecar as standard for the data known to Agones (connected users, spaces free etc.).
In this alternative we would use the inbuilt loggers from the engines to log in a given format that log shippers would then be able to turn into metrics. An example of this is the Prometheus sink for <vector.dev>, which would allow you to transform logs and expose them to Prometheus.
The main concern around this is that we probably would not be able to form a decent standard for how to log metrics, and it would be up to the engineers running the servers and their game teams to discuss the best approach for each individual case.
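To make the log-based alternative a little more concrete, here is a small Go sketch of the transformation a log shipper would perform: picking metric samples out of ordinary game server log lines. The line format `METRIC <kind> <name> <value>` is purely an assumption made up for this example (no such standard is agreed in this proposal), and `Sample` and `ParseMetricLine` are hypothetical names. In practice this step would live in the shipper's own configuration rather than in Go code.

```go
package main

import (
	"strconv"
	"strings"
)

// Sample is one metric observation extracted from a log line.
type Sample struct {
	Kind  string // "counter" or "gauge"
	Name  string
	Value float64
}

// ParseMetricLine extracts a Sample from a line in the assumed format
// "METRIC <kind> <name> <value>". It returns false for any other line,
// so normal log output is simply skipped.
func ParseMetricLine(line string) (Sample, bool) {
	fields := strings.Fields(line)
	if len(fields) != 4 || fields[0] != "METRIC" {
		return Sample{}, false
	}
	if fields[1] != "counter" && fields[1] != "gauge" {
		return Sample{}, false
	}
	value, err := strconv.ParseFloat(fields[3], 64)
	if err != nil {
		return Sample{}, false
	}
	return Sample{Kind: fields[1], Name: fields[2], Value: value}, true
}
```

A line such as `METRIC gauge frame_rate 59.5` yields a gauge sample, while ordinary engine log lines are ignored. This also shows the concern raised above: every team would need to agree on its own equivalent of this format.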
Pros:
Cons:
Expose an OpenTelemetry/Prometheus endpoint from within the DGS
This approach would be a more standardised way of exposing metrics via a running service and would most probably be supported in game servers written in vanilla Rust, Go, JavaScript etc., but other game servers, Unreal for example, do not support the idea of running a web server within the game to expose these metrics.
This also doesn't take into consideration that the sidecar has metrics such as the number of connected players, free space on the DGS etc. This means we would probably end up implementing a /metrics endpoint within both the DGS and its sidecar.
Pros:
Cons:
Expose metrics via an agreed file format
This approach, instead of sending metrics over the wire, would write metrics to a specific file mounted on the pod in a well-known format (see the link above). This could then be picked up by something to expose or ship the metrics to wherever they are needed for aggregation.
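A minimal Go sketch of what writing such a file could look like is below. The proposal does not say which format is meant, so this example assumes the Prometheus text exposition format as a stand-in, and only handles gauges. `Render` and `WriteMetricsFile` are hypothetical names invented for the sketch.

```go
package main

import (
	"os"
	"sort"
	"strconv"
	"strings"
)

// Render formats gauges in the Prometheus text exposition format,
// sorted by name so the output is stable between writes.
func Render(gauges map[string]float64) string {
	names := make([]string, 0, len(gauges))
	for name := range gauges {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	for _, name := range names {
		b.WriteString("# TYPE " + name + " gauge\n")
		b.WriteString(name + " " + strconv.FormatFloat(gauges[name], 'g', -1, 64) + "\n")
	}
	return b.String()
}

// WriteMetricsFile writes to a temporary file and renames it into place,
// so a collector reading the file should never see a half-written snapshot.
func WriteMetricsFile(path string, gauges map[string]float64) error {
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, []byte(Render(gauges)), 0o644); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}
```

The write-then-rename step is a design choice for this sketch: the game server and whatever picks up the file run independently, so the reader should only ever observe complete files.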
Pros:
Cons:
/metrics endpoint anyway