Players in-game metric for when PlayerTracking is enabled #2765
Build Failed 😱 Build Id: bd3a0c87-27e9-4daf-9bb2-93fbae20d48c To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
Build Succeeded 👏 Build Id: 91ece698-173c-43eb-b788-e1dd134936ae The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
Build Failed 😱 Build Id: 94df2c7b-b9e1-465e-9554-62acd4427a61 To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
Build Failed 😱 Build Id: 61069ef0-2e12-4239-8032-4386e12f92fd To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
Build Succeeded 👏 Build Id: 231f27d2-aae3-49c3-a60d-386aff714daa The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
Thanks for this - this is the issue to reference for this work:
As a note, player tracking will very likely be replaced eventually by:
But while it's being developed, I don't see any reason we can't include this to keep the work ongoing and gather more feedback.
Added some notes for review on the PR though.
We are in feature freeze for the next week for RC, but that also gives us time to review during that period.
pkg/metrics/controller_metrics.go
@@ -52,6 +53,7 @@ var (
fasLimitedStats = stats.Int64("fas/limited", "The fleet autoscaler is capped (0 indicates false, 1 indicates true)", "1")
gameServerCountStats = stats.Int64("gameservers/count", "The count of gameservers", "1")
gameServerTotalStats = stats.Int64("gameservers/total", "The total of gameservers", "1")
gameServerPlayersInGame = stats.Int64("gameservers/players_in_game", "The total number of players connected to gameservers", "1")
Per https://prometheus.io/docs/practices/naming/ I'd suggest naming this gameserver_player_connected_total for the total number of players.
Since you are in here, should we also add gameserver_player_capacity_total so that you can also track how much room there is over time?
Makes sense, I changed the metric to gameserver_player_connected_total.
I also added gameserver_player_capacity_total, which tracks PlayerCapacity - PlayerCount.
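To make the two values concrete, here is a minimal sketch of how they could be derived from game server state. It is not the code from this PR: the `GameServer`, `PlayerStatus`, and `PlayerTotals` types, the `aggregatePlayerTotals` helper, and the per-fleet grouping are all hypothetical stand-ins for illustration. It only assumes what the thread states: the capacity metric is PlayerCapacity - PlayerCount, and a missing player status must be skipped.

```go
package main

import "fmt"

// PlayerStatus is a hypothetical stand-in for the player tracking
// status reported by a game server.
type PlayerStatus struct {
	Count    int64 // players currently connected
	Capacity int64 // maximum players the server accepts
}

// GameServer is a hypothetical, trimmed-down game server record.
type GameServer struct {
	Fleet   string
	Players *PlayerStatus // nil when no player status is reported
}

// PlayerTotals holds the two values discussed in the review.
type PlayerTotals struct {
	Connected int64 // sum of Count
	Capacity  int64 // sum of Capacity - Count (remaining room)
}

// aggregatePlayerTotals sums connected players and remaining capacity
// per fleet, skipping servers that report no player status.
func aggregatePlayerTotals(servers []GameServer) map[string]PlayerTotals {
	totals := make(map[string]PlayerTotals)
	for _, gs := range servers {
		if gs.Players == nil {
			continue // nothing to record without a player status
		}
		t := totals[gs.Fleet]
		t.Connected += gs.Players.Count
		t.Capacity += gs.Players.Capacity - gs.Players.Count
		totals[gs.Fleet] = t
	}
	return totals
}

func main() {
	servers := []GameServer{
		{Fleet: "fleet-a", Players: &PlayerStatus{Count: 3, Capacity: 10}},
		{Fleet: "fleet-a", Players: &PlayerStatus{Count: 5, Capacity: 8}},
		{Fleet: "fleet-b", Players: nil},
	}
	for fleet, t := range aggregatePlayerTotals(servers) {
		fmt.Printf("%s connected=%d remaining=%d\n", fleet, t.Connected, t.Capacity)
	}
}
```

Tracking remaining room rather than raw capacity means the two values can be read side by side: one shows load, the other shows headroom.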
@@ -43,6 +43,7 @@ Follow the [Stackdriver Installation steps](#stackdriver-installation) to see yo
| agones_gameservers_count | The number of gameservers per fleet and status | gauge |
| agones_gameserver_allocations_duration_seconds | The distribution of gameserver allocation requests latencies | histogram |
| agones_gameservers_total | The total of gameservers per fleet and status | counter |
| agones_gameservers_players_in_game | The total number of players connected to gameservers (Only available when [player tracking]({{< relref "player-tracking.md" >}}) is enabled) | gauge |
Since docs are published on merge, we'll need you to create two instances of this table, each in feature shortcodes, one with the new metric(s) and one without.
That way the new metric will stay hidden until the next release 👍🏻
Instead of duplicating the table, I added the feature shortcode within the existing one to hide the new metrics until Agones 1.28.
Let me know if this is suitable or if I should duplicate the table instead; I'm not very familiar with Hugo.
Disregard the last comment. Had to duplicate the table as what I did wasn't working as expected.
…r capacity metric. Hide docs until next agones release.
Build Succeeded 👏 Build Id: 4f6f141d-6b8a-4ed0-8042-3964e6f41152 The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
Build Failed 😱 Build Id: 19749a70-8355-4736-8050-177a6ea1fc11 To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
I think there's a flaky test that made the build fail.
Looks that way - retrying tests.
Build Succeeded 👏 Build Id: 15afffba-f417-4e2b-8aac-4abeeaf12f1e The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
pkg/metrics/controller_metrics.go
@@ -42,19 +44,21 @@ var (
// fleetViews are metric views associated with Fleets
fleetViews = append([]string{fleetReplicaCountName, gameServersCountName, gameServersTotalName, gameServerStateDurationName}, fleetAutoscalerViews...)
We'll want to add gameServerPlayerConnectedTotal and gameServerPlayerCapacityTotal to fleetViews, so that the system knows to reset them when Fleets get deleted.
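The reasoning behind this request can be illustrated with a small sketch. It is a simplified model, not the Agones metrics controller: the `registry` type, its `record` and `resetFleet` methods, and the string values of the view names are assumptions made up for this example. The point it shows is that a view left out of the list consulted on fleet deletion keeps reporting a stale value for the deleted fleet.

```go
package main

import "fmt"

// Hypothetical view names; the string values are assumptions.
const (
	gameServersCountName           = "gameservers_count"
	gameServerPlayerConnectedTotal = "gameserver_player_connected_total"
	gameServerPlayerCapacityTotal  = "gameserver_player_capacity_total"
)

// fleetViews lists every view whose rows are keyed by fleet and must
// therefore be cleared when that fleet is deleted.
var fleetViews = []string{
	gameServersCountName,
	gameServerPlayerConnectedTotal,
	gameServerPlayerCapacityTotal,
}

// registry is a toy metric store: view name -> fleet name -> value.
type registry map[string]map[string]int64

// record stores the latest value of a view for a fleet.
func (r registry) record(view, fleet string, value int64) {
	if r[view] == nil {
		r[view] = make(map[string]int64)
	}
	r[view][fleet] = value
}

// resetFleet drops the fleet's rows from the listed views only.
// Views missing from the list keep their last recorded value.
func (r registry) resetFleet(fleet string, views []string) {
	for _, view := range views {
		delete(r[view], fleet) // deleting from a nil map is a no-op
	}
}

func main() {
	r := registry{}
	r.record(gameServerPlayerConnectedTotal, "fleet-a", 12)
	r.record("untracked_view", "fleet-a", 5)

	r.resetFleet("fleet-a", fleetViews)

	_, stillThere := r[gameServerPlayerConnectedTotal]["fleet-a"]
	fmt.Println("player metric still present:", stillThere)
	fmt.Println("untracked view value:", r["untracked_view"]["fleet-a"])
}
```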
Good catch. I just added that.
Just saw one small thing to add, but otherwise this looks good to go once we are out of feature freeze 👍🏻
Build Succeeded 👏 Build Id: 71afc06c-79da-431a-8c01-46452d1cc03e The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
Build Succeeded 👏 Build Id: 1a9b1e3d-0736-40df-b07c-c678890f433c The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
Build Succeeded 👏 Build Id: 6e09e109-e530-428a-ade2-a9bcb7820dad The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
New changes are detected. LGTM label has been removed. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: estebangarcia, markmandel The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Build Succeeded 👏 Build Id: 80572dd5-ea9c-4d52-b694-549bee9ec456 The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
Build Succeeded 👏 Build Id: c0012f8d-3eac-46d0-a286-96c454e725c7 The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
nodepools and regional clusters Updates to release checklist. (googleforgames#2772) * Updates to release checklist. Adding items that showed up in the recent release that were not written down or required better clarification. * Review updates, and some other small tweaks. Co-authored-by: Robert Bailey <robertbailey@google.com> Release 1.27.0 (googleforgames#2776) * Release 1.27.0 * Update FAQ on ExternalDNS (googleforgames#2773) The feature flag it points to have been moved to stable, so the link is not useful any more. Also removed notes on ipv6, since they aren't 100% accurate, as we were discussing in googleforgames#2767. * Updates to release checklist. (googleforgames#2772) * Updates to release checklist. Adding items that showed up in the recent release that were not written down or required better clarification. * Review updates, and some other small tweaks. Co-authored-by: Robert Bailey <robertbailey@google.com> * Release-changes * Review comment * Review changes Co-authored-by: Mark Mandel <markmandel@google.com> Co-authored-by: Robert Bailey <robertbailey@google.com> Version updates (googleforgames#2778) Players in-game metric for when PlayerTracking is enabled (googleforgames#2765) * Check for DeletionTimestamp of fleet and gameserverset before scaling * Add metric to track player count in gameservers * check PlayerStatus is not nil * Update metrics available in docs * Wrong relref path * typo * Change name for players in game metric to player connected. Add player capacity metric. Hide docs until next agones release. * Duplicate metrics table * add gameserver player tracking metrics to fleetViews Co-authored-by: Mark Mandel <markmandel@google.com> Remove generation for swagger Go code and Add static swagger codes for test (googleforgames#2757) Co-authored-by: Mark Mandel <markmandel@google.com> Updated allocation yaml files under examples/ to use selectors Show how to set graceful termination in a game server that is safe to (googleforgames#2780) evict. 
Avoid retry from allocateFromLocalCluster under context kill. (googleforgames#2783) * Version updates * issue-2736-changes Co-authored-by: Mark Mandel <markmandel@google.com> Bring SDK base image to debian:bullseye (googleforgames#2769) * Bring SDK base image to debian:bullseye The upgrade to gRPC solved one issue, and I also added a limit to number of processes that could run for `make -j` otherwise the whole thing would fall over (also would crash my dev machine!). Closes googleforgames#2224 * Force refresh of cpp cache on Cloud Build. * Fixes for CI: * Revert CI cache increment (don't think we need it) * Add shell to cpp image for debugging. * Fix formatting issue that is breaking CI. Co-authored-by: Robert Bailey <robertbailey@google.com> Update health-checking.md (googleforgames#2785) Fixed spell error: spec.health.failureTheshold to spec.health.failureThreshold Updated allocation yaml files under examples/ to use selectors (googleforgames#2787) Cleanup of load tests (googleforgames#2784) * issue-2744 updated changes with new description * 2744 review changes Sync Pod host ports back to GameServer in GCP (googleforgames#2782) This is the start of the implementation for googleforgames#2777: * Most of this is mechanical and implements a thin cloud product abstraction layer in pkg/cloud, instantiated with New(product). The product abstraction provides a single function so far: SyncPodPortsToGameServer. * SyncPodPortsToGameServer is inserted as a hook while syncing IP/ports, to let different cloud providers handle port allocation slightly differently (in this case, GKE Autopilot) * In GKE Autopilot, we look for a JSON string like `{"min":7000,"max":8000,"portsAssigned":{"7001":7737,"7002":7738}}` as an indication that the host ports were reassigned (per policy). As a side note to anyone watching, this is currently an unreleased feature. If we see this, we use the provided mapping to map the host ports in the GameServer.Spec. 
With this change, it's possible to launch a GameServer and get a healthy GameServer Pod by adding the following annotation: ``` annotations: cluster-autoscaler.kubernetes.io/safe-to-evict: "true" autopilot.gke.io/host-port-assignment: '{"min": 7000, "max": 8000}' ``` If this PR causes any issues, the cloud product auto detection can be disabled by setting `agones.cloudProduct=generic`, or forced to GKE Autopilot using `agones.cloudProduct=gke-autopilot`. In a future PR, I will add the host-port-assignment annotation automatically on Autopilot Co-authored-by: Mark Mandel <markmandel@google.com> Update gke terraform files to allow autoscaling Fix (not really) problems reported by VSCode (googleforgames#2790) VSCode reports `main redeclared` between allocationload.go and runscenario.go due to the fact that they both look like `package main` binaries in the same directory, similar e.g. [this poster on a different project](https://stackoverflow.com/questions/66970531/vs-code-go-main-redeclared-in-this-block) To fix it, it's easy enough to just give these binaries their own package path and fix up the calling scripts. Along the way, fix a lint complaint in runscenario.go Add location variable for cluster location argument Minor fix changed default of location var to empty string GameServerRestartBeforeReadyCrash: Run serially (googleforgames#2791) Narrow the race in googleforgames#2445 by running GameServerRestartBeforeReadyCrash serially. See googleforgames#2445 (comment) for a detailed analysis. Does not fix the issue - this is stopgap until we understand how to fix it. Enable fieldalignment linter, then mostly ignore it (googleforgames#2795) Enable the fieldalignment linter by enabling all `govet` checks except shadowing. Ignore large swaths of code (tests, cmd/, APIs), and nolint'd existing complaints that seemed irrelevant. Along the way: * removed existing nolint:maligned, as `maligned` is no more. 
* disabled `structcheck` and `deadcode` as they are deprecated (and I think have been subsumed by other linters?) * changed `gameServerCacheEntry` to `gameServerCache`. It is the cache, not just an entry. * fixed alignment of `gameServerSetCacheEntry`. Add fswatch library to watch and batch filesystem events, use in allocator (googleforgames#2792) This pull refactors the fsnotify code in allocator/main out to a shared library, and in that shared library implements a batched notification processor. Closes googleforgames#1816: This takes a slightly different approach than specified in the issue, instead choosing to just delay processing until after a batch processing period. I chose 1s - it's far longer than necessary, but still much shorter than it takes for the secret changes to propagate to the container anyways. I considered the approach in googleforgames#1816 of trying to parse the actual events, but it's too fiddly to get exactly right: e.g. maybe you only refresh on "write", but then "chmod" could make the file readable whereas it wasn't before, "rename" could expose a file that wasn't there before, etc. Cloud product: Split port allocators, implement Autopilot port allocation/policies (googleforgames#2789) In the Agones on GKE Autopilot implementation, we have no need for the port allocator - the informer/etc. is an unnecessary moving piece. This PR allows for cloud products to provide their own port allocation implementation, and implements the GKE Autopilot "allocator". We do this by: * Splitting portallocator off to its own package. It was basically self-sufficient anyways, except it was a little too friendly with controller_test.go. I solved that by introducing a TestInterface for controller_test.go to upcast to. * Allow cloud product implementations to define their own port allocator. * Defining a new port allocator for GKE that does a simple per-port HostPort allocation, and adds the host-port-assignment annotation to the pod template. 
* Extend cloudproduct again to add a GameServer validator * And in Autopilot, reject if the PortPolicy is not `Dynamic` Release: Note to switch away from `agones-images` (googleforgames#2809) Since we have few guardrails on accidentally touching `agones-images` project, adding a note in the release checklist to switch back to a local development project after running a release. Flake: TestControllerGameServerCount (googleforgames#2805) Made it deterministic in the test, and got rod of the potential race conditions. Also fix it such that the util function for generating GameServer names always produce a unique name. Closes googleforgames#2804 Co-authored-by: Robert Bailey <robertbailey@google.com> Remove Windows FAQ Entry (googleforgames#2811) The contents are no longer accurate, and are covered in the installation section now. Makefile changes for adding location variable added autoscale parameters to Makefile and README Markdown fix in readme Changed LOCATION to always be set with ZONE as default use only if the variable has a value fixed extraneous characters update gke terraform exmaple module Update Node.js dependencies and package (googleforgames#2815) * Update all dependencies and Node,js to LTS version * Update other docker images that use Node.js Added autoscale to example cluster and added to website docs Added defaults and feature expiry Remove zone from gke/variable.tf file.
What type of PR is this?
/kind feature
What this PR does / Why we need it:
Records a players in-game metric only when PlayerTracking is enabled.
We recently had a big playtest and needed a way to count players in-game in Prometheus.
We created a separate service that watches over all gameservers but thought that adding this as part of Agones made more sense.
Which issue(s) this PR fixes: #1035 (partially)
Special notes for your reviewer: