Fix metrics not reporting new values #5781
Open
This PR updates the metric registration mechanism so that the most recent registration attempt for a metric is considered valid and the previous registration is discarded. This fixes a long-standing issue that caused multiple metrics to fail to report new values.
Closes #5758
Context
In the previous implementation, only the first registration of a metric was considered valid, and all subsequent attempts to create the same metric were internally considered invalid. External modules always get a metric back from a call to register, even if the registration fails internally, so a failed registration is not visible to the caller. This was a problem because many operations in graph-node are expected to run multiple times, and the limited cleanup of old metrics was not working as expected. As a result, many operations in graph-node had non-working metrics after restarts. One example is the subgraph runner, which silently got unregistered metrics after almost every deployment restart or reassignment.
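To illustrate the difference between the two registration policies, here is a minimal, standard-library-only sketch. It is not the code from this PR: `Counter`, `Registry`, `register_first_wins`, `register_last_wins`, `simulate_restart`, and the metric name `blocks_processed` are all hypothetical names chosen for the example, and the real registry in graph-node is more involved.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a metric handle (not graph-node's actual type).
#[derive(Clone, Default)]
pub struct Counter(Arc<AtomicU64>);

impl Counter {
    pub fn inc(&self) {
        self.0.fetch_add(1, Ordering::Relaxed);
    }

    pub fn get(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Hypothetical registry: maps a metric name to the handle that is exported.
#[derive(Default)]
pub struct Registry {
    exported: HashMap<String, Counter>,
}

impl Registry {
    /// Old policy: the first registration wins. A later caller still
    /// receives a handle, but that handle is never exported.
    pub fn register_first_wins(&mut self, name: &str) -> Counter {
        let counter = Counter::default();
        self.exported
            .entry(name.to_string())
            .or_insert_with(|| counter.clone());
        counter
    }

    /// New policy: the latest registration wins and replaces the old one.
    pub fn register_last_wins(&mut self, name: &str) -> Counter {
        let counter = Counter::default();
        self.exported.insert(name.to_string(), counter.clone());
        counter
    }

    /// Value that a scrape of the registry would see for `name`.
    pub fn exported_value(&self, name: &str) -> Option<u64> {
        self.exported.get(name).map(Counter::get)
    }
}

/// Simulates a component that registers its metric, is restarted,
/// registers the same metric again, and keeps counting.
pub fn simulate_restart(last_wins: bool) -> Option<u64> {
    let mut registry = Registry::default();
    let register = |r: &mut Registry| {
        if last_wins {
            r.register_last_wins("blocks_processed")
        } else {
            r.register_first_wins("blocks_processed")
        }
    };

    let before_restart = register(&mut registry);
    before_restart.inc();

    // After the restart the component registers the same metric again.
    let after_restart = register(&mut registry);
    after_restart.inc();
    after_restart.inc();

    registry.exported_value("blocks_processed")
}
```

In this sketch, `simulate_restart(false)` leaves the exported value stuck at the pre-restart count, because the handle returned after the restart is detached from the registry. `simulate_restart(true)` exports the counts recorded after the restart, which is the behavior this PR aims for.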
Testing
I have tested the fix locally and the previously broken metrics are now working as expected. The behavior of previously working metrics has not changed.