What are you proposing?
This feature provides per-API success and failure counters. It extends the _nodes/stats API to expose response code counters for every API that has been called at least once on the cluster. The counters appear in the _nodes/stats response in a rest_actions section for each node.
Today, when 4xx or 5xx errors occur, there is no way to know how many errors were caused by which API. These counters make it possible to monitor system behaviour per API.
Are there any breaking changes to the API?
No, there are no breaking changes. All changes will be backward compatible.
What is the user experience going to be?
Today OpenSearch publishes many cumulative stats covering indexing, search, API usage and so on. Users poll the stats API periodically and plot the difference between successive readings to understand how the system is being used and how it performs over time.
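The polling-and-differencing workflow described above can be sketched in Python. This is only an illustration: the proposal states that a rest_actions section exists per node, but the nesting used here (action name, then response code, then count) and the helper names `flatten_counters` and `counter_deltas` are assumptions made for the example. It also assumes that counters start again from zero when a node restarts.

```python
def flatten_counters(stats: dict) -> dict:
    """Flatten a _nodes/stats-style payload into {(node, action, code): count}.

    ASSUMPTION: the nesting nodes -> <node_id> -> rest_actions -> <action>
    -> <response code> -> count is hypothetical, not the confirmed format.
    """
    flat = {}
    for node_id, node in stats.get("nodes", {}).items():
        for action, codes in node.get("rest_actions", {}).items():
            for code, count in codes.items():
                flat[(node_id, action, code)] = count
    return flat


def counter_deltas(previous: dict, current: dict) -> dict:
    """Return the non-zero increase of each cumulative counter between two polls."""
    prev = flatten_counters(previous)
    curr = flatten_counters(current)
    deltas = {}
    for key, count in curr.items():
        before = prev.get(key, 0)
        # A counter lower than its previous value is treated as a reset
        # (assumed to happen on node restart), so the delta counts from zero.
        delta = count - before if count >= before else count
        if delta:
            deltas[key] = delta
    return deltas


# Two hypothetical polls of the stats API, taken some time apart.
first_poll = {"nodes": {"node-1": {"rest_actions": {
    "_search": {"200": 10, "500": 1}}}}}
second_poll = {"nodes": {"node-1": {"rest_actions": {
    "_search": {"200": 25, "500": 4},
    "_bulk": {"200": 3}}}}}

for key, delta in sorted(counter_deltas(first_poll, second_poll).items()):
    print(key, delta)
```

Plotting these deltas per polling interval gives a rate per API and response code, rather than an ever-growing total.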
Let's look at a few user stories to understand the usefulness of the feature.
A user sees failures and wants to understand which APIs are impacted: is only search failing, only indexing, or all APIs? The user can also configure alarms on these counters if needed.
A user is upgrading their cluster to a different version with a rolling restart and starts seeing errors, but only a few requests fail, not all. The user sends requests through a load balancer that distributes traffic across all nodes. Because these API-level stats are reported per node, they show which nodes are failing requests, so the user can debug faster than by analysing logs, which usually takes a lot of time and effort.
An operator sees API failures and wants to know which APIs fail most often, in order to involve subject matter experts for specific APIs such as _search and _bulk in further debugging. A more advanced scenario is a system that monitors REST action stats and automatically cuts a ticket to the team responsible for operating a specific part of the cluster, such as indexing.
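The stories above come down to grouping error counts by API and by node. A minimal Python sketch follows, assuming the counters have already been collected into a flat mapping of (node, action, response code) to count. That mapping, the sample values, and the names `is_error`, `error_totals` and `actions_over_threshold` are hypothetical and not part of the proposal.

```python
from collections import Counter


def is_error(code: str) -> bool:
    """Treat any 4xx or 5xx response code (e.g. "429", "503", "5xx") as an error."""
    return code[:1] in ("4", "5")


def error_totals(counts: dict) -> tuple:
    """Sum error counts per action and per node.

    `counts` maps (node_id, action, response_code) -> count.
    ASSUMPTION: this flat shape is hypothetical, chosen for the example.
    """
    by_action, by_node = Counter(), Counter()
    for (node_id, action, code), count in counts.items():
        if is_error(code):
            by_action[action] += count
            by_node[node_id] += count
    return by_action, by_node


def actions_over_threshold(by_action: Counter, threshold: int) -> list:
    """Return the actions whose error count exceeds the alarm threshold."""
    return sorted(a for a, n in by_action.items() if n > threshold)


# Hypothetical counts observed during one polling interval.
sample = {
    ("node-1", "_search", "200"): 15,
    ("node-1", "_search", "503"): 3,
    ("node-2", "_bulk", "429"): 7,
    ("node-2", "_search", "200"): 20,
}

by_action, by_node = error_totals(sample)
print("errors per API:", by_action.most_common())
print("errors per node:", by_node.most_common())
print("alarm for:", actions_over_threshold(by_action, threshold=5))
```

The per-API totals answer the "which APIs are failing" question, the per-node totals cover the rolling-upgrade case, and the threshold check is the kind of rule an alarm or automated ticketing system could apply.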
Are there breaking changes to the User Experience?
No
Why should it be built? Any reason not to?
It should be built to give users a better view of failures across APIs in OpenSearch, which gives direction for further debugging and makes the overall process faster. It also allows users to build better monitoring solutions using REST action stats.
What will it take to execute?
It will involve changes to the code related to the stats API.
Any remaining open questions?
NA
Hi @CaptainDredge, thanks for taking the time to put this proposal together. Is there a reason to track this as a separate issue? I would prefer to keep the discussion in the original issue #4401 for better tracking.
@anasalkouz I'm open to keeping the discussion on the original issue, but I assumed the contribution process is to have separate issues: one for the feature ask, which explains the problem, and another for the proposal, which explains the solution. What do you suggest? Should we close this one and add all the details to the original issue description?
@CaptainDredge
Sorry for the late response. It seems the original issue also contains a proposal. I would move this to the original issue to get more attention from the community.
What users have asked for this feature?
#4401
What is the developer experience going to be?
The OpenSearch _nodes/stats API will have an additional rest_actions section for each node in the JSON response.
Are there any security considerations?
No