-
Notifications
You must be signed in to change notification settings - Fork 36
Stats API: moved detector count call outside transport layer and make asynchronous #108
Stats API: moved detector count call outside transport layer and make asynchronous #108
Conversation
|
||
@Override | ||
public void merge(Mergeable other) { | ||
if (this == other || other == null || getClass() != other.getClass()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor. other instanceof ADStatsResponse seems better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point will fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want to use instanceof since this makes a subclass relationship returns true. We want to have exact same class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, will change back.
|
||
ADStatsResponse otherResponse = (ADStatsResponse) other; | ||
|
||
if (otherResponse.adStatsNodesResponse != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor. Some edge cases might be handled more clearly, such as when both have node responses/cluster stats, throw an exception if unexpected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this object, the expected behavior if other
has both node response and cluster stats is to merge them both into the object.
Set<String> statsToBeRetrieved = adStatsRequest.getStatsToBeRetrieved(); | ||
for (String statName : adStats.getClusterStats().keySet()) { | ||
if (statsToBeRetrieved.contains(statName)) { | ||
clusterStats.put(statName, adStats.getStats().get(statName).getValue()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor. Seems more consistent to use adStats.getClusterStats() as in the for loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a different way might be iterating the entries using map::entrySet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion
|
||
public static final ADStatsAction INSTANCE = new ADStatsAction(); | ||
public static final ADStatsNodesAction INSTANCE = new ADStatsNodesAction(); | ||
public static final String NAME = "cluster:admin/ad_stats_action"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor. should ad_stats_action be renamed to ad_stats_node_action ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it should good catch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all of our transport actions are using prefix "cluster:admin/ad" like:
"cluster:admin/ad/result"
"cluster:admin/ad/cron"
"cluster:admin/ad/model/delete"
It would be good to have the same style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call out. Will update.
* @param channel Channel to send response | ||
* @param adStatsRequest Request containing stats to be retrieved | ||
*/ | ||
public void getStats(Client client, RestChannel channel, ADStatsRequest adStatsRequest) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor. The get methods should be private because they are implementation not contract. Tests are done based on contract not implementation. Because implementation is not client-facing, the full documentation is optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I will update them.
Set<String> statsToBeRetrieved = adStatsRequest.getStatsToBeRetrieved(); | ||
for (String statName : adStats.getClusterStats().keySet()) { | ||
if (statsToBeRetrieved.contains(statName)) { | ||
clusterStats.put(statName, adStats.getStats().get(statName).getValue()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a different way might be iterating the entries using map::entrySet
"Unable to return AD Stats" | ||
); | ||
|
||
onGetClusterStats(client, delegateListener, adStatsRequest); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on is more of a name for listener. You can change to getClusterStats or sth similar. Same applies below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Will update both.
ADStatsResponse adStatsResponse = new ADStatsResponse(); | ||
if (adStatsRequest.getStatsToBeRetrieved().contains(StatNames.DETECTOR_COUNT.getName())) { | ||
if (clusterService.state().getRoutingTable().hasIndex(AnomalyDetector.ANOMALY_DETECTORS_INDEX)) { | ||
IndicesStatsRequest indicesStatsRequest = new IndicesStatsRequest(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You only need docs statistics. .setDocs(true) seems what you need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right that is correct. Will update.
.setValue(indicesStatsResponse.getIndex(AnomalyDetector.ANOMALY_DETECTORS_INDEX).getPrimaries().docs.getCount()); | ||
adStatsResponse.setClusterStats(getClusterStats(adStatsRequest)); | ||
listener.onResponse(adStatsResponse); | ||
}, e -> listener.onFailure(new RuntimeException("Failed to get AD cluster stats: " + e)))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new RuntimeException("Failed to get AD cluster stats", e) is more common.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Will update.
listener.onResponse(adStatsResponse); | ||
} | ||
} else { | ||
adStatsResponse.setClusterStats(getClusterStats(adStatsRequest)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you didn't set a value here, would it be possible that we pick some value set previously?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This else
block is only entered if the user does not want to get the detector count stat. So, it will not be retrieved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Is SettableSupplier's value a singleton shared by all of the requests? Would we have concurrent issue like one thread is writing to it while another is reading it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes SettableSupplier's value is a singleton. SettableSupplier uses AtomicLong
for thread safety.
I believe this is the scenario of concern:
- async request(req_1) to get detector count is made (detector count = 3)
- detector is created (detector count = 4)
- async request(req_2) to get detector count is made (detector count = 4)
- req_2 returns and sets detector count to 4
- req_1 returns and sets detector count to 3
- stats are returned for req_1 with detector count = 3
- stats are returned for req_2 with detector count = 3
I think this is a minor issue for a couple of reasons. (1) On each stats request, the detector count is updated so it will not put the system in a permanent state of inconsistency (2) The scenario above is low probability because detector count is not expected to be that volatile and also stats api calls are not expected to be in that tight of a window.
A workaround could be to pass the detector count to getClusterStatsMap(ADStatsRequest adStatsRequest)
and have a special case for DetectorCount stat. However, I am not sure this is best practice. Do you have any other suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree it is a minor issue since document count is only used by humans, right?
Can you create a github issue for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes only humans will read.
// Use MultiResponsesDelegateActionListener to execute 2 async requests and create the response once they finish | ||
MultiResponsesDelegateActionListener<ADStatsResponse> delegateListener = new MultiResponsesDelegateActionListener<>( | ||
getRestStatsListener(channel), | ||
2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it always 2? If customers don't want to get detector count or only want to get detector count, this value should be 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is always 2. We still call the listeners onResponse method even if the calls are not made. For example, here
Issue #, if available:
#107
Description of changes:
Previously, getting the detector count for the stats API was leading to timeouts. The detector count was retrieved by making a timed blocking
SearchRequest
in the transport layer and tallying up the number of results.This PR changes the way in which the detector count is retrieved. It moves the logic to get the detector count to a nonblocking
IndicesStatsRequest
in the Rest layer. Now, the rest layer makes two asynchronous requests:On response from those requests, it combines the node stats and the cluster stats together and sends the response back to the user.
In order to test, I modified existing tests and added a new one. I also manually tested on a local cluster using the command
./gradlew run -PnumNodes=3
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.