-
Notifications
You must be signed in to change notification settings - Fork 36
Stats collection #8
Stats collection #8
Conversation
final CountDownLatch responseLatch = new CountDownLatch(1); | ||
AtomicReference<Long> count = new AtomicReference<>(); | ||
LatchedActionListener<SearchResponse> searchListener = new LatchedActionListener<>( | ||
new DocumentCountListener(count), responseLatch); | ||
|
||
if (response.isPresent()) { | ||
return response.get().getHits().getTotalHits().value; | ||
} else { | ||
return 0L; | ||
client.search(searchRequest, searchListener); | ||
|
||
try { | ||
responseLatch.await(DOCUMENT_COUNT_SEARCH_TIMEOUT_S, TimeUnit.SECONDS); | ||
return count.get(); | ||
} catch (InterruptedException e) { | ||
logger.error("Interrupted Exception", e); | ||
return -1L; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure why use the CountDownLatch here which is a blocking operation too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this is blocking. It will timeout after 3 seconds. Will address blocking issue in a later PR.
} | ||
|
||
private class DocumentCountListener implements ActionListener<SearchResponse> { | ||
private AtomicReference<Long> count; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't use AtomicLong here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I will update to use AtomicLong
build.gradle
Outdated
@@ -246,6 +246,8 @@ List<String> jacocoExclusions = [ | |||
'com.amazon.opendistroforelasticsearch.ad.resthandler.RestGetAnomalyResultAction', | |||
'com.amazon.opendistroforelasticsearch.ad.metrics.MetricFactory', | |||
'com.amazon.opendistroforelasticsearch.ad.indices.AnomalyDetectionIndices', | |||
'com.amazon.opendistroforelasticsearch.ad.indices.AnomalyDetectionIndices.DocumentCountListener', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we have enough coverage on these 2 classes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add coverage for these two.
public String getIndexHealthStatus(String indexName) { | ||
if (!clusterService.state().getRoutingTable().hasIndex(indexName)) { | ||
return "nonexistent"; | ||
public String getIndexHealthStatus(String indexOrAliasName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a general function. Could you put it to some util class so that other components can use it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! I will do.
searchRequest.source(sourceBuilder); | ||
|
||
Optional<SearchResponse> response = requestUtil.timedRequest(searchRequest, logger, client::search); | ||
final CountDownLatch responseLatch = new CountDownLatch(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a general function. Could you put it to some util class so that other components can use it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I will add it to IndexUtills class
public static final String RCF_TYPE_VALUE = "rcf"; | ||
public static final String THRESHOLD_TYPE_VALUE = "threshold"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using enum as it is a better approach than Strings. They are type safe and comparing them is faster than comparing Strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For only 2 values, do you think that this is worthwhile?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I will make the update!
public Map<String, ADStat<?>> getNodeStats() { | ||
Map<String, ADStat<?>> nodeStats = new HashMap<>(); | ||
|
||
for (Map.Entry<String, ADStat<?>> entry : stats.entrySet()) { | ||
if (!entry.getValue().isClusterLevel()) { | ||
nodeStats.put(entry.getKey(), entry.getValue()); | ||
} | ||
} | ||
return nodeStats; | ||
} | ||
|
||
/** | ||
* Get a map of the stats that are kept at the cluster level | ||
* | ||
* @return Map of stats kept at the cluster level | ||
*/ | ||
public Map<String, ADStat<?>> getClusterStats() { | ||
Map<String, ADStat<?>> clusterStats = new HashMap<>(); | ||
|
||
for (Map.Entry<String, ADStat<?>> entry : stats.entrySet()) { | ||
if (entry.getValue().isClusterLevel()) { | ||
clusterStats.put(entry.getKey(), entry.getValue()); | ||
} | ||
} | ||
return clusterStats; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a good idea to combine these two functions since they are only different in terms of the true/false of the same if condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I will combine these two to remove redundant code.
if (indexMetaDataList.size() == 0) { | ||
return ALIAS_EXISTS_NO_INDICES_STATUS; | ||
} else { | ||
indexOrAliasName = indexMetaDataList.get(0).getIndex().getName(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if index at pos 1 is yellow and index at pos 0 is green? I meant why do we just consider index at pos 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only consider the index at position 0 because our aliases will only point to one index at a time. The indexMetaDataList for our purposes should never have a size greater than 1. I will add a comment to explain this and then add an else if condition that checks if the alias points to multiple indices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some general comments across this change. Otherwise, looks good overall.
All interfaces (public methods, fields, classes) should be documented. This is one opportunity/reason to design and select which should be visible to customers and which should be behind abstraction. Access to classes, methods, fields (public, private, package, protected) reflects the design.
Implementation-wise, I'd encourage using more stream api when appropriate.
* Get the ModelInformation for all hosted models | ||
* | ||
* @return modelsInformation list of model information for all hosted models | ||
* getAllModels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor. Standard java doc is needed for readers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will make update thanks!
private Instant lastUsedTime; | ||
private Instant lastCheckpointTime; | ||
|
||
/** | ||
* Constructor. | ||
* | ||
* @param model ML model | ||
* @param modelId Id of Model this model partition is a part of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor. Model id is the id of the partitioned model. There is no abstract/logical id for the collective model for a detector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it thanks I will update this
+ " and individual stats"); | ||
} else { | ||
Set<String> invalidStats = new TreeSet<>(); | ||
adStatsRequest.clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor. This doesn't seem needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By default, all stats are set to be retrieved. So, I needed to clear them here. However, after more consideration, I believe that I should set all stats to not be retrieved by default. Therefore, it is more clear about what stats are being retrieved.
/** | ||
* Initialize the map that keeps track of all of the stats | ||
*/ | ||
private void initStats() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style suggestion. I would treat this as configuration (data) rather than the initialization code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense. I will modify this so that the map can be passed in and move the creation of this map to the createComponents method of anomalyDetectorPlugin.
|
||
void increment(); | ||
|
||
void add(long n); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor. Is add and reset needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No they are not. I will remove.
/** | ||
* ADStatsNodeRequest to get a nodes stat | ||
*/ | ||
class ADStatsNodeRequest extends BaseNodeRequest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor. This looks like a public class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will fix thanks.
|
||
public static final String ALL_STATS_KEY = "_all"; | ||
|
||
private Map<String, Boolean> statsRetrievalMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question. Why not using a set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A set does make more sense here. Will update thanks!
public class ADStatsTransportAction extends TransportNodesAction<ADStatsRequest, ADStatsResponse, | ||
ADStatsNodeRequest, ADStatsNodeResponse> { | ||
|
||
ADStats adStats; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor. Should this be private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it should. Good catch thanks!
@@ -207,16 +210,20 @@ protected void doExecute(Task task, ActionRequest actionRequest, ActionListener< | |||
AnomalyResultRequest request = AnomalyResultRequest.fromActionRequest(actionRequest); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question. Is it possible to collect metrics using the listener so it's easier and doesn't miss? something in the line of overriding the onFailure in a wrapper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, that would make the code significantly cleaner and prevent misses. I will add that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might go even further to replace the input listener with the wrapped listener.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if this is possible. doExecute is called outside the AD code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I meant using listener again to refer to the wrapped listener. It might be controversial but it reduces mistakes and simplifies overall for this case, something I would probably do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh okay yes I think that will reduce potential for mistakes and clarify code. Will make updates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, because the original listener is called in the lambda, it is not possible to refactor in this way unless I create a copy of the original listener to use in the lambda. Do you think creating this copy is worth it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the original can be reused this way
final ActionListener original = listener
listener = wrap(original)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that makes sense. Will make change
sourceBuilder.size(0); | ||
searchRequest.source(sourceBuilder); | ||
Optional<SearchResponse> response = clientUtil.timedRequest(searchRequest, logger, client::search); | ||
return response.map(r -> r.getHits().getTotalHits().value).orElse(-1L); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor. The output value -1 is not consistent with output value 0. The behaviour needs to be documented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the index does not exist, I thought it makes sense to return 0 because there are 0 documents in a non-existent index. I thought it makes sense to return -1 on request failure because there are an unknown number of documents in the index. I will document this behavior.
@@ -26,13 +26,26 @@ | |||
private IndexUtils indexUtils; | |||
private String indexName; | |||
|
|||
private final String UNABLE_TO_RETRIEVE_HEALTH_MESSAGE = "unable to retrieve health"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor. In general when something is used in output, it's part of interface and should be visible to public, unless the value really doesn't matter to any.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Will make update.
@@ -207,16 +210,20 @@ protected void doExecute(Task task, ActionRequest actionRequest, ActionListener< | |||
AnomalyResultRequest request = AnomalyResultRequest.fromActionRequest(actionRequest); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might go even further to replace the input listener with the wrapped listener.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
for (Map.Entry<String, Boolean> entry : statsRetrievalMap.entrySet()) { | ||
out.writeBoolean(entry.getValue()); | ||
} | ||
out.writeStringCollection(validStats); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are validStats the same all the time? If yes, do we need to record them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I can take validStats out of here and move it to the RestStatsAnomalyDetectorAction. This way we do not have to pass it around.
#4
Adding stats collection and api to the anomaly detection plugin to give users ability to monitor plugin health and performance.
The following node/cluster level stats will be tracked:
Node Level Stats
Cluster Level Stats
User has the ability to retrieve statistics from select nodes and to retrieve only select statistics.
Sample call: