-
Notifications
You must be signed in to change notification settings - Fork 36
Queries data from the index when insufficient data in buffer to form a full shingle #176
Changes from 1 commit
f87ee3d
5793db2
2a071b6
d052f05
cb13815
1680fcc
8e1b4fa
843ac79
cf9efeb
636f293
937a742
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -54,7 +54,7 @@ public class FeatureManager { | |
private static final Logger logger = LogManager.getLogger(FeatureManager.class); | ||
|
||
// Each anomaly detector has a queue of data points with timestamps (in epoch milliseconds). | ||
private Map<String, ArrayDeque<Entry<Long, Optional<double[]>>>> detectorIdsToTimeShingles; | ||
private final Map<String, ArrayDeque<Entry<Long, Optional<double[]>>>> detectorIdsToTimeShingles; | ||
|
||
private final SearchFeatureDao searchFeatureDao; | ||
private final Interpolator interpolator; | ||
|
@@ -175,7 +175,11 @@ private void updateUnprocessedFeatures( | |
.mapToObj(time -> featuresMap.get(time)) | ||
.forEach(e -> shingle.add(e)); | ||
|
||
getProcessedFeatures(shingle, detector, endTime, listener); | ||
if (featuresMap.containsKey(endTime)) { | ||
getProcessedFeatures(shingle, detector, endTime, listener); | ||
} else { | ||
listener.onResponse(new SinglePointFeatures(Optional.empty(), Optional.empty())); | ||
} | ||
} | ||
|
||
private void getProcessedFeatures( | ||
|
@@ -184,19 +188,14 @@ private void getProcessedFeatures( | |
long endTime, | ||
ActionListener<SinglePointFeatures> listener | ||
) { | ||
if (shingle.isEmpty() || shingle.getLast().getKey() < endTime || !shingle.getLast().getValue().isPresent()) { | ||
listener.onResponse(new SinglePointFeatures(Optional.empty(), Optional.empty())); | ||
} else { | ||
double[][] currentPoints = filterAndFill(shingle, endTime, detector); | ||
Optional<double[]> currentPoint = shingle.peekLast().getValue(); | ||
listener | ||
.onResponse( | ||
Optional | ||
.ofNullable(currentPoints) | ||
.map(points -> new SinglePointFeatures(currentPoint, Optional.of(batchShingle(points, shingleSize)[0]))) | ||
.orElse(new SinglePointFeatures(currentPoint, Optional.empty())) | ||
); | ||
} | ||
Optional<double[]> currentPoint = shingle.peekLast().getValue(); | ||
listener | ||
.onResponse( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The original code is easier to read. Also, you do "currentPoint.map(point -> filterAndFill(shingle, endTime, detector))" without using currentPoint in filterAndFill, which looks strange. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This change was made to address the comments above from wnbts to simplify the code here to not use if else to ensure The reason the original code didn't need to check that was because There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Another unexpected change in behavior that this is avoiding is running There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The code was modified to no longer have an unused param There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes, thanks for the change. |
||
currentPoint | ||
.map(point -> filterAndFill(shingle, endTime, detector)) | ||
.map(points -> new SinglePointFeatures(currentPoint, Optional.of(batchShingle(points, shingleSize)[0]))) | ||
.orElse(new SinglePointFeatures(currentPoint, Optional.empty())) | ||
); | ||
} | ||
|
||
private double[][] filterAndFill(Deque<Entry<Long, Optional<double[]>>> shingle, long endTime, AnomalyDetector detector) { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor. this first filter might not be needed since the map should contains results for all interval, present or absent. to be safe in the unlikely case that map is incomplete, the second get can return an empty if the key is absent using getOrDefault.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends on how we want to handle this case: https://github.com/opendistro-for-elasticsearch/anomaly-detection/blob/master/src/main/java/com/amazon/opendistroforelasticsearch/ad/feature/SearchFeatureDao.java#L211-L214. This is the case that could cause some of the times to be missing from
featuresMap
at this line.Is
aggs == null
in the search response equivalent to each queried time range having no data? In other words, do we want to treatCollections.emptyList()
equivalent to[Optional.empty(), Optional.empty(), ..., Optional.empty()]
? In the case ofCollections.emptyList()
, do we want to cache the null values for each of the queried time ranges so they will not get re-queried?If yes, it might be cleaner to handle this logic in the callback passed to searchFeatureDao, so that the logic for determining shingle values based on the search response is located in one function rather than spread across multiple functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i haven't seen or know that happens in practice. if that ever happens, caching the results is ok. what i was suggesting is just simplify the implementation to one line like stream.maptoobject(time -> map.getOrDefault(time, <time, empty()>)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I will change the implementation to cache results for the missing ranges when the response is an empty list.