Add a TSID global ordinal to TimeSeriesIndexSearcher #90035
Pinging @elastic/es-analytics-geo (Team:Analytics)
Hi @romseygeek, I've created a changelog YAML for you.
Looks good. I left some questions.
@@ -42,7 +42,7 @@ public TimeSeriesAggregator(
CardinalityUpperBound bucketCardinality,
Map<String, Object> metadata
) throws IOException {
super(name, factories, context, parent, bucketCardinality, metadata);
super(name, factories, context, parent, CardinalityUpperBound.MANY, metadata);
why is this hardcoded?
I think it should be `MANY` actually. Though this isn't related and I should have caught it earlier. `MANY` here means "my children will have an unbounded number of buckets". It's normal to do stuff like `bucketCardinality.multiply(filters.size())` if, say, you were filters and knew precisely how many buckets you'd make. But we never do.
This is actually an unrelated change - I'll back it out. It's necessary for the rate agg to work but I'm not sure that it's the correct way to fix things.
... or I can leave it in if Nik prefers it that way?
FWIW I believe it's correct. I don't care if you keep it or take it out and do it in the rate agg.
@@ -21,18 +22,21 @@
*/
public class AggregationExecutionContext {

private final Supplier<BytesRef> tsidProvider;
private final Supplier<Long> timestampProvider;
private final Supplier<BytesRef> tsidProvider; // TODO remove this entirely?
Yes, I think we can do this. Maybe in a followup when at the same time adjusting the users of this provider (time_series aggregation and rollup)
Right.
@@ -65,6 +67,7 @@ public void search(Query query, BucketCollector bucketCollector) throws IOExcept
int seen = 0;
query = searcher.rewrite(query);
Weight weight = searcher.createWeight(query, bucketCollector.scoreMode(), 1);
AtomicInteger tsidOrd = new AtomicInteger();
I don't think this has to be an AtomicInteger, because there is one thread here that does the time series search, right? It is just convenient to use AtomicInteger?
Just scanning, it looks like it can be an `int`. But I might be missing something. FWIW I prefer `new int[1]` if I need a mutable int box because it doesn't make the reader think "oh god, there are multiple threads now".
Yeah, it's just convenience. It could be a `long[]` returning `[0]` each time if you think the locking and stuff will slow things down unnecessarily?
And yes, it can be `int` and not `long` of course, because we're dealing with single indexes here.
uh, sorry, does it need `long` because it's ordinals across all leaves? It's not likely to get that big, but global ordinals are a `long` for paranoia.
I don't think the locking will slow things down. It's uncontended so it'll get zapped. I think. I just think it's easier to read as an `int` because I never have to wonder where the threads are. Could you do `() -> tsidOrd` as the supplier and keep it as just `int`?
Even across all leaves, `maxDoc` is an int, so ordinals will only ever be `int` as well. A docvalues ordinal would be a `long` because you could have multiple values per doc, but we will only ever have at most one tsid per document.
We can't do `() -> tsidOrd` because it needs to be 'effectively final', so it would have to be an array reference. Which is fine.
👍 on `int` - we know that we have at most one tsid per doc.
Array reference is better for me. It's more code but I can read it faster....
I also lean toward using `new int[1]` here over `AtomicInteger`.
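For anyone following this thread, here is a minimal sketch of the two options discussed above. The class and method names are hypothetical and this is not the actual `TimeSeriesIndexSearcher` code. It only illustrates why a lambda can't capture a reassigned local `int`, and how a one-element array can serve as a single-threaded mutable box.

```java
import java.util.function.IntSupplier;

public class OrdinalBoxSketch {

    // Hypothetical stand-in for the search loop: bumps the ordinal once per
    // "new tsid" and reads it back through the supplier, as a consumer would.
    static int lastOrdinalSeen(int tsidChanges) {
        // int tsidOrd = 0;
        // IntSupplier bad = () -> tsidOrd; // rejected by the compiler once tsidOrd is
        //                                  // reassigned: captured locals must be
        //                                  // effectively final

        final int[] tsidOrd = new int[1]; // the reference is final, the slot is mutable
        IntSupplier ordSupplier = () -> tsidOrd[0];

        int seen = -1;
        for (int i = 0; i < tsidChanges; i++) {
            tsidOrd[0]++;                  // plain increment, no atomics on a single thread
            seen = ordSupplier.getAsInt(); // the supplier observes the updated value
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(lastOrdinalSeen(3));
    }
}
```

The array version is a little more code than `AtomicInteger`, but it doesn't suggest concurrency that isn't there.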
@@ -44,6 +48,10 @@ public BytesRef getTsid() {
}

public Long getTimestamp() {
Should this be `long` now?
++
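To illustrate the boxed versus primitive question, here is a small sketch. The class and method names are made up for the example and are not the real `AggregationExecutionContext` API. It contrasts a `Supplier<Long>`-backed accessor with one backed by `LongSupplier`.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

public class TimestampAccessorSketch {

    // Boxed variant: callers receive a Long that may be null and must be unboxed.
    static Long boxedTimestamp(Supplier<Long> provider) {
        return provider.get();
    }

    // Primitive variant: no boxing and no null to worry about.
    static long primitiveTimestamp(LongSupplier provider) {
        return provider.getAsLong();
    }

    public static void main(String[] args) {
        long ts = 1_660_000_000_000L; // arbitrary example epoch millis
        System.out.println(boxedTimestamp(() -> ts) == primitiveTimestamp(() -> ts));
    }
}
```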
@elasticmachine run elasticsearch-ci/part-3
This is ready for another round
Rather than trying to compare BytesRefs in tsdb-related aggregations, it
will be much quicker if we can use a search-global ordinal to detect when
we have moved to a new TSID. This commit adds such an ordinal to the
aggregation execution context.
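
As a rough illustration of the idea in the description, the sketch below uses plain `byte[]` values in place of `BytesRef` and hypothetical names throughout. It is not the code in this PR. It assumes documents arrive in tsid order and that the byte comparison happens once on the producer side, so that consumers only need an `int` comparison to notice a new TSID.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class TsidOrdinalSketch {

    // Producer side (assumed to run once per search): walk tsids in order and
    // bump the ordinal whenever the tsid bytes change.
    static int[] assignOrdinals(List<byte[]> tsidsInOrder) {
        int[] ords = new int[tsidsInOrder.size()];
        int ord = -1;
        byte[] previous = null;
        for (int i = 0; i < ords.length; i++) {
            byte[] current = tsidsInOrder.get(i);
            if (previous == null || Arrays.equals(previous, current) == false) {
                ord++;
                previous = current;
            }
            ords[i] = ord;
        }
        return ords;
    }

    // Consumer side: an int comparison is enough to detect a new series.
    static int countSeries(int[] ords) {
        int count = 0;
        int lastOrd = -1;
        for (int ord : ords) {
            if (ord != lastOrd) {
                count++;
                lastOrd = ord;
            }
        }
        return count;
    }

    // Helper for building example tsids from strings.
    static byte[] tsid(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        int[] ords = assignOrdinals(List.of(tsid("a"), tsid("a"), tsid("b"), tsid("c"), tsid("c")));
        System.out.println(Arrays.toString(ords) + " -> " + countSeries(ords) + " series");
    }
}
```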