Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use long in Centroid count #99491

Merged
merged 7 commits into from
Sep 13, 2023
Merged

Use long in Centroid count #99491

merged 7 commits into from
Sep 13, 2023

Conversation

kkrik-es
Copy link
Contributor

Centroids currently use integers to track how many samples their mean tracks. This can overflow in case the digest tracks billions of samples or more.

TDigestState already serializes the count as VLong, so it can be read as VLong instead of Vint without compatibility issues.

Fixes #80153

Centroids currently use integers to track how many samples their mean
tracks. This can overflow in case the digest tracks billions of samples
or more.

TDigestState already serializes the count as VLong, so it can be read as
VInt without compatibility issues.

Fixes elastic#80153
@kkrik-es kkrik-es added >bug :Analytics/Aggregations Aggregations Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels Sep 12, 2023
@kkrik-es kkrik-es self-assigned this Sep 12, 2023
@elasticsearchmachine
Copy link
Collaborator

Hi @kkrik-es, I've created a changelog YAML for you.

@kkrik-es kkrik-es marked this pull request as ready for review September 13, 2023 10:23
@kkrik-es kkrik-es requested a review from martijnvg September 13, 2023 10:23
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@kkrik-es
Copy link
Contributor Author

@elasticsearchmachine run elasticsearch-ci/docs

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@martijnvg
Copy link
Member

Should this also be backported to the 8.10 branch?

@kkrik-es kkrik-es added v8.10.1 auto-backport Automatically create backport pull requests when merged labels Sep 13, 2023
@kkrik-es kkrik-es merged commit 0247cfe into elastic:main Sep 13, 2023
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.10

kkrik-es added a commit to kkrik-es/elasticsearch that referenced this pull request Sep 13, 2023
* Use long in Centroid count

Centroids currently use integers to track how many samples their mean
tracks. This can overflow in case the digest tracks billions of samples
or more.

TDigestState already serializes the count as VLong, so it can be read as
VInt without compatibility issues.

Fixes elastic#80153

* Update docs/changelog/99491.yaml

* More test fixes

* Bump TransportVersion

* Revert TransportVersion change
elasticsearchmachine pushed a commit that referenced this pull request Sep 13, 2023
* Use long in Centroid count

Centroids currently use integers to track how many samples their mean
tracks. This can overflow in case the digest tracks billions of samples
or more.

TDigestState already serializes the count as VLong, so it can be read as
VInt without compatibility issues.

Fixes #80153

* Update docs/changelog/99491.yaml

* More test fixes

* Bump TransportVersion

* Revert TransportVersion change
@kkrik-es kkrik-es deleted the fix/80153 branch September 14, 2023 11:18
kkrik-es added a commit to kkrik-es/elasticsearch that referenced this pull request Sep 26, 2023
Histograms currently use integers to store the count of each value,
which can overflow. Switch to using long integers to avoid this.

TDigestState was updated to use long for centroid value count in elastic#99491

Fixes elastic#99820
kkrik-es added a commit that referenced this pull request Sep 29, 2023
* Represent histogram value count as long

Histograms currently use integers to store the count of each value,
which can overflow. Switch to using long integers to avoid this.

TDigestState was updated to use long for centroid value count in #99491

Fixes #99820

* Update docs/changelog/99912.yaml

* spotless fix
piergm pushed a commit to piergm/elasticsearch that referenced this pull request Oct 2, 2023
* Represent histogram value count as long

Histograms currently use integers to store the count of each value,
which can overflow. Switch to using long integers to avoid this.

TDigestState was updated to use long for centroid value count in elastic#99491

Fixes elastic#99820

* Update docs/changelog/99912.yaml

* spotless fix
jakelandis pushed a commit to jakelandis/elasticsearch that referenced this pull request Oct 2, 2023
* Represent histogram value count as long

Histograms currently use integers to store the count of each value,
which can overflow. Switch to using long integers to avoid this.

TDigestState was updated to use long for centroid value count in elastic#99491

Fixes elastic#99820

* Update docs/changelog/99912.yaml

* spotless fix
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations auto-backport Automatically create backport pull requests when merged >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.10.1 v8.11.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Histogram fields should not accept negative centroid counts
3 participants