Change PerFieldMapperCodec to use tsdb doc values codec for all fields. #105301

Merged: 10 commits merged on Feb 12, 2024

Member

@martijnvg martijnvg commented Feb 8, 2024

The index needs to be in tsdb mode. All fields will use the tsdb codec, except fields whose names start with `_`; `_tsid` is the one `_`-prefixed field that still uses the tsdb codec.

Before this change, the codec selection relied on MapperService to check whether a field needed to use the tsdb doc values codec, but this missed many field types (ip field type, scaled float field type, unsigned long field type, etc.). We wanted to depend on the doc values type in FieldInfo instead, but that information is not available in PerFieldMapperCodec.

Borrowed the binary doc values implementation from Lucene90DocValuesFormat. This allows the tsdb doc values codec to be used for any doc values field.
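
As a rough illustration of what uncompressed binary doc values storage looks like, here is a minimal sketch: all values are concatenated into one byte array, and an address table records where each document's value starts. The class name `SimpleBinaryDocValues` and its methods are hypothetical and are not taken from this PR or from Lucene; the real format writes to index files and encodes the addresses more compactly.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical, simplified model of uncompressed binary doc values (not the actual Lucene code).
final class SimpleBinaryDocValues {
    private final byte[] data;      // all values concatenated, without compression
    private final long[] addresses; // addresses[doc] = start offset of doc's value; last entry = data.length

    SimpleBinaryDocValues(byte[][] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        addresses = new long[values.length + 1];
        for (int doc = 0; doc < values.length; doc++) {
            addresses[doc] = out.size();
            out.write(values[doc], 0, values[doc].length);
        }
        addresses[values.length] = out.size();
        data = out.toByteArray();
    }

    // Reads one value by slicing the concatenated data between two consecutive addresses.
    byte[] get(int doc) {
        return Arrays.copyOfRange(data, (int) addresses[doc], (int) addresses[doc + 1]);
    }

    public static void main(String[] args) {
        SimpleBinaryDocValues dv = new SimpleBinaryDocValues(new byte[][] { { 1, 2 }, {}, { 3, 4, 5 } });
        System.out.println(Arrays.toString(dv.get(2)));
    }
}
```

Because nothing is compressed, reading a value only needs two address lookups and a slice, which is the simplicity the description refers to.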

Maybe we can implement some simple compression here? But on the other hand binary doc values shouldn't be made complex, otherwise it defeats the purpose of binary doc values. None of that has been done yet in this change.

Binary doc values support is now added to the tsdb doc values codec, so that every field in a tsdb index can delegate to this codec. Only fields that start with `_` delegate to the default doc values codec (except `_tsid`). The binary doc values implementation from LUCENE-9211 (apache/lucene-solr#1234) is used here, which stores the values in LZ4-compressed blocks in order to reduce storage usage (in the Lucene default doc values codec, binary doc values are stored without any compression).
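
The field selection rule described above can be sketched as a small predicate. This is only an illustration under the assumptions stated in the description (tsdb mode required, `_`-prefixed fields excluded, `_tsid` included); the class and method names are hypothetical and not copied from PerFieldMapperCodec.

```java
// Hypothetical helper; names are illustrative and not taken from the PR.
final class TsdbCodecRule {
    // Returns true when a field's doc values should be written with the tsdb doc values codec.
    static boolean useTsdbDocValuesFormat(String fieldName, boolean indexInTsdbMode) {
        if (indexInTsdbMode == false) {
            return false; // indices that are not in tsdb mode keep the default doc values codec
        }
        // Fields starting with '_' stay on the default codec, except _tsid.
        boolean excluded = fieldName.startsWith("_") && fieldName.equals("_tsid") == false;
        return excluded == false;
    }

    public static void main(String[] args) {
        for (String field : new String[] { "_tsid", "_seq_no", "@timestamp", "kubernetes.pod.ip" }) {
            System.out.println(field + " -> " + useTsdbDocValuesFormat(field, true));
        }
    }
}
```

Note that the rule no longer inspects the field type at all, which is why ip, scaled float and unsigned long fields are now covered.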

Followup on #99747

The index needs to be in tsdb mode. All fields use the tsdb codec, except fields whose names start with `_`; `_tsid` is the one `_`-prefixed field that still uses the tsdb codec.

Before this change, the codec selection relied on MapperService to check whether a field needed to use the tsdb doc values codec, but this missed many field types (ip field type, scaled float field type, unsigned long field type, etc.). We wanted to depend on the doc values type in FieldInfo instead, but that information is not available in PerFieldMapperCodec.

So instead binary doc values support was added to ES87TSDBDocValuesFormat. This allows it to be used for any doc values field. The code was copied from Lucene90DocValuesFormat.

Maybe we can add some simple compression here? But on the other hand binary doc values shouldn't be made complex, otherwise it defeats the purpose of binary doc values. None of that has been done yet in this change.
@martijnvg martijnvg force-pushed the tsdb/doc_values_codec_improvement branch from b8fa415 to 7aff8cc Compare February 8, 2024 18:34
Contributor

@jpountz jpountz left a comment

I like the simpler logic. We can always make the decision to compress binary doc values later on if we want. Just copying @iverase, who might have opinions since geo_shape fields use binary doc values internally.

@martijnvg martijnvg marked this pull request as ready for review February 9, 2024 19:07
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

this.addresses = addresses;
this.compressedData = compressedData;
// pre-allocate a byte array large enough for the biggest uncompressed block needed.
this.uncompressedBlock = new byte[biggestUncompressedBlockSize];

Contributor

@iverase I remember you discussing humongous allocations caused by this sort of approach with very large fields such as geo-shapes. Is there a better way?

Contributor

I remember we had problems in the past because of the size of this array for geo_shape fields. It was partly due to a suboptimal implementation of the LatLonShapeDocValues query, which did not implement the scorer supplier, so we were allocating this array for each query.

Because doc values for geo_shape can be big and we compress a fixed number of documents per block, the array can grow pretty big and cause problems if we are reading doc values very often.

I wonder if we can build those blocks considering not only a fixed number of documents but also a maximum size, so that we keep the size of that array under some sensible limit.
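
The idea of bounding blocks by size as well as by document count could look roughly like the sketch below. It is not code from this PR: `BlockPlanner`, `planBlocks` and both limits are hypothetical names, and the sketch only computes uncompressed block sizes so that the effect on the pre-allocated buffer (`biggestUncompressedBlockSize` in the reviewed snippet) is visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: closes a block when it reaches a document limit or a byte limit.
final class BlockPlanner {
    // Returns the uncompressed size in bytes of each planned block.
    static List<Integer> planBlocks(int[] valueLengths, int maxDocsPerBlock, int maxBytesPerBlock) {
        List<Integer> blockSizes = new ArrayList<>();
        int docs = 0;
        int bytes = 0;
        for (int length : valueLengths) {
            // A value larger than maxBytesPerBlock still gets a block of its own (docs > 0 check).
            boolean full = docs == maxDocsPerBlock || (docs > 0 && bytes + length > maxBytesPerBlock);
            if (full) {
                blockSizes.add(bytes);
                docs = 0;
                bytes = 0;
            }
            docs++;
            bytes += length;
        }
        if (docs > 0) {
            blockSizes.add(bytes);
        }
        return blockSizes;
    }

    // The reader's scratch buffer must be able to hold the biggest uncompressed block.
    static int biggestBlock(List<Integer> blockSizes) {
        int max = 0;
        for (int size : blockSizes) {
            max = Math.max(max, size);
        }
        return max;
    }

    public static void main(String[] args) {
        int[] lengths = { 10, 10, 10, 10 };
        System.out.println(planBlocks(lengths, 4, Integer.MAX_VALUE)); // document limit only
        System.out.println(planBlocks(lengths, 4, 25));                // document and byte limits
    }
}
```

With only a document limit, the buffer size scales with the largest values in the field; adding the byte limit caps it, except for a single value that is itself larger than the limit.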

Member Author

Thanks Ignacio, I think encoding the number of documents would be helpful here in order to avoid accidentally reading GBs of data, so that we can limit the amount of uncompressed data we write into a block.

However, I currently don't have time to implement this before tomorrow, and I would like to get this change in, as it allows us to consistently use the tsdb doc values format for all fields in a tsdb index. Without this change, scaled float, unsigned long and other field types don't use the tsdb doc values codec, and that is a waste.

I will try to implement the compression in a follow-up later. The downside is that we will have to add versioning logic.

Contributor

Fine, I don't think geo_shape is an important field type for tsdb indices, so it can wait.

Member Author

Right, binary doc values aren't used that much for tsdb indices. They are mainly used by the range*, percolator, geo shape and counted keyword field types.

Member Author

martijnvg commented Feb 9, 2024

It looks like enabling the tsdb doc values codec for all field types saves another ~21MB in total storage usage.

Field by field analysis

| field name | baseline (main) | contender (this pr) | diff |
| --- | --- | --- | --- |
| kubernetes.container.memory.usage.node.pct | 3.7mb | 1.1mb | -70.89% |
| kubernetes.container.memory.usage.limit.pct | 6.2mb | 2mb | -67.52% |
| kubernetes.pod.memory.usage.node.pct | 5.2mb | 1.7mb | -66.56% |
| kubernetes.pod.memory.usage.limit.pct | 7.6mb | 2.6mb | -65.15% |
| kubernetes.container.cpu.usage.node.pct | 4.7mb | 2.5mb | -45.46% |
| kubernetes.pod.cpu.usage.node.pct | 5.7mb | 3.1mb | -44.37% |
| kubernetes.container.cpu.usage.limit.pct | 6.2mb | 3.5mb | -43.31% |
| kubernetes.pod.cpu.usage.limit.pct | 7.1mb | 4.1mb | -42.66% |
| kubernetes.container.status.ready | 16.1mb | 14.8mb | -7.73% |
| kubernetes.event.message | 5.6mb | 5.5mb | -0.46% |
| @timestamp | 650.7mb | 650.7mb | 0.00% |
| _field_names | 17.3kb | 17.3kb | 0.00% |
| _id | 564.4mb | 564.4mb | 0.00% |
| _seq_no | 950.6mb | 950.6mb | 0.00% |
| _tsid | 9.8mb | 9.8mb | 0.00% |
| agent.type | 1mb | 1mb | 0.00% |
| agent.version | 1mb | 1mb | 0.00% |
| container.id | 16.8mb | 16.8mb | 0.00% |
| container.runtime | 3.3mb | 3.3mb | 0.00% |
| ecs.version | 1mb | 1mb | 0.00% |
| event.dataset | 7.6mb | 7.6mb | 0.00% |
| event.duration | 1.1gb | 1.1gb | 0.00% |
| event.module | 1mb | 1mb | 0.00% |
| fields.cluster | 1mb | 1mb | 0.00% |
| kubernetes.container.cpu.limit.cores | 12.7mb | 12.7mb | 0.00% |
| kubernetes.container.cpu.request.cores | 13.5mb | 13.5mb | 0.00% |
| kubernetes.container.cpu.usage.core.ns | 19.9mb | 19.9mb | 0.00% |
| kubernetes.container.cpu.usage.nanocores | 16.5mb | 16.5mb | 0.00% |
| kubernetes.container.id | 16.6mb | 16.6mb | 0.00% |
| kubernetes.container.image | 8.5mb | 8.5mb | 0.00% |
| kubernetes.container.logs.available.bytes | 7.2mb | 7.2mb | 0.00% |
| kubernetes.container.logs.capacity.bytes | 920.1kb | 920.1kb | 0.00% |
| kubernetes.container.logs.inodes.count | 841.3kb | 841.3kb | 0.00% |
| kubernetes.container.logs.inodes.free | 3.4mb | 3.4mb | 0.00% |
| kubernetes.container.logs.inodes.used | 3.4mb | 3.4mb | 0.00% |
| kubernetes.container.logs.used.bytes | 1.6mb | 1.6mb | 0.00% |
| kubernetes.container.memory.available.bytes | 4mb | 4mb | 0.00% |
| kubernetes.container.memory.limit.bytes | 13.3mb | 13.3mb | 0.00% |
| kubernetes.container.memory.majorpagefaults | 743.8kb | 743.8kb | 0.00% |
| kubernetes.container.memory.pagefaults | 3.9mb | 3.9mb | 0.00% |
| kubernetes.container.memory.request.bytes | 13.4mb | 13.4mb | 0.00% |
| kubernetes.container.memory.rss.bytes | 4.6mb | 4.6mb | 0.00% |
| kubernetes.container.memory.usage.bytes | 4.9mb | 4.9mb | 0.00% |
| kubernetes.container.memory.workingset.bytes | 4.8mb | 4.8mb | 0.00% |
| kubernetes.container.name | 8.8mb | 8.8mb | 0.00% |
| kubernetes.container.rootfs.available.bytes | 7.2mb | 7.2mb | 0.00% |
| kubernetes.container.rootfs.capacity.bytes | 920.1kb | 920.1kb | 0.00% |
| kubernetes.container.rootfs.inodes.used | 814.8kb | 814.8kb | 0.00% |
| kubernetes.container.rootfs.used.bytes | 1.1mb | 1.1mb | 0.00% |
| kubernetes.container.start_time | 2.7mb | 2.7mb | 0.00% |
| kubernetes.container.status.phase | 7.5mb | 7.5mb | 0.00% |
| kubernetes.container.status.reason | 12.6mb | 12.6mb | 0.00% |
| kubernetes.container.status.restarts | 10.4mb | 10.4mb | 0.00% |
| kubernetes.event.count | 93.2kb | 93.2kb | 0.00% |
| kubernetes.event.involved_object.api_version | 445.1kb | 445.1kb | 0.00% |
| kubernetes.event.involved_object.kind | 543.3kb | 543.3kb | 0.00% |
| kubernetes.event.involved_object.name | 2.2mb | 2.2mb | 0.00% |
| kubernetes.event.involved_object.resource_version | 2.2mb | 2.2mb | 0.00% |
| kubernetes.event.involved_object.uid | 4.8mb | 4.8mb | 0.00% |
| kubernetes.event.metadata.generate_name | 34.1kb | 34.1kb | 0.00% |
| kubernetes.event.metadata.name | 3mb | 3mb | 0.00% |
| kubernetes.event.metadata.namespace | 320.8kb | 320.8kb | 0.00% |
| kubernetes.event.metadata.resource_version | 3mb | 3mb | 0.00% |
| kubernetes.event.metadata.self_link | 3mb | 3mb | 0.00% |
| kubernetes.event.metadata.timestamp.created | 3.5mb | 3.5mb | 0.00% |
| kubernetes.event.metadata.uid | 9.1mb | 9.1mb | 0.00% |
| kubernetes.event.reason | 1mb | 1mb | 0.00% |
| kubernetes.event.timestamp.first_occurrence | 3.4mb | 3.4mb | 0.00% |
| kubernetes.event.timestamp.last_occurrence | 3.6mb | 3.6mb | 0.00% |
| kubernetes.event.type | 92.9kb | 92.9kb | 0.00% |
| kubernetes.labels.app | 13.4mb | 13.4mb | 0.00% |
| kubernetes.labels.app_kubernetes_io/component | 100.3kb | 100.3kb | 0.00% |
| kubernetes.labels.app_kubernetes_io/instance | 1.3mb | 1.3mb | 0.00% |
| kubernetes.labels.app_kubernetes_io/managed-by | 92.7kb | 92.7kb | 0.00% |
| kubernetes.labels.app_kubernetes_io/name | 1.3mb | 1.3mb | 0.00% |
| kubernetes.labels.beta_kubernetes_io/arch | 48.7kb | 48.7kb | 0.00% |
| kubernetes.labels.beta_kubernetes_io/instance-type | 72.2kb | 72.2kb | 0.00% |
| kubernetes.labels.beta_kubernetes_io/os | 48.7kb | 48.7kb | 0.00% |
| kubernetes.labels.chart | 1mb | 1mb | 0.00% |
| kubernetes.labels.component | 2mb | 2mb | 0.00% |
| kubernetes.labels.controller-revision-hash | 5.9mb | 5.9mb | 0.00% |
| kubernetes.labels.controller-uid | 22.7mb | 22.7mb | 0.00% |
| kubernetes.labels.github_account | 7.6mb | 7.6mb | 0.00% |
| kubernetes.labels.helm_sh/chart | 101.2kb | 101.2kb | 0.00% |
| kubernetes.labels.heritage | 783.5kb | 783.5kb | 0.00% |
| kubernetes.labels.io_kompose_service | 103.2kb | 103.2kb | 0.00% |
| kubernetes.labels.job-name | 21.5mb | 21.5mb | 0.00% |
| kubernetes.labels.k8s-app | 681.4kb | 681.4kb | 0.00% |
| kubernetes.labels.kubernetes_io/arch | 48.7kb | 48.7kb | 0.00% |
| kubernetes.labels.kubernetes_io/hostname | 90.3kb | 90.3kb | 0.00% |
| kubernetes.labels.kubernetes_io/os | 48.7kb | 48.7kb | 0.00% |
| kubernetes.labels.logtype | 6.4mb | 6.4mb | 0.00% |
| kubernetes.labels.name | 131.6kb | 131.6kb | 0.00% |
| kubernetes.labels.pod-template-generation | 4.5mb | 4.5mb | 0.00% |
| kubernetes.labels.pod-template-hash | 8.7mb | 8.7mb | 0.00% |
| kubernetes.labels.release | 7.3mb | 7.3mb | 0.00% |
| kubernetes.labels.statefulset_kubernetes_io/pod-name | 1.6mb | 1.6mb | 0.00% |
| kubernetes.labels.tier | 2mb | 2mb | 0.00% |
| kubernetes.namespace | 9.9mb | 9.9mb | 0.00% |
| kubernetes.node.cpu.allocatable.cores | 138.7kb | 138.7kb | 0.00% |
| kubernetes.node.cpu.capacity.cores | 125.7kb | 125.7kb | 0.00% |
| kubernetes.node.cpu.usage.core.ns | 2.2mb | 2.2mb | 0.00% |
| kubernetes.node.cpu.usage.nanocores | 1.7mb | 1.7mb | 0.00% |
| kubernetes.node.fs.available.bytes | 653.9kb | 653.9kb | 0.00% |
| kubernetes.node.fs.capacity.bytes | 139kb | 139kb | 0.00% |
| kubernetes.node.fs.inodes.count | 132.3kb | 132.3kb | 0.00% |
| kubernetes.node.fs.inodes.free | 345.3kb | 345.3kb | 0.00% |
| kubernetes.node.fs.inodes.used | 344.2kb | 344.2kb | 0.00% |
| kubernetes.node.fs.used.bytes | 655.6kb | 655.6kb | 0.00% |
| kubernetes.node.memory.allocatable.bytes | 142.5kb | 142.5kb | 0.00% |
| kubernetes.node.memory.available.bytes | 921.8kb | 921.8kb | 0.00% |
| kubernetes.node.memory.capacity.bytes | 142.3kb | 142.3kb | 0.00% |
| kubernetes.node.memory.majorpagefaults | 129.2kb | 129.2kb | 0.00% |
| kubernetes.node.memory.pagefaults | 258.9kb | 258.9kb | 0.00% |
| kubernetes.node.memory.rss.bytes | 915.2kb | 915.2kb | 0.00% |
| kubernetes.node.memory.usage.bytes | 933.5kb | 933.5kb | 0.00% |
| kubernetes.node.memory.workingset.bytes | 921.3kb | 921.3kb | 0.00% |
| kubernetes.node.name | 11.1mb | 11.1mb | 0.00% |
| kubernetes.node.network.rx.bytes | 1.5mb | 1.5mb | 0.00% |
| kubernetes.node.network.rx.errors | 119kb | 119kb | 0.00% |
| kubernetes.node.network.tx.bytes | 1.4mb | 1.4mb | 0.00% |
| kubernetes.node.network.tx.errors | 119kb | 119kb | 0.00% |
| kubernetes.node.pod.allocatable.total | 125.5kb | 125.5kb | 0.00% |
| kubernetes.node.pod.capacity.total | 125.5kb | 125.5kb | 0.00% |
| kubernetes.node.runtime.imagefs.available.bytes | 653.9kb | 653.9kb | 0.00% |
| kubernetes.node.runtime.imagefs.capacity.bytes | 139kb | 139kb | 0.00% |
| kubernetes.node.runtime.imagefs.used.bytes | 174.2kb | 174.2kb | 0.00% |
| kubernetes.node.start_time | 239.3kb | 239.3kb | 0.00% |
| kubernetes.node.status.ready | 153.4kb | 153.4kb | 0.00% |
| kubernetes.pod.cpu.usage.nanocores | 15.8mb | 15.8mb | 0.00% |
| kubernetes.pod.host_ip | 2.5mb | 2.5mb | 0.00% |
| kubernetes.pod.ip | 6.1mb | 6.1mb | 0.00% |
| kubernetes.pod.memory.available.bytes | 4.8mb | 4.8mb | 0.00% |
| kubernetes.pod.memory.major_page_faults | 1.4mb | 1.4mb | 0.00% |
| kubernetes.pod.memory.page_faults | 4.6mb | 4.6mb | 0.00% |
| kubernetes.pod.memory.rss.bytes | 5.3mb | 5.3mb | 0.00% |
| kubernetes.pod.memory.usage.bytes | 5.6mb | 5.6mb | 0.00% |
| kubernetes.pod.memory.working_set.bytes | 5.6mb | 5.6mb | 0.00% |
| kubernetes.pod.name | 20.3mb | 20.3mb | 0.00% |
| kubernetes.pod.network.rx.bytes | 11.4mb | 11.4mb | 0.00% |
| kubernetes.pod.network.rx.errors | 1.3mb | 1.3mb | 0.00% |
| kubernetes.pod.network.tx.bytes | 11.1mb | 11.1mb | 0.00% |
| kubernetes.pod.network.tx.errors | 1.3mb | 1.3mb | 0.00% |
| kubernetes.pod.start_time | 2.9mb | 2.9mb | 0.00% |
| kubernetes.pod.status.phase | 2.1mb | 2.1mb | 0.00% |
| kubernetes.pod.status.ready | 2mb | 2mb | 0.00% |
| kubernetes.pod.status.scheduled | 1.8mb | 1.8mb | 0.00% |
| kubernetes.pod.uid | 20.1mb | 20.1mb | 0.00% |
| kubernetes.replicaset.name | 8.7mb | 8.7mb | 0.00% |
| kubernetes.statefulset.name | 1.6mb | 1.6mb | 0.00% |
| kubernetes.system.container | 120kb | 120kb | 0.00% |
| kubernetes.system.cpu.usage.core.ns | 5.7mb | 5.7mb | 0.00% |
| kubernetes.system.cpu.usage.nanocores | 4.9mb | 4.9mb | 0.00% |
| kubernetes.system.memory.majorpagefaults | 54.6kb | 54.6kb | 0.00% |
| kubernetes.system.memory.pagefaults | 815kb | 815kb | 0.00% |
| kubernetes.system.memory.rss.bytes | 1.7mb | 1.7mb | 0.00% |
| kubernetes.system.memory.usage.bytes | 1.8mb | 1.8mb | 0.00% |
| kubernetes.system.memory.workingset.bytes | 1.8mb | 1.8mb | 0.00% |
| kubernetes.system.start_time | 424.2kb | 424.2kb | 0.00% |
| kubernetes.volume.fs.available.bytes | 1.6mb | 1.6mb | 0.00% |
| kubernetes.volume.fs.capacity.bytes | 1mb | 1mb | 0.00% |
| kubernetes.volume.fs.inodes.count | 610.7kb | 610.7kb | 0.00% |
| kubernetes.volume.fs.inodes.free | 907.2kb | 907.2kb | 0.00% |
| kubernetes.volume.fs.inodes.used | 374.5kb | 374.5kb | 0.00% |
| kubernetes.volume.fs.used.bytes | 725.8kb | 725.8kb | 0.00% |
| kubernetes.volume.name | 1.2mb | 1.2mb | 0.00% |
| metricset.name | 8.5mb | 8.5mb | 0.00% |
| metricset.period | 34mb | 34mb | 0.00% |
| service.address | 6.7mb | 6.7mb | 0.00% |
| service.type | 1mb | 1mb | 0.00% |
| agent.ephemeral_id | 150.9mb | 152.4mb | 1.03% |
| host.name | 147.1mb | 148.7mb | 1.06% |
| agent.hostname | 147.1mb | 148.7mb | 1.06% |
| agent.id | 147mb | 148.6mb | 1.06% |
| kubernetes.node.status.unschedulable | 133kb | 136.3kb | 2.46% |
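
The ~21MB total can be cross-checked from the rows above whose size changed. The sketch below sums the rounded mb values listed in the table and assumes the diff column is computed as (contender - baseline) / baseline; the class and method names are hypothetical. The one changed row reported in kb (kubernetes.node.status.unschedulable) is left out because it contributes only about 0.003mb.

```java
// Hypothetical cross-check of the table; input values are the rounded sizes shown above.
final class StorageDiff {
    // {baseline, contender} in mb for every row whose size changed (the kb-sized row is omitted).
    static final double[][] CHANGED_ROWS_MB = {
        { 3.7, 1.1 }, { 6.2, 2.0 }, { 5.2, 1.7 }, { 7.6, 2.6 }, { 4.7, 2.5 },
        { 5.7, 3.1 }, { 6.2, 3.5 }, { 7.1, 4.1 }, { 16.1, 14.8 }, { 5.6, 5.5 },
        { 150.9, 152.4 }, { 147.1, 148.7 }, { 147.1, 148.7 }, { 147.0, 148.6 },
    };

    // Relative change in percent; assumed to match how the diff column was produced.
    static double percentDiff(double baseline, double contender) {
        return (contender - baseline) / baseline * 100.0;
    }

    // Net change in mb over all rows; negative means storage was saved.
    static double netChange(double[][] rows) {
        double net = 0;
        for (double[] row : rows) {
            net += row[1] - row[0];
        }
        return net;
    }

    public static void main(String[] args) {
        System.out.printf("net change: %.1f mb%n", netChange(CHANGED_ROWS_MB));
        System.out.printf("first row: %.2f%%%n", percentDiff(3.7, 1.1));
    }
}
```

The net change comes out at roughly -20.9mb, in line with the ~21MB quoted in the comment. Percentages recomputed from the rounded sizes differ slightly from the diff column (about -70.3% versus -70.89% for the first row), presumably because the column was computed from exact byte counts.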

@martijnvg martijnvg requested a review from a team as a code owner February 12, 2024 17:18
@martijnvg martijnvg force-pushed the tsdb/doc_values_codec_improvement branch from adaff85 to 7c278c7 Compare February 12, 2024 17:20
@martijnvg martijnvg removed the request for review from a team February 12, 2024 17:21
@martijnvg martijnvg requested review from jpountz and iverase February 12, 2024 17:34
@martijnvg martijnvg merged commit 6a6fba6 into elastic:main Feb 12, 2024
14 checks passed
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Feb 12, 2024
The binary doc values implementation from LUCENE-9211 (apache/lucene-solr#1234) is used here, which stores the values in LZ4-compressed blocks in order to reduce storage usage (in the Lucene default doc values codec, binary doc values are stored without any compression).

Follow up from elastic#105301