-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for subobjects: false
#349
Comments
subobject: false
subobjects: false
Having
We don't need to necessarily have all these discussions here but worth referencing. I'm excited about the potential future opportunities this offers. (@joshdover ) |
This is great, and I think that in many of the cases where we would take more advantage of this, basically where we dedot now, the field is already flattened, because we send it as is read from origin. I think we could add support for this, and introduce it eventually in a case per case basis.
We can add a check for this in elastic-package tests, as we have for other field values. In a more general sense, do we know if storing everything flattened would have other advantages? as disk space savings or ingestion performance? |
This is indeed interesting, so that would also simply all the Schema in metricbeat. |
I'd like to just solidify my understanding of the relation between the package spec and component templates generated by Fleet. The e.g. The Show code blocks# agent.yml
- name: host
title: Host
group: 2
description: 'A host is defined as a general computing instance.
ECS host.* fields should be populated with details about the host on which the event happened, or from which the measurement was taken. Host types include hardware, virtual machines, Docker containers, and Kubernetes nodes.'
type: group
fields:
- name: architecture
level: core
type: keyword
ignore_above: 1024
description: Operating system architecture.
example: x86_64
- name: domain
level: extended
type: keyword
ignore_above: 1024
description: 'Name of the domain of which the host is a member.
For example, on Windows this could be the host''s Active Directory domain or NetBIOS domain name. For Linux this could be the domain of the host''s LDAP provider.'
example: CONTOSO
default_field: false
- name: hostname
level: core
type: keyword
ignore_above: 1024
description: 'Hostname of the host.
It normally contains what the `hostname` command returns on the host machine.'
- name: id
level: core
type: keyword
ignore_above: 1024
description: 'Unique host id.
As hostname is not always unique, use values that are meaningful in your environment.
Example: The current usage of `beat.name`.'
- name: ip
level: core
type: ip
description: Host ip addresses.
- name: mac
level: core
type: keyword
ignore_above: 1024
description: Host mac addresses.
- name: name
level: core
type: keyword
ignore_above: 1024
description: 'Name of the host.
It can contain what `hostname` returns on Unix systems, the fully qualified domain name, or a name specified by the user. The sender decides which value to use.'
- name: os.family
level: extended
type: keyword
ignore_above: 1024
description: OS family (such as redhat, debian, freebsd, windows).
example: debian
- name: os.kernel
level: extended
type: keyword
ignore_above: 1024
description: Operating system kernel version as a raw string.
example: 4.4.0-112-generic
- name: os.name
level: extended
type: keyword
ignore_above: 1024
multi_fields:
- name: text
type: text
norms: false
default_field: false
description: Operating system name, without the version.
example: Mac OS X
- name: os.platform
level: extended
type: keyword
ignore_above: 1024
description: Operating system platform (such centos, ubuntu, windows).
example: darwin
- name: os.version
level: extended
type: keyword
ignore_above: 1024
description: Operating system version as a raw string.
example: 10.14.1
- name: type
level: core
type: keyword
ignore_above: 1024
description: 'Type of host.
For Cloud providers this can be the machine type like `t2.medium`. If vm, this could be the container, for example, or other information meaningful in your environment.'
- name: containerized
type: boolean
description: >
If the host is a container.
- name: os.build
type: keyword
example: "18D109"
description: >
OS build information.
- name: os.codename
type: keyword
example: "stretch"
description: >
OS codename, if any. // logs-nginx.access@package component template mappings
"host": {
"properties": {
"hostname": {
"ignore_above": 1024,
"type": "keyword"
},
"os": {
"properties": {
"build": {
"ignore_above": 1024,
"type": "keyword"
},
"kernel": {
"ignore_above": 1024,
"type": "keyword"
},
"codename": {
"ignore_above": 1024,
"type": "keyword"
},
"name": {
"ignore_above": 1024,
"type": "keyword",
"fields": {
"text": {
"type": "text"
}
}
},
"family": {
"ignore_above": 1024,
"type": "keyword"
},
"version": {
"ignore_above": 1024,
"type": "keyword"
},
"platform": {
"ignore_above": 1024,
"type": "keyword"
}
}
},
"domain": {
"ignore_above": 1024,
"type": "keyword"
},
"ip": {
"type": "ip"
},
"containerized": {
"type": "boolean"
},
"name": {
"ignore_above": 1024,
"type": "keyword"
},
"id": {
"ignore_above": 1024,
"type": "keyword"
},
"type": {
"ignore_above": 1024,
"type": "keyword"
},
"mac": {
"ignore_above": 1024,
"type": "keyword"
},
"architecture": {
"ignore_above": 1024,
"type": "keyword"
}
}
} The suggestion here to flatten all fields generated by Fleet on these component templates would alter our logic here to convert the // subset of mappings shown for brevity's sake
{
"host": {
"type": "object",
"subobjects": false,
"properties": {
"hostname": {
"ignore_above": 1034,
"type": "keyword"
},
"os.build": {
"ignore_above": 1024,
"type": "keyword"
},
"os.kernel": {
"ignore_above": 1024,
"type": "keyword"
},
"os.codename": {
"ignore_above": 1024,
"type": "keyword"
},
"os.name": {
"ignore_above": 1024,
"type": "keyword",
"fields": {
"text": {
"type": "text"
}
}
}
}
}
} Is my understanding of how I understand the flattening here being valuable for preventing collisions between fields like I'd echo @jsoriano's question above as well, just out of general curiosity:
|
@kpollich The initial idea I was proposing above was not directly related to
But you bring up additional interesting points. |
I think the question here is taking the current solution to resolve the original collision problem to the next level, and apply it more broadly. We are reasoning about how useful intermediate objects are in the mappings and whether it would not be simpler to map leaf fields only using the dot notation.
No data on this, I believe what we are after is simplifying mappings, but @ruflin can correct me. It would have an impact on the total number of fields, because objects would no longer be mapped. |
++, focus on simplification. It could have a positive side effect on # of fields mapped but I would expect it to be very minimal. |
I agree on simplification of the fields.yml files, and and on applying this to solve the collision problem. But flattening everything we ship opposes to what we have been doing so far, in general we are sending unflattened data. This can be a big change in beats and integrations. I was asking to see if there are more advantages to help justifying this effort. |
If Elasticsearch is automatically handling the flattening of objects, then there's no work that needs to be done in Beats/integrations and using |
I believe this is not correct. Or maybe I am referring to a problem that I saw on ECS that is not present in the mappings you are referring to. Elasticsearch is considering automatically flattening objects in incoming documents when subobjects is set to false, but some mappings do send objects and that will not be accepted. If you want to set subobjects to false, then the mappings should be flattened. From previous conversation, this change can be implemented already today: intermediate objects don't need to be mapped, you can already use the dot notation which Elasticsearch will expand to objects automatically, which it will no longer do as soon you switch to subobjects: false. Ping me if you have questions around this. |
Right, good point. The mappings potentially need some changes but most importantly, the documents and queries don't, right?
There's one instance where this isn't the case, though. Not sure if that's a bug or a feature. If Example: {
"template": {
"mappings": {
"dynamic_templates": [
{
"log_level": {
"path_match": "log.level",
"mapping": { "type": "keyword" }
}
}
],
"dynamic": "runtime",
"properties": {
"log": {
"type": "object"
}
}
}
}
} This doesn't work without explicitly mapping |
@felixbarny dynamic:runtime maps every unknown field as a runtime field, hence within the runtime section. The runtime section does not support objects, and the behaviour will be then very similar to setting subobjects false to the root then. In this example you'd like to have everything mapped as runtime besides log.level, do I understand correctly? Can you share more info, like what exception on ingest you get, maybe better on slack as this is unrelated to this issue? In short, though, that objects are not dynamically not mapped is a feature in this case, I would like to dig further on why the object creation is necessary to make path_match work. Ping me, please? |
During 8.10, we're working on removing all known blockers for From my perspective, these are the concrete things that need to be implemented:
In the future, we'll want to change the default for many of the data stream types so that For more background on why we're adopting |
@felixbarny looking at our current roadmap there is no chance that we will be on time for 8.10, I am more than happy to find someone in our team that will help review this contribution. |
We're currently discussing whether this can be done directly in Elasticsearch. That would also help if users have added custom mappings in the |
I skimmed through the existing field definitions with Here are a few recurring cases: - name: docker.container.labels.*
type: object
release: ga
description: |
Container labels
- name: labels
level: extended
type: object
object_type: keyword
description: Image labels.
- name: forgerock.request.detail.*
type: object
object_type: keyword
object_type_mapping_type: '*'
description: Details around the response status.
- name: input
type: object
- name: payload
type: object
enabled: false
- name: labels.*
type: object
description: |
Image labels.
- name: percpu
type: object
description: |
CPU time (in nanoseconds) consumed on each CPU by all tasks in this cgroup. From these cases, I tried to identify mapping that would use the Here are four cases with field definition and expected mapping I am currently focusing on. I am using these cases to write the tests. Case ADefinition - name: prometheus.a.labels
type: object
subobjects: false Mapping {
"properties": {
"prometheus": {
"properties": {
"a": {
"properties": {
"labels": {
"type": "object",
"subobjects": false
}
}
}
}
}
}
} Case BDefinition - name: prometheus.b.labels.*
type: object
object_type: keyword
subobjects: false Mapping {
"dynamic_templates": [
{
"prometheus.b.labels.*": {
"path_match": "prometheus.b.labels.*",
"match_mapping_type": "string",
"mapping": {
"type": "keyword"
}
}
}
],
"properties": {
"prometheus": {
"properties": {
"b": {
"properties": {
"labels": {
"type": "object",
"subobjects": false
}
}
}
}
}
}
} Case CDefinition - name: prometheus.c.labels.*
type: object
subobjects: false Mapping This is the result I get from the current in-progress implementation. It doesn't feels right. {
"properties": {
"prometheus": {
"properties": {
"c": {
"properties": {
"labels": {
"properties": {
"*": {
"type": "object",
"subobjects": false
}
}
}
}
}
}
}
}
} Case DDefinition - name: prometheus.d.labels
type: object
object_type: keyword
subobjects: false Mapping {
"dynamic_templates": [
{
"prometheus.d.labels": {
"path_match": "prometheus.d.labels.*",
"match_mapping_type": "string",
"mapping": {
"type": "keyword"
}
}
}
],
"properties": {
"prometheus": {
"properties": {
"d": {
"type": "object",
"subobjects": false
}
}
}
}
} |
@zmoog are the packages defining these mappings using Package Spec v3? In principle this version doesn't allow the definition of objects without children (or without Maybe we need to reconsider this, to allow specifying
Indeed this is not right, packages using mappings like this one should be reviewed. This is a mapping that in principle is not allowed by recent Package Specs. Some cases that were still generating mappings with |
No, they are using package spec version < v3. I confirm that packages using spec v3 don't allow this case.
While troubleshooting an SDH, I tested a potential solution for a user trying to move the content of a field from a On my test cluster, I was able to make it work by updating the
I can share more about this case with you on a separate channel and report the summary here.
Okay, this is invalid and not something we intend to support. It will resolve as more packages switch to spec v3. |
Out of curiosity, why did they want to move from flattened to object?
With this mapping we are not defining the type for the properties of For this case we could have a definition like the following one (would be Case D then):
And ensure that we generate for it the following mappings:
Btw, cases B and D are in principle equivalent, Fleet adds the wildcard at the end for objects whose name doesn't have any wildcard. But would be good to keep them as testing cases. Another equivalent case would be the following one:
When there are wildcards in the name for non-objects, Fleet translates the definition to |
They mentioned (my apologies for the link to a private repo):
It's probably worth mentioning how Azure logs work in general: Here's an example of Azure logs:
The log event contains a handful of standard fields:
In addition to these well-defined and documented fields, there's also the The content of the In this scenario, the original designer of the integration opted to map the This user is ingesting |
I see the risk here. However, users want to tap into the treasure trove of significant and valuable events like |
Yep, this is the most frequent issue with
|
## Summary Update the Fleet plugin to support the [subobjects](https://www.elastic.co/guide/en/elasticsearch/reference/current/subobjects.html) setting on the `object` type mapping. This PR supports the `subobjects` setting on a per-field basis. We will add support for `subobjects` setting at the data stream level when Elasticsearch will automatically flatten the mappings in elastic/elasticsearch#99860. The PR add deals with the following [user cases](elastic/package-spec#349 (comment)) found in the integration packages and add support for a few of them. ### ~~Case A~~ ```yaml - name: a.labels type: object subobjects: false ``` The use case A is invalid on package-spec v3 and it's not supported. ### Case B ```yaml - name: b.labels.* type: object object_type: keyword subobjects: false ``` that `_generateMappings()` should map to: ```js { dynamic_templates: [{ "b.labels.*": { path_match: "b.labels.*", match_mapping_type: "string", mapping: { type: "keyword" } } }], properties: { b: { type: 'object', dynamic: true, properties: { labels: { dynamic: true, type: 'object', subobjects: false, } } }, }, } ``` ### ~~Case C~~ ```yaml - name: prometheus.c.labels.* type: object subobjects: false ``` The use case C is considered invalid and it's not supported. ### Case D ```yaml - name: d.labels type: object object_type: keyword subobjects: false ``` that `_generateMappings()` should map to: ```js { dynamic_templates: [{ "d.labels": { path_match: "d.labels.*", match_mapping_type: "string", mapping: { type: "keyword" } } }], properties: { d: { type: 'object', dynamic: true, properties: { labels: { dynamic: true, type: 'object', subobjects: false, } } }, }, } ``` --------- Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Support for per-field |
Support for per data stream |
## What does this PR do? <!-- Mandatory Explain here WHAT changes you made in the PR. --> Add support for `subobjects: false` at the data stream level. Here is an example: ```yaml # From /packages/good_v3/data_stream/subobjects/manifest.yml title: my-data-stream type: logs elasticsearch: index_template: mappings: subobjects: true ``` ## Why is it important? <!-- Mandatory Explain here the WHY, or the rationale/motivation for the changes. --> Give integration developers (per data stream) access to the [subobjects](https://www.elastic.co/guide/en/elasticsearch/reference/current/subobjects.html) option in the integration's index template mappings. Since we added the `subobjects` option in stack version 8.3, users could customize how Elasticsearch handles fields that contain dots in their names from `true` (expanded, current default) to `false` (not expanded). However, integration developers could not set this up in the integrations. Note on per filed option: the `subobjects` option [has been available](#573) at the field level since package-spec 3.1.0. However, to make this happen at the data stream level, we needed elastic/elasticsearch#99860 to land in Elasticsearch. ## Checklist <!-- Mandatory Add a checklist of things that are required to be reviewed in order to have the PR approved List here all the items you have verified BEFORE sending this PR. Please DO NOT remove any item, striking through those that do not apply. (Just in case, strikethrough uses two tildes. ~~Scratch this.~~) --> - [x] I have added test packages to [`test/packages`](https://github.com/elastic/package-spec/tree/main/test/packages) that prove my change is effective. - [x] I have added an entry in [`spec/changelog.yml`](https://github.com/elastic/package-spec/blob/main/spec/changelog.yml). ## Related issues <!-- Recommended Link related issues below. Insert the issue link or reference after the word "Closes" if merging this should automatically close it. - Closes #123 - Relates #123 - Requires #123 - Supersedes #123 --> - #349 - #573 - elastic/elasticsearch#99860 (requirement) --------- Co-authored-by: Mario Rodriguez Molins <mario.rodriguez@elastic.co>
The subobjects option is now available at two levels:
|
In elastic/elasticsearch#86166 Elasticsearch introduced the
subobjects: false
parameter. This allows ingest flat key/value pairs that had in the past conflicts likehost
andhost.name
.package-spec and Fleet should support this option in templates.
At first, the feature should be used with care in packages as the objects must be flattened before ingestion (see also elastic/apm#347 for more details). At the moment ingestion tools like Beats often follow the opposite approach and expand all objects. There are potential future improvements that solve this problem: elastic/apm#347 (comment)
From #425, we need to ensure that at least one of the following mappings produce expected results, so this is supported.
Should produce:
prometheus.labels.*
, with type keyword.prometheus.labels
, withtype: object
andsubobjects: false
.Should produce a mapping for
prometheus.labels
withtype: object
andsubobjects: false
, independently of the existence of other related dynamic mappings.Maybe this already works, but we would need to check it.
Tasks
The text was updated successfully, but these errors were encountered: