Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for subobjects: false #349

Closed
3 tasks done
ruflin opened this issue Jun 2, 2022 · 25 comments
Closed
3 tasks done

Add support for subobjects: false #349

ruflin opened this issue Jun 2, 2022 · 25 comments
Assignees
Labels
Team:Ecosystem Label for the Packages Ecosystem team

Comments

@ruflin
Copy link
Member

ruflin commented Jun 2, 2022

In elastic/elasticsearch#86166 Elasticsearch introduced the subobjects: false parameter. This allows ingest flat key/value pairs that had in the past conflicts like host and host.name.

package-spec and Fleet should support this option in templates.

At first, the feature should be used with care in packages as the objects must be flattened before ingestion (see also elastic/apm#347 for more details). At the moment ingestion tools like Beats often follow the opposite approach and expand all objects. There are potential future improvements that solve this problem: elastic/apm#347 (comment)

From #425, we need to ensure that at least one of the following mappings produce expected results, so this is supported.


- name: prometheus.labels.*
  type: object
  object_type: keyword
  subobjects: false

Should produce:

  • A dynamic mapping for prometheus.labels.*, with type keyword.
  • A static mapping for prometheus.labels, with type: object and subobjects: false.

- name: prometheus.labels
  type: object
  subobjects: false

Should produce a mapping for prometheus.labels with type: object and subobjects: false, independently of the existence of other related dynamic mappings.


Maybe this already works, but we would need to check it.

Tasks

  1. zmoog
  2. Team:Fleet backport:skip release_note:feature v8.13.0
    zmoog
  3. enhancement
    zmoog
@jlind23 jlind23 added the Team:Ecosystem Label for the Packages Ecosystem team label Jun 2, 2022
@javanna javanna changed the title Add support for subobject: false Add support for subobjects: false Jun 2, 2022
@ruflin
Copy link
Member Author

ruflin commented Jun 2, 2022

Having subobject: false could have a broader impact on how we do things. Few thoughts:

  • For fields.yml, should we flatten these by default?
  • Should we flatten fields for component templates in Elasticsearch loaded by fleet for every object that does not have special settings on the object level? @kpollich
  • Should we ship data as mostly key/value pairs to Elasticsearch from Elastic Agent instead of the nested objects @ph @cmacknz
  • Can we remove most of the common.MapStr magic inside Beats / Agent?

We don't need to necessarily have all these discussions here but worth referencing. I'm excited about the potential future opportunities this offers. (@joshdover )

@jsoriano
Copy link
Member

jsoriano commented Jun 2, 2022

This is great, and I think that in many of the cases where we would take more advantage of this, basically where we dedot now, the field is already flattened, because we send it as is read from origin.

I think we could add support for this, and introduce it eventually in a case per case basis.

At first, the feature should be used with care in packages as the objects must be flattened before ingestion

We can add a check for this in elastic-package tests, as we have for other field values.

In a more general sense, do we know if storing everything flattened would have other advantages? as disk space savings or ingestion performance?

@ph
Copy link

ph commented Jun 2, 2022

This is indeed interesting, so that would also simply all the Schema in metricbeat.

@kpollich
Copy link
Member

kpollich commented Jun 2, 2022

Should we flatten fields for component templates in Elasticsearch loaded by fleet for every object that does not have special settings on the object level?

I'd like to just solidify my understanding of the relation between the package spec and component templates generated by Fleet. The fields YML files in a given integration control the mappings Fleet generates for component templates, correct?

e.g. nginx/data_stream/access/fields/agent.yml defines a host mapping here on the logs-nginx.access@package eventually generated by Fleet.

The host fields correspond as such:

Show code blocks
# agent.yml
- name: host
  title: Host
  group: 2
  description: 'A host is defined as a general computing instance.
    ECS host.* fields should be populated with details about the host on which the event happened, or from which the measurement was taken. Host types include hardware, virtual machines, Docker containers, and Kubernetes nodes.'
  type: group
  fields:
    - name: architecture
      level: core
      type: keyword
      ignore_above: 1024
      description: Operating system architecture.
      example: x86_64
    - name: domain
      level: extended
      type: keyword
      ignore_above: 1024
      description: 'Name of the domain of which the host is a member.
        For example, on Windows this could be the host''s Active Directory domain or NetBIOS domain name. For Linux this could be the domain of the host''s LDAP provider.'
      example: CONTOSO
      default_field: false
    - name: hostname
      level: core
      type: keyword
      ignore_above: 1024
      description: 'Hostname of the host.
        It normally contains what the `hostname` command returns on the host machine.'
    - name: id
      level: core
      type: keyword
      ignore_above: 1024
      description: 'Unique host id.
        As hostname is not always unique, use values that are meaningful in your environment.
        Example: The current usage of `beat.name`.'
    - name: ip
      level: core
      type: ip
      description: Host ip addresses.
    - name: mac
      level: core
      type: keyword
      ignore_above: 1024
      description: Host mac addresses.
    - name: name
      level: core
      type: keyword
      ignore_above: 1024
      description: 'Name of the host.
        It can contain what `hostname` returns on Unix systems, the fully qualified domain name, or a name specified by the user. The sender decides which value to use.'
    - name: os.family
      level: extended
      type: keyword
      ignore_above: 1024
      description: OS family (such as redhat, debian, freebsd, windows).
      example: debian
    - name: os.kernel
      level: extended
      type: keyword
      ignore_above: 1024
      description: Operating system kernel version as a raw string.
      example: 4.4.0-112-generic
    - name: os.name
      level: extended
      type: keyword
      ignore_above: 1024
      multi_fields:
        - name: text
          type: text
          norms: false
          default_field: false
      description: Operating system name, without the version.
      example: Mac OS X
    - name: os.platform
      level: extended
      type: keyword
      ignore_above: 1024
      description: Operating system platform (such centos, ubuntu, windows).
      example: darwin
    - name: os.version
      level: extended
      type: keyword
      ignore_above: 1024
      description: Operating system version as a raw string.
      example: 10.14.1
    - name: type
      level: core
      type: keyword
      ignore_above: 1024
      description: 'Type of host.
        For Cloud providers this can be the machine type like `t2.medium`. If vm, this could be the container, for example, or other information meaningful in your environment.'
    - name: containerized
      type: boolean
      description: >
        If the host is a container.
    - name: os.build
      type: keyword
      example: "18D109"
      description: >
        OS build information.
    - name: os.codename
      type: keyword
      example: "stretch"
      description: >
        OS codename, if any.
// logs-nginx.access@package component template mappings
"host": {
  "properties": {
    "hostname": {
      "ignore_above": 1024,
      "type": "keyword"
    },
    "os": {
      "properties": {
        "build": {
          "ignore_above": 1024,
          "type": "keyword"
        },
        "kernel": {
          "ignore_above": 1024,
          "type": "keyword"
        },
        "codename": {
          "ignore_above": 1024,
          "type": "keyword"
        },
        "name": {
          "ignore_above": 1024,
          "type": "keyword",
          "fields": {
            "text": {
              "type": "text"
            }
          }
        },
        "family": {
          "ignore_above": 1024,
          "type": "keyword"
        },
        "version": {
          "ignore_above": 1024,
          "type": "keyword"
        },
        "platform": {
          "ignore_above": 1024,
          "type": "keyword"
        }
      }
    },
    "domain": {
      "ignore_above": 1024,
      "type": "keyword"
    },
    "ip": {
      "type": "ip"
    },
    "containerized": {
      "type": "boolean"
    },
    "name": {
      "ignore_above": 1024,
      "type": "keyword"
    },
    "id": {
      "ignore_above": 1024,
      "type": "keyword"
    },
    "type": {
      "ignore_above": 1024,
      "type": "keyword"
    },
    "mac": {
      "ignore_above": 1024,
      "type": "keyword"
    },
    "architecture": {
      "ignore_above": 1024,
      "type": "keyword"
    }
  }
}

The suggestion here to flatten all fields generated by Fleet on these component templates would alter our logic here to convert the host mappings generated to something along the lines of this:

// subset of mappings shown for brevity's sake
{
  "host": {
    "type": "object",
    "subobjects": false,
    "properties": {
      "hostname": {
        "ignore_above": 1034,
        "type": "keyword"
      },
      "os.build": {
        "ignore_above": 1024,
        "type": "keyword"
      },
      "os.kernel": {
        "ignore_above": 1024,
        "type": "keyword"
      },
      "os.codename": {
        "ignore_above": 1024,
        "type": "keyword"
      },
      "os.name": {
        "ignore_above": 1024,
        "type": "keyword",
        "fields": {
          "text": {
            "type": "text"
          }
        }
      }
    }
  }
}

Is my understanding of how subobjects: false is intended to interact with the mappings Fleet generates for component templates correct? I'm basing this example off what I'm seeing in the docs added in the referenced Elasticsearch PR: https://github.com/elastic/elasticsearch/pull/86166/files#diff-7fc27842bfad1be51be5b340e2dc4fed9048f72d6129c069fdca9c587aa97ed8R54-R76

I understand the flattening here being valuable for preventing collisions between fields like host and host.name (or the metric.value and metric.value.max fields described in elastic/elasticsearch#63530) as mentioned, and I think it'd be a fairly straightforward implementation to apply this flattening logic when we parse object fields during Fleet's component template installation step.

I'd echo @jsoriano's question above as well, just out of general curiosity:

In a more general sense, do we know if storing everything flattened would have other advantages? as disk space savings or ingestion performance?

@ruflin
Copy link
Member Author

ruflin commented Jun 3, 2022

@kpollich The initial idea I was proposing above was not directly related to subobject: false but it got triggered by it. The idea was to have:

{
   "host.hostname":{
      "ignore_above":1034,
      "type":"keyword"
   },
   "host.os.build":{
      "ignore_above":1024,
      "type":"keyword"
   },
   "host.os.kernel":{
      "ignore_above":1024,
      "type":"keyword"
   }
}

But you bring up additional interesting points.

@javanna
Copy link
Member

javanna commented Jun 3, 2022

I think the question here is taking the current solution to resolve the original collision problem to the next level, and apply it more broadly. We are reasoning about how useful intermediate objects are in the mappings and whether it would not be simpler to map leaf fields only using the dot notation.

In a more general sense, do we know if storing everything flattened would have other advantages? as disk space savings or ingestion performance?

No data on this, I believe what we are after is simplifying mappings, but @ruflin can correct me. It would have an impact on the total number of fields, because objects would no longer be mapped.

@ruflin
Copy link
Member Author

ruflin commented Jun 7, 2022

No data on this, I believe what we are after is simplifying mappings, but @ruflin can correct me. It would have an impact on the total number of fields, because objects would no longer be mapped.

++, focus on simplification. It could have a positive side effect on # of fields mapped but I would expect it to be very minimal.

@jsoriano
Copy link
Member

jsoriano commented Jun 7, 2022

No data on this, I believe what we are after is simplifying mappings, but @ruflin can correct me. It would have an impact on the total number of fields, because objects would no longer be mapped.

++, focus on simplification. It could have a positive side effect on # of fields mapped but I would expect it to be very minimal.

I agree on simplification of the fields.yml files, and and on applying this to solve the collision problem.

But flattening everything we ship opposes to what we have been doing so far, in general we are sending unflattened data. This can be a big change in beats and integrations. I was asking to see if there are more advantages to help justifying this effort.

@felixbarny
Copy link
Member

This can be a big change in beats and integrations.

If Elasticsearch is automatically handling the flattening of objects, then there's no work that needs to be done in Beats/integrations and using subobjects: false on the root level should be a drop-in replacement.

@javanna
Copy link
Member

javanna commented Aug 3, 2022

if Elasticsearch is automatically handling the flattening of objects, then there's no work that needs to be done in Beats/integrations and using subobjects: false on the root level should be a drop-in replacement.

I believe this is not correct. Or maybe I am referring to a problem that I saw on ECS that is not present in the mappings you are referring to. Elasticsearch is considering automatically flattening objects in incoming documents when subobjects is set to false, but some mappings do send objects and that will not be accepted. If you want to set subobjects to false, then the mappings should be flattened. From previous conversation, this change can be implemented already today: intermediate objects don't need to be mapped, you can already use the dot notation which Elasticsearch will expand to objects automatically, which it will no longer do as soon you switch to subobjects: false. Ping me if you have questions around this.

@felixbarny
Copy link
Member

Right, good point. The mappings potentially need some changes but most importantly, the documents and queries don't, right?

the dot notation which Elasticsearch will expand to objects automatically

There's one instance where this isn't the case, though. Not sure if that's a bug or a feature. If dynamic is set to false or runtime, and when using dynamic templates with a path match containing dots, ES won't automatically create object mappings, which leads to an exception on ingest.

Example:

{
  "template": {
    "mappings": {
      "dynamic_templates": [
        {
          "log_level": {
            "path_match": "log.level",
            "mapping": { "type": "keyword" }
          }
        }
      ],
      "dynamic": "runtime",
      "properties": {
        "log": {
          "type": "object"
        }
      }
    }
  }
}

This doesn't work without explicitly mapping log as an object. It's not strictly related to subobjects: false but kinda related. And I needed to dump it somewhere 😅

@javanna
Copy link
Member

javanna commented Aug 9, 2022

@felixbarny dynamic:runtime maps every unknown field as a runtime field, hence within the runtime section. The runtime section does not support objects, and the behaviour will be then very similar to setting subobjects false to the root then. In this example you'd like to have everything mapped as runtime besides log.level, do I understand correctly?

Can you share more info, like what exception on ingest you get, maybe better on slack as this is unrelated to this issue?

In short, though, that objects are not dynamically not mapped is a feature in this case, I would like to dig further on why the object creation is necessary to make path_match work. Ping me, please?

@felixbarny
Copy link
Member

felixbarny commented Jul 12, 2023

During 8.10, we're working on removing all known blockers for subobjects: false adoption across different teams. @jlind23 @juliaElastic, is there a chance this issue could also be picked up in the 8.10 time frame? If that's unreasonable, could the observability contribute the necessary changes?

From my perspective, these are the concrete things that need to be implemented:

  • Instead of creating nested mappings, Fleet should always generate flattened mappings. This works both with subobjects set to true and false, because Elasticsearch will internally create nested object mappings if subobjects are enabled. See also this flat mapping as an example, and contrast it with this nested mapping. Both are equivalent but the former supports both subobjects to be set to true and false, whereas the latter only supports subobjects: false (the default). This will need to be changed in the the Fleet plugin in Kibana in the _generateMappings function.
  • Support setting subobjects: false at the root of the mapping (see the last section of the subobjects docs). This will need changes in the package spec and in Fleet (I suppose also in _generateMappings).
  • Support setting subobjects: false at a per-field level. This needs similar changes to the above.

In the future, we'll want to change the default for many of the data stream types so that subobjects: false is the default. But integrations should have the ability to opt-out on a per-data stream level.

For more background on why we're adopting subobjects: false, see this issue: https://github.com/elastic/logs-dev/issues/105

@jlind23
Copy link
Collaborator

jlind23 commented Jul 12, 2023

@felixbarny looking at our current roadmap there is no chance that we will be on time for 8.10, I am more than happy to find someone in our team that will help review this contribution.

@felixbarny
Copy link
Member

Instead of creating nested mappings, Fleet should always generate flattened mappings. This works both with subobjects set to true and false, because Elasticsearch will internally create nested object mappings if subobjects are enabled.

We're currently discussing whether this can be done directly in Elasticsearch. That would also help if users have added custom mappings in the <package>@custom component template that contain nested mappings.

@zmoog
Copy link
Contributor

zmoog commented Nov 21, 2023

I skimmed through the existing field definitions with type: object in the https://github.com/elastic/integrations repo to find patterns in the current usage.

Here are a few recurring cases:

- name: docker.container.labels.*
  type: object
  release: ga
  description: |
    Container labels

- name: labels
  level: extended
  type: object
  object_type: keyword
  description: Image labels.

- name: forgerock.request.detail.*
  type: object
  object_type: keyword
  object_type_mapping_type: '*'
  description: Details around the response status.

- name: input
  type: object

- name: payload
  type: object
  enabled: false

- name: labels.*
  type: object
  description: |
    Image labels.

- name: percpu
  type: object
  description: |
    CPU time (in nanoseconds) consumed on each CPU by all tasks in this cgroup.

From these cases, I tried to identify mapping that would use the subobjects option and their outcome.

Here are four cases with field definition and expected mapping I am currently focusing on. I am using these cases to write the tests.

Case A

Definition

- name: prometheus.a.labels
  type: object
  subobjects: false

Mapping

{
    "properties": {
        "prometheus": {
            "properties": {
                "a": {
                    "properties": {
                        "labels": {
                            "type": "object",
                            "subobjects": false
                        }
                    }
                }
            }
        }
    }
}

Case B

Definition

- name: prometheus.b.labels.*
  type: object
  object_type: keyword
  subobjects: false

Mapping

{
    "dynamic_templates": [
        {
            "prometheus.b.labels.*": {
                "path_match": "prometheus.b.labels.*",
                "match_mapping_type": "string",
                "mapping": {
                    "type": "keyword"
                }
            }
        }
    ],
    "properties": {
        "prometheus": {
            "properties": {
                "b": {
                    "properties": {
                        "labels": {
                            "type": "object",
                            "subobjects": false
                        }
                    }
                }
            }
        }
    }
}

Case C

Definition

- name: prometheus.c.labels.*
  type: object
  subobjects: false

Mapping

This is the result I get from the current in-progress implementation. It doesn't feels right.

{
    "properties": {
        "prometheus": {
            "properties": {
                "c": {
                    "properties": {
                        "labels": {
                            "properties": {
                                "*": {
                                    "type": "object",
                                    "subobjects": false
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

Case D

Definition

- name: prometheus.d.labels
  type: object
  object_type: keyword
  subobjects: false

Mapping

{
    "dynamic_templates": [
        {
            "prometheus.d.labels": {
                "path_match": "prometheus.d.labels.*",
                "match_mapping_type": "string",
                "mapping": {
                    "type": "keyword"
                }
            }
        }
    ],
    "properties": {
        "prometheus": {
            "properties": {
                "d": {
                    "type": "object",
                    "subobjects": false
                }
            }
        }
    }
}

@jsoriano
Copy link
Member

jsoriano commented Nov 21, 2023

@zmoog are the packages defining these mappings using Package Spec v3? In principle this version doesn't allow the definition of objects without children (or without object_type), so for example in principle cases A and C would not be possible on recent packages, so maybe it is not worth supporting these cases.

Maybe we need to reconsider this, to allow specifying subobjects: false, but what would be the use case for such mappings?

Case C
...
This is the result I get from the current in-progress implementation. It doesn't feels right.

Indeed this is not right, packages using mappings like this one should be reviewed. This is a mapping that in principle is not allowed by recent Package Specs.

Some cases that were still generating mappings with "*" members will be fixed in 8.12 after elastic/kibana#169981.

@zmoog
Copy link
Contributor

zmoog commented Nov 29, 2023

are the packages defining these mappings using Package Spec v3?

No, they are using package spec version < v3. I confirm that packages using spec v3 don't allow this case.

Maybe we need to reconsider this, to allow specifying subobjects: false, but what would be the use case for such mappings?

While troubleshooting an SDH, I tested a potential solution for a user trying to move the content of a field from a flattened to an object mapping.

On my test cluster, I was able to make it work by updating the logs-azure.platformlogs@custom component template with this request:

PUT _component_template/logs-azure.platformlogs@custom
{
  "template": {
    "mappings": {
      "_source": {
        "excludes": [],
        "includes": [],
        "enabled": true
      },
      "_routing": {
        "required": false
      },
      "dynamic": true,
      "numeric_detection": false,
      "date_detection": true,
      "dynamic_date_formats": [
        "strict_date_optional_time",
        "yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"
      ],
      "dynamic_templates": [],
      "properties": {
        "azure.platformlogs.k8s": {
          "type": "object",
          "subobjects": false
        }
      }
    }
  },
  "_meta": {
    "package": {
      "name": "azure"
    },
    "managed_by": "fleet",
    "managed": true
  }
}

I can share more about this case with you on a separate channel and report the summary here.

Indeed this is not right, packages using mappings like this one should be reviewed. This is a mapping that in principle is not allowed by recent Package Specs.

Okay, this is invalid and not something we intend to support. It will resolve as more packages switch to spec v3.

@jsoriano
Copy link
Member

jsoriano commented Nov 29, 2023

Maybe we need to reconsider this, to allow specifying subobjects: false, but what would be the use case for such mappings?

While troubleshooting an SDH, I tested a potential solution for a user trying to move the content of a field from a flattened to an object mapping.

Out of curiosity, why did they want to move from flattened to object?

On my test cluster, I was able to make it work by updating the logs-azure.platformlogs@custom component template with this request:

PUT _component_template/logs-azure.platformlogs@custom
{
  ...
      "dynamic_templates": [],
      "properties": {
        "azure.platformlogs.k8s": {
          "type": "object",
          "subobjects": false
        }
      }
    }
  ...
}

With this mapping we are not defining the type for the properties of azure.platformlogs.k8s, so there may be conflicts if they are mixed (an integer is sent first, but later the same field can also have floats, or keywords). We have been trying to remove this kind of ambiguous mappings.

For this case we could have a definition like the following one (would be Case D then):

- name: "azure.platformlogs.k8s"
  type: object
  object_type: keyword
  subobjects: false

And ensure that we generate for it the following mappings:

{
  ...
      "dynamic_templates": [
        {
           "azure.platformlogs.k8s": {
             "path_match": "azure.platformlogs.k8s.*",
             "mapping": { "type": "keyword" }
           }
        }     
      ],
      "properties": {
        "azure.platformlogs.k8s": {
          "type": "object",
          "subobjects": false
        }
      }
    }
  ...
}

Btw, cases B and D are in principle equivalent, Fleet adds the wildcard at the end for objects whose name doesn't have any wildcard. But would be good to keep them as testing cases. Another equivalent case would be the following one:

- name: prometheus.d.labels.*
  type: keyword
  subobjects: false

When there are wildcards in the name for non-objects, Fleet translates the definition to type: object using the original type as object_type.

@zmoog
Copy link
Contributor

zmoog commented Nov 29, 2023

Out of curiosity, why did they want to move from flattened to object?

They mentioned (my apologies for the link to a private repo):

The reason for this change was to be able to select the different azure.platformlogs.properties using Kibana discover module, which is not possible when the field is "flattened".

It's probably worth mentioning how Azure logs work in general:

Here's an example of Azure logs:

{
    "category": "kube-audit",
    "operationName": "Microsoft.ContainerService/managedClusters/diagnosticLogs/Read",
    "properties": {
        ...
    },
    "resourceId": "/SUBSCRIPTIONS/123RESOURCEGROUPS/abc/PROVIDERS/MICROSOFT.CONTAINERSERVICE/MANAGEDCLUSTERS/abc",
    "time": "2023-11-29T09:41:00.720070863Z"
}

The log event contains a handful of standard fields:

  • category the log category
  • operationName
  • resourceId is the ID of the Azure resource emitting the event
  • time the timestamp of when the described event occurred

In addition to these well-defined and documented fields, there's also the properties field that Azure uses for the service-specific part of the event.

The content of the properties field varies A LOT. Depending on the originating service, properties can contain strings, numbers objects, embedded JSON objects, or a combination. It's wild.

In this scenario, the original designer of the integration opted to map the properties using the flattened, which makes sense.

This user is ingesting kube-audit events that are pretty large objects. You can see an example (I apologize again for the private repo) of this even type unpacked from the Azure log event.

@zmoog
Copy link
Contributor

zmoog commented Nov 29, 2023

With this mapping we are not defining the type for the properties of azure.platformlogs.k8s, so there may be conflicts if they are mixed (an integer is sent first, but later the same field can also have floats, or keywords). We have been trying to remove this kind of ambiguous mappings.

I see the risk here.

However, users want to tap into the treasure trove of significant and valuable events like kube-audit to get the most out of it.

@jsoriano
Copy link
Member

The reason for this change was to be able to select the different azure.platformlogs.properties using Kibana discover module, which is not possible when the field is "flattened".

Yep, this is the most frequent issue with flattened 🙁 elastic/kibana#25820

subobjects: false will be useful in cases where there is no risk of fields explosion.

zmoog added a commit to elastic/kibana that referenced this issue Dec 12, 2023
## Summary

Update the Fleet plugin to support the
[subobjects](https://www.elastic.co/guide/en/elasticsearch/reference/current/subobjects.html)
setting on the `object` type mapping.

This PR supports the `subobjects` setting on a per-field basis. We will
add support for `subobjects` setting at the data stream level when 
Elasticsearch will automatically flatten the mappings in elastic/elasticsearch#99860.

The PR add deals with the following [user
cases](elastic/package-spec#349 (comment))
found in the integration packages and add support for a few of them.

### ~~Case A~~

```yaml
- name: a.labels
  type: object
  subobjects: false
```

The use case A is invalid on package-spec v3 and it's not supported.


### Case B

```yaml
- name: b.labels.*
  type: object
  object_type: keyword
  subobjects: false
```

that `_generateMappings()` should map to:

```js
{
  dynamic_templates: [{
    "b.labels.*": {
      path_match: "b.labels.*",
      match_mapping_type: "string",
      mapping: {
        type: "keyword"
      }
    }
  }],
  properties: {
    b: {
      type: 'object',
      dynamic: true,
      properties: {
        labels: {
          dynamic: true,
          type: 'object',
          subobjects: false,
        }
      }
    },
  },
}
```

### ~~Case C~~

```yaml
- name: prometheus.c.labels.*
  type: object
  subobjects: false
```

The use case C is considered invalid and it's not supported.

### Case D

```yaml
- name: d.labels
  type: object
  object_type: keyword
  subobjects: false
```

that `_generateMappings()` should map to:

```js
{
  dynamic_templates: [{
    "d.labels": {
      path_match: "d.labels.*",
      match_mapping_type: "string",
      mapping: {
        type: "keyword"
      }
    }
  }],
  properties: {
    d: {
      type: 'object',
      dynamic: true,
      properties: {
        labels: {
          dynamic: true,
          type: 'object',
          subobjects: false,
        }
      }
    },
  },
}
```

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
@zmoog
Copy link
Contributor

zmoog commented Dec 12, 2023

Support for per-field subobjects: false merged in package-spec and kibana.

@felixbarny
Copy link
Member

Support for per data stream subobjects: false is blocked by elastic/elasticsearch#99860

ruflin pushed a commit that referenced this issue Mar 26, 2024
## What does this PR do?

<!-- Mandatory
Explain here WHAT changes you made in the PR.
-->

Add support for `subobjects: false` at the data stream level. 

Here is an example:

```yaml
# From /packages/good_v3/data_stream/subobjects/manifest.yml
title: my-data-stream
type: logs
elasticsearch:
  index_template:
    mappings:
      subobjects: true
```

## Why is it important?

<!-- Mandatory
Explain here the WHY, or the rationale/motivation for the changes.
-->

Give integration developers (per data stream) access to the [subobjects](https://www.elastic.co/guide/en/elasticsearch/reference/current/subobjects.html) option in the integration's index template mappings.

Since we added the `subobjects` option in stack version 8.3, users could customize how Elasticsearch handles fields that contain dots in their names from `true` (expanded, current default) to `false` (not expanded). However, integration developers could not set this up in the integrations.

Note on per filed option: the `subobjects` option [has been available](#573) at the field level since package-spec 3.1.0. However, to make this happen at the data stream level, we needed elastic/elasticsearch#99860 to land in Elasticsearch.

## Checklist

<!-- Mandatory
Add a checklist of things that are required to be reviewed in order to have the PR approved

List here all the items you have verified BEFORE sending this PR. Please DO NOT remove any item, striking through those that do not apply. (Just in case, strikethrough uses two tildes. ~~Scratch this.~~)
-->

- [x] I have added test packages to [`test/packages`](https://github.com/elastic/package-spec/tree/main/test/packages) that prove my change is effective.
- [x] I have added an entry in [`spec/changelog.yml`](https://github.com/elastic/package-spec/blob/main/spec/changelog.yml).

## Related issues

<!-- Recommended
Link related issues below. Insert the issue link or reference after the word "Closes" if merging this should automatically close it.

- Closes #123
- Relates #123
- Requires #123
- Supersedes #123
-->
- #349
- #573
- elastic/elasticsearch#99860 (requirement)


---------

Co-authored-by: Mario Rodriguez Molins <mario.rodriguez@elastic.co>
@zmoog
Copy link
Contributor

zmoog commented Mar 26, 2024

The subobjects option is now available at two levels:

  • per field (8.13)
  • per data stream (8.14)

@zmoog zmoog closed this as completed Mar 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Ecosystem Label for the Packages Ecosystem team
Projects
None yet
Development

No branches or pull requests

8 participants