Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import aditional indexing settings on external fields #752

Merged
merged 10 commits into from
Mar 29, 2022

Conversation

jsoriano
Copy link
Member

@jsoriano jsoriano commented Mar 23, 2022

Fields imported from ECS may contain some settings that may affect how they are indexed:

  • multi_fields allow to index the same field in different ways for different purposes.
  • index: false disables indexing of the field.
  • doc_values: false disables storing some internal data used when sorting and by some aggregations, this can help to save disk space with the cost of losing some functionality for those fields.

We may need a follow up in the package spec to add validation for these fields.

Fixes #678.

@jsoriano jsoriano requested review from andrewkroh, ebeahan and a team March 23, 2022 12:19
@jsoriano jsoriano self-assigned this Mar 23, 2022
@elasticmachine
Copy link
Collaborator

elasticmachine commented Mar 23, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-03-29T14:58:52.328+0000

  • Duration: 31 min 21 sec

Test stats 🧪

Test Results
Failed 0
Passed 622
Skipped 0
Total 622

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@jsoriano
Copy link
Member Author

I am going to move this back to draft, we may need to confirm how/if this is supported by fleet (ping @joshdover).

I have installed the modified test package, that includes event.original and process.command_line fields and the installed mappings are these ones:

    "event": {
      "properties": {
        "original": {
          "ignore_above": 1024,
          "index": false,
          "type": "keyword"
        }
    "process": {
       ...
        "command_line": {
          "ignore_above": 1024,
          "type": "wildcard",
          "fields": {}
        }
  • index is correctly set ✔️
  • doc_fields is not set (but maybe because it is redundant when index: false is also set?).
  • multi_fields are not set, but interestingly process.command_line has an empty fields, should we be using a different format for multi fields?

FTR, imported definitions of these fields are these ones:

- description: |-
    Full command line that started the process, including the absolute path to the executable, and all ar
guments.
    Some arguments may be filtered to protect sensitive information.
  multi_fields:
    - name: text
      type: match_only_text
  name: process.command_line
  type: wildcard
- description: |-
    Raw text message of entire event. Used to demonstrate log integrity or where the full log message (be
fore splitting it up in multiple parts) may be required, e.g. for reindex.
    This field is not indexed and doc_values are disabled. It cannot be searched, but it can be retrieved
 from `_source`. If users wish to override this and index this field, please see `Field data types` in th
e `Elasticsearch Reference`.
  doc_values: false
  index: false
  name: event.original
  type: keyword

@jsoriano jsoriano marked this pull request as draft March 23, 2022 12:47
@joshdover
Copy link

@juliaElastic I think you were recently in this mapping code for adding the dimensions settings. Would you be able to take a look and answer the questions above?

@juliaElastic
Copy link

juliaElastic commented Mar 23, 2022

@jsoriano
Regarding doc_values, it looks like fleet is not setting doc_values in mappings if doc_values is false, we could update it to work like index field:
https://github.com/elastic/kibana/blob/5903c417403b55418f74c3b9bfbcba07b12d402e/x-pack/plugins/fleet/server/services/epm/elasticsearch/template/template.ts#L302-L307

For multi_fields, it looks like the type match_only_text is not covered by fleet, this can be updated as well:
https://github.com/elastic/kibana/blob/5903c417403b55418f74c3b9bfbcba07b12d402e/x-pack/plugins/fleet/server/services/epm/elasticsearch/template/template.ts#L234-L255

@jsoriano
Copy link
Member Author

Thanks @juliaElastic, I think this explains what we are seeing.

Regarding doc_values, it looks like fleet is not setting doc_values in mappings if doc_values is false, we could update it to work like index field:
https://github.com/elastic/kibana/blob/5903c417403b55418f74c3b9bfbcba07b12d402e/x-pack/plugins/fleet/server/services/epm/elasticsearch/template/template.ts#L302-L307

Ah yes, makes sense, I think it'd be good to have the same logic for doc_values in any case.

For multi_fields, it looks like the type match_only_text is not covered by fleet, this can be updated as well:
https://github.com/elastic/kibana/blob/5903c417403b55418f74c3b9bfbcba07b12d402e/x-pack/plugins/fleet/server/services/epm/elasticsearch/template/template.ts#L234-L255

Ah ok, we recently had to add support for this one also in the package-spec (elastic/package-spec#284). It'd be good to have support in fleet too. I will test this with a field of other type.

Could you take care of adding the missing pieces to Fleet?

@juliaElastic
Copy link

juliaElastic commented Mar 23, 2022

Could you take care of adding the missing pieces to Fleet?

Yes, let me raise a pr soon.

EDIT: raised the pr, is that okay to have "ignore_above": 1024 as well in the resulting mappings?

@jsoriano
Copy link
Member Author

is that okay to have "ignore_above": 1024 as well in the resulting mappings?

I think so, I guess this is a default configured by fleet?

@juliaElastic
Copy link

is that okay to have "ignore_above": 1024 as well in the resulting mappings?

I think so, I guess this is a default configured by fleet?

yes

@jsoriano
Copy link
Member Author

Issues mentioned related to kibana handling of these settings have been solved in elastic/kibana#128391 (thanks Julia!), so I am reopening this for review.

doc_values and multi_fields won't work for users using older versions of Kibana, but they don't seem to produce any issue, they are just not installed as they aren't now.

@jsoriano jsoriano marked this pull request as ready for review March 28, 2022 08:37
@jsoriano
Copy link
Member Author

/test

internal/fields/model.go Outdated Show resolved Hide resolved
internal/fields/model.go Outdated Show resolved Hide resolved
internal/fields/model.go Show resolved Hide resolved
Copy link
Contributor

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few nit-picks, but in general, I'm wondering if, from a Go code perspective,multi_fields shouldn't be treated the same way as standard fields. Maybe the implementation logic will be simpler then, as it's just another fields field to visit in terms of validation. Once we open the gate for generics, we could improve it.

Have you considered this option?

internal/fields/dependency_manager.go Outdated Show resolved Hide resolved
internal/fields/dependency_manager.go Outdated Show resolved Hide resolved
@jsoriano
Copy link
Member Author

I left a few nit-picks, but in general, I'm wondering if, from a Go code perspective,multi_fields shouldn't be treated the same way as standard fields. Maybe the implementation logic will be simpler then, as it's just another fields field to visit in terms of validation. Once we open the gate for generics, we could improve it.

Have you considered this option?

I considered this, but I found that there are several fields that are not common, e.g. a multi field doesn't have a description, or other sub-fields. But maybe we can assume these differences, I can give it another try.

@jsoriano jsoriano requested a review from mtojek March 29, 2022 14:06
Copy link
Contributor

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I don't think that you need to iterate on this later one. The current implementation should solve the original issue.

@jsoriano jsoriano merged commit 4845791 into elastic:main Mar 29, 2022
@jsoriano jsoriano deleted the external-multi-fields branch March 29, 2022 15:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

External ECS field imports not containing their multi-fields
6 participants