model: ECS for 6.x #1609
Elasticsearch aliases are leveraged for ECS compatibility wherever possible. Where that's not possible, values are written to both the original location and the one ECS dictates. Here:
* copy context.tags to labels - object fields can't be aliased
* cast & copy context.request.url.port to url.port - keyword -> int
* copy context.request.url.protocol to url.scheme - trim the final :
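The three copies listed in this description can be sketched in a few lines of Python. This is only an illustration of the transformations, not the apm-server implementation (which is written in Go). The function name `add_ecs_copies` and the flat, dotted-key event layout are assumptions made for the example.

```python
def add_ecs_copies(event):
    """Return a copy of a flat, dotted-key event with the ECS duplicates added.

    Hypothetical helper: the name and the flat dict layout are assumptions.
    """
    out = dict(event)

    # Object fields can't be aliased, so context.tags is copied to labels.
    tags = event.get("context.tags")
    if tags is not None:
        out["labels"] = dict(tags)

    # keyword -> int: the original port is a string, url.port is numeric.
    port = event.get("context.request.url.port")
    if port is not None:
        try:
            out["url.port"] = int(port)
        except (TypeError, ValueError):
            pass  # leave url.port unset if the value is not numeric

    # "https:" -> "https": url.scheme drops the final colon.
    protocol = event.get("context.request.url.protocol")
    if protocol is not None:
        out["url.scheme"] = protocol[:-1] if protocol.endswith(":") else protocol

    return out
```

The original keys are left untouched, so each value ends up in both the original location and the ECS one.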
url.full -> context.request.url.full
url.original -> context.request.url.raw
per feedback from @webmat
Looking good so far. Please ping me on any subsequent mapping work.
backport of elastic/beats#9269 will fix the bad docs generated for aliased fields.
@graphaelli Just clarifying:
@webmat Can you check my understanding here - in the APM use case Note that the application server is the browser in the RUM case.
@graphaelli We are currently reviewing whether
What is the idea for all values under error, transaction, span, metric?
The category event troubles me especially, as it is described to hold the context of an event, eg event.start, event.duration, etc. We do have this information inside transaction, span, etc.
In case we moved the data from our events into the event namespace, we would need to find a solution for how to refer to parent, trace, transaction (from error and span event) etc. I was thinking we could make use of related here, but not sure I completely understood the related field.
Hm, forgot some time ago to hit submit ...
vendor/github.com/elastic/beats/libbeat/publisher/pipeline/module.go
I haven't looked at tests/system/test_ecs_mappings.py closely yet, but reviewed everything else.
_meta/ecs-migration.yml
  copy_to: true

- from: context.process.argv
  to: process.args
I know I brought it up, but as this is actually not indexed, I think we should remove it again from here, or set alias=false.
It should be indexed in an ECS index.
Perhaps if APM doesn't make this useable this requirement can be relaxed? @ruflin WDYT?
Good point, removed altogether since it's a useless (though allowed, somewhat surprisingly) alias.
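For context, an Elasticsearch field alias is a mapping entry of type `alias` whose `path` names the target field. Below is a minimal sketch of how migration entries like the ones discussed here could be turned into alias definitions for 6.x, assuming the ECS name (`to`) becomes the alias and the existing field (`from`) is its target. The helper `alias_properties`, the `indexed` flag, and the sample entries are hypothetical and not taken from the PR.

```python
def alias_properties(entries):
    """Build a flat {field name: mapping} dict of alias definitions.

    Hypothetical helper; entries are dicts with "from", "to", and
    optional "alias" / "indexed" flags (the "indexed" flag is an assumption).
    """
    props = {}
    for entry in entries:
        # Skip entries that opt out, or whose source field is not indexed:
        # such an alias is allowed but can't be used for anything.
        if not entry.get("alias", True) or not entry.get("indexed", True):
            continue
        props[entry["to"]] = {"type": "alias", "path": entry["from"]}
    return props


entries = [
    {"from": "context.request.url.full", "to": "url.full"},
    {"from": "context.process.argv", "to": "process.args", "indexed": False},
]
print(alias_properties(entries))
```

Only the `url.full` entry produces an alias here; the `process.args` entry is dropped because its source field is marked as not indexed.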
# - name: socket.encrypted
#   type: boolean
# - name: socket.remote_address
#   type: keyword
Is it possible to also define the nested fields here?
I want to, so they'll show up in documentation, but not as-is. Did you have other concerns besides docs?
No concerns, I was hoping we could add them.
  searchable: false
  doc_values: false

- name: status_code
This is an indexed field, same datatype as context.response.status_code that is aliased to http.response.status_code now. I think this should also be mapped to the same field, by copying it over instead of aliasing.
I convinced myself that this field is metadata on a span and might actually be confusing if it's in http.response.status_code, as it's the response code from calling a remote service vs the response code we return from the traced service.
If you would still like to proceed, are you saying that in:
6.6 - put this value in context.response.status_code and context.http.status_code
7.0 - write it to http.response.status_code
I can see the ambiguity here. This is to track the response of an external call performed by the monitored app, correct?
I think I'd be inclined to still put it in the ECS field. After all, these events will mostly be looked at in the context of this one event stream. Might as well benefit from the streamlined field names.
But if you see a problem with this, we can hold off on the migration of this field.
> This is to track the response of an external call performed by the monitored app, correct?

correct
I don't have philosophical objections to this plan - my concern is mostly a practical issue of this being our only 1:many field. I agree that in 7.0.0 we can write this to http.response.status_code; after all, an APM event stream in the distributed tracing case is composed of multiple transactions that will each bring a possibly different http.response.status_code value. It's also not a big effort to write this to both places for spans, I just want to be sure we want that in 6.6 because I'm not sure what the benefit of that is, since we can't make that work for pre-6.6 indices.
ok, I am good with leaving this as is for 6.x and reconsidering for 7.0
Conflicts:
* docs/data/elasticsearch/generated/errors.json
* docs/data/elasticsearch/generated/transactions.json
* processor/stream/approved-es-documents/testV2IntakeIntegrationErrors.approved.json
* processor/stream/approved-es-documents/testV2IntakeIntegrationTransactions.approved.json
* processor/stream/package_tests/error_attrs_test.go
* processor/stream/package_tests/transaction_attrs_test.go
* tests/fields.go
per feedback from @simitt, turns out context.process.argv is not indexed so alias does not make sense
Noticed a few more things
_meta/ecs-migration.yml
  to: client.ip

- from: context.user.user-agent
  to: user_agent.original.text
user_agent.original.text would be a multi-field, which is defined in the context of the "real" field, which is user_agent.original in this case.
I guess we're hitting a corner case here. The field (in terms of setting an alias) is moving to user_agent.original. I'm not sure it's possible to create an alias to a multi-field. We need to look into this.
But the usage side, if APM intends to use this as "what field does the query target", then yeah it's now user_agent.original.text.
cc @ruflin
you can alias to a multi-field but not back from a multi-field (marked as a non-reversible alias)
We've decided it's not worth indexing this field - it's often huge (think java, where it's not even captured by default) and uninteresting for queries - other fields are better suited for search/aggs, like
After further consideration of elastic/beats#9412 (comment), removing the context.tags -> labels copy. There is still time to restore it, even behind a flag, if desired.
@graphaelli I don't quite understand what the problem with copying Apart from that, this PR LGTM.
The main concern is that it may surprisingly take a significant amount of space. I'd like to merge this and follow up after we discuss a bit further on a separate issue.
Exciting! Great work! |
Make new 6.x indices look like they will in 7.x by aliasing fields where possible and copying data where necessary.
Tasks:
* context.http.status_code
* context.response.status_code and context.response.finished
* user.user_agent
* pipeline (seeking confirmation this is a no-op since it's not indexed and is user configurable)

Reviewers: please pay particular attention to ensure the meanings of the fields in ECS are respected - that is, not only do the data types match, but the values will be usable across applications. Also please look for any opportunities to fill in event.* fields that might have been missed.