Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

model: ECS for 6.x #1609

Merged
merged 41 commits into from
Dec 17, 2018
Merged

model: ECS for 6.x #1609

merged 41 commits into from
Dec 17, 2018

Conversation

graphaelli
Copy link
Member

@graphaelli graphaelli commented Nov 30, 2018

Make new 6.x indices look like they will in 7.x by aliasing fields where possible and copying data where necessary.

Tasks:

  • All non-indexed fields should be documented now
  • fields pending decisions on aliasing: context.http.status_code context.response.status_code and context.response.finished
  • decision on user.user_agent pipeline (seeking confirmation this is no-op since it's not indexed and user configurable)
  • Restore host processor metadata fields
  • merge in the latest beats framework update ([6.x] Update beats framework to c63c98b #1647) now that ECS fields have landed there, no major issues expected. The host metadata fields will be restored at that time.

Reviewers: please pay particular attention to ensure the meanings of the fields in ECS is respected - that is that not only do the data types match, but the values will be usable across applications. Also please look for an opportunities to fill in in event.* that might have been missed.

Elasticsearch aliases are leveraged for ECS compatibility wherever
possible.  Where that's not possible, values are written to both the
original location and the one ECS dictates. Here:

* copy context.tags to labels - object fields can't be aliased
* cast & copy context.request.url.port to url.port - keyword -> int
* copy context.request.url.protocol to url.scheme - trim the final :
_meta/ecs-migration.yml Outdated Show resolved Hide resolved
url.full -> context.request.url.full
url.original -> context.request.url.raw

per feedback from @webmat
Copy link

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good so far. Please ping me on any subsequent mapping work.

docs/fields.asciidoc Outdated Show resolved Hide resolved
@graphaelli
Copy link
Member Author

backport of elastic/beats#9269 will fix the bad docs generated for aliased fields.

@webmat
Copy link

webmat commented Dec 3, 2018

@graphaelli Just clarifying: host.hostname is meant to contain exactly what the command hostname would return on that server. host.name can also default to this value, but is the field a user would override, if they decide to do so.

@graphaelli
Copy link
Member Author

@webmat Can you check my understand here - in the APM use case host.hostname would be the hostname of the application server being instrumented and host.name would default to that but possibly overridden by a user. Also, we'll need to notify users that host.name in <6.6 was the apm-server's hostname but now will be the application server's. The apm-server hostname has been moved under agent.

Note that the application server is the browser in the RUM case.

@webmat
Copy link

webmat commented Dec 3, 2018

@graphaelli We are currently reviewing whether agent.hostname will make it into ECS. I can't answer that yet.

cc @MikePaquette

Copy link
Contributor

@simitt simitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the idea for all values under error, transaction, span, metric?
The category event troubles me especially, as it is described to hold the context of an event, eg event.start, event.duration, etc. We do have this information inside transaction, span, etc.
In case we moved the data from our events into the event namespace, we would need to find a solution for how to refer to parent, trace, transaction (from error and span event) etc. I was thinking we could make use of related here, but not sure I completely understood the related field.

_meta/fields.common.yml Show resolved Hide resolved
_meta/fields.common.yml Show resolved Hide resolved
_meta/ecs-migration.yml Show resolved Hide resolved
_meta/ecs-migration.yml Show resolved Hide resolved
_meta/ecs-migration.yml Outdated Show resolved Hide resolved
_meta/ecs-migration.yml Show resolved Hide resolved
_meta/ecs-migration.yml Outdated Show resolved Hide resolved
_meta/ecs-migration.yml Outdated Show resolved Hide resolved
_meta/ecs-migration.yml Show resolved Hide resolved
_meta/fields.common.yml Outdated Show resolved Hide resolved
Copy link
Member

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, forgot some time ago to hit submit ...

_meta/ecs-migration.yml Outdated Show resolved Hide resolved
_meta/fields.common.yml Outdated Show resolved Hide resolved
_meta/fields.common.yml Outdated Show resolved Hide resolved
_meta/fields.common.yml Outdated Show resolved Hide resolved
_meta/fields.common.yml Show resolved Hide resolved
@graphaelli graphaelli changed the title WIP model: ECS for 6.x model: ECS for 6.x Dec 14, 2018
Copy link
Contributor

@simitt simitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at tests/system/test_ecs_mappings.py closely yet, but reviewed everything else.

_meta/ecs-migration.yml Show resolved Hide resolved
copy_to: true

- from: context.process.argv
to: process.args
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I brought it up, but as this is actually not indexed, I think we should remove it again from here, or set alias=false.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be indexed in an ECS index.

Perhaps if APM doesn't make this useable this requirement can be relaxed? @ruflin WDYT?

Copy link
Member Author

@graphaelli graphaelli Dec 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, removed altogether since it's a useless (though allowed, somewhat surprisingly) alias.

_meta/ecs-migration.yml Show resolved Hide resolved
# - name: socket.encrypted
# type: boolean
# - name: socket.remote_address
# type: keyword
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to also define the nested fields here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to so they'll show up in documentation but not as-is. Did you have other concerns besides docs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No concerns, I was hoping we could add them.

searchable: false
doc_values: false

- name: status_code
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an indexed field, same datatype as context.response.status_code that is aliased to http.response.status_code now. I think this should also be mapped to the same field, by copying it over instead of aliasing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I convinced myself that this field is metadata on a span and might actually be confusing if it's in http.response.status_code, as it's the response code from calling a remote service vs the response code we return from the traced service.

If you would still like to proceed, are you saying that in:
6.6 - put this value in context.response.status_code and context.http.status_code
7.0 - write it to http.response.status_code

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see the ambiguity here. This is to track the response of an external call performed by the monitored app, correct?

I think I'd be inclined to still put it in the ECS field. After all, these events will mostly be looked at in the context of this one event stream. Might as well benefit from the streamlined field names.

But if you see a problem with this, we can hold off on the migration of this field.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to track the response of an external call performed by the monitored app, correct?
correct

I don't have philosophical objections to this plan - my concern is mostly a practical issue of this being our only 1:many field. I agree that in 7.0.0 we can write this to http.response.status_code, after all. an APM event stream in the distributed tracing case is composed of multiple transactions that will each bring a possibly different http.response.status_code value. It's also not a big effort to write this to both places for spans, I just want to be sure we want that in 6.6 because I'm not sure what the benefit of that is, since we can't make that work for pre-6.6 indices.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I am good with leaving as is for 6.x and reconsider for 7.0

model/ecs.go Show resolved Hide resolved
Conflicts:
	docs/data/elasticsearch/generated/errors.json
	docs/data/elasticsearch/generated/transactions.json
	processor/stream/approved-es-documents/testV2IntakeIntegrationErrors.approved.json
	processor/stream/approved-es-documents/testV2IntakeIntegrationTransactions.approved.json
	processor/stream/package_tests/error_attrs_test.go
	processor/stream/package_tests/transaction_attrs_test.go
	tests/fields.go
per feedback from @simitt, turns out context.process.argv is not indexed so alias does not make sense
Copy link

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed a few more things

copy_to: true

- from: context.process.argv
to: process.args
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be indexed in an ECS index.

Perhaps if APM doesn't make this useable this requirement can be relaxed? @ruflin WDYT?

_meta/ecs-migration.yml Show resolved Hide resolved
to: client.ip

- from: context.user.user-agent
to: user_agent.original.text
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

user_agent.original.text would be a multi-field, which is defined in the context of the "real" field, which is user_agent.original in this case.

I guess we're hitting a corner case here. The field (in terms of setting an alias) is moving to user_agent.original. I'm not sure it's possible to create an alias to a multi-field. We need to look into this.

But the usage side, if APM intends to use this as "what field does the query target", then yeah it's now user_agent.original.text.

cc @ruflin

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can alias to a multi-field but not back from a multi-field (marked as a non-reversible alias)

_meta/ecs-migration.yml Show resolved Hide resolved
model/ecs.go Show resolved Hide resolved
@graphaelli
Copy link
Member Author

It should be indexed in an ECS index.
Perhaps if APM doesn't make this useable this requirement can be relaxed? @ruflin WDYT?

We've decided it's not worth indexing this field - it's often huge (think java, where it's not even captured by default) and uninteresting for queries - other fields are better suited for search/aggs, like process.name, and then args are handy metadata per document.

@graphaelli
Copy link
Member Author

After further consideration of elastic/beats#9412 (comment), removing the context.tags -> labels copy. There is still time to restore it, even behind a flag, if desired.

@simitt
Copy link
Contributor

simitt commented Dec 17, 2018

@graphaelli I don't quite understand what the problem with copying context.tags to labels is based on customer defined keys. I though we would copy all fields nested under `context.tags.

Apart from that, this PR LGTM.

@graphaelli
Copy link
Member Author

The main concern is that it may surprisingly take a significant amount of space. I'd like to merge this and follow up after we discuss a bit further on a separate issue.

@graphaelli graphaelli merged commit 7cae5b1 into elastic:6.x Dec 17, 2018
@graphaelli graphaelli deleted the ecs-6.x branch December 17, 2018 14:05
@roncohen
Copy link
Contributor

Exciting! Great work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants