Finding a better name for url.hostname, source.hostname and destination.hostname #166

Closed
webmat opened this issue Nov 2, 2018 · 10 comments


@webmat
Contributor

webmat commented Nov 2, 2018

As most of you have realized, using the word "hostname" for two different purposes in ECS is bugging me. I'm very happy about the resistance I've faced when bringing this up so far, because it has forced me to dig deeper and figure this out more precisely. Hopefully this helps me make a better case for why, in one of these two cases, "hostname" is incorrect.

Two usages of hostname in ECS

Let's review the two different uses of "hostname" currently in ECS:

  • "hostname" as a server or device name, as given by administrators, or the value you get when you run the command hostname on a host/device.
    • This use I agree 100% with.
    • Currently used this way in device.hostname and host.hostname.
    • To complicate the following discussion, some network devices default to returning their main IP address when they don't have a textual hostname configured.
  • "hostname" as a network address, which may be an IP or a domain name.
    • This is the use I object to, and what I want to address here. From most sources I've seen, this usage of "hostname" is incorrect; it should be "host".
    • Currently used this way in source.hostname, destination.hostname and url.hostname.
    • If some of us thought this was meant to be a place to populate with the hostname of a host under management (e.g. via enrichment):
      1. This confirms that the name is ambiguous, because that's not the field's actual purpose. Its purpose is to store an address before it has been determined whether it's an IP or a name.
      2. Enrichment with known host details, if someone wanted to do that, would be done more accurately by nesting a subset of the host fields there instead (e.g. host.id, host.hostname).

What's wrong with hostname?

In the RFCs I'm aware of, when mentioning "a network address, which may be an IP or a domain name", they all refer to that as a "host", not a "hostname". They sometimes use the word "hostname" (but not always) to refer to an address that's a registered name (such as a domain name or a local DNS entry).

This is not a recent change either: the oldest RFC I'm linking to below is from 1994, and it uses "host" to mean the "IP or name" concept. Here are a few excerpts:

The fact that the two terms are sometimes mentioned in the same places, or even used interchangeably in some tool documentation (we're not the only ones facing this mix-up), helps explain why they are so often confused. I haven't looked farther back in the RFCs, so the mix-up may indeed originate there.

However, as @MikePaquette pointed out, using the name "host" in these places would conflict with our top-level field set "host". Even if "host" is not currently a reusable object, I do think it could become one, and the most obvious places I would expect to nest it are source.host and destination.host.

So I agree we should not rename these 3 fields to *.host.

Here are some suggestions for new field names, based on what I've seen in the documentation of various server tools and in the RFCs. Remember that these proposed renames apply only to source, destination and url, not to host and device:

  • host_address
  • uri_host or url_host
  • Drop hostname from source, destination and url.
    • In most cases people will want to store the value in its proper field (ip or domain), after determining which type it is. Wanting to keep the ambiguous value as well could be considered a separate use case, and people are free to name that field however they want, since it's not in ECS.
    • Independently of the subpoint above, we could also work on finding the right name for this field after Beta1, and just drop the field from ECS temporarily for the Beta1 release.

I'm open to other suggestions, of course.

@webmat webmat added the discuss label Nov 2, 2018
@webmat webmat changed the title Should we replace url.hostname? Finding a better name for url.hostname, source.hostname and destination.hostname Nov 2, 2018
@webmat
Contributor Author

webmat commented Nov 2, 2018

@MikePaquette @ruflin @robgil @andrewkroh Ok, this proposal is now better fleshed out and ready to be discussed :-)

@vbohata

vbohata commented Nov 5, 2018

We are currently using this in logstash: (?:%{IP:[server][host][ip]}|%{NOTSPACE:[server][host][name]}) which makes it very clear what is what.
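The same IP-or-name split can be sketched outside Logstash. This is a minimal Python sketch, not @vbohata's pipeline: it assumes the host token has already been extracted from the log line and contains no whitespace, and it uses the standard ipaddress module in place of grok's IP pattern, so edge cases may differ. The nested dict layout mirrors the [server][host][ip] / [server][host][name] targets in the pattern.

```python
import ipaddress


def split_host(token: str) -> dict:
    """Route an ambiguous host token to server.host.ip or server.host.name.

    Mirrors the intent of the grok alternation above: try to parse the
    token as an IP first, and fall back to treating it as a name.
    """
    try:
        # ip_address() accepts both IPv4 and IPv6 literals.
        ip = ipaddress.ip_address(token)
        return {"server": {"host": {"ip": str(ip)}}}
    except ValueError:
        # Not a valid IP literal: keep the token unmodified as a name.
        return {"server": {"host": {"name": token}}}


print(split_host("10.0.0.5"))
print(split_host("db01.example.com"))
```

As in the grok version, the two values never share a field, so downstream consumers can rely on the ip field always holding an IP.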

@ruflin
Member

ruflin commented Nov 5, 2018

@webmat Thanks for the detailed research on this issue. +1 on not renaming it to *.host because of potential confusion.

For the alternative names:

  • url.host_address: My first thought would be this is an IP address
  • url.url_host: Does not seem very nice. Is this host always coming from a url?

@vbohata Can your server.host.name also contain an IP?

Two alternatives:

  • source.name: We avoid having "host" in the field name and make it clear it's whatever name we have for the source: it can be an IP, a domain or even something else. This matches well with host.name.
  • Alternatively, could we go back to source.host.name?

Even though hostname is technically not 100% accurate, it's still an option for me.

@vbohata

vbohata commented Nov 5, 2018

No, server.host.ip will contain the IP if the value matches the Logstash IP pattern. Otherwise, server.host.name will contain the host name. I do not want to mix these values, as I use them later, for example in a related field I am using.

@webmat
Contributor Author

webmat commented Nov 5, 2018

@vbohata Yes, your approach actually does away with the ambiguity entirely. The field we're discussing here is a place to store the value in its ambiguous state, at the point where it's still unknown whether it's an IP or a registered name.

It also makes me realize that under source and destination, we could actually get away with only IP and the domain fields, and leave hostname (or any other name for this field) out of ECS. Typically people will want to determine with certainty whether this value is an IP or a name, so they should have some processing to figure that out, as @vbohata demonstrated above. If they also want to keep the ambiguous value in their stream after the extraction is done, that's entirely up to them, and perhaps this ambiguous field doesn't even belong in ECS. I'll add that above as one of the options.

@ruflin I may not have said this explicitly enough in the body of the issue, but the reason I really want to address this is to avoid the perpetual question "what's the difference between hostname and the domain fields?".

The fact that the initial question (#84) came from a person very familiar with the space is a strong indicator to me that this is too ambiguous. Once this gets out more broadly, I'm afraid it will result in essentially the same question being asked over and over (not everyone pores over the GH issues before asking their own question).

@webmat
Contributor Author

webmat commented Nov 5, 2018

@ruflin To answer your question about url.url_host, in this specific case, it would come from a url, as it's the one nested under url :-)

But you're right that in the other two locations we're discussing, source and destination, the name "url_host" may be incorrect, because the value may not come from an actual URL. It could come from an email address or another piece of information that has a host.

@webmat webmat self-assigned this Nov 5, 2018
@MikePaquette
Contributor

@webmat I like your suggestion in #166 (comment) best:

It also makes me realize that under source and destination, we could actually get away with only IP and the domain fields, and leave hostname (or any other name for this field) out of ECS.

I am +1 to remove source.hostname and destination.hostname from ECS altogether.

Then we'd have these anticipated common mappings:

  • bro dns.log query -> ecs destination.domain
  • bro http.log host -> ecs destination.domain
  • cef dhost -> ecs destination.domain
  • cef shost -> ecs source.domain
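The anticipated mappings above can be expressed as a simple lookup table. This is a minimal sketch for illustration only: the (log type, source field) key format and the flat input and output dicts are assumptions made here, not how any Beats module or Logstash pipeline actually implements these mappings.

```python
# Anticipated mappings from the list above, keyed by (log type, source field).
# The key format is a hypothetical convention chosen for this sketch.
ECS_DOMAIN_MAPPINGS = {
    ("bro dns.log", "query"): "destination.domain",
    ("bro http.log", "host"): "destination.domain",
    ("cef", "dhost"): "destination.domain",
    ("cef", "shost"): "source.domain",
}


def to_ecs(log_type: str, record: dict) -> dict:
    """Copy mapped fields into a flat dict keyed by dotted ECS field names."""
    out = {}
    for field, value in record.items():
        target = ECS_DOMAIN_MAPPINGS.get((log_type, field))
        if target is not None:  # unmapped fields are simply skipped
            out[target] = value
    return out


print(to_ecs("cef", {"shost": "laptop.example.com", "dhost": "mail.example.com"}))
```

With source.hostname and destination.hostname removed, every one of these mappings lands in a domain field, which is the point of the proposal.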

Also, I'm still not sure why we need the additional field for url. Is there a good example where the URL host field could not, or would not, map to ECS destination.domain as well?

@webmat
Copy link
Contributor Author

webmat commented Nov 6, 2018

@MikePaquette Technically, the URL's host can be an IP or a name. So that's also a place where the ambiguous concept of "host" is useful.

This makes me realize that in the current state, url only has url.host.name, and no place to store an IP. So if we want to perform the same cleanup for url as under source and destination, we may need to add an IP field. That feels a bit weird, but since this is an edge case, I think it's fine if we only define url.domain for now.

So in total the overall proposal here is:

  • Remove source.hostname
  • Remove destination.hostname
  • Remove url.host.name
  • Add url.domain

Note: source and destination already have a domain field, but url doesn't have one yet.
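Under this proposal, populating url.domain means extracting the URL's host and keeping it only when it is a registered name. A minimal sketch, assuming Python's urllib.parse for URL parsing and a flat dict of dotted ECS field names; since the proposal defines no IP field under url, an IP host is simply not stored, leaving that question open as in the thread.

```python
import ipaddress
from urllib.parse import urlsplit


def url_domain(raw_url: str) -> dict:
    """Return {"url.domain": host} only when the URL's host is a name."""
    # .hostname is lowercased, has IPv6 brackets and the port removed,
    # and is None when the URL has no network location.
    host = urlsplit(raw_url).hostname
    if host is None:
        return {}
    try:
        ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so treat it as a domain.
        return {"url.domain": host}
    # IP host: the proposal has no url IP field yet, so nothing is stored.
    return {}


print(url_domain("https://www.Example.com:8443/path?q=1"))
print(url_domain("http://192.0.2.10/index.html"))
```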

@MikePaquette
Contributor

@webmat thanks, LGTM.

BTW, when we get to discuss #67, I will make a case for defining related.domain: an array containing a copy of whatever values we populate source.domain, destination.domain and url.domain with.
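Since related.domain is only being proposed here (pending #67), the field name is hypothetical. A minimal sketch of what populating it could look like, assuming a flat dict of dotted ECS field names: it collects whichever of the three domain fields are present. Dropping duplicates while keeping first-seen order is a design choice made for this sketch, not something stated in the thread.

```python
DOMAIN_FIELDS = ("source.domain", "destination.domain", "url.domain")


def add_related_domain(event: dict) -> dict:
    """Return a copy of event with a hypothetical related.domain array."""
    domains = []
    for field in DOMAIN_FIELDS:
        value = event.get(field)
        # Skip missing or empty values and repeats of an earlier domain.
        if value and value not in domains:
            domains.append(value)
    enriched = dict(event)  # leave the caller's event untouched
    if domains:
        enriched["related.domain"] = domains
    return enriched
```

A single field then answers "which events involve this domain?" regardless of whether the domain appeared as the source, the destination or the URL host.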

@webmat
Contributor Author

webmat commented Nov 6, 2018

@ruflin Would you be OK with this version of the proposal for Beta1? #166 (comment)

4 participants