WARNING: This is the master branch. The current release v1.0.0-beta2 can be found here.
The Elastic Common Schema (ECS) defines a common set of fields for ingesting data into Elasticsearch. A common schema helps you correlate data from sources like logs and metrics or IT operations analytics and security analytics.
ECS is still under development and backward compatibility is not guaranteed. Any feedback on the general structure, missing fields, or existing fields is appreciated. For contributions please read the Contributing Guide.
The master branch of this repository should never be considered an official release of ECS. You can browse official releases of ECS here.
Please note that when the README.md file and other generated files (like schema.csv and template.json) are not in agreement, the README.md should be considered the official spec. The other two files are simply provided as a convenience, and may not always be fully up to date.
ECS defines these fields.
- Base fields
- Agent fields
- Client fields
- Cloud fields
- Container fields
- Destination fields
- ECS fields
- Error fields
- Event fields
- File fields
- Geo fields
- Group fields
- Host fields
- HTTP fields
- Log fields
- Network fields
- Observer fields
- Organization fields
- Operating System fields
- Process fields
- Related fields
- Server fields
- Service fields
- Source fields
- URL fields
- User fields
- User agent fields
The base set contains all fields which are on the top level. These fields are common across all types of events.
The agent fields contain the data about the software entity, if any, that collects, detects, or observes events on a host, or takes measurements on a host. Examples include Beats. Agents may also run on observers. ECS agent.* fields shall be populated with details of the agent running on the host or observer where the event happened or the measurement was taken.
Examples: In the case of Beats for logs, the agent.name is filebeat. For APM, it is the agent running in the app/service. The agent information does not change if data is sent through queuing systems like Kafka, Redis, or processing systems such as Logstash or APM Server.
A client is defined as the initiator of a network connection for events regarding sessions, connections, or bidirectional flow records. For TCP events, the client is the initiator of the TCP connection that sends the SYN packet(s). For other protocols, the client is generally the initiator or requestor in the network transaction. Some systems use the term "originator" to refer to the client in TCP connections. The client fields describe details about the system acting as the client in the network event. Client fields are usually populated in conjunction with server fields. Client fields are generally not populated for packet-level events.
Client / server representations can add semantic context to an exchange, which is helpful to visualize the data in certain situations. If your context falls in that category, you should still ensure that source and destination are filled appropriately.
Fields related to the cloud or infrastructure the events are coming from.
Examples: If Metricbeat is running on an EC2 host and fetches data from its host, the cloud info contains the data about this machine. If Metricbeat runs on a remote machine outside the cloud and fetches data from a service running in the cloud, the field contains cloud data from the machine the service is running on.
Container fields are used for meta information about the specific container that is the source of information. These fields help correlate data based on containers from any runtime.
Destination fields describe details about the destination of a packet/event. Destination fields are usually populated in conjunction with source fields.
Meta-information specific to ECS.
These fields can represent errors of any kind. Use them for errors that happen while fetching events or in cases where the event itself contains an error.
Field | Description | Level | Type | Example |
---|---|---|---|---|
error.id | Unique identifier for the error. | core | keyword | |
error.message | Error message. | core | text | |
error.code | Error code describing the error. | core | keyword |
The event fields are used for context information about the log or metric event itself. A log is defined as an event containing details of something that happened. Log events must include the time at which the thing happened. Examples of log events include a process starting on a host, a network packet being sent from a source to a destination, or a network connection between a client and a server being initiated or closed. A metric is defined as an event containing one or more numerical or categorical measurements and the time at which the measurement was taken. Examples of metric events include memory pressure measured on a host, or vulnerabilities measured on a scanned host.
A file is defined as a set of information that has been created on, or has existed on a filesystem. File objects can be associated with host events, network events, and/or file events (e.g., those produced by File Integrity Monitoring [FIM] products or services). File fields provide details about the affected file associated with the event or metric.
Geo fields can carry data about a specific location related to an event or geo information derived from an IP field.
The geo fields are expected to be nested at: client.geo, destination.geo, host.geo, observer.geo, server.geo, source.geo.
Note also that the geo fields are not expected to be used directly at the top level.
The group fields are meant to represent groups that are relevant to the event.
The group fields are expected to be nested at: user.group.
Note also that the group fields may be used directly at the top level.
Field | Description | Level | Type | Example |
---|---|---|---|---|
group.id | Unique identifier for the group on the system/platform. | extended | keyword | |
group.name | Name of the group. | extended | keyword |
A host is defined as a general computing instance. ECS host.* fields should be populated with details about the host on which the event happened, or on which the measurement was taken. Host types include hardware, virtual machines, Docker containers, and Kubernetes nodes.
Fields related to HTTP activity.
Fields which are specific to log events.
The network is defined as the communication path over which a host or network event happens. The network.* fields should be populated with details about the network activity associated with an event.
Field | Description | Level | Type | Example |
---|---|---|---|---|
network.name | Name given by operators to sections of their network. | extended | keyword | Guest Wifi |
network.type | In the OSI Model this would be the Network Layer. ipv4, ipv6, ipsec, pim, etc. The field value must be normalized to lowercase for querying. See "Lowercase Capitalization" in the "Implementing ECS" section. | core | keyword | ipv4 |
network.iana_number | IANA Protocol Number (https://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml). Standardized list of protocols. This aligns well with NetFlow and sFlow related logs which use the IANA Protocol Number. | extended | keyword | 6 |
network.transport | Same as network.iana_number, but instead using the Keyword name of the transport layer (udp, tcp, ipv6-icmp, etc.) The field value must be normalized to lowercase for querying. See "Lowercase Capitalization" in the "Implementing ECS" section. | core | keyword | tcp |
network.application | A name given to an application. This can be arbitrarily assigned for things like microservices, but also apply to things like skype, icq, facebook, twitter. This would be used in situations where the vendor or service can be decoded such as from the source/dest IP owners, ports, or wire format. The field value must be normalized to lowercase for querying. See "Lowercase Capitalization" in the "Implementing ECS" section. | extended | keyword | aim |
network.protocol | L7 Network protocol name. ex. http, lumberjack, transport protocol. The field value must be normalized to lowercase for querying. See "Lowercase Capitalization" in the "Implementing ECS" section. | core | keyword | http |
network.direction | Direction of the network traffic. Recommended values are: * inbound * outbound * internal * external * unknown When mapping events from a host-based monitoring context, populate this field from the host's point of view. When mapping events from a network or perimeter-based monitoring context, populate this field from the point of view of your network perimeter. | core | keyword | inbound |
network.forwarded_ip | Host IP address when the source IP address is the proxy. | core | ip | 192.1.1.2 |
network.community_id | A hash of source and destination IPs and ports, as well as the protocol used in a communication. This is a tool-agnostic standard to identify flows. Learn more at https://github.com/corelight/community-id-spec. | extended | keyword | 1:hO+sN4H+MG5MY/8hIrXPqc4ZQz0= |
network.bytes | Total bytes transferred in both directions. If source.bytes and destination.bytes are known, network.bytes is their sum. | core | long | 368 |
network.packets | Total packets transferred in both directions. If source.packets and destination.packets are known, network.packets is their sum. | core | long | 24 |
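The rule for network.bytes and network.packets in the table above can be sketched in a few lines of Python. This is only an illustration: the function name network_totals and the use of a flat dict with dotted keys are assumptions made for the example, not part of ECS.

```python
def network_totals(event: dict) -> dict:
    """Return a copy of the event with network.bytes and network.packets
    filled in whenever both the source and destination counters are known."""
    result = dict(event)
    for counter in ("bytes", "packets"):
        src = event.get(f"source.{counter}")
        dst = event.get(f"destination.{counter}")
        # Only compute the total when both directions are present.
        if src is not None and dst is not None:
            result[f"network.{counter}"] = src + dst
    return result


event = {
    "source.bytes": 184,
    "destination.bytes": 184,
    "source.packets": 12,
    "destination.packets": 12,
}
print(network_totals(event))
```

When only one direction is known, the sketch leaves the total unset rather than guessing.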
An observer is defined as a special network, security, or application device used to detect, observe, or create network, security, or application-related events and metrics. This could be a custom hardware appliance or a server that has been configured to run special network, security, or application software. Examples include firewalls, intrusion detection/prevention systems, network monitoring sensors, web application firewalls, data loss prevention systems, and APM servers. The observer.* fields shall be populated with details of the system, if any, that detects, observes and/or creates a network, security, or application event or metric. Message queues and ETL components used in processing events or metrics are not considered observers in ECS.
The organization fields enrich data with information about the company or entity the data is associated with. These fields help you arrange or filter data stored in an index by one or multiple organizations.
Field | Description | Level | Type | Example |
---|---|---|---|---|
organization.name | Organization name. | extended | keyword | |
organization.id | Unique identifier for the organization. | extended | keyword |
The OS fields contain information about the operating system.
The os fields are expected to be nested at: host.os, observer.os, user_agent.os.
Note also that the os fields are not expected to be used directly at the top level.
These fields contain information about a process. These fields can help you correlate metrics information with a process id/name from a log message. The process.pid often stays in the metric itself and is copied to the global field for correlation.
This field set is meant to facilitate pivoting around a piece of data. Some pieces of information can be seen in many places in ECS. To facilitate searching for them, append values to their corresponding field in related. A concrete example is IP addresses, which can be under host, observer, source, destination, client, server, and network.forwarded_ip. If you append all IPs to related.ip, you can then search for a given IP trivially, no matter where it appeared, by querying related.ip:a.b.c.d.
Field | Description | Level | Type | Example |
---|---|---|---|---|
related.ip | All of the IPs seen on your event. | extended | ip |
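As a rough illustration of populating related.ip, the sketch below gathers every IP found in an event into one de-duplicated list. The exact field names in IP_FIELDS (host.ip, observer.ip, and so on), the flat dotted-key dict, and the helper name collect_related_ips are assumptions made for the example.

```python
# Fields that may carry an IP address. The ".ip" names are assumed here;
# the text above only lists the field sets they live under.
IP_FIELDS = [
    "host.ip",
    "observer.ip",
    "source.ip",
    "destination.ip",
    "client.ip",
    "server.ip",
    "network.forwarded_ip",
]


def collect_related_ips(event: dict) -> list:
    """Collect all IPs seen on the event, in first-seen order, without duplicates."""
    seen = []
    for field in IP_FIELDS:
        value = event.get(field)
        if value is None:
            continue
        # A field may hold a single IP or a list of IPs.
        values = value if isinstance(value, list) else [value]
        for ip in values:
            if ip not in seen:
                seen.append(ip)
    return seen


event = {
    "source.ip": "10.0.0.1",
    "destination.ip": "10.0.0.2",
    "host.ip": ["10.0.0.1", "192.168.1.5"],
}
event["related.ip"] = collect_related_ips(event)
print(event["related.ip"])
```

A query such as related.ip:10.0.0.2 would then match this event regardless of which field originally held the address.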
A server is defined as the responder in a network connection for events regarding sessions, connections, or bidirectional flow records. For TCP events, the server is the receiver of the initial SYN packet(s) of the TCP connection. For other protocols, the server is generally the responder in the network transaction. Some systems actually use the term "responder" to refer to the server in TCP connections. The server fields describe details about the system acting as the server in the network event. Server fields are usually populated in conjunction with client fields. Server fields are generally not populated for packet-level events.
Client / server representations can add semantic context to an exchange, which is helpful to visualize the data in certain situations. If your context falls in that category, you should still ensure that source and destination are filled appropriately.
The service fields describe the service for or from which the data was collected. These fields help you find and correlate logs for a specific service and version.
Source fields describe details about the source of a packet/event. Source fields are usually populated in conjunction with destination fields.
URL fields provide a complete URL, with scheme, host, and path.
Field | Description | Level | Type | Example |
---|---|---|---|---|
url.original | Unmodified original url as seen in the event source. Note that in network monitoring, the observed URL may be a full URL, whereas in access logs, the URL is often just represented as a path. This field is meant to represent the URL as it was observed, complete or not. | extended | keyword | https://www.elastic.co:443/search?q=elasticsearch#top or /search?q=elasticsearch |
url.full | If full URLs are important to your use case, they should be stored in url.full, whether this field is reconstructed or present in the event source. | extended | keyword | https://www.elastic.co:443/search?q=elasticsearch#top |
url.scheme | Scheme of the request, such as "https". Note: The : is not part of the scheme. | extended | keyword | https |
url.domain | Domain of the request, such as "www.elastic.co". In some cases a URL may refer to an IP and/or port directly, without a domain name. In this case, the IP address would go to the domain field. | extended | keyword | www.elastic.co |
url.port | Port of the request, such as 443. | extended | integer | 443 |
url.path | Path of the request, such as "/search". | extended | keyword | |
url.query | The query field describes the query string of the request, such as "q=elasticsearch". The ? is excluded from the query string. If a URL contains no ?, there is no query field. If there is a ? but no query, the query field exists with an empty string. The exists query can be used to differentiate between the two cases. | extended | keyword | |
url.fragment | Portion of the url after the #, such as "top". The # is not part of the fragment. | extended | keyword | |
url.username | Username of the request. | extended | keyword | |
url.password | Password of the request. | extended | keyword |
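To make the url.* rules above concrete, here is a minimal sketch that splits an observed URL into these fields with Python's standard urllib.parse. The function name url_fields and the flat dotted-key output are assumptions for illustration. The one subtle point is url.query, which exists (possibly as an empty string) only when the URL contains a ?.

```python
from urllib.parse import urlsplit


def url_fields(original: str) -> dict:
    """Split a URL (or a bare path) as observed into url.* fields."""
    parts = urlsplit(original)
    fields = {"url.original": original}
    if parts.scheme:
        fields["url.scheme"] = parts.scheme      # no trailing ":"
    if parts.hostname:
        fields["url.domain"] = parts.hostname    # may also be an IP address
    if parts.port is not None:
        fields["url.port"] = parts.port
    if parts.path:
        fields["url.path"] = parts.path
    # urlsplit returns "" both for "no ?" and for "? with nothing after it",
    # so look for the "?" before any fragment to tell the two cases apart.
    if "?" in original.split("#", 1)[0]:
        fields["url.query"] = parts.query        # "?" itself is excluded
    if parts.fragment:
        fields["url.fragment"] = parts.fragment  # "#" itself is excluded
    if parts.username is not None:
        fields["url.username"] = parts.username
    if parts.password is not None:
        fields["url.password"] = parts.password
    return fields


print(url_fields("https://www.elastic.co:443/search?q=elasticsearch#top"))
print(url_fields("/search?q=elasticsearch"))
```

The second call shows the access-log case, where only a path and query are observed and no scheme, domain, or port can be set.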
The user fields describe information about the user that is relevant to the event. Fields can have one entry or multiple entries. If a user has more than one id, provide an array that includes all of them.
The user fields are expected to be nested at: client.user, destination.user, host.user, server.user, source.user.
Note also that the user fields may be used directly at the top level.
The user_agent fields normally come from a browser request. They often show up in web service logs coming from the parsed user agent string.
These are examples of how ECS fields can be used in different use cases. Most use cases contain not only ECS fields but also additional fields which are not in ECS to describe the full use case. The fields which are not in ECS are in italic.
Contributions of additional use cases on top of ECS are welcome.
ECS does not define the following field sets yet, but the following are expected in the future. Please avoid using them:
match.*
protocol.*
threat.*
vulnerability.*
- The document MUST have the @timestamp field.
- The data type defined for an ECS field MUST be used.
- It SHOULD have the field ecs.version to define which version of ECS it uses.
- As many fields as possible should be mapped to ECS.
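Two of the guidelines above are easy to check mechanically. The sketch below is a hypothetical helper (check_guidelines is not an ECS tool) that reports a missing @timestamp as an error and a missing ecs.version as a warning. It assumes the document is a nested dict and does not attempt to verify data types.

```python
def check_guidelines(doc: dict) -> dict:
    """Report violations of the MUST (@timestamp) and SHOULD (ecs.version) rules."""
    problems = {"errors": [], "warnings": []}
    if "@timestamp" not in doc:
        problems["errors"].append("missing @timestamp (MUST)")
    ecs = doc.get("ecs")
    if not (isinstance(ecs, dict) and "version" in ecs):
        problems["warnings"].append("missing ecs.version (SHOULD)")
    return problems


doc = {"@timestamp": "2019-01-01T00:00:00Z", "message": "hello"}
print(check_guidelines(doc))
```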
Writing fields
- All fields must be lower case
- Combine words using underscore
- No special characters except _
Naming fields
- Present tense. Use present tense unless field describes historical information.
- Singular or plural. Use singular and plural names properly to reflect the field content. For example, use requests_per_sec rather than request_per_sec.
- General to specific. Organise the prefixes from general to specific to allow grouping fields into objects with a prefix like host.*.
- Avoid repetition. Avoid stuttering of words. If part of the field name is already in the prefix, do not repeat it. Example: host.host_ip should be host.ip.
- Use prefixes. Fields must be prefixed except for the base fields. For example all host fields are prefixed with "host.". See dot notation in FAQ for more details.
- Do not use abbreviations. (A few exceptions like ip exist.)
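Some of the writing and naming rules above lend themselves to an automated check. The following sketch is a heuristic and assumes a hypothetical helper name_problems. It checks each dot-separated segment for lowercase letters, digits, and underscores, and flags a segment that repeats its prefix. Base fields such as @timestamp are outside the scope of this check, and rules such as tense or abbreviations still need human review.

```python
import re

# A segment may contain only lowercase letters, digits, and underscores.
SEGMENT = re.compile(r"^[a-z0-9_]+$")


def name_problems(field: str) -> list:
    """Return a list of rule violations found in a dotted field name."""
    problems = []
    segments = field.split(".")
    for seg in segments:
        if not SEGMENT.match(seg):
            problems.append(
                f"segment {seg!r} must use only lowercase letters, digits, and underscores"
            )
    # Stutter check: a segment should not repeat the prefix right before it.
    for parent, child in zip(segments, segments[1:]):
        if child == parent or child.startswith(parent + "_"):
            problems.append(f"{child!r} repeats its prefix {parent!r}")
    return problems


for name in ("host.ip", "host.host_ip", "Host.IP", "requests_per_sec"):
    print(name, name_problems(name))
```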
To allow for correlation across different sources, ECS must sometimes enforce normalization on field values.
Some field descriptions mention they should be normalized to lowercase. Different approaches can be taken to accomplish this. The goal of requesting this is to avoid the same value appearing distinctly in aggregations, or avoid having to search for all capitalizations possible (e.g. IPV4, IPv4, ipv4).
The simplest implementation of this requirement is to lowercase the value before indexing in Elasticsearch. This can be done with a Logstash filter or an Ingest Node processor, for example. Another approach that satisfies the goal is to configure the keyword indexing of the field to use a normalizer with the lowercase filter. The normalizer leaves your data unmodified (the document still shows "IPv4", for example). However, the value in the index will be lowercase. This satisfies the requirement of predictable querying and aggregation across data sources.
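The first approach, lowercasing before indexing, might look like the sketch below if done in application code rather than in Logstash or an ingest pipeline. The tuple LOWERCASE_FIELDS lists the network fields whose descriptions ask for lowercase normalization. The helper name lowercase_fields and the flat dotted-key dict are assumptions for illustration.

```python
# Fields whose descriptions require lowercase values.
LOWERCASE_FIELDS = (
    "network.type",
    "network.transport",
    "network.application",
    "network.protocol",
)


def lowercase_fields(event: dict) -> dict:
    """Return a copy of the event with the listed fields lowercased."""
    normalized = dict(event)
    for field in LOWERCASE_FIELDS:
        value = normalized.get(field)
        if isinstance(value, str):
            normalized[field] = value.lower()
    return normalized


event = {"network.type": "IPv4", "network.transport": "TCP", "network.name": "Guest Wifi"}
print(lowercase_fields(event))
```

Unlike the normalizer approach, this changes the stored document itself, so the original capitalization is lost.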
Elasticsearch can index text multiple ways:
- text indexing allows for full text search, or searching arbitrary words that are part of the field.
- keyword indexing allows for much faster exact match filtering, prefix search, and allows for aggregations (what Kibana visualizations are built on).
By default, unless your index mapping or index template specifies otherwise (as the ECS index template does), Elasticsearch indexes a text field as text at the canonical field name, and indexes it a second time as keyword, nested in a multi-field.
Default Elasticsearch convention:
- Canonical field: myfield is text
- Multi-field: myfield.keyword is keyword
For monitoring use cases, keyword indexing is needed almost exclusively, with full text search on very few fields. Given this premise, ECS defaults all text indexing to keyword at the top level (with very few exceptions). Any use case that requires full text search indexing on additional fields can simply add a multi-field for full text search. Doing so does not conflict with ECS, as the canonical field name will remain keyword indexed.
ECS multi-field convention for text:
- Canonical field: myfield is keyword
- Multi-field: myfield.text is text
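As an illustration of the ECS convention, the sketch below builds the properties fragment of an Elasticsearch mapping in which myfield is a keyword and full text search is added as a myfield.text multi-field. The helper ecs_text_mapping is hypothetical, and how the fragment is wrapped into a complete index template depends on your Elasticsearch version.

```python
import json


def ecs_text_mapping(with_full_text: bool = False) -> dict:
    """Mapping for one field: keyword at the canonical name,
    optionally with a nested "text" multi-field for full text search."""
    mapping = {"type": "keyword"}
    if with_full_text:
        mapping["fields"] = {"text": {"type": "text"}}
    return mapping


properties = {"myfield": ecs_text_mapping(with_full_text=True)}
print(json.dumps({"properties": properties}, indent=2))
```

Queries and aggregations on myfield keep their keyword semantics, while full text queries can target myfield.text.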
The only exceptions to this convention are the fields message and error.message, which are indexed for full text search only, with no multi-field. These two fields don't follow the new convention because changing them is deemed too big of a breaking change, given how widely they are used in Beats. Any future field that will be indexed for full text search in ECS will however follow the multi-field convention where text indexing is nested in the multi-field.
Despite the fact that IDs and codes (e.g. error codes) are often integers, this is not always the case. Since we want to make it possible to map as many systems and data sources to ECS as possible, we default to using the keyword type for IDs and codes.
Some specific kinds of codes are always integers, like HTTP status codes. If those have a specific corresponding field (as HTTP status does), its type can safely be an integer type. But generic fields like error.code cannot have this guarantee, and are therefore keyword.
The benefits to a user adopting these fields and names in their clusters are:
- Data correlation. Ability to easily correlate data from the same or different sources, including:
  - data from metrics, logs, and apm
  - data from the same machines/hosts
  - data from the same service
- Ease of recall. Improved ability to remember commonly used field names (because there is a single set, not a set per data source)
- Ease of deduction. Improved ability to deduce field names (because the field naming follows a small number of rules with few exceptions)
- Reuse. Ability to re-use analysis content (searches, visualizations, dashboards, alerts, reports, and ML jobs) across multiple data sources
- Future proofing. Ability to use any future Elastic-provided analysis content in your environment without modifications
The rename processor can help you resolve field conflicts. For example, imagine that you already have a field called "user", but ECS employs user as an object. You can use the rename processor at ingest time to rename your field to the matching ECS field. If your field does not match ECS, you can rename your field to user.value instead.
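A possible shape for such a pipeline is sketched below as a Python dict. The rename processor's field and target_field options are real, but the pipeline as a whole is an untested assumption. The function simulate_rename is only a local stand-in that mimics the effect on a plain dict so the result can be inspected without a cluster. Unlike the real processor, it silently skips documents that lack the field.

```python
import copy

# Assumed ingest pipeline body: move a scalar "user" field to "user.value".
RENAME_PIPELINE = {
    "description": "Move a conflicting scalar user field under the ECS user object",
    "processors": [
        {"rename": {"field": "user", "target_field": "user.value"}},
    ],
}


def simulate_rename(doc: dict, field: str, target_field: str) -> dict:
    """Local approximation of a rename for a top-level source field."""
    result = copy.deepcopy(doc)
    if field not in result:
        return result
    value = result.pop(field)
    # Walk (and create) the parent objects of the dotted target name.
    *parents, leaf = target_field.split(".")
    node = result
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return result


doc = {"user": "nicolas ruflin", "message": "login ok"}
for processor in RENAME_PIPELINE["processors"]:
    doc = simulate_rename(doc, **processor["rename"])
print(doc)
```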
Events may contain fields in addition to ECS fields. These fields can follow the ECS naming and writing rules, but this is not a requirement.
There are two common key formats for ingesting data into Elasticsearch:
- Dot notation: user.firstname: Nicolas, user.lastname: Ruflin
- Underline notation: user_firstname: Nicolas, user_lastname: Ruflin
For ECS we decided to use the dot notation. Here's some background on this decision.
Ingesting user.firstname: Nicolas and user.lastname: Ruflin is identical to ingesting the following JSON:
"user": {
"firstname": "Nicolas",
"lastname": "Ruflin"
}
In Elasticsearch, user is represented as an object datatype. In the case of the underline notation, both are just string datatypes.
NOTE: ECS does not use nested datatypes, which are arrays of objects.
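The equivalence between dotted keys and nested objects described above can be shown with a small sketch. The function expand_dots is a hypothetical helper that turns a flat dict with dotted keys into the nested structure, while keys in underline notation are left untouched.

```python
def expand_dots(flat: dict) -> dict:
    """Expand dotted keys into nested objects; other keys are left as they are."""
    nested = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for key in parents:
            # Each prefix becomes (or reuses) an object.
            node = node.setdefault(key, {})
        node[leaf] = value
    return nested


print(expand_dots({"user.firstname": "Nicolas", "user.lastname": "Ruflin"}))
print(expand_dots({"user_firstname": "Nicolas", "user_lastname": "Ruflin"}))
```

The first call yields a single user object with two keys. The second yields two independent top-level string fields.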
With dot notation, each prefix in Elasticsearch is an object. Each object can have parameters that control how fields inside the object are treated. In the context of ECS, for example, these parameters would allow you to disable dynamic property creation for certain prefixes.
Individual objects give you more flexibility on both the ingest and the event sides. In Elasticsearch, for example, you can use the remove processor to drop complete objects instead of selecting each key inside. You don't have to know ahead of time which keys will be in an object.
In Beats, you can simplify the creation of events. For example, you can treat each object as an object (or struct in Golang), which makes constructing and modifying each part of the final event easier.
In Elasticsearch, each key can only have one type. For example, if user is an object, you can't use it as a keyword type in the same index, like {"user": "nicolas ruflin"}. This restriction can be an issue in certain datasets. For the ECS data itself, this is not an issue because all fields are predefined.
Mixing the underline notation with the ECS dot notation is not a problem. As long as there are no conflicts, they can coexist in the same document.