First take on a comprehensive ingest guide #1373

Open · wants to merge 6 commits into base: main · Changes from 2 commits
2 changes: 2 additions & 0 deletions docs/en/ingest-arch/index.asciidoc
@@ -18,6 +18,8 @@ include::8-ls-input.asciidoc[]

include::99-airgapped.asciidoc[]

include::../ingest-guide/index.asciidoc[]

// === Next set of architectures
// include::3-schemamod.asciidoc[]
// include::6b-filebeat-es.asciidoc[]
18 changes: 18 additions & 0 deletions docs/en/ingest-guide/index.asciidoc
@@ -0,0 +1,18 @@
include::{docs-root}/shared/versions/stack/{source_branch}.asciidoc[]
include::{docs-root}/shared/attributes.asciidoc[]

:doctype: book

[[ingest-guide]]
= Elastic Ingest Guide

include::ingest-intro.asciidoc[]
include::ingest-tools.asciidoc[]
include::ingest-static.asciidoc[]
include::ingest-timestamped.asciidoc[]
include::ingest-solutions.asciidoc[]
include::ingest-faq.asciidoc[]

//include:: Prereqs (for using data after ingest)
//include:: Migration for ingest
//include:: Troubleshooting
77 changes: 77 additions & 0 deletions docs/en/ingest-guide/ingest-faq.asciidoc
@@ -0,0 +1,77 @@
[[ingest-faq]]
== Frequently Asked Questions

Q: What Elastic products and tools are available for ingesting data into {es}?

Q: What's the best option for ingesting data?

Q: What's the role of Logstash `filter-elastic-integration`?



.WORK IN PROGRESS
****
Temporary parking lot to capture outstanding questions and notes.
****



Also cover (here or in general outline):

- https://www.elastic.co/guide/en/kibana/master/connect-to-elasticsearch.html#_add_sample_data[Sample data]
- OTel
- Beats
- Use case: GeoIP
- Airgapped
- Place for table, also adding use case + products (Example: Logstash for multi-tenant)
- Role of LS in general content use cases



[discrete]
=== Questions to answer:

* Messaging for data sources that don't have an integration
- We're de-emphasizing Beats in preparation for deprecation
- We're not quite there with OTel yet
* How should we handle this in the near term?
It probably doesn't make sense to either ignore these sources or jump users straight to Logstash.

* Should we mention Fleet and Stand-alone agent?
** If so, when, where, and how?
* How does this relate to Ingest Architectures?
* Enrichment for general content

* How to message current vs. desired state.
Especially Beats and OTel.
* HOW TO MESSAGE OTel - Current state. Future state.
* Consistent use of terminology vs. matching users' vocabulary (keywords)

[discrete]
==== Random

* DocsV3 - need for a sheltered space to develop new content
** Related: https://github.com/elastic/docsmobile/issues/708
** Need a place to incubate a new doc (previews, links, etc.)
** Refine messaging in private


[discrete]
=== Other resources to use, reference, reconcile

* Timeseries decision tree (needs updates)
* PM's video
** Needs an update. (We might relocate content before updating.)
* PM's product table
** Needs an update. (We might relocate content before updating.)
** Focuses on Agent over integrations.
** Same link text resolves to different locations.
** Proposal: Harvest the good and possibly repurpose the table format.
* Ingest Reference architectures
* Linkable content such as Beats? Solutions ingest resources?

* https://www.elastic.co/guide/en/starting-with-the-elasticsearch-platform-and-its-solutions/current/getting-started-guides.html[Starting with the Elastic Platform and Solutions]
* https://www.elastic.co/guide/en/observability/current/observability-get-started.html[Get started with Elastic Observability]
* https://www.elastic.co/guide/en/security/current/ingest-data.html[Ingest data into Elastic Security]

42 changes: 42 additions & 0 deletions docs/en/ingest-guide/ingest-intro.asciidoc
@@ -0,0 +1,42 @@
[discrete]
[[ingest-intro]]
== Ingesting data into {es}

Bring your data!
Whether you call it _adding_, _indexing_, or _ingesting_ data, you have to get
the data into {es} before you can search it, visualize it, and use it for insights.

Our ingest tools are flexible, and support a wide range of scenarios.
We can help you with everything from popular, straightforward use cases all the
way to advanced use cases that require additional processing to modify or
reshape your data before sending it to {es}.

You can ingest:

* **General content** (data without timestamps), such as HTML pages, catalogs, and files
* **Timestamped (time series) data**, such as logs, metrics, and traces for Search, Security, Observability, or your own solution

[[ingest-best-approach]]
.What's the best approach for ingesting data?
****
The best choice for ingesting data is the _simplest option_ that _meets your needs_ and _satisfies your use case_.

**General content**. Choose the ingest tool that aligns with your data source.

* To index **documents** directly into {es}, use the {es} document APIs.
* To send **application data** directly to {es}, use an Elastic language client.
* To index **web page content**, use the Elastic web crawler.
* To sync **data from third-party sources**, use connectors.
* To index **single files** for testing, use the Kibana file uploader.

If you would like to play around before you add your own data, try using our {kibana-ref}/connect-to-elasticsearch.html#_add_sample_data[sample data].

**Timestamped data**. Start with {fleet-guide}[Elastic Agent] and one of the hundreds of {integrations-docs}[Elastic integrations] that are available.
Integrations are available for many popular platforms and services, and are a good place to start.
Check out the {integrations-docs}/all_integrations[Integration quick reference] to search for available integrations.
If you don't find an integration for your data source, or if you need additional processing, we still have you covered.
****
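The {es} document APIs mentioned above accept documents one at a time or in batches through the `_bulk` endpoint, which takes a newline-delimited JSON (NDJSON) body: one action line followed by one source line per document. A minimal, stdlib-only sketch of how that body is assembled (the `catalog` index name and the documents are illustrative):

```python
import json

def build_bulk_body(index, docs):
    """Assemble an NDJSON payload for the Elasticsearch _bulk API:
    one action line, then one source line, per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                           # source line
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline

body = build_bulk_body("catalog", [
    {"title": "Getting started"},
    {"title": "Ingest overview"},
])
# POST this body to /_bulk with Content-Type: application/x-ndjson
```

The language clients wrap this same format in bulk helpers, so hand-assembly like this is mainly useful for understanding what goes over the wire.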




47 changes: 47 additions & 0 deletions docs/en/ingest-guide/ingest-solutions.asciidoc
@@ -0,0 +1,47 @@
[[ingest-for-solutions]]
== Ingesting data for Elastic solutions

.WORK IN PROGRESS
****
For MVP: Add descriptions and links to existing solutions content.
****

[discrete]
[[ingest-for-obs]]
=== Observability
Monitor and gain insights

* Logs
* Metrics
* Application traces

Add links to targeted ingest resources in Observability docs

[discrete]
[[ingest-for-security]]
=== Security
Detect and respond to threats

* Logs
* Metrics
* SIEM
* Endpoint
* Files



https://www.elastic.co/guide/en/security/current/ingest-data.html[Ingest data into Elastic Security]:

* Elastic Agent + integrations (spotlight Defend integration)
* Beats
* Elastic Agent from Splunk
* Third-party collectors + ECS


[discrete]
[[ingest-for-search]]
=== Search

* https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started.html[Adding data with APIs]
* https://www.elastic.co/guide/en/fleet/current/beats-agent-comparison.html#additional-capabilities-beats-and-agent[Elasticsearch ingest pipelines]

39 changes: 39 additions & 0 deletions docs/en/ingest-guide/ingest-static.asciidoc
@@ -0,0 +1,39 @@
[[intro-general]]
== Ingesting general content

Describe general content (non-timestamped) and give examples.

.WORK IN PROGRESS
****
Progressive disclosure: Start with basic use cases and work up to advanced processing

Possibly repurpose and use ingest decision tree with Beats removed?
****

[discrete]
=== Basic use cases

* {es} document APIs for documents.
* Elastic language clients for application data.
* Elastic web crawler for web page content.
* Connectors for data from third-party sources, such as Slack.
* Kibana file uploader for individual files.
* LOGSTASH???
** ToDO: Check out Logstash enterprisesearch-integration

* To index **documents** directly into {es}, use the {es} document APIs.
* To send **application data** directly to {es}, use an Elastic language client.
* To index **web page content**, use the Elastic web crawler.
* To sync **data from third-party sources**, use connectors.
* To index **single files** for testing, use the Kibana file uploader.

[discrete]
=== Advanced use cases: Data enrichment and transformation

Tools for enriching ingested data:

* Logstash for GeoIP enrichment. Other examples?
** Use enterprisesearch input -> filter(s) -> {es} or enterprisesearch output
* What else?
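Enrichment doesn't have to happen in Logstash; an {es} ingest pipeline can do it server-side. As an illustration, a pipeline using the `geoip` processor (the pipeline name and the `client_ip` field below are placeholders, not recommendations):

```
PUT _ingest/pipeline/geoip-enrich
{
  "description": "Add location details based on a client IP address",
  "processors": [
    {
      "geoip": {
        "field": "client_ip"
      }
    }
  ]
}
```

Documents indexed with `?pipeline=geoip-enrich` then carry the looked-up location details (by default the processor writes them to a `geoip` field).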


104 changes: 104 additions & 0 deletions docs/en/ingest-guide/ingest-timestamped.asciidoc
@@ -0,0 +1,104 @@
[[intro-timeseries]]
== Ingesting time series data

.WORK IN PROGRESS
****
Progressive disclosure: Start with basic use cases and work up to advanced processing

Possibly repurpose and use ingest decision tree with Beats removed?
****

Timestamped data:
The preferred way to index timestamped data is to use Elastic Agent. Elastic Agent is a single, unified way to add monitoring for logs, metrics, and other types of data to a host. It can also protect hosts from security threats, query data from operating systems, and forward data from remote services or hardware. Each Elastic Agent-based integration includes default ingestion rules, dashboards, and visualizations so that you can start analyzing your data right away. Fleet enables you to centrally manage all of your deployed Elastic Agents from Kibana.

If no Elastic Agent integration is available for your data source, use Beats to collect your data. Beats are data shippers designed to collect and ship a particular type of data from a server. You install a separate Beat for each type of data to collect. Modules that provide default configurations, Elasticsearch ingest pipeline definitions, and Kibana dashboards are available for some Beats, such as Filebeat and Metricbeat. No Fleet management capabilities are provided for Beats.

If neither Elastic Agent nor Beats supports your data source, use Logstash. Logstash is an open source data collection engine with real-time pipelining capabilities that supports a wide variety of data sources. You might also use Logstash to persist incoming data to ensure that data is not lost during an ingestion spike, or to send the data to multiple destinations.

---> Basic diagram
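The preference order described above (Elastic Agent integration first, then Beats, then Logstash) can be sketched as a tiny decision helper. The function and its inputs are illustrative only, not a real API:

```python
def recommend_ingest_tool(has_integration: bool, beats_module_available: bool) -> str:
    """Illustrative encoding of the preference order described above."""
    if has_integration:
        return "Elastic Agent + integration"  # preferred for timestamped data
    if beats_module_available:
        return "Beats"                        # fallback when no integration exists
    return "Logstash"                         # catch-all for unsupported sources
```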

[discrete]
=== Basic use case: Integrations to ES

Reiterate Integrations as basic ingest use case

ToDo: evaluate terminology (basic???)


[discrete]
=== Advanced use case: Integration to Logstash to ES

Highlight logstash-filter-elastic_integration capabilities


[discrete]
=== Other advanced use cases (from decision tree)

* Agent + Agent processors???
* Agent + Runtime fields???



// CONTENT LIFTED FROM former `TOOLS` topic


[discrete]
=== Elastic Agent and Elastic integrations
The best choice for ingesting data is the _simplest option_ that _meets your needs_ and _satisfies your use case_.
For many popular ingest scenarios, the best option is Elastic Agent and Elastic integrations.

* Elastic Agent is installed on the endpoints where you want to collect data.
Elastic Agent collects the data from one or more endpoints, and forwards the data to the service or location where it is used.
* An Elastic integration receives that data from agents.

TIP: Start here!
Elastic Agent for data collection paired with Elastic integrations is the best ingest option for most use cases.


[discrete]
=== OTel
Coming on strong. Where are we now, and cautiously explain where we're going in the near term.

OpenTelemetry is a leading standard for collecting Observability data.

Elastic is a supporting member.
We're contributing to the OTel project, and are using elastic/opentelemetry for specialized development not applicable to upstream.

* https://www.elastic.co/guide/en/observability/current/apm-open-telemetry.html

Contributing to upstream, and doing our own work specific to Elastic:
* https://github.com/open-telemetry/opentelemetry-collector-contrib
* https://github.com/elastic/opentelemetry

[discrete]
=== Logstash

{ls} is an open source data collection engine with real-time pipelining capabilities.
It supports a wide variety of data sources, and can dynamically unify data from disparate sources and normalize the data into destinations of your choice.

{ls} can collect data using a variety of {ls} input plugins, enrich and transform the data with {ls} filter plugins, and output the data to {es} and other destinations using the {ls} output plugins.

You can use Logstash to extend Beats for advanced use cases, such as routing data to multiple destinations or making your data persistent.

* {ls} input for when no integration is available
* {ls} integrations filter for advanced processing
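The input -> filter -> output shape described above looks like this in a minimal {ls} pipeline configuration. The port, source field, and hosts are placeholders, not recommendations:

```
input {
  beats {
    port => 5044            # receive events from Beats or Elastic Agent
  }
}
filter {
  geoip {
    source => "client_ip"   # enrich events with location data
  }
}
output {
  elasticsearch {
    hosts => ["https://localhost:9200"]
  }
}
```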

TIP: If an integration is available for your data source, start with Elastic Agent plus the integration.

Use {ls} when:

* No integration is available for your data source (use a {ls} input plugin).
* An Elastic integration exists, but you need advanced processing between the Elastic integration and {es} (use the {ls} integrations filter).

Advanced use cases solved by {ls}:

* {ls} for https://www.elastic.co/guide/en/ingest/current/ls-enrich.html[data enrichment] before sending data to {es}
* https://www.elastic.co/guide/en/ingest/current/lspq.html[{ls} Persistent Queue (PQ) for buffering]
* https://www.elastic.co/guide/en/ingest/current/ls-networkbridge.html[{ls} as a proxy] when there are network restrictions that prevent connections between Elastic Agent and {es}
* https://www.elastic.co/guide/en/ingest/current/ls-multi.html[{ls} for routing data to multiple {es} clusters and additional destinations]
* https://www.elastic.co/guide/en/ingest/current/agent-proxy.html[{ls} as a proxy for Elastic Agent]
