
[DISCUSS] Documentation: ingestion process #16914

Open
pkovac2 opened this issue Dec 27, 2024 · 1 comment
Labels
bug Something isn't working Other untriaged

Comments

@pkovac2

pkovac2 commented Dec 27, 2024

Ingest process information missing

Hey all,

we recently deployed multiple instances of OpenSearch via the K8s operator. We are fairly new to OpenSearch, so we're trying to understand how things work under the hood so that we can investigate properly in case of problems. At the moment we're trying to understand how the data ingestion process works internally in OpenSearch. Unfortunately, the OpenSearch documentation says literally nothing about how the data ingestion process is handled in detail. Our current setup looks like this (we don't use Data Prepper):

FluentBit -> OpenSearch ingest service LB -> Dedicated ingest nodes -> ?? (master-> data nodes)

What we are trying to understand is:

  1. Do we need dedicated ingest nodes if there's no ingest pipeline configured? Based on the docs, dedicated ingest nodes are only useful for running ingest pipelines; is there anything else dedicated ingest nodes do?
  2. How does the ingest process work in general? Could this be described in detail in the official documentation? We'd like to understand the whole process. Let's say we have an OS cluster with:

3 dedicated cluster manager nodes
3 dedicated ingest nodes
x dedicated data nodes

How does the ingest flow look once the data is received? I'd assume it's something like:
ingest node -> ask master node which data node(s) to use -> data node(s).

But this is not described in the documentation at all (nor is who decides which data nodes to use, or how).
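For what it's worth, the "which data node" decision is usually described as deterministic hash-based routing rather than a per-document query to the cluster manager. A minimal sketch, assuming the classic formula `shard = hash(_routing) % num_primary_shards` (the real implementation uses a Murmur3 hash of the `_routing` value, which defaults to the document `_id`; `hashlib.md5` here is just a stand-in):

```python
# Simplified sketch of how a document is mapped to a primary shard.
# NOTE: OpenSearch actually uses a Murmur3 hash of the _routing value;
# md5 is only a stand-in to keep this example dependency-free.
import hashlib

def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    """Return the primary shard index that would receive this document."""
    h = int.from_bytes(hashlib.md5(doc_id.encode()).digest()[:4], "big")
    return h % num_primary_shards

# Any node holding the cluster state can run this computation locally,
# so the coordinating/ingest node forwards the document directly to the
# data node hosting that primary shard; the cluster manager is not
# consulted per document.
for doc_id in ["log-1", "log-2", "log-3"]:
    print(doc_id, "-> shard", pick_shard(doc_id, 5))
```

The key point the sketch illustrates: routing is a pure function of the routing value and the shard count, which is also why the number of primary shards cannot be changed on an existing index without reindexing.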

Also, based on the official docs, every node is a coordinating node unless dedicated coordinating nodes are specified. How do we measure, or on what basis do we decide, whether dedicated coordinating nodes are necessary?

I think this issue can also be considered a documentation request.

Many thanks!

Related component

Other

To Reproduce

  1. Visit https://opensearch.org/docs/latest/
  2. Search for ingestion-process-related docs, e.g. https://opensearch.org/docs/latest/observing-your-data/log-ingestion/
  3. No detailed information is found

Expected behavior

  1. Visit https://opensearch.org/docs/latest/
  2. The ingestion process is documented in detail

Additional Details

No response

@pkovac2 pkovac2 added bug Something isn't working untriaged labels Dec 27, 2024
@github-actions github-actions bot added the Other label Dec 27, 2024
@kkewwei
Contributor

kkewwei commented Jan 5, 2025

@pkovac2.

  1. If there's no ingest pipeline configured, ingest nodes are not needed.

  2. The ingest process works as follows:

  • The coordinating node receives the document with a pipeline and forwards the request to an ingest node.
  • The ingest node resolves the pipeline and builds a new request.
  • The ingest node classifies the documents by shard ID and then sends the requests out.
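The last step above can be sketched roughly as follows: after the pipeline's processors have run, the documents in a bulk request are bucketed by target shard so each data node receives a single sub-request. This is an illustrative sketch, not OpenSearch code; the stand-in hash replaces the Murmur3 routing hash used in practice:

```python
# Sketch: classify documents by shard ID and batch them per shard,
# as an ingest/coordinating node does before fanning out a bulk request.
# The hash is a stand-in; OpenSearch uses Murmur3 on the _routing value.
from collections import defaultdict
import hashlib

def shard_for(doc_id: str, num_shards: int) -> int:
    h = int.from_bytes(hashlib.md5(doc_id.encode()).digest()[:4], "big")
    return h % num_shards

def group_by_shard(doc_ids, num_shards):
    """Bucket document ids into per-shard sub-requests."""
    groups = defaultdict(list)
    for doc_id in doc_ids:
        groups[shard_for(doc_id, num_shards)].append(doc_id)
    return dict(groups)

batches = group_by_shard(["a", "b", "c", "d", "e"], num_shards=3)
# Each key is a shard id; each value is the list of docs sent to the
# data node holding that shard's primary.
print(batches)
```

One sub-request per shard is what keeps a bulk index efficient: each data node applies its batch to the primary shard locally and replicates it, without any per-document coordination.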
