Honor default_pipeline on scripted _index rewrites #42019

Open
peterpramb opened this issue May 9, 2019 · 9 comments

Labels
:Data Management/Ingest Node (Execution or management of Ingest Pipelines including GeoIP)
>enhancement
Team:Data Management (Meta label for data/management team)

Comments

@peterpramb

Describe the feature:

Ingest pipelines allow for rewriting _index using the script processor (and possibly others), dynamically dispatching documents to different indices. Unfortunately, while index templates are considered for the new target index, a defined default_pipeline is not executed.

Honoring it would allow more flexible pipeline chaining: new index templates could simply be added when needed, instead of updating the central ingest pipeline with new target pipelines every time. Another use case would be to specify an additional pipeline for some indices only and none for others.

It will be the responsibility of the user to prevent any circular loops, though.

@peterpramb
Author

Consider the following example:

Events are ingested into the index event-ingest, which has a default pipeline set in its index template. The pipeline examines event.type (which contains the originating software) and reroutes the event to the index event-<event.type>. That index might in turn have another default pipeline set in its index template when further processing is needed (and potentially another index rewrite), or none if no further processing is needed.
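
A minimal sketch of this setup, assuming hypothetical names (event-dispatch-pipeline, event-ingest-template) and assuming the originating software is stored in a nested service.type field, per the author's correction later in this thread:

```console
# Hypothetical names; assumes every document carries a nested service.type field
PUT /_ingest/pipeline/event-dispatch-pipeline
{
    "description": "Hypothetical dispatcher: reroute each event to event-<service.type>",
    "processors": [
        {
            "script": {
                "lang": "painless",
                "source": "ctx._index = 'event-' + ctx.service.type;"
            }
        }
    ]
}

PUT /_template/event-ingest-template
{
    "index_patterns": [
        "event-ingest"
    ],
    "settings": {
        "index": {
            "default_pipeline": "event-dispatch-pipeline"
        }
    }
}
```

With the requested behavior, a template matching one of the event-<service.type> indices could declare its own default_pipeline without any change to the dispatcher.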

There is no need to update the central ingest pipeline and put a long list of conditional pipeline processors there; the flow is controlled by index templates alone.
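
For contrast, the alternative being avoided might look roughly like the sketch below, where the central pipeline needs one conditional pipeline processor per event source. The pipeline names and the service.type values are hypothetical and only illustrate the shape of the list:

```console
# Hypothetical names and values; every new event source requires another entry here
PUT /_ingest/pipeline/event-central-pipeline
{
    "description": "Hypothetical central pipeline with one conditional entry per event source",
    "processors": [
        {
            "pipeline": {
                "if": "ctx.service?.type == 'apache'",
                "name": "event-apache-pipeline"
            }
        },
        {
            "pipeline": {
                "if": "ctx.service?.type == 'nginx'",
                "name": "event-nginx-pipeline"
            }
        }
    ]
}
```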

But as already mentioned, it will be the responsibility of the user to prevent any circular loops in such a setup.

@peterpramb
Author

That should really be service.type, sorry...

@jakelandis added the :Data Management/ILM+SLM label on May 10, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-features

@jakelandis added the :Data Management/Ingest Node label and removed the :Data Management/ILM+SLM label on May 10, 2019
@jakelandis
Contributor

@peterpramb - I believe that #39607 (as of 6.7) addresses your request. Can you try your test case on 6.7+ and, if it still doesn't work, provide a reproduction scenario?

@peterpramb
Author

peterpramb commented May 12, 2019

Unfortunately, I'm already on 6.7.1.

Here is a simple test case:

  1. Ingest pipeline
  • Pipeline
PUT /_ingest/pipeline/testing-ingest-pipeline
{
    "description": "Chained pipelines via index templates (#42019)",
    "processors": [
        {
            "append": {
                "field": "pipeline_set",
                "value": "ingest"
            }
        },
        {
            "script": {
                "lang": "painless",
                "source": "ctx._index = 'testing-index-chained';"
            }
        }
    ],
    "version": 20190512
}
  • Template
PUT /_template/testing-ingest-template
{
    "index_patterns": [
        "testing-index-ingest"
    ],
    "version": 20190512,
    "order": 0,
    "settings": {
        "index": {
            "default_pipeline": "testing-ingest-pipeline"
        }
    }
}
  2. Chained pipeline
  • Pipeline
PUT /_ingest/pipeline/testing-chained-pipeline
{
    "description": "Chained pipelines via index templates (#42019)",
    "processors": [
        {
            "append": {
                "field": "pipeline_set",
                "value": "chained"
            }
        },
        {
            "script": {
                "lang": "painless",
                "source": "ctx._index = 'testing-index-final';"
            }
        }
    ],
    "version": 20190512
}
  • Template
PUT /_template/testing-chained-template
{
    "index_patterns": [
        "testing-index-chained"
    ],
    "version": 20190512,
    "order": 0,
    "settings": {
        "index": {
            "default_pipeline": "testing-chained-pipeline"
        }
    }
}
  3. Testing
  • Ingest document
POST /testing-index-ingest/_doc/
{
    "field": "value"
}
  • Result
{
    "_id": "GWvEq2oBkBB05GCNyfzM",
    "_index": "testing-index-chained",
    "_primary_term": 1,
    "_seq_no": 0,
    "_shards": {
        "failed": 0,
        "successful": 2,
        "total": 2
    },
    "_type": "_doc",
    "_version": 1,
    "result": "created"
}
  • Retrieve document
GET /testing-index-chained/_doc/GWvEq2oBkBB05GCNyfzM
{
    "_id": "GWvEq2oBkBB05GCNyfzM",
    "_index": "testing-index-chained",
    "_primary_term": 1,
    "_seq_no": 0,
    "_source": {
        "field": "value",
        "pipeline_set": [
            "ingest"
        ]
    },
    "_type": "_doc",
    "_version": 1,
    "found": true
}

@peterpramb
Author

peterpramb commented May 12, 2019

And this is the resulting index:

health status index                 uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   testing-index-chained Lou5CsZ3R7iImP9TF0qdYg   2   1          1            0      8.7kb          4.3kb

It should be testing-index-final instead.
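
One way to check whether the template was applied to the rewritten target index, even though its pipeline did not run, is to inspect the index settings. This is only a sketch and assumes the index was auto-created by the test document above; if the template matched, index.default_pipeline would be expected to show testing-chained-pipeline.

```console
# Assumes testing-index-chained was auto-created by the test document above
GET /testing-index-chained/_settings
```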

@peterpramb
Author

The Elasticsearch version:

Version: 6.7.1, Build: default/tar/2f32220/2019-04-02T15:59:27.961366Z, JVM: 1.8.0_202

@peterpramb
Author

This is still not working in 7.2.0; chained pipelines are simply ignored.

@rjernst added the Team:Data Management label on May 4, 2020
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)
