[BUG] Grok processor only extracts the first matched value for repeated field name #18790

@gaobinlong

Describe the bug

When the same field name appears more than once in a grok pattern, the grok processor extracts only the first matched value and silently drops all the others, which is not the expected behavior.

Related component

Indexing

To Reproduce

Run the simulate ingest pipeline API with a pattern that uses the field name x_forwarded_for three times:

curl -XPOST "http://localhost:9200/_ingest/pipeline/_simulate" -H 'Content-Type: application/json' -d'
{
  "pipeline": {
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": [
            "%{IP:client_ip} %{USER:ident} %{USER:auth} %{HTTPDATE:timestamp} \"%{WORD:method} %{URIPATHPARAM:request} HTTP/%{NUMBER:http_version}\" %{NUMBER:response} %{NUMBER:bytes} \"%{DATA:referrer}\" \"%{DATA:user_agent}\" \"%{IP:x_forwarded_for}, %{IP:x_forwarded_for}, %{IP:x_forwarded_for}\""
          ]
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "test",
      "_source": {
        "message": "203.0.113.57 - - 17/Jul/2025:13:55:23 +0000 \"GX /index.html HTTP/1.1\" 200 654 \"-\" \"Mozilla/5.0\" \"198.51.100.1, 203.0.113.5, 10.0.0.2\""
      }
    }
  ]
}'

Check the result: the field x_forwarded_for contains only the first matched value, 198.51.100.1. The other two matched values, 203.0.113.5 and 10.0.0.2, are dropped.

{
  "docs": [
    {
      "doc": {
        "_index": "test",
        "_id": "_id",
        "_source": {
          "request": "/index.html",
          "method": "GX",
          "auth": "-",
          "ident": "-",
          "http_version": "1.1",
          "message": "203.0.113.57 - - 17/Jul/2025:13:55:23 +0000 \"GX /index.html HTTP/1.1\" 200 654 \"-\" \"Mozilla/5.0\" \"198.51.100.1, 203.0.113.5, 10.0.0.2\"",
          "x_forwarded_for": "198.51.100.1",
          "referrer": "-",
          "response": "200",
          "bytes": "654",
          "client_ip": "203.0.113.57",
          "user_agent": "Mozilla/5.0",
          "timestamp": "17/Jul/2025:13:55:23 +0000"
        },
        "_ingest": {
          "timestamp": "2025-07-18T02:53:29.016554Z"
        }
      }
    }
  ]
}

Expected behavior

In the case above, x_forwarded_for should be an array containing all three matched values, like this:

"x_forwarded_for": [
            "198.51.100.1",
            "203.0.113.5",
            "10.0.0.2"
          ],
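
The merge behavior requested above can be sketched outside OpenSearch. The following is only an illustration of the expected semantics, not the grok processor's implementation: `IPV4` is a simplified stand-in for the grok IP pattern, and `merge_captures` and `extract` are hypothetical helper names. Because Python's `re` module rejects duplicate group names, each repeat gets a unique internal name that is mapped back to the shared field name afterwards.

```python
import re

# Simplified stand-in for the grok IP pattern (assumption, IPv4 only).
IPV4 = r"\d{1,3}(?:\.\d{1,3}){3}"

# Repeats of x_forwarded_for get unique internal names, since Python's re
# module does not allow the same group name twice in one pattern.
PATTERN = re.compile(
    rf'"(?P<x_forwarded_for__0>{IPV4}), '
    rf'(?P<x_forwarded_for__1>{IPV4}), '
    rf'(?P<x_forwarded_for__2>{IPV4})"$'
)


def merge_captures(pairs):
    """Collapse (field, value) pairs in match order.

    A field seen once keeps its scalar value; a field seen more than once
    becomes a list holding every value in order.
    """
    merged = {}
    for field, value in pairs:
        if field not in merged:
            merged[field] = value
        elif isinstance(merged[field], list):
            merged[field].append(value)
        else:
            merged[field] = [merged[field], value]
    return merged


def extract(text):
    """Hypothetical helper: match the trailing IP list and merge repeats."""
    match = PATTERN.search(text)
    if match is None:
        return {}
    # Walk the groups in definition order and strip the internal suffix.
    ordered = sorted(PATTERN.groupindex.items(), key=lambda item: item[1])
    pairs = [(name.rsplit("__", 1)[0], match.group(name)) for name, _ in ordered]
    return merge_captures(pairs)


if __name__ == "__main__":
    tail = '"Mozilla/5.0" "198.51.100.1, 203.0.113.5, 10.0.0.2"'
    print(extract(tail))
```

With this merge rule, a field that matches once stays a plain string, so patterns without repeated names would produce the same output as today.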

Additional Details

Affects OpenSearch 3.1.0 and earlier versions.

Labels

Indexing, bug
