Skip to content

[BUG] Inconsistent behaviour for date field in nested collection queries #1545

@parked-toes

Description

@parked-toes

What is the bug?
When a query contains nested collection (self join) it returns different data and datatype, compared to a "simple" query.

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Create mapping:
PUT datetype_test
{
  "mappings": {
    "properties": {
      "dateAsDate": {
        "type": "date"
      },
      "longAsDate": {
        "type": "date"
      },      
      "id": {
        "type": "long"
      },
      "projects": {
        "type": "nested", 
        "properties": {
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword"
              }
            }
          }
        }
      }
    }
  }
}

Note that both date fields are mapped as date, projects is nested

  1. Index data:
POST datetype_test/_bulk?refresh
{"index":{"_id":"1"}}
{"id":3,"dateAsDate":"2023-01-01T00:00:00", "longAsDate": 1672531200000, "projects":[{"name":"SQL Spectrum querying"},{"name":"SQL security"},{"name":"OpenSearch security"}]}
{"index":{"_id":"2"}}
{"id":4,"dateAsDate":"2024-01-01T00:00:00", "longAsDate": 1704067200000, "projects":[]}
{"index":{"_id":"3"}}
{"id":6,"dateAsDate":"2025-01-01T00:00:00","longAsDate": 1735689600000, "projects":[{"name":"SQL security"},{"name":"Hello security"}]}
  1. Query date fields using simple query:
SELECT t.dateAsDate,
       t.longAsDate
FROM datetype_test AS t

Result:

{
  "schema": [
    {
      "name": "dateAsDate",
      "type": "timestamp"
    },
    {
      "name": "longAsDate",
      "type": "timestamp"
    }
  ],
  "datarows": [
    [
      "2023-01-01 00:00:00",
      "2023-01-01 00:00:00"
    ],
    [
      "2024-01-01 00:00:00",
      "2024-01-01 00:00:00"
    ],
    [
      "2025-01-01 00:00:00",
      "2025-01-01 00:00:00"
    ]
  ],
  "total": 3,
  "size": 3,
  "status": 200
}

Query the same date fields using nested collection query:

SELECT t.dateAsDate,
       t.longAsDate
FROM datetype_test AS t,
       t.projects AS p
{
  "schema": [
    {
      "name": "dateAsDate",
      "type": "date"
    },
    {
      "name": "longAsDate",
      "type": "date"
    }
  ],
  "total": 3,
  "datarows": [
    [
      "2023-01-01 00:00:00.000",
      1672531200000
    ],
    [
      "2024-01-01 00:00:00.000",
      1704067200000
    ],
    [
      "2025-01-01 00:00:00.000",
      1735689600000
    ]
  ],
  "size": 3,
  "status": 200
}

Note that there is a type change: timestamp -> date in schema and inconsistent resultset field formats.

Moreover, both require inconsistent date format in WHERE clause:

WHERE t.dateAsDate > "2023-05-01T00:00:00Z"

vs

WHERE t.dateAsDate > "2023-05-01 00:00:00"

What is the expected behavior?
All queries return date type in schema and "2025-01-01 00:00:00.000" like format for both date fields as defined in mapping.

What is your host/environment?
2.6.0

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions