Skip to content

[Doris On ES] [Bug] Error handling for date field with docvalue and inconsistent result for source and docvalue #3936

@wuyunfeng

Description

@wuyunfeng
  1. Before ES 5.x request docvalue fields for date field, the Millisecond timestamp would returned for client, but after 6.4.0 all field which fetched from docvalue would be formatted , this also can be found Elasticsearch issue: Option to allow docvalue_fields of date types to return ms elastic/elasticsearch#27740, the format for date default is just 2020-06-18T12:10:30.000Z.

  2. If we index some date field with second timestamp , and query table with enable_docvalue_scan true and false, users would get inconsistent result because the different processing logic for date.

ES documents:

       {
            "_index": "timestamp_test",
            "_type": "doc",
            "_id": "AXLbzdJY516Vuc7SL51m",
            "_score": 1,
            "_source": {
               "k1": "2020-6-25"
            }
         },
         {
            "_index": "timestamp_test",
            "_type": "doc",
            "_id": "AXLbzddn516Vuc7SL51n",
            "_score": 1,
            "_source": {
               "k1": 1592816393000
            }
         }
CREATE EXTERNAL TABLE `timestamp_source` (
  `k1` date NULL COMMENT ""
) ENGINE=ELASTICSEARCH

a. create "enable_docvalue_scan" = "false" table :

For ES 5.5:

mysql> select k1 from timestamp_source;
+------------+
| k1         |
+------------+
| 2020-06-25 | // origin _source value: "2020-6-25"       ----- right
| d444-05-05 | // origin _source value: 1593043200000 (millisecond)   ----- wrong should divided by 1000
+------------+

For ES 6.5 or above:

mysql> select * from timestamp_source;
+------------+
| k1         |
+------------+
| 2020-06-25 | // origin _source value: "2020-6-25"       ----- right
| d444-05-05 | // origin _source value: 1593043200000 (millisecond)   ----- wrong should divided by 1000
+------------+

b. a. create "enable_docvalue_scan" = "true" table :

For ES 5.5:

mysql> select k1 from timestamp_dv; 
+------------+
| k1         |
+------------+
| d451-07-13 |   // 1593043200000   ----- wrong, should divided by 1000
| d444-05-05 | //  1592816393000   ----- wrong, should divided by 1000 
+------------+

For ES 6.5 or above:

mysql> select * from timestamp_dv; 
+------------+
| k1         |
+------------+
| 1970-01-01 | //  value: "2020-06-25T00:00:00.000Z"  ----- wrong GetInt64(value) = 24
| 1970-01-01 | // value: "2020-06-22T08:59:53.000Z"   ----- wrong GetInt64(value) = 24
+------------+

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions