Skip to content

Conversation

@BabySid
Copy link
Contributor

@BabySid BabySid commented May 7, 2020

#3479

Here I try to explain the cause of the problem and how to fix it.

The Cause of The problem
Take the case in issue(#3479 ) as an example:
The general results are as follows:

GET table/_doc/_search
{"query":{"match_all":{}},"stored_fields":"_none_","docvalue_fields":["k1"],"sort":["_doc"],"size":100}

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    ……
  },
  "hits": {
    "total": 3,
    "max_score": null,
    "hits": [
      {
        "_index": "table",
        "_score": null,
        "sort": [
          0
        ]
      },
      {
        "_index": "table",
        "_score": null,
        "fields": {
          "k1": [
            "kkk1"
          ]
        },
        "sort": [
          0
        ]
      },
      {
        "_index": "table",
        "_score": null,
        "sort": [
          0
        ]
      }
    ]
  }
}

But in Doris on ES,Be fetched data parallelly on all shards, and use filter_path to reduce the network cost. The process will be as follows:

GET table/_doc/_search?preference=_shards:1&filter_path=_scroll_id,hits.hits._source,hits.total,_id,hits.hits._source.fields,hits.hits.fields
{"query":{"match_all":{}},"stored_fields":"_none_","docvalue_fields":["k1"],"sort":["_doc"],"size":100}

{
  "hits": {
    "total": 0
  }
}

GET table/_doc/_search?preference=_shards:2&filter_path=_scroll_id,hits.hits._source,hits.total,_id,hits.hits._source.fields,hits.hits.fields
{"query":{"match_all":{}},"stored_fields":"_none_","docvalue_fields":["k1"],"sort":["_doc"],"size":100}
{
  "hits": {
    "total": 1
  }
}

GET table/_doc/_search?preference=_shards:3&filter_path=_scroll_id,hits.hits._source,hits.total,_id,hits.hits._source.fields,hits.hits.fields
{"query":{"match_all":{}},"stored_fields":"_none_","docvalue_fields":["k1"],"sort":["_doc"],"size":100}
{
  "hits": {
    "total": 1,
    "hits": [
      {
        "fields": {
          "k1": [
            "kkk1"
          ]
        }
      }
    ]
  }
}

Scan-Worker On BE which processed result of shard2 will failed.

The reasons are as follows:

  1. "filter_path" causes the hits.hits object not exist.
  2. In the current implementation, if there are some data rows(total > 0), the hits.hits. object must be an array

How To Fix it

Two Method:

  1. modify "filter_path" to contain the hits.
    Pros: Fixed Code is very simple
    Cons: More network cost
  2. Deal with the case where fields are missing in a batch.
    Pros: No loss of performance
    Cons: Code is more complex

Performance first, I use Method2.

Design

  1. Add a variable "_doc_value_mode" into Class "EsScrollParser" to =indicate whether the data processed by this parser is doc_value_mode or not.
  2. "_doc_value_mode" is passed from ESScollReader <- ESScanner <- ScrollQueryBuilder::build() that determines whether DSL is enable doc_value_mode
  3. When hits.hits of response from ES is empty and total > 0. We know there are data lines, but the corresponding fields do not exist. EsScrollParser will use "_doc_value_mode" and _total to construct _total lines which fields are assigned with 'NULL'

@morningman
Copy link
Contributor

@wuyunfeng Please help to review this PR.

@morningman morningman self-assigned this May 7, 2020
@morningman morningman added area/sql/execution Issues or PRs related to the execution engine kind/fix Categorizes issue or PR as related to a bug. area/doris-on-es Issues or PRs related to Doris on ElasticSearch labels May 7, 2020
@wuyunfeng
Copy link
Member

wuyunfeng commented May 8, 2020

Can you prefix [Doris On ES] on this PR title and describe the problem you resolved more meticulous and accurately

@BabySid BabySid changed the title fix bug of query failed when column not exist [Doris On ES]fix bug of query failed in doc_value_mode when fields have none data May 8, 2020
@BabySid BabySid changed the title [Doris On ES]fix bug of query failed in doc_value_mode when fields have none data [Doris On ES]fix bug of query failed in doc_value_mode when fields have none value May 8, 2020
wuyunfeng
wuyunfeng previously approved these changes May 9, 2020
Copy link
Member

@wuyunfeng wuyunfeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, and I left some minor comment.

@blackfox1983 Can you give some example to show How this PR works on your issue or your PR comment?

@imay

return Status::OK();
}

// _fields(doc_value) is fetched from E.S.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// _fields(doc_value) is fetched from E.S.
// _fields(doc_value) is fetched from ES

}


// here is operations for `enable_doc_value_scan`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// here is operations for `enable_doc_value_scan`.
// here is operations for `use_doc_value`.

rapidjson::Document _document_node;
rapidjson::Value _inner_hits_node;

bool _use_doc_value;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe use doc_value_mode is more suitable?
in future, we can use different parser for _source or doc_value mode

Copy link
Member

@wuyunfeng wuyunfeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Thanks @blackfox1983

@imay Can you spare some time to review this PR?

Copy link
Contributor

@imay imay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@imay imay added the approved Indicates a PR has been approved by one committer. label May 9, 2020
@imay imay merged commit 5a57ecc into apache:master May 11, 2020
imay pushed a commit that referenced this pull request Jun 4, 2020
…ake usage of total (#3751)

The other PR : #3513 (#3479) try to resolved the `inner hits node is not an array` because when a  query( batch-size) run against new segment without this field, as-well the filter_path just only take `hits.hits.fields` 、`hits.hits._source` into account, this would appear an null inner hits node:
```
{
   "_scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAHaUWY1ExUVd0ZWlRY2",
   "hits": {
      "total": 1
   }
}
```

Unfortunately this PR introduce another serious inconsistent result with different batch_size because of misusing the `total`.

To avoid this two problem,  we just add `hits.hits._score` to filter_path when `docvalue_mode` is true,   `_score`  would always `null` ,  and populate the inner hits node:

```
{
   "_scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAHaUWY1ExUVd0ZWlRY2",
   "hits": {
      "total": 1,
      "hits": [
         {
            "_score": null
         }
      ]
   }
}
```

related issue: #3752
morningman pushed a commit to morningman/doris that referenced this pull request Jun 7, 2020
…ake usage of total (apache#3751)

The other PR : apache#3513 (apache#3479) try to resolved the `inner hits node is not an array` because when a  query( batch-size) run against new segment without this field, as-well the filter_path just only take `hits.hits.fields` 、`hits.hits._source` into account, this would appear an null inner hits node:
```
{
   "_scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAHaUWY1ExUVd0ZWlRY2",
   "hits": {
      "total": 1
   }
}
```

Unfortunately this PR introduce another serious inconsistent result with different batch_size because of misusing the `total`.

To avoid this two problem,  we just add `hits.hits._score` to filter_path when `docvalue_mode` is true,   `_score`  would always `null` ,  and populate the inner hits node:

```
{
   "_scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAHaUWY1ExUVd0ZWlRY2",
   "hits": {
      "total": 1,
      "hits": [
         {
            "_score": null
         }
      ]
   }
}
```

related issue: apache#3752
morningman pushed a commit that referenced this pull request Jun 26, 2020
Prior to this PR, Doris On ES merged another PR #3513 which misusing the `total` node.  After Doris On ES introduce `terminate_after` (#2576), the `total` documents would not be computed, rely  on this `total` field would be dangerous, we just rely on the actual document count by counting the `inner hits` node which it means to be. So we just remove all total parsing and related logic from Doris On ES, this maybe improve performance slightly because of ignoring and skipping `total` json node.
morningman pushed a commit to morningman/doris that referenced this pull request Jun 28, 2020
…#3932)

Prior to this PR, Doris On ES merged another PR apache#3513 which misusing the `total` node.  After Doris On ES introduce `terminate_after` (apache#2576), the `total` documents would not be computed, rely  on this `total` field would be dangerous, we just rely on the actual document count by counting the `inner hits` node which it means to be. So we just remove all total parsing and related logic from Doris On ES, this maybe improve performance slightly because of ignoring and skipping `total` json node.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. area/doris-on-es Issues or PRs related to Doris on ElasticSearch area/sql/execution Issues or PRs related to the execution engine kind/fix Categorizes issue or PR as related to a bug.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants