PoC fetch all #697

Closed
wants to merge 2 commits into from

@seankao-az (Collaborator) commented Jul 15, 2022

Description

PoC for using a PlanContext during logical plan generation (Analyzer.analyze(...)) to decide whether to 1) fetch all data using scrolling, or 2) fetch data with a regular request bounded by the query size limit.
This is useful for ML commands, which typically need to run over the whole index rather than only the rows returned by a size-limited query.
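A minimal sketch of the idea, assuming the plan context carries a single scan-type hint that physical planning reads back. Only setIndexScanType is a name mentioned in this PR; IndexScanType, PlanContext, getIndexScanType and chooseRequest are hypothetical stand-ins, and the returned strings merely describe the request rather than build a real OpenSearch request.

```java
// Hypothetical sketch: apart from setIndexScanType, these names are illustrative only.
public class ScanTypeSketch {
  /** Hypothetical: the two ways an index scan could fetch data. */
  enum IndexScanType { QUERY, SCROLL }

  /** Hypothetical stand-in for PlanContext: carries a hint from analysis to physical planning. */
  static final class PlanContext {
    // Default: a regular request bounded by the query size limit.
    private IndexScanType indexScanType = IndexScanType.QUERY;

    void setIndexScanType(IndexScanType type) {
      this.indexScanType = type;
    }

    IndexScanType getIndexScanType() {
      return indexScanType;
    }
  }

  /** Hypothetical physical-planning step: describes which request the index scan would use. */
  static String chooseRequest(PlanContext ctx, String indexName, int querySizeLimit) {
    return ctx.getIndexScanType() == IndexScanType.SCROLL
        ? "scroll request on " + indexName + " (fetch all rows)"
        : "query request on " + indexName + " (size=" + querySizeLimit + ")";
  }

  public static void main(String[] args) {
    PlanContext ctx = new PlanContext();
    System.out.println(chooseRequest(ctx, "flights", 200)); // query request on flights (size=200)
    ctx.setIndexScanType(IndexScanType.SCROLL); // e.g. set by the analyzer for topOfAll
    System.out.println(chooseRequest(ctx, "flights", 200)); // scroll request on flights (fetch all rows)
  }
}
```

The point of routing the decision through the context is that the logical plan itself stays unchanged; only the physical scan differs.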

I added a topOfAll command in this PR purely for demo purposes. It is almost identical to the top command, except that top returns the top results among the first 200 rows, while topOfAll returns the top results among all rows.
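To make the difference concrete, here is a minimal in-memory sketch, assuming each row has already been reduced to its Origin value. The top/topOfAll methods and the toy data are hypothetical illustrations, not the PR's RareTopNOperator; ties are broken by first occurrence, which may differ from the real operator.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopVsTopOfAll {
  /** Most frequent value; ties go to the value that appeared first. */
  static String topValue(List<String> rows) {
    Map<String, Integer> counts = new LinkedHashMap<>(); // keeps first-seen order
    for (String row : rows) {
      counts.merge(row, 1, Integer::sum);
    }
    String best = null;
    int bestCount = 0;
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      if (e.getValue() > bestCount) { // strict '>' keeps the earlier value on ties
        best = e.getKey();
        bestCount = e.getValue();
      }
    }
    return best;
  }

  /** Like "top 1": only the first `limit` rows are considered. */
  static String top(List<String> rows, int limit) {
    return topValue(rows.subList(0, Math.min(limit, rows.size())));
  }

  /** Like "topOfAll 1": every row is considered. */
  static String topOfAll(List<String> rows) {
    return topValue(rows);
  }

  public static void main(String[] args) {
    // Toy data: "A" fills the first rows, but "B" is more frequent overall.
    List<String> rows = new ArrayList<>(Collections.nCopies(3, "A"));
    rows.addAll(Collections.nCopies(5, "B"));
    System.out.println(top(rows, 3));   // A
    System.out.println(topOfAll(rows)); // B
  }
}
```

This mirrors the flights example below: a value can win within the first N rows while another value wins over the whole index.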

Sample queries using the opensearch_dashboards_sample_data_flights index:

  1. Top Origin among the first 200 rows (the default query size limit) => “Licenciado Benito Juarez International Airport”
  2. Top Origin among all rows => “Mariscal Sucre International Airport”
  3. Number of rows with Origin = “Licenciado Benito Juarez International Airport”: 134
  4. Number of rows with Origin = “Mariscal Sucre International Airport”: 285
❯ curl -XPOST https://localhost:9200/_plugins/_ppl -u 'admin:admin' -k -H 'Content-Type: application/json' -d '{
"query": "source=opensearch_dashboards_sample_data_flights | top 1 Origin"
}'

{
  "schema": [
    {
      "name": "Origin",
      "type": "string"
    }
  ],
  "datarows": [
    [
      "Licenciado Benito Juarez International Airport"
    ]
  ],
  "total": 1,
  "size": 1
}
❯ curl -XPOST https://localhost:9200/_plugins/_ppl -u 'admin:admin' -k -H 'Content-Type: application/json' -d '{
"query": "source=opensearch_dashboards_sample_data_flights | topOfAll 1 Origin"
}'

{
  "schema": [
    {
      "name": "Origin",
      "type": "string"
    }
  ],
  "datarows": [
    [
      "Mariscal Sucre International Airport"
    ]
  ],
  "total": 1,
  "size": 1
}
❯ curl -XPOST https://localhost:9200/_plugins/_ppl -u 'admin:admin' -k -H 'Content-Type: application/json' -d '{
"query": "source=opensearch_dashboards_sample_data_flights Origin=\"Licenciado Benito Juarez International Airport\"| stats count()"
}'

{
  "schema": [
    {
      "name": "count()",
      "type": "integer"
    }
  ],
  "datarows": [
    [
      134
    ]
  ],
  "total": 1,
  "size": 1
}
❯ curl -XPOST https://localhost:9200/_plugins/_ppl -u 'admin:admin' -k -H 'Content-Type: application/json' -d '{
"query": "source=opensearch_dashboards_sample_data_flights Origin=\"Mariscal Sucre International Airport\"| stats count()"
}'

{
  "schema": [
    {
      "name": "count()",
      "type": "integer"
    }
  ],
  "datarows": [
    [
      285
    ]
  ],
  "total": 1,
  "size": 1
}
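For reference, the same requests can be built from Java with the JDK's java.net.http client. This is a sketch that only builds the request and does not send it: the helper names are hypothetical, the JSON escaping covers only backslashes and double quotes (enough for the Origin filters above, not for arbitrary input), and reproducing curl's -k against a self-signed certificate would additionally need a custom SSLContext, which is not shown.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class PplRequestSketch {
  /** Wraps a PPL query in a {"query": ...} body, escaping backslashes and double quotes only. */
  static String pplBody(String query) {
    String escaped = query.replace("\\", "\\\\").replace("\"", "\\\"");
    return "{\"query\": \"" + escaped + "\"}";
  }

  /** Builds (but does not send) a POST equivalent to the curl commands above. */
  static HttpRequest pplRequest(String endpoint, String user, String password, String query) {
    String credentials = Base64.getEncoder()
        .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
    return HttpRequest.newBuilder(URI.create(endpoint))
        .header("Content-Type", "application/json")
        .header("Authorization", "Basic " + credentials) // same as curl -u user:password
        .POST(HttpRequest.BodyPublishers.ofString(pplBody(query)))
        .build();
  }

  public static void main(String[] args) {
    HttpRequest request = pplRequest(
        "https://localhost:9200/_plugins/_ppl", "admin", "admin",
        "source=opensearch_dashboards_sample_data_flights | topOfAll 1 Origin");
    System.out.println(request.method() + " " + request.uri());
  }
}
```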

Explain output (the OpenSearchIndexScan for top uses an OpenSearchQueryRequest bounded by size 200, while the one for topOfAll uses an OpenSearchScrollRequest):

❯ curl -XPOST https://localhost:9200/_plugins/_ppl/_explain -u 'admin:admin' -k -H 'Content-Type: application/json' -d '{"query": "source=opensearch_dashboards_sample_data_flights | top 1 Origin"}'

{
  "root": {
    "name": "ProjectOperator",
    "description": {
      "fields": "[Origin]"
    },
    "children": [
      {
        "name": "RareTopNOperator",
        "description": {
          "commandType": "TOP",
          "noOfResults": 1,
          "fields": "[Origin]",
          "groupBy": "[]"
        },
        "children": [
          {
            "name": "OpenSearchIndexScan",
            "description": {
              "request": "OpenSearchQueryRequest(indexName=opensearch_dashboards_sample_data_flights, sourceBuilder={\"from\":0,\"size\":200,\"timeout\":\"1m\"}, searchDone=false)"
            },
            "children": []
          }
        ]
      }
    ]
  }
}
❯ curl -XPOST https://localhost:9200/_plugins/_ppl/_explain -u 'admin:admin' -k -H 'Content-Type: application/json' -d '{"query": "source=opensearch_dashboards_sample_data_flights | topOfAll 1 Origin"}'

{
  "root": {
    "name": "ProjectOperator",
    "description": {
      "fields": "[Origin]"
    },
    "children": [
      {
        "name": "RareTopNOperator",
        "description": {
          "commandType": "TOP",
          "noOfResults": 1,
          "fields": "[Origin]",
          "groupBy": "[]"
        },
        "children": [
          {
            "name": "OpenSearchIndexScan",
            "description": {
              "request": "OpenSearchScrollRequest(indexName=opensearch_dashboards_sample_data_flights, scrollId=null, sourceBuilder={})"
            },
            "children": []
          }
        ]
      }
    ]
  }
}
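The practical difference between the two requests can be simulated with an in-memory list standing in for the index. In this sketch fetchPage, the integer cursor (playing the role of a scroll id) and the page size are assumptions for illustration; this is not the OpenSearch client API. The bounded query makes one round trip capped at the size limit, while scrolling keeps requesting pages until an empty one comes back.

```java
import java.util.ArrayList;
import java.util.List;

public class QueryVsScrollSketch {
  /** Hypothetical stand-in for one search round trip: up to `size` docs starting at `from`. */
  static List<Integer> fetchPage(List<Integer> index, int from, int size) {
    int end = Math.min(from + size, index.size());
    return from >= end ? List.of() : index.subList(from, end);
  }

  /** Mimics a bounded query request: one round trip, capped at the size limit. */
  static List<Integer> queryFetch(List<Integer> index, int sizeLimit) {
    return new ArrayList<>(fetchPage(index, 0, sizeLimit));
  }

  /** Mimics scrolling: keep fetching pages until an empty page comes back. */
  static List<Integer> scrollFetch(List<Integer> index, int pageSize) {
    List<Integer> all = new ArrayList<>();
    int cursor = 0; // plays the role of the scroll id: where the next page starts
    while (true) {
      List<Integer> page = fetchPage(index, cursor, pageSize);
      if (page.isEmpty()) {
        break; // no more hits: the scroll is exhausted
      }
      all.addAll(page);
      cursor += page.size();
    }
    return all;
  }

  public static void main(String[] args) {
    List<Integer> index = new ArrayList<>();
    for (int i = 0; i < 500; i++) {
      index.add(i);
    }
    System.out.println(queryFetch(index, 200).size());  // 200
    System.out.println(scrollFetch(index, 200).size()); // 500
  }
}
```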

Issues Resolved

#656

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Sean Kao <seankao@amazon.com>
@seankao-az changed the title from “Poc fetch all” to “PoC fetch all” Jul 15, 2022
@codecov-commenter

Codecov Report

Merging #697 (89db77b) into main (8d9b459) will decrease coverage by 31.98%.
The diff coverage is n/a.

@@              Coverage Diff              @@
##               main     #697       +/-   ##
=============================================
- Coverage     94.74%   62.76%   -31.99%     
=============================================
  Files           283       10      -273     
  Lines          7676      658     -7018     
  Branches        561      119      -442     
=============================================
- Hits           7273      413     -6860     
+ Misses          349      192      -157     
+ Partials         54       53        -1     
| Flag | Coverage Δ |
|---|---|
| query-workbench | 62.76% <ø> (ø) |
| sql-engine | ? |

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
...a/org/opensearch/sql/analysis/AnalysisContext.java
...ain/java/org/opensearch/sql/analysis/Analyzer.java
.../main/java/org/opensearch/sql/planner/Planner.java
...rc/main/java/org/opensearch/sql/storage/Table.java
...search/sql/opensearch/storage/OpenSearchIndex.java
...ch/sql/opensearch/storage/OpenSearchIndexScan.java
...ensearch/storage/system/OpenSearchSystemIndex.java
...c/main/java/org/opensearch/sql/ppl/PPLService.java
...java/org/opensearch/sql/ppl/parser/AstBuilder.java
.../org/opensearch/sql/ppl/utils/ArgumentFactory.java
... and 263 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8d9b459...89db77b.

@seankao-az mentioned this pull request Jul 20, 2022
if (node.getCommandType() != RareTopN.CommandType.TOPOFALL) {
  return new LogicalRareTopN(child, node.getCommandType(), noOfResults, fields, groupBys);
} else {
  // topOfAll is planned as a regular TOP; only the scan type in the plan context differs
  return new LogicalRareTopN(child, RareTopN.CommandType.TOP, noOfResults, fields, groupBys);
}
@seankao-az (Collaborator Author) commented Jul 20, 2022

I made only minimal changes to RareTopN to get topOfAll working, so this part may look ugly. There is no LogicalRareTopN for topOfAll: logically, topOfAll is treated as a top command. Both commands have the same logical plan and differ only in the planContext, which leads to different physical plans.
RareTopN.CommandType.TOPOFALL exists only so that this class knows whether to call setIndexScanType (line 245).
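The analyzer-side behaviour described here can be sketched as follows, assuming a trimmed-down logical node and a plan context with a single scan-type field. CommandType, IndexScanType, PlanContext and this LogicalRareTopN are simplified hypothetical stand-ins, not the real classes: TOPOFALL is rewritten to TOP, and its only lasting effect is the scan-type hint.

```java
public class TopOfAllAnalyzerSketch {
  /** Hypothetical command types; TOPOFALL exists only to trigger the scan-type hint. */
  enum CommandType { TOP, RARE, TOPOFALL }

  enum IndexScanType { QUERY, SCROLL }

  /** Hypothetical plan context holding the scan-type hint. */
  static final class PlanContext {
    private IndexScanType indexScanType = IndexScanType.QUERY;

    void setIndexScanType(IndexScanType type) {
      this.indexScanType = type;
    }

    IndexScanType getIndexScanType() {
      return indexScanType;
    }
  }

  /** Hypothetical, trimmed-down logical node: only the command type and result count. */
  static final class LogicalRareTopN {
    final CommandType commandType;
    final int noOfResults;

    LogicalRareTopN(CommandType commandType, int noOfResults) {
      this.commandType = commandType;
      this.noOfResults = noOfResults;
    }
  }

  /** topOfAll becomes a plain TOP node; the scan type in the context is the only difference. */
  static LogicalRareTopN visitRareTopN(CommandType type, int noOfResults, PlanContext ctx) {
    if (type == CommandType.TOPOFALL) {
      ctx.setIndexScanType(IndexScanType.SCROLL);
      return new LogicalRareTopN(CommandType.TOP, noOfResults);
    }
    return new LogicalRareTopN(type, noOfResults);
  }
}
```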

@penghuo (Collaborator) commented Jul 20, 2022

#703

@seankao-az closed this Jul 25, 2022