[FEATURE] Support describe command in PPL #644
Comments
Regarding the command syntax, I have a few candidates and would appreciate comments and feedback.
Candidate 1
Examples:
Comments:
Candidate 2
Examples:
Comments:
Candidate 3
Examples:
Comments:
Candidate 4
A wilder option would be to allow piping out of the describe command, so that we can do something like this:
My opinion:
You may run into issues with column_pattern, like #259.
Currently, when I run DESCRIBE TABLES in SQL with a COLUMNS LIKE pattern, I get no data back:
If I don't include COLUMNS LIKE, I get the full data.
source is always the first command; that is a PPL syntax limitation for now, which invalidates options 4, 3, and 1. Ideally, a user should be able to return both query data and describe data, or choose between them. Hence, option 2 is best:
source= | describe [column_pattern=]
Examples:
source=web_logs | describe
source=web_logs | describe pipe-passthrough=true (i.e., describe with the existing query response) [default: false]
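To make the column_pattern idea concrete, here is a minimal Python sketch of filtering describe rows by column name. It assumes, hypothetically, that column_pattern would follow SQL LIKE semantics (% for any run of characters, _ for exactly one character), as in the SQL COLUMNS LIKE clause; like_to_regex and filter_columns are illustrative names, not part of the plugin.

```python
import re


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a SQL LIKE pattern into a compiled regex (hypothetical helper).

    '%' matches any run of characters, '_' matches exactly one character,
    and every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # DOTALL so that '%' and '_' also match newline characters.
    return re.compile("".join(parts), re.DOTALL)


def filter_columns(column_names, column_pattern=None):
    """Return the column names that fully match column_pattern (all if None)."""
    if column_pattern is None:
        return list(column_names)
    regex = like_to_regex(column_pattern)
    return [name for name in column_names if regex.fullmatch(name)]


# Hypothetical column names for an index such as web_logs.
columns = ["timestamp", "host", "response", "utc_time"]
print(filter_columns(columns, "%time%"))  # ['timestamp', 'utc_time']
```

Matching with fullmatch mirrors LIKE, where the pattern must cover the whole value; omitting the pattern returns every column, which matches the "optional for phase 1" framing.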
@penghuo @mengweieric please review and comment
column_pattern is good to have, but may not be required for the phase 1 release.
Not necessary. If we consider DML, we could loosen this limitation. I prefer adding option 4, but remove
If we could allow the first command to be describe, I prefer option 4, as any command that works with search could potentially also work with describe. However, a lot of commands only make sense for search queries, so supporting all of them for describe might be overkill.
You'd want to support anything that you get for free. Many commands would work out of the box because the describe command returns its results in a table format.
Need feedback to decide which response format to use for PPL describe. Currently, all SQL responses are formatted by JdbcResponseFormatter (by default), while all PPL responses are formatted by SimpleJsonResponseFormatter (by default). This is what causes the mismatch between their responses for the DESCRIBE command. The options are:
Personally, I'm leaning towards option 2.
References
Relevant lines:
Example of mismatching results
Notice that the datarows, which contain the metadata of the queried index, are actually identical (at least for this example).
diff --git a/ppl_result b/sql_result
index 29ae5b87..76f5f8d4 100644
--- a/ppl_result
+++ b/sql_result
@@ -2,99 +2,99 @@
"schema": [
{
"name": "TABLE_CAT",
- "type": "string"
+ "type": "keyword"
},
{
"name": "TABLE_SCHEM",
- "type": "string"
+ "type": "keyword"
},
{
"name": "TABLE_NAME",
- "type": "string"
+ "type": "keyword"
},
{
"name": "COLUMN_NAME",
- "type": "string"
+ "type": "keyword"
},
{
"name": "DATA_TYPE",
- "type": "string"
+ "type": "keyword"
},
{
"name": "TYPE_NAME",
- "type": "string"
+ "type": "keyword"
},
{
"name": "COLUMN_SIZE",
- "type": "string"
+ "type": "keyword"
},
{
"name": "BUFFER_LENGTH",
- "type": "string"
+ "type": "keyword"
},
{
"name": "DECIMAL_DIGITS",
- "type": "string"
+ "type": "keyword"
},
{
"name": "NUM_PREC_RADIX",
- "type": "string"
+ "type": "keyword"
},
{
"name": "NULLABLE",
- "type": "string"
+ "type": "keyword"
},
{
"name": "REMARKS",
- "type": "string"
+ "type": "keyword"
},
{
"name": "COLUMN_DEF",
- "type": "string"
+ "type": "keyword"
},
{
"name": "SQL_DATA_TYPE",
- "type": "string"
+ "type": "keyword"
},
{
"name": "SQL_DATETIME_SUB",
- "type": "string"
+ "type": "keyword"
},
{
"name": "CHAR_OCTET_LENGTH",
- "type": "string"
+ "type": "keyword"
},
{
"name": "ORDINAL_POSITION",
- "type": "string"
+ "type": "keyword"
},
{
"name": "IS_NULLABLE",
- "type": "string"
+ "type": "keyword"
},
{
"name": "SCOPE_CATALOG",
- "type": "string"
+ "type": "keyword"
},
{
"name": "SCOPE_SCHEMA",
- "type": "string"
+ "type": "keyword"
},
{
"name": "SCOPE_TABLE",
- "type": "string"
+ "type": "keyword"
},
{
"name": "SOURCE_DATA_TYPE",
- "type": "string"
+ "type": "keyword"
},
{
"name": "IS_AUTOINCREMENT",
- "type": "string"
+ "type": "keyword"
},
{
"name": "IS_GENERATEDCOLUMN",
- "type": "string"
+ "type": "keyword"
}
],
"datarows": [
@@ -802,5 +802,6 @@
]
],
"total": 27,
- "size": 27
+ "size": 27,
+ "status": 200
}
@penghuo could you please review this?
PPL does not intend to support JDBC. The only difference is the schema type, and string makes more sense. By the way, most of the columns are meaningless; I suppose the frontend only cares about COLUMN_NAME and DATA_TYPE.
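Because the datarows match across both formatters, a client that only needs COLUMN_NAME and DATA_TYPE can look those columns up by name in the schema and ignore the schema type values. The following is a minimal Python sketch under that assumption; column_types is a hypothetical helper, and the sample response is trimmed and illustrative, not real output.

```python
def column_types(response: dict) -> dict:
    """Map COLUMN_NAME -> DATA_TYPE using the schema/datarows layout.

    Column positions are looked up by name in "schema", so the helper does
    not depend on column order and ignores the schema "type" values
    (string vs keyword) that differ between the two formatters.
    """
    names = [col["name"] for col in response["schema"]]
    name_idx = names.index("COLUMN_NAME")
    type_idx = names.index("DATA_TYPE")
    return {row[name_idx]: row[type_idx] for row in response["datarows"]}


# Trimmed, hypothetical response: the real one has all 24 metadata columns.
sample = {
    "schema": [
        {"name": "TABLE_NAME", "type": "string"},
        {"name": "COLUMN_NAME", "type": "string"},
        {"name": "DATA_TYPE", "type": "string"},
    ],
    "datarows": [
        ["web_logs", "utc_time", "timestamp"],
        ["web_logs", "host", "text"],
    ],
    "total": 2,
    "size": 2,
}
print(column_types(sample))  # {'utc_time': 'timestamp', 'host': 'text'}
```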
1. Overview
1.1 Introduction
Currently, users have to use either OpenSearch query DSL or SQL to get the schema for an index, and then manually analyze the query result to gather useful information.
This new feature provides users with a new PPL command, describe, so that they can not only retrieve metadata for an index, but also leverage PPL's pipe (|) syntax and other commands to further query the metadata.
1.2 Use Cases
Observability plugin
Queries for the observability plugin are associated with a datetime range specified in the datetime selection UI. We apply this range by inserting an additional where command into the user-entered PPL query to filter by the datetime range. To do that, we need to know which fields contain datetime information.
Currently, we first use OpenSearch query DSL to fetch the fields for an index, and then filter out the time fields in client code.
Using this new feature would allow us to do all of this with a single PPL query.
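The client-side part of this use case can be sketched in Python: pick the datetime fields out of the describe result, then insert a where command into the user's query. This is a minimal sketch under stated assumptions: the describe result has already been reduced to a COLUMN_NAME to DATA_TYPE mapping, time fields report a type such as timestamp or date, and PPL accepts string literals in timestamp comparisons. time_fields and add_time_filter are hypothetical helpers, not the observability plugin's code.

```python
def time_fields(column_types: dict, time_types=("timestamp", "date")) -> list:
    """Return, sorted, the field names whose type looks like a datetime type."""
    return sorted(
        name for name, type_name in column_types.items()
        if type_name.lower() in time_types
    )


def add_time_filter(user_query: str, field: str, start: str, end: str) -> str:
    """Insert a where command right after the first (source) command.

    Naive: assumes the first '|' separates the source command from the rest,
    i.e. no '|' appears inside a string literal before it.
    """
    head, sep, rest = user_query.partition("|")
    where = f"where {field} >= '{start}' and {field} <= '{end}'"
    if not sep:
        return f"{head.strip()} | {where}"
    return f"{head.strip()} | {where} | {rest.strip()}"


# Hypothetical describe result, reduced to COLUMN_NAME -> DATA_TYPE.
types = {"utc_time": "timestamp", "host": "text", "event_date": "date"}
field = time_fields(types)[0]  # 'event_date'
print(add_time_filter("source=web_logs | stats count() by host",
                      field, "2021-01-01 00:00:00", "2021-01-02 00:00:00"))
```

The filter goes directly after the source command rather than at the end of the query, so that later commands such as stats or fields, which may drop the time field, do not break it.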
2. Requirements
|) out
3. Out of Scope
4. Design
4.1 Proposed Design
Syntax:
describe is a top-level command, just like search. Piping into describe is not allowed.
Sample query:
4.2 Alternatives Considered
Alternative 1
Syntax:
This syntax is analogous to the search source=<index> command, but we decided not to include source= in the describe command because, rather than fetching data from the index, we're fetching metadata about the index.
Alternative 2
Syntax:
Getting metadata of an index requires reading a metadata table of the index instead of fetching the index itself. The semantics of this syntax do not quite fit this context.