Implement analyze skipping index statement #284

Merged

@rupal-bq (Contributor) commented Mar 14, 2024

Description

Add the ANALYZE SKIPPING INDEX statement. It returns skipping index recommendations based on the following rules:

  • All top-level columns are selected
  • PARTITION algorithm is recommended for partition columns
  • MIN_MAX algorithm is recommended for numeric data type columns
  • VALUE_SET algorithm is recommended for boolean data type columns
  • BLOOM_FILTER algorithm is recommended for all other supported columns
  • Unsupported data type columns are skipped.
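
The rules above can be sketched in Scala as follows. This is a minimal, Spark-free illustration, not the PR's `DataTypeSkippingStrategy`: the `ColType` cases are hypothetical stand-ins for Spark SQL data types, and which types count as numeric or as BLOOM_FILTER-supported are assumptions. The rules are checked in order, so a partition column gets PARTITION regardless of its type.

```scala
// Minimal, Spark-free sketch of the recommendation rules listed above.
// ColType and its cases are hypothetical stand-ins for Spark SQL data types.
object SkippingRecommendationSketch {

  sealed trait ColType
  case object IntCol extends ColType
  case object LongCol extends ColType
  case object DoubleCol extends ColType
  case object BooleanCol extends ColType
  case object StringCol extends ColType
  case object MapCol extends ColType // assumed unsupported in this sketch

  final case class Column(name: String, colType: ColType)
  final case class Recommendation(column: String, algorithm: String)

  // Assumption: which types count as "numeric" and which are otherwise supported.
  private val numericTypes: Set[ColType] = Set(IntCol, LongCol, DoubleCol)
  private val bloomFilterTypes: Set[ColType] = Set(StringCol)

  /** Applies the rules in order to every top-level column; unsupported types yield no recommendation. */
  def recommend(columns: Seq[Column], partitionColumns: Set[String]): Seq[Recommendation] =
    columns.flatMap { col =>
      val algorithm: Option[String] =
        if (partitionColumns.contains(col.name)) Some("PARTITION")
        else if (numericTypes.contains(col.colType)) Some("MIN_MAX")
        else if (col.colType == BooleanCol) Some("VALUE_SET")
        else if (bloomFilterTypes.contains(col.colType)) Some("BLOOM_FILTER")
        else None // unsupported data type: column is skipped
      algorithm.map(Recommendation(col.name, _))
    }
}
```

For example, for a table partitioned by `year`, the sketch recommends PARTITION for `year`, MIN_MAX for an integer `status` column, VALUE_SET for a boolean column, and BLOOM_FILTER for a string column, and it skips a map column.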

Issues Resolved

#221

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Rupal Mahajan <maharup@amazon.com>
…ch_spark into analyze-skipping-index

Signed-off-by: Rupal Mahajan <maharup@amazon.com>
```
@@ -105,6 +106,10 @@ vacuumCoveringIndexStatement
    : VACUUM INDEX indexName ON tableName
    ;

analyzeSkippingIndexStatement
    : ANALYZE SKIPPING INDEX ON tableName
```
Collaborator

Is this grammar finalized? What is the semantic meaning?

Contributor Author

This is proposed grammar. Please comment if you have any other suggestions. "Analyze" refers to examining data to gain insights: this command returns a recommendation for creating a skipping index (the columns to index, each with a suggested data structure) based on the table data.

Collaborator

@noCharger Mar 14, 2024

> This is proposed grammar.

Is there any reference or compatibility analysis against mainstream SQL syntax?

> Please comment if you have any other suggestions.

Just brainstorming:

ANALYZE TABLE tableName FOR SKIPPING INDEX RECOMMENDATIONS;

Or

ANALYZE TABLE tableName RECOMMEND SKIPPING INDEX COLUMNS;

The assumption is that we may want this command to do more than just the recommendation.

ref https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/ANALYZE.html#GUID-535CE98E-2359-4147-839F-DCB3772C1B0E

Signed-off-by: Rupal Mahajan <maharup@amazon.com>
…ndex

Signed-off-by: Rupal Mahajan <maharup@amazon.com>
@rupal-bq changed the title from "Add sql grammar support for analyze skipping index statement" to "Implement analyze skipping index statement" Mar 18, 2024
Signed-off-by: Rupal Mahajan <maharup@amazon.com>

```scala
class DataTypeSkippingStrategy extends AnalyzeSkippingStrategy {

  val rules = Map(
```
Collaborator

I'm wondering whether it would be more flexible to move this static mapping to a config file. Or maybe that isn't necessary for this P0 solution?

Contributor Author

Good idea. I added it here because it's specific to data-type-based recommendation and won't be used by other strategies (e.g. recommendation based on table stats).

Contributor Author

I will take this up as a fast follow-up, because merging now will unblock the SQL plugin if we can finalize the grammar before the 2.13 release.
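
The config-file suggestion in this thread could be sketched roughly as below, assuming a plain `java.util.Properties` file with one `typeName = ALGORITHM` entry per line. The type-name keys, the default mapping, and the validation step are hypothetical choices for illustration, not part of this PR.

```scala
import java.io.StringReader
import java.util.Properties

// Sketch of moving the static type -> algorithm mapping into a config file.
// The type-name keys, defaults, and file format are hypothetical.
object ConfigurableRulesSketch {

  // Built-in defaults used when the config does not override a type.
  val defaultRules: Map[String, String] = Map(
    "int" -> "MIN_MAX",
    "bigint" -> "MIN_MAX",
    "double" -> "MIN_MAX",
    "boolean" -> "VALUE_SET",
    "string" -> "BLOOM_FILTER")

  private val knownAlgorithms = Set("PARTITION", "MIN_MAX", "VALUE_SET", "BLOOM_FILTER")

  /** Parses `typeName = ALGORITHM` entries (java.util.Properties syntax) and merges them over the defaults. */
  def loadRules(configText: String): Map[String, String] = {
    val props = new Properties()
    props.load(new StringReader(configText))
    val overrides: Map[String, String] =
      props.stringPropertyNames().toArray(Array.empty[String])
        .map(key => key.trim.toLowerCase -> props.getProperty(key).trim.toUpperCase)
        .toMap
    // Fail fast on typos such as "MINMAX" instead of silently ignoring them.
    val unknown = overrides.values.filterNot(knownAlgorithms.contains)
    require(unknown.isEmpty, s"Unknown skipping algorithm(s): ${unknown.mkString(", ")}")
    defaultRules ++ overrides
  }
}
```

With this design, a file containing only `string = VALUE_SET` overrides the string rule while every other default stays in place, and an unrecognized algorithm name fails with the `IllegalArgumentException` thrown by `require`.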

@dai-chen added the enhancement (New feature or request) and 0.3 labels Mar 18, 2024
Collaborator

@dai-chen left a comment

If we also want to merge the implementation in this PR, could you update the user manual, like this? https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md#all-indexes. Otherwise, if there's no time, we can merge just the grammar in this PR.

Signed-off-by: Rupal Mahajan <maharup@amazon.com>
@rupal-bq
Contributor Author

> If we also want to merge implementation in this PR, could you update user manual like this? https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md#all-indexes. Or if no time, we can just merge grammar in this PR.

Thanks! Updated the user manual.

Comment on lines +35 to +43
```scala
val partitionFields = table.partitioning().flatMap { transform =>
  transform
    .references()
    .collect({ case reference =>
      reference.fieldNames()
    })
    .flatten
    .toSet
}
```
Collaborator

@dai-chen Mar 18, 2024

I'm not sure this is the right API, because I've only used table.schema(). Could you double-check this, along with the comment above, later? I will merge this PR for now so we can get the grammar into the SQL plugin.

Contributor Author

Sure, will do. Thanks!
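
For context on the API question: in Spark's DataSource V2 connector API, `Table.partitioning()` returns an array of `Transform`, `Transform.references()` returns `NamedReference`s, and `NamedReference.fieldNames()` returns the name parts of a possibly nested field. The sketch below mirrors that shape with hypothetical stand-in classes (`NamedRef`, `Xform`, not Spark's types) so it runs without Spark. It shows that flattening `fieldNames()`, as the reviewed snippet does, splits a nested reference such as `event.date` into separate names, whereas joining the parts keeps the full path. Whether nested partition references actually occur for the tables this feature targets is not settled in this thread.

```scala
// Spark-free sketch mirroring the partition-field extraction under review.
// NamedRef and Xform are hypothetical stand-ins for Spark's NamedReference and Transform.
object PartitionFieldsSketch {

  final case class NamedRef(fieldNames: Array[String])
  final case class Xform(name: String, references: Array[NamedRef])

  // Same shape as the reviewed snippet: every name part of every reference is flattened.
  def flattenedNames(partitioning: Array[Xform]): Set[String] =
    partitioning.flatMap(t => t.references.flatMap(_.fieldNames)).toSet

  // Alternative: keep each reference's full path, joining nested parts with ".".
  def qualifiedNames(partitioning: Array[Xform]): Set[String] =
    partitioning.flatMap(t => t.references.map(_.fieldNames.mkString("."))).toSet
}
```

For top-level partition columns such as `year`, both functions return the same names; they differ only for nested references.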

@dai-chen merged commit e6a97dc into opensearch-project:main Mar 18, 2024
4 checks passed
Labels
0.3 enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants