Move analyze skipping index rules to config #288

rupal-bq · 2024-03-19T18:20:18Z

Description

Address comments from Implement analyze skipping index statement #284

Issues Resolved

[FEATURE] Automate skipping index column and algorithm selection #221

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Rupal Mahajan <maharup@amazon.com>

rupal-bq · 2024-03-19T21:15:36Z

@dai-chen

I'm not sure if this is the right API because I've only used table.schema(). Could you double check this along with the comment above later? #284 (comment)

I checked the partitioning API again; it returns the physical partitioning of this table. We used schema to get the list of columns and partitioning to get the partition columns.

dai-chen · 2024-03-21T21:31:20Z

Could you clarify what the physical partitioning is? I was concerned about whether this API returns a static list of partition columns or all partitions, because I see you also use toSet to deduplicate.

dai-chen · 2024-03-21T21:38:11Z

...ain/scala/org/opensearch/flint/spark/skipping/recommendations/DataTypeSkippingStrategy.scala

-          rules("PARTITION")._1,
-          rules("PARTITION")._2)
-      } else if (rules.contains(field.dataType.toString)) {
+          rules.getString("recommendation.data_type_rules.PARTITION.skipping_type"),


Could you extract a util method to avoid building the config key string in different places?
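
A minimal sketch of what such a helper could look like, assuming every rule key shares the `recommendation.data_type_rules` prefix seen in the diff above. The names `RuleKeys` and `ruleKey` are hypothetical and not part of this PR; only the `skipping_type` property name is taken from the change itself.

```scala
// Hypothetical helper (not from the PR): builds data type rule keys in one place
// so the key path is not concatenated at every call site.
object RuleKeys {
  // Prefix copied from the key used in the diff above.
  private val Prefix = "recommendation.data_type_rules"

  // Returns the full config path for one property of one rule,
  // for example ruleKey("PARTITION", "skipping_type").
  def ruleKey(ruleName: String, property: String): String =
    s"$Prefix.$ruleName.$property"
}
```

Call sites could then read `rules.getString(RuleKeys.ruleKey("PARTITION", "skipping_type"))` instead of repeating the full path, so a change to the config layout would touch a single line.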

dai-chen · 2024-03-21T21:38:37Z

...ain/scala/org/opensearch/flint/spark/skipping/recommendations/DataTypeSkippingStrategy.scala

@@ -7,27 +7,16 @@ package org.opensearch.flint.spark.skipping.recommendations

 import scala.collection.mutable.ArrayBuffer

-import org.opensearch.flint.spark.skipping.FlintSparkSkippingStrategy.SkippingKind.{BLOOM_FILTER, MIN_MAX, PARTITION, VALUE_SET}
+import com.typesafe.config.{Config, ConfigFactory}

 import org.apache.spark.sql.{Row, SparkSession}
 import org.apache.spark.sql.flint.{loadTable, parseTableName}

 class DataTypeSkippingStrategy extends AnalyzeSkippingStrategy {


Could you add the missing Javadoc on the new class, interface, and public methods?

rupal-bq · 2024-03-22T17:39:41Z

Thanks for reviewing @dai-chen. Closing this PR for now. Will address all comments in another PR.

Move data type rules to config file

4ed299b

Signed-off-by: Rupal Mahajan <maharup@amazon.com>

rupal-bq marked this pull request as ready for review March 19, 2024 21:15

rupal-bq requested review from dai-chen, vamsimanohar, penghuo, anirudha, kaituo and YANG-DB as code owners March 19, 2024 21:15

dai-chen added maintenance Code refactoring 0.3 labels Mar 21, 2024

dai-chen reviewed Mar 21, 2024

rupal-bq closed this Mar 22, 2024

rupal-bq mentioned this pull request Apr 1, 2024

Add skipping index recommendations for specific columns #300

Closed
