
[SPARK-54403][SQL][Metric View] Add YAML serde infrastructure for metric views#53146

Closed
linhongliu-db wants to merge 8 commits intoapache:masterfrom
linhongliu-db:metric-view-serde

@linhongliu-db
Contributor

What changes were proposed in this pull request?

This PR adds the complete serialization/deserialization infrastructure for parsing metric view YAML definitions:

  • Add Jackson YAML dependencies to pom.xml
  • Implement canonical model for metric views:
    • Column, Expression (Dimension/Measure), MetricView, Source
    • YAMLVersion validation and exception types
  • Implement version-specific serde (v0.1):
    • YAML deserializer/serializer
    • Base classes for extensibility
  • Add JSON utilities for metadata serialization
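To show how the canonical classes listed above might fit together, here is a minimal sketch. Only the class names (Column, Dimension, Measure, MetricView, Source) come from this PR. Every field except `expression` and `ordinal`, which appear in the review excerpts, is an assumption, and the real classes may look quite different.

```scala
// Hypothetical sketch of a version-independent ("canonical") metric view model.
// Only the class names come from the PR; the fields are illustrative assumptions.

// A column is backed either by a dimension (grouping key) or a measure (aggregate).
sealed trait Expression { def expr: String }
final case class Dimension(expr: String) extends Expression
final case class Measure(expr: String) extends Expression

// Where the metric view reads its data from, e.g. a table name or a query.
final case class Source(text: String)

// A named column; `ordinal` records its position in the definition.
final case class Column(name: String, expression: Expression, ordinal: Int)

final case class MetricView(version: String, source: Source, columns: Seq[Column]) {
  // Pattern matching on the sealed trait separates the two column kinds.
  def dimensions: Seq[Column] = columns.collect { case c @ Column(_, _: Dimension, _) => c }
  def measures: Seq[Column] = columns.collect { case c @ Column(_, _: Measure, _) => c }
}

val view = MetricView(
  version = "0.1",
  source = Source("sales"),
  columns = Seq(
    Column("region", Dimension("region"), ordinal = 0),
    Column("revenue", Measure("SUM(amount)"), ordinal = 1)))
```

Because Dimension and Measure are subtypes of a sealed trait, callers can pattern-match on the column kind without a type parameter on Column.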

Why are the changes needed?

SPIP: Metrics & semantic modeling in Spark

Does this PR introduce any user-facing change?

No

How was this patch tested?

```
build/sbt "catalyst/testOnly org.apache.spark.sql.metricview.serde.MetricViewFactorySuite"
```

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>

… for metric views

This commit adds the complete serialization/deserialization infrastructure
for parsing metric view YAML definitions:

- Add Jackson YAML dependencies to pom.xml
- Implement canonical model for metric views:
  - Column, Expression (Dimension/Measure), MetricView, Source
  - YAMLVersion validation and exception types
- Implement version-specific serde (v0.1):
  - YAML deserializer/serializer
  - Base classes for extensibility
- Add JSON utilities for metadata serialization
pom.xml Outdated
```xml
<dependency>
  <groupId>com.fasterxml.jackson.dataformat</groupId>
  <artifactId>jackson-dataformat-yaml</artifactId>
  <version>${fasterxml.jackson.version}</version>
</dependency>
```
Member

All Jackson deps are managed by jackson-bom now, so you don't need to declare it again here.

Contributor Author

done


```scala
object JsonUtils {
  // Singleton ObjectMapper that can be used across the project
  private lazy val mapper: ObjectMapper = {
```
Contributor

According to the comments, should we try to use this singleton as much as possible in the subsequent development?

Contributor Author

Yes, I'm surprised that there is no such util in the Spark repo. For now, I plan to use it for all metric view development, but I'm not sure how much other code needs it.
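The singleton under discussion relies on a `lazy val` inside a Scala `object`: the value is built on first access and then shared by every caller. Sharing a single instance is reasonable because Jackson documents `ObjectMapper` as thread-safe once it is fully configured. The dependency-free sketch below demonstrates the lazy-singleton behavior. `FakeMapper`, `MapperStats`, and `JsonUtilsSketch` are hypothetical stand-ins, used so the example runs without Jackson on the classpath.

```scala
import java.util.concurrent.atomic.AtomicInteger

object MapperStats {
  // Counts how many mapper instances have been constructed.
  val created = new AtomicInteger(0)
}

// Hypothetical stand-in for an expensive, reusable object such as Jackson's ObjectMapper.
final class FakeMapper {
  MapperStats.created.incrementAndGet()

  // Naive JSON rendering, enough for the illustration (no escaping).
  def render(fields: Seq[(String, String)]): String =
    fields.map { case (k, v) => "\"" + k + "\":\"" + v + "\"" }.mkString("{", ",", "}")
}

object JsonUtilsSketch {
  // Built on first access, then reused. Scala synchronizes lazy val
  // initialization, so concurrent first accesses still construct it only once.
  private lazy val mapper: FakeMapper = new FakeMapper

  def toJson(fields: Seq[(String, String)]): String = mapper.render(fields)
}
```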

@linhongliu-db
Contributor Author

cc @cloud-fan to review

```scala
  }
}

object YamlMapperProvider extends YamlMapperProviderBase
```
Contributor

Do we need to add a V01 postfix?

Contributor Author

done


```scala
// Trait representing the capability to validate an object
trait Validatable {
  def validate(): Try[Unit]
```
Contributor

Why use Try? Can we fail directly?

Contributor Author

done
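The following sketch illustrates the trade-off raised in this thread with a hypothetical column-name check written both ways. With `Try`, a caller that ignores the return value silently skips validation. Throwing directly makes the failure impossible to miss. The function names are illustrative and do not come from the PR.

```scala
import scala.util.{Failure, Success, Try}

// Style questioned in review: the caller must remember to inspect the result.
def validateNameWithTry(name: String): Try[Unit] =
  if (name.nonEmpty) Success(())
  else Failure(new IllegalArgumentException("column name must not be empty"))

// Style adopted after review: fail fast by throwing; a normal return means valid.
def validateName(name: String): Unit =
  if (name.isEmpty) throw new IllegalArgumentException("column name must not be empty")
```

A bare `validateNameWithTry("")` statement compiles and has no effect. That is the pitfall the fail-fast version avoids.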

```scala
  }
}

case class Column[T <: Expression](
```
Contributor

Is this type parameter really useful?

Contributor Author

removed

Comment on lines 88 to 89

```scala
def validate(): Unit
```
Contributor

Suggested change: remove def validate(): Unit

Contributor Author

removed

```scala
    expression: Expression,
    ordinal: Int) extends Validatable {
  override def validate(): Unit = {
    // No validation needed
```
Contributor

Then shall we remove extends Validatable?

Contributor Author

removed

```scala
  }

  def getColumnMetadata: ColumnMetadata = {
    val truncatedExpr = expression.expr.take(Constants.MAXIMUM_PROPERTY_SIZE)
```
Contributor

We can't parse back a truncated expr. Shall we just fail here if it's too large?

Contributor Author

done
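The following sketch shows the behavior agreed on here: oversized expressions are rejected rather than truncated, because a truncated expression cannot be parsed back. It assumes a hypothetical limit `MaxPropertySize` in place of `Constants.MAXIMUM_PROPERTY_SIZE`, whose value is not shown in this review. It also uses a simplified, hypothetical `ColumnMetadata` and `columnMetadata` helper.

```scala
// Hypothetical limit; the real Constants.MAXIMUM_PROPERTY_SIZE value is not shown here.
val MaxPropertySize = 16

// Simplified stand-in for the PR's ColumnMetadata.
final case class ColumnMetadata(name: String, expr: String)

def columnMetadata(name: String, expr: String): ColumnMetadata = {
  // Fail instead of calling expr.take(limit): a cut-off expression is unparseable.
  if (expr.length > MaxPropertySize) {
    throw new IllegalArgumentException(
      s"Expression for column '$name' has ${expr.length} characters; the limit is $MaxPropertySize")
  }
  ColumnMetadata(name, expr)
}
```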


```scala
// Only parse the "version" field and ignore all others
@JsonIgnoreProperties(ignoreUnknown = true)
private[sql] case class YAMLVersion(version: String) extends Validatable {
```
Contributor

Does it need to be Validatable? We only create YAMLVersion in MetricViewFactory.fromYAML, which already does validation.

Contributor Author

Removed all Validatable.
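With `Validatable` gone, version checking lives in one place: the factory. The sketch below shows that flow under stated assumptions. It reads only the `version` field, mirroring what `@JsonIgnoreProperties(ignoreUnknown = true)` achieves for `YAMLVersion`, checks the value against the supported set, and fails otherwise. The parsed YAML document is modeled as a `Map[String, Any]` so no YAML library is needed. `MetricViewFactorySketch` and `validatedVersion` are hypothetical names, not the PR's API.

```scala
object MetricViewFactorySketch {
  // Mirrors `validYAMLVersions` from the review excerpt: only "0.1" is supported.
  private val SupportedVersions: Set[String] = Set("0.1")

  // Look at the "version" field only; every other key is ignored.
  private def readVersion(doc: Map[String, Any]): String = doc.get("version") match {
    case Some(v: String) => v
    case Some(other) =>
      throw new IllegalArgumentException(s"'version' must be a string, got: $other")
    case None =>
      throw new IllegalArgumentException("missing required field 'version'")
  }

  // Single validation point: callers get a supported version or an exception.
  def validatedVersion(doc: Map[String, Any]): String = {
    val version = readVersion(doc)
    if (!SupportedVersions.contains(version)) {
      throw new IllegalArgumentException(
        s"unsupported metric view version '$version'; supported: ${SupportedVersions.mkString(", ")}")
    }
    version
  }
}
```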

@linhongliu-db
Contributor Author

cc @cloud-fan I've updated the PR based on the comments. Could you please take another look?

```scala
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

private[sql] object JsonUtils {
```
Contributor

Is it dead code for now?

Contributor Author

```scala
private[sql] object YAMLVersion {
  private def validYAMLVersions: Set[String] = Set("0.1")

  def apply(version: String): YAMLVersion = {
```
Contributor

Dead code for now?

Contributor Author

Good catch. The actual version validation is in MetricViewFactory, so this code is useless. There is also a test that confirms this: https://github.com/apache/spark/pull/53146/files#diff-23bf5ddc582ff6684f7cc8950a12f4d8e745ff3fb7b0142dd00015e1f159fc8aR144

```scala
  def dimensions: Seq[ColumnBase]
  def measures: Seq[ColumnBase]

  def toCanonical: MetricView = {
```
Contributor

@cloud-fan cloud-fan Dec 9, 2025

Can you explain a bit more about how the canonical entities can help with metric view version evolution?

Contributor Author

Updated the comment.
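Here is a plausible reading of the design asked about in this thread. Each on-disk YAML version gets its own classes, which is why the V01 postfix was added to the version-specific code. Each version also gets its own `toCanonical` conversion, and everything downstream consumes only the canonical form. A new version then needs only a new parser plus a conversion, and analysis code stays unchanged. The sketch uses simplified, hypothetical types, including an imagined v0.2 layout that does not exist in the PR, to show two different document shapes converging on the same canonical value.

```scala
// Canonical, version-independent model consumed by the rest of the engine (simplified).
final case class CanonicalColumn(name: String, expr: String, isMeasure: Boolean)
final case class CanonicalMetricView(source: String, columns: Seq[CanonicalColumn])

// Every versioned representation knows how to convert itself to the canonical model.
trait MetricViewVersioned {
  def toCanonical: CanonicalMetricView
}

// Hypothetical v0.1 shape: separate (name, expression) lists for dimensions and measures.
final case class MetricViewV01(
    source: String,
    dimensions: Seq[(String, String)],
    measures: Seq[(String, String)]) extends MetricViewVersioned {
  def toCanonical: CanonicalMetricView = CanonicalMetricView(
    source,
    dimensions.map { case (n, e) => CanonicalColumn(n, e, isMeasure = false) } ++
      measures.map { case (n, e) => CanonicalColumn(n, e, isMeasure = true) })
}

// Imagined future v0.2 shape: one column list with an explicit kind per column.
final case class ColumnV02(name: String, expr: String, kind: String)
final case class MetricViewV02(source: String, columns: Seq[ColumnV02])
    extends MetricViewVersioned {
  def toCanonical: CanonicalMetricView = CanonicalMetricView(
    source,
    columns.map(c => CanonicalColumn(c.name, c.expr, isMeasure = c.kind == "measure")))
}
```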

linhongliu-db and others added 2 commits December 9, 2025 09:07
…rde/MetricViewCanonical.scala

Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
@linhongliu-db
Contributor Author

@cloud-fan updated. :-)

@cloud-fan
Contributor

Thanks, merging to master!

@cloud-fan cloud-fan closed this in dc47def Dec 10, 2025
cloud-fan pushed a commit that referenced this pull request Dec 17, 2025
…ution

### What changes were proposed in this pull request?
This PR implements the command to create metric views and the analysis rule to resolve a metric view query:

- CREATE Metric view
  - Add SQL grammar to support `WITH METRIC` when creating a view
  - Add dollar-quoted string support for YAML definitions
  - Implement CreateMetricViewCommand to analyze the view body
  - Use a table property to mark the view as a metric view, since Hive has no dedicated table type
- SELECT Metric view
  - Update SessionCatalog to parse metric view definitions on read
  - Add MetricViewPlanner utility to parse the YAML definition and construct an unresolved plan
  - Add a ResolveMetricView rule to replace dimension and measure references with their actual expressions
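The core idea behind the ResolveMetricView step is to substitute each reference to a dimension or measure with its defining expression. A toy expression tree is enough to sketch it. `Expr`, `Ref`, `Col`, `Call`, and `substitute` are hypothetical. The actual rule operates on Catalyst logical plans and expressions.

```scala
// Toy expression tree; Catalyst's real expression classes are far richer.
sealed trait Expr
final case class Ref(name: String) extends Expr  // reference to a metric view column
final case class Col(name: String) extends Expr  // column of the underlying source
final case class Call(fn: String, args: Seq[Expr]) extends Expr

// Replace every metric view reference with the expression it is defined as.
def substitute(e: Expr, definitions: Map[String, Expr]): Expr = e match {
  case Ref(n) =>
    definitions.getOrElse(n, throw new IllegalArgumentException(s"unknown metric view column: $n"))
  case c: Col => c
  case Call(fn, args) => Call(fn, args.map(substitute(_, definitions)))
}
```

This sketch does not handle definitions that themselves reference other metric view columns.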

NOTE: This PR depends on #53146

This PR also marks `org.apache.spark.sql.metricview` as an internal package.

### Why are the changes needed?
[SPIP: Metrics & semantic modeling in Spark](https://docs.google.com/document/d/1xVTLijvDTJ90lZ_ujwzf9HvBJgWg0mY6cYM44Fcghl0/edit?tab=t.0#heading=h.4iogryr5qznc)

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
```
build/sbt "hive/testOnly org.apache.spark.sql.execution.SimpleMetricViewSuite"
build/sbt "hive/testOnly org.apache.spark.sql.hive.execution.HiveMetricViewSuite"
```

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #53158 from linhongliu-db/metric-view-create-and-select.

Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
4 participants