@MaxGekk MaxGekk commented Oct 19, 2019

What changes were proposed in this pull request?

I extended ExtractBenchmark to support the INTERVAL type of the source parameter of the date_part function.

Why are the changes needed?

  • To detect performance issues when changing the implementation of the date_part function in the future.
  • To find out the current performance bottlenecks in date_part for the INTERVAL type.

Does this PR introduce any user-facing change?

No

How was this patch tested?

By running the benchmark and printing out the produced values for each field.

SparkQA commented Oct 19, 2019

Test build #112316 has finished for PR 26175 at commit b361b51.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

    - case "extract" => s"EXTRACT($field FROM ${castExpr(from)})"
    - case "date_part" => s"DATE_PART('$field', ${castExpr(from)})"
    + case "extract" => s"EXTRACT($field FROM ${castExpr(from)}) AS $field"
    + case "date_part" => s"DATE_PART('$field', ${castExpr(from)}) AS $field"
Member

Out of curiosity, why do we need to add alias?

Member Author

While debugging this, I printed the DataFrames to the terminal, and the tables were very wide: without an alias, each column is named after its full expression.
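For illustration, here is a minimal Scala sketch of how an expression builder like the one in the diff behaves once the aliases are added. Only the two case branches mirror the diff; the castExpr helper, including the way it produces an interval value, is a hypothetical stand-in, and the real helper in ExtractBenchmark may differ.

```scala
// Sketch only: castExpr is a hypothetical stand-in for the benchmark's helper.
object ExtractExprSketch {
  // Hypothetical: maps a source type name to a SQL expression over the `id` column.
  def castExpr(from: String): String = from match {
    case "timestamp" => "cast(id as timestamp)"
    case "date"      => "cast(cast(id as timestamp) as date)"
    case "interval"  => "(cast(id as timestamp) - timestamp'1970-01-01 00:00:00')"
    case other       => throw new IllegalArgumentException(s"Unsupported source type: $other")
  }

  // Each expression ends with `AS $field`, so a printed DataFrame shows a short
  // header such as YEAR instead of the full expression text.
  def selectExpr(func: String, field: String, from: String): String = func match {
    case "extract"   => s"EXTRACT($field FROM ${castExpr(from)}) AS $field"
    case "date_part" => s"DATE_PART('$field', ${castExpr(from)}) AS $field"
    case other       => throw new IllegalArgumentException(s"Unsupported function: $other")
  }
}
```

For example, selectExpr("date_part", "HOUR", "interval") returns a string ending in AS HOUR, so the column header in the printed table is just HOUR.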

"QUARTER", "MONTH", "DAY",
"HOUR", "MINUTE", "SECOND",
"MILLISECONDS", "MICROSECONDS", "EPOCH")
val settings = Map(
Member

Not a big deal, but Settings seems to be used only within runBenchmarkSuite. I think it's fine to just make this pretty with indentation.

Member Author

Could you clarify, please? Do you want to replace Settings with, let's say, tuples?

Member Author

Something like this?

    for {
      (dataType, (fields, funcs, iterNum)) <- Map(
        "timestamp" -> (datetimeFields, Seq("extract", "date_part"), N),
        "date" -> (datetimeFields, Seq("extract", "date_part"), N),
        "interval" -> (intervalFields, Seq("date_part"), N))
      func <- funcs
    } {

Member

Yup, I was thinking of something like that. I don't mind if you prefer the current way, though. No biggie.
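The inlined for-comprehension discussed in this thread can be checked in isolation. In the sketch below, datetimeFields, intervalFields and N are hypothetical placeholders (in the benchmark they are defined elsewhere), and the Map is replaced with a Seq of explicit tuples; the loop body is replaced with a yield so the resulting benchmark cases can be inspected.

```scala
// Sketch only: field lists and N are hypothetical placeholders, not the benchmark's values.
object InlinedCasesSketch {
  val datetimeFields: Seq[String] = Seq("YEAR", "QUARTER", "MONTH", "DAY") // hypothetical subset
  val intervalFields: Seq[String] = Seq("HOUR", "MINUTE", "SECOND", "EPOCH") // hypothetical subset
  val N: Long = 10000000L // hypothetical number of rows

  // One entry per benchmark case: (source type, function, fields, iterations).
  val cases: Seq[(String, String, Seq[String], Long)] =
    for {
      (dataType, (fields, funcs, iterNum)) <- Seq(
        ("timestamp", (datetimeFields, Seq("extract", "date_part"), N)),
        ("date", (datetimeFields, Seq("extract", "date_part"), N)),
        ("interval", (intervalFields, Seq("date_part"), N)))
      func <- funcs
    } yield (dataType, func, fields, iterNum)

  def main(args: Array[String]): Unit =
    cases.foreach { case (dataType, func, fields, _) =>
      println(s"$func on $dataType: ${fields.mkString(", ")}")
    }
}
```

This yields five cases: extract and date_part for timestamp and date, and only date_part for interval. A Seq keeps the iteration order explicit; an immutable Scala Map with more than four entries is hash-based and does not preserve insertion order, although with three entries, as here, it does.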

Member

@HyukjinKwon HyukjinKwon left a comment

Seems fine otherwise.

@HyukjinKwon
Member

Merged to master.

@MaxGekk MaxGekk deleted the extract-interval-benchmark branch June 5, 2020 19:40