This repository was archived by the owner on May 9, 2024. It is now read-only.

@nraychaudhuri

Possible fix for SPARK-7831 issue.

@typesafe-tools
Collaborator

Refer to this link for build results (access rights to CI server needed):
https://ci.typesafe.com/job/ghprb-spark-multi-conf/71/

Build Log
last 10 lines

[...truncated 16 lines...]
 > git rev-parse refs/remotes/origin/origin/pr/23/merge^{commit} # timeout=10
Checking out Revision f680cede57596162e2b3ecaff31dab2b6bf02f00 (refs/remotes/origin/pr/23/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f f680cede57596162e2b3ecaff31dab2b6bf02f00
First time build. Skipping changelog.
Triggering ghprb-spark-multi-conf » Spark-Ora-JDK7-PV,2.10
ghprb-spark-multi-conf » Spark-Ora-JDK7-PV,2.10 completed with result FAILURE
Notifying upstream projects of job completion
Setting status of 082f95bb852cdbc335200a23c0eff3b2780eab4c to FAILURE with url http://ci.typesafe.com/job/ghprb-spark-multi-conf/71/ and message: Merged build finished.

Test FAILed.

@typesafe-tools
Collaborator

Refer to this link for build results (access rights to CI server needed):
https://ci.typesafe.com/job/ghprb-spark-multi-conf/72/

Build Log
last 10 lines

[...truncated 16 lines...]
 > git rev-parse refs/remotes/origin/origin/pr/23/merge^{commit} # timeout=10
Checking out Revision 0e95f314928028004549e5d398991349edb5e6d8 (refs/remotes/origin/pr/23/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 0e95f314928028004549e5d398991349edb5e6d8
First time build. Skipping changelog.
Triggering ghprb-spark-multi-conf » Spark-Ora-JDK7-PV,2.10
ghprb-spark-multi-conf » Spark-Ora-JDK7-PV,2.10 completed with result FAILURE
Notifying upstream projects of job completion
Setting status of 02bcbe7b644e5fe8a68f8653a119d657d59f4118 to FAILURE with url http://ci.typesafe.com/job/ghprb-spark-multi-conf/72/ and message: Merged build finished.

Test FAILed.

@skonto

skonto commented Jan 8, 2016

retest this please jenkins

@typesafe-tools
Collaborator

Refer to this link for build results (access rights to CI server needed):
https://ci.typesafe.com/job/ghprb-spark-multi-conf/82/

Build Log
last 10 lines

[...truncated 16 lines...]
 > git rev-parse refs/remotes/origin/origin/pr/23/merge^{commit} # timeout=10
Checking out Revision 2e2d68dbec0664cfd0fb06eb3e414b38fa3af5a5 (refs/remotes/origin/pr/23/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 2e2d68dbec0664cfd0fb06eb3e414b38fa3af5a5
First time build. Skipping changelog.
Triggering ghprb-spark-multi-conf » Spark-Ora-JDK7-PV,2.10
ghprb-spark-multi-conf » Spark-Ora-JDK7-PV,2.10 completed with result FAILURE
Notifying upstream projects of job completion
Setting status of 02bcbe7b644e5fe8a68f8653a119d657d59f4118 to FAILURE with url http://ci.typesafe.com/job/ghprb-spark-multi-conf/82/ and message: Merged build finished.

Test FAILed.

@nraychaudhuri
Author

retest this please jenkins

@typesafe-tools
Collaborator

Refer to this link for build results (access rights to CI server needed):
https://ci.typesafe.com/job/ghprb-spark-multi-conf/86/

Build Log
last 10 lines

[...truncated 16 lines...]
 > git rev-parse refs/remotes/origin/origin/pr/23/merge^{commit} # timeout=10
Checking out Revision 2e2d68dbec0664cfd0fb06eb3e414b38fa3af5a5 (refs/remotes/origin/pr/23/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 2e2d68dbec0664cfd0fb06eb3e414b38fa3af5a5
First time build. Skipping changelog.
Triggering ghprb-spark-multi-conf » Spark-Ora-JDK7-PV,2.10
ghprb-spark-multi-conf » Spark-Ora-JDK7-PV,2.10 completed with result FAILURE
Notifying upstream projects of job completion
Setting status of 02bcbe7b644e5fe8a68f8653a119d657d59f4118 to FAILURE with url http://ci.typesafe.com/job/ghprb-spark-multi-conf/86/ and message: Merged build finished.

Test FAILed.

@nraychaudhuri
Author

retest this please jenkins

@typesafe-tools
Collaborator

Refer to this link for build results (access rights to CI server needed):
https://ci.typesafe.com/job/ghprb-spark-multi-conf/87/

Build Log
last 10 lines

[...truncated 11 lines...]
 > git rev-parse refs/remotes/origin/origin/pr/23/merge^{commit} # timeout=10
Checking out Revision 2e2d68dbec0664cfd0fb06eb3e414b38fa3af5a5 (refs/remotes/origin/pr/23/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 2e2d68dbec0664cfd0fb06eb3e414b38fa3af5a5
First time build. Skipping changelog.
Triggering ghprb-spark-multi-conf » Spark-Ora-JDK7-PV,2.10
ghprb-spark-multi-conf » Spark-Ora-JDK7-PV,2.10 completed with result FAILURE
Notifying upstream projects of job completion
Setting status of 02bcbe7b644e5fe8a68f8653a119d657d59f4118 to FAILURE with url http://ci.typesafe.com/job/ghprb-spark-multi-conf/87/ and message: Merged build finished.

Test FAILed.

@nraychaudhuri
Author

retest this please jenkins

@typesafe-tools
Collaborator

Refer to this link for build results (access rights to CI server needed):
https://ci.typesafe.com/job/ghprb-spark-multi-conf/88/

Build Log
last 10 lines

[...truncated 16 lines...]
 > git rev-parse refs/remotes/origin/origin/pr/23/merge^{commit} # timeout=10
Checking out Revision 2e2d68dbec0664cfd0fb06eb3e414b38fa3af5a5 (refs/remotes/origin/pr/23/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 2e2d68dbec0664cfd0fb06eb3e414b38fa3af5a5
First time build. Skipping changelog.
Triggering ghprb-spark-multi-conf » Spark-Ora-JDK7-PV,2.10
ghprb-spark-multi-conf » Spark-Ora-JDK7-PV,2.10 completed with result FAILURE
Notifying upstream projects of job completion
Setting status of 02bcbe7b644e5fe8a68f8653a119d657d59f4118 to FAILURE with url http://ci.typesafe.com/job/ghprb-spark-multi-conf/88/ and message: Merged build finished.

Test FAILed.

@nraychaudhuri
Author

retest this please jenkins

@typesafe-tools
Collaborator

Refer to this link for build results (access rights to CI server needed):
https://ci.typesafe.com/job/ghprb-spark-multi-conf/89/

Build Log
last 10 lines

[...truncated 11 lines...]
 > git rev-parse refs/remotes/origin/origin/pr/23/merge^{commit} # timeout=10
Checking out Revision 2e2d68dbec0664cfd0fb06eb3e414b38fa3af5a5 (refs/remotes/origin/pr/23/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 2e2d68dbec0664cfd0fb06eb3e414b38fa3af5a5
First time build. Skipping changelog.
Triggering ghprb-spark-multi-conf » Spark-Ora-JDK7-PV,2.10
ghprb-spark-multi-conf » Spark-Ora-JDK7-PV,2.10 completed with result FAILURE
Notifying upstream projects of job completion
Setting status of 02bcbe7b644e5fe8a68f8653a119d657d59f4118 to FAILURE with url http://ci.typesafe.com/job/ghprb-spark-multi-conf/89/ and message: Merged build finished.

Test FAILed.

@nraychaudhuri
Author

I don't think these test failures are related to my changes. Any comments on the code changes?

@nraychaudhuri
Author

retest this please jenkins

Would `--disable-driver-failover` be too long? Just a suggestion...

@skonto

skonto commented Jan 8, 2016

LGTM, I will run it locally. Does any Spark documentation need updating? Are any unit tests needed?

@typesafe-tools
Collaborator

Refer to this link for build results (access rights to CI server needed):
https://ci.typesafe.com/job/ghprb-spark-multi-conf/90/

Build Log
last 10 lines

[...truncated 11 lines...]
 > git rev-parse refs/remotes/origin/origin/pr/23/merge^{commit} # timeout=10
Checking out Revision 2e2d68dbec0664cfd0fb06eb3e414b38fa3af5a5 (refs/remotes/origin/pr/23/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 2e2d68dbec0664cfd0fb06eb3e414b38fa3af5a5
First time build. Skipping changelog.
Triggering ghprb-spark-multi-conf » Spark-Ora-JDK7-PV,2.10
ghprb-spark-multi-conf » Spark-Ora-JDK7-PV,2.10 completed with result FAILURE
Notifying upstream projects of job completion
Setting status of 02bcbe7b644e5fe8a68f8653a119d657d59f4118 to FAILURE with url http://ci.typesafe.com/job/ghprb-spark-multi-conf/90/ and message: Merged build finished.

Test FAILed.

@nraychaudhuri
Author

@skonto thanks for reviewing it. I looked around. No tests for this code exist. I guess I can add one, but I don't know how much work is required, as this code is invoked from a shell script. I might push this out and see what feedback I get from them ;)

Is this method used? I cannot find a reference to it in the codebase.

@nraychaudhuri
Author

Closing this one; let's have the discussion at apache#10701.

typesafe-tools pushed a commit that referenced this pull request May 27, 2016
…w queries

## What changes were proposed in this pull request?

This PR implements decimal aggregation optimization for window queries by improving the existing `DecimalAggregates` rule. Historically, the `DecimalAggregates` optimizer was designed to transform general `sum/avg(decimal)` aggregates, but it breaks recently added window queries such as the following. These queries work correctly when the current `DecimalAggregates` optimizer is not applied.

**Sum**
```scala
scala> sql("select sum(a) over () from (select explode(array(1.0,2.0)) a) t").head
java.lang.RuntimeException: Unsupported window function: MakeDecimal((sum(UnscaledValue(a#31)),mode=Complete,isDistinct=false),12,1)
scala> sql("select sum(a) over () from (select explode(array(1.0,2.0)) a) t").explain()
== Physical Plan ==
WholeStageCodegen
:  +- Project [sum(a) OVER (  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#23]
:     +- INPUT
+- Window [MakeDecimal((sum(UnscaledValue(a#21)),mode=Complete,isDistinct=false),12,1) windowspecdefinition(ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS sum(a) OVER (  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#23]
   +- Exchange SinglePartition, None
      +- Generate explode([1.0,2.0]), false, false, [a#21]
         +- Scan OneRowRelation[]
```

**Average**
```scala
scala> sql("select avg(a) over () from (select explode(array(1.0,2.0)) a) t").head
java.lang.RuntimeException: Unsupported window function: cast(((avg(UnscaledValue(a#40)),mode=Complete,isDistinct=false) / 10.0) as decimal(6,5))
scala> sql("select avg(a) over () from (select explode(array(1.0,2.0)) a) t").explain()
== Physical Plan ==
WholeStageCodegen
:  +- Project [avg(a) OVER (  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#44]
:     +- INPUT
+- Window [cast(((avg(UnscaledValue(a#42)),mode=Complete,isDistinct=false) / 10.0) as decimal(6,5)) windowspecdefinition(ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS avg(a) OVER (  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#44]
   +- Exchange SinglePartition, None
      +- Generate explode([1.0,2.0]), false, false, [a#42]
         +- Scan OneRowRelation[]
```
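For context, the arithmetic behind the `UnscaledValue` / `MakeDecimal` expressions in the plans above can be sketched in plain Scala. This is only an illustration under stated assumptions: it uses `java.math.BigDecimal` rather than Spark's own decimal type, and the object and helper names (`UnscaledSumSketch`, `unscaled`) are hypothetical. The input values, the `/ 10.0` divisor, and the result scales (1 for sum, 5 for average) are taken from the example query and plans.

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

// Illustrative sketch only; not Spark code.
object UnscaledSumSketch {
  // Input values from the example query, each with one fractional digit.
  val inputs: Seq[JBigDecimal] = Seq(new JBigDecimal("1.0"), new JBigDecimal("2.0"))
  val scale = 1

  // Mirrors UnscaledValue: keep only the digits as a Long (1.0 becomes 10).
  def unscaled(d: JBigDecimal): Long =
    d.setScale(scale).unscaledValue().longValueExact()

  // Mirrors sum(UnscaledValue(a)) followed by MakeDecimal(_, 12, 1):
  // add the Longs, then reattach the scale.
  val sumUnscaled: Long = inputs.map(unscaled).sum
  val sum: JBigDecimal = JBigDecimal.valueOf(sumUnscaled, scale)

  // Mirrors cast(avg(UnscaledValue(a)) / 10.0 as decimal(6,5)):
  // average the Longs as a Double, divide by 10^scale, then fix the scale at 5.
  val avgUnscaled: Double = inputs.map(unscaled).sum.toDouble / inputs.size
  val avg: JBigDecimal =
    new JBigDecimal(avgUnscaled / math.pow(10, scale)).setScale(5, RoundingMode.HALF_UP)
}
```

The point of the rewrite is that the aggregation itself runs on `Long` values, with the decimal scale restored only once at the end.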

After this PR, those queries work and the new optimized physical plans look like the following.

**Sum**
```scala
scala> sql("select sum(a) over () from (select explode(array(1.0,2.0)) a) t").explain()
== Physical Plan ==
WholeStageCodegen
:  +- Project [sum(a) OVER (  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#35]
:     +- INPUT
+- Window [MakeDecimal((sum(UnscaledValue(a#33)),mode=Complete,isDistinct=false) windowspecdefinition(ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING),12,1) AS sum(a) OVER (  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#35]
   +- Exchange SinglePartition, None
      +- Generate explode([1.0,2.0]), false, false, [a#33]
         +- Scan OneRowRelation[]
```

**Average**
```scala
scala> sql("select avg(a) over () from (select explode(array(1.0,2.0)) a) t").explain()
== Physical Plan ==
WholeStageCodegen
:  +- Project [avg(a) OVER (  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#47]
:     +- INPUT
+- Window [cast(((avg(UnscaledValue(a#45)),mode=Complete,isDistinct=false) windowspecdefinition(ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) / 10.0) as decimal(6,5)) AS avg(a) OVER (  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#47]
   +- Exchange SinglePartition, None
      +- Generate explode([1.0,2.0]), false, false, [a#45]
         +- Scan OneRowRelation[]
```

In this PR, the *SUM over window* pattern matching is based on code by hvanhovell; he should be credited for that work.
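The difference between the plans before and after this PR can be sketched with a toy expression tree. This is a minimal illustration, not the actual Catalyst rule: the case classes below are hypothetical stand-ins for Spark's expression classes, the window specification is reduced to a string, and the `p + 10` precision widening and the 18-digit cutoff are assumptions (the former is consistent with the `12,1` shown in the plans for a one-integer-digit, one-fraction-digit input). The idea it shows is that the window case is matched first, so the aggregate stays directly under the window expression and `MakeDecimal` wraps the whole window expression.

```scala
// Toy expression tree; illustrative stand-ins, not Spark's Catalyst classes.
sealed trait Expr
case class Attr(name: String, precision: Int, scale: Int) extends Expr
case class UnscaledValue(child: Expr) extends Expr
case class Sum(child: Expr) extends Expr
case class MakeDecimal(child: Expr, precision: Int, scale: Int) extends Expr
case class WindowExpr(function: Expr, spec: String) extends Expr

object DecimalWindowSketch {
  // Assumed cutoff: the widened precision must still fit in a Long.
  val MaxLongDigits = 18

  def rewrite(e: Expr): Expr = e match {
    // Window case first: keep Sum directly under the window expression
    // and wrap the whole window expression in MakeDecimal.
    case WindowExpr(Sum(a @ Attr(_, p, s)), spec) if p + 10 <= MaxLongDigits =>
      MakeDecimal(WindowExpr(Sum(UnscaledValue(a)), spec), p + 10, s)
    // Plain aggregate: wrap the aggregate itself.
    case Sum(a @ Attr(_, p, s)) if p + 10 <= MaxLongDigits =>
      MakeDecimal(Sum(UnscaledValue(a)), p + 10, s)
    // Anything else is left untouched.
    case other => other
  }
}
```

Applying only the plain-aggregate case to a window query would place `MakeDecimal` inside the window expression, which matches the shape of the "Unsupported window function" error shown earlier.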

## How was this patch tested?

Pass the Jenkins tests (with newly added test cases).

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes apache#12421 from dongjoon-hyun/SPARK-14664.