Conversation

@olegz

@olegz olegz commented Oct 20, 2014

Initial commit to provide pluggable strategy to facilitate access to native Hadoop resources

Added HadoopExecutionContext trait and its default implementation DefaultHadoopExecutionContext
Modified SparkContext to instantiate and delegate to the instance of HadoopExecutionContext where appropriate

Changed HadoopExecutionContext to JobExecutionContext
Changed DefaultHadoopExecutionContext to DefaultExecutionContext
The name changes reflect the fact that when Spark executes outside of Hadoop, having Hadoop in the name would be confusing
Added initial documentation and tests

polished scaladoc

annotated JobExecutionContext with @DeveloperAPI

eliminated TaskScheduler null checks in favor of NoOpTaskScheduler
to be used in cases where execution of Spark DAG is delegated to an external execution environment

added execution-context check to SparkSubmit

Added recognition of execution-context to SparkContext
updated spark-class script to recognize when 'execution-context:' is used

polished merge

changed annotations from @DeveloperAPI to @experimental as part of the PR suggestion

externalized persist and unpersist operations

added classpath hooks to spark-class
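
To make the shape of this proposal concrete, here is a minimal sketch of what such a pluggable contract could look like. This is illustrative only: the method names and signatures below are assumptions, not the exact trait from this patch.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Hypothetical sketch of a pluggable execution context; the real trait
// in this PR may expose different methods.
trait JobExecutionContext {
  // Delegated access to Hadoop-backed inputs.
  def textFile(sc: SparkContext, path: String, minPartitions: Int): RDD[String]
  // Delegated DAG execution, so a job can run on an external engine.
  def runJob[T, U](sc: SparkContext, rdd: RDD[T], func: Iterator[T] => U): Array[U]
  // Externalized persist/unpersist hooks, as described above.
  def persist[T](sc: SparkContext, rdd: RDD[T], newLevel: StorageLevel): RDD[T]
  def unpersist[T](sc: SparkContext, rdd: RDD[T], blocking: Boolean = true): RDD[T]
}
```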

@AmplabJenkins

Can one of the admins verify this patch?

@andrewor14
Contributor

Hey @olegz is there an associated JIRA for this? If so could you include it in the title?

@olegz olegz changed the title Initial commit to provide pluggable strategy to facilitate access to nat... [SPARK-3561] Initial commit to provide pluggable strategy to facilitate access to nat... Oct 20, 2014
@olegz
Author

olegz commented Oct 20, 2014

@andrewor14 done.

@AmplabJenkins

Can one of the admins verify this patch?

@olegz olegz force-pushed the SH-1 branch 4 times, most recently from 2a85124 to fab7421 on October 30, 2014 at 14:07
Member

Please don't use appName for the Application ID, because the Application ID should be unique.

Author

Thanks, I'll address it.

@AmplabJenkins

Can one of the admins verify this patch?

pwendell and others added 18 commits November 28, 2014 16:55
…empDir()

`File.exists()` and `File.mkdirs()` throw only `SecurityException`, not `IOException`. Also, when an exception is thrown, `dir` should be reset.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes apache#3449 from viirya/fix_createtempdir and squashes the following commits:

36cacbd [Liang-Chi Hsieh] Use proper exception and reset variable.

(cherry picked from commit 49fe879)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
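
For illustration, here is a minimal sketch of the retry loop this fix describes, assuming a simplified signature; it is not the exact code from the commit. The key points are catching `SecurityException` (which `File.exists()`/`File.mkdirs()` can throw) and resetting `dir` on every failure so the loop retries cleanly.

```scala
import java.io.{File, IOException}
import java.util.UUID

// Sketch of a createTempDir that retries and resets `dir` on failure.
def createTempDir(root: String, maxAttempts: Int = 10): File = {
  var attempts = 0
  var dir: File = null
  while (dir == null) {
    attempts += 1
    if (attempts > maxAttempts) {
      throw new IOException(
        s"Failed to create a temp directory under $root after $maxAttempts attempts")
    }
    try {
      dir = new File(root, "spark-" + UUID.randomUUID.toString)
      // mkdirs() reports failure via its return value, not an IOException.
      if (dir.exists() || !dir.mkdirs()) {
        dir = null // reset so the next iteration retries
      }
    } catch {
      case _: SecurityException => dir = null // reset here too
    }
  }
  dir
}
```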
…ilding for Scala 2.11'.

To build with Scala 2.11, we have to execute `change-version-to-2.11.sh` before Maven executes; otherwise inter-module dependencies are broken.

Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes apache#3361 from ueshin/docs/building-spark_2.11 and squashes the following commits:

1d29126 [Takuya UESHIN] Add instruction to use change-version-to-2.11.sh in 'Building for Scala 2.11'.

(cherry picked from commit 0fcd24c)
Signed-off-by: Patrick Wendell <pwendell@gmail.com>
This PR adds the Spark version number to the UI footer; this is how it looks:

![screen shot 2014-11-21 at 22 58 40](https://cloud.githubusercontent.com/assets/822522/5157738/f4822094-7316-11e4-98f1-333a535fdcfa.png)

Author: Sean Owen <sowen@cloudera.com>

Closes apache#3410 from srowen/SPARK-2143 and squashes the following commits:

e9b3a7a [Sean Owen] Add Spark version to footer
Fix a grammatical error in the Programming Guide document.

Author: lewuathe <lewuathe@me.com>

Closes apache#3412 from Lewuathe/typo-programming-guide and squashes the following commits:

a3e2f00 [lewuathe] Typo in Programming Guide markdown

(cherry picked from commit a217ec5)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>

Author: Cheng Lian <lian@databricks.com>

Closes apache#3498 from liancheng/fix-sql-doc-typo and squashes the following commits:

865ecd7 [Cheng Lian] Fixes formatting typo in SQL programming guide

(cherry picked from commit 2a4d389)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Simply, add data/ to distributions. This adds about 291KB (compressed) to the tarball, FYI.

Author: Sean Owen <sowen@cloudera.com>

Closes apache#3480 from srowen/SPARK-2192 and squashes the following commits:

47688f1 [Sean Owen] Add data/ to distributions

(cherry picked from commit 6384f42)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
Author: zsxwing <zsxwing@gmail.com>

Closes apache#3521 from zsxwing/SPARK-4661 and squashes the following commits:

03cbe3f [zsxwing] Minor code and docs cleanup

(cherry picked from commit 30a86ac)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Author: Madhu Siddalingaiah <madhu@madhu.com>

Closes apache#3390 from msiddalingaiah/master and squashes the following commits:

cbccbfe [Madhu Siddalingaiah] Documentation: replace <b> with <code> (again)
332f7a2 [Madhu Siddalingaiah] Documentation: replace <b> with <code>
cd2b05a [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master'
0fc12d7 [Madhu Siddalingaiah] Documentation: add description for repartitionAndSortWithinPartitions

(cherry picked from commit 2b233f5)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
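
As a usage illustration of the operation that docs commit describes (a sketch assuming a local SparkContext; the data and partition count are made up):

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD functions in the Spark 1.x API

val sc = new SparkContext(new SparkConf().setAppName("demo").setMaster("local"))
val pairs = sc.parallelize(Seq((3, "c"), (1, "a"), (2, "b")))
// Repartition and sort by key within each partition in a single shuffle,
// which is more efficient than repartition() followed by a separate sort.
val sorted = pairs.repartitionAndSortWithinPartitions(new HashPartitioner(2))
sorted.collect().foreach(println)
sc.stop()
```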
Documents `spark.sql.parquet.filterPushdown`, explains why it's turned off by default and when it's safe to be turned on.


Author: Cheng Lian <lian@databricks.com>

Closes apache#3440 from liancheng/parquet-filter-pushdown-doc and squashes the following commits:

2104311 [Cheng Lian] Documents spark.sql.parquet.filterPushdown

(cherry picked from commit 5db8dca)
Signed-off-by: Michael Armbrust <michael@databricks.com>
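
For reference, a minimal way to flip the documented flag (a sketch; assumes an existing `SQLContext` named `sqlContext`):

```scala
// Enable Parquet filter pushdown, which the docs above describe as
// off by default in this release.
sqlContext.setConf("spark.sql.parquet.filterPushdown", "true")
```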
The @group tag is missing from the scaladoc for these methods.

Author: Jacky Li <jacky.likun@gmail.com>

Closes apache#3458 from jackylk/patch-7 and squashes the following commits:

0121a70 [Jacky Li] add @group tab in limit() and count()

(cherry picked from commit bafee67)
Signed-off-by: Michael Armbrust <michael@databricks.com>
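
For context, a hedged illustration of what adding a scaladoc `@group` tag looks like (the enclosing trait, method, and group name here are made up):

```scala
trait TableLike {
  /**
   * Returns the number of rows in this relation.
   *
   * @group agg_funcs
   */
  def count(): Long
}
```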
Remove hardcoded max and min values for types; let BigDecimal check type compatibility.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes apache#3208 from viirya/more_numericLit and squashes the following commits:

e9834b4 [Liang-Chi Hsieh] Remove byte and short types for number literal.
1bd1825 [Liang-Chi Hsieh] Fix Indentation and make the modification clearer.
cf1a997 [Liang-Chi Hsieh] Modified for comment to add a rule of analysis that adds a cast.
91fe489 [Liang-Chi Hsieh] add Byte and Short.
1bdc69d [Liang-Chi Hsieh] Let BigDecimal do checking type compatibility.

(cherry picked from commit b57365a)
Signed-off-by: Michael Armbrust <michael@databricks.com>
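
A small sketch of the idea (illustrative names, not the patch's code): `BigDecimal` can itself report whether a literal fits a narrower numeric type, replacing hand-coded min/max range checks.

```scala
import scala.math.BigDecimal

// Pick the narrowest numeric type a literal value fits into.
def narrowestNumericType(v: BigDecimal): String =
  if (v.isValidInt) "IntegerType"
  else if (v.isValidLong) "LongType"
  else "DecimalType"

// narrowestNumericType(BigDecimal(42))           => IntegerType
// narrowestNumericType(BigDecimal(10000000000L)) => LongType
```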
…nction like count(distinct c1,c2..) in Spark SQL

Support multiple columns in the countDistinct function, e.g. count(distinct c1, c2...), in Spark SQL.

Author: ravipesala <ravindra.pesala@huawei.com>
Author: Michael Armbrust <michael@databricks.com>

Closes apache#3511 from ravipesala/countdistinct and squashes the following commits:

cc4dbb1 [ravipesala] style
070e12a [ravipesala] Supporting multi column support in count(distinct c1,c2..) in Spark SQL

(cherry picked from commit 6a9ff19)
Signed-off-by: Michael Armbrust <michael@databricks.com>
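
A self-contained example of the query form this patch enables, sketched against the Spark 1.x-era SchemaRDD API (table and column names are made up):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

case class Pair(c1: Int, c2: String)

object CountDistinctDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("demo").setMaster("local"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD // implicit RDD -> SchemaRDD conversion (Spark 1.x)

    sc.parallelize(Seq(Pair(1, "a"), Pair(1, "a"), Pair(2, "b")))
      .registerTempTable("t1")
    // Multi-column COUNT(DISTINCT ...), the form this patch adds support for.
    sqlContext.sql("SELECT COUNT(DISTINCT c1, c2) FROM t1").collect().foreach(println)
    sc.stop()
  }
}
```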
Author: ravipesala <ravindra.pesala@huawei.com>

Closes apache#3516 from ravipesala/ddl_doc and squashes the following commits:

d101fdf [ravipesala] Style issues fixed
d2238cd [ravipesala] Corrected documentation

(cherry picked from commit bc35381)
Signed-off-by: Michael Armbrust <michael@databricks.com>
Author: wangfei <wangfei1@huawei.com>

Closes apache#3533 from scwf/sql-doc1 and squashes the following commits:

962910b [wangfei] doc and comment fix

(cherry picked from commit 7b79957)
Signed-off-by: Michael Armbrust <michael@databricks.com>
pwendell and others added 15 commits January 27, 2015 01:07
…n output

Here's one way to make the hashes match what Maven's plugins would create. It takes a little extra footwork since OS X doesn't have the same command line tools. An alternative is just to make Maven output these of course - would that be better? I ask in case there is a reason I'm missing, like, we need to hash files that Maven doesn't build.

Author: Sean Owen <sowen@cloudera.com>

Closes apache#4161 from srowen/SPARK-5308 and squashes the following commits:

70d09d0 [Sean Owen] Use $(...) syntax
e25eff8 [Sean Owen] Generate MD5, SHA1 hashes in a format like Maven's plugin

(cherry picked from commit ff356e2)
Signed-off-by: Patrick Wendell <patrick@databricks.com>
Fix the Python example of ALS in the guide: use Rating instead of np.array.

Author: Davies Liu <davies@databricks.com>

Closes apache#4226 from davies/fix_als_guide and squashes the following commits:

1433d76 [Davies Liu] fix python example of als in guide

(cherry picked from commit fdaad4e)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
Author: Nicholas Chammas <nicholas.chammas@gmail.com>

Closes apache#4312 from nchammas/patch-2 and squashes the following commits:

9d943aa [Nicholas Chammas] [Docs] Fix Building Spark link text

(cherry picked from commit 3f941b6)
Signed-off-by: Andrew Or <andrew@databricks.com>
This patch makes Spark 1.2.1rc2 work again on Windows.

Without it, you get the following log output when creating a Spark context:
INFO  org.apache.spark.SparkEnv:59 - Registering BlockManagerMaster
ERROR org.apache.spark.util.Utils:75 - Failed to create local root dir in .... Ignoring this directory.
ERROR org.apache.spark.storage.DiskBlockManager:75 - Failed to create any local dir.

Author: Martin Weindel <martin.weindel@gmail.com>
Author: mweindel <m.weindel@usu-software.de>

Closes apache#4299 from MartinWeindel/branch-1.2 and squashes the following commits:

535cb7f [Martin Weindel] fixed last commit
f17072e [Martin Weindel] moved condition to caller to avoid confusion on chmod700() return value
4de5e91 [Martin Weindel] reverted to unix line ends
fe2740b [mweindel] moved comment
ac4749c [mweindel] fixed chmod700 for Windows
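
A minimal sketch of a portable chmod700 along the lines these commits describe, using `java.io.File`'s permission setters (which work on Windows, unlike a raw `chmod`); the exact code in the fix may differ:

```scala
import java.io.File

// Restrict a file or directory to owner-only read/write/execute, portably.
// Each permission is set twice: first cleared for everyone, then granted
// to the owner only.
def chmod700(file: File): Boolean = {
  file.setReadable(false, false) &&
  file.setReadable(true, true) &&
  file.setWritable(false, false) &&
  file.setWritable(true, true) &&
  file.setExecutable(false, false) &&
  file.setExecutable(true, true)
}
```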
…toreRelation's sameresult method only compare databasename and table name)

Override the MetastoreRelation's sameResult method to compare only the database name and table name.

Previously:
cache table t1;
select count(*) from t1;
read data from memory, but the query below did not; instead it read from HDFS:
select count(*) from t1 t;

Cached data is keyed by the logical plan and looked up via sameResult, so when a table is referenced with an alias, its logical plan differs from the plan for the same table without the alias. Hence sameResult is modified to compare only the database name and table name.

Author: seayi <405078363@qq.com>
Author: Michael Armbrust <michael@databricks.com>

Closes apache#3898 from seayi/branch-1.2 and squashes the following commits:

8f0c7d2 [seayi] Update CachedTableSuite.scala
a277120 [seayi] Update HiveMetastoreCatalog.scala
8d910aa [seayi] Update HiveMetastoreCatalog.scala
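
A simplified sketch of the override described above (an illustrative class shape, not the real MetastoreRelation): equality for cache lookup ignores the alias and compares only the database and table names.

```scala
// Sketch: cache lookups match on (database, table), not on the alias,
// so `FROM t1` and `FROM t1 t` resolve to the same cached relation.
case class RelationKey(databaseName: String, tableName: String, alias: Option[String]) {
  def sameResult(other: RelationKey): Boolean =
    databaseName == other.databaseName && tableName == other.tableName
}

// RelationKey("default", "t1", None)
//   .sameResult(RelationKey("default", "t1", Some("t"))) == true
```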
…he MetastoreRelation's sameresult method only compare databasename and table name)"

This reverts commit 5486440.
…native Hadoop resources

Added HadoopExecutionContext trait and its default implementation DefaultHadoopExecutionContext
Modified SparkContext to instantiate and delegate to the instance of HadoopExecutionContext where appropriate

Changed HadoopExecutionContext to JobExecutionContext
Changed DefaultHadoopExecutionContext to DefaultExecutionContext
The name changes reflect the fact that when Spark executes outside of Hadoop, having Hadoop in the name would be confusing
Added initial documentation and tests

polished scaladoc

annotated JobExecutionContext with @DeveloperAPI

eliminated TaskScheduler null checks in favor of NoOpTaskScheduler
to be used in cases where execution of Spark DAG is delegated to an external execution environment

added execution-context check to SparkSubmit

Added recognition of execution-context to SparkContext
updated spark-class script to recognize when 'execution-context:' is used

polished merge

changed annotations from @DeveloperAPI to @experimental as part of the PR suggestion

externalized persist and unpersist operations

added classpath hooks to spark-class

cleaned up comments

cleaned up comments

updated SparkContext to accommodate latest changes to Spark
@AmplabJenkins

Can one of the admins verify this patch?

@maidh91

maidh91 commented Apr 30, 2015

Is this patch still being worked on? When will Spark finish verifying it?

@nchammas
Contributor

When will Spark finish verifying it?

@maidh91 - Please follow the discussion on the JIRA issue to get this kind of information: SPARK-3561

As for whether this patch still works, it hasn't been updated in a while and currently has a merge conflict, so probably not.

@srowen
Member

srowen commented Jun 16, 2015

Can you close this PR? It's no longer mergeable and looks borked at this point.

@olegz
Author

olegz commented Jun 16, 2015

Well, I am waiting on the resolution of https://issues.apache.org/jira/browse/SPARK-3561, since it has recently been updated to "In Progress". I would rather update the PR to make it mergeable, unless there is a different proposed approach, which I would like to read about.

@srowen
Member

srowen commented Jun 16, 2015

The update was just made by automated tools, not any person. As far as I can tell, the proposal in the JIRA is rejected. The problem with this PR is that not only does it not merge, but for some reason it has a lot of other commits in it and touches 820 files. Maybe a full rebase would fix it; not sure. But it can be closed in any event; it will stay here for posterity anyway.

@olegz
Author

olegz commented Jun 16, 2015

Sean
I am not sure I understand the "rejected" part, since no rejection (-1) has been issued in the JIRA.

@srowen
Member

srowen commented Jun 16, 2015

Although I'm pretty sure that's the resolution, sure, leave it open if you like. But this PR can't be merged and seems to have gotten messed up somehow; I'm narrowly asking you not to leave it both in that state and open. Close it, or resolve the conflicts and the merge-history issues.

@maidh91

maidh91 commented Jun 16, 2015

Yes, please fix the existing conflicts and merge it. It would be perfect if you could merge with the latest version.

@srowen
Member

srowen commented Jun 16, 2015

@maidh91 this is not going to be merged. I'm suggesting it be closed actually.

@maidh91

maidh91 commented Jun 17, 2015

I really hope that this patch will become an official part of Spark. I think @srowen is right that we should clean up all the messy things and open it again later. Spark 1.4.0 was just released and Spark Summit 2015 happened today, and they introduced many new features. It is a pity that this patch is not one of them.

@olegz
Author

olegz commented Jun 17, 2015

@srowen
I'd suggest moving this discussion to JIRA to see if we can get a disposition there on the overall proposal and idea. The PR may not be in a mergeable state, and while GitHub is an appropriate medium to discuss technical issues, the current discussion seems to go beyond that; hence my suggestion to move it.

@srowen
Member

srowen commented Jun 17, 2015

That's fine, but in the name of trying to clean up stale PRs, would you mind closing this PR? It's not mergeable and seems corrupted anyway. You can reopen another PR if you really want to.

@asfgit asfgit closed this in c4d2343 Jun 23, 2015