Conversation

@olegz

@olegz olegz commented Oct 20, 2014

Initial commit to provide pluggable strategy to facilitate access to native Hadoop resources

Added HadoopExecutionContext trait and its default implementation DefaultHadoopExecutionContext
Modified SparkContext to instantiate and delegate to the instance of HadoopExecutionContext where appropriate

Changed HadoopExecutionContext to JobExecutionContext
Changed DefaultHadoopExecutionContext to DefaultExecutionContext
The name changes reflect the fact that when Spark executes outside of Hadoop, having Hadoop in the name would be confusing
Added initial documentation and tests

polished scaladoc

annotated JobExecutionContext with @DeveloperAPI

eliminated TaskScheduler null checks in favor of NoOpTaskScheduler
to be used in cases where execution of Spark DAG is delegated to an external execution environment

added execution-context check to SparkSubmit

Added recognition of execution-context to SparkContext
updated spark-class script to recognize when 'execution-context:' is used

polished merge

changed annotations from @DeveloperAPI to @experimental as part of the PR suggestion

externalized persist and unpersist operations

added classpath hooks to spark-class
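
To make the shape of this proposal concrete, here is a minimal sketch of what such a pluggable contract could look like. This is illustrative only: the method names and signatures below are assumptions, not the exact trait from this patch.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Hypothetical sketch of a pluggable execution context; the real trait
// in this PR may expose different methods.
trait JobExecutionContext {
  // Delegated access to Hadoop-backed inputs.
  def textFile(sc: SparkContext, path: String, minPartitions: Int): RDD[String]
  // Delegated DAG execution, so a job can run on an external engine.
  def runJob[T, U](sc: SparkContext, rdd: RDD[T], func: Iterator[T] => U): Array[U]
  // Externalized persist/unpersist hooks, as described above.
  def persist[T](sc: SparkContext, rdd: RDD[T], newLevel: StorageLevel): RDD[T]
  def unpersist[T](sc: SparkContext, rdd: RDD[T], blocking: Boolean = true): RDD[T]
}
```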

@AmplabJenkins

Can one of the admins verify this patch?

@andrewor14
Contributor

Hey @olegz is there an associated JIRA for this? If so could you include it in the title?

@olegz olegz changed the title Initial commit to provide pluggable strategy to facilitate access to nat... [SPARK-3561] Initial commit to provide pluggable strategy to facilitate access to nat... Oct 20, 2014
@olegz
Author

olegz commented Oct 20, 2014

@andrewor14 done.

@AmplabJenkins

Can one of the admins verify this patch?

@olegz olegz force-pushed the SH-1 branch 4 times, most recently from 2a85124 to fab7421 on October 30, 2014 at 14:07
Member

Please don't use appName for the Application ID, because the Application ID should be unique.

Author

Thanks, I'll address it.

@AmplabJenkins

Can one of the admins verify this patch?

pwendell and others added 18 commits November 28, 2014 16:55
…empDir()

`File.exists()` and `File.mkdirs()` throw only `SecurityException`, not `IOException`. Also, when an exception is thrown, `dir` should be reset.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes apache#3449 from viirya/fix_createtempdir and squashes the following commits:

36cacbd [Liang-Chi Hsieh] Use proper exception and reset variable.

(cherry picked from commit 49fe879)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
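
For illustration, here is a minimal sketch of the retry loop this fix describes, assuming a simplified signature; it is not the exact code from the commit. The key points are catching `SecurityException` (which `File.exists()`/`File.mkdirs()` can throw) and resetting `dir` on every failure so the loop retries cleanly.

```scala
import java.io.{File, IOException}
import java.util.UUID

// Sketch of a createTempDir that retries and resets `dir` on failure.
def createTempDir(root: String, maxAttempts: Int = 10): File = {
  var attempts = 0
  var dir: File = null
  while (dir == null) {
    attempts += 1
    if (attempts > maxAttempts) {
      throw new IOException(
        s"Failed to create a temp directory under $root after $maxAttempts attempts")
    }
    try {
      dir = new File(root, "spark-" + UUID.randomUUID.toString)
      // mkdirs() reports failure via its return value, not an IOException.
      if (dir.exists() || !dir.mkdirs()) {
        dir = null // reset so the next iteration retries
      }
    } catch {
      case _: SecurityException => dir = null // reset here too
    }
  }
  dir
}
```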
…ilding for Scala 2.11'.

To build with Scala 2.11, we have to execute `change-version-to-2.11.sh` before Maven executes; otherwise inter-module dependencies are broken.

Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes apache#3361 from ueshin/docs/building-spark_2.11 and squashes the following commits:

1d29126 [Takuya UESHIN] Add instruction to use change-version-to-2.11.sh in 'Building for Scala 2.11'.

(cherry picked from commit 0fcd24c)
Signed-off-by: Patrick Wendell <pwendell@gmail.com>
This PR adds the Spark version number to the UI footer; this is how it looks:

![screen shot 2014-11-21 at 22 58 40](https://cloud.githubusercontent.com/assets/822522/5157738/f4822094-7316-11e4-98f1-333a535fdcfa.png)

Author: Sean Owen <sowen@cloudera.com>

Closes apache#3410 from srowen/SPARK-2143 and squashes the following commits:

e9b3a7a [Sean Owen] Add Spark version to footer
Fix a grammatical error in the Programming Guide document.

Author: lewuathe <lewuathe@me.com>

Closes apache#3412 from Lewuathe/typo-programming-guide and squashes the following commits:

a3e2f00 [lewuathe] Typo in Programming Guide markdown

(cherry picked from commit a217ec5)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>

Author: Cheng Lian <lian@databricks.com>

Closes apache#3498 from liancheng/fix-sql-doc-typo and squashes the following commits:

865ecd7 [Cheng Lian] Fixes formatting typo in SQL programming guide

(cherry picked from commit 2a4d389)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Simply, add data/ to distributions. This adds about 291KB (compressed) to the tarball, FYI.

Author: Sean Owen <sowen@cloudera.com>

Closes apache#3480 from srowen/SPARK-2192 and squashes the following commits:

47688f1 [Sean Owen] Add data/ to distributions

(cherry picked from commit 6384f42)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
Author: zsxwing <zsxwing@gmail.com>

Closes apache#3521 from zsxwing/SPARK-4661 and squashes the following commits:

03cbe3f [zsxwing] Minor code and docs cleanup

(cherry picked from commit 30a86ac)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Author: Madhu Siddalingaiah <madhu@madhu.com>

Closes apache#3390 from msiddalingaiah/master and squashes the following commits:

cbccbfe [Madhu Siddalingaiah] Documentation: replace <b> with <code> (again)
332f7a2 [Madhu Siddalingaiah] Documentation: replace <b> with <code>
cd2b05a [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master'
0fc12d7 [Madhu Siddalingaiah] Documentation: add description for repartitionAndSortWithinPartitions

(cherry picked from commit 2b233f5)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
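
As a usage illustration of the operation that docs commit describes (a sketch assuming a local SparkContext; the data and partition count are made up):

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD functions in the Spark 1.x API

val sc = new SparkContext(new SparkConf().setAppName("demo").setMaster("local"))
val pairs = sc.parallelize(Seq((3, "c"), (1, "a"), (2, "b")))
// Repartition and sort by key within each partition in a single shuffle,
// which is more efficient than repartition() followed by a separate sort.
val sorted = pairs.repartitionAndSortWithinPartitions(new HashPartitioner(2))
sorted.collect().foreach(println)
sc.stop()
```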
Documents `spark.sql.parquet.filterPushdown`, explains why it's turned off by default and when it's safe to be turned on.


Author: Cheng Lian <lian@databricks.com>

Closes apache#3440 from liancheng/parquet-filter-pushdown-doc and squashes the following commits:

2104311 [Cheng Lian] Documents spark.sql.parquet.filterPushdown

(cherry picked from commit 5db8dca)
Signed-off-by: Michael Armbrust <michael@databricks.com>
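
For reference, a minimal way to flip the documented flag (a sketch; assumes an existing `SQLContext` named `sqlContext`):

```scala
// Enable Parquet filter pushdown, which the docs above describe as
// off by default in this release.
sqlContext.setConf("spark.sql.parquet.filterPushdown", "true")
```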
The @group tag is missing from the scaladoc for these methods.

Author: Jacky Li <jacky.likun@gmail.com>

Closes apache#3458 from jackylk/patch-7 and squashes the following commits:

0121a70 [Jacky Li] add @group tab in limit() and count()

(cherry picked from commit bafee67)
Signed-off-by: Michael Armbrust <michael@databricks.com>
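
For context, a hedged illustration of what adding a scaladoc `@group` tag looks like (the enclosing trait, method, and group name here are made up):

```scala
trait TableLike {
  /**
   * Returns the number of rows in this relation.
   *
   * @group agg_funcs
   */
  def count(): Long
}
```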
Remove hardcoded max and min values for types; let BigDecimal check type compatibility.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes apache#3208 from viirya/more_numericLit and squashes the following commits:

e9834b4 [Liang-Chi Hsieh] Remove byte and short types for number literal.
1bd1825 [Liang-Chi Hsieh] Fix Indentation and make the modification clearer.
cf1a997 [Liang-Chi Hsieh] Modified for comment to add a rule of analysis that adds a cast.
91fe489 [Liang-Chi Hsieh] add Byte and Short.
1bdc69d [Liang-Chi Hsieh] Let BigDecimal do checking type compatibility.

(cherry picked from commit b57365a)
Signed-off-by: Michael Armbrust <michael@databricks.com>
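
A small sketch of the idea (illustrative names, not the patch's code): `BigDecimal` can itself report whether a literal fits a narrower numeric type, replacing hand-coded min/max range checks.

```scala
import scala.math.BigDecimal

// Pick the narrowest numeric type a literal value fits into.
def narrowestNumericType(v: BigDecimal): String =
  if (v.isValidInt) "IntegerType"
  else if (v.isValidLong) "LongType"
  else "DecimalType"

// narrowestNumericType(BigDecimal(42))           => IntegerType
// narrowestNumericType(BigDecimal(10000000000L)) => LongType
```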
…nction like count(distinct c1,c2..) in Spark SQL

Support multiple columns in the countDistinct function, e.g. count(distinct c1, c2...), in Spark SQL.

Author: ravipesala <ravindra.pesala@huawei.com>
Author: Michael Armbrust <michael@databricks.com>

Closes apache#3511 from ravipesala/countdistinct and squashes the following commits:

cc4dbb1 [ravipesala] style
070e12a [ravipesala] Supporting multi column support in count(distinct c1,c2..) in Spark SQL

(cherry picked from commit 6a9ff19)
Signed-off-by: Michael Armbrust <michael@databricks.com>
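
A self-contained example of the query form this patch enables, sketched against the Spark 1.x-era SchemaRDD API (table and column names are made up):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

case class Pair(c1: Int, c2: String)

object CountDistinctDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("demo").setMaster("local"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD // implicit RDD -> SchemaRDD conversion (Spark 1.x)

    sc.parallelize(Seq(Pair(1, "a"), Pair(1, "a"), Pair(2, "b")))
      .registerTempTable("t1")
    // Multi-column COUNT(DISTINCT ...), the form this patch adds support for.
    sqlContext.sql("SELECT COUNT(DISTINCT c1, c2) FROM t1").collect().foreach(println)
    sc.stop()
  }
}
```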
Author: ravipesala <ravindra.pesala@huawei.com>

Closes apache#3516 from ravipesala/ddl_doc and squashes the following commits:

d101fdf [ravipesala] Style issues fixed
d2238cd [ravipesala] Corrected documentation

(cherry picked from commit bc35381)
Signed-off-by: Michael Armbrust <michael@databricks.com>
Author: wangfei <wangfei1@huawei.com>

Closes apache#3533 from scwf/sql-doc1 and squashes the following commits:

962910b [wangfei] doc and comment fix

(cherry picked from commit 7b79957)
Signed-off-by: Michael Armbrust <michael@databricks.com>
pwendell and others added 15 commits January 27, 2015 01:07
…n output

Here's one way to make the hashes match what Maven's plugins would create. It takes a little extra footwork since OS X doesn't have the same command line tools. An alternative is just to make Maven output these of course - would that be better? I ask in case there is a reason I'm missing, like, we need to hash files that Maven doesn't build.

Author: Sean Owen <sowen@cloudera.com>

Closes apache#4161 from srowen/SPARK-5308 and squashes the following commits:

70d09d0 [Sean Owen] Use $(...) syntax
e25eff8 [Sean Owen] Generate MD5, SHA1 hashes in a format like Maven's plugin

(cherry picked from commit ff356e2)
Signed-off-by: Patrick Wendell <patrick@databricks.com>
Fix the Python example of ALS in the guide: use Rating instead of np.array.

Author: Davies Liu <davies@databricks.com>

Closes apache#4226 from davies/fix_als_guide and squashes the following commits:

1433d76 [Davies Liu] fix python example of als in guide

(cherry picked from commit fdaad4e)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
Author: Nicholas Chammas <nicholas.chammas@gmail.com>

Closes apache#4312 from nchammas/patch-2 and squashes the following commits:

9d943aa [Nicholas Chammas] [Docs] Fix Building Spark link text

(cherry picked from commit 3f941b6)
Signed-off-by: Andrew Or <andrew@databricks.com>
This patch makes Spark 1.2.1rc2 work again on Windows.

Without it, you get the following log output when creating a Spark context:
INFO  org.apache.spark.SparkEnv:59 - Registering BlockManagerMaster
ERROR org.apache.spark.util.Utils:75 - Failed to create local root dir in .... Ignoring this directory.
ERROR org.apache.spark.storage.DiskBlockManager:75 - Failed to create any local dir.

Author: Martin Weindel <martin.weindel@gmail.com>
Author: mweindel <m.weindel@usu-software.de>

Closes apache#4299 from MartinWeindel/branch-1.2 and squashes the following commits:

535cb7f [Martin Weindel] fixed last commit
f17072e [Martin Weindel] moved condition to caller to avoid confusion on chmod700() return value
4de5e91 [Martin Weindel] reverted to unix line ends
fe2740b [mweindel] moved comment
ac4749c [mweindel] fixed chmod700 for Windows
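
A minimal sketch of a portable chmod700 along the lines these commits describe, using `java.io.File`'s permission setters (which work on Windows, unlike a raw `chmod`); the exact code in the fix may differ:

```scala
import java.io.File

// Restrict a file or directory to owner-only read/write/execute, portably.
// Each permission is set twice: first cleared for everyone, then granted
// to the owner only.
def chmod700(file: File): Boolean = {
  file.setReadable(false, false) &&
  file.setReadable(true, true) &&
  file.setWritable(false, false) &&
  file.setWritable(true, true) &&
  file.setExecutable(false, false) &&
  file.setExecutable(true, true)
}
```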
…toreRelation's sameresult method only compare databasename and table name)

Override the MetastoreRelation's sameResult method to compare only the database name and table name.

Previously:
cache table t1;
select count(*) from t1;
read data from memory, but the query below did not; instead it read from HDFS:
select count(*) from t1 t;

Cached data is keyed by the logical plan and looked up via sameResult, so when a table is referenced with an alias, its logical plan differs from the plan for the same table without the alias. Hence sameResult is modified to compare only the database name and table name.

Author: seayi <405078363@qq.com>
Author: Michael Armbrust <michael@databricks.com>

Closes apache#3898 from seayi/branch-1.2 and squashes the following commits:

8f0c7d2 [seayi] Update CachedTableSuite.scala
a277120 [seayi] Update HiveMetastoreCatalog.scala
8d910aa [seayi] Update HiveMetastoreCatalog.scala
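
A simplified sketch of the override described above (an illustrative class shape, not the real MetastoreRelation): equality for cache lookup ignores the alias and compares only the database and table names.

```scala
// Sketch: cache lookups match on (database, table), not on the alias,
// so `FROM t1` and `FROM t1 t` resolve to the same cached relation.
case class RelationKey(databaseName: String, tableName: String, alias: Option[String]) {
  def sameResult(other: RelationKey): Boolean =
    databaseName == other.databaseName && tableName == other.tableName
}

// RelationKey("default", "t1", None)
//   .sameResult(RelationKey("default", "t1", Some("t"))) == true
```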
…he MetastoreRelation's sameresult method only compare databasename and table name)"

This reverts commit 5486440.
…native Hadoop resources

Added HadoopExecutionContext trait and its default implementation DefaultHadoopExecutionContext
Modified SparkContext to instantiate and delegate to the instance of HadoopExecutionContext where appropriate

Changed HadoopExecutionContext to JobExecutionContext
Changed DefaultHadoopExecutionContext to DefaultExecutionContext
The name changes reflect the fact that when Spark executes outside of Hadoop, having Hadoop in the name would be confusing
Added initial documentation and tests

polished scaladoc

annotated JobExecutionContext with @DeveloperAPI

eliminated TaskScheduler null checks in favor of NoOpTaskScheduler
to be used in cases where execution of Spark DAG is delegated to an external execution environment

added execution-context check to SparkSubmit

Added recognition of execution-context to SparkContext
updated spark-class script to recognize when 'execution-context:' is used

polished merge

changed annotations from @DeveloperAPI to @experimental as part of the PR suggestion

externalized persist and unpersist operations

added classpath hooks to spark-class

cleaned up comments

cleaned up comments

updated SparkContext to accommodate latest changes to Spark
@AmplabJenkins

Can one of the admins verify this patch?

@maidh91

maidh91 commented Apr 30, 2015

Is this patch still being worked on? When will Spark finish verifying it?

@nchammas
Contributor

When will Spark finish verifying it?

@maidh91 - Please follow the discussion on the JIRA issue to get this kind of information: SPARK-3561

As for whether this patch still works, it hasn't been updated in a while and currently has a merge conflict, so probably not.

@srowen
Member

srowen commented Jun 16, 2015

Can you close this PR? It's no longer mergeable and looks borked at this point.

@olegz
Author

olegz commented Jun 16, 2015

Well, I am waiting on the resolution of https://issues.apache.org/jira/browse/SPARK-3561, since it has recently been updated to "In Progress". I would rather update the PR to make it mergeable, unless there is a different proposed approach, which I would like to read about.

@srowen
Member

srowen commented Jun 16, 2015

The update was just made by automated tools, not any person. As far as I can tell, the proposal in the JIRA is rejected. The problem with this PR is that not only does it not merge, but for some reason it has a lot of other commits in it and touches 820 files. Maybe a full rebase would fix it; not sure. But it can be closed in any event; it will stay here for posterity anyway.

@olegz
Author

olegz commented Jun 16, 2015

Sean
I am not sure I understand the "rejected" part, since no rejection (-1) has been issued in the JIRA.

@srowen
Member

srowen commented Jun 16, 2015

Although I'm pretty sure that's the resolution, sure, leave it open if you like. But this PR can't be merged and seems to have gotten messed up somehow; I'm narrowly asking you not to leave it both in that state and open. Close it, or resolve the conflicts and the merge-history issues.

@maidh91

maidh91 commented Jun 16, 2015

Yes, please fix the existing conflicts and merge it. It would be perfect if you could merge with the latest version.

@srowen
Member

srowen commented Jun 16, 2015

@maidh91 this is not going to be merged. I'm suggesting it be closed actually.

@maidh91

maidh91 commented Jun 17, 2015

I really hope that this patch will become an official part of Spark. I think @srowen is right that we should clean up all the messy things and open it again later. Spark 1.4.0 was just released and Spark Summit 2015 happened today, and they introduced many new features. It is a pity that this patch is not one of them.

@olegz
Author

olegz commented Jun 17, 2015

@srowen
I'd suggest moving this discussion to JIRA to see if we can get a disposition there on the overall proposal and idea. The PR may not be in a mergeable state, and while GitHub is an appropriate medium to discuss technical issues, the current discussion seems to go beyond that; hence my suggestion to move it.

@srowen
Member

srowen commented Jun 17, 2015

That's fine, but in the name of trying to clean up stale PRs, would you mind closing this PR? It's not mergeable and seems corrupted anyway. You can reopen another PR if you really want to.

@asfgit asfgit closed this in c4d2343 Jun 23, 2015