[SPARK-3561] Initial commit to provide pluggable strategy to facilitate access to native Hadoop resources #2849
Conversation
Can one of the admins verify this patch?
Hey @olegz, is there an associated JIRA for this? If so, could you include it in the title?
@andrewor14 done.
Can one of the admins verify this patch?
Force-pushed from 2a85124 to fab7421.
Please don't use appName for the Application ID, because the Application ID should be unique.
Thanks, I'll address it.
Can one of the admins verify this patch?
…empDir() `File.exists()` and `File.mkdirs()` only throw `SecurityException` instead of `IOException`. Then, when an exception is thrown, `dir` should be reset too. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes apache#3449 from viirya/fix_createtempdir and squashes the following commits: 36cacbd [Liang-Chi Hsieh] Use proper exception and reset variable. (cherry picked from commit 49fe879) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
…ilding for Scala 2.11'. To build with Scala 2.11, we have to execute `change-version-to-2.11.sh` before running Maven; otherwise inter-module dependencies are broken. Author: Takuya UESHIN <ueshin@happy-camper.st> Closes apache#3361 from ueshin/docs/building-spark_2.11 and squashes the following commits: 1d29126 [Takuya UESHIN] Add instruction to use change-version-to-2.11.sh in 'Building for Scala 2.11'. (cherry picked from commit 0fcd24c) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
This PR adds the Spark version number to the UI footer (screenshot omitted). Author: Sean Owen <sowen@cloudera.com> Closes apache#3410 from srowen/SPARK-2143 and squashes the following commits: e9b3a7a [Sean Owen] Add Spark version to footer
Grammatical error in Programming Guide document Author: lewuathe <lewuathe@me.com> Closes apache#3412 from Lewuathe/typo-programming-guide and squashes the following commits: a3e2f00 [lewuathe] Typo in Programming Guide markdown (cherry picked from commit a217ec5) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Author: Cheng Lian <lian@databricks.com> Closes apache#3498 from liancheng/fix-sql-doc-typo and squashes the following commits: 865ecd7 [Cheng Lian] Fixes formatting typo in SQL programming guide (cherry picked from commit 2a4d389) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Simply add data/ to distributions. This adds about 291KB (compressed) to the tarball, FYI. Author: Sean Owen <sowen@cloudera.com> Closes apache#3480 from srowen/SPARK-2192 and squashes the following commits: 47688f1 [Sean Owen] Add data/ to distributions (cherry picked from commit 6384f42) Signed-off-by: Xiangrui Meng <meng@databricks.com>
Author: zsxwing <zsxwing@gmail.com> Closes apache#3521 from zsxwing/SPARK-4661 and squashes the following commits: 03cbe3f [zsxwing] Minor code and docs cleanup (cherry picked from commit 30a86ac) Signed-off-by: Reynold Xin <rxin@databricks.com>
Author: Madhu Siddalingaiah <madhu@madhu.com> Closes apache#3390 from msiddalingaiah/master and squashes the following commits: cbccbfe [Madhu Siddalingaiah] Documentation: replace <b> with <code> (again) 332f7a2 [Madhu Siddalingaiah] Documentation: replace <b> with <code> cd2b05a [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master' 0fc12d7 [Madhu Siddalingaiah] Documentation: add description for repartitionAndSortWithinPartitions (cherry picked from commit 2b233f5) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Documents `spark.sql.parquet.filterPushdown`, explains why it's turned off by default and when it's safe to be turned on. Author: Cheng Lian <lian@databricks.com> Closes apache#3440 from liancheng/parquet-filter-pushdown-doc and squashes the following commits: 2104311 [Cheng Lian] Documents spark.sql.parquet.filterPushdown (cherry picked from commit 5db8dca) Signed-off-by: Michael Armbrust <michael@databricks.com>
group tab is missing for scaladoc Author: Jacky Li <jacky.likun@gmail.com> Closes apache#3458 from jackylk/patch-7 and squashes the following commits: 0121a70 [Jacky Li] add @group tab in limit() and count() (cherry picked from commit bafee67) Signed-off-by: Michael Armbrust <michael@databricks.com>
Remove hardcoding max and min values for types. Let BigDecimal do checking type compatibility. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes apache#3208 from viirya/more_numericLit and squashes the following commits: e9834b4 [Liang-Chi Hsieh] Remove byte and short types for number literal. 1bd1825 [Liang-Chi Hsieh] Fix Indentation and make the modification clearer. cf1a997 [Liang-Chi Hsieh] Modified for comment to add a rule of analysis that adds a cast. 91fe489 [Liang-Chi Hsieh] add Byte and Short. 1bdc69d [Liang-Chi Hsieh] Let BigDecimal do checking type compatibility. (cherry picked from commit b57365a) Signed-off-by: Michael Armbrust <michael@databricks.com>
…nction like count(distinct c1,c2..) in Spark SQL. Adds multi-column support to the countDistinct function, i.e. count(distinct c1, c2, ...), in Spark SQL. Author: ravipesala <ravindra.pesala@huawei.com> Author: Michael Armbrust <michael@databricks.com> Closes apache#3511 from ravipesala/countdistinct and squashes the following commits: cc4dbb1 [ravipesala] style 070e12a [ravipesala] Supporting multi column support in count(distinct c1,c2..) in Spark SQL (cherry picked from commit 6a9ff19) Signed-off-by: Michael Armbrust <michael@databricks.com>
Author: ravipesala <ravindra.pesala@huawei.com> Closes apache#3516 from ravipesala/ddl_doc and squashes the following commits: d101fdf [ravipesala] Style issues fixed d2238cd [ravipesala] Corrected documentation (cherry picked from commit bc35381) Signed-off-by: Michael Armbrust <michael@databricks.com>
Author: wangfei <wangfei1@huawei.com> Closes apache#3533 from scwf/sql-doc1 and squashes the following commits: 962910b [wangfei] doc and comment fix (cherry picked from commit 7b79957) Signed-off-by: Michael Armbrust <michael@databricks.com>
…n output. Here's one way to make the hashes match what Maven's plugins would create. It takes a little extra footwork since OS X doesn't have the same command line tools. An alternative is just to make Maven output these, of course; would that be better? I ask in case there is a reason I'm missing, like needing to hash files that Maven doesn't build. Author: Sean Owen <sowen@cloudera.com> Closes apache#4161 from srowen/SPARK-5308 and squashes the following commits: 70d09d0 [Sean Owen] Use $(...) syntax e25eff8 [Sean Owen] Generate MD5, SHA1 hashes in a format like Maven's plugin (cherry picked from commit ff356e2) Signed-off-by: Patrick Wendell <patrick@databricks.com>
fix python example of ALS in guide, use Rating instead of np.array. Author: Davies Liu <davies@databricks.com> Closes apache#4226 from davies/fix_als_guide and squashes the following commits: 1433d76 [Davies Liu] fix python example of als in guide (cherry picked from commit fdaad4e) Signed-off-by: Xiangrui Meng <meng@databricks.com>
This reverts commit f53a431.
This reverts commit 3e2d7d3.
Author: Nicholas Chammas <nicholas.chammas@gmail.com> Closes apache#4312 from nchammas/patch-2 and squashes the following commits: 9d943aa [Nicholas Chammas] [Docs] Fix Building Spark link text (cherry picked from commit 3f941b6) Signed-off-by: Andrew Or <andrew@databricks.com>
This patch makes Spark 1.2.1rc2 work again on Windows. Without it you get the following log output on creating a Spark context:
INFO org.apache.spark.SparkEnv:59 - Registering BlockManagerMaster
ERROR org.apache.spark.util.Utils:75 - Failed to create local root dir in .... Ignoring this directory.
ERROR org.apache.spark.storage.DiskBlockManager:75 - Failed to create any local dir.
Author: Martin Weindel <martin.weindel@gmail.com> Author: mweindel <m.weindel@usu-software.de> Closes apache#4299 from MartinWeindel/branch-1.2 and squashes the following commits: 535cb7f [Martin Weindel] fixed last commit f17072e [Martin Weindel] moved condition to caller to avoid confusion on chmod700() return value 4de5e91 [Martin Weindel] reverted to unix line ends fe2740b [mweindel] moved comment ac4749c [mweindel] fixed chmod700 for Windows
…toreRelation's sameresult method only compare databasename and table name) Overrides MetastoreRelation's sameResult method to compare only the database name and table name. Previously, after `cache table t1`, the query `select count(*) from t1` would read from memory, but `select count(*) from t1 t` would instead read from HDFS: cached data is keyed by logical plan and matched via sameResult, so when the table is given an alias its logical plan no longer matches the plan without the alias. The fix is to make sameResult compare only the database name and table name. Author: seayi <405078363@qq.com> Author: Michael Armbrust <michael@databricks.com> Closes apache#3898 from seayi/branch-1.2 and squashes the following commits: 8f0c7d2 [seayi] Update CachedTableSuite.scala a277120 [seayi] Update HiveMetastoreCatalog.scala 8d910aa [seayi] Update HiveMetastoreCatalog.scala
…he MetastoreRelation's sameresult method only compare databasename and table name)" This reverts commit 5486440.
This reverts commit 0a16aba.
This reverts commit b77f876.
Can one of the admins verify this patch?
Is this patch still working? When will Spark finish verifying it?
@maidh91 - Please follow the discussion on the JIRA issue to get this kind of information: SPARK-3561. As for whether this patch still works: it hasn't been updated in a while and currently has a merge conflict, so probably not.
Can you close this PR? It's no longer mergeable and looks borked at this point.
Well, I am waiting on the resolution of https://issues.apache.org/jira/browse/SPARK-3561, since it has recently been updated to "In Progress". I would rather update the PR to make it mergeable, unless there is a different proposed approach, which I would like to read about.
The update was just made by automated tools, not any person. As far as I can tell, the proposal in the JIRA is rejected. The problem with this PR is that not only does it not merge, but for some reason it has a lot of other commits in it and touches 820 files. Maybe a full rebase would fix it; not sure. But it can be closed in any event; it will stay here for posterity anyway.
Sean |
Although I'm pretty sure that's the resolution, sure, leave it open if you like. But this PR can't be merged and seems to have gotten messed up somehow; I'm narrowly asking you not to leave it both in that state and open. Close it, or resolve the conflicts and the merge-history mess.
Yes, please fix the existing conflicts and merge it. It would be perfect if you could merge with the latest version.
@maidh91 this is not going to be merged. I'm actually suggesting it be closed.
I really hope that this patch will become an official part of Spark. I think @srowen is right that we should clean up all the messy things and open it again later. Spark 1.4.0 was just released and Spark Summit 2015 happened today, and they introduce many new features. It is a pity that this patch is not one of them.
@srowen |
That's fine, but in the name of trying to clean up stale PRs, would you mind closing this PR? It's not mergeable and seems corrupted anyway. You can reopen another PR if you really want to.
Initial commit to provide pluggable strategy to facilitate access to native Hadoop resources
Added HadoopExecutionContext trait and its default implementation DefaultHadoopExecutionContext
Modified SparkContext to instantiate and delegate to the instance of HadoopExecutionContext where appropriate
Changed HadoopExecutionContext to JobExecutionContext (a rough sketch of this trait appears after this list)
Changed DefaultHadoopExecutionContext to DefaultExecutionContext
The name changes are due to the fact that when Spark executes outside of Hadoop, having Hadoop in the name would be confusing
Added initial documentation and tests
polished scaladoc
annotated JobExecutionContext with @DeveloperAPI
eliminated TaskScheduler null checks in favor of NoOpTaskScheduler, to be used in cases where execution of the Spark DAG is delegated to an external execution environment
added execution-context check to SparkSubmit
Added recognition of execution-context to SparkContext
updated spark-class script to recognize when 'execution-context:' is used
polished merge
changed annotations from @DeveloperAPI to @experimental as part of the PR suggestion
externalized persist and unpersist operations
added classpath hooks to spark-class