
Conversation

@272029252

When I use Spark Streaming, there is a dependency conflict with curator-client:
```
[INFO] - org.apache.spark:spark-core_2.10:jar:1.5.0:compile
[INFO] +- org.apache.curator:curator-recipes:jar:2.4.0:compile
[INFO] | - org.apache.curator:curator-framework:jar:2.4.0:compile
[INFO] | - (org.apache.curator:curator-client:jar:2.4.0:compile - omitted for conflict with 2.1.0-incubating)
[INFO] - org.tachyonproject:tachyon-client:jar:0.7.1:compile
[INFO] - org.apache.curator:curator-client:jar:2.1.0-incubating:compile
```

Davies Liu and others added 30 commits August 14, 2015 22:31
The BYTE_ARRAY_OFFSET could be different in JVMs with different configurations (for example, different heap sizes: 24 if the heap is > 32G, otherwise 16), so the offset of a UTF8String is not portable; we should handle that during serialization.

Author: Davies Liu <davies@databricks.com>

Closes #8210 from davies/serialize_utf8string.

(cherry picked from commit 7c1e568)
Signed-off-by: Davies Liu <davies.liu@gmail.com>
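A minimal sketch of the portable approach described in the entry above, assuming a plain `DataOutput`/`DataInput` pair rather than the actual serializer classes: write only the bytes, and rebuild the `UTF8String` against the reading JVM's own base offset.

```scala
import java.io.{DataInput, DataOutput}
import org.apache.spark.unsafe.types.UTF8String

def writeUTF8String(out: DataOutput, s: UTF8String): Unit = {
  val bytes = s.getBytes        // copies the content out of the JVM-specific base/offset
  out.writeInt(bytes.length)
  out.write(bytes)
}

def readUTF8String(in: DataInput): UTF8String = {
  val bytes = new Array[Byte](in.readInt())
  in.readFully(bytes)
  UTF8String.fromBytes(bytes)   // rebuilt against this JVM's own BYTE_ARRAY_OFFSET
}
```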
…ters in doc

Tiny modification to a few comments to make ```sbt publishLocal``` work again.

Author: Herman van Hovell <hvanhovell@questtec.nl>

Closes #8209 from hvanhovell/SPARK-9980.

(cherry picked from commit a85fb6c)
Signed-off-by: Sean Owen <sowen@cloudera.com>
We should skip unresolved `LogicalPlan`s in `PullOutNondeterministic`, as calling `output` on an unresolved `LogicalPlan` produces a confusing error message.

Author: Wenchen Fan <cloud0fan@outlook.com>

Closes #8203 from cloud-fan/error-msg and squashes the following commits:

1c67ca7 [Wenchen Fan] move test
7593080 [Wenchen Fan] correct error message for aggregate

(cherry picked from commit 5705672)
Signed-off-by: Michael Armbrust <michael@databricks.com>
…reaming pyspark tests

Recently, PySpark ML streaming tests have been flaky, most likely because of the batches not being processed in time. Proposal: replace the use of `_ssc_wait` (which waits for a fixed amount of time) with a method that waits up to a fixed timeout but can terminate early based on a termination condition. With this, we can extend the waiting period (to make tests less flaky) but also stop early when possible (making tests faster on average, which I verified locally).

CC: mengxr tdas freeman-lab

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #8087 from jkbradley/streaming-ml-tests.

(cherry picked from commit 1db7179)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
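The actual helper lives in the PySpark test code; this Scala sketch only illustrates the wait-until-condition-or-timeout pattern the entry above proposes (names are illustrative, not the real test utility).

```scala
// Wait up to timeoutMillis, but return as soon as the condition holds.
def waitForCondition(condition: () => Boolean,
                     timeoutMillis: Long,
                     pollMillis: Long = 100L): Boolean = {
  val deadline = System.currentTimeMillis() + timeoutMillis
  while (System.currentTimeMillis() < deadline) {
    if (condition()) return true   // stop early once the batches are processed
    Thread.sleep(pollMillis)
  }
  condition()                      // one last check at the deadline
}
```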
This is a WIP patch for SPARK-8844 for collecting reviews.

This bug is about reading an empty DataFrame. In `readCol()`,
```
lapply(1:numRows, function(x) {
```
does not take into consideration the case where numRows = 0.

A unit test case will be added.

Author: Sun Rui <rui.sun@intel.com>

Closes #7419 from sun-rui/SPARK-8844.

(cherry picked from commit 5f9ce73)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
…rrow deps

The shuffle locality patch made the DAGScheduler aware of shuffle data,
but for RDDs that have both narrow and shuffle dependencies, it can
cause them to place tasks based on the shuffle dependency instead of the
narrow one. This case is common in iterative join-based algorithms like
PageRank and ALS, where one RDD is hash-partitioned and one isn't.

Author: Matei Zaharia <matei@databricks.com>

Closes #8220 from mateiz/shuffle-loc-fix.

(cherry picked from commit cf01607)
Signed-off-by: Matei Zaharia <matei@databricks.com>
The `initialSize` argument of `ColumnBuilder.initialize()` should be the
number of rows rather than bytes. However, `InMemoryColumnarTableScan`
passes in a byte size, which makes Spark SQL allocate more memory than
necessary when building in-memory columnar buffers.

Author: Kun Xu <viper_kun@163.com>

Closes #8189 from viper-kun/errorSize.

(cherry picked from commit 182f9b7)
Signed-off-by: Cheng Lian <lian@databricks.com>
In the case of schema merging, we only handled first-level fields when converting Parquet groups to `InternalRow`s; nested struct fields were not properly handled.

For example, the schema of a Parquet file to be read can be:

```
message individual {
  required group f1 {
    optional binary f11 (utf8);
  }
}
```

while the global schema is:

```
message global {
  required group f1 {
    optional binary f11 (utf8);
    optional int32 f12;
  }
}
```

This PR fixes this issue by padding missing fields when creating actual converters.

Author: Cheng Lian <lian@databricks.com>

Closes #8228 from liancheng/spark-10005/nested-schema-merging.

(cherry picked from commit ae2370e)
Signed-off-by: Yin Huai <yhuai@databricks.com>
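A hypothetical usage showing the effect of the fix above: when files with the two schemas are read with schema merging, rows from the older file simply get null for the padded nested field. Paths are placeholders.

```scala
// spark-shell style; "/data/old" and "/data/new" are illustrative paths.
val df = sqlContext.read
  .option("mergeSchema", "true")
  .parquet("/data/old", "/data/new")

df.select("f1.f11", "f1.f12").show()   // f1.f12 is null for rows written with the older schema
```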
… a variable parameter

### Summary

- Add `lit` function
- Add `concat`, `greatest`, `least` functions

I think we need to improve the `collect` function in order to implement the `struct` function, since `collect` doesn't work with arguments that include a nested `list` variable. It seems that a list returned for `struct` still holds `jobj` objects, so it would be better to solve this problem in another issue.

### JIRA
[[SPARK-9871] Add expression functions into SparkR which have a variable parameter - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9871)

Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #8194 from yu-iskw/SPARK-9856.

(cherry picked from commit 26e7605)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
mengxr

Author: Feynman Liang <fliang@databricks.com>

Closes #8206 from feynmanliang/SPARK-9959-arules-java.

(cherry picked from commit f7efda3)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
…sk() fails

When inserting data into a `HadoopFsRelation`, if `commitTask()` of the writer container fails, `abortTask()` will be invoked. However, both `commitTask()` and `abortTask()` try to close the output writer(s). The problem is that closing the underlying writers may not be an idempotent operation. E.g., `ParquetRecordWriter.close()` throws an NPE when called twice.

Author: Cheng Lian <lian@databricks.com>

Closes #8236 from liancheng/spark-7837/double-closing.

(cherry picked from commit 76c155d)
Signed-off-by: Cheng Lian <lian@databricks.com>
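A sketch of the guard the fix above needs, not the actual writer container code: wrap the writer so a second `close()` (e.g. `commitTask()` followed by `abortTask()`) becomes a no-op instead of an NPE.

```scala
import java.io.Closeable

// Makes close() idempotent for a writer whose own close() is not.
class IdempotentCloser(underlying: Closeable) extends Closeable {
  private var closed = false
  override def close(): Unit = {
    if (!closed) {
      closed = true
      underlying.close()
    }
  }
}
```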
…truct fields

This issue has been fixed by #8215, this PR added regression test for it.

Author: Wenchen Fan <cloud0fan@outlook.com>

Closes #8222 from cloud-fan/minor and squashes the following commits:

0bbfb1c [Wenchen Fan] fix style...
7e2d8d9 [Wenchen Fan] add test

(cherry picked from commit a4acdab)
Signed-off-by: Michael Armbrust <michael@databricks.com>
…FrameWriter.jdbc

This PR uses `JDBCRDD.getConnector` to load the JDBC driver before creating a connection in `DataFrameReader.jdbc` and `DataFrameWriter.jdbc`.

Author: zsxwing <zsxwing@gmail.com>

Closes #8232 from zsxwing/SPARK-10036 and squashes the following commits:

adf75de [zsxwing] Add extraOptions to the connection properties
57f59d4 [zsxwing] Load JDBC driver in DataFrameReader.jdbc and DataFrameWriter.jdbc

(cherry picked from commit f10660f)
Signed-off-by: Michael Armbrust <michael@databricks.com>
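A minimal sketch of the idea in the entry above, using plain JDBC rather than `JDBCRDD.getConnector`; the driver class and URL are illustrative placeholders.

```scala
// Register the driver class before asking DriverManager for a connection.
val driverClass = "org.postgresql.Driver"   // illustrative driver
Class.forName(driverClass)

val props = new java.util.Properties()
val conn = java.sql.DriverManager.getConnection("jdbc:postgresql://host/db", props)
```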
…in sql expressions

JIRA: https://issues.apache.org/jira/browse/SPARK-9526

This PR is a follow up of #7830, aiming at utilizing randomized tests to reveal more potential bugs in sql expression.

Author: Yijie Shen <henry.yijieshen@gmail.com>

Closes #7855 from yjshen/property_check.

(cherry picked from commit b265e28)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
…pression1.

https://issues.apache.org/jira/browse/SPARK-9592

#8113 has the fundamental fix. But, if we want to minimize the number of changed lines, we can go with this one. Then, in 1.6, we merge #8113.

Author: Yin Huai <yhuai@databricks.com>

Closes #8172 from yhuai/lastFix and squashes the following commits:

b28c42a [Yin Huai] Regression test.
af87086 [Yin Huai] Fix last.

(cherry picked from commit 772e7c1)
Signed-off-by: Michael Armbrust <michael@databricks.com>
…ting

mengxr jkbradley

Author: Feynman Liang <fliang@databricks.com>

Closes #8255 from feynmanliang/SPARK-10068.

(cherry picked from commit fdaf17f)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
Author: Sameer Abhyankar <sabhyankar@sabhyankar-MBP.Samavihome>
Author: Sameer Abhyankar <sabhyankar@sabhyankar-MBP.local>

Closes #7729 from sabhyankar/branch_8920.

(cherry picked from commit 088b11e)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
…le:1.6.0 is in SBT assembly jar

PR #7967 enables Spark SQL to persist Parquet tables in Hive-compatible format when possible. One of the consequences is that we have to set input/output classes to `MapredParquetInputFormat`/`MapredParquetOutputFormat`, which rely on com.twitter:parquet-hadoop:1.6.0 bundled with Hive 1.2.1.

When loading such a table in Spark SQL, `o.a.h.h.ql.metadata.Table` first loads these input/output format classes, and thus classes in com.twitter:parquet-hadoop:1.6.0.  However, the scope of this dependency is defined as "runtime", and is not packaged into Spark assembly jar.  This results in a `ClassNotFoundException`.

This issue can be worked around by asking users to add parquet-hadoop 1.6.0 via the `--driver-class-path` option.  However, considering Maven build is immune to this problem, I feel it can be confusing and inconvenient for users.

So this PR fixes this issue by changing scope of parquet-hadoop 1.6.0 to "compile".

Author: Cheng Lian <lian@databricks.com>

Closes #8198 from liancheng/spark-9974/bundle-parquet-1.6.0.

(cherry picked from commit 52ae952)
Signed-off-by: Reynold Xin <rxin@databricks.com>
…ure.ElementwiseProduct

Add Python API, user guide and example for ml.feature.ElementwiseProduct.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #8061 from yanboliang/SPARK-9768.

(cherry picked from commit 0076e82)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Added since tags to mllib.regression

Author: Prayag Chandran <prayagchandran@gmail.com>

Closes #7518 from prayagchandran/sinceTags and squashes the following commits:

fa4dda2 [Prayag Chandran] Re-formatting
6c6d584 [Prayag Chandran] Corrected a few tags. Removed few unnecessary tags
1a0365f [Prayag Chandran] Reformating and adding a few more tags
89fdb66 [Prayag Chandran] SPARK-8916 [Documentation, MLlib] Add @SInCE tags to mllib.regression

(cherry picked from commit 18523c1)
Signed-off-by: DB Tsai <dbt@netflix.com>
Adds user guide for `PrefixSpan`, including Scala and Java example code.

mengxr zhangjiajin

Author: Feynman Liang <fliang@databricks.com>

Closes #8253 from feynmanliang/SPARK-9898.

(cherry picked from commit 0b6b017)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
Author: Sandy Ryza <sandy@cloudera.com>

Closes #8230 from sryza/sandy-spark-7707.

(cherry picked from commit f9d1a92)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
…-sample KS test

added doc examples for python.

Author: jose.cambronero <jose.cambronero@cloudera.com>

Closes #8154 from josepablocam/spark_9902.

(cherry picked from commit c90c605)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #8251 from vanzin/SPARK-10059.

(cherry picked from commit ee093c8)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
This PR adds a short description of `ml.feature` package with code example. The Java package doc will come in a separate PR. jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #8260 from mengxr/SPARK-7808.

(cherry picked from commit e290029)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #8265 from yu-iskw/minor-translate-comment.

(cherry picked from commit a091031)
Signed-off-by: Reynold Xin <rxin@databricks.com>
… is binary in ArrayData

The Java type for an array of arrays is slightly different from that for arrays of other element types.

cc cloud-fan

Author: Davies Liu <davies@databricks.com>

Closes #8250 from davies/array_binary.

(cherry picked from commit 5af3838)
Signed-off-by: Reynold Xin <rxin@databricks.com>
…ghts public

Fix the issue that ```layers``` and ```weights``` should be public variables of ```MultilayerPerceptronClassificationModel```. Users currently cannot get ```layers``` and ```weights``` from a ```MultilayerPerceptronClassificationModel```.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #8263 from yanboliang/mlp-public.

(cherry picked from commit dd0614f)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
It might be a typo introduced at the beginning or a leftover from some renaming: the method accessing the index file is called `getBlockData` now (not `getBlockLocation` as indicated in the comments).

Author: CodingCat <zhunansjtu@gmail.com>

Closes #8238 from CodingCat/minor_1.

(cherry picked from commit c34e9ff)
Signed-off-by: Sean Owen <sowen@cloudera.com>
Parquet hard-coded a JUL logger which always writes to stdout. This PR redirects it via the SLF4J JUL bridge handler, so that we can control Parquet logs via `log4j.properties`.

This solution is inspired by https://github.com/Parquet/parquet-mr/issues/390#issuecomment-46064909.

Author: Cheng Lian <lian@databricks.com>

Closes #8196 from liancheng/spark-8118/redirect-parquet-jul.

(cherry picked from commit 5723d26)
Signed-off-by: Cheng Lian <lian@databricks.com>
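A sketch of the redirection idea from the entry above, assuming `org.slf4j:jul-to-slf4j` is on the classpath; the actual patch wires this up specifically for Parquet's JUL logger.

```scala
import org.slf4j.bridge.SLF4JBridgeHandler

SLF4JBridgeHandler.removeHandlersForRootLogger()  // drop the default stdout JUL handler
SLF4JBridgeHandler.install()                      // route java.util.logging records through SLF4J
```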
yanboliang and others added 19 commits September 8, 2015 13:08
A copied model must have the same parent, but ml.IsotonicRegressionModel.copy did not set the parent.
This fixes it and adds a test case.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #8637 from yanboliang/spark-10470.

(cherry picked from commit f7b55db)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
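The property the added test checks, sketched in spark-shell style; `dataset` is an assumed DataFrame with "label" and "features" columns, not part of the original entry.

```scala
import org.apache.spark.ml.regression.IsotonicRegression
import org.apache.spark.ml.param.ParamMap

val model = new IsotonicRegression().fit(dataset)   // "dataset" is assumed
val copied = model.copy(ParamMap.empty)
assert(copied.parent == model.parent)               // failed before the fix: copy() dropped the parent
```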
https://issues.apache.org/jira/browse/SPARK-10441

This is the backport of #8597 for 1.5 branch.

Author: Yin Huai <yhuai@databricks.com>

Closes #8655 from yhuai/timestampJson-1.5.
…ion about rate limiting and backpressure

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #8656 from tdas/SPARK-10492 and squashes the following commits:

986cdd6 [Tathagata Das] Added information on backpressure

(cherry picked from commit 52b24a6)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
…or nested structs

We used to work around SPARK-10301 with a quick fix in branch-1.5 (PR #8515), but it doesn't cover the case described in SPARK-10428. So this PR backports PR #8509, which had once been considered too big a change to be merged into branch-1.5 at the last minute, to fix both SPARK-10301 and SPARK-10428 for Spark 1.5. Also added more test cases for SPARK-10428.

This PR looks big, but the essential change is only ~200 loc. All other changes are for testing. Especially, PR #8454 is also backported here because the `ParquetInteroperabilitySuite` introduced in PR #8515 depends on it. This should be safe since #8454 only touches testing code.

Author: Cheng Lian <lian@databricks.com>

Closes #8583 from liancheng/spark-10301/for-1.5.
…ream and throw a better exception when reading QueueInputDStream

Output a warning when serializing QueueInputDStream rather than throwing an exception, to allow unit tests to use it. Moreover, this PR also throws a better exception when deserializing QueueInputDStream so that the user can find the problem easily. The previous exception is hard to understand: https://issues.apache.org/jira/browse/SPARK-8553

Author: zsxwing <zsxwing@gmail.com>

Closes #8624 from zsxwing/SPARK-10071 and squashes the following commits:

847cfa8 [zsxwing] Output a warning when writing QueueInputDStream and throw a better exception when reading QueueInputDStream

(cherry picked from commit 820913f)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
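A sketch of the behavior described above, using plain Java serialization hooks on a stand-in class rather than the actual DStream code.

```scala
import java.io.{NotSerializableException, ObjectInputStream, ObjectOutputStream}

class QueueLikeStream extends Serializable {   // stand-in for the actual DStream class
  private def writeObject(oos: ObjectOutputStream): Unit = {
    // Warn instead of failing, so unit tests that serialize the stream keep working.
    println("WARN: queueStream doesn't support checkpointing")
    oos.defaultWriteObject()
  }

  private def readObject(ois: ObjectInputStream): Unit = {
    // Reading back is where a clear message matters most.
    throw new NotSerializableException(
      "queueStream doesn't support checkpointing; " +
        "please don't use queueStream when checkpointing is enabled")
  }
}
```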
The YARN backend doesn't like when user code calls `System.exit`,
since it cannot know the exit status and thus cannot set an
appropriate final status for the application.

So, for pyspark, avoid that call and instead throw an exception with
the exit code. SparkSubmit handles that exception and exits with
the given exit code, while YARN uses the exit code as the failure
code for the Spark app.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #7751 from vanzin/SPARK-9416.

(cherry picked from commit f68d024)
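An illustrative sketch of the mechanism in the entry above: the user app's exit status travels in an exception instead of via `System.exit`, so the submit-side wrapper (and, through it, YARN) can observe the code. Names are simplified, not the actual Spark classes.

```scala
object ExitCodePropagation {
  // Carry the exit status in an exception instead of calling System.exit directly.
  case class UserAppExitException(exitCode: Int)
    extends RuntimeException(s"User application exited with status $exitCode")

  // Submit-side wrapper: catch the exception and exit with the propagated status,
  // so the resource manager sees a meaningful failure code.
  def runWrapped(userApp: () => Unit): Unit =
    try userApp() catch {
      case UserAppExitException(code) => sys.exit(code)
    }
}
```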
The fix for SPARK-7736 introduced a race where a port value of "-1"
could be passed down to the pyspark process, causing it to fail to
connect back to the JVM. This change adds code to fix that race.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #8258 from vanzin/SPARK-7736.

(cherry picked from commit c1840a8)
…ld be 0.0 (original: 1.0)

Small typo in the example for `LabeledPoint` in the MLlib docs.

Author: Sean Paradiso <seanparadiso@gmail.com>

Closes #8680 from sparadiso/docs_mllib_smalltypo.

(cherry picked from commit 1dc7548)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
Data Spill with UnsafeRow causes assert failure.

```
java.lang.AssertionError: assertion failed
	at scala.Predef$.assert(Predef.scala:165)
	at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2.writeKey(UnsafeRowSerializer.scala:75)
	at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:180)
	at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$2$$anonfun$apply$1.apply(ExternalSorter.scala:688)
	at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$2$$anonfun$apply$1.apply(ExternalSorter.scala:687)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$2.apply(ExternalSorter.scala:687)
	at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$2.apply(ExternalSorter.scala:683)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at org.apache.spark.util.collection.ExternalSorter.writePartitionedFile(ExternalSorter.scala:683)
	at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:80)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:88)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
```

To reproduce that with code (thanks andrewor14):
```scala
bin/spark-shell --master local \
  --conf spark.shuffle.memoryFraction=0.005 \
  --conf spark.shuffle.sort.bypassMergeThreshold=0

sc.parallelize(1 to 2 * 1000 * 1000, 10)
  .map { i => (i, i) }.toDF("a", "b").groupBy("b").avg().count()
```

Author: Cheng Hao <hao.cheng@intel.com>

Closes #8635 from chenghao-intel/unsafe_spill.

(cherry picked from commit e048111)
Signed-off-by: Andrew Or <andrew@databricks.com>
From JIRA:
Add documentation for tungsten-sort.
From the mailing list: "I saw a new `spark.shuffle.manager=tungsten-sort` implemented in https://issues.apache.org/jira/browse/SPARK-7081, but it can't be found its corresponding description in http://people.apache.org/~pwendell/spark-releases/spark-1.5.0-rc3-docs/configuration.html (currently there are only 'sort' and 'hash' two options)."

Author: Holden Karau <holden@pigscanfly.ca>

Closes #8638 from holdenk/SPARK-10469-document-tungsten-sort.

(cherry picked from commit a76bde9)
Signed-off-by: Andrew Or <andrew@databricks.com>
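An example of selecting the shuffle manager that the new documentation covers (Spark 1.5-era setting, shown here for illustration).

```scala
import org.apache.spark.SparkConf

// Opt into the tungsten-sort shuffle manager instead of "sort" or "hash".
val conf = new SparkConf().set("spark.shuffle.manager", "tungsten-sort")
```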
…or.cores

This is a regression introduced in #4960, this commit fixes it and adds a test.

tnachen andrewor14 please review, this should be an easy one.

Author: Iulian Dragos <jaguarul@gmail.com>

Closes #8653 from dragos/issue/mesos/fine-grained-maxExecutorCores.

(cherry picked from commit f0562e8)
Signed-off-by: Andrew Or <andrew@databricks.com>
Previously, project/plugins.sbt explicitly set scalaVersion to 2.10.4. This can cause issues when using a version of sbt that is compiled against a different version of Scala (for example sbt 0.13.9 uses 2.10.5). Removing this explicit setting will cause build files to be compiled and run against the same version of Scala that sbt is compiled against.

Note that this only applies to the project build files (items in project/), it is distinct from the version of Scala we target for the actual spark compilation.

Author: Ahir Reddy <ahirreddy@gmail.com>

Closes #8709 from ahirreddy/sbt-scala-version-fix.

(cherry picked from commit 9bbe33f)
Signed-off-by: Sean Owen <sowen@cloudera.com>
…s" if it is too flaky

If hadoopFsRelationSuites's "test all data types" is too flaky we can disable it for now.

https://issues.apache.org/jira/browse/SPARK-10540

Author: Yin Huai <yhuai@databricks.com>

Closes #8705 from yhuai/SPARK-10540-ignore.

(cherry picked from commit 6ce0886)
Signed-off-by: Yin Huai <yhuai@databricks.com>
Cherry-pick this to branch 1.5.

Author: Rohit Agarwal <rohita@qubole.com>

Closes #8701 from tgravescs/SPARK-9924-1.5 and squashes the following commits:

16e1c5f [Rohit Agarwal] [SPARK-9924] [WEB UI] Don't schedule checkForLogs while some of them are already running.
…l the test

This commit ensures if an assertion fails within a thread, it will ultimately fail the test. Otherwise we end up potentially masking real bugs by not propagating assertion failures properly.

Author: Andrew Or <andrew@databricks.com>

Closes #8723 from andrewor14/fix-threading-suite.

(cherry picked from commit d74c6a1)
Signed-off-by: Andrew Or <andrew@databricks.com>
…asks important error information

When throwing an IllegalArgumentException in SnappyCompressionCodec.init, chain the existing exception. This allows potentially important debugging info to be passed to the user.

Manual testing shows the exception chained properly, and the test suite still looks fine as well.

This contribution is my original work and I license the work to the project under the project's open source license.

Author: Daniel Imfeld <daniel@danielimfeld.com>

Closes #8725 from dimfeld/dimfeld-patch-1.

(cherry picked from commit 6d83678)
Signed-off-by: Sean Owen <sowen@cloudera.com>
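Roughly the chaining pattern the fix above applies, sketched in spark-shell style: pass the original throwable as the cause so its message and stack trace reach the user instead of being dropped.

```scala
import org.xerial.snappy.Snappy

try {
  Snappy.getNativeLibraryVersion   // forces the native library to load
} catch {
  case t: Throwable =>
    // Chain the cause instead of discarding it.
    throw new IllegalArgumentException(t.getMessage, t)
}
```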
https://issues.apache.org/jira/browse/SPARK-10554

Fixes NPE when ShutdownHook tries to cleanup temporary folders

Author: Nithin Asokan <Nithin.Asokan@Cerner.com>

Closes #8720 from nasokan/SPARK-10554.

(cherry picked from commit 8285e3b)
Signed-off-by: Sean Owen <sowen@cloudera.com>
spark.mesos.mesosExecutor.cores when launching Mesos executors (regression)

(cherry picked from commit 03e8d0a)

backported to branch-1.5 /cc andrewor14

Author: Iulian Dragos <jaguarul@gmail.com>

Closes #8732 from dragos/issue/mesos/fine-grained-maxExecutorCores-1.5.
@AmplabJenkins

Can one of the admins verify this patch?

@srowen
Member

srowen commented Sep 14, 2015

Do you mind closing this PR? It has thousands of changed files. Please ask on user@spark.apache.org, not in a PR.

sarutak and others added 8 commits September 14, 2015 12:06
…e.version is wrong.

The default value of the Hive metastore version is 1.2.1, but the documentation says the value of `spark.sql.hive.metastore.version` is 0.13.1.
Also, we cannot get the default value by `sqlContext.getConf("spark.sql.hive.metastore.version")`.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #8739 from sarutak/SPARK-10584.

(cherry picked from commit cf2821e)
Signed-off-by: Yin Huai <yhuai@databricks.com>
Fixes a bug where the IndexToString output schema was DoubleType. Correct me if I'm wrong, but it doesn't seem like the output needs to have any "ML Attribute" metadata.

Author: Nick Pritchard <nicholas.pritchard@falkonry.com>

Closes #8751 from pnpritchard/SPARK-10573.

(cherry picked from commit 8a634e9)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
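Illustrative usage of the transformer from the entry above, assuming an `indexed` DataFrame with a numeric `categoryIndex` column (not part of the original); after the fix the output column is StringType rather than DoubleType.

```scala
import org.apache.spark.ml.feature.IndexToString

val converter = new IndexToString()
  .setInputCol("categoryIndex")
  .setOutputCol("originalCategory")
  .setLabels(Array("a", "b", "c"))          // illustrative labels

val converted = converter.transform(indexed) // "indexed" is assumed
converted.schema("originalCategory").dataType // StringType after the fix
```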
…itive

Or Hive can't read it back correctly.

Thanks vanzin for reporting this.

Author: Davies Liu <davies@databricks.com>

Closes #8674 from davies/positive_nano.

(cherry picked from commit 7e32387)
Signed-off-by: Davies Liu <davies.liu@gmail.com>
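A sketch of the normalization idea in the entry above (illustrative arithmetic, not the exact conversion code): keep the nanosecond remainder non-negative by borrowing one second for pre-epoch timestamps, so Hive can read the value back.

```scala
// Split microseconds-since-epoch into (seconds, non-negative nanoseconds).
def splitMicros(us: Long): (Long, Long) = {
  var seconds = us / 1000000L
  var nanos = (us % 1000000L) * 1000L
  if (nanos < 0) {          // e.g. timestamps before the epoch
    seconds -= 1
    nanos += 1000000000L    // borrow one second, expressed in nanoseconds
  }
  (seconds, nanos)
}
```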
Make this lazy so that the YARN mode can be set before creating the SecurityManager.

Author: Tom Graves <tgraves@yahoo-inc.com>
Author: Thomas Graves <tgraves@staydecay.corp.gq1.yahoo.com>

Closes #8719 from tgravescs/SPARK-10549.
…k Basis

Read `PEAK_EXECUTION_MEMORY` using `update` to get the per-task partial value instead of the cumulative value.

I tested with this workload:

```scala
val size = 1000
val repetitions = 10
val data = sc.parallelize(1 to size, 5).map(x => (util.Random.nextInt(size / repetitions),util.Random.nextDouble)).toDF("key", "value")
val res = data.toDF.groupBy("key").agg(sum("value")).count
```

Before:
![image](https://cloud.githubusercontent.com/assets/4317392/9828197/07dd6874-58b8-11e5-9bd9-6ba927c38b26.png)

After:
![image](https://cloud.githubusercontent.com/assets/4317392/9828151/a5ddff30-58b7-11e5-8d31-eda5dc4eae79.png)

Tasks view:
![image](https://cloud.githubusercontent.com/assets/4317392/9828199/17dc2b84-58b8-11e5-92a8-be89ce4d29d1.png)

cc andrewor14. I would appreciate feedback on this, since I think you introduced the display of this metric.

Author: Forest Fang <forest.fang@outlook.com>

Closes #8726 from saurfang/stagepage.

(cherry picked from commit fd1e8cd)
Signed-off-by: Andrew Or <andrew@databricks.com>
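A sketch of the per-task vs. cumulative distinction from the entry above; `taskInfo` is an assumed `org.apache.spark.scheduler.TaskInfo`, and the accumulator name is taken from the 1.5-era internals and may differ.

```scala
// Use the partial update recorded for this task, not the accumulator's running total.
val peakForThisTask = taskInfo.accumulables
  .find(_.name == "peakExecutionMemory")   // name is an assumption
  .flatMap(_.update)                       // per-task partial value
  .map(_.toLong)
  .getOrElse(0L)
```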
…l the test (round 2)

This is a follow-up patch to #8723. I missed one case there.

Author: Andrew Or <andrew@databricks.com>

Closes #8727 from andrewor14/fix-threading-suite.

(cherry picked from commit 7b6c856)
Signed-off-by: Andrew Or <andrew@databricks.com>
Author: Davies Liu <davies@databricks.com>

Closes #8707 from davies/fix_namedtuple.
Links now work properly, plus consistent use of *Spark standalone cluster* (Spark uppercase, the rest lowercase -- this seems agreed upon elsewhere in the docs).

Author: Jacek Laskowski <jacek.laskowski@deepsense.io>

Closes #8759 from jaceklaskowski/docs-submitting-apps.

(cherry picked from commit 833be73)
Signed-off-by: Reynold Xin <rxin@databricks.com>
@asfgit closed this in 0d9ab01 on Sep 15, 2015