
Conversation

@272029252

When I use Spark Streaming, there is a dependency conflict with curator-client:
```
[INFO] - org.apache.spark:spark-core_2.10:jar:1.5.0:compile
[INFO] +- org.apache.curator:curator-recipes:jar:2.4.0:compile
[INFO] | - org.apache.curator:curator-framework:jar:2.4.0:compile
[INFO] | - (org.apache.curator:curator-client:jar:2.4.0:compile - omitted for conflict with 2.1.0-incubating)
[INFO] - org.tachyonproject:tachyon-client:jar:0.7.1:compile
[INFO] - org.apache.curator:curator-client:jar:2.1.0-incubating:compile
```

Davies Liu and others added 30 commits August 14, 2015 22:31
The BYTE_ARRAY_OFFSET could be different in JVMs with different configurations (for example, different heap sizes: 24 if the heap is > 32G, otherwise 16), so the offset of a UTF8String is not portable; we should handle that during serialization.

Author: Davies Liu <davies@databricks.com>

Closes #8210 from davies/serialize_utf8string.

(cherry picked from commit 7c1e568)
Signed-off-by: Davies Liu <davies.liu@gmail.com>
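A minimal sketch of the portable approach described in the entry above, assuming a plain `DataOutput`/`DataInput` pair rather than the actual serializer classes: write only the bytes, and rebuild the `UTF8String` against the reading JVM's own base offset.

```scala
import java.io.{DataInput, DataOutput}
import org.apache.spark.unsafe.types.UTF8String

def writeUTF8String(out: DataOutput, s: UTF8String): Unit = {
  val bytes = s.getBytes        // copies the content out of the JVM-specific base/offset
  out.writeInt(bytes.length)
  out.write(bytes)
}

def readUTF8String(in: DataInput): UTF8String = {
  val bytes = new Array[Byte](in.readInt())
  in.readFully(bytes)
  UTF8String.fromBytes(bytes)   // rebuilt against this JVM's own BYTE_ARRAY_OFFSET
}
```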
…ters in doc

Tiny modification to a few comments to make ```sbt publishLocal``` work again.

Author: Herman van Hovell <hvanhovell@questtec.nl>

Closes #8209 from hvanhovell/SPARK-9980.

(cherry picked from commit a85fb6c)
Signed-off-by: Sean Owen <sowen@cloudera.com>
We should skip unresolved `LogicalPlan`s in `PullOutNondeterministic`, as calling `output` on an unresolved `LogicalPlan` produces a confusing error message.

Author: Wenchen Fan <cloud0fan@outlook.com>

Closes #8203 from cloud-fan/error-msg and squashes the following commits:

1c67ca7 [Wenchen Fan] move test
7593080 [Wenchen Fan] correct error message for aggregate

(cherry picked from commit 5705672)
Signed-off-by: Michael Armbrust <michael@databricks.com>
…reaming pyspark tests

Recently, PySpark ML streaming tests have been flaky, most likely because of the batches not being processed in time. Proposal: replace the use of `_ssc_wait` (which waits for a fixed amount of time) with a method that waits up to a fixed timeout but can terminate early based on a termination condition. With this, we can extend the waiting period (to make tests less flaky) but also stop early when possible (making tests faster on average, which I verified locally).

CC: mengxr tdas freeman-lab

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #8087 from jkbradley/streaming-ml-tests.

(cherry picked from commit 1db7179)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
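The actual helper lives in the PySpark test code; this Scala sketch only illustrates the wait-until-condition-or-timeout pattern the entry above proposes (names are illustrative, not the real test utility).

```scala
// Wait up to timeoutMillis, but return as soon as the condition holds.
def waitForCondition(condition: () => Boolean,
                     timeoutMillis: Long,
                     pollMillis: Long = 100L): Boolean = {
  val deadline = System.currentTimeMillis() + timeoutMillis
  while (System.currentTimeMillis() < deadline) {
    if (condition()) return true   // stop early once the batches are processed
    Thread.sleep(pollMillis)
  }
  condition()                      // one last check at the deadline
}
```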
This is a WIP patch for SPARK-8844 for collecting reviews.

This bug is about reading an empty DataFrame. In `readCol()`,
```
lapply(1:numRows, function(x) {
```
does not take into consideration the case where numRows = 0.

A unit test case will be added.

Author: Sun Rui <rui.sun@intel.com>

Closes #7419 from sun-rui/SPARK-8844.

(cherry picked from commit 5f9ce73)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
…rrow deps

The shuffle locality patch made the DAGScheduler aware of shuffle data,
but for RDDs that have both narrow and shuffle dependencies, it can
cause them to place tasks based on the shuffle dependency instead of the
narrow one. This case is common in iterative join-based algorithms like
PageRank and ALS, where one RDD is hash-partitioned and one isn't.

Author: Matei Zaharia <matei@databricks.com>

Closes #8220 from mateiz/shuffle-loc-fix.

(cherry picked from commit cf01607)
Signed-off-by: Matei Zaharia <matei@databricks.com>
The `initialSize` argument of `ColumnBuilder.initialize()` should be the
number of rows rather than bytes. However, `InMemoryColumnarTableScan`
passes in a byte size, which makes Spark SQL allocate more memory than
necessary when building in-memory columnar buffers.

Author: Kun Xu <viper_kun@163.com>

Closes #8189 from viper-kun/errorSize.

(cherry picked from commit 182f9b7)
Signed-off-by: Cheng Lian <lian@databricks.com>
In the case of schema merging, we only handled first-level fields when converting Parquet groups to `InternalRow`s; nested struct fields were not properly handled.

For example, the schema of a Parquet file to be read can be:

```
message individual {
  required group f1 {
    optional binary f11 (utf8);
  }
}
```

while the global schema is:

```
message global {
  required group f1 {
    optional binary f11 (utf8);
    optional int32 f12;
  }
}
```

This PR fixes this issue by padding missing fields when creating actual converters.

Author: Cheng Lian <lian@databricks.com>

Closes #8228 from liancheng/spark-10005/nested-schema-merging.

(cherry picked from commit ae2370e)
Signed-off-by: Yin Huai <yhuai@databricks.com>
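A hypothetical usage showing the effect of the fix above: when files with the two schemas are read with schema merging, rows from the older file simply get null for the padded nested field. Paths are placeholders.

```scala
// spark-shell style; "/data/old" and "/data/new" are illustrative paths.
val df = sqlContext.read
  .option("mergeSchema", "true")
  .parquet("/data/old", "/data/new")

df.select("f1.f11", "f1.f12").show()   // f1.f12 is null for rows written with the older schema
```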
… a variable parameter

### Summary

- Add `lit` function
- Add `concat`, `greatest`, `least` functions

I think we need to improve the `collect` function in order to implement the `struct` function, since `collect` doesn't work with arguments that include a nested `list` variable. It seems that a list returned for `struct` still holds `jobj` objects, so it would be better to solve this problem in another issue.

### JIRA
[[SPARK-9871] Add expression functions into SparkR which have a variable parameter - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9871)

Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #8194 from yu-iskw/SPARK-9856.

(cherry picked from commit 26e7605)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
mengxr

Author: Feynman Liang <fliang@databricks.com>

Closes #8206 from feynmanliang/SPARK-9959-arules-java.

(cherry picked from commit f7efda3)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
…sk() fails

When inserting data into a `HadoopFsRelation`, if `commitTask()` of the writer container fails, `abortTask()` will be invoked. However, both `commitTask()` and `abortTask()` try to close the output writer(s). The problem is that closing the underlying writers may not be an idempotent operation. E.g., `ParquetRecordWriter.close()` throws an NPE when called twice.

Author: Cheng Lian <lian@databricks.com>

Closes #8236 from liancheng/spark-7837/double-closing.

(cherry picked from commit 76c155d)
Signed-off-by: Cheng Lian <lian@databricks.com>
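A sketch of the guard the fix above needs, not the actual writer container code: wrap the writer so a second `close()` (e.g. `commitTask()` followed by `abortTask()`) becomes a no-op instead of an NPE.

```scala
import java.io.Closeable

// Makes close() idempotent for a writer whose own close() is not.
class IdempotentCloser(underlying: Closeable) extends Closeable {
  private var closed = false
  override def close(): Unit = {
    if (!closed) {
      closed = true
      underlying.close()
    }
  }
}
```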
…truct fields

This issue has been fixed by #8215, this PR added regression test for it.

Author: Wenchen Fan <cloud0fan@outlook.com>

Closes #8222 from cloud-fan/minor and squashes the following commits:

0bbfb1c [Wenchen Fan] fix style...
7e2d8d9 [Wenchen Fan] add test

(cherry picked from commit a4acdab)
Signed-off-by: Michael Armbrust <michael@databricks.com>
…FrameWriter.jdbc

This PR uses `JDBCRDD.getConnector` to load the JDBC driver before creating a connection in `DataFrameReader.jdbc` and `DataFrameWriter.jdbc`.

Author: zsxwing <zsxwing@gmail.com>

Closes #8232 from zsxwing/SPARK-10036 and squashes the following commits:

adf75de [zsxwing] Add extraOptions to the connection properties
57f59d4 [zsxwing] Load JDBC driver in DataFrameReader.jdbc and DataFrameWriter.jdbc

(cherry picked from commit f10660f)
Signed-off-by: Michael Armbrust <michael@databricks.com>
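A minimal sketch of the idea in the entry above, using plain JDBC rather than `JDBCRDD.getConnector`; the driver class and URL are illustrative placeholders.

```scala
// Register the driver class before asking DriverManager for a connection.
val driverClass = "org.postgresql.Driver"   // illustrative driver
Class.forName(driverClass)

val props = new java.util.Properties()
val conn = java.sql.DriverManager.getConnection("jdbc:postgresql://host/db", props)
```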
…in sql expressions

JIRA: https://issues.apache.org/jira/browse/SPARK-9526

This PR is a follow up of #7830, aiming at utilizing randomized tests to reveal more potential bugs in sql expression.

Author: Yijie Shen <henry.yijieshen@gmail.com>

Closes #7855 from yjshen/property_check.

(cherry picked from commit b265e28)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
…pression1.

https://issues.apache.org/jira/browse/SPARK-9592

#8113 has the fundamental fix. But, if we want to minimize the number of changed lines, we can go with this one. Then, in 1.6, we merge #8113.

Author: Yin Huai <yhuai@databricks.com>

Closes #8172 from yhuai/lastFix and squashes the following commits:

b28c42a [Yin Huai] Regression test.
af87086 [Yin Huai] Fix last.

(cherry picked from commit 772e7c1)
Signed-off-by: Michael Armbrust <michael@databricks.com>
…ting

mengxr jkbradley

Author: Feynman Liang <fliang@databricks.com>

Closes #8255 from feynmanliang/SPARK-10068.

(cherry picked from commit fdaf17f)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
Author: Sameer Abhyankar <sabhyankar@sabhyankar-MBP.Samavihome>
Author: Sameer Abhyankar <sabhyankar@sabhyankar-MBP.local>

Closes #7729 from sabhyankar/branch_8920.

(cherry picked from commit 088b11e)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
…le:1.6.0 is in SBT assembly jar

PR #7967 enables Spark SQL to persist Parquet tables in Hive-compatible format when possible. One of the consequences is that we have to set input/output classes to `MapredParquetInputFormat`/`MapredParquetOutputFormat`, which rely on com.twitter:parquet-hadoop:1.6.0 bundled with Hive 1.2.1.

When loading such a table in Spark SQL, `o.a.h.h.ql.metadata.Table` first loads these input/output format classes, and thus classes in com.twitter:parquet-hadoop:1.6.0.  However, the scope of this dependency is defined as "runtime", and is not packaged into Spark assembly jar.  This results in a `ClassNotFoundException`.

This issue can be worked around by asking users to add parquet-hadoop 1.6.0 via the `--driver-class-path` option.  However, considering Maven build is immune to this problem, I feel it can be confusing and inconvenient for users.

So this PR fixes this issue by changing scope of parquet-hadoop 1.6.0 to "compile".

Author: Cheng Lian <lian@databricks.com>

Closes #8198 from liancheng/spark-9974/bundle-parquet-1.6.0.

(cherry picked from commit 52ae952)
Signed-off-by: Reynold Xin <rxin@databricks.com>
…ure.ElementwiseProduct

Add Python API, user guide and example for ml.feature.ElementwiseProduct.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #8061 from yanboliang/SPARK-9768.

(cherry picked from commit 0076e82)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Added since tags to mllib.regression

Author: Prayag Chandran <prayagchandran@gmail.com>

Closes #7518 from prayagchandran/sinceTags and squashes the following commits:

fa4dda2 [Prayag Chandran] Re-formatting
6c6d584 [Prayag Chandran] Corrected a few tags. Removed few unnecessary tags
1a0365f [Prayag Chandran] Reformating and adding a few more tags
89fdb66 [Prayag Chandran] SPARK-8916 [Documentation, MLlib] Add @SInCE tags to mllib.regression

(cherry picked from commit 18523c1)
Signed-off-by: DB Tsai <dbt@netflix.com>
Adds user guide for `PrefixSpan`, including Scala and Java example code.

mengxr zhangjiajin

Author: Feynman Liang <fliang@databricks.com>

Closes #8253 from feynmanliang/SPARK-9898.

(cherry picked from commit 0b6b017)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
Author: Sandy Ryza <sandy@cloudera.com>

Closes #8230 from sryza/sandy-spark-7707.

(cherry picked from commit f9d1a92)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
…-sample KS test

added doc examples for python.

Author: jose.cambronero <jose.cambronero@cloudera.com>

Closes #8154 from josepablocam/spark_9902.

(cherry picked from commit c90c605)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #8251 from vanzin/SPARK-10059.

(cherry picked from commit ee093c8)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
This PR adds a short description of `ml.feature` package with code example. The Java package doc will come in a separate PR. jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #8260 from mengxr/SPARK-7808.

(cherry picked from commit e290029)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #8265 from yu-iskw/minor-translate-comment.

(cherry picked from commit a091031)
Signed-off-by: Reynold Xin <rxin@databricks.com>
… is binary in ArrayData

The Java type for an array of arrays is slightly different from that for arrays of other element types.

cc cloud-fan

Author: Davies Liu <davies@databricks.com>

Closes #8250 from davies/array_binary.

(cherry picked from commit 5af3838)
Signed-off-by: Reynold Xin <rxin@databricks.com>
…ghts public

Fix the issue that ```layers``` and ```weights``` should be public variables of ```MultilayerPerceptronClassificationModel```. Users currently cannot get ```layers``` and ```weights``` from a ```MultilayerPerceptronClassificationModel```.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #8263 from yanboliang/mlp-public.

(cherry picked from commit dd0614f)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
It might be a typo introduced at the beginning or a leftover from some renaming: the method accessing the index file is called `getBlockData` now (not `getBlockLocation` as indicated in the comments).

Author: CodingCat <zhunansjtu@gmail.com>

Closes #8238 from CodingCat/minor_1.

(cherry picked from commit c34e9ff)
Signed-off-by: Sean Owen <sowen@cloudera.com>
Parquet hard-coded a JUL logger which always writes to stdout. This PR redirects it via the SLF4J JUL bridge handler, so that we can control Parquet logs via `log4j.properties`.

This solution is inspired by https://github.com/Parquet/parquet-mr/issues/390#issuecomment-46064909.

Author: Cheng Lian <lian@databricks.com>

Closes #8196 from liancheng/spark-8118/redirect-parquet-jul.

(cherry picked from commit 5723d26)
Signed-off-by: Cheng Lian <lian@databricks.com>
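A sketch of the redirection idea from the entry above, assuming `org.slf4j:jul-to-slf4j` is on the classpath; the actual patch wires this up specifically for Parquet's JUL logger.

```scala
import org.slf4j.bridge.SLF4JBridgeHandler

SLF4JBridgeHandler.removeHandlersForRootLogger()  // drop the default stdout JUL handler
SLF4JBridgeHandler.install()                      // route java.util.logging records through SLF4J
```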
yanboliang and others added 19 commits September 8, 2015 13:08
A copied model must have the same parent, but ml.IsotonicRegressionModel.copy did not set the parent.
This fixes it and adds a test case.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #8637 from yanboliang/spark-10470.

(cherry picked from commit f7b55db)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
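The property the added test checks, sketched in spark-shell style; `dataset` is an assumed DataFrame with "label" and "features" columns, not part of the original entry.

```scala
import org.apache.spark.ml.regression.IsotonicRegression
import org.apache.spark.ml.param.ParamMap

val model = new IsotonicRegression().fit(dataset)   // "dataset" is assumed
val copied = model.copy(ParamMap.empty)
assert(copied.parent == model.parent)               // failed before the fix: copy() dropped the parent
```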
https://issues.apache.org/jira/browse/SPARK-10441

This is the backport of #8597 for 1.5 branch.

Author: Yin Huai <yhuai@databricks.com>

Closes #8655 from yhuai/timestampJson-1.5.
…ion about rate limiting and backpressure

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #8656 from tdas/SPARK-10492 and squashes the following commits:

986cdd6 [Tathagata Das] Added information on backpressure

(cherry picked from commit 52b24a6)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
…or nested structs

We used to work around SPARK-10301 with a quick fix in branch-1.5 (PR #8515), but it doesn't cover the case described in SPARK-10428. So this PR backports PR #8509, which had once been considered too big a change to be merged into branch-1.5 at the last minute, to fix both SPARK-10301 and SPARK-10428 for Spark 1.5. Also added more test cases for SPARK-10428.

This PR looks big, but the essential change is only ~200 loc. All other changes are for testing. Especially, PR #8454 is also backported here because the `ParquetInteroperabilitySuite` introduced in PR #8515 depends on it. This should be safe since #8454 only touches testing code.

Author: Cheng Lian <lian@databricks.com>

Closes #8583 from liancheng/spark-10301/for-1.5.
…ream and throw a better exception when reading QueueInputDStream

Output a warning when serializing QueueInputDStream rather than throwing an exception, to allow unit tests to use it. Moreover, this PR also throws a better exception when deserializing QueueInputDStream so that the user can find the problem easily. The previous exception is hard to understand: https://issues.apache.org/jira/browse/SPARK-8553

Author: zsxwing <zsxwing@gmail.com>

Closes #8624 from zsxwing/SPARK-10071 and squashes the following commits:

847cfa8 [zsxwing] Output a warning when writing QueueInputDStream and throw a better exception when reading QueueInputDStream

(cherry picked from commit 820913f)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
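A sketch of the behavior described above, using plain Java serialization hooks on a stand-in class rather than the actual DStream code.

```scala
import java.io.{NotSerializableException, ObjectInputStream, ObjectOutputStream}

class QueueLikeStream extends Serializable {   // stand-in for the actual DStream class
  private def writeObject(oos: ObjectOutputStream): Unit = {
    // Warn instead of failing, so unit tests that serialize the stream keep working.
    println("WARN: queueStream doesn't support checkpointing")
    oos.defaultWriteObject()
  }

  private def readObject(ois: ObjectInputStream): Unit = {
    // Reading back is where a clear message matters most.
    throw new NotSerializableException(
      "queueStream doesn't support checkpointing; " +
        "please don't use queueStream when checkpointing is enabled")
  }
}
```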
The YARN backend doesn't like when user code calls `System.exit`,
since it cannot know the exit status and thus cannot set an
appropriate final status for the application.

So, for pyspark, avoid that call and instead throw an exception with
the exit code. SparkSubmit handles that exception and exits with
the given exit code, while YARN uses the exit code as the failure
code for the Spark app.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #7751 from vanzin/SPARK-9416.

(cherry picked from commit f68d024)
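An illustrative sketch of the mechanism in the entry above: the user app's exit status travels in an exception instead of via `System.exit`, so the submit-side wrapper (and, through it, YARN) can observe the code. Names are simplified, not the actual Spark classes.

```scala
object ExitCodePropagation {
  // Carry the exit status in an exception instead of calling System.exit directly.
  case class UserAppExitException(exitCode: Int)
    extends RuntimeException(s"User application exited with status $exitCode")

  // Submit-side wrapper: catch the exception and exit with the propagated status,
  // so the resource manager sees a meaningful failure code.
  def runWrapped(userApp: () => Unit): Unit =
    try userApp() catch {
      case UserAppExitException(code) => sys.exit(code)
    }
}
```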
The fix for SPARK-7736 introduced a race where a port value of "-1"
could be passed down to the pyspark process, causing it to fail to
connect back to the JVM. This change adds code to fix that race.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #8258 from vanzin/SPARK-7736.

(cherry picked from commit c1840a8)
…ld be 0.0 (original: 1.0)

Small typo in the example for `LabeledPoint` in the MLlib docs.

Author: Sean Paradiso <seanparadiso@gmail.com>

Closes #8680 from sparadiso/docs_mllib_smalltypo.

(cherry picked from commit 1dc7548)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
Data Spill with UnsafeRow causes assert failure.

```
java.lang.AssertionError: assertion failed
	at scala.Predef$.assert(Predef.scala:165)
	at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2.writeKey(UnsafeRowSerializer.scala:75)
	at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:180)
	at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$2$$anonfun$apply$1.apply(ExternalSorter.scala:688)
	at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$2$$anonfun$apply$1.apply(ExternalSorter.scala:687)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$2.apply(ExternalSorter.scala:687)
	at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$2.apply(ExternalSorter.scala:683)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at org.apache.spark.util.collection.ExternalSorter.writePartitionedFile(ExternalSorter.scala:683)
	at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:80)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:88)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
```

To reproduce that with code (thanks andrewor14):
```scala
bin/spark-shell --master local \
  --conf spark.shuffle.memoryFraction=0.005 \
  --conf spark.shuffle.sort.bypassMergeThreshold=0

sc.parallelize(1 to 2 * 1000 * 1000, 10)
  .map { i => (i, i) }.toDF("a", "b").groupBy("b").avg().count()
```

Author: Cheng Hao <hao.cheng@intel.com>

Closes #8635 from chenghao-intel/unsafe_spill.

(cherry picked from commit e048111)
Signed-off-by: Andrew Or <andrew@databricks.com>
From JIRA:
Add documentation for tungsten-sort.
From the mailing list: "I saw a new `spark.shuffle.manager=tungsten-sort` implemented in https://issues.apache.org/jira/browse/SPARK-7081, but it can't be found its corresponding description in http://people.apache.org/~pwendell/spark-releases/spark-1.5.0-rc3-docs/configuration.html (currently there are only 'sort' and 'hash' two options)."

Author: Holden Karau <holden@pigscanfly.ca>

Closes #8638 from holdenk/SPARK-10469-document-tungsten-sort.

(cherry picked from commit a76bde9)
Signed-off-by: Andrew Or <andrew@databricks.com>
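An example of selecting the shuffle manager that the new documentation covers (Spark 1.5-era setting, shown here for illustration).

```scala
import org.apache.spark.SparkConf

// Opt into the tungsten-sort shuffle manager instead of "sort" or "hash".
val conf = new SparkConf().set("spark.shuffle.manager", "tungsten-sort")
```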
…or.cores

This is a regression introduced in #4960, this commit fixes it and adds a test.

tnachen andrewor14 please review, this should be an easy one.

Author: Iulian Dragos <jaguarul@gmail.com>

Closes #8653 from dragos/issue/mesos/fine-grained-maxExecutorCores.

(cherry picked from commit f0562e8)
Signed-off-by: Andrew Or <andrew@databricks.com>
Previously, project/plugins.sbt explicitly set scalaVersion to 2.10.4. This can cause issues when using a version of sbt that is compiled against a different version of Scala (for example sbt 0.13.9 uses 2.10.5). Removing this explicit setting will cause build files to be compiled and run against the same version of Scala that sbt is compiled against.

Note that this only applies to the project build files (items in project/), it is distinct from the version of Scala we target for the actual spark compilation.

Author: Ahir Reddy <ahirreddy@gmail.com>

Closes #8709 from ahirreddy/sbt-scala-version-fix.

(cherry picked from commit 9bbe33f)
Signed-off-by: Sean Owen <sowen@cloudera.com>
…s" if it is too flaky

If hadoopFsRelationSuites's "test all data types" is too flaky we can disable it for now.

https://issues.apache.org/jira/browse/SPARK-10540

Author: Yin Huai <yhuai@databricks.com>

Closes #8705 from yhuai/SPARK-10540-ignore.

(cherry picked from commit 6ce0886)
Signed-off-by: Yin Huai <yhuai@databricks.com>
Cherry-pick this to branch 1.5.

Author: Rohit Agarwal <rohita@qubole.com>

Closes #8701 from tgravescs/SPARK-9924-1.5 and squashes the following commits:

16e1c5f [Rohit Agarwal] [SPARK-9924] [WEB UI] Don't schedule checkForLogs while some of them are already running.
…l the test

This commit ensures if an assertion fails within a thread, it will ultimately fail the test. Otherwise we end up potentially masking real bugs by not propagating assertion failures properly.

Author: Andrew Or <andrew@databricks.com>

Closes #8723 from andrewor14/fix-threading-suite.

(cherry picked from commit d74c6a1)
Signed-off-by: Andrew Or <andrew@databricks.com>
…asks important error information

When throwing an IllegalArgumentException in SnappyCompressionCodec.init, chain the existing exception. This allows potentially important debugging info to be passed to the user.

Manual testing shows the exception chained properly, and the test suite still looks fine as well.

This contribution is my original work and I license the work to the project under the project's open source license.

Author: Daniel Imfeld <daniel@danielimfeld.com>

Closes #8725 from dimfeld/dimfeld-patch-1.

(cherry picked from commit 6d83678)
Signed-off-by: Sean Owen <sowen@cloudera.com>
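Roughly the chaining pattern the fix above applies, sketched in spark-shell style: pass the original throwable as the cause so its message and stack trace reach the user instead of being dropped.

```scala
import org.xerial.snappy.Snappy

try {
  Snappy.getNativeLibraryVersion   // forces the native library to load
} catch {
  case t: Throwable =>
    // Chain the cause instead of discarding it.
    throw new IllegalArgumentException(t.getMessage, t)
}
```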
https://issues.apache.org/jira/browse/SPARK-10554

Fixes NPE when ShutdownHook tries to cleanup temporary folders

Author: Nithin Asokan <Nithin.Asokan@Cerner.com>

Closes #8720 from nasokan/SPARK-10554.

(cherry picked from commit 8285e3b)
Signed-off-by: Sean Owen <sowen@cloudera.com>
spark.mesos.mesosExecutor.cores when launching Mesos executors (regression)

(cherry picked from commit 03e8d0a)

backported to branch-1.5 /cc andrewor14

Author: Iulian Dragos <jaguarul@gmail.com>

Closes #8732 from dragos/issue/mesos/fine-grained-maxExecutorCores-1.5.
@AmplabJenkins

Can one of the admins verify this patch?

@srowen
Member

srowen commented Sep 14, 2015

Do you mind closing this PR? It has thousands of changed files. Please ask on user@spark.apache.org, not in a PR.

sarutak and others added 8 commits September 14, 2015 12:06
…e.version is wrong.

The default value of the Hive metastore version is 1.2.1, but the documentation says the value of `spark.sql.hive.metastore.version` is 0.13.1.
Also, we cannot get the default value by `sqlContext.getConf("spark.sql.hive.metastore.version")`.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #8739 from sarutak/SPARK-10584.

(cherry picked from commit cf2821e)
Signed-off-by: Yin Huai <yhuai@databricks.com>
Fixes a bug where the IndexToString output schema was DoubleType. Correct me if I'm wrong, but it doesn't seem like the output needs to have any "ML Attribute" metadata.

Author: Nick Pritchard <nicholas.pritchard@falkonry.com>

Closes #8751 from pnpritchard/SPARK-10573.

(cherry picked from commit 8a634e9)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
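Illustrative usage of the transformer from the entry above, assuming an `indexed` DataFrame with a numeric `categoryIndex` column (not part of the original); after the fix the output column is StringType rather than DoubleType.

```scala
import org.apache.spark.ml.feature.IndexToString

val converter = new IndexToString()
  .setInputCol("categoryIndex")
  .setOutputCol("originalCategory")
  .setLabels(Array("a", "b", "c"))          // illustrative labels

val converted = converter.transform(indexed) // "indexed" is assumed
converted.schema("originalCategory").dataType // StringType after the fix
```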
…itive

Or Hive can't read it back correctly.

Thanks vanzin for reporting this.

Author: Davies Liu <davies@databricks.com>

Closes #8674 from davies/positive_nano.

(cherry picked from commit 7e32387)
Signed-off-by: Davies Liu <davies.liu@gmail.com>
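A sketch of the normalization idea in the entry above (illustrative arithmetic, not the exact conversion code): keep the nanosecond remainder non-negative by borrowing one second for pre-epoch timestamps, so Hive can read the value back.

```scala
// Split microseconds-since-epoch into (seconds, non-negative nanoseconds).
def splitMicros(us: Long): (Long, Long) = {
  var seconds = us / 1000000L
  var nanos = (us % 1000000L) * 1000L
  if (nanos < 0) {          // e.g. timestamps before the epoch
    seconds -= 1
    nanos += 1000000000L    // borrow one second, expressed in nanoseconds
  }
  (seconds, nanos)
}
```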
Make this lazy so that the YARN mode can be set before creating the SecurityManager.

Author: Tom Graves <tgraves@yahoo-inc.com>
Author: Thomas Graves <tgraves@staydecay.corp.gq1.yahoo.com>

Closes #8719 from tgravescs/SPARK-10549.
…k Basis

Read `PEAK_EXECUTION_MEMORY` using `update` to get the per-task partial value instead of the cumulative value.

I tested with this workload:

```scala
val size = 1000
val repetitions = 10
val data = sc.parallelize(1 to size, 5).map(x => (util.Random.nextInt(size / repetitions),util.Random.nextDouble)).toDF("key", "value")
val res = data.toDF.groupBy("key").agg(sum("value")).count
```

Before:
![image](https://cloud.githubusercontent.com/assets/4317392/9828197/07dd6874-58b8-11e5-9bd9-6ba927c38b26.png)

After:
![image](https://cloud.githubusercontent.com/assets/4317392/9828151/a5ddff30-58b7-11e5-8d31-eda5dc4eae79.png)

Tasks view:
![image](https://cloud.githubusercontent.com/assets/4317392/9828199/17dc2b84-58b8-11e5-92a8-be89ce4d29d1.png)

cc andrewor14. I would appreciate feedback on this, since I think you introduced the display of this metric.

Author: Forest Fang <forest.fang@outlook.com>

Closes #8726 from saurfang/stagepage.

(cherry picked from commit fd1e8cd)
Signed-off-by: Andrew Or <andrew@databricks.com>
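A sketch of the per-task vs. cumulative distinction from the entry above; `taskInfo` is an assumed `org.apache.spark.scheduler.TaskInfo`, and the accumulator name is taken from the 1.5-era internals and may differ.

```scala
// Use the partial update recorded for this task, not the accumulator's running total.
val peakForThisTask = taskInfo.accumulables
  .find(_.name == "peakExecutionMemory")   // name is an assumption
  .flatMap(_.update)                       // per-task partial value
  .map(_.toLong)
  .getOrElse(0L)
```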
…l the test (round 2)

This is a follow-up patch to #8723. I missed one case there.

Author: Andrew Or <andrew@databricks.com>

Closes #8727 from andrewor14/fix-threading-suite.

(cherry picked from commit 7b6c856)
Signed-off-by: Andrew Or <andrew@databricks.com>
Author: Davies Liu <davies@databricks.com>

Closes #8707 from davies/fix_namedtuple.
Links now work properly, plus consistent use of *Spark standalone cluster* (Spark uppercase, the rest lowercase -- this seems agreed upon elsewhere in the docs).

Author: Jacek Laskowski <jacek.laskowski@deepsense.io>

Closes #8759 from jaceklaskowski/docs-submitting-apps.

(cherry picked from commit 833be73)
Signed-off-by: Reynold Xin <rxin@databricks.com>
@asfgit closed this in 0d9ab01 on Sep 15, 2015