
Conversation

@codlife (Contributor) commented Sep 13, 2016

What changes were proposed in this pull request?

When I use sc.makeRDD as below:

```
val data3 = sc.makeRDD(Seq())
println(data3.partitions.length)
```

I got an error:
Exception in thread "main" java.lang.IllegalArgumentException: Positive number of slices required
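
For context, the exception is raised by the slicing logic behind ParallelCollectionRDD, which rejects a non-positive slice count. A paraphrased sketch of the guard (approximate, not a verbatim copy of the Spark source):

```
// Sketch: ParallelCollectionRDD.slice rejects a non-positive slice count.
def slice[T](seq: Seq[T], numSlices: Int): Seq[Seq[T]] = {
  if (numSlices < 1) {
    throw new IllegalArgumentException("Positive number of slices required")
  }
  // ... otherwise split seq into numSlices roughly equal chunks ...
  ???
}
```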

We can fix this bug by modifying the last line to check seq.size:

```
  def makeRDD[T: ClassTag](seq: Seq[(T, Seq[String])]): RDD[T] = withScope {
    assertNotStopped()
    val indexToPrefs = seq.zipWithIndex.map(t => (t._2, t._1._2)).toMap
    new ParallelCollectionRDD[T](this, seq.map(_._1), math.max(seq.size, defaultParallelism), indexToPrefs)
  }
```
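
With this change, the repro above stops throwing; a minimal check, assuming a running SparkContext named sc:

```
// Hypothetical check after the fix: the empty Seq no longer throws,
// and the partition count falls back to sc.defaultParallelism.
val empty = sc.makeRDD(Seq[(Int, Seq[String])]())
println(empty.partitions.length)
```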

How was this patch tested?

manual tests


@codlife codlife changed the title from fg to [SPARK-17521] Error when I use sparkContext.makeRDD(Seq()) Sep 13, 2016
@codlife codlife closed this Sep 13, 2016
@codlife codlife reopened this Sep 13, 2016
The inline review discussion below was attached to this diff:

```
 assertNotStopped()
 val indexToPrefs = seq.zipWithIndex.map(t => (t._2, t._1._2)).toMap
-new ParallelCollectionRDD[T](this, seq.map(_._1), seq.size, indexToPrefs)
+new ParallelCollectionRDD[T](this, seq.map(_._1), math.max(seq.size, defaultParallelism), indexToPrefs)
```
@srowen (Member)
I would say math.max(seq.size, 1). Really this method would normally just use the provided partition count (called "numSlices" in this old API) but this one doesn't have that parameter, which is more reason it's an odd man out. Still I think the most reasonable behavior is to use at least 1 partition.
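
A sketch of that alternative (what Sean suggests here, not what was ultimately merged):

```
def makeRDD[T: ClassTag](seq: Seq[(T, Seq[String])]): RDD[T] = withScope {
  assertNotStopped()
  val indexToPrefs = seq.zipWithIndex.map(t => (t._2, t._1._2)).toMap
  // Conservative fallback: at least one partition, ignoring defaultParallelism.
  new ParallelCollectionRDD[T](this, seq.map(_._1), math.max(seq.size, 1), indexToPrefs)
}
```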

@codlife (Contributor, Author) commented Sep 13, 2016

To stay consistent with sc.parallelize, I think defaultParallelism is reasonable, so that the two calls below behave the same:

```
val rdd = sc.makeRDD(Seq())
val rdd = sc.parallelize(Seq())
```

But which one to use is your decision.
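
For reference, the relevant SparkContext signatures (as in the 2.x source; worth double-checking against the branch):

```
// parallelize and the common makeRDD overload both default to defaultParallelism...
def parallelize[T: ClassTag](seq: Seq[T], numSlices: Int = defaultParallelism): RDD[T]
def makeRDD[T: ClassTag](seq: Seq[T], numSlices: Int = defaultParallelism): RDD[T]

// ...but the location-preferences overload under discussion has no numSlices parameter.
def makeRDD[T: ClassTag](seq: Seq[(T, Seq[String])]): RDD[T]
```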

@srowen (Member)

The problem is that the default is OK because it's changeable, but here someone has no way to change it. I think it might be better to stay conservative.

Really this is such a corner case that it doesn't matter much. It only showed up for you because you specified no type on your Seq. If you had, it would have chosen the other overload which works fine.
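
Concretely, annotating the element type steers overload resolution to the numSlices variant, which handles the empty case; a small illustration, assuming a running SparkContext named sc:

```
// Per the discussion above: an untyped Seq() resolves to the
// (T, Seq[String]) location-preferences overload, which used to throw.
val bad = sc.makeRDD(Seq())

// An explicit element type selects makeRDD(seq, numSlices = defaultParallelism),
// which already handles an empty collection.
val ok = sc.makeRDD(Seq[Int]())
println(ok.partitions.length)
```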

@codlife (Contributor, Author)

OK, thanks for your explanation.

@srowen (Member) commented Sep 14, 2016

Jenkins test this please

@SparkQA commented Sep 14, 2016

Test build #65362 has finished for PR 15077 at commit f454668.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen (Member) commented Sep 15, 2016

Merged to master/2.0

@asfgit asfgit closed this in 647ee05 Sep 15, 2016
asfgit pushed a commit that referenced this pull request Sep 15, 2016
## What changes were proposed in this pull request?

When I use sc.makeRDD as below:
```
val data3 = sc.makeRDD(Seq())
println(data3.partitions.length)
```
I got an error:
Exception in thread "main" java.lang.IllegalArgumentException: Positive number of slices required

We can fix this bug by modifying the last line to check seq.size:
```
  def makeRDD[T: ClassTag](seq: Seq[(T, Seq[String])]): RDD[T] = withScope {
    assertNotStopped()
    val indexToPrefs = seq.zipWithIndex.map(t => (t._2, t._1._2)).toMap
    new ParallelCollectionRDD[T](this, seq.map(_._1), math.max(seq.size, defaultParallelism), indexToPrefs)
  }
```

## How was this patch tested?

 manual tests


Author: codlife <1004910847@qq.com>
Author: codlife <wangjianfei15@otcaix.iscas.ac.cn>

Closes #15077 from codlife/master.

(cherry picked from commit 647ee05)
Signed-off-by: Sean Owen <sowen@cloudera.com>
wgtmac pushed a commit to wgtmac/spark that referenced this pull request Sep 19, 2016
zzcclp added a commit to zzcclp/spark that referenced this pull request Sep 20, 2016
