Skip to content

Conversation

@QiangCai
Copy link
Contributor

If partsScanned = 1, numPartsToTry = Int.MaxValue and totalParts = 200, the result of math.min(partsScanned + numPartsToTry, totalParts) will be Int.MinValue (-2147483648).

Modifies partsScanned to partsScanned.toLong and chang result of math.min to Int.

https://issues.apache.org/jira/browse/SPARK-12340

Modifies partsScanned to partsScanned.toLong and  chang result of math.min to Int.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me.

@SparkQA
Copy link

SparkQA commented Dec 15, 2015

Test build #2217 has finished for PR 10310 at commit 37acd54.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * case class WrapOption(child: Expression) extends UnaryExpression\n

partsScanned maybe overstep the bounds of Int 

If partsScanned  = 1 and numPartsToTry  = Int.MaxValue, partsScanned += numPartsToTry  well be Int.MinValue. So we chang to use the size of  seq p.
@QiangCai
Copy link
Contributor Author

The number of partitions which has scanned is the size of seq p, not numPartsToTry.

So I use p.size to instead of numPartsToTry in line partsScanned += numPartsToTry

@srowen
Copy link
Member

srowen commented Dec 16, 2015

Yes I think that also makes sense, in the case where we hit the limit of totalParts.

@SparkQA
Copy link

SparkQA commented Dec 16, 2015

Test build #2220 has finished for PR 10310 at commit 56f151a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented Dec 17, 2015

Can you also change take in RDD?

@QiangCai
Copy link
Contributor Author

I think it (take in RDD) is ok.

@srowen
Copy link
Member

srowen commented Dec 20, 2015

@QiangCai I think that technically var partsScanned = 0 in take is a problem since it's incremented by numPartsToTry and could overflow causing partsScanned < totalParts to always be true. It can be set to OL to make it a long. This is a very rare case where numPartsToTry is large and totalParts is close to the max integer, and n is very large.

@QiangCai
Copy link
Contributor Author

@srowen I have modified all the code, and try to keep them to be the same.

At first, the var numPartsToTry and partsScanned have been set to Long.
Secondly, because var p should be Seq[Int], so i have added toInt for the until statement

@srowen
Copy link
Member

srowen commented Dec 23, 2015

@QiangCai thanks that looks good, but this needs a rebase now.

@QiangCai
Copy link
Contributor Author

@srowen I maybe have made a mistake in git bash. Can I new another pull request to resolve this problem?

@srowen
Copy link
Member

srowen commented Dec 24, 2015

I think you just need to rebase from master and force-push the result, but do what you need to.

@QiangCai
Copy link
Contributor Author

@srowen I have new another pull request. This pull request will be closed.
another pull request #10487

@QiangCai QiangCai closed this Dec 27, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants