
Conversation


@WeichenXu123 WeichenXu123 commented Sep 10, 2016

What changes were proposed in this pull request?

To avoid confusing users, the error message in PairRDDFunctions, "Default partitioner cannot partition array keys.", is updated: the one in partitionBy is replaced with "Specified partitioner cannot partition array keys.", and the others are replaced with "Specified or default partitioner cannot partition array keys."

How was this patch tested?

N/A


srowen commented Sep 10, 2016

There are 5 instances of this check in the file -- they should all be handled the same way. I'm not sure this is accurate either, because some code paths lead to these methods when HashPartitioner is used as a default. Just say that HashPartitioner can't be used? Refactor one check method for this?
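One way to realize the "refactor one check method" idea is a single guard shared by all five call sites. This is a hypothetical sketch, not the actual patch: the class stubs and the names `ArrayKeyCheck` and `requireNonArrayKeys` are illustrative, and the stubs stand in for Spark's real classes so the sketch is self-contained.

```scala
// Hypothetical sketch of "one check method" shared by all five call sites.
// SparkException / Partitioner / HashPartitioner are stubbed here so the
// sketch compiles on its own; in Spark these are existing classes.
class SparkException(msg: String) extends Exception(msg)
trait Partitioner
class HashPartitioner(val partitions: Int) extends Partitioner

object ArrayKeyCheck {
  // Throws when the key type is an array and the partitioner hashes keys:
  // JVM arrays use identity hashCode, so hashing them partitions incorrectly.
  def requireNonArrayKeys(keyClass: Class[_], partitioner: Partitioner): Unit = {
    if (keyClass.isArray && partitioner.isInstanceOf[HashPartitioner]) {
      throw new SparkException("HashPartitioner cannot partition array keys.")
    }
  }
}
```

Each of the five sites would then call `ArrayKeyCheck.requireNonArrayKeys(keyClass, partitioner)` instead of repeating the if/throw, and the message would be identical everywhere.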

@WeichenXu123
Contributor Author

Oh, there are 5 similar messages. I checked the others; they may be given the default partitioner, so I updated their message to "Specified or default partitioner...". But the one in partitionBy must be set by the user and can't be the default one, because the partitionBy API exists to let the user specify how the RDD is to be partitioned. In the other cases the partitioner parameter is optional, and if the user doesn't specify it, it defaults to HashPartitioner.
Now I've updated the code.
Thanks!
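The underlying reason none of these hash-based partitioners can handle array keys is worth spelling out: JVM arrays use identity-based hashCode, so two arrays with equal contents generally hash differently. A minimal, Spark-free sketch (the `partitionOf` helper is illustrative, not Spark's code):

```scala
// Two arrays with identical contents...
val a = Array(1, 2, 3)
val b = Array(1, 2, 3)

// ...compare equal element-by-element:
val sameContents = a.sameElements(b) // true

// ...but their hash codes are identity-based, so a hashCode-driven
// partitioner (like HashPartitioner) would almost certainly send records
// keyed by `a` and by `b` to different partitions.
val numPartitions = 8
def partitionOf(key: Any): Int = math.abs(key.hashCode % numPartitions)
// partitionOf(a) and partitionOf(b) usually differ, even though the
// "keys" are logically equal -- hence the exception in PairRDDFunctions.
```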

@WeichenXu123 WeichenXu123 changed the title [Spark Core][MINOR] fix partitionBy error message [Spark Core][MINOR] fix "default partitioner cannot partition array keys" error message in PairRDDfunctions Sep 10, 2016

SparkQA commented Sep 10, 2016

Test build #65206 has finished for PR 15045 at commit d423b41.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Sep 10, 2016

Test build #65207 has finished for PR 15045 at commit 6520854.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


srowen commented Sep 10, 2016

Why bother saying 'specified' or 'default' at all, though? It's probably even more informative to state that HashPartitioner doesn't work, no matter what the source. If the user specified HashPartitioner, that's clear. If they didn't, they'll still recognize that the other half of the message is relevant: something doesn't like their array keys.

def partitionBy(partitioner: Partitioner): RDD[(K, V)] = self.withScope {
  if (keyClass.isArray && partitioner.isInstanceOf[HashPartitioner]) {
-   throw new SparkException("Default partitioner cannot partition array keys.")
+   throw new SparkException("Specified partitioner cannot partition array keys.")
Member

Even this method is called by other Spark code with HashPartitioner, which might lead to this error telling the user the "specified" partitioner doesn't work when the user code didn't specify a partitioner.
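The concern can be made concrete with a small, hypothetical sketch (the names below are illustrative stand-ins, not Spark's internals): an internal code path supplies a HashPartitioner by default, so the check fires with a "Specified partitioner" message even though user code specified nothing:

```scala
trait Partitioner
class HashPartitioner(val partitions: Int) extends Partitioner

// Stand-in for Spark's internal fallback when the caller names no partitioner.
def defaultPartitioner(): Partitioner = new HashPartitioner(8)

// Stand-in for the check in partitionBy with the message under discussion.
def checkMessage(keyIsArray: Boolean, p: Partitioner): Option[String] =
  if (keyIsArray && p.isInstanceOf[HashPartitioner])
    Some("Specified partitioner cannot partition array keys.")
  else None

// A call that specifies no partitioner still reaches the same check via the
// internal default -- and the "Specified" wording would then blame a choice
// the user never made.
val msg = checkMessage(keyIsArray = true, defaultPartitioner())
```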


SparkQA commented Sep 11, 2016

Test build #65229 has finished for PR 15045 at commit f53ad51.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Sep 11, 2016

Test build #65227 has finished for PR 15045 at commit 25f1f8c.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@WeichenXu123
Contributor Author

jenkins test please

@WeichenXu123
Contributor Author

Jenkins, test this please.


SparkQA commented Sep 12, 2016

Test build #65236 has finished for PR 15045 at commit f53ad51.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


srowen commented Sep 12, 2016

OK, can't hurt to be clear and specific about this. Merged to master.

@asfgit asfgit closed this in 8087ecf Sep 12, 2016
wgtmac pushed a commit to wgtmac/spark that referenced this pull request Sep 19, 2016
…eys" error message in PairRDDfunctions

## What changes were proposed in this pull request?

In order to avoid confusing user,
error message in `PairRDDfunctions`
`Default partitioner cannot partition array keys.`
is updated,
the one in `partitionBy` is replaced with
`Specified partitioner cannot partition array keys.`
other is replaced with
`Specified or default partitioner cannot partition array keys.`

## How was this patch tested?

N/A

Author: WeichenXu <WeichenXu123@outlook.com>

Closes apache#15045 from WeichenXu123/fix_partitionBy_error_message.
@WeichenXu123 WeichenXu123 deleted the fix_partitionBy_error_message branch April 24, 2019 21:19