[SPARK-24268][SQL] Use datatype.simpleString in error messages #21321
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala Line 756 in b6c50d7
Then, you need to update the output results in
Test build #90586 has finished for PR 21321 at commit
thanks @maropu I missed that one. I'll update it shortly, thanks.
Test build #90638 has finished for PR 21321 at commit
cc @gatorsmile
kindly ping @gatorsmile
holdenk
left a comment
Left a quick question for you from skimming this PR.
  dataType.isInstanceOf[StringType] ||
  dataType.isInstanceOf[BooleanType],
- s"FeatureHasher requires columns to be of NumericType, BooleanType or StringType. " +
+ s"FeatureHasher requires columns to be of ${NumericType.simpleString}, " +
In the original PR it doesn't seem like we rewrote the constant types, only the dynamic ones (and this PR also doesn't seem to consistently rewrite the constant types referenced). What's the reason, and how do you decide which ones to rewrite to reference simpleString?
I think this PR always rewrites the constant types referenced. I am not sure why you are saying it does not. If I missed some places, it was only because I didn't see them.
I think that's my bad. Looking through it I saw some raw type names, but those are all in the suites, which makes sense.
holdenk
left a comment
Question about if we want to change to using SchemaUtils rather than just changing the error messages.
Some places under ml/ might have been missed in this PR. I did a quick check with grep -r "Type" ./mllib/src/main/scala/org/apache/spark/ml | grep -i require (my bash regex might not be great but you get the idea):
./mllib/src/main/scala/org/apache/spark/ml/feature/DCT.scala: require(inputType.isInstanceOf[VectorUDT], s"Input type must be VectorUDT but got $inputType.")
./mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala: require(inputType == StringType, s"Input type must be string type but got $inputType.")
StringIndexer.scala
If you have a chance to double check those that would be great!
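As a rough illustration of what this PR proposes for a check like the Tokenizer one quoted above, the sketch below routes both type names through simpleString instead of interpolating the DataType object directly. It assumes the Spark SQL types are on the classpath; `InputTypeCheck` and `validateInputType` are hypothetical names used only for this example and are not part of Spark.

```scala
import org.apache.spark.sql.types.{DataType, IntegerType, StringType}

object InputTypeCheck {
  // Hypothetical helper (not a Spark API): same condition as the Tokenizer
  // check quoted above, but both type names go through simpleString.
  def validateInputType(inputType: DataType): Unit =
    require(
      inputType == StringType,
      s"Input type must be ${StringType.simpleString} but got ${inputType.simpleString}.")

  def main(args: Array[String]): Unit = {
    validateInputType(StringType) // passes silently
    try validateInputType(IntegerType)
    catch {
      // require throws IllegalArgumentException and prefixes "requirement failed: "
      case e: IllegalArgumentException => println(e.getMessage)
    }
  }
}
```

With a plain `$inputType` interpolation the message would show the type's toString (for example IntegerType) rather than its short name, which is the inconsistency the PR is trying to remove.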
fields.foreach { fieldSchema =>
  val dataType = fieldSchema.dataType
  val fieldName = fieldSchema.name
  require(dataType.isInstanceOf[NumericType] ||
Another option would be to use SchemaUtils checkColumnTypes here -- what do you think?
I couldn't use that method because NumericType is an AbstractDataType, not a DataType.
Thanks for your kind and nice review @holdenk, I addressed the places I missed. Thank you!
Test build #92470 has finished for PR 21321 at commit
HyukjinKwon
left a comment
Looks worth giving a try since it's an error message and we can revert anytime if we see some other concerns.
One thing I am hesitant about: IIRC, @hvanhovell preferred catalogString. I will leave a cc for him.
@mgaido91, mind rebasing this please? Let me just get this in.
sorry for the late answer @HyukjinKwon. I just rebased it, thanks.
Test build #92747 has finished for PR 21321 at commit
Merged to master.
I think the fix is wrong. We should not use simpleString but catalogString, because simpleString will do the truncation.
Let me revert the changes. Please re-submit the fix.
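The truncation concern raised above can be seen with a wide struct type. This is a minimal sketch, assuming the Spark SQL types are on the classpath and that the field display limit (spark.debug.maxToStringFields) is left at its default; `TruncationDemo` and the column names are made up for the example.

```scala
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

object TruncationDemo {
  // 40 integer columns named c1..c40: wider than the default display limit
  // (assumption: spark.debug.maxToStringFields is left at its default).
  val wide: StructType =
    StructType((1 to 40).map(i => StructField(s"c$i", IntegerType)))

  def main(args: Array[String]): Unit = {
    println(wide.simpleString)  // expected to be abbreviated for a struct this wide
    println(wide.catalogString) // spells out every field
    // The last column is always present in the catalog representation.
    println(wide.catalogString.contains("c40:int"))
  }
}
```

For narrow or atomic types the two strings are the same, so the difference only matters in error messages that mention large schemas.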
That was the thing I was hesitant of. I was kind of confused recently about this because @mgaido91, mind opening a PR with
@HyukjinKwon The whole PR is doing the wrong things. That is why I reverted it. I do not want the others to follow this PR. For the other PRs whose main objective are not to use
Ah, it's fine. This one is already reverted and I don't mind. Not a big deal actually. Glad that we are on the same page about
@gatorsmile I used
For instance, #20064 at SPARK-22893, right? Let's fix them too. Yes, since this one is reverted, we should better stick to I only haven't left some comments about this so far only because my impression was that we were not yet sure between Now it looks at least including me three committer prefers
What changes were proposed in this pull request?
SPARK-22893 tried to unify error messages about dataTypes. Unfortunately, many places were still missing the simpleString method needed to have the same representation everywhere. This PR unifies the messages by always using the simpleString representation of the dataTypes.
How was this patch tested?
existing/modified UTs