[SPARK-12575][SQL] Grammar parity with existing SQL parser #10745
…ith charset names.
I am not happy about this one: we are using an unconfigured parser here.
Not pretty, but you could get the conf from SQLContext getOrCreate / getActive?
If we do that, we should make sure we have a default one when there is no active context.
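A minimal sketch of the suggested fallback, assuming a single globally registered active context. `ParserConf`, `SessionContext` and `ActiveContext` are hypothetical stand-ins, not Spark's `SQLConf` or `SQLContext`, and the default value is arbitrary:

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical stand-ins for a parser configuration and a session context;
// these are not Spark classes.
final case class ParserConf(caseSensitive: Boolean)
final class SessionContext(val conf: ParserConf)

object ActiveContext {
  // Conf used when no context has been registered as active
  // (value chosen arbitrarily for this sketch).
  val DefaultConf: ParserConf = ParserConf(caseSensitive = false)

  private val active = new AtomicReference[SessionContext](null)

  def setActive(ctx: SessionContext): Unit = active.set(ctx)
  def clearActive(): Unit = active.set(null)

  // Prefer the active context's conf; fall back to the default otherwise.
  def currentConf: ParserConf =
    Option(active.get()).map(_.conf).getOrElse(DefaultConf)
}
```

A per-thread active context would swap the `AtomicReference` for a `ThreadLocal`; the fallback logic stays the same.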
Could we also move this to SQLImplicits, or SQLContext for that matter?
That won't work for Java though.
Yeah, totally forgot about that.
Almost all tests were moved to the CatalystQl suite in a previous PR.
test this please
Test build #49347 has finished for PR 10745 at commit
are there any other database systems that allow this feature?
MySQL apparently supports this: https://docs.oracle.com/cd/E17952_01/refman-5.5-en/charset-literal.html
Talked to a few more people about this. I think it's best to just drop this feature. Only MySQL and Hive support this, and we cannot support the identical syntax anyway. I'd say if this is a really desired feature, we can just build a function for it.
I'll drop the feature.
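If the feature were brought back as a function, as suggested above, its core would be decoding a hex byte string with a named charset. A minimal sketch; `CharsetLiteral.decode` is a hypothetical name, and only the `_ISO-8859-1 0x4341464562616265` example comes from this PR:

```scala
import java.nio.charset.Charset

// Hypothetical helper: decode a hex byte string such as "0x4341..." using the
// given charset, mimicking Hive's _<charset> 0x<hex> literal.
object CharsetLiteral {
  def decode(charsetName: String, hex: String): String = {
    val digits = hex.stripPrefix("0x").stripPrefix("0X")
    require(digits.length % 2 == 0, s"odd number of hex digits: $hex")
    // Every pair of hex digits encodes one byte.
    val bytes = digits.grouped(2).map(pair => Integer.parseInt(pair, 16).toByte).toArray
    new String(bytes, Charset.forName(charsetName))
  }
}
```

With this sketch, `CharsetLiteral.decode("ISO-8859-1", "0x4341464562616265")` returns `"CAFEbabe"`, the same result as the Hive example in the PR description.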
Looks pretty good overall.
retest this please
Test build #49394 has finished for PR 10745 at commit
Test build #49409 has finished for PR 10745 at commit
are these copied from hive?
No, this is what the old SqlParser supported.
are you trying to support both hive's and our interval literal grammar?
Hive does not support multi-unit intervals such as `1 year 3 month 10 milliseconds`.
> are you trying to support both hive's and our interval literal grammar?

In this case I am trying to support both. Our interval grammar can be seen as an extension of Hive's interval grammar.
+1 on supporting both. Actually, we have to here.
How about we check `elements.isEmpty` first, throw an exception if needed, and then foldLeft? Then we don't need this `updated` variable.
Actually, instead of a foldLeft over two values, it might be simpler and clearer to write this as a loop that mutates two variables.
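A sketch of the loop-based version suggested in review, assuming the interval is accumulated into a months part and a microseconds part (the two fields of Spark's `CalendarInterval`). `IntervalSketch.parse` and its whitespace tokenizing are hypothetical and far simpler than the real grammar:

```scala
// Hypothetical sketch: turn a multi-unit interval such as
// "1 year 3 month 10 milliseconds" into (months, microseconds).
object IntervalSketch {
  private val MicrosPerMilli = 1000L
  private val MicrosPerSecond = 1000L * MicrosPerMilli
  private val MicrosPerMinute = 60L * MicrosPerSecond
  private val MicrosPerHour = 60L * MicrosPerMinute
  private val MicrosPerDay = 24L * MicrosPerHour
  private val MicrosPerWeek = 7L * MicrosPerDay

  def parse(text: String): (Int, Long) = {
    val tokens = text.trim.split("\\s+").filter(_.nonEmpty)
    // Reject an empty interval before doing any accumulation.
    if (tokens.isEmpty)
      throw new IllegalArgumentException("at least one time unit should be given")
    if (tokens.length % 2 != 0)
      throw new IllegalArgumentException(s"expected <value> <unit> pairs: '$text'")

    // Two mutable accumulators instead of a foldLeft over a tuple.
    var months = 0
    var micros = 0L
    var i = 0
    while (i < tokens.length) {
      val n = tokens(i).toLong
      // Accept singular and plural unit names ("month", "months").
      tokens(i + 1).toLowerCase.stripSuffix("s") match {
        case "year"        => months = Math.addExact(months, Math.toIntExact(12L * n))
        case "month"       => months = Math.addExact(months, Math.toIntExact(n))
        case "week"        => micros += n * MicrosPerWeek
        case "day"         => micros += n * MicrosPerDay
        case "hour"        => micros += n * MicrosPerHour
        case "minute"      => micros += n * MicrosPerMinute
        case "second"      => micros += n * MicrosPerSecond
        case "millisecond" => micros += n * MicrosPerMilli
        case "microsecond" => micros += n
        case other         => throw new IllegalArgumentException(s"unknown unit: $other")
      }
      i += 2
    }
    (months, micros)
  }
}
```

Checking for emptiness up front means the loop never needs a flag recording whether anything was parsed.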
Test build #49425 has finished for PR 10745 at commit
retest this please
Getting a weird, seemingly unrelated Python error.
Test build #49457 has finished for PR 10745 at commit
…r function. Add some docs.
LGTM. Will merge once tests pass.
why this change?
HyperLogLogPlusPlus was failing because I was passing it Decimal literals. I thought I could solve this by casting. While that is no longer relevant, I still think the change is valid.
`inputTypes` is used to check input types; however, HyperLogLogPlusPlus only has one input, see https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/HyperLogLogPlusPlus.scala#L132.
So we don't need to give two type constraints here.
Yeah, you are right. I am fixing it now.
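To illustrate the point about `inputTypes`, here is a simplified model, assuming the check zips the expected types with the children the way Spark's `ExpectsInputTypes` does. Every type below is a hypothetical stand-in. Because of the `zip`, a second constraint on a one-child expression is never consulted:

```scala
// Simplified, hypothetical model of an input type check; not Spark's classes.
sealed trait DataType
case object LongType extends DataType
case object DoubleType extends DataType
case object StringType extends DataType

final case class Child(name: String, dataType: DataType)

object InputTypeCheck {
  // zip pairs each child with one expected type; surplus expected types
  // (or surplus children) are dropped without any error.
  def mismatches(children: Seq[Child], inputTypes: Seq[DataType]): Seq[String] =
    children.zip(inputTypes).collect {
      case (child, expected) if child.dataType != expected =>
        s"${child.name}: expected $expected, got ${child.dataType}"
    }
}
```

In this model, `Seq(StringType, DoubleType)` and `Seq(StringType)` give identical results for a single `StringType` child, so the extra entry only adds noise.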
Test build #49470 has finished for PR 10745 at commit
Test build #49483 has finished for PR 10745 at commit
LGTM
Thanks - going to merge this.
In this PR the new CatalystQl parser stack reaches grammar parity with the old Parser-Combinator based SQL Parser. This PR also replaces all uses of the old Parser, and removes it from the code base.
Although the existing Hive and SQL parser dialects were mostly the same, some kinks had to be worked out:
- The SQL parser allowed syntax like `APPROXIMATE(0.01) COUNT(DISTINCT a)`. To make this work we would need to hardcode approximate operators in the parser, or create an approximate expression. `APPROXIMATE_COUNT_DISTINCT(a, 0.01)` would also do the job and is much easier to maintain, so this PR removes this keyword.
- The old SQL parser supported `LIMIT` clauses in nested queries. This is not supported anymore. See [SPARK-12745] [SQL] Hive Parser: Limit is not supported inside Set Operation #10689 for the rationale.
- Hive supports charset-prefixed literals; for instance, the expression `_ISO-8859-1 0x4341464562616265` would yield the string `CAFEbabe`. Hive only allows charset names that start with an underscore. This is quite annoying in Spark, because tuple column names (`_1`, `_2`, ...) also start with an underscore. In this PR we remove this feature from the parser. It would be quite easy to implement such a feature as an Expression later on.
- Hive and the SQL parser treat decimal literals differently. Hive turns any decimal into a `Double`, whereas the SQL parser would convert a non-scientific decimal into a `BigDecimal` and a scientific decimal into a `Double`. We follow Hive's behavior here. The new parser supports a big decimal literal, for instance `81923801.42BD`, which can be used when a big decimal is needed.

cc @rxin @viirya @marmbrus @yhuai @cloud-fan
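The decimal literal rule can be sketched as follows. This is an illustration only: `NumericLiteral` and `DecimalLiteral.parse` are hypothetical, integral literals are ignored, and treating an upper-case `BD` suffix as the big decimal marker is an assumption based on the `81923801.42BD` example:

```scala
// Hypothetical sketch of the decimal literal rule; not the parser's code.
sealed trait NumericLiteral
final case class DoubleLiteral(value: Double) extends NumericLiteral
final case class BigDecimalLiteral(value: BigDecimal) extends NumericLiteral

object DecimalLiteral {
  def parse(text: String): NumericLiteral =
    if (text.endsWith("BD"))
      // Explicit suffix: keep full precision as a BigDecimal.
      BigDecimalLiteral(BigDecimal(text.dropRight(2)))
    else
      // Plain and scientific decimals alike become a Double, as in Hive.
      DoubleLiteral(text.toDouble)
}
```

Under this rule, Hive's default applies unless a query explicitly opts into exact decimal values with the suffix.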