[SPARK-5212][SQL] Add support of schema-less, custom field delimiter and SerDe for HiveQL transform #4014
Force-pushed from a711657 to ab22f7b.
I think a better place to extract the schema (the output) is in the Analyzer. HiveContext should be able to create its own rules for that, instead of doing this in a Strategy; otherwise it will probably fail to resolve the attributes.
e.g.:
SELECT transform(key + 1, value) USING '/bin/cat' FROM src ORDER BY key, value
Sorry, I didn't test that; let me know if I am wrong.
Good point. I didn't notice that. New commit will fix it. Thanks.
@rxin Would you like to take a look at this too and see if it is ready to merge? Thanks.
Can you explain in the PR what schema-less and custom delimiter mean?
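Schema-less map-reduce scripts are a feature of Hive's transform syntax. That is, there is no AS clause after the USING clause, so Hive treats the script output as a key (the text before the first tab) and a value (the rest).
A custom delimiter is defined by the ROW FORMAT clause (ROW FORMAT DELIMITED FIELDS TERMINATED BY ...).
So you can use field delimiters other than the default tab.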
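As a rough illustration of the schema-less case, the sketch below models how one line of script output could be split into a key and a value when no output schema is given. It follows the rule described in Hive's transform documentation (key is the text before the first tab, value is everything after it). The function name `split_schemaless` is hypothetical, and returning `None` when the line has no delimiter is an assumption of this sketch. This is not the code from this PR.

```python
from typing import Optional, Tuple


def split_schemaless(line: str, delimiter: str = "\t") -> Tuple[str, Optional[str]]:
    """Split one line of script output into (key, value).

    Hypothetical helper: the key is the text before the FIRST delimiter and
    the value is everything after it, including any later delimiters.
    Assumption of this sketch: a line with no delimiter yields value None.
    """
    line = line.rstrip("\n")
    key, sep, value = line.partition(delimiter)
    return key, (value if sep else None)


if __name__ == "__main__":
    for raw in ["238\tval_238\textra", "86"]:
        print(split_schemaless(raw))
```

Note the contrast with an explicit two-column output schema, where the second column would stop at the second delimiter instead of taking the rest of the line.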
@rxin I have added an explanation of this feature. Would you have time to review this PR and see if it is OK to merge? Thanks!
Remove this extra line.
Thanks for working on this! It would be great if this could be updated soon so we can include it in 1.3.
@marmbrus I did some refactoring to address the comments. It should be better now.
Thanks! Merged to master.
I just filed a JIRA issue: https://issues.apache.org/jira/browse/SPARK-7119. @viirya can you help investigate this?
@chenghao-intel ok.
This PR adds support for schema-less syntax, custom field delimiters, and SerDe for HiveQL's transform.
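To picture the custom field delimiter part, the sketch below simulates a transform round trip in plain Python. Each row is joined into one line with an input delimiter, passed to a "script", and the script's output lines are split back into fields with an output delimiter. The names `run_transform` and `cat` are hypothetical; `cat` stands in for an identity script such as '/bin/cat'. NULL handling and SerDe are deliberately left out, so this is only a simplified model of the behavior, not Spark's implementation.

```python
from typing import Callable, Iterable, List, Sequence


def run_transform(
    rows: Iterable[Sequence[object]],
    script: Callable[[List[str]], List[str]],
    in_delim: str = "\t",
    out_delim: str = "\t",
) -> List[List[str]]:
    """Hypothetical model of a transform with configurable field delimiters."""
    # Serialize every row as strings joined by the input delimiter.
    lines = [in_delim.join(str(field) for field in row) for row in rows]
    # Hand the lines to the script and collect its output lines.
    output = script(lines)
    # Split each output line into fields using the output delimiter.
    return [line.split(out_delim) for line in output]


def cat(lines: List[str]) -> List[str]:
    """Identity script: returns its input unchanged."""
    return list(lines)


if __name__ == "__main__":
    print(run_transform([(1, "a"), (2, "b")], cat, in_delim=",", out_delim=","))
```

All fields come back as strings, which matches the general idea that a script sees and emits text; any typed output schema would have to be applied afterwards.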