@chenghao-intel
Contributor

This is a quick fix for queries (in HiveContext) such as:

SELECT transform(key + 1, value) USING '/bin/cat' FROM src;
SELECT a, b from (SELECT transform(key + 1, value) using '/bin/cat' as (a float, b string) from src) t where a = 239.0;

Ideally, we should refactor ScriptTransformation so that it supports custom SerDes for the reader and writer. I will do that in a follow-up.

@SparkQA

SparkQA commented Jan 22, 2015

Test build #25959 has started for PR 4158 at commit c8fe7fc.

  • This patch merges cleanly.

@viirya
Member

viirya commented Jan 22, 2015

Hi @chenghao-intel, I have already done this, along with support for custom field delimiters and SerDes, in PR #4014.

@SparkQA

SparkQA commented Jan 22, 2015

Test build #25959 has finished for PR 4158 at commit c8fe7fc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25959/

@chenghao-intel
Contributor Author

Thank you, @viirya. This is just a quick fix for my use case, and I hope it can be merged soon. I will also leave some comments on your PR.

@SparkQA

SparkQA commented Jan 23, 2015

Test build #25996 has started for PR 4158 at commit a7b6989.

  • This patch merges cleanly.

@chenghao-intel
Contributor Author

@viirya I've updated the code. This is a blocking issue for our partner, so it would be great if you could review it for me. The TODOs I left there can definitely be done in your PR #4014.

@SparkQA

SparkQA commented Jan 23, 2015

Test build #25996 has finished for PR 4158 at commit a7b6989.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25996/

Member

According to the Hive manual, there should be only two outputs, key and value, when no output schema is defined. So I am not sure this is a bug, because the behavior is explicitly described in the manual. I suppose it is well-known and expected behavior?

Contributor Author

Thanks for noticing that. I think this is probably a bug in Hive.
I ran these queries in the Hive CLI:

set hive.cli.print.header=true;
select transform(key + 1, key - 1, key) using '/bin/cat' from src limit 4;

create table test2 as select transform(key + 1, key - 1, key) using '/bin/cat' from src limit 4;


And print the contents of the table test2:

As you will see, the result is not the expected key and value columns. That's why I added default field names for more than 2 columns.

Member

I think these are the expected results, as the Hive manual describes under 'Schema-less Map-reduce Scripts' for transform:

If there is no AS clause after USING my_script, Hive assumes that the output of the script contains 2 parts: key which is before the first tab, and value which is the rest after the first tab.

So in your results, the value column gets all of the query output after the first tab. The result for table test2 is just an alignment problem caused by the tabs. It should follow the same rule too.
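
To make the quoted rule concrete, here is a minimal sketch in Python of how one line of script output would be split when there is no AS clause. It is not Hive's or Spark's actual code: the function name `split_schemaless_row` and the sample row are hypothetical, and returning `None` for a line that contains no tab is an assumption.

```python
from typing import Optional, Tuple


def split_schemaless_row(line: str) -> Tuple[str, Optional[str]]:
    """Split one line of script output into (key, value) at the first tab."""
    # Drop only the trailing newline; tabs inside the line must be preserved.
    line = line.rstrip("\n")
    # partition() splits at the FIRST tab, so any later tabs stay inside value.
    key, sep, value = line.partition("\t")
    # Assumption: a line with no tab at all has no value (None here).
    return key, (value if sep else None)


# Illustrative row with three tab-separated fields, shaped like the output of
# transform(key + 1, key - 1, key) using '/bin/cat'.
row = "239.0\t237.0\t238"
print(split_schemaless_row(row))  # ('239.0', '237.0\t238')
```

With three fields in the script output, the second and third fields both land in value, still separated by a tab, which is why the printed table looks misaligned rather than showing three columns.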

Contributor Author

OK, I see. Thanks for the explanation. I will update that.

@viirya
Member

viirya commented Jan 23, 2015

@chenghao-intel Overall it looks good to me, apart from a few small comments.

@SparkQA

SparkQA commented Jan 26, 2015

Test build #26069 has started for PR 4158 at commit 5618fa7.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 26, 2015

Test build #26069 has finished for PR 4158 at commit 5618fa7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26069/

@chenghao-intel
Contributor Author

Closing this since #4014 has been merged.
