@viirya
Member

@viirya viirya commented Jan 13, 2015

This PR adds support for schema-less syntax, custom field delimiters, and SerDe for HiveQL's transform.

@SparkQA

SparkQA commented Jan 13, 2015

Test build #25462 has finished for PR 4014 at commit ccee49e.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 13, 2015

Test build #25463 has finished for PR 4014 at commit b1729d9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya viirya changed the title [SPARK-5212][SQL] Add support of schema-less transformation [SPARK-5212][SQL] Add support of schema-less and custom field delimiter for HiveQL transform Jan 14, 2015
@SparkQA

SparkQA commented Jan 14, 2015

Test build #25549 has finished for PR 4014 at commit 7a48e42.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 14, 2015

Test build #25550 has finished for PR 4014 at commit ab22f7b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya viirya changed the title [SPARK-5212][SQL] Add support of schema-less and custom field delimiter for HiveQL transform [SPARK-5212][SQL] Add support of schema-less, custom field delimiter and SerDe for HiveQL transform Jan 16, 2015
@SparkQA

SparkQA commented Jan 16, 2015

Test build #25669 has finished for PR 4014 at commit 5e0b864.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • val trimed_class = outputSerdeClass.split("'")(1)
    • val trimed_class = inputSerdeClass.split("'")(1)

@SparkQA

SparkQA commented Jan 16, 2015

Test build #25670 has finished for PR 4014 at commit 4d21956.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • val trimed_class = outputSerdeClass.split("'")(1)
    • val trimed_class = inputSerdeClass.split("'")(1)

@SparkQA

SparkQA commented Jan 17, 2015

Test build #25699 has finished for PR 4014 at commit a711657.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • val trimed_class = outputSerdeClass.split("'")(1)
    • val trimed_class = inputSerdeClass.split("'")(1)

@viirya viirya force-pushed the schema_less_trans branch from a711657 to ab22f7b Compare January 17, 2015 15:04
@viirya viirya changed the title [SPARK-5212][SQL] Add support of schema-less, custom field delimiter and SerDe for HiveQL transform [SPARK-5212][SQL] Add support of schema-less, custom field delimiter for HiveQL transform Jan 17, 2015
@SparkQA

SparkQA commented Jan 17, 2015

Test build #25703 has finished for PR 4014 at commit ab22f7b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 18, 2015

Test build #25723 has finished for PR 4014 at commit 32d3046.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • val trimed_class = outputSerdeClass.split("'")(1)
    • val trimed_class = inputSerdeClass.split("'")(1)

@SparkQA

SparkQA commented Jan 18, 2015

Test build #25724 has finished for PR 4014 at commit be2c3fc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • val trimed_class = outputSerdeClass.split("'")(1)
    • val trimed_class = inputSerdeClass.split("'")(1)

@viirya viirya changed the title [SPARK-5212][SQL] Add support of schema-less, custom field delimiter for HiveQL transform [SPARK-5212][SQL] Add support of schema-less, custom field delimiter and SerDe for HiveQL transform Jan 18, 2015
@SparkQA

SparkQA commented Jan 18, 2015

Test build #25729 has finished for PR 4014 at commit 799b5e1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • val trimed_class = outputSerdeClass.split("'")(1)
    • val trimed_class = inputSerdeClass.split("'")(1)

@SparkQA

SparkQA commented Jan 19, 2015

Test build #25756 has finished for PR 4014 at commit 7a14f31.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • val trimed_class = outputSerdeClass.split("'")(1)
    • val trimed_class = inputSerdeClass.split("'")(1)

@SparkQA

SparkQA commented Jan 19, 2015

Test build #25758 has finished for PR 4014 at commit 9a6dc04.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • val trimed_class = outputSerdeClass.split("'")(1)
    • val trimed_class = inputSerdeClass.split("'")(1)

Contributor

I think a better place to extract the schema (the output) is in the Analyzer; HiveContext should be able to create its own rules for that, instead of doing this in the Strategy. Otherwise, it will probably fail to resolve the attributes.

e.g.:

SELECT transform(key + 1, value) USING '/bin/cat' FROM src ORDER BY key, value

Sorry, I didn't test that; let me know if I am wrong.

Member Author

Good point. I didn't notice that. A new commit will fix it. Thanks.

@SparkQA

SparkQA commented Jan 29, 2015

Test build #26273 has finished for PR 4014 at commit aa10fbd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class HiveScriptIOSchema (
    • val trimed_class = serdeClassName.split("'")(1)
    • case class ShimWritable(writable: Writable)
    • case class ShimWritable(writable: Writable)

@viirya
Member Author

viirya commented Jan 29, 2015

@rxin Would you like to take a look at this too and see if it is ready to merge? Thanks.

@rxin
Contributor

rxin commented Jan 29, 2015

Can you explain in the PR what a schema-less delimiter is?

@viirya
Member Author

viirya commented Jan 29, 2015

Schema-less map-reduce scripts are a feature of Hive's transform syntax: there is no AS clause after USING my_script, so Hive assumes that the script output contains two columns, key and value. The example SQL looks like:

SELECT TRANSFORM (key, value) USING 'cat' FROM src
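
To illustrate how this differs from an explicit AS (key, value) clause, here is a minimal Python sketch. It assumes the behavior described in the Hive language manual, where the schema-less key is everything before the first tab and the value is the rest of the line. The function names are hypothetical, returning None for a missing value is an assumption, and this is not the code from this PR.

```python
def split_schemaless(line, delim="\t"):
    """Split one line of script output into (key, value) at the first delimiter."""
    key, sep, value = line.partition(delim)
    # Assumption: a line without any delimiter yields a key and a missing value.
    return (key, value if sep else None)


def split_with_schema(line, num_cols, delim="\t"):
    """Split into exactly num_cols fields, as an explicit AS (c1, ..., cn) clause would."""
    fields = line.split(delim)[:num_cols]
    # Pad short rows with None so that every row has num_cols entries.
    return tuple(fields + [None] * (num_cols - len(fields)))


line = "238\tval_238\textra"
print(split_schemaless(line))      # ('238', 'val_238\textra')
print(split_with_schema(line, 2))  # ('238', 'val_238')
```

With more than one tab in a line, the schema-less value keeps everything after the first tab, while the two-column schema keeps only the second field.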

A custom delimiter is defined by a ROW FORMAT clause, such as:

SELECT TRANSFORM (key, value) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\002' USING 'cat' AS (tKey, tValue) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\002' FROM src

So you can use field delimiters other than the default \t.
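
As a rough illustration of what the custom delimiter buys you, the Python sketch below serializes rows with '\002', pipes them through a child process, and parses the output with the same delimiter. Everything here is an assumption made for illustration: the transform function is hypothetical, a small Python child process stands in for 'cat', and none of it reflects how Spark or Hive implement TRANSFORM.

```python
import subprocess
import sys

DELIM = "\002"  # octal escape for the byte 0x02, as in TERMINATED BY '\002'

# Stand-in for 'cat': a child process that copies stdin to stdout unchanged.
ECHO_CMD = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]


def transform(rows, cmd=ECHO_CMD, delim=DELIM):
    """Serialize rows with delim, pipe them through cmd, and parse the output."""
    payload = "".join(delim.join(row) + "\n" for row in rows)
    result = subprocess.run(cmd, input=payload, capture_output=True,
                            text=True, check=True)
    # Each output line is split back into fields on the same delimiter.
    return [tuple(line.split(delim)) for line in result.stdout.splitlines()]


rows = [("238", "val_238"), ("86", "val with\ttab")]
print(transform(rows))  # [('238', 'val_238'), ('86', 'val with\ttab')]
```

Because the fields are separated by '\002' rather than a tab, a value that itself contains a tab comes back as a single field.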

@viirya
Member Author

viirya commented Jan 31, 2015

@rxin I have added the explanation for this feature. Would you have time to review this PR and see if it is OK to merge? Thanks!

Contributor

Remove this extra line.

@marmbrus
Contributor

marmbrus commented Feb 2, 2015

Thanks for working on this! It would be great if this could be updated soon so we can include it in 1.3.

@SparkQA

SparkQA commented Feb 2, 2015

Test build #26516 has finished for PR 4014 at commit ac2d1fe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class HiveScriptIOSchema (
    • val trimed_class = serdeClassName.split("'")(1)

@viirya
Member Author

viirya commented Feb 2, 2015

@marmbrus I did some refactoring to address the comments. It should be better now.

@marmbrus
Contributor

marmbrus commented Feb 2, 2015

Thanks! Merged to master.

@chenghao-intel
Contributor

I just filed a JIRA issue: https://issues.apache.org/jira/browse/SPARK-7119. @viirya, can you help investigate this?

@viirya
Member Author

viirya commented Apr 24, 2015

@chenghao-intel ok.

@viirya viirya deleted the schema_less_trans branch December 27, 2023 18:30