@AngersZhuuuu AngersZhuuuu commented Apr 21, 2021

What changes were proposed in this pull request?

Extract the common documentation about Hive format into a shared page that both `sql-ref-syntax-ddl-create-table-hiveformat.md` and `sql-ref-syntax-qry-select-transform.md` can refer to.

image

Why are the changes needed?

Improve doc

Does this PR introduce any user-facing change?

No

How was this patch tested?

Not needed.

@github-actions github-actions bot added the DOCS label Apr 21, 2021
@AngersZhuuuu
Contributor Author

I am not sure which menu page, if any, this page should be added to.

SparkQA commented Apr 21, 2021

Test build #137710 has finished for PR 32264 at commit df246ea.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • SERDE serde_class [ WITH SERDEPROPERTIES (k1=v1, k2=v2, ... ) ]

SparkQA commented Apr 21, 2021

Test build #137711 has finished for PR 32264 at commit d98f825.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Apr 21, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42239/

SparkQA commented Apr 21, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42239/

SparkQA commented Apr 21, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42238/

SparkQA commented Apr 21, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42238/


### Description

Spark support Hive format in `CREATE TABLE` clause and `TRANSFORM` clause, Hive format support
Contributor

Spark supports Hive format in `CREATE TABLE` clause and `TRANSFORM` clause,
to specify serde or text delimiter.

Contributor Author

Done


* **row_format**

Use the `SERDE` clause to specify a custom SerDe for one table or processing inputs and outputs data. Otherwise, use the `DELIMITED` clause to use the native SerDe and specify the delimiter, escape character, null character and so on.
Contributor

shall we put it in Description at the beginning?

Contributor Author

Done


Use the `SERDE` clause to specify a custom SerDe for one table or processing inputs and outputs data. Otherwise, use the `DELIMITED` clause to use the native SerDe and specify the delimiter, escape character, null character and so on.

* **SERDE**
Contributor

@cloud-fan cloud-fan Apr 21, 2021

We can merge this with the next one: `SERDE serde_class`.

Contributor Author

Done

@cloud-fan
Contributor

> not sure if we need to put this page in which menu page

We don't need to put it in the menu page.

SparkQA commented Apr 21, 2021

Test build #137732 has finished for PR 32264 at commit 203f544.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Apr 21, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42259/

SparkQA commented Apr 21, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42259/

---
layout: global
title: Data Retrieval
displayTitle: Data Retrieval
Member

@dongjoon-hyun dongjoon-hyun Apr 22, 2021

In this case, shall we add a reference to `sql-ref-syntax-hive-format.md` in `sql-ref-syntax-qry.md`?

Oh, got it. I saw @cloud-fan's comment: "We don't need to put it in the menu page." Nvm.

Contributor Author

Yeah, referring to this page from another menu doc would be strange, since it is referenced by two syntax docs of different types.

Spark supports Hive format in `CREATE TABLE` clause and `TRANSFORM` clause,
to specify serde or text delimiter. In `row_format`, uses the `SERDE` clause to specify a custom SerDe
for one table or processing inputs and outputs data. Otherwise, use the `DELIMITED` clause
to use the native SerDe and specify the delimiter, escape character, null character and so on.
Contributor

how about this

There are two ways to specify the `row_format`:
1. Use the `SERDE` clause to specify a custom SerDe class
2. Use the `DELIMITED` clause to specify the delimiter ... and so on for the native text Serde.
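
As a rough illustration of the two options listed above, the Spark SQL sketch below creates one table with each form. The table names and columns are made up for this example, and the statements assume a Spark session with Hive support enabled. `org.apache.hadoop.hive.serde2.OpenCSVSerde` is used only as an example of a SerDe class that ships with Hive; it is not taken from this PR.

```sql
-- Option 1: SERDE clause with a custom SerDe class
-- (hypothetical table; OpenCSVSerde is just an example class).
CREATE TABLE student_serde (id INT, name STRING)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    STORED AS TEXTFILE;

-- Option 2: DELIMITED clause, which configures the native text SerDe
-- (hypothetical table; delimiter and null marker chosen for illustration).
CREATE TABLE student_delimited (id INT, name STRING)
    ROW FORMAT DELIMITED
        FIELDS TERMINATED BY ','
        NULL DEFINED AS 'NULL'
    STORED AS TEXTFILE;
```

The same `ROW FORMAT` forms are what the `TRANSFORM` clause accepts for its input and output, which is the reason for describing them on one shared page.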

Contributor Author

Yeah, that's clearer.

@cloud-fan
Contributor

@maropu do you want to take a look?

SparkQA commented Apr 22, 2021

Test build #137797 has finished for PR 32264 at commit 5fe64b5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Apr 22, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42326/

SparkQA commented Apr 22, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42326/

* **row_format**

Used for escape mechanism.
All descriptions about syntax in `row_format` can refer to [HIVE FORMAT](sql-ref-syntax-hive-format.html)
Member

How about: `Specifies the row format for input and output. See [HIVE ROW FORMAT](...) for more syntax details.`?

Contributor Author

Done

@maropu
Member

maropu commented Apr 22, 2021

Could you add the screenshot of the new page in the PR description?

@maropu
Member

maropu commented Apr 22, 2021

NOTE: I'm planning to backport this PR and #31010 into branch-3.1/3.0 because I think these document pages are useful for users.

@AngersZhuuuu
Contributor Author

> Could you add the screenshot of the new page in the PR description?

Done

SparkQA commented Apr 23, 2021

Test build #137837 has finished for PR 32264 at commit 9bfa1cf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Apr 23, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42367/

SparkQA commented Apr 23, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42367/

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 20d68dc Apr 23, 2021
@cloud-fan
Contributor

@maropu shall we have a single backport PR or two?

@maropu
Member

maropu commented Apr 23, 2021

They have different JIRA tickets, so I think it's better to backport them separately. Could you? @AngersZhuuuu

@maropu
Member

maropu commented Apr 23, 2021

Anyway, late lgtm. Thank you, @AngersZhuuuu

@maropu
Member

maropu commented Apr 26, 2021

> They have different JIRA tickets, so I think it's better to backport them separately. Could you? @AngersZhuuuu

ping

@AngersZhuuuu
Contributor Author

> > They have different JIRA tickets, so I think it's better to backport them separately. Could you? @AngersZhuuuu
>
> ping

Hmm, is there a conflict? Do you need me to create a backport PR?

@maropu
Member

maropu commented Apr 26, 2021

yea, yes. I couldn't cherry-pick them into the previous branches.

@AngersZhuuuu
Contributor Author

> yea, yes. I couldn't cherry-pick them into the previous branches.

OK, I'll ping you later when the PR is ready.

AngersZhuuuu added a commit to AngersZhuuuu/spark that referenced this pull request Apr 28, 2021
### What changes were proposed in this pull request?
Extract common doc about hive format for `sql-ref-syntax-ddl-create-table-hiveformat.md` and `sql-ref-syntax-qry-select-transform.md` to refer.

![image](https://user-images.githubusercontent.com/46485123/115802193-04641800-a411-11eb-827d-d92544881842.png)

### Why are the changes needed?
Improve doc

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Not need

Closes apache#32264 from AngersZhuuuu/SPARK-35159.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
xuanyuanking pushed a commit to xuanyuanking/spark that referenced this pull request Sep 29, 2021