[SPARK-30113][SQL][Python] Expose mergeSchema option in PySpark's ORC APIs #26755

nchammas · 2019-12-04T02:11:37Z

What changes were proposed in this pull request?

This PR is a follow-up to #24043 and a cousin of #26730. It exposes the mergeSchema option directly as a keyword argument in PySpark's ORC reader API.

Why are the changes needed?

So that the Python API matches the Scala API.

Does this PR introduce any user-facing change?

Yes, it adds a new option directly in the ORC reader method signatures.
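To make the user-facing change concrete, here is a minimal sketch of the pattern behind "an option in the method signature": a keyword that defaults to None and is forwarded as a reader option only when the caller sets it. The class `SketchOrcReader` is a hypothetical stand-in written for illustration. It is not PySpark's `DataFrameReader`, and it does not touch Spark at all.

```python
from typing import Any, Dict, Optional


class SketchOrcReader:
    """Hypothetical stand-in for a reader; NOT PySpark's DataFrameReader."""

    def __init__(self) -> None:
        self.options: Dict[str, Any] = {}

    def option(self, key: str, value: Any) -> "SketchOrcReader":
        # Record one reader option and return self for chaining.
        self.options[key] = value
        return self

    def orc(
        self,
        path: str,
        mergeSchema: Optional[bool] = None,
        recursiveFileLookup: Optional[bool] = None,
    ) -> Dict[str, Any]:
        # Forward only the keywords the caller actually set, so that an
        # omitted keyword leaves any session-level default untouched.
        for key, value in (
            ("mergeSchema", mergeSchema),
            ("recursiveFileLookup", recursiveFileLookup),
        ):
            if value is not None:
                self.option(key, value)
        # Return a plain description of the read instead of a DataFrame.
        return {"path": path, "options": dict(self.options)}


explicit = SketchOrcReader().orc("test-orc", mergeSchema=False)
omitted = SketchOrcReader().orc("test-orc")
print(explicit["options"])
print(omitted["options"])
```

In this model, an explicit `mergeSchema=False` is distinct from leaving the keyword out, which is what allows a per-read value to differ from a session-wide setting.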

How was this patch tested?

I tested this manually as follows:

>>> spark.range(3).write.orc('test-orc')
>>> spark.range(3).withColumnRenamed('id', 'name').write.orc('test-orc/nested')
>>> spark.read.orc('test-orc', recursiveFileLookup=True, mergeSchema=True)
DataFrame[id: bigint, name: bigint]
>>> spark.read.orc('test-orc', recursiveFileLookup=True, mergeSchema=False)
DataFrame[id: bigint]
>>> spark.conf.set('spark.sql.orc.mergeSchema', True)
>>> spark.read.orc('test-orc', recursiveFileLookup=True)
DataFrame[id: bigint, name: bigint]
>>> spark.read.orc('test-orc', recursiveFileLookup=True, mergeSchema=False)
DataFrame[id: bigint]
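The four reads above can be replayed with a small pure-Python model, which may help when reasoning about which setting wins. It rests on two assumptions that the session suggests but this PR does not spell out: an explicit per-read mergeSchema value takes precedence over spark.sql.orc.mergeSchema, and a non-merging read takes its schema from a single file (the first one in this sketch). The helper names `effective_merge_schema` and `resulting_columns` are hypothetical and are not Spark APIs.

```python
from typing import List, Optional


def effective_merge_schema(option: Optional[bool], session_conf: bool) -> bool:
    # Assumption: a per-read option, when given, overrides the session conf.
    return session_conf if option is None else option


def resulting_columns(file_schemas: List[List[str]], merge: bool) -> List[str]:
    if not merge:
        # Assumption: without merging, one file's schema is used; this
        # sketch simply picks the first file.
        return list(file_schemas[0])
    merged: List[str] = []
    for schema in file_schemas:
        for column in schema:
            if column not in merged:  # keep first-seen column order
                merged.append(column)
    return merged


# Column names of the two ORC files written in the manual test.
files = [["id"], ["name"]]

results = [
    # Session conf left unset (modeled as False), option passed explicitly.
    resulting_columns(files, effective_merge_schema(True, session_conf=False)),
    resulting_columns(files, effective_merge_schema(False, session_conf=False)),
    # After spark.conf.set('spark.sql.orc.mergeSchema', True).
    resulting_columns(files, effective_merge_schema(None, session_conf=True)),
    resulting_columns(files, effective_merge_schema(False, session_conf=True)),
]
for columns in results:
    print(columns)
```

The model only tracks column names. Real schema merging also has to reconcile column types, which this sketch ignores.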

SparkQA · 2019-12-04T02:41:21Z

Test build #114816 has finished for PR 26755 at commit 5e324af.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2019-12-04T02:44:05Z

Merged to master.

… APIs

### What changes were proposed in this pull request?

This PR is a follow-up to apache#24043 and cousin of apache#26730. It exposes the `mergeSchema` option directly in the ORC APIs.

### Why are the changes needed?

So the Python API matches the Scala API.

### Does this PR introduce any user-facing change?

Yes, it adds a new option directly in the ORC reader method signatures.

### How was this patch tested?

I tested this manually as follows:

```
>>> spark.range(3).write.orc('test-orc')
>>> spark.range(3).withColumnRenamed('id', 'name').write.orc('test-orc/nested')
>>> spark.read.orc('test-orc', recursiveFileLookup=True, mergeSchema=True)
DataFrame[id: bigint, name: bigint]
>>> spark.read.orc('test-orc', recursiveFileLookup=True, mergeSchema=False)
DataFrame[id: bigint]
>>> spark.conf.set('spark.sql.orc.mergeSchema', True)
>>> spark.read.orc('test-orc', recursiveFileLookup=True)
DataFrame[id: bigint, name: bigint]
>>> spark.read.orc('test-orc', recursiveFileLookup=True, mergeSchema=False)
DataFrame[id: bigint]
```

Closes apache#26755 from nchammas/SPARK-30113-ORC-mergeSchema.

Authored-by: Nicholas Chammas <nicholas.chammas@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>

expose mergeSchema in Python ORC APIs

5e324af

HyukjinKwon approved these changes Dec 4, 2019

HyukjinKwon closed this in c8922d9 Dec 4, 2019

nchammas deleted the SPARK-30113-ORC-mergeSchema branch December 4, 2019 02:45

nchammas mentioned this pull request Dec 20, 2019

[SPARK-30128][DOCS][PYTHON][SQL] Document/promote 'recursiveFileLookup' and 'pathGlobFilter' in file sources 'mergeSchema' in ORC #26958

Closed

zero323 mentioned this pull request Jan 7, 2020

Sync with changes merged after 6378d4bc06cd1bb1a209bd5fb63d10ef52d75eb4 zero323/pyspark-stubs#230

Closed

47 tasks

dongjoon-hyun added the SQL label Feb 5, 2020
