@ueshin
Member

@ueshin ueshin commented Feb 25, 2023

### What changes were proposed in this pull request?

Fixes `DataFrameReader` to use the default source.

### Why are the changes needed?

```py
spark.read.load(path)
```

should work and use the default source without specifying the format.

### Does this PR introduce _any_ user-facing change?

The `format` doesn't need to be specified.

### How was this patch tested?

Enabled related tests.

```py
actual = self.spark.read.load(path=tmpPath)
self.assertEqual(sorted(df.collect()), sorted(actual.collect()))
self.spark.sql("SET spark.sql.sources.default=" + defaultDataSourceName)
try:
```
Member Author

@ueshin ueshin Feb 25, 2023

The changes in this file ensure that the cleanup is done properly.
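For context, proper cleanup in a test that overrides a setting usually means saving the old value and restoring it in a `finally` block, so that a failing assertion cannot leak the override into later tests. The following is a minimal sketch of that generic pattern, not the actual change made in this PR. A plain dict stands in for the session's SQL conf, and `temporary_conf` is a hypothetical helper rather than a PySpark API.

```py
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def temporary_conf(conf: Dict[str, str], key: str, value: str) -> Iterator[None]:
    """Hypothetical helper: override conf[key] for the duration of a with-block."""
    previous = conf.get(key)  # None means the key was not set before
    conf[key] = value
    try:
        yield
    finally:
        # Runs even when the body raises, so the override never leaks out.
        if previous is None:
            conf.pop(key, None)
        else:
            conf[key] = previous


conf = {"spark.sql.sources.default": "parquet"}
try:
    with temporary_conf(conf, "spark.sql.sources.default", "json"):
        assert conf["spark.sql.sources.default"] == "json"
        raise RuntimeError("simulated test failure")
except RuntimeError:
    pass
print(conf["spark.sql.sources.default"])  # parquet
```

Using a context manager keeps the save/override/restore steps in one place, so each test cannot forget the restore step.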

Member

@HyukjinKwon HyukjinKwon left a comment

Nice, thanks Takuya

@amaliujia
Contributor

LGTM

What is the default source BTW?

@ueshin
Member Author

ueshin commented Feb 25, 2023

> What is the default source BTW?

If `format` is not set, the value of the SQL conf `spark.sql.sources.default` is used.
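That precedence can be modelled in a few lines. An explicit `format` wins, otherwise the reader falls back to `spark.sql.sources.default`, whose built-in default in Spark is `parquet`. This is a minimal plain-Python sketch, not Spark's implementation. `ToyReader` is hypothetical, a dict stands in for the session conf, and `load()` only reports which source would be used instead of reading any data.

```py
from typing import Dict, Optional

# Conf key named in the discussion; "parquet" is Spark's built-in default for it.
DEFAULT_SOURCE_KEY = "spark.sql.sources.default"


class ToyReader:
    """Hypothetical stand-in for DataFrameReader: it only resolves a source name."""

    def __init__(self, conf: Dict[str, str]) -> None:
        self._conf = conf
        self._format: Optional[str] = None  # stays None unless format() is called

    def format(self, source: str) -> "ToyReader":
        self._format = source
        return self  # builder style: reader.format("json").load(path)

    def load(self, path: str) -> Dict[str, str]:
        if self._format is not None:
            source = self._format  # an explicit format always wins
        else:
            source = self._conf.get(DEFAULT_SOURCE_KEY, "parquet")  # fall back to the conf
        return {"source": source, "path": path}


conf: Dict[str, str] = {}
print(ToyReader(conf).load("/tmp/data"))  # {'source': 'parquet', 'path': '/tmp/data'}
conf[DEFAULT_SOURCE_KEY] = "json"
print(ToyReader(conf).load("/tmp/data")["source"])  # json
print(ToyReader(conf).format("csv").load("/tmp/data")["source"])  # csv
```

The bug described in this PR corresponds to `load(path)` failing when `_format` is unset; the fallback branch is what makes the format-less call work.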

@hvanhovell
Contributor

Merging.

hvanhovell pushed a commit that referenced this pull request Feb 25, 2023
… source

### What changes were proposed in this pull request?

Fixes `DataFrameReader` to use the default source.

### Why are the changes needed?

```py
spark.read.load(path)
```

should work and use the default source without specifying the format.

### Does this PR introduce _any_ user-facing change?

The `format` doesn't need to be specified.

### How was this patch tested?

Enabled related tests.

Closes #40166 from ueshin/issues/SPARK-42570/reader.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: Herman van Hovell <herman@databricks.com>
(cherry picked from commit ad35f35)
Signed-off-by: Herman van Hovell <herman@databricks.com>
snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023