
Merging fixes from version 8.5.0 upstream #11

Merged
c3-avidmych merged 2 commits into merge-8.5.0-upstream from server-version-8.5.0
Nov 12, 2024

Conversation

@c3-avidmych

Merging commits from upstream.

c3-avidmych self-assigned this Nov 12, 2024
c3-avidmych merged commit b9cacea into merge-8.5.0-upstream Nov 12, 2024
c3-ChenyangZhang pushed a commit that referenced this pull request Jan 20, 2026
…plan properly

### What changes were proposed in this pull request?
Make `ResolveRelations` handle plan id properly

Cherry-pick of bugfix apache#45214 to 3.5.

### Why are the changes needed?
Bug fix for Spark Connect; it won't affect classic Spark SQL.

Before this PR:
```python
from pyspark.sql import functions as sf

# Create two test tables.
spark.range(10).withColumn("value_1", sf.lit(1)).write.saveAsTable("test_table_1")
spark.range(10).withColumnRenamed("id", "index").withColumn("value_2", sf.lit(2)).write.saveAsTable("test_table_2")

# test_table_1 is read twice, so df1 and df3 carry distinct plan ids.
df1 = spark.read.table("test_table_1")
df2 = spark.read.table("test_table_2")
df3 = spark.read.table("test_table_1")

join1 = df1.join(df2, on=df1.id == df2.index).select(df2.index, df2.value_2)
join2 = df3.join(join1, how="left", on=join1.index == df3.id)

join2.schema
```

fails with
```
AnalysisException: [CANNOT_RESOLVE_DATAFRAME_COLUMN] Cannot resolve dataframe column "id". It's probably because of illegal references like `df1.select(df2.col("a"))`. SQLSTATE: 42704
```

This is because the existing plan caching in `ResolveRelations` doesn't work with Spark Connect:

```
=== Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations ===
 '[#12]Join LeftOuter, '`==`('index, 'id)                     '[#12]Join LeftOuter, '`==`('index, 'id)
!:- '[#9]UnresolvedRelation [test_table_1], [], false         :- '[#9]SubqueryAlias spark_catalog.default.test_table_1
!+- '[#11]Project ['index, 'value_2]                          :  +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_1`, [], false
!   +- '[#10]Join Inner, '`==`('id, 'index)                   +- '[#11]Project ['index, 'value_2]
!      :- '[#7]UnresolvedRelation [test_table_1], [], false      +- '[#10]Join Inner, '`==`('id, 'index)
!      +- '[#8]UnresolvedRelation [test_table_2], [], false         :- '[#9]SubqueryAlias spark_catalog.default.test_table_1
!                                                                   :  +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_1`, [], false
!                                                                   +- '[#8]SubqueryAlias spark_catalog.default.test_table_2
!                                                                      +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_2`, [], false

Can not resolve 'id with plan 7
```

`[#7]UnresolvedRelation [test_table_1], [], false` was wrongly resolved to the cached plan:
```
:- '[#9]SubqueryAlias spark_catalog.default.test_table_1
   +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_1`, [], false
```
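
To make the failure mode concrete, here is a toy sketch (illustrative only, not Spark's actual analyzer internals): if resolved plans are cached by table name alone, every later occurrence of `test_table_1` is served the entry tagged with the first occurrence's plan id, so column references tagged with the later plan id dangle. Conceptually, the fix is to reuse the cached plan but re-tag it with each occurrence's own plan id.

```python
# Toy model of relation resolution with a name-keyed cache; `plan_id`
# stands in for the tag Spark Connect attaches to each DataFrame.
cache = {}

def resolve_buggy(table, plan_id):
    # Bug: the first resolution wins, and its plan id sticks to every
    # later occurrence of the same table.
    return cache.setdefault(table, {"table": table, "plan_id": plan_id})

def resolve_fixed(table, plan_id):
    # Conceptual fix: reuse the cached plan, but re-tag it with the plan id
    # of the occurrence currently being resolved.
    plan = dict(cache.setdefault(table, {"table": table, "plan_id": plan_id}))
    plan["plan_id"] = plan_id
    return plan

print(resolve_buggy("test_table_1", 9)["plan_id"])  # 9
print(resolve_buggy("test_table_1", 7)["plan_id"])  # 9 -- stale: plan 7's columns dangle
print(resolve_fixed("test_table_1", 7)["plan_id"])  # 7 -- each occurrence keeps its own id
```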

### Does this PR introduce _any_ user-facing change?
Yes, this is a bug fix.

### How was this patch tested?
Added a unit test.
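
A hedged sketch of what such a regression test might look like, wrapping the repro above in a pytest-style function (the `spark` fixture and the test name are illustrative, not the actual test added by this patch):

```python
from pyspark.sql import functions as sf

def test_read_table_self_join_with_plan_ids(spark):
    # Two reads of the same table yield DataFrames with distinct plan ids.
    spark.range(10).withColumn("value_1", sf.lit(1)).write.saveAsTable("test_table_1")
    spark.range(10).withColumnRenamed("id", "index").withColumn(
        "value_2", sf.lit(2)
    ).write.saveAsTable("test_table_2")

    df1 = spark.read.table("test_table_1")
    df2 = spark.read.table("test_table_2")
    df3 = spark.read.table("test_table_1")

    join1 = df1.join(df2, on=df1.id == df2.index).select(df2.index, df2.value_2)
    join2 = df3.join(join1, how="left", on=join1.index == df3.id)

    # Before the fix, touching the schema raised AnalysisException
    # with CANNOT_RESOLVE_DATAFRAME_COLUMN.
    assert "index" in join2.schema.fieldNames()
```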

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#46291 from zhengruifeng/connect_fix_read_join_35.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
c3-ChenyangZhang pushed a commit that referenced this pull request Jan 20, 2026
…erfile` building

### What changes were proposed in this pull request?

This PR aims to add `libwebp-dev` to recover `spark-rm/Dockerfile` building.

### Why are the changes needed?

The `Apache Spark` release Docker image build has been broken for the last 7 days due to SparkR package compilation failures.
- https://github.com/apache/spark/actions/workflows/release.yml
    - https://github.com/apache/spark/actions/runs/17425825244

```
#11 559.4 No package 'libwebpmux' found
...
#11 559.4 -------------------------- [ERROR MESSAGE] ---------------------------
#11 559.4 <stdin>:1:10: fatal error: ft2build.h: No such file or directory
#11 559.4 compilation terminated.
#11 559.4 --------------------------------------------------------------------
#11 559.4 ERROR: configuration failed for package 'ragg'
```

### Does this PR introduce _any_ user-facing change?

No, this is a fix for the Apache Spark release tool.

### How was this patch tested?

Manual build:

```
$ cd dev/create-release/spark-rm
$ docker build .
```

**BEFORE**

```
...
Dockerfile:83
--------------------
  82 |     # See more in SPARK-39959, roxygen2 < 7.2.1
  83 | >>> RUN Rscript -e "install.packages(c('devtools', 'knitr', 'markdown',  \
  84 | >>>     'rmarkdown', 'testthat', 'devtools', 'e1071', 'survival', 'arrow',  \
  85 | >>>     'ggplot2', 'mvtnorm', 'statmod', 'xml2'), repos='https://cloud.r-project.org/')" && \
  86 | >>>     Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='https://cloud.r-project.org')" && \
  87 | >>>     Rscript -e "devtools::install_version('lintr', version='2.0.1', repos='https://cloud.r-project.org')" && \
  88 | >>>     Rscript -e "devtools::install_version('pkgdown', version='2.0.1', repos='https://cloud.r-project.org')" && \
  89 | >>>     Rscript -e "devtools::install_version('preferably', version='0.4', repos='https://cloud.r-project.org')"
  90 |
--------------------
ERROR: failed to build: failed to solve:
```

**AFTER**
```
...
 => [ 6/22] RUN add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/'                                                             3.8s
 => [ 7/22] RUN Rscript -e "install.packages(c('devtools', 'knitr', 'markdown',      'rmarkdown', 'testthat', 'devtools', 'e1071', 'survival', 'arrow',       892.2s
 => [ 8/22] RUN add-apt-repository ppa:pypy/ppa                                                                                                                15.3s
...
```

After merging this PR, we can validate via the daily release dry-run CI.

- https://github.com/apache/spark/actions/workflows/release.yml

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#52340 from peter-toth/SPARK-53539-3.5.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>