Spark 4.1: Add tests for MERGE INTO schema evolution nested case #15028
Conversation
PTAL @huaxingao @singhpk234 @szehon-ho. I went through the discussions in the Spark PRs related to this feature.
szehon-ho
left a comment
Yea, was going to do this, you beat me to it :)
The only missing part here is that there is a flag that needs to be set (as it's disabled by default in Spark 4.1.x). We will probably turn it on in a later version if there are no complaints about the behavior.
Force-pushed 42a9fe1 to acd524a
@szehon-ho Thanks for the original features and tests. This new test is a small tweak based on your existing one.
Please let me know if any further changes are needed. @huaxingao @singhpk234
PTAL @huaxingao @singhpk234
szehon-ho
left a comment
Looks mostly fine, but I wonder if we should hold the PR until 4.1.1 is supported in Iceberg.
```java
public void testMergeWithSchemaEvolutionNestedStructSourceHasFewerFields() {
  assumeThat(branch).as("Schema evolution does not work for branches currently").isNull();

  spark.conf().set("spark.sql.mergeNestedTypeCoercion.enabled", "true");
```
This should be in a try/catch. Do we have any utility in the Iceberg Spark test cases to wrap code in Spark configs?
Yes, correct. There is withSQLConf at spark/v4.1/spark/src/test/java/org/apache/iceberg/spark/TestBase.java::186, which applies the Spark configs needed for an action and handles the try block in its implementation.
I took reference from testMergeToWapBranch() at spark/v4.1/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMerge.java::2914, which implements a similar pattern.
Ready for review.
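For readers outside the Iceberg codebase, the set-run-restore shape that withSQLConf provides can be sketched in plain Java. This is an illustrative, Spark-free sketch of the pattern, not the actual TestBase API: it overrides config values, runs the action, and restores the previous values in a finally block even if the action throws.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical standalone sketch of the try/finally pattern that a helper
// like withSQLConf implements: apply overrides, run the action, then restore
// the previous values even when the action throws.
public class WithConfSketch {
  static final Map<String, String> conf = new HashMap<>();

  static void withConf(Map<String, String> overrides, Runnable action) {
    Map<String, String> previous = new HashMap<>();
    overrides.forEach((k, v) -> {
      previous.put(k, conf.get(k)); // value may be null if the key was unset
      conf.put(k, v);
    });
    try {
      action.run();
    } finally {
      // restore: remove keys that had no prior value, reset the rest
      previous.forEach((k, v) -> {
        if (v == null) {
          conf.remove(k);
        } else {
          conf.put(k, v);
        }
      });
    }
  }

  public static void main(String[] args) {
    withConf(
        Map.of("spark.sql.mergeNestedTypeCoercion.enabled", "true"),
        () -> System.out.println(conf.get("spark.sql.mergeNestedTypeCoercion.enabled")));
    // after the block, the override is gone again
    System.out.println(conf.containsKey("spark.sql.mergeNestedTypeCoercion.enabled"));
  }
}
```

The point of the pattern is that a failing assertion inside the action cannot leak the config override into later tests.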
Sorry, just re-read and see the test is already running against 4.1.1, that's great. Only one comment.
singhpk234
left a comment
LGTM as well, thanks @varun-lakhyani!
```java
        + "{ \"id\": 3, \"s\": { \"c1\": 30, \"c2\": \"c\" } }");

    // Rows should have null for missing c3 nested field from source
    ImmutableList<Object[]> expectedRows =
```
Nit, can we move this block inside the with? It's a bit disorienting to see the expectation before the merge :)
iceberg/spark/v4.1/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMerge.java, lines 2914 to 2946 in 4f57687:

```java
public void testMergeToWapBranch() {
  assumeThat(branch).as("WAP branch only works for table identifier without branch").isNull();
  createAndInitTable("id INT", "{\"id\": -1}");
  ImmutableList<Object[]> originalRows = ImmutableList.of(row(-1));
  sql(
      "ALTER TABLE %s SET TBLPROPERTIES ('%s' = 'true')",
      tableName, TableProperties.WRITE_AUDIT_PUBLISH_ENABLED);
  spark.range(0, 5).coalesce(1).createOrReplaceTempView("source");
  ImmutableList<Object[]> expectedRows =
      ImmutableList.of(row(-1), row(0), row(1), row(2), row(3), row(4));
  withSQLConf(
      ImmutableMap.of(SparkSQLProperties.WAP_BRANCH, "wap"),
      () -> {
        sql(
            "MERGE INTO %s t USING source s ON t.id = s.id "
                + "WHEN MATCHED THEN UPDATE SET *"
                + "WHEN NOT MATCHED THEN INSERT *",
            tableName);
        assertEquals(
            "Should have expected rows when reading table",
            expectedRows,
            sql("SELECT * FROM %s ORDER BY id", tableName));
        assertEquals(
            "Should have expected rows when reading WAP branch",
            expectedRows,
            sql("SELECT * FROM %s.branch_wap ORDER BY id", tableName));
        assertEquals(
            "Should not modify main branch",
            originalRows,
            sql("SELECT * FROM %s.branch_main ORDER BY id", tableName));
      });
```
I felt the same way, but found a lot of places where this pattern was followed, like the one referenced here.
Should I push the change with expectedRows inside withSQLConf, after the MERGE command?
I see, thanks for checking. Well, I did it all this way in the other tests in this file, and we both agree, so I'd say let's just do it?
Done. Updated in the latest commit.
PTAL @szehon-ho.
Merged, thanks @varun-lakhyani, and all for reviews.
Thanks @huaxingao @singhpk234 @szehon-ho for the reviews and merging.
Extends tests for PR #14970
This PR adds a test for MERGE INTO schema evolution in the nested case where the source has fewer fields than the target. By default, the fields missing from the source are kept intact in the target for matched rows and set to null for newly inserted source rows.
Spark updates can be found at https://issues.apache.org/jira/browse/SPARK-54274
This specific feature: https://issues.apache.org/jira/browse/SPARK-54621
It also renames an existing nested test to match the other test names so it can be better understood
(testMergeWithSchemaEvolutionNestedStruct() -> testMergeWithSchemaEvolutionNestedStructSourceHasMoreFields())
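As a rough, Spark-free illustration of the semantics the new test asserts, the sketch below models rows as maps keyed by id, with nested structs as inner maps. All names and the merge helper here are hypothetical, purely for illustration: the target struct has fields c1, c2, c3 while the source struct only has c1 and c2, so matched rows keep the target's c3 and rows inserted from the source get null for c3.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch (not Iceberg/Spark code) of MERGE INTO nested schema
// evolution when the source struct has fewer fields than the target struct:
// the target-only field c3 survives on matched rows and is null on new rows.
public class NestedMergeSketch {

  static Map<Integer, Map<String, Object>> merge(
      Map<Integer, Map<String, Object>> target, Map<Integer, Map<String, Object>> source) {
    Map<Integer, Map<String, Object>> result = new TreeMap<>(target);
    source.forEach((id, srcStruct) -> {
      // start from the existing target struct when the row matches
      Map<String, Object> merged = new HashMap<>(result.getOrDefault(id, new HashMap<>()));
      merged.putAll(srcStruct);        // fields present in the source are updated
      merged.putIfAbsent("c3", null);  // field missing from the source: null for new rows
      result.put(id, merged);
    });
    return result;
  }

  public static void main(String[] args) {
    Map<Integer, Map<String, Object>> target = new TreeMap<>();
    Map<String, Object> t1 = new HashMap<>();
    t1.put("c1", 10);
    t1.put("c2", "a");
    t1.put("c3", "keep-me"); // target-only nested field
    target.put(1, t1);

    Map<Integer, Map<String, Object>> source = new TreeMap<>();
    Map<String, Object> s1 = new HashMap<>();
    s1.put("c1", 11);
    s1.put("c2", "a2"); // no c3 in the source schema
    source.put(1, s1);
    Map<String, Object> s3 = new HashMap<>();
    s3.put("c1", 30);
    s3.put("c2", "c");
    source.put(3, s3);

    Map<Integer, Map<String, Object>> merged = merge(target, source);
    System.out.println(merged.get(1).get("c3")); // prints: keep-me (matched row keeps target c3)
    System.out.println(merged.get(3).get("c3")); // prints: null (inserted row gets null c3)
  }
}
```

This mirrors the test's expectation that the MERGE leaves existing c3 values intact for matched rows while filling null for rows that only exist in the source.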