Spark: Fix CREATE VIEW IF NOT EXISTS failure when non-Iceberg view exists in SparkSessionCatalog #14930
Conversation
@huaxingao The previous PR was automatically closed due to a force push, so I've opened a new one.
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;

public class TestSparkSessionCatalogWithExtensions {
I would suggest first fixing the issue in one Spark version and backporting it later
+1 to first fix 4.1 and then back-porting
  }
}

public static void setUpCatalog() {
nit: private?
spark.conf().set("spark.sql.catalog.spark_catalog.type", "hive");
}

public static void resetSparkCatalog() {
nit: private?
protected static TestHiveMetastore metastore = null;
protected static HiveConf hiveConf = null;
protected static SparkSession spark = null;
protected static JavaSparkContext sparkContext = null;
is this necessary? If not, can we remove?
spark
    .conf()
    .set("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkSessionCatalog");
spark.conf().set("spark.sql.catalog.spark_catalog.type", "hive");
Should we add spark.sessionState().catalogManager().reset() when flipping these configs (either inside the helper methods or immediately after calling them in the tests), similar to how spark/v4.1/spark/src/test/java/org/apache/iceberg/spark/TestSparkSessionCatalog.java does it?
@nastra @huaxingao Thanks for the review and the suggestion. I've updated the commit. Appreciate your feedback!
@stuxuhai thanks for submitting the PR. We might need to revisit the behavior of views in the `SparkSessionCatalog` more broadly. Basically, right now we either have the actual Iceberg catalog or the underlying Spark session catalog implementing Spark's `ViewCatalog` API, and the delegation between the two is where these inconsistencies come from. I understand that using a session catalog that mixes Iceberg and non-Iceberg views is surprising in its current form.
}

@AfterEach
public void useHiveCatalog() {
I believe this is then going to use Spark's HiveSessionCatalog right? We might want to be clearer here as otherwise one would expect that we're using Iceberg's HiveCatalog
try {
  // create Hive view
  spark.sql(String.format("CREATE VIEW %s AS SELECT 1 AS id", viewName));
} finally {
I don't think we need any of those try-finally blocks
}

try {
  spark.sql(String.format("CREATE VIEW IF NOT EXISTS %s AS SELECT 2 AS id", viewName));
instead of using `spark.sql(...)` you can directly use `sql(...)`
}

@TestTemplate
public void testCreateViewWithExistingHiveView() {
nit: no need to use test as a prefix in the method names as it doesn't add any value and we try to avoid using that prefix for new tests
}

@TestTemplate
public void testCreateViewIfNotExistsWithExistingHiveView() {
can you please also add the same set of tests for tables, where we have a v1 hive table and we want to create another one using/not using IF NOT EXISTS
this is so that it's easier to check how tables/views behave exactly and to align their behavior
@nastra Thanks a lot for the thoughtful feedback and for taking the time to review this. Based on our testing, applying this change should not affect existing behavior. For example, with the following sequence:

-- create hive view
create view test_hive_view as select 1 as id, 'hive_view' as name;
-- use SparkSessionCatalog to create iceberg view
create view test_iceberg_view as select 2 as id, 'iceberg_view' as name;
-- ERROR: org.apache.iceberg.exceptions.NoSuchIcebergViewException: Not an iceberg view
create view if not exists test_hive_view as select 1 as id, 'iceberg' as name;
-- ERROR: [VIEW_NOT_FOUND] The view test_hive_view cannot be found.
create or replace view test_hive_view as select 2 as id, 'create or replace by iceberg' as name;
-- ERROR: [VIEW_NOT_FOUND] The view test_hive_view cannot be found.
drop view test_hive_view;
-- Succeeds, but actually queries the Hive view instead of the Iceberg view
select * from test_iceberg_view;
-- Succeeds, but should fail with WRONG_COMMAND_FOR_OBJECT_TYPE
drop table test_iceberg_view;
-- ERROR: [VIEW_NOT_FOUND] The view test_iceberg_table cannot be found (should be WRONG_COMMAND_FOR_OBJECT_TYPE)
drop view test_iceberg_table;
-- Only drops metadata but does not delete data unless PURGE is specified, which is also different in behavior
drop table test_hive_managed_table;

From a user perspective, these behaviors are quite surprising and make it difficult to reason about how `SparkSessionCatalog` handles views. My understanding is that the underlying session catalog should be treated as the source of truth for the global namespace, but at the moment `SparkSessionCatalog` delegates view operations to the Iceberg catalog without consulting it first. For the specific `CREATE VIEW IF NOT EXISTS` failure, this PR restores the expected semantics by checking the session catalog before delegating. Thanks again for the review and for the discussion, I really appreciate it.
I think it would be great to summarize all the gaps that lead to weird/inconsistent behavior, e.g. by writing them down as tests (which would currently fail). Right now I'm missing an overview to make a proper decision on how we want to proceed, since we really want to fix this issue in a consistent manner. I wouldn't want to fix this only in one place, knowing that the same issue exists in a different place as well.
Thanks for the suggestion, that makes a lot of sense. I agree that having a comprehensive view of all the gaps leading to these inconsistent behaviors would be very helpful, and that addressing them in a consistent way is the right direction. I can summarize the observed issues and also try to capture them in a set of tests that document the current behavior (and would currently fail). That should give us a clearer picture to evaluate different approaches going forward. I'll follow up with an update once I have that ready. Thanks again for the guidance.
This is a follow-up PR. The previous PR was closed after the branch was force-reset to `apache:main`.

### Purpose

This PR fixes a bug where `CREATE VIEW IF NOT EXISTS` fails with a `NoSuchIcebergViewException: Not an iceberg view` (wrapped in a `QueryExecutionException`) instead of succeeding silently when a non-Iceberg view (e.g., a Hive view) already exists in the `SparkSessionCatalog`.

### The Problem

When `SparkSessionCatalog` is configured with

- `spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions`
- `spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog`
- `spark.sql.catalog.spark_catalog.type=hive`

and a user runs `CREATE VIEW IF NOT EXISTS db.view_name AS ...` while `db.view_name` already exists as a Hive view (or any non-Iceberg table/view):

1. `SparkSessionCatalog.createView` currently delegates directly to the underlying Iceberg catalog (as `ViewCatalog.createView`).
2. The Iceberg catalog finds no Iceberg view under that identifier and throws `NoSuchIcebergViewException`.
3. Spark's analysis rules expect a `ViewAlreadyExistsException` to handle the `IF NOT EXISTS` logic. Because they receive a different exception, the query fails entirely.
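The exception-type mismatch can be illustrated in isolation. The sketch below uses minimal stand-in classes (not the real Spark or Iceberg types) to show why `IF NOT EXISTS` handling that only catches `ViewAlreadyExistsException` lets any other exception fail the whole query:

```java
public class IfNotExistsMismatch {
  // Stand-ins for the real exception types.
  static class ViewAlreadyExistsException extends RuntimeException {}

  static class NoSuchIcebergViewException extends RuntimeException {
    NoSuchIcebergViewException(String msg) {
      super(msg);
    }
  }

  // Stand-in for Spark's IF NOT EXISTS handling: only the expected
  // exception type is treated as "view already there, do nothing";
  // anything else propagates and fails the query.
  static String createViewIfNotExists(Runnable createView) {
    try {
      createView.run();
      return "created";
    } catch (ViewAlreadyExistsException e) {
      return "ignored (already exists)";
    }
  }

  public static void main(String[] args) {
    System.out.println(createViewIfNotExists(() -> {}));
    System.out.println(
        createViewIfNotExists(
            () -> {
              throw new ViewAlreadyExistsException();
            }));
    try {
      createViewIfNotExists(
          () -> {
            throw new NoSuchIcebergViewException("Not an iceberg view");
          });
    } catch (NoSuchIcebergViewException e) {
      System.out.println("query failed: " + e.getMessage());
    }
  }
}
```

Running this prints `created`, `ignored (already exists)`, and then `query failed: Not an iceberg view`, mirroring the reported failure mode.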
### The Fix

Before delegating the creation to the Iceberg catalog, we explicitly check whether the identifier already exists in the underlying session catalog (which is the source of truth for the global namespace).

If `getSessionCatalog().tableExists(ident)` returns true, we immediately throw `ViewAlreadyExistsException`. This allows Spark's analysis rules to catch the exception and skip the operation, as per `IF NOT EXISTS` semantics.
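As a rough illustration of the guard, the sketch below reduces `SessionCatalog` and the exception type to minimal stand-ins (not the actual Spark/Iceberg classes) and checks the session catalog before delegating:

```java
import java.util.HashSet;
import java.util.Set;

public class CreateViewGuard {
  // Stand-in for the underlying Spark session catalog.
  interface SessionCatalog {
    boolean tableExists(String ident);
  }

  static class ViewAlreadyExistsException extends RuntimeException {
    ViewAlreadyExistsException(String ident) {
      super("View already exists: " + ident);
    }
  }

  // Sketch of the fix: consult the session catalog (the source of truth
  // for the global namespace) before delegating to the Iceberg catalog.
  static String createView(SessionCatalog sessionCatalog, String ident, Set<String> icebergViews) {
    if (sessionCatalog.tableExists(ident)) {
      throw new ViewAlreadyExistsException(ident);
    }
    icebergViews.add(ident); // simplified stand-in for the Iceberg catalog delegation
    return "created " + ident;
  }

  public static void main(String[] args) {
    Set<String> icebergViews = new HashSet<>();
    SessionCatalog session = ident -> ident.equals("db.hive_view"); // a Hive view already exists
    System.out.println(createView(session, "db.new_view", icebergViews));
    try {
      createView(session, "db.hive_view", icebergViews);
    } catch (ViewAlreadyExistsException e) {
      // Under IF NOT EXISTS, Spark catches this exception type and no-ops.
      System.out.println("caught: " + e.getMessage());
    }
  }
}
```

With the check in place, Spark's `IF NOT EXISTS` path sees the exception type it knows how to swallow, so the statement becomes a no-op instead of an error.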
### Verification

- Added a test to `TestSparkSessionCatalog` verifying that `CREATE VIEW IF NOT EXISTS` succeeds when a Hive view already exists.
- Verified that `CREATE VIEW` (without `IF NOT EXISTS`) correctly throws `AnalysisException` (Table or view already exists).