Add REST Catalog tests to Spark 3.5 integration test #11093

Merged
merged 32 commits into from
Nov 21, 2024

haizhou-zhao
Contributor

For issue: #11079

@haizhou-zhao force-pushed the spark-rest-integ-test branch 2 times, most recently from 9481bd4 to 8bcd853 on September 18, 2024 23:18
@@ -521,7 +524,7 @@ public void testFilesTableTimeTravelWithSchemaEvolution() throws Exception {
optional(3, "category", Types.StringType.get())));

spark.createDataFrame(newRecords, newSparkSchema).coalesce(1).writeTo(tableName).append();

table.refresh();
Contributor

is this refresh only needed for REST?

Contributor Author

Yes. RESTTableOperations and the other TableOperations implementations have different mechanisms for refreshing metadata.
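
To make the stale-handle concern concrete, here is a minimal sketch of why an explicit refresh can be needed after a write that goes through a different path. It is not Iceberg's implementation: `MetadataStore` and `CachedTable` are hypothetical stand-ins, and the only assumption is that a table handle caches the metadata version it last loaded.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins, not Iceberg classes: a shared metadata store and a
// table handle that caches the last version it loaded.
final class RefreshSketch {

  /** Shared source of truth, e.g. what a catalog backend would hold. */
  static final class MetadataStore {
    private final AtomicInteger version = new AtomicInteger(0);

    int currentVersion() {
      return version.get();
    }

    /** Simulates a commit made through some other handle (e.g. a Spark write). */
    int commit() {
      return version.incrementAndGet();
    }
  }

  /** Handle that only sees new commits after refresh() is called. */
  static final class CachedTable {
    private final MetadataStore store;
    private int cachedVersion;

    CachedTable(MetadataStore store) {
      this.store = store;
      this.cachedVersion = store.currentVersion();
    }

    int version() {
      return cachedVersion;
    }

    void refresh() {
      cachedVersion = store.currentVersion();
    }
  }

  public static void main(String[] args) {
    MetadataStore store = new MetadataStore();
    CachedTable table = new CachedTable(store);

    store.commit(); // a write that bypasses this handle
    System.out.println("before refresh: " + table.version());
    table.refresh();
    System.out.println("after refresh: " + table.version());
  }
}
```

Whether a given TableOperations implementation needs the explicit call depends on how and when it reloads metadata, which this sketch does not model.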

// JdbcCatalog, then different jdbc connections could provide different views of table
// status even belonging to the same catalog. Reference:
// https://www.sqlite.org/inmemorydb.html
System.setProperty(CatalogProperties.CLIENT_POOL_SIZE, "1");
Contributor

is this strictly needed to make tests pass? I don't think we set this in any other tests for that specific purpose

Contributor Author

Yes, this is needed for the tests to pass. I ran the tests without this line, and here is the result: https://github.com/apache/iceberg/actions/runs/11297060489/job/31423193928?pr=11093
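
The effect described in the code comment (different connections giving different views of an in-memory database) can be sketched without any real JDBC or SQLite dependency. The simulation below is an illustration only: `FakeConnection` and `FakePool` are hypothetical, and each fake connection owns a private store to mimic separate connections to an in-memory SQLite database not sharing state by default.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simulation only: no real JDBC or SQLite involved. Each FakeConnection owns a
// private "database", so state written on one connection is invisible on another.
final class PoolSizeSketch {

  static final class FakeConnection {
    private final Set<String> tables = new HashSet<>();

    void createTable(String name) {
      tables.add(name);
    }

    boolean tableExists(String name) {
      return tables.contains(name);
    }
  }

  /** Hands out connections round-robin, like a very small client pool. */
  static final class FakePool {
    private final List<FakeConnection> connections = new ArrayList<>();
    private int next = 0;

    FakePool(int size) {
      for (int i = 0; i < size; i++) {
        connections.add(new FakeConnection());
      }
    }

    FakeConnection borrow() {
      FakeConnection conn = connections.get(next);
      next = (next + 1) % connections.size();
      return conn;
    }
  }

  /** Creates a table on one borrowed connection, then looks it up on the next one. */
  static boolean createThenCheck(int poolSize) {
    FakePool pool = new FakePool(poolSize);
    pool.borrow().createTable("db.tbl");
    return pool.borrow().tableExists("db.tbl");
  }

  public static void main(String[] args) {
    System.out.println("pool size 2 sees table: " + createThenCheck(2));
    System.out.println("pool size 1 sees table: " + createThenCheck(1));
  }
}
```

With a pool of one, every operation lands on the same connection, which is the behavior the `CLIENT_POOL_SIZE` setting of "1" is relied on for here.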

System.setProperty(CatalogProperties.CLIENT_POOL_SIZE, "1");
restServer.start(false);
restCatalog = RCKUtils.initCatalogClient();
System.clearProperty("rest.port");
Contributor

@nastra Oct 11, 2024

maybe we should pass a Map<String, String> to the REST server rather than having to set system properties (which then also need to be cleared again). @danielcweeks thoughts on this?
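
A rough sketch of what that suggestion could look like, assuming a server that accepts its settings through the constructor. `FakeRestServer` is hypothetical and is not the RESTCatalogServer API; the `pool.size` key and both default values are made up for illustration, while `rest.port` is the key that appears in the diff above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: FakeRestServer is not the real RESTCatalogServer, and
// "pool.size" plus both default values are made up for illustration.
final class ConfigMapSketch {

  static final class FakeRestServer {
    private final Map<String, String> config;

    // The caller hands over its settings explicitly; a defensive copy keeps the
    // server independent of later changes to the caller's map.
    FakeRestServer(Map<String, String> config) {
      this.config = Collections.unmodifiableMap(new HashMap<>(config));
    }

    int port() {
      return Integer.parseInt(config.getOrDefault("rest.port", "8181"));
    }

    int poolSize() {
      return Integer.parseInt(config.getOrDefault("pool.size", "2"));
    }
  }

  public static void main(String[] args) {
    Map<String, String> config = new HashMap<>();
    config.put("rest.port", "9090");
    config.put("pool.size", "1");

    FakeRestServer server = new FakeRestServer(config);
    System.out.println("port=" + server.port() + ", poolSize=" + server.poolSize());
    // Nothing to clean up afterwards: no System.setProperty or System.clearProperty calls.
  }
}
```

Because the configuration is scoped to the server instance, tests running in the same JVM cannot leak settings into each other through global state.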

@BeforeEach
public void before() {
this.validationCatalog =
Contributor

what happens if we don't change how the validation catalog is configured?

Contributor Author

You would end up with something like creating a table in the REST catalog while validating against the Hive catalog.

Contributor

my point is that I think all tests should pass even when we don't make any changes to the validation catalog

Contributor Author

If I understand correctly, the original validationCatalog is set to a HadoopCatalog when the catalogName of the catalog being tested is testhadoop; otherwise, validationCatalog is the same as catalog (which is strictly a HiveCatalog, as defined by the TestBase class).

That worked in the past, when only two types of catalog, Hadoop and Hive, were being tested: if the test subject was not a Hadoop catalog, setting the validation catalog to the Hive catalog was sufficient for validation. With the introduction of a third type, the REST catalog, this no longer holds: when the test subject is a REST catalog and the initialization of validationCatalog is left unchanged, validationCatalog will be set to a Hive catalog. In that case, running the test operations against the REST catalog while validating the resulting state against the Hive catalog won't work.

Unless you are suggesting that we change the TestBase class so that the catalog being tested does not strictly need to be a HiveCatalog and can be any type of catalog.
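
The selection rule described above can be sketched as follows. This is a simplification under stated assumptions: `CatalogType` and both methods are hypothetical, and the real test base classes key the decision on the catalog name rather than on an enum.

```java
// Hypothetical types for illustration; the real test base classes differ.
final class ValidationCatalogSketch {

  enum CatalogType { HADOOP, HIVE, REST }

  /** Old rule: Hadoop validates against Hadoop, everything else against Hive. */
  static CatalogType legacyValidationCatalog(CatalogType underTest) {
    return underTest == CatalogType.HADOOP ? CatalogType.HADOOP : CatalogType.HIVE;
  }

  /** Rule with REST support: validate against the same kind of catalog under test. */
  static CatalogType validationCatalog(CatalogType underTest) {
    switch (underTest) {
      case HADOOP:
        return CatalogType.HADOOP;
      case REST:
        return CatalogType.REST;
      default:
        return CatalogType.HIVE;
    }
  }

  public static void main(String[] args) {
    for (CatalogType type : CatalogType.values()) {
      System.out.println(
          type + ": legacy=" + legacyValidationCatalog(type) + ", new=" + validationCatalog(type));
    }
  }
}
```

Under the old rule a REST catalog under test is paired with a Hive validation catalog, which is the mismatch described in the comment; the two rules agree for Hadoop and Hive.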

- Map<String, String> catalogProperties = RCKUtils.environmentCatalogConfig();
+ Map<String, String> catalogProperties = Maps.newHashMap();
+ catalogProperties.putAll(RCKUtils.environmentCatalogConfig());
+ catalogProperties.putAll(Maps.fromProperties(System.getProperties()));
Contributor

as mentioned in a comment further above, maybe we should consider passing a Map<String, String> to RESTCatalogServer rather than relying on system properties (which have to be cleared after they were configured)

Contributor Author

Done.

Contributor

that means we can revert this line here: catalogProperties.putAll(Maps.fromProperties(System.getProperties()));

Contributor Author

Reverted.

// JdbcCatalog, then different jdbc connections could provide different views of table
// status even belonging to the same catalog. Reference:
// https://www.sqlite.org/inmemorydb.html
System.setProperty(CatalogProperties.CLIENT_POOL_SIZE, "1");
Contributor Author

This conversation got lost (not sure why): #11093 (comment)

System.setProperty(CatalogProperties.CLIENT_POOL_SIZE, "1");
restServer.start(false);
restCatalog = RCKUtils.initCatalogClient();
System.clearProperty("rest.port");
Contributor Author

This one was also lost: #11093 (comment)

Contributor

As I mentioned in #11093 (comment), I think we should do this differently rather than setting and re-setting system properties.

Contributor Author

Done

Contributor

c-thiel commented Nov 21, 2024

@nastra, @danielcweeks is there something stopping us from merging this?
This is currently blocking us on #11317 which is a requirement for REST Catalogs to offer undrop for storages that are not S3.

Contributor

@danielcweeks left a comment

Thanks @haizhou-zhao for getting this done!

@danielcweeks danielcweeks merged commit a52afdc into apache:main Nov 21, 2024
32 checks passed
zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024
* Add REST Catalog tests to Spark 3.5 integration test

Add REST Catalog tests to Spark 3.4 integration test

tmp save

Fix integ tests

Revert "Add REST Catalog tests to Spark 3.4 integration test"

This reverts commit d052416.

unneeded changes

fix test

retrigger checks

Fix integ test

Fix port already in use

Fix unmatched validation catalog

spotless

Fix sqlite related test failures

* Rebase & spotless

* code format

* unneeded change

* unneeded change

* Revert "unneeded change"

This reverts commit ae29c41.

* code format

* Use in-mem config to configure RCK

* Update open-api/src/testFixtures/java/org/apache/iceberg/rest/RESTCatalogServer.java

* Use RESTServerExtension

* check style and test failure

* test failure

* fix test

* fix test

* spotless

* Update open-api/src/testFixtures/java/org/apache/iceberg/rest/RESTCatalogServer.java

Co-authored-by: Eduard Tudenhoefner <etudenhoefner@gmail.com>

* Update open-api/src/testFixtures/java/org/apache/iceberg/rest/RESTCatalogServer.java

Co-authored-by: Eduard Tudenhoefner <etudenhoefner@gmail.com>

* Update spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/TestBaseWithCatalog.java

Co-authored-by: Eduard Tudenhoefner <etudenhoefner@gmail.com>

* Spotless and fix test

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

* Update spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/TestBaseWithCatalog.java

* Package protected RCKUtils

* spotless

* unintentional change

* remove warehouse specification from rest

* spotless

* move find free port to rest server extension

* fix typo

* checkstyle

* fix unit test

---------

Co-authored-by: Haizhou Zhao <haizhouzhao@Haizhous-MacBook-Pro.local>
Co-authored-by: Eduard Tudenhoefner <etudenhoefner@gmail.com>
4 participants