Implement UPDATE for the Iceberg connector #12026
Conversation
Force-pushed 2ccef25 to ef32f36
Force-pushed 8136bb5 to 07ce007
Force-pushed 1e115b6 to 0603fdd
Resolved (outdated): plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/file/FileHiveMetastore.java (8 threads)
Resolved (outdated): plugin/trino-hive/src/test/java/io/trino/plugin/hive/metastore/CountingAccessHiveMetastore.java
Resolved (outdated): plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/HiveMetastore.java
Force-pushed 0603fdd to e4f0fad
Resolved (outdated): plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/file/FileHiveMetastore.java
@findepi AC thanks
Force-pushed e4f0fad to eb1ca9c
    }
    released = true;
    ReentrantLock lock = tableLocks.get(tableName);
    // Currently, metastore locks are always acquired and released in the same thread.
I don't get this, if they are always acquired and released in the same thread what is the point of using locks instead of booleans?
Multiple threads can request the lock but whichever thread acquires it, must be the one to release it because of how ReentrantLock works. If the lock was acquired for the entire query, instead of just the commit block, it'd be harder to ensure that.
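The pattern described above can be sketched as a standalone class (hypothetical class and method names, not the actual FileHiveMetastore code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: any thread may request a table's lock, but ReentrantLock
// requires the acquiring thread to be the releasing one, which is why the lock
// is scoped to the commit block rather than held for the whole query.
class TableLocks
{
    private final Map<String, ReentrantLock> tableLocks = new ConcurrentHashMap<>();

    public void runCommitExclusively(String tableName, Runnable commit)
    {
        ReentrantLock lock = tableLocks.computeIfAbsent(tableName, name -> new ReentrantLock());
        lock.lock(); // whichever thread wins here must also be the one to unlock
        try {
            commit.run();
        }
        finally {
            lock.unlock(); // must happen on the acquiring thread
        }
    }
}
```

Keeping the critical section this narrow makes the same-thread acquire/release guarantee trivial to uphold.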
Resolved (outdated): plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/file/FileHiveMetastore.java (2 threads)
    // Currently, metastore locks are always acquired and released in the same thread.
    lock.unlock();
    if (!lock.isLocked()) {
        tableLocks.remove(tableName);
Use remove(tableName, lock) to ensure you remove the exact lock you intend to.
More importantly, I don't think we can base our logic on lock.isLocked(). There can be a thread doing acquire which pulled a lock from the map but hasn't called .lock() on it yet; it would not be correct to remove that lock from the map.
I think we can do one of:
- replace the Lock with some other primitive (like a boolean locked flag) and implement locking based on the fact that the class is synchronized
- implement locking with a CountDownLatch or Semaphore to allow lock cleanup
However, the more I think about this, the more I feel we shouldn't do it here; it's just distracting from the main PR. Let's remove the removal from the map and add a TODO comment here. I don't think such a leak can be a problem in tests, and for more sustained use this will need to be revisited.
// There is a memory leak. TODO: remove unused locks from tableLocks.
// Currently, there is no cleanup, as this is used for testing purposes where the whole metastore instance
// is not long-lived.
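The Semaphore alternative suggested above could look roughly like this (hypothetical names; unlike ReentrantLock, a Semaphore permit may be released by any thread, which is what would make a cleanup strategy feasible — cleanup itself is deliberately left unimplemented in this sketch):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Hypothetical sketch of per-table locking built on Semaphore(1) instead of
// ReentrantLock. Cleanup of unused entries is intentionally not attempted here.
class SemaphoreTableLocks
{
    private final Map<String, Semaphore> tableLocks = new ConcurrentHashMap<>();

    public void lock(String tableName)
            throws InterruptedException
    {
        tableLocks.computeIfAbsent(tableName, name -> new Semaphore(1)).acquire();
    }

    public boolean tryLock(String tableName)
    {
        return tableLocks.computeIfAbsent(tableName, name -> new Semaphore(1)).tryAcquire();
    }

    public void unlock(String tableName)
    {
        // any thread may release the permit, not only the acquirer
        tableLocks.get(tableName).release();
    }
}
```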
bump?
I removed the isLocked call and left a TODO to clean up the locks.
@@ -169,7 +169,8 @@ public void testV2TableWithEqualityDelete()
     Table icebergTable = updateTableToV2(tableName);
     writeEqualityDeleteToNationTable(icebergTable);
     assertQuery("SELECT * FROM " + tableName, "SELECT * FROM nation WHERE regionkey != 1");
-    assertQuery("SELECT nationkey FROM " + tableName, "SELECT nationkey FROM nation WHERE regionkey != 1");
why removed?
The line I added below is the same query with an extra column
@@ -169,7 +169,8 @@ public void testV2TableWithEqualityDelete()
     Table icebergTable = updateTableToV2(tableName);
     writeEqualityDeleteToNationTable(icebergTable);
     assertQuery("SELECT * FROM " + tableName, "SELECT * FROM nation WHERE regionkey != 1");
-    assertQuery("SELECT nationkey FROM " + tableName, "SELECT nationkey FROM nation WHERE regionkey != 1");
+    // nationkey is before the equality delete column in the table schema, comment is after
+    assertQuery("SELECT nationkey, comment FROM " + tableName, "SELECT nationkey, comment FROM nation WHERE regionkey != 1");
Does this exercise both bugs mentioned in the commit message?
BTW can you come up with a better message than "Fix Iceberg delete filtering bugs"?
Maybe even splitting the commit into two, so that you can call out each fix separately?
It's kinda hard to split up because I pretty much had to re-do everything I had in TrinoDeleteFilter. I'll try to rephrase this.
Thinking about it some more, there was only really one bug in the existing code, having to do with equality deletes. I just also had to change how the interaction with deref pushdown worked in order to fix that bug.
@@ -1247,32 +1305,55 @@ public void finishDelete(ConnectorSession session, ConnectorTableHandle tableHan
         rowDelta.validateNoConflictingDataFiles();
     }

+    if (isUpdate) {
+        rowDelta.validateDeletedFiles();
Why only for update? Maybe add a comment. Or can this be unconditional?
I'll add a comment. Deleting a row in two concurrent commits shouldn't cause validation to fail, but deleting a row and updating it concurrently should fail, since the update might undo the delete.
Sadly, the method names validateDeletedFiles and validateNoConflictingDeleteFiles do not suggest to me that they should be called for updates and shouldn't be called for deletes.
I see that this is what Spark Iceberg does: https://github.com/apache/iceberg/blob/f6e11148d31b408a7aea57a0efcb4428134f6a99/spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/source/SparkPositionDeltaWrite.java#L191-L194
Keep as is.
Force-pushed eb1ca9c to ac41305
(squashed and rebased)
"Add locking when commiting Iceberg tables to FileHiveMetastore" LGTM
"Fix reading Iceberg equality deletes" LGTM
We don't need to add locking to FileHiveMetastore, especially not a broken implementation with a TODO that seems unlikely to be fixed. Instead, you can use CAS (compare-and-swap) to safely support Iceberg:

if (existingTable.tableType().equalsIgnoreCase("iceberg") && !Objects.equals(
        existingTable.parameters().get("metadata_location"),
        replacementTable.parameters().get("previous_metadata_location"))) {
    throw new MetastoreException("Cannot update Iceberg table: supplied previous location does not match current location");
}

File metastore is not always used for testing purposes. I strongly dislike adding broken stuff to it just for testing purposes. We made the code a lot more complex just to save having a trivial FileMetastoreTableOperations implementation.
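The compare-and-swap check suggested above can be isolated into a small helper (illustrative names, not the actual metastore code), which makes the success condition easy to see: a replacement is allowed only if the writer saw the metadata location it is replacing.

```java
import java.util.Map;
import java.util.Objects;

// Illustrative standalone version of the CAS check: the replacement table must
// name the metadata location it replaces; a mismatch means another writer
// committed in between, and the commit should fail rather than overwrite.
class MetadataLocationCas
{
    public static boolean canReplace(Map<String, String> existingParameters, Map<String, String> replacementParameters)
    {
        return Objects.equals(
                existingParameters.get("metadata_location"),
                replacementParameters.get("previous_metadata_location"));
    }
}
```

A losing writer can then retry from the new current metadata, which is the usual optimistic-concurrency loop for Iceberg commits.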
Force-pushed 25fbe8f to 1b71ff1
Resolved (outdated): core/trino-spi/src/main/java/io/trino/spi/connector/UpdatablePageSource.java
Columns passed to the Iceberg DeleteFilter must be in the same order as they are in the TrinoRow created from the Page in IcebergPageSource, but the `filterSchema` method in TrinoDeleteFilter did not ensure that.
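The ordering requirement described in this commit message can be illustrated with a hypothetical helper (names are illustrative, not Trino's actual `filterSchema` code): required columns are sorted by their position in the page, so that field i of the filter schema lines up with channel i of the row.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: reorder the columns required by a delete filter to
// match the order of the columns in the page, so positional lookups agree.
class DeleteFilterColumnOrder
{
    public static List<String> orderToMatchPage(List<String> requiredColumns, List<String> pageColumns)
    {
        return requiredColumns.stream()
                .sorted(Comparator.comparingInt(pageColumns::indexOf))
                .collect(Collectors.toList());
    }
}
```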
Force-pushed 1a5c9c4 to ef7bd23
@rdblue I had a partition scheme question on this. If the table's partition scheme has been updated since a data file was written, and then a row from the file is updated, should the file with the updated rows get written using the old scheme or the updated one?
@alexjo2144, it's up to you what makes the most sense for the updated rows. For the deletes, you have to make sure that the partition of any delete files matches the data that they apply to.
Yep, I've got that part. Right now the updated rows use the scheme matching the data file, so all the new files are stored with each other in the same directory, which is kinda nice. But I guess if a user has changed the scheme to improve read performance for their queries, it might make sense to respect the new layout. @findepi any opinion?
The Spark implementation uses the current partition spec for updated rows, for what it's worth.
Force-pushed ef7bd23 to 780934f
Thanks Ryan. Piotr voiced a preference for using the current partition spec, so I'll go with that. Just pushed an update.
    .withCleanupQuery(cleanupQuery)
    .experiencing(TASK_GET_RESULTS_REQUEST_TIMEOUT)
    .at(boundaryDistributedStage())
    .failsWithoutRetries(failure -> failure.hasMessageContaining("Encountered too many errors talking to a worker node"))
This has to be hasMessageFindingMatch("Encountered too many errors talking to a worker node|Error closing remote buffer"), otherwise the test might be flaky (I'm currently fixing it in other places).
See #12274
Updated, thanks
Force-pushed 780934f to e5569bb
@alexjo2144 please mind the CI, it's quite unhappy.
Using v2 merge-on-read deletes. Co-authored-by: Jack Ye <yzhaoqin@amazon.com>
Force-pushed e5569bb to c376da2
Delta failure seems like a flake. It passed on the first of the two empty commit runs. Created an issue: #12300
Description
Add support for updating individual rows using the Iceberg connector. The IcebergUpdatablePageSource will write a new data file containing the new data, as well as a positional delete file removing the old rows from the existing file.
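The mechanism described above — rewriting updated rows into a new data file while a positional delete file masks the old copies — can be sketched as follows (illustrative names; not the actual IcebergUpdatablePageSource code):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of merge-on-read UPDATE: updated rows go into a new data
// file, and a positional delete file records (data file path, row position)
// pairs for the old copies of those rows, so readers skip them.
class PositionalUpdateSketch
{
    record PositionDelete(String dataFilePath, long rowPosition) {}

    public static List<PositionDelete> deletesForUpdatedRows(String dataFilePath, List<Long> updatedPositions)
    {
        List<PositionDelete> deletes = new ArrayList<>();
        for (long position : updatedPositions) {
            deletes.add(new PositionDelete(dataFilePath, position));
        }
        return deletes;
    }
}
```

At read time, a position delete applies only to the named data file, which is why delete-file partitioning must match the data it applies to, as discussed earlier in the thread.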
New feature
Iceberg connector
Add support for updating individual rows using the Iceberg connector.
Related issues, pull requests, and links
Based on #11886
Documentation
( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
(x) Documentation PR is available with #12326
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
( ) Release notes entries required with the following suggested text: