Delta: Support Snapshot Delta Lake Table to Iceberg Table #6449

Merged 57 commits on Feb 7, 2023

Commits on Dec 18, 2022

  1. 73e38e5
  2. 5544f45
  3. b8b6119
  4. fix formatting

    JonasJ-ap committed Dec 18, 2022
    274560c

Commits on Dec 19, 2022

  1. 39e3541
  2. 92f962c
  3. 033c997
  4. 681a32f
  5. add support for scala 2.13

    JonasJ-ap committed Dec 19, 2022
    77bdb27
  6. 3dd540a
  7. fix naming issue

    JonasJ-ap committed Dec 19, 2022
    27ece93

Commits on Dec 21, 2022

  1. a9faabf
  2. fix typo and nit problems

    JonasJ-ap committed Dec 21, 2022
    3982711
  3. 9a7c443
  4. 173534e
  5. fix comment

    JonasJ-ap committed Dec 21, 2022
    bdd1ccf
  6. fix wrong import

    JonasJ-ap committed Dec 21, 2022
    85abac2

Commits on Dec 24, 2022

  1. Migrate delta to iceberg round 1 (#29)

    * remove redundant todo
    
    * move everything to iceberg-delta-lake and make the action a mixin called SupportMigrateDeltaLake
    
    * make constant string final and static
    
    * use filesToAdd/Remove to determine transaction directly
    
    * refactor and delete withRecordNumber from DataFiles.builder
    
    * refactor get partitionValues and use FileIO to get size when necessary
    
    * Add javadoc
    
    * refactor exceptions to be ValidationException
    
    * fix validationException format issue
    
    * create new test base for spark delta test
    JonasJ-ap authored Dec 24, 2022
    32e1af8
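
The bullet "use filesToAdd/Remove to determine transaction directly" suggests that the kind of Iceberg commit is chosen from whether a Delta version added files, removed files, or both. A minimal sketch of that decision, assuming an append/delete/overwrite split; `CommitKindSketch`, `CommitKind`, and `kindFor` are hypothetical names for illustration, not code from this PR:

```java
import java.util.List;

// Hypothetical illustration only; the PR's real classes are not shown on this page.
final class CommitKindSketch {
  private CommitKindSketch() {}

  // Assumed mapping: adds only -> append, removes only -> delete, both -> overwrite.
  enum CommitKind {
    APPEND,
    DELETE,
    OVERWRITE,
    NONE
  }

  // Picks the commit kind for one Delta version from its added and removed file paths.
  static CommitKind kindFor(List<String> filesToAdd, List<String> filesToRemove) {
    boolean hasAdds = !filesToAdd.isEmpty();
    boolean hasRemoves = !filesToRemove.isEmpty();
    if (hasAdds && hasRemoves) {
      return CommitKind.OVERWRITE;
    } else if (hasAdds) {
      return CommitKind.APPEND;
    } else if (hasRemoves) {
      return CommitKind.DELETE;
    }
    // A version with no file changes (for example, metadata only) needs no file commit.
    return CommitKind.NONE;
  }
}
```

Deciding from the two lists alone avoids inspecting the Delta operation name, which is presumably what "directly" refers to.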

Commits on Dec 25, 2022

  1. Migrate delta to iceberg util refactor (#30)

    * refactor the structure of the package
    
    * copy-paste the util methods from TableMigrationUtil
    JonasJ-ap authored Dec 25, 2022
    ac1141d

Commits on Dec 28, 2022

  1. Migrate delta to iceberg refactor 1.5 (#31)

    * move getFileMetrics to FileMetricsReader
    
    * add unit tests for schema conversion
    JonasJ-ap authored Dec 28, 2022
    8e9b3fc
  2. 12b60ca

Commits on Dec 29, 2022

  1. use transaction, refactor structure, add optional newTableLocation, add tableProperty (#32)
    
    * use transaction to commit all changes once
    
    * add optional newTableLocation
    
    * simplify the datafile build process, remove FileMetricsReader
    
    * refactor package structure
    
    * remove unnecessary types
    
    * fix format issue
    
    * add tableProperty method
    JonasJ-ap authored Dec 29, 2022
    8a8adef
  2. 6fbf740

Commits on Dec 30, 2022

  1. refactor getFullPath with unit tests, use newCreateTableTransaction, remove unnecessary parameters and try-catch (#33)
    
    * refactor getFullFilePath to be static and add test
    
    * refactor the interface name
    
    * use newCreateTableTransaction, remove redundant parameters in helper methods
    
    * remove unnecessary try, catch
    JonasJ-ap authored Dec 30, 2022
    69671b9
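
The `getFullFilePath` refactor above concerns turning a path recorded in the Delta log into a full file location. A rough sketch of one way to do that, assuming the recorded path is a URI-encoded string that is either absolute or relative to the table root; `DeltaPathSketch` and `fullFilePath` are hypothetical names, and the PR's actual method may differ:

```java
import java.net.URI;

// Hypothetical illustration only, using nothing beyond java.net.URI.
final class DeltaPathSketch {
  private DeltaPathSketch() {}

  // Resolves a URI-encoded data file path against the table root.
  static String fullFilePath(String rawPath, String tableRoot) {
    URI uri = URI.create(rawPath);
    // getPath() returns the path with percent-escapes decoded, e.g. "%3D" becomes "=".
    String decodedPath = uri.getPath();
    if (uri.getScheme() != null) {
      // Already absolute with a scheme: rebuild it around the decoded path.
      String authority = uri.getAuthority() == null ? "" : "//" + uri.getAuthority();
      return uri.getScheme() + ":" + authority + decodedPath;
    }
    if (decodedPath.startsWith("/")) {
      // Absolute path without a scheme: keep as-is.
      return decodedPath;
    }
    // Relative path: append to the table root with exactly one separator.
    String root = tableRoot.endsWith("/") ? tableRoot : tableRoot + "/";
    return root + decodedPath;
  }
}
```

For example, `fullFilePath("date%3D2023-01-01/part-0.parquet", "s3://bucket/tbl")` yields `s3://bucket/tbl/date=2023-01-01/part-0.parquet`.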

Commits on Jan 3, 2023

  1. allow user to specify a custom location for migrated table, fix load error of icebergCatalog (#34)
    
    * modify build.gradle to remove unnecessary dependency
    
    * fix nit problem
    
    * pass real env test in a questionable manner
    
    * allow user to specify a custom location for migrated table
    
    * remove unnecessary logger
    
    * restore build.gradle for spark
    JonasJ-ap authored Jan 3, 2023
    e3138a6

Commits on Jan 6, 2023

  1. Fix nit problems and optimize some implementation (#38)

    * refactor mixin order
    
    * fix nit problems
    
    * add null check to the constructor
    
    * let copyFromDeltaLakeToIceberg return the number of totalDataFiles directly
    
    * use ImmutableMap.Builder
    
    * fix the problem in getFullFilePath
    
    * use hadoopFileIO to read dataFile
    
    * make type conversion util package-private
    
    * fix format
    JonasJ-ap authored Jan 6, 2023
    2e8dfd0
  2. f4589e8

Commits on Jan 7, 2023

  1. move everything to iceberg-delta-lake, build demo integration test (#35)

    * refactor mixin order
    
    * fix nit problems
    
    * add null check to the constructor
    
    * let copyFromDeltaLakeToIceberg return the number of totalDataFiles directly
    
    * use ImmutableMap.Builder
    
    * fix the problem in getFullFilePath
    
    * use hadoopFileIO to read dataFile
    
    * make type conversion util package-private
    
    * fix format issue
    
    * move everything to iceberg-delta-lake. Remove all changes made to iceberg-spark
    
    * fix test delta_core dependency
    
    * fix format
    
    * conditionally build the test
    
    * refactor to integrationTest
    
    * suppress warnings
    
    * test delta core 2.2.0
    JonasJ-ap authored Jan 7, 2023
    59c96cb

Commits on Jan 9, 2023

  1. optimize api structure, refactor the integration test, add more tests(#39)
    
    * use validation exception for unsupported types
    
    * check result file count in integration test
    
    * fix format
    
    * add tableLocation api and remove constructors
    
    * add javadoc for constructor
    
    * remove unnecessary test
    
    * use UUID to generate table records
    
    * resolve format issue
    
    * rename everything from migrate to snapshot
    
    * simplify test configuration round 1
    
    * refactor test spark integration
    
    * refactor correctness check to helper function
    
    * add test for table location and table properties
    JonasJ-ap authored Jan 9, 2023
    afd783b

Commits on Jan 10, 2023

  1. refactor the interfaces, add new tests to integration tests, add new unit tests (#40)
    
    * rename the interface
    
    * add new APIs and add unit test for precondition checks
    
    * refactor interface and precondition check
    
    * remove redundant private method and refactor javadoc
    
    * add test logic for table contains external data files
    
    * test the inclusion of delta lake table properties
    JonasJ-ap authored Jan 10, 2023
    5b95925
  2. fix error messages and add default implementation for actionProvider (#41)
    
    * fix error messages
    
    * add getDefault implementation to the action provider
    JonasJ-ap authored Jan 10, 2023
    f43c325
  3. refactor the default implementation and javadoc (#43)

    * rename default implementation and make it an instance
    
    * optimize javadoc
    
    * make base classes package-private
    
    * refactor javadoc in the interface
    JonasJ-ap authored Jan 10, 2023
    b2a8bfe

Commits on Jan 12, 2023

  1. fix error when migrating table with nested fields, add CI, upgrade test (#44)
    
    * fix parquet import error for nested schema
    
    * add delta conversion CI
    
    * upgrade the test
    JonasJ-ap authored Jan 12, 2023
    450a08c
  2. 300d39b
  3. a285c4a
  4. 5760a83
  5. e41c787

Commits on Jan 13, 2023

  1. 7a16809
  2. fix format and nit issue

    JonasJ-ap committed Jan 13, 2023
    7072612
  3. remove unnecessary fields and class and let integrationTest collected by CI (#45)
    
    * remove unnecessary fields and class
    
    * make integration test collected by check
    JonasJ-ap authored Jan 13, 2023
    c2293c9
  4. f38d7b1
  5. 10163f8

Commits on Jan 15, 2023

  1. Merge remote-tracking branch 'origin/master' into migrate_delta_to_iceberg
    
    # Conflicts:
    #	settings.gradle
    #	versions.props
    JonasJ-ap committed Jan 15, 2023
    99dbba8

Commits on Jan 17, 2023

  1. simplify the test base (#46)

    * remove unnecessary namespace creation
    
    * move namespace creation to TestSnapshotDeltaLakeTable.java
    JonasJ-ap authored Jan 17, 2023
    a7c3de1

Commits on Jan 20, 2023

  1. 6c4ab2c

Commits on Jan 21, 2023

  1. add null check for table.currentSnapshot() when querying the total number of data files migrated

    JonasJ-ap committed Jan 21, 2023
    dadd76a
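
The null check in this commit matters because an Iceberg table with no snapshots returns `null` from `table.currentSnapshot()`, for instance when the source table contributed no data files. A minimal sketch of the guard, assuming the count is read from the snapshot summary under the `total-data-files` key; to stay self-contained it models the snapshot as its summary map (with `null` meaning "no snapshot"), and `SnapshotCountSketch` and `totalDataFiles` are hypothetical names:

```java
import java.util.Map;

// Hypothetical illustration only; a Map stands in for Iceberg's Snapshot.summary().
final class SnapshotCountSketch {
  private SnapshotCountSketch() {}

  // Assumed summary key holding the total number of data files.
  private static final String TOTAL_DATA_FILES = "total-data-files";

  // Returns the data file count, or 0 when there is no current snapshot.
  static long totalDataFiles(Map<String, String> currentSnapshotSummary) {
    if (currentSnapshotSummary == null) {
      // No snapshot yet: nothing was migrated, so report zero instead of failing.
      return 0L;
    }
    String value = currentSnapshotSummary.get(TOTAL_DATA_FILES);
    return value == null ? 0L : Long.parseLong(value);
  }
}
```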

Commits on Jan 24, 2023

  1. Refactor iceberg-delta's integration test(#48)

    * use assertj for all tests
    
    * add null check for the spark integration method
    
    * use a method to generate the hardcode dataframe
    
    * drop iceberg table afterwards
    
    * add typetest table
    
    * test all delta lake types
    
    * test conversion of NullType
    
    * fix format issue
    
    * add a second dataframe
    
    * refactor the integration test
    
    * correctly decoded delta's path
    
    * fix wrong decoding
    
    * fix wrong decoding 2
    JonasJ-ap authored Jan 24, 2023
    1cd36b9

Commits on Jan 25, 2023

  1. Adapt for delta.logRetentionDuration (#49)

    * remove a redundant map collector in commitDeltaVersionLogToIcebergTransaction
    
    * get the earliest possible version rather than hard code from 0
    
    * add unit test to check if table exists
    
    * refactor action extracted from the versionlog
    
    * fix format issue
    
    * move non-share table write operation to the test itself, instead of in before()
    
    * fix type
    JonasJ-ap authored Jan 25, 2023
    4463f30
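
The change "get the earliest possible version rather than hard code from 0" reflects that, once `delta.logRetentionDuration` has expired old log entries, version 0 may no longer exist. One plausible way to find a starting point, sketched under the assumption that commit files in `_delta_log` are named with a 20-digit zero-padded version plus `.json`; it ignores checkpoints, which a real implementation would likely need to consider, and `EarliestVersionSketch` and `earliestCommitVersion` are hypothetical names:

```java
import java.util.List;
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the approach confirmed by this PR.
final class EarliestVersionSketch {
  private EarliestVersionSketch() {}

  // Matches commit files such as "00000000000000000010.json".
  private static final Pattern COMMIT_FILE = Pattern.compile("(\\d{20})\\.json");

  // Returns the smallest commit version present, or empty if no commit file is found.
  static OptionalLong earliestCommitVersion(List<String> logFileNames) {
    return logFileNames.stream()
        .map(COMMIT_FILE::matcher)
        .filter(Matcher::matches) // skips checkpoints and other non-commit files
        .mapToLong(matcher -> Long.parseLong(matcher.group(1)))
        .min();
  }
}
```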

Commits on Jan 26, 2023

  1. d3ccc86

Commits on Feb 6, 2023

  1. 1affcb3
  2. 098a3a2
  3. Merge remote-tracking branch 'origin/master' into migrate_delta_to_iceberg
    
    # Conflicts:
    #	versions.props
    JonasJ-ap committed Feb 6, 2023
    f0d1536

Commits on Feb 7, 2023

  1. rollback to hadoopFileIO

    JonasJ-ap committed Feb 7, 2023
    a98461a
  2. fe6da17
  3. 8e9a3e2
  4. nit fix

    JonasJ-ap committed Feb 7, 2023
    24405e0
  5. error message nit fix

    JonasJ-ap committed Feb 7, 2023
    c5a6186