[3.2 Cherry Pick] [#3423] Fix unnecessary DynamoDB GET calls during LogStore::listFrom VACUUM calls #3463

Merged

Commits on Aug 1, 2024

  1. [delta-io#3423] Fix unnecessary DynamoDB GET calls during LogStore::listFrom VACUUM calls (delta-io#3425)
    
    #### Which Delta project/connector is this regarding?
    
    - [X] Spark
    - [ ] Standalone
    - [ ] Flink
    - [ ] Kernel
    - [ ] Other (fill in here)
    
    ## Description
    
    Resolves delta-io#3423.
    
    This PR updates the logic in `BaseExternalLogStore::listFrom` so that it
    does not request the latest entry from the external store (which is used
    to perform recovery operations) when the path being listed is not inside
    the `_delta_log` directory.
    
    This matters for VACUUM operations, which may issue hundreds or thousands
    of list calls against the table directory and the nested partition
    directories of parquet files. None of these directories is the
    `_delta_log`, so checking the external store during these list calls is
    (1) pointless, because a fixup is only relevant when listing the
    `_delta_log`, and (2) expensive.
    
    With this PR, future VACUUM operations no longer perform unnecessary
    calls to the external store (e.g. DynamoDB).
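    
    The guard described above can be sketched roughly as follows. This is a
    minimal Java illustration, not the actual `BaseExternalLogStore` code:
    `isDeltaLogPath`, `fetchLatestExternalEntry`, the call counter, and the
    in-memory file list are all hypothetical stand-ins, and it assumes a path
    counts as a log path when its parent directory is named `_delta_log`.
    
    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.atomic.AtomicInteger;

    final class ListFromGuardSketch {
        // Hypothetical counter standing in for GET requests to the external store.
        static final AtomicInteger externalGets = new AtomicInteger();

        // Assumption: a log path is one whose parent directory is "_delta_log".
        static boolean isDeltaLogPath(String path) {
            String[] parts = path.split("/");
            return parts.length >= 2 && parts[parts.length - 2].equals("_delta_log");
        }

        // Hypothetical stand-in for the recovery lookup (e.g. a DynamoDB GET).
        static void fetchLatestExternalEntry() {
            externalGets.incrementAndGet();
        }

        // Lists files in the same directory as `path` whose names sort at or after it.
        static List<String> listFrom(String path, List<String> allFiles) {
            // Only consult the external store when the log directory is being listed.
            if (isDeltaLogPath(path)) {
                fetchLatestExternalEntry();
            }
            String dirPrefix = path.substring(0, path.lastIndexOf('/') + 1);
            List<String> result = new ArrayList<>();
            for (String f : allFiles) {
                boolean sameDir = f.startsWith(dirPrefix)
                        && f.indexOf('/', dirPrefix.length()) < 0;
                if (sameDir && f.compareTo(path) >= 0) {
                    result.add(f);
                }
            }
            return result;
        }
    }
    ```
    
    In this sketch, listing a partition file such as `t/part=1/a.parquet`
    leaves the counter untouched, while listing a file under `t/_delta_log/`
    increments it once.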
    
    ## How was this patch tested?
    
    Unit tests and an integration test that actually runs VACUUM and
    compares the number of external store calls using the old/new logic. I
    ran that test myself 50 times, too, and it passed every time (therefore,
    not flaky).
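    
    The old/new comparison can be illustrated with a toy simulation. This is
    only a sketch of the idea, not the integration test from this PR: the
    class, the `guardEnabled` flag, and the synthetic list of paths are
    hypothetical, and it counts one simulated external-store lookup per list
    call rather than talking to a real store.
    
    ```java
    import java.util.ArrayList;
    import java.util.List;

    final class VacuumCallCountSketch {
        // Assumption: a log path is one whose parent directory is "_delta_log".
        static boolean isDeltaLogPath(String path) {
            String[] parts = path.split("/");
            return parts.length >= 2 && parts[parts.length - 2].equals("_delta_log");
        }

        // Counts simulated external-store lookups for a sequence of list calls.
        // guardEnabled = false models the old logic (lookup on every list call);
        // guardEnabled = true models the new logic (lookup only for log paths).
        static int countExternalGets(List<String> listedPaths, boolean guardEnabled) {
            int gets = 0;
            for (String path : listedPaths) {
                if (!guardEnabled || isDeltaLogPath(path)) {
                    gets++;
                }
            }
            return gets;
        }

        public static void main(String[] args) {
            // Synthetic workload: one log listing plus 100 partition listings.
            List<String> paths = new ArrayList<>();
            paths.add("t/_delta_log/00000000000000000000.json");
            for (int i = 0; i < 100; i++) {
                paths.add("t/part=" + i + "/x.parquet");
            }
            System.out.println("old logic lookups: " + countExternalGets(paths, false));
            System.out.println("new logic lookups: " + countExternalGets(paths, true));
        }
    }
    ```
    
    With the guard enabled, the lookup count depends only on the number of
    `_delta_log` listings, not on the number of partition directories.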
    
    ## Does this PR introduce _any_ user-facing changes?
    
    No
    scottsand-db committed Aug 1, 2024
    2c78c9b