Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support for dynamically getting delta table features #428

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

ForeverAngry
Copy link

Important Read

To address: Issue #354

What is the purpose of the pull request

This pull request implements dynamically capturing the min reader and writer version of a target delta table, based on its table details.

Brief change log

  • Forked the current version of main
  • Copied @the-other-tim-brown initial changes from the support ntz type #382 into this fork
  • The bulk of the changes were related to the getConfigurationsForDeltaSync() method from the DeltaConversionTarget.java

Verify this pull request

This pull request is already covered by existing tests, i think...

This is my first PR to an open source project - i was a bit unsure of how to do the unit test for this change. Any pointers or insight would be welcomed :) !

Copy link
Contributor

@the-other-tim-brown the-other-tim-brown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the delayed response, I've added some comments about places where you can add some tests. Thanks for taking up the patch!

@@ -0,0 +1,3 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the .gitignore with .vscode/* as well?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely!

InternalSchema.MetadataValue.MICROS);
metadata = DEFAULT_TIMESTAMP_PRECISION_METADATA;
break;
case "timestamp_ntz":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -268,9 +281,23 @@ private void commitTransaction() {
}

private Map<String, String> getConfigurationsForDeltaSync() {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can test this here

I would expect we can add the timestamp_ntz and see that the delta table has the correct versions set for min reader/writer?

// a private variable to of deltaTable

// limit the results to the attributes needed
Dataset<Row> record = deltaTable.detail().select("minWriterVersion", "minReaderVersion");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you get this information from deltaLog.snapshot().metadata()?

Copy link
Author

@ForeverAngry ForeverAngry May 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think so. I looked through the api for the deltaLog.snapshot().metadata(), and did not find a referance to it. The only place i found it at was here https://docs.delta.io/latest/delta-utility.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to see if we can just let the Delta Lake library automatically set the versions for us. Can you try that as well? If I remember correctly, it looked like it would auto update them for us.

The test creates a new df with its partition field set as the timestamp ntz type.  IF this is set, then the table should be upgraded to a reader and writer version of 3,7.
@@ -57,7 +57,7 @@
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.Arguments;
import org.junit.jupiter.params.provider.MethodSource;

import org.apache.spark.sql.delta.DeltaConfigs;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ForeverAngry Can you remove these changes to this file if they are no longer required?

@the-other-tim-brown
Copy link
Contributor

@ForeverAngry we've gone through some module renaming, can you pull in the latest changes from main to pick up the path changes?

@ghost
Copy link

ghost commented May 27, 2024

CI report:

Bot commands @xtable-bot supports the following commands:
  • @xtable-bot run azure re-run the last Azure build

@ForeverAngry
Copy link
Author

ForeverAngry commented May 27, 2024 via email

@ForeverAngry
Copy link
Author

@ForeverAngry we've gone through some module renaming, can you pull in the latest changes from main to pick up the path changes?

@the-other-tim-brown sure. Also, I don't have a lot of experience writing java test. Can you point out what I need to fix, to please the bot :)?

@the-other-tim-brown
Copy link
Contributor

@ForeverAngry we've gone through some module renaming, can you pull in the latest changes from main to pick up the path changes?

@the-other-tim-brown sure. Also, I don't have a lot of experience writing java test. Can you point out what I need to fix, to please the bot :)?

There were some issues with the test infra but those should be resolved now. After pushing your next commit, it should kick off a new run and I'll take a look at the results

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants