Handle NessieConflictException in Delta clients + handling of multiple-branches
#1249
Codecov Report
@@ Coverage Diff @@
## main #1249 +/- ##
============================================
- Coverage 72.25% 71.34% -0.91%
Complexity 142 142
============================================
Files 232 242 +10
Lines 9233 9381 +148
Branches 839 855 +16
============================================
+ Hits 6671 6693 +22
- Misses 2127 2250 +123
- Partials 435 438 +3
Flags with carried forward coverage won't be shown.
Force-pushed from 7524732 to 55de2e7.
Title changed from "NessieConflictException in Delta clients" to "NessieConflictException in Delta clients + handling of multiple-branches".
Force-pushed from 55de2e7 to ab1fee6.
…ple-branches
The Delta clients can run into an unrecoverable (dead) state when the reference has changed in the meantime. This commit adds a single retry in case the client throws a `NessieConflictException`. The Delta client kept a single `Reference` that was initialized with the named reference given in the Hadoop config. However, if the configuration changed to switch to another Nessie branch, that `Reference` was never refreshed, which resulted in `NessieNotFoundException`s. The change is to replace that single `Reference` field with a map of ref-name to `Reference` to support multiple branches. Also adds tests to verify that the Delta client works with multiple branches and that the retry works when hitting a `NessieConflictException`. The new tests are disabled for Spark 2/Delta 0.6.
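The "single retry" described in the commit message can be sketched in plain Java as follows. This is an illustration under assumptions, not the PR's code: `ConflictException` is a hypothetical stand-in for `NessieConflictException`, and `CommitAction` and `commitWithSingleRetry` are hypothetical names. On the first conflict the caller refreshes its view of the reference and tries exactly once more; a second conflict is propagated rather than retried in a loop.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SingleRetrySketch {

    // Hypothetical stand-in for NessieConflictException; not the real Nessie class.
    static class ConflictException extends Exception {
        ConflictException(String message) {
            super(message);
        }
    }

    // Hypothetical commit operation that may fail because the reference moved.
    @FunctionalInterface
    interface CommitAction<T> {
        T run() throws ConflictException;
    }

    // Runs the commit; on the first conflict, refreshes the reference and retries exactly once.
    // A conflict on the retry is propagated to the caller.
    static <T> T commitWithSingleRetry(CommitAction<T> commit, Runnable refreshReference)
            throws ConflictException {
        try {
            return commit.run();
        } catch (ConflictException first) {
            // The branch hash changed in the meantime: re-read it, then try once more.
            refreshReference.run();
            return commit.run();
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger attempts = new AtomicInteger();
        AtomicInteger refreshes = new AtomicInteger();
        String result = commitWithSingleRetry(() -> {
            if (attempts.incrementAndGet() == 1) {
                throw new ConflictException("hash of 'main' changed");
            }
            return "committed";
        }, refreshes::incrementAndGet);
        // Prints: committed after 2 attempts, 1 refresh
        System.out.println(result + " after " + attempts.get() + " attempts, " + refreshes.get() + " refresh");
    }
}
```

Limiting this to one retry keeps a persistently conflicting commit from looping forever, at the cost of surfacing the error if the branch moves again between the refresh and the retry.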
Force-pushed from ab1fee6 to e527d32.
/**
 * Keeps a mapping of reference name to current hash.
 */
private val referenceMap: util.Map[String, Reference] = {
Why a Java map and not a Scala map?
I just need a mutable map here, and using a Java `HashMap` felt like the easiest way to go.
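A minimal Java sketch of the idea behind `referenceMap` may help here (the PR's code is Scala using `java.util.Map`). `Ref`, `fetchFromServer`, `current` and `refresh` are hypothetical names, not the Nessie client API. Each branch name gets its own cached entry, so switching the configured branch fetches that branch instead of reusing one stale `Reference`, and a refresh after a conflict only replaces the entry for the affected branch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class ReferenceMapSketch {

    // Hypothetical stand-in for Nessie's Reference: a branch name plus the commit hash it points to.
    static final class Ref {
        final String name;
        final String hash;

        Ref(String name, String hash) {
            this.name = name;
            this.hash = hash;
        }

        @Override
        public String toString() {
            return name + "@" + hash;
        }
    }

    // Mutable map from reference name to the last known Ref (cf. referenceMap in the PR).
    private final Map<String, Ref> referenceMap = new HashMap<>();

    // Hypothetical server lookup by reference name.
    private final Function<String, Ref> fetchFromServer;

    ReferenceMapSketch(Function<String, Ref> fetchFromServer) {
        this.fetchFromServer = fetchFromServer;
    }

    // Returns the cached Ref for a branch, fetching it the first time that branch is used.
    Ref current(String refName) {
        return referenceMap.computeIfAbsent(refName, fetchFromServer);
    }

    // Re-reads a branch (e.g. after a conflict) and replaces only that cached entry.
    Ref refresh(String refName) {
        Ref latest = fetchFromServer.apply(refName);
        referenceMap.put(refName, latest);
        return latest;
    }

    public static void main(String[] args) {
        Map<String, String> server = new HashMap<>();
        server.put("main", "aaa");
        server.put("dev", "bbb");
        ReferenceMapSketch client = new ReferenceMapSketch(name -> new Ref(name, server.get(name)));

        System.out.println(client.current("main")); // main@aaa
        System.out.println(client.current("dev"));  // dev@bbb: switching branches does not reuse main's Ref
        server.put("main", "ccc");                  // another writer commits to main
        System.out.println(client.current("main")); // main@aaa (stale cached entry)
        System.out.println(client.refresh("main")); // main@ccc
    }
}
```

A plain `HashMap` is not thread-safe; if the client can be called from several threads, `ConcurrentHashMap` (which also provides `computeIfAbsent`) would be the safer choice.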
// Delta < 0.8 w/ Spark 2.x doesn't support multiple branches well (warnings when changing the configuration)
@DisabledIfSystemProperty(named = "skip-multi-branch-tests", matches = "true")
void testCommitRetry() throws Exception {
  String csvSalaries1 = ITDeltaLog.class.getResource("/salaries1.csv").getPath();
Any reason to use CSVs when everywhere else in the Spark tests we create simple, few-row, in-memory datasets?
A mixture of "laziness and reproduction of the original issue". I wanted this test to be close to what the demo in nessie-demos looks like.
}

@Test
// Delta < 0.8 w/ Spark 2.x doesn't support multiple branches well (warnings when changing the configuration)
For my benefit, can you explain a bit more? If I recall correctly, 0.6.0 is Spark 2 only and 0.7.0 and later are Spark 3 only.
That's Delta/Spark 2: it logs a warning that the configuration has changed and "might not have been applied completely".
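The new tests are guarded by `@DisabledIfSystemProperty(named = "skip-multi-branch-tests", matches = "true")`, which is presumably how they are skipped for the Spark 2/Delta 0.6 module. JUnit 5 disables such a test when the named system property is set and its whole value matches the given regular expression. The plain-Java approximation below only illustrates that check; `MultiBranchTestGate` and `isDisabled` are hypothetical names, this is not JUnit's implementation, and how the Spark 2 build sets the property is not shown in this PR.

```java
public class MultiBranchTestGate {

    static final String PROPERTY = "skip-multi-branch-tests";

    // Approximates @DisabledIfSystemProperty(named = ..., matches = ...):
    // disabled only if the property is set and its entire value matches the regex.
    static boolean isDisabled(String propertyName, String regex) {
        String actual = System.getProperty(propertyName);
        return actual != null && actual.matches(regex);
    }

    public static void main(String[] args) {
        System.clearProperty(PROPERTY);
        System.out.println(isDisabled(PROPERTY, "true")); // false: property not set, test runs
        System.setProperty(PROPERTY, "true");
        System.out.println(isDisabled(PROPERTY, "true")); // true: test is skipped
        System.setProperty(PROPERTY, "TRUE");
        System.out.println(isDisabled(PROPERTY, "true")); // false: the regex match is case-sensitive
        System.clearProperty(PROPERTY);
    }
}
```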
    .getOrCreate();
spark.sparkContext().setLogLevel("WARN");

nessieClient = NessieClient.builder().withUri(url).build();
nit: can we rename the `url` parameter?
The Delta clients can run into an unrecoverable (dead) state when the reference has changed in the meantime.
This commit adds a single retry in case the client throws a `NessieConflictException`.

The Delta client kept a single `Reference` that was initialized with the named reference given in the Hadoop config. However, if the configuration changed to switch to another Nessie branch, that `Reference` was never refreshed, which resulted in `NessieNotFoundException`s.

The change is to replace that single `Reference` field with a map of ref-name to `Reference` to support multiple branches.
Also adds tests to verify that the Delta client works with multiple branches and that the retry works when hitting a `NessieConflictException`.

The PR also changes the three delta-modules to include the resources from the -core module for tests.