
update with master #12


JassAbidi
Owner

No description provided.

lizhangdatabricks and others added 24 commits July 6, 2021 09:33
… - 1

When there’s a checkpoint at version 10 and a delta file at version 11, the earliest version returned should be version 10. We don’t handle that case correctly right now.
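
The expected behavior can be illustrated with a minimal sketch. It is not the code from this commit: the function name `earliest_reproducible_version` and the simplified model (a checkpoint is a full snapshot, version 0 is reproducible from its delta file alone, multi-part checkpoints and gaps are ignored) are assumptions made for illustration only.

```python
def earliest_reproducible_version(checkpoint_versions, delta_versions):
    """Return the smallest version that can be reconstructed, or None.

    Simplified, hypothetical model:
    - a checkpoint at version v is a full snapshot, so v is reproducible
      even if the delta file for v has been cleaned up;
    - version 0 is reproducible from its delta file alone.
    """
    candidates = list(checkpoint_versions)
    if 0 in set(delta_versions):
        candidates.append(0)
    return min(candidates) if candidates else None


# The case described above: checkpoint at 10, first delta file at 11.
print(earliest_reproducible_version([10], [11]))
```

In this model the checkpoint alone is enough to make version 10 the answer, which is the case the commit message says was previously mishandled.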

unit test

AFFECTED VERSIONS:
PROBLEM DESCRIPTION:

Author: Li Zhang <li.zhang@databricks.com>

GitOrigin-RevId: 8e70e6b2aae76a5043653d3b6fdee45b824a9c27
Minor change in sbt install script.

Author: yaohua <yaohua.zhao@databricks.com>

GitOrigin-RevId: 26b0fe9df735739332a482ced59bdd90bd7534ec
…a.`<path>`" name

## What changes were proposed in this pull request?
Make DeltaTable.forName support "delta.`<path>`" name. Before this change, DeltaTable.forName(s"delta.`$dir`") would result in an error.

## This PR introduces the following *user-facing* changes
Before this change, DeltaTable.forName(s"delta.`$dir`") would result in an error. After this change, DeltaTable.forName(s"delta.`$dir`") would be allowed for Delta Table directories, but still blocked for empty (non Delta Table) directories.

## How was this patch tested?
Unit tested that DeltaTable.forName(s"delta.`$dir`") on a Delta Table directory is allowed, and that DeltaTable.forName(s"delta.`$dir`") on an empty directory is still blocked.
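
As a rough illustration of the naming convention involved, the sketch below separates a path-based name of the form "delta.`<path>`" from an ordinary catalog table name. The helper `parse_path_identifier` and its regular expression are hypothetical; the actual change resolves names through Spark's parser and additionally checks that the directory is a Delta table.

```python
import re

# Hypothetical pattern: "delta." followed by a backtick-quoted, non-empty path.
_PATH_IDENTIFIER = re.compile(r"^delta\.`(?P<path>[^`]+)`$", re.IGNORECASE)


def parse_path_identifier(name):
    """Return the path if `name` looks like a path-based Delta name, else None."""
    match = _PATH_IDENTIFIER.match(name.strip())
    return match.group("path") if match else None


print(parse_path_identifier("delta.`/tmp/events`"))
print(parse_path_identifier("sales.events"))
```

A name that parses to a path would then be treated as a table location rather than looked up in the catalog.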

Author: Yuhong Chen <yuhong.chen@databricks.com>
Author: Yuhong Chen <mikechen212@gmail.com>

#22994 is resolved by FX196/dpgget9c.

GitOrigin-RevId: 7f0bd84e5d1064bbc2282d5330f2e8b74a45959d
…t - cleanup

This PR cleans up an unused variable in the IBMCOSLogStore.
The variable (`writeSize`) was a leftover from an older version of the LogStore implementation.
No logic changes are introduced.

Closes #692

Signed-off-by: Yijia Cui <yijia.cui@databricks.com>

Author: Guy Khazma <33684427+guykhazma@users.noreply.github.com>

#23256 is resolved by yijiacui-db/73jdbmso.

GitOrigin-RevId: 0a1de46e6b55b7ebff76b187a4d24f5a826df385
Set `spark.databricks.delta.commitLock.enabled` to `true` on Azure, because removing the lock increases the chance of hitting a concurrency error when multiple writers overwrite the `_last_checkpoint` file at the same time.

The new unit tests.

- Regression: Azure users may hit a concurrency error when the `_last_checkpoint` file is overwritten concurrently.

Author: Shixiong Zhu <zsxwing@gmail.com>

GitOrigin-RevId: df9d11f1982bb71563934d9d389e40a1e37b7add
Minor refactor of test names and comment

Author: Zach Schuermann <zach.schuermann@databricks.com>

GitOrigin-RevId: 7d7c15e13c4c0b3f41fa3421c91e6a5a02812efa
Minor refactor

Author: Yuyuan Tang <yuyuan.tang@databricks.com>

GitOrigin-RevId: 1ae361b37d749cac3d06fe4cae18fc172fa464a7
…tection

Two improvements:
- Every log line prints a unique identifier of the txn. This differentiates logs from concurrent txns to the same table in the same JVM (OPTIMIZE does this all the time). The id is completely internal and used only for log4j logging.
- Additional timing metrics show the breakdown of time between the different steps of conflict detection.
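
Both ideas can be sketched with the Python standard library. This is not the log4j-based code from the commit; `TxnLogger`, `new_txn_logger`, `timed`, and the step name used below are hypothetical names chosen for illustration.

```python
import logging
import time
import uuid
from contextlib import contextmanager


class TxnLogger(logging.LoggerAdapter):
    """Prefix every message with the transaction id stored in `extra`."""

    def process(self, msg, kwargs):
        return f"[txn={self.extra['txn_id']}] {msg}", kwargs


def new_txn_logger(base=None):
    """Create a logger carrying a fresh, purely internal transaction id."""
    base = base or logging.getLogger("txn")
    return TxnLogger(base, {"txn_id": uuid.uuid4().hex[:8]})


@contextmanager
def timed(step, timings):
    """Record the wall-clock duration of one step, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[step] = (time.perf_counter() - start) * 1000.0


log = new_txn_logger()
timings = {}
with timed("check_conflicting_files", timings):
    log.info("checking for conflicts")
log.info("conflict detection timings (ms): %s", timings)
```

Because the id lives on the logger rather than in each call site, two transactions running in the same process produce log lines that can be told apart without changing any message text.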

no unit tests

Author: Tathagata Das <tathagata.das1565@gmail.com>

GitOrigin-RevId: 3a0c424288660cbbcef5cb76cd66d75050b10828
Minor refactor style

N/A

Author: Lars Kroll <lars.kroll@databricks.com>

GitOrigin-RevId: c9c06110075d32c749eb1afb24ea6f873bfece61
Add new function `getBinIndex` in `FileSizeHistogram`, which returns the index of the bin to which the given `fileSize` belongs, or -1 if the given `fileSize` doesn't belong to any bin.
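
A minimal sketch of such a lookup, assuming bins are described by a sorted list of lower boundaries where bin `i` covers sizes from `boundaries[i]` up to (but not including) `boundaries[i + 1]` and the last bin is open-ended. That layout and the Python name `get_bin_index` are assumptions; the commit message does not spell out the bin definition.

```python
from bisect import bisect_right


def get_bin_index(sorted_bin_boundaries, file_size):
    """Return the index of the bin containing file_size, or -1 if none does.

    bisect_right counts the boundaries <= file_size, so subtracting one
    gives the last bin whose lower boundary does not exceed file_size.
    A size below the first boundary yields 0 - 1 = -1.
    """
    return bisect_right(sorted_bin_boundaries, file_size) - 1


boundaries = [0, 1024, 4096]
print(get_bin_index(boundaries, 2000))
print(get_bin_index(boundaries, -5))
```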

Existing UTs.

Author: Prakhar Jain <prakhar.jain@databricks.com>

GitOrigin-RevId: 9a8bee48e60a4cf2b0e1207c9a7ddc3c31991c82
Minor refactor of Delta conf code

Author: Prakhar Jain <prakhar.jain@databricks.com>

GitOrigin-RevId: 5ebc318ed5a9ad34529f0bf49ca7bf4b9399bccf
Minor refactor

Author: Lars Kroll <lars.kroll@databricks.com>

GitOrigin-RevId: d9b49a4fa92dea967104a82fdbae69534c14436a
Minor refactor of EvolvabilitySuiteBase

Test-only PR.

Author: Zach Schuermann <zach.schuermann@databricks.com>

GitOrigin-RevId: 73bc357d0634b1607ed77b3a4d709a39fe625b8b
## What changes were proposed in this pull request?

When a snapshot was created as an `InitialSnapshot` for a Delta table and is cached as such (for example, because of a race condition while unmounting and mounting paths), all following reads on that Delta table would return a “This path is not a Delta table” error. This PR adds an `update()` call to the DataFrame read path to prevent this from happening and to return the valid table.

This is done by forcing the computation of `snapshot` when we create a `BaseRelation` for `DeltaTableV2`. In short, this will call `deltaLog.update()` so we ensure that the check whether or not the table exists is accurate. This costs an additional RPC but is deemed necessary for correctness.
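
The failure mode and the fix can be modelled with a toy class. `ToyDeltaLog` and `table_exists` are hypothetical stand-ins, not Delta's classes: a cached version of -1 plays the role of a cached `InitialSnapshot`, and `update()` re-lists storage at the cost of one extra call.

```python
class ToyDeltaLog:
    """Toy log that caches the last version it saw."""

    def __init__(self, list_versions):
        # list_versions: callable returning the committed versions in storage.
        self._list_versions = list_versions
        self.cached_version = -1  # -1 models a cached InitialSnapshot

    def update(self):
        """Re-list storage and refresh the cached version."""
        versions = self._list_versions()
        self.cached_version = max(versions) if versions else -1
        return self.cached_version


def table_exists(log, refresh):
    """Check existence against the cache, or against fresh state if refresh is set."""
    version = log.update() if refresh else log.cached_version
    return version >= 0


storage = []
log = ToyDeltaLog(lambda: storage)
log.update()            # nothing committed yet, so the empty state is cached
storage.extend([0, 1])  # the table becomes valid afterwards

print(table_exists(log, refresh=False))  # stale cache
print(table_exists(log, refresh=True))   # forced update
```

Without the refresh the check is answered from the stale cache; forcing the update makes the existence check reflect what is actually in storage.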

## How was this patch tested?

Added a unit test to simulate reading from a table with cached `InitialSnapshot` and a valid DeltaLog.

Author: Zach Schuermann <zach.schuermann@databricks.com>

#23778 is resolved by schuermannator/sc-78050.

GitOrigin-RevId: 8fd732bbf39788f92ea390f720aa9bb4246e8d12
Minor refactor of DeltaAnalysis code and update comments in DeltaInvariantCheckerExec

existing UT

Author: Linhong Liu <linhong.liu@databricks.com>

GitOrigin-RevId: 4582218e20f7eae532d063cc8613e2d964ee35d9
Strip the full temp view plan for Delta DML commands. This allows us to reenable the test for merging into SQL temp views for MERGE - previously resolution would fail.

new unit test

Author: Jose Torres <joseph.torres@databricks.com>

GitOrigin-RevId: b418f4bd194d6186390261cd8d32c4f2c9ed1048
NullType columns are not very useful because they contain no data. We therefore used to
drop NullType columns when creating a table from DataFrameReader, but we did not
do the same on the SQL read path. This PR unifies the behavior: NullType columns are now
always dropped in all read/table/SQL APIs.
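
Conceptually, the unified behavior is one schema filter applied on every read path. The sketch below uses plain `(name, type)` tuples instead of Spark's schema classes and handles top-level columns only; the function name and the `"null"` type marker are illustrative assumptions.

```python
NULL_TYPE = "null"  # stand-in for Spark's NullType in this sketch


def drop_null_type_columns(schema):
    """Return the schema without top-level columns of the null type."""
    return [(name, dtype) for name, dtype in schema if dtype != NULL_TYPE]


schema = [("id", "long"), ("note", "null"), ("name", "string")]
print(drop_null_type_columns(schema))
```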

Unit tests testing different read APIs.

Author: Junyong Lee <junyong.lee@databricks.com>

GitOrigin-RevId: 9b55e8fb5e51ffbfb86832a811668e9b920c225e
…acySuite

Minor code style change.

Author: Prakhar Jain <prakhar.jain@databricks.com>

GitOrigin-RevId: 217f785ec1dbb111e6ff1aca88e680774ce68e90
## What changes were proposed in this pull request?

This PR adds support for generated columns in the MERGE ... UPDATE case. Previously, if a generated column was not explicitly updated, the old values were copied over, which could break the generated column check constraint and fail the query. With this PR, the values of generated columns are computed correctly using the (potentially updated) referenced columns.

This PR mostly reuses the utility functions for UPDATE to generate the correct update expressions, with some small changes needed to cover the schema evolution case.
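
The difference between copying the old value and recomputing it can be shown on a single row. This is a sketch under stated assumptions, not the expression-rewriting code from the PR: rows are dictionaries, generation expressions are plain functions, and `apply_update` is a hypothetical helper.

```python
def apply_update(row, assignments, generated):
    """Apply UPDATE assignments, then recompute generated columns.

    generated maps a column name to a function of the updated row.
    Columns assigned explicitly are left as given (a real engine would
    still validate them against the generation expression).
    """
    updated = {**row, **assignments}
    for column, expression in generated.items():
        if column not in assignments:
            updated[column] = expression(updated)
    return updated


generated = {"total": lambda r: r["price"] * r["qty"]}
row = {"price": 2, "qty": 3, "total": 6}

# Only qty is updated; total is recomputed from the new qty instead of
# keeping the old value 6, which would no longer equal price * qty.
print(apply_update(row, {"qty": 5}, generated))
```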

## How was this patch tested?

Added new unit test.

Author: Meng Tong <meng.tong@databricks.com>

#23499 is resolved by mengtong-db/generated-column-merge.

GitOrigin-RevId: 6245c07c323255eb4a0db88150e520ef24e02af8
Add new testsuite OptimisticTransactionSuite

UTs

Author: Prakhar Jain <prakharjain09@gmail.com>
Author: Prakhar Jain <prakhar.jain@databricks.com>

GitOrigin-RevId: 6aa1b08ea56220e76b75807c4577d21b4547762c
…efactor DeltaMergeInto to Include Final Schema

Move MergeSchema in SchemaUtils to a new file to report finalSchema in DeltaMergeInto.
Refactor PreprocessTableMerge to report the fully analyzed DeltaMergeInto.

Unit test.

Author: Yijia Cui <yijia.cui@databricks.com>

GitOrigin-RevId: 5fe7e0d2a2e899a384382d8caa7273be8408ee14
…path

A minor refactor to call defaultTablePath only once

Author: Yuchen Huo <yuchen.huo@databricks.com>

GitOrigin-RevId: 1eafe477d0d2ad9d6980398f57dedea31c020d75
This PR refactors the conflict detection code flow into a separate class in order to:
- Improve readability of the current code: the current code has a single `checkForConflict` method that does all the required checks.

Existing UTs

GitOrigin-RevId: 54ad050e0967fa49a61f2677fe1510242d0916d5
The Bintray URL no longer works. Use the `repo.typesafe.com` link instead.

Closes #711

Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>

Author: Shixiong Zhu <zsxwing@gmail.com>

#24449 is resolved by zsxwing/den3b8hd.

GitOrigin-RevId: 1f8fdb3bba694ff53001d13ebca9f84dfae0748e
@JassAbidi JassAbidi merged commit e9fd7b4 into JassAbidi:set_the_right_isolation_level_in_the_CommitInfo Jul 18, 2021