This repository has been archived by the owner on Dec 8, 2021. It is now read-only.

local/importer backend return error if source data contains duplicated row #467

Open
glorv opened this issue Nov 11, 2020 · 1 comment
Labels
feature-request This issue is a feature request

Comments

@glorv
Contributor

glorv commented Nov 11, 2020

Feature Request

Is your feature request related to a problem? Please describe:

In some cases, the source files may contain rows whose primary/unique keys conflict with each other. It would be helpful if the Lightning local/importer backend could raise an error early, while reading the source files, instead of failing only later when a checksum mismatch is found.

Describe the feature you'd like:

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Optimization:

@glorv glorv added the feature-request This issue is a feature request label Nov 11, 2020
@kennytm
Collaborator

kennytm commented Nov 11, 2020

We likely won't do anything for Importer backend.

In the Local backend, when on-duplicate is "error", we can perform a Get() on the (indexed) Batch before calling Set(). If the result of Get() is empty or exactly the same as the new value, we accept it. Otherwise, the rows are definite duplicates and should be flagged.

Known drawbacks:

  1. Cross-engine (including disk-quota-flush) duplication cannot be detected, i.e. there are still false negatives that can only be verified at the checksum phase.
  2. This will probably incur a >2× performance hit.

(Due to the false negative, on-duplicate = "ignore" cannot be reliably supported and thus should be rejected.)
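To illustrate drawback 1, the hypothetical sketch below (a map stands in for each engine's batch; `engine` and `put` are illustrative names, not Lightning's API) writes two conflicting rows into different engines. Each engine's check sees only its own keys, so neither write is flagged.

```go
package main

import "fmt"

// engine is a hypothetical stand-in for one Local-backend engine; its
// duplicate check can only see keys written to its own batch.
type engine struct {
	kvs map[string]string
}

func newEngine() *engine { return &engine{kvs: make(map[string]string)} }

// put applies the per-engine Get-before-Set check and reports whether a
// conflicting duplicate was detected inside this engine.
func (e *engine) put(key, value string) bool {
	if old, ok := e.kvs[key]; ok && old != value {
		return true
	}
	e.kvs[key] = value
	return false
}

func main() {
	// Two rows with the same key but different contents land in different
	// engines, so neither engine sees the other's write.
	e1, e2 := newEngine(), newEngine()
	fmt.Println(e1.put("t1_r1", "a")) // false
	fmt.Println(e2.put("t1_r1", "b")) // false: conflict missed; only the checksum can catch it
}
```

Since both rows survive, an "ignore" mode built on this check could not guarantee that only one copy of the key is kept.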
