Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Committing an object deletion to master erases object from all commits to master #997

Closed
tsykes2020 opened this issue Dec 4, 2020 · 6 comments · Fixed by #1000
Closed
Assignees
Labels
contributor pr/high-priority Pull requests that should be reviewed first

Comments

@tsykes2020
Copy link

version: 0.19.0

Steps to reproduce

  1. Create new repository with default branch master and a local:// storage namespace
  2. Upload any file (eg file.jpg) to master and commit the change.
  3. Delete the file and commit the change.
  4. View the commit created in step 2

Expected result

The listing should include file.jpg

Actual result

The listing is empty

The above happens in both the web UI and the CLI.

It seems that the issue does not occur when the object is deleted from branches that it wasn't created under,

@arielshaqed
Copy link
Contributor

Thanks!

Results from immediate triage:

It looks bad in the UI, but the deleted object is still there. Our tests however are not specific enough to detect this and they should.

I recreated using the following commands (local block adapter on the backend, not that should matter):

echo foo | AWS_PROFILE=local aws --endpoint-url http://s3.local.lakefs.io:8000 s3 cp - s3://foo/master/abc
./lakectl commit lakefs://foo@master -m 'commit foo'
AWS_PROFILE=local aws --endpoint-url http://s3.local.lakefs.io:8000 s3 rm s3://foo/master/abc
./lakectl commit lakefs://foo@master -m 'delete foo'
./lakectl log lakefs://foo@master

Now browsing the objects in the GUI for the previous commit "commit foo" does not show abc, which is a bug. However if I attempt to download a file and then edit its download url to identify abc I can successfully download it. So I modified http://localhost:8000/api/v1/repositories/foo/refs/~2Pkc9jvtkJjuG/objects?path=def (path to def, which does exist on that commit) to http://localhost:8000/api/v1/repositories/foo/refs/~2Pkc9jvtkJjuG/objects?path=abc and got my file. Trying the same trick with http://localhost:8000/api/v1/repositories/foo/refs/~2Pkc9jvtkJjuH/objects?path=def ("delete foo") gives nothing, as expected.

It's a (serious) UI bug, but no data loss.

@arielshaqed arielshaqed added the pr/high-priority Pull requests that should be reviewed first label Dec 4, 2020
@arielshaqed
Copy link
Contributor

@tzahij who I think is our expert on this area -- WDYT?

@arielshaqed
Copy link
Contributor

lakectl fs ls also doesn't show the missing file...

ariels@ariels:~/Dev/lakeFS$ ./lakectl  log lakefs://foo@master --amount 2


ID: ~2Pkc9jvtkJjuH
Author: foo
Date: 2020-12-04 18:23:02 +0200 IST
	
	delete foo
	
	
ID: ~2Pkc9jvtkJjuG
Author: foo
Date: 2020-12-04 18:17:10 +0200 IST
	
	commit foo

ariels@ariels:~/Dev/lakeFS$ ./lakectl  fs ls lakefs://foo@~2Pkc9jvtkJjuG
common_prefix    1970-01-01 02:00:00 +0200 IST    0 B             a/
common_prefix    1970-01-01 02:00:00 +0200 IST    0 B             b/
object    2020-11-25 17:20:33 +0200 IST    12 B            def

ariels@ariels:~/Dev/lakeFS$ ./lakectl  fs ls lakefs://foo@~2Pkc9jvtkJjuH
common_prefix    1970-01-01 02:00:00 +0200 IST    0 B             a/
common_prefix    1970-01-01 02:00:00 +0200 IST    0 B             b/
object    2020-11-25 17:20:33 +0200 IST    12 B            def

@tzahij
Copy link
Contributor

tzahij commented Dec 4, 2020

@tzahij who I think is our expert on this area -- WDYT?

Looks bad, I am on it

@tzahij
Copy link
Contributor

tzahij commented Dec 5, 2020

Thanks to the accurate bug description by tsykes-2020, I was able to find the and fix the bug.
It does not appear in the single-entry retrieval path (GetEntry) that is used by the gateways in response to Get-object. It does appear in both in regular list entries, and list entries by prefix. Those are two distinct code paths, and i am happy to say both are fixed, and have additional tests.
I will pass the PR , I suggest we merge it on Sunday.

tzahij added a commit that referenced this issue Dec 6, 2020
…s to master - #997 (#1000)

* SSTable interface for committed data

* Remove unnecessary funcs from iterator

* added test for merge with zero changes

* initial commit of tree

* Change iterator to match the existing scanners interfaces

* initial commit of tree

* initial commit of tree

* initial commit for rocks catalog interface

* sstable initial implementation without repo seperation

* initial catalog interface

* initial catalog interface

* initial catalog interface

* Update catalog/rocks/catalog.go

Co-authored-by: itaiad200 <itaiad200@gmail.com>

* iterator interface

* diff type comment

* continue talking about StagingManager

* initial commit of tree

* just save changes - initial

* working on apply

* working on apply

* working on apply

* sstable interface

* changes

* fix to entry disappearing from a commit listing when it was deleted in a later commit

* Update catalog/mvcc/cataloger_delete_entry_test.go

Co-authored-by: arielshaqed <ariels@treeverse.io>

* compare with get entry with catalog entry not found

* changes after review

Co-authored-by: Itai Admi <itaiad200@gmail.com>
Co-authored-by: guyhardonag <guy.hardonag@treeverse.io>
Co-authored-by: guy-har <60321938+guy-har@users.noreply.github.com>
Co-authored-by: Barak Amar <barak.amar@treeverse.io>
Co-authored-by: arielshaqed <ariels@treeverse.io>
@nopcoder
Copy link
Contributor

nopcoder commented Dec 7, 2020

Hi @tsykes-2020 release v0.20.0 should include this fix - let us know if it is working for you or if you have any other issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contributor pr/high-priority Pull requests that should be reviewed first
Projects
None yet
4 participants