Blob object's data stream incomplete for certain files #220

Closed
terminalmage opened this issue Dec 31, 2014 · 9 comments

Comments

@terminalmage
Contributor

>>> import git
>>> git.__version__
'0.3.2.1'
>>> import os
>>> os.mkdir('foo')
>>> repo = git.Repo.init('foo')
>>> repo.create_remote('origin', 'https://github.com/terminalmage/gitfs-test1.git')
<git.Remote "origin">
>>> repo.git.config('http.sslVerify', 'true')
''
>>> origin = repo.remotes[0]
>>> origin.fetch()
[<git.remote.FetchInfo at 0x7f2f12da6aa0>,
 <git.remote.FetchInfo at 0x7f2f12da6af0>,
 <git.remote.FetchInfo at 0x7f2f12da6b40>]
>>> repo.refs
[<git.RemoteReference "refs/remotes/origin/master">,
 <git.RemoteReference "refs/remotes/origin/slash/test">,
 <git.TagReference "refs/tags/foo_tag">]
>>> tree = repo.refs[0].commit.tree
>>> blob = tree / 'loremipsum.txt'
>>> blob
<git.Blob "93c8ff2243c4d2e9c63850d025a25ed38a51e623">
>>> blob.size
18804
>>> len(blob.data_stream.read())
18804
>>> blob = tree / 'saltstack.png'
>>> blob.size
7377
>>> len(blob.data_stream.read())
7377
>>> blob = tree / 'imagemagick_6.7.7.10-6ubuntu4_amd64.deb'
>>> blob.size
222318
>>> len(blob.data_stream.read())
221818
>>> blob = tree / 'archlinux_1080p.png'
>>> blob.size
778428
>>> len(blob.data_stream.read())
778428

Note that the .deb package's data stream is 500 bytes short: it reads 221818 bytes instead of the expected 222318.

This doesn't seem to be related to the file size, as a larger PNG image works just fine. Both the PNG images and the .deb package are marked as binary files in the .gitattributes file, as well.
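The manual check in the session above can be repeated for every blob in a tree. Here is a minimal sketch that relies only on the `size` and `data_stream` attributes used above, plus the `path` attribute of GitPython blobs. The function name `find_size_mismatches` is made up for illustration.

```python
def find_size_mismatches(blobs):
    """Return (path, declared_size, bytes_read) for each blob whose
    data stream length differs from its declared size."""
    mismatches = []
    for blob in blobs:
        # Read the whole stream and compare it with the size git reports.
        bytes_read = len(blob.data_stream.read())
        if bytes_read != blob.size:
            mismatches.append((blob.path, blob.size, bytes_read))
    return mismatches
```

With the `tree` object from the session, it could be called as `find_size_mismatches(item for item in tree.traverse() if item.type == 'blob')`. An empty list means every stream matched its declared size.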

@Byron
Member

Byron commented Jan 1, 2015

I may have an idea what's causing this, and will try to look into it today. Using the gitfs-test repository, I should be able to reproduce the issue right away.

Will let you know about details once I have them.

@Byron Byron self-assigned this Jan 1, 2015
Byron added a commit to gitpython-developers/gitdb that referenced this issue Jan 1, 2015
the issue described in gitpython-developers/GitPython#220

See test notes for proper usage, it all depends on a useful dataset with high entropy
@Byron Byron removed this from the v0.3.3 milestone Jan 1, 2015
@Byron
Member

Byron commented Jan 1, 2015

The issue seems to be fixed with the release of gitdb 0.6.1.

Even though I put the file in question into the test-suite to assert the fix and added a few more checks to possibly find more issues of this kind, I wasn't able to find another file that showed the issue. My guess is that high-entropy loose object files are causing it.

The fix consists of increasing the decompression buffer for compressed loose objects from 512B to 8196B, which is the same value git was using at the time I implemented it.

Please verify the issue is gone on your end as well. In the meantime I will see if I can find another fix that relies more on an understanding of the matter than on some zlib inner workings.
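For readers unfamiliar with the format: a loose object is the zlib-compressed concatenation of a header (`<type> <size>` followed by a NUL byte) and the payload. The sketch below is not gitdb's implementation. It only illustrates, under that assumption about the format, how a bounded first read can parse the header without dropping the payload bytes that arrive in the same buffer. The function names and the `header_buf` parameter are made up for illustration.

```python
import zlib


def make_loose_object(data, obj_type=b"blob"):
    """Build the compressed form of a loose object: header + NUL + payload."""
    header = obj_type + b" " + str(len(data)).encode("ascii") + b"\0"
    return zlib.compress(header + data)


def read_loose_object(compressed, header_buf=8192):
    """Return (type, size, payload), reading the header with a bounded buffer."""
    decomp = zlib.decompressobj()
    # Produce at most header_buf bytes of output; input that was not
    # consumed yet stays available in decomp.unconsumed_tail.
    head = decomp.decompress(compressed, header_buf)
    header, sep, first_payload = head.partition(b"\0")
    if not sep:
        raise ValueError("header did not fit into the first buffer")
    obj_type, size_text = header.split(b" ", 1)
    size = int(size_text)
    # The payload bytes that shared the first buffer with the header
    # must be kept, then the rest of the stream is decompressed.
    rest = decomp.decompress(decomp.unconsumed_tail) + decomp.flush()
    payload = first_payload + rest
    if len(payload) != size:
        raise ValueError("expected %d bytes, got %d" % (size, len(payload)))
    return obj_type, size, payload
```

For what it is worth, the header of the .deb blob (`blob 222318` plus the NUL) is 12 bytes long, so a 512-byte first read would carry 500 payload bytes, the same number as the shortfall reported at the top of this thread. Nothing in this thread confirms that this is the actual mechanism.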

@terminalmage
Contributor Author

Fantastic, I should be able to test this by the end of the week.

@Byron
Member

Byron commented Jan 1, 2015

Great - if you have access to repositories with plenty of binary data in them, you should be able to test for the error using this test.

The self.gitrepopath symbol is initialized with the value of the GITDB_TEST_GIT_REPO_BASE environment variable, which you can just set to the .git directory of the repository you want to test all files in.

The memory database does exactly what a loose object database does, except that it only writes to and reads from memory. I have just verified that, with the previous patch removed, it does spot the issue with the file you provided.
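The idea of such a round-trip check can be sketched with the standard library alone. The `MemoryObjectStore` class and `check_round_trip` function below are toy stand-ins invented for illustration. They are not gitdb's memory database or its test, and they assume only that an object id is the SHA-1 of the uncompressed header plus payload.

```python
import hashlib
import zlib


class MemoryObjectStore:
    """Toy in-memory stand-in for a loose object database (hypothetical)."""

    def __init__(self):
        self._objects = {}

    def store(self, data, obj_type=b"blob"):
        """Compress the object, keep it in memory, and return its hex id."""
        raw = obj_type + b" " + str(len(data)).encode("ascii") + b"\0" + data
        hexsha = hashlib.sha1(raw).hexdigest()
        self._objects[hexsha] = zlib.compress(raw)
        return hexsha

    def load(self, hexsha):
        """Decompress the object and verify its id and declared size."""
        raw = zlib.decompress(self._objects[hexsha])
        if hashlib.sha1(raw).hexdigest() != hexsha:
            raise ValueError("content does not match object id")
        header, _, data = raw.partition(b"\0")
        declared = int(header.split(b" ", 1)[1])
        if len(data) != declared:
            raise ValueError("size mismatch: %d != %d" % (len(data), declared))
        return data


def check_round_trip(store, payloads):
    """Return the payloads that do not survive a store/load cycle."""
    return [p for p in payloads if store.load(store.store(p)) != p]
```

Feeding it high-entropy input, for example `os.urandom(...)` buffers of various sizes, mirrors the kind of data suspected above to trigger the problem. An empty result list means every payload came back intact.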

In Short

To test your repository in its entirety, set an environment variable to the repository and run a particular test using nose.

 export GITDB_TEST_GIT_REPO_BASE=myrepo/.git
 nosetests gitdb/test/performance/test_pack.py -m test_loose_correctness

@terminalmage
Contributor Author

I was able to confirm the fix. BTW, I noticed that PyPI had not been updated with a tar archive for 0.6.1, so if you were waiting for confirmation of the fix, you should be good to push the source to PyPI now.

@Byron
Member

Byron commented Jan 2, 2015

Thank you!

It's gitdb that was updated, not git-python. Also I am hoping pip will automatically check for updates in dependent packages when users try to update git-python.

@terminalmage
Contributor Author

Yes, I'm aware of this. There is no source link for 0.6.1, and when browsing the index on PyPI there is no tar archive for 0.6.1: https://pypi.python.org/packages/source/g/gitdb/

@Byron
Member

Byron commented Jan 2, 2015

Ouch, that hurts! What was I thinking? "Nothing" would probably be an acceptable answer!

Finally, the source exists. Thanks again for letting me know so persistently :)

@terminalmage
Contributor Author

Haha, no problem. Thanks again for the quick fix.


2 participants