-
-
Notifications
You must be signed in to change notification settings - Fork 932
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stacktrace; pump stream
fails when commit diff is on a file with a colon in the name
#1210
Comments
git pump
fails when diff is on a binary filepump stream
fails when diff is on a binary file
pump stream
fails when diff is on a binary filepump stream
fails when commit diff is on a binary file
pump stream
fails when commit diff is on a binary filepump stream
fails when commit diff is on a binary file
Thanks for the report. Would you be able to run Line 509 in ea43def
…you should be able to make out the line that would break such an assumption. It would be interesting to see if the issue could have a workaround by avoiding to force parsing all output, and instead stopping at the first encountered modification, maybe like so: if next(prev_commit.diff(head_commit).iter_change_type("M")) is not None: … |
@Byron I can add some logging in between each of the three lines in:
to help narrow this down. We want a list of all modified files in our code, so we don't want to stop at the first one, but I'll log what happens with the code you suggested too. However this runs as part of an automated job in kubernetes that does a git clone and runs some checks on a git repo, so it's not easy to add a terminal intervention |
Thanks. I would hope there is a way for you to clone this repo yourself and execute the suggested command-line by hand, it should be reproducible locally. Logging between the lines might be useful if it reveals exactly what GitPython tries to parse - something I wouldn't know how to do in that case without patching GitPython itself. Ultimately, GitPython encounters a format it doesn't know, and knowing the input should allow to fix it. As you are indeed interested in the entire list, there shouldn't be a need to try the alternative one for a workaround. |
@Byron ok here is a completely reproducible codesample; except I can't give you the git repo that contains the files. Im going to try running your git diff command next To trigger a stack trace, I go into
this will blow in the stack trace:
|
@Byron in terms of your other request; ive only edited a sensitive foldername to
PRODUCES
I actually bet this has to do with the fact that these filenames contain spaces ( |
Thanks a lot. It turns out that colons With a failing test added, it should be possible to implement a fix by improving the parser. Line 497 in 651a81d
To my mind, a more sane way is to rely on the presence of NULL bytes and split across them instead. There are subtleties with renames showing multiple paths, each of which separated by NULL bytes. On top of that, the parser implementation is actually a streaming parser which will break if there are newlines in paths or if the line read is truncated (maybe due to a file path being very, very long. Line 80 in 651a81d
|
pump stream
fails when commit diff is on a binary filepump stream
fails when commit diff is on a file with a colon in the name
thanks for your investigation! btw, if the thought of colons and spaces in filenames makes you cringe.. us too - but we have to sync the customer files exactly as we receive them, so we aren't in a position to simply say "don't do that". |
You are welcome! In that moment it would be beneficial to also avoid streaming entirely and just store gits output in a string at once, parsing it afterwards, eliminating another error source in the process. |
Thanks @tommyjcarpenter - I found a similar problem in another project. I hope it's ok to re-use this issue for more "test" cases. A similar problem happens when working with the Airavata open source project.
If I run the command manually, I get the following
If I run the command without
Any idea how to fix this? |
The filenames of screenshot contains colon and causes GitPython's bug[1], so I replace colon with underline. [1] gitpython-developers/GitPython#1210
Hi
We use this code to check whether a git repo contains only "new" files or whether a file was modified:
When running this on repos with only csv data, it's fine.
However when running this on repos with binary files in it, like
.xlxs
, we blow in a stack trace like this; but because it's threaded I am not sure which exact line above is internally causing this "pump"Near that line of code I see https://github.com/gitpython-developers/GitPython/blob/main/git/diff.py#L498
I do not know how to get to this output on my git terminal. I was going to try to make a test case here seeing if this looks different for binary files (basically make a PDF or an excel file, then edit it, then commit it; but reading the log doesnt show such a format)
We do not pin the version of
gitpython
and we rebuild this docker container fairly often so this should be running with the latest version in pypi (488 on your master branch does not coorespond to that line of code; however you've had commits to master since the pypi release)The text was updated successfully, but these errors were encountered: