Impossible to clone to path with unicode #920

Closed
mikicz opened this issue Sep 12, 2019 · 1 comment

Comments

mikicz (Contributor) commented Sep 12, 2019

Hi, while integrating GitPython I ran into an issue with cloning into directories that have a Unicode name. This was an issue with version 2.1.9 and has not been fixed by upgrading to 3.0.2.

I am using Python 3.7.

Basically, if you pass a str containing non-ASCII characters to e.g. Repo.clone_from, then the package throws a UnicodeEncodeError, apparently while processing the output of the command.

    return Repo.clone_from(repo, repo_path, branch=branch, **kwargs)
venv/lib64/python3.7/site-packages/git/repo/base.py:1023: in clone_from
    return cls._clone(git, url, to_path, GitCmdObjectDB, progress, multi_options, **kwargs)
venv/lib64/python3.7/site-packages/git/repo/base.py:969: in _clone
    finalize_process(proc, stderr=stderr)
venv/lib64/python3.7/site-packages/git/util.py:333: in finalize_process
    proc.wait(**kwargs)
venv/lib64/python3.7/site-packages/git/cmd.py:399: in wait
    stderr = force_bytes(stderr)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

data = "Cloning into '/tmp/arca/test/abčď/repos/_tmp_tmp1f9n_j71299f859f6fac750147aec65a9992f8e289ef42177ff1b234677764b2d5c61560/master'...\n", encoding = 'ascii'

    def force_bytes(data, encoding="ascii"):
        if isinstance(data, bytes):
            return data
    
        if isinstance(data, string_types):
>           return data.encode(encoding)
E           UnicodeEncodeError: 'ascii' codec can't encode characters in position 31-32: ordinal not in range(128)

Since GitPython is Python 3+ only now, it would make sense to set the default encoding in force_bytes to utf-8; that actually fixes the issue when I try it. This was proposed in gitpython-developers/gitdb#48 and/or gitpython-developers/gitdb#49, but it's been a while since those were proposed.
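A minimal sketch of the proposed change, for illustration only. It assumes a stand-in `force_bytes` modelled on the snippet in the traceback (with `str` in place of `string_types`, and a pass-through `return data` for other types, which is an assumption). The message is a shortened, hypothetical version of the one in the traceback.

```python
def force_bytes(data, encoding="utf-8"):
    """Stand-in for the helper shown in the traceback, with UTF-8 as the default."""
    if isinstance(data, bytes):
        return data
    if isinstance(data, str):
        return data.encode(encoding)
    return data  # assumption: other types are passed through unchanged


# Shortened, hypothetical version of the message from the traceback.
message = "Cloning into '/tmp/arca/test/abčď/repos/master'...\n"

# Passing the old default explicitly reproduces the failure.
try:
    force_bytes(message, encoding="ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# The proposed default encodes the same message without error.
encoded = force_bytes(message)
print(ascii_failed, encoded.decode("utf-8") == message)
```

With UTF-8 as the default, ASCII-only messages encode to the same bytes as before, so the change should only affect input that previously raised.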

Another solution might be for the user to be able to select the default encoding somehow, so as not to break previous cases while still providing a solution for this issue.

This is related to #761, which seems to be stale at the moment. I'm raising the issue again since it's still a problem in the new version of GitPython, which is Python 3+ only.

Byron (Member) commented Sep 13, 2019

Thanks for raising the issue. It's known that (unfortunately) the encoding of strings is very messy in GitPython. No proper solution was ever implemented, yet I hope that, thanks to thoughtful contributions, we can eventually get there.

ulturt added a commit to ulturt/GitPython that referenced this issue Oct 26, 2019
Byron closed this as completed in ebf4656 Oct 28, 2019