
DiffLine.content is represented as unicode #610

Closed
@mrh1997


The field DiffLine.content contains a unicode string. Since git knows nothing about the encoding of the files being diffed (to git they are just blobs), I would expect this attribute to be of type str in Python 2 and bytes in Python 3.

Even worse, if a file is, e.g., Latin-1 encoded and contains Latin-1-specific characters, all of these characters are mapped to U+FFFD, the Unicode replacement character ('\ufffd'). This makes it impossible to diff non-ASCII text files correctly.
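A minimal pure-Python illustration of why this is lossy, assuming the binding decodes the raw line as UTF-8 with replacement of invalid bytes (consistent with the '\ufffd' symptom above, but not taken from pygit2's source). The helper decode_like_binding is hypothetical and only models that assumed behaviour.

```python
def decode_like_binding(raw: bytes) -> str:
    """Hypothetical model: decode as UTF-8, replacing invalid bytes with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


# Two different Latin-1 lines: "café" (é = 0xE9) and "cafü" (ü = 0xFC).
line_a = "café".encode("latin-1")
line_b = "cafü".encode("latin-1")

# The raw bytes differ...
assert line_a != line_b

# ...but after lossy decoding both collapse to "caf\ufffd", so a diff can no
# longer tell them apart and the original bytes cannot be recovered.
assert decode_like_binding(line_a) == decode_like_binding(line_b) == "caf\ufffd"

# With bytes, the caller can apply the encoding it knows to be correct.
assert line_a.decode("latin-1") == "café"
assert line_b.decode("latin-1") == "cafü"
```

Two distinct inputs mapping to the same output is exactly what makes the conversion irreversible, which is why exposing bytes and leaving decoding to the caller avoids the problem.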

I suspect this is a pygit2 bug, because the libgit2 C interface handles it correctly: it exposes this field as a plain const char * (see https://github.com/libgit2/libgit2/blob/HEAD/include/git2/diff.h#L555).
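The git_diff_line struct in libgit2 pairs the content pointer with a content_len field. Below is a minimal sketch, using only the standard library's ctypes rather than pygit2's actual C extension, of how a binding could pass that (pointer, length) pair through as bytes without decoding. The helper content_to_bytes and the simulated buffer are hypothetical.

```python
import ctypes


def content_to_bytes(address: int, content_len: int) -> bytes:
    """Copy exactly content_len bytes starting at address, without decoding."""
    return ctypes.string_at(address, content_len)


# Simulate a libgit2-style buffer: the Latin-1 line b"caf\xe9\n" sits inside a
# larger buffer, so it is delimited by its length rather than by a NUL byte.
buf = ctypes.create_string_buffer(b"caf\xe9\nnext line")
line = content_to_bytes(ctypes.addressof(buf), 5)

# The non-UTF-8 byte 0xE9 survives untouched.
assert line == b"caf\xe9\n"
assert isinstance(line, bytes)
```

Copying by length keeps every byte intact, including ones that are not valid UTF-8, so no information is lost before the caller decides how to interpret the text.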
