Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Submodule.__init__ parent_commit conversion/validation is implied but not done #1869

Closed
EliahKagan opened this issue Mar 12, 2024 · 8 comments

Comments

@EliahKagan
Copy link
Contributor

EliahKagan commented Mar 12, 2024

The parent_commit parameter of Submodule.__init__ is documented this way:

:param parent_commit:
See :meth:`set_parent_commit`.

Submodule.__init__ binds this directly to the private _parent_commit attribute:

self._parent_commit = parent_commit

But this is at odds with the documented relationship to Submodule.set_parent_commit. That method's commit parameter corresponds to the parent_commit parameter of Submodule.__init__, in that its commit parameter is used to identify the commit, if any, to set as _parent_commit. However, set_parent_commit performs both conversion and validation.

This is the relevant fragment of set_parent_commit's docstring:

:param commit:
Commit-ish reference pointing at the root_tree, or ``None`` to always point
to the most recent commit.

When commit is None, it sets None to _parent_commit. Otherwise, however, commit may not already be a Commit object, and that is okay, because a commit is looked up from it:

pcommit = self.repo.commit(commit)

That's the conversion. Then validation is performed, with _parent_commit ending up set to the commit that commit identified only if there is such a suitable commit:

pctree = pcommit.tree
if self.k_modules_file not in pctree:
raise ValueError("Tree of commit %s did not contain the %s file" % (commit, self.k_modules_file))
# END handle exceptions
prev_pc = self._parent_commit
self._parent_commit = pcommit
if check:
parser = self._config_parser(self.repo, self._parent_commit, read_only=True)
if not parser.has_section(sm_section(self.name)):
self._parent_commit = prev_pc
raise ValueError("Submodule at path %r did not exist in parent commit %s" % (self.path, commit))
# END handle submodule did not exist
# END handle checking mode

The type annotations do not reveal the intent, as they are among those using Commit_ish that need to be updated with the fix for Commit_ish, and that I am fixing up in #1859. My immediate motivation for opening this issue is that I'm having trouble figuring out how to annotate them, because due to the inconsistency between the docstring and the implementations, I don't know what is intended to be accepted.

Either the documentation should be updated, which could be part of #1859, or the code should be fixed to perform any expected validation and conversion and a test case added to check that this is working, which would be best done separately from #1859 (lest its scope expand ever further). I am not sure which. For #1859, it is likely sufficient for me to know what is intended, so full progress on this is not needed to finish #1859. It is my hope and also strong guess that this issue is not a blocker for #1859.

This should not be confused with #1866, which is about the parent_commits parameter of IndexFile.commit rather than the parent_commit parameter of Submodule.__init__ (and which, unlike this, really is about annotations).

@Byron
Copy link
Member

Byron commented Mar 13, 2024

Thanks a lot for documenting the issue.

As this implementation was intended to be 'a more easily usable' version of git submodules (back then they were much rougher than they are now), I'd think it's better to let the code match the documented behaviour, assuming that wouldn't unnecessarily narrow its applicability but instead makes it safer. However, it's hard for me to be more assertive here.

@EliahKagan
Copy link
Contributor Author

EliahKagan commented Mar 13, 2024

I worry about the possibility that binding something other than a GitPython Commit (or None) object to the private _parent_commit attribute might actually be relied on. Because much interaction with repositories is through the git executable where GitPython marshals Python objects into command-line arguments by using their str, often more types than intended are possible to use successfully.

To be clear, I'm not suggesting GitPython forever continue to support all possible wrong uses that happen somehow to be working, only that maybe this hasn't been found before because the asymmetry is doing something valuable that should be retained. I'll look into this a bit further by seeing how _parent_commit is used, and how the methods that access it are themselves used.

I'll report back, at least to say whether I was able to find enough information to proceed with the related part of #1859.

@EliahKagan
Copy link
Contributor Author

I've looked into this a bit further. First, I should mention that some of the validation I showed above is not done by default in Submodule.set_parent_commit:

if self.k_modules_file not in pctree:
raise ValueError("Tree of commit %s did not contain the %s file" % (commit, self.k_modules_file))

But then also, related to that, there is this near the end, which I had not shown:

# Update our sha, it could have changed.
# If check is False, we might see a parent-commit that doesn't even contain the
# submodule anymore. in that case, mark our sha as being NULL.
try:
self.binsha = pctree[str(self.path)].binsha
except KeyError:
self.binsha = self.NULL_BIN_SHA

The potentially expensive validation is not done by default--check defaults to False--so a plain reading of the Submodule.__init__ docstring saying to see set_parent_commit for the parent_commit parameter would be that this extra validation should not be done in __init__ either and that, moreover, initializing a Submodule object should set its parent commit the same way as calling set_parent_commit.

But it seems to me that it would not be intuitive that constructing a Submodule object with an explicit binsha and explicit parent_commit could result in no error, but the constructed Submodule object immediately having an all-zeros binsha instead of the binsha passed. For this reason, I am reluctant to forge ahead in having Submodule.__init__ call Submodule.set_parent_commit (or equivalent) without further discussion.

Besides that, however, it does seem like the private _parent_commit attribute that these methods set is always expected, when not None, to be a Commit object and not, for example, a string. In particular, Submodule methods like config_reader need this. The config_reader method calls _config_parser_constrained, which calls _config_parser, which for unrecognized values (even when correct) ends up calling _sio_modules, which is incompatible with a str because it accesses the tree attribute of the parent commit object:

sio = BytesIO(parent_commit.tree[cls.k_modules_file].data_stream.read())

I get AttributeError if I set a string initially when constructing a Submodule object and then call config_reader.

EliahKagan added a commit to EliahKagan/GitPython that referenced this issue Mar 14, 2024
These used the old Commit_ish. They may not all be the same as each
other, since they perform different validation and some will look
up a commit while others do not. In particular, this is complicated
by how Submodile.__init__ documents its parent_commit parameter: it
says to see the set_parent_commit method. But __init__ sets the
_parent_commit attribute to parent_commit, while set_parent_commit
treats its commit parameter differently, calling self.repo.commit
on it to get get a Commit object from it even if it wasn't one to
begin with. See gitpython-developers#1869 for full details.

There may be some other subtleties as well. A number of uses of the
of the old Commit_ish type appear in git.submodule.base, all
related to parent_commit. These are all remaining references to it
(identifiable due to the rename to Old_commit_ish done in 04a2753),
though. So once the changes begun here for git.submodule.base are
done, that will also have completed the broader replacement effort
begun in 7328a00. At that point cleanup can be done in git.types
(removing Old_commit_ish introduced in 04a2753, replacing
Lit_old_commit_ish with a possibly-updated and deprecated
Lit_commit_ish to avoid a breaking change, and possibly making
further refinements to additions to the newly added docstrings).
EliahKagan added a commit to EliahKagan/GitPython that referenced this issue Mar 14, 2024
These used the old Commit_ish. They may not all be the same as each
other, since they perform different validation and some will look
up a commit while others do not. In particular, this is complicated
by how Submodile.__init__ documents its parent_commit parameter: it
says to see the set_parent_commit method. But __init__ sets the
_parent_commit attribute to parent_commit, while set_parent_commit
treats its commit parameter differently, calling self.repo.commit
on it to get get a Commit object from it even if it wasn't one to
begin with. See gitpython-developers#1869 for full details.

There may be some other subtleties as well. A number of uses of the
of the old Commit_ish type appear in git.submodule.base, all
related to parent_commit. These are all remaining references to it
(identifiable due to the rename to Old_commit_ish done in 04a2753),
though. So once the changes begun here for git.submodule.* are
done, that will also have completed the broader replacement effort
begun in 7328a00. At that point cleanup can be done in git.types
(removing Old_commit_ish introduced in 04a2753, replacing
Lit_old_commit_ish with a possibly-updated and deprecated
Lit_commit_ish to avoid a breaking change, and possibly making
further refinements to additions to the newly added docstrings).
@Byron
Copy link
Member

Byron commented Mar 14, 2024

Thanks for the update!

By the looks of it, fixing the code is a bit tougher than anticipated and it pains me to see you spend time on the submodule implementation, even though I know how it came to be naturally and that it is necessary. After all, the submodule implementation is likely broken in many ways, so it is my hope that any fixes only go as deep as they have to be in order to help with what initially triggered the investigation into submodules in the first place.

With that said, maybe adjusting the documentation would be easier at this point, and not necessarily (much) worse?

@EliahKagan
Copy link
Contributor Author

I think for now--not as a fix for this bug, but as a way forward that somewhat mitigates it at least from a documentation perspective and also allows #1859 to be finished--that it may be sufficient just to fix the type annotations based on what actually works. This will make clearer how they can be used, and updates to the docstrings or changes to their behavior could come later. I've done this in 1f03e7f.

@Byron Byron closed this as completed in 1f03e7f Mar 14, 2024
EliahKagan added a commit to EliahKagan/GitPython that referenced this issue Mar 14, 2024
This includes a brief description of the Submodule.__init__
parent_commit parameter in its docstring, rather than only
referring to the set_parent_commit method, whose semantics differ
due to conversation and validation, and which accepts more types
than just Commit or None.

The wording is based on wording in set_parent_commit, adjusted for
the difference in types, and set_parent_commit remains reference
for further details.

This builds on 1f03e7f (gitpython-developers#1859) in improving the situation described
in gitpython-developers#1869.
@EliahKagan
Copy link
Contributor Author

It looks like this issue may have been inadvertently closed automatically as a result of the wording in 1f03e7f:

Even once that is done, this will not have fixed #1869

The phrase "will not have fixed #1869" contains "fixed #1869", which GitHub took to mean that the commit fixed #1869:

Screenshot of the commit viewed on GitHub in a web browser, with the mouse cursor over the word "fixed", showing the GitHub tooltip saying "This commit closes issue #1869"

I've done a bit more in #1877, which I think is the minimal further change necessary to consider this fixed (though it might be appropriate to leave it open even after that, since greater documentation changes and/or behavioral changes may be warranted).

@Byron
Copy link
Member

Byron commented Mar 15, 2024

Thanks for catching this! I merged #1877 and would be happy with keeping this issue closed if you agree. Otherwise it's definitely OK to keep it open as well.

@EliahKagan
Copy link
Contributor Author

I think it may be reasonable to consider #1877 (together with previous changes) to have fixed this, I'm not sure. I'm not advocating that it be reopened at this time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

2 participants