Commit e0517c1

[PATCH] Fix parsing submodule URL (passes tests) (#7)
* Fix parsing submodule URL.
* Fallback to dummy values when Subproject is not available.
* update tests - wip
* fixes
* lint test files too
* add string utils
* subproject objects can have a None project attribute
* submodule_to_project can return None
* more robust url parsing
* fix lint
* fix tests
* fix tests (2)
* add `self_managed_gitlab_host` optional arg, clean code a bit
* run tests for python 3.7 too
* update README.md
* update user link
* add README.md in test

Co-authored-by: Darkdragon-001 <darkdragon-001@web.de>
1 parent 9af5b3a commit e0517c1

14 files changed, +426 -149 lines changed

Makefile

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@ PROJECT = gitlab_submodule
 
 lint:
 	flake8 $(PROJECT) --count --show-source --statistics
+	flake8 tests --count --show-source --statistics
 
 test:
 	PYTHON_VERSION=$$(python3 --version) && \

README.md

Lines changed: 40 additions & 14 deletions
@@ -5,22 +5,38 @@ List project submodules and get the commits they point to with python-gitlab.
 
 The [Gitlab REST API V4](https://docs.gitlab.com/ee/api/api_resources.html)
 doesn't implement anything for submodule inspection yet. The only thing we can
-currently do is [updating a submodule to a new commit id](https://docs.gitlab.com/ee/api/repository_submodules.html),
+currently do is [updating a submodule to a new commit id](
+https://docs.gitlab.com/ee/api/repository_submodules.html),
 but how can we trigger such an update if we don't know the current commit id
 of our submodule?
 
 If you're using `python-gitlab` and you're distributing shared code among
 your projects with submodules, you've probably run into this issue already.
 
 This package provides minimal utils to list the submodules present in a
-Gitlab project, and more importantly to get the commits they're pointing to
-(when the submodules are Gitlab projects themselves, otherwise we cannot
-access the project via their URLs using `python-gitlab` only).
+Gitlab project, and more importantly to get the commits they're pointing to.
 
 Internally, it reads and parses the `.gitmodules` file at the root of the
 Project. To get the commit id of a submodule, it finds the last commit that
 updated the submodule and parses its diff.
 
+---
+**About the future of this package**
+
+I don't plan to make PRs to `python-gitlab` for now.
+
+In my opinion this problem should ideally be fixed in the Gitlab REST API,
+and then `python-gitlab` could wrap around the new endpoints.
+
+So I see this package as a temporary solution until the API gets extended
+with more submodule functionalities.
+
+[@darkdragon-001](https://github.com/darkdragon-001) created an issue on
+GitLab about the lack of support for submodules, feel free to support it with
+a thumb up: https://gitlab.com/gitlab-org/gitlab/-/issues/352836
+
+---
+
 ## Requirements
 - Python >= __3.7__ (required by `python-gitlab` since version `3.0.0`)
 
@@ -100,14 +116,13 @@ Output:
 ### `iterate_subprojects(...)`
 What you'll probably use most of the time.<br/>
 - Yields [`Subproject`](#class-subproject) objects that describe the submodules.
-- Ignores submodules that are not hosted on Gitlab. If you want to list all
-modules present in the `.gitmodules` file but without mapping them to
-`gitlab.v4.objects.Project` objects, use [`list_submodules(...)`](#list_submodules) instead.
 ```python
 iterate_subprojects(
     project: Project,
     gl: Union[Gitlab, ProjectManager],
     ref: Optional[str] = None,
+    only_gitlab_subprojects: bool = False,
+    self_managed_gitlab_host: Optional[str] = None,
     get_latest_commit_possible_if_not_found: bool = False,
     get_latest_commit_possible_ref: Optional[str] = None
 ) -> Generator[Subproject, None, None]
@@ -118,13 +133,20 @@ Parameters:
 `projects: gitlab.v4.objects.ProjectManager` attribute
 - `ref`: (optional) a ref to a branch, commit, tag etc. Defaults to the
 HEAD of the project default branch.
-- `get_latest_commit_possible_if_not_found`: in some rare cases, there
-won't be any `Subproject commit ...` info in the diff of the last commit
-that updated the submodules. Set this option to `True` if you want to get
-instead the most recent commit in the subproject that is anterior to the
+- `only_gitlab_subprojects`: (optional) if set to `True`, will ignore the
+submodules not hosted on GitLab. If set to `False` (default), it will yield
+[`Subproject`](#class-subproject) objects with `self.project = None`
+for submodules not hosted on GitLab.
+- `self_managed_gitlab_host`: (optional) if some submodules are hosted on a
+self-managed GitLab instance, you should pass its url here otherwise it
+may be impossible to know from the URL that it's a GitLab project.
+- `get_latest_commit_possible_if_not_found`: (optional) in some rare cases,
+there won't be any `Subproject commit ...` info in the diff of the last
+commit that updated the submodules. Set this option to `True` if you want to
+get instead the most recent commit in the subproject that is anterior to the
 commit that updated the submodules of the project. If your goal is to
 check that your submodules are up-to-date, you might want to use this.
-- `get_latest_commit_possible_ref`: in case you set
+- `get_latest_commit_possible_ref`: (optional) in case you set
 `get_latest_commit_possible_if_not_found` to `True`, you can specify a ref for the
 subproject (for instance your submodule could point to a different branch
 than the main one). By default, the main branch of the subproject will be
@@ -140,10 +162,13 @@ returns a `list` of [`Subproject`](#class-subproject) objects.
 Basic objects that contain the info about a Gitlab subproject.
 
 Attributes:
-- `project: gitlab.v4.objects.Project`: the Gitlab project that the submodule links to
+- `project: Optional[gitlab.v4.objects.Project]`: the Gitlab project that the
+submodule links to (can be `None` if the submodule is not hosted on GitLab)
 - `submodule: `[`Submodule`](#class-submodule): a basic object that contains
 the info found in the `.gitmodules` file (name, path, url).
-- `commit: gitlab.v4.objects.ProjectCommit`: the commit that the submodule points to
+- `commit: Union[gitlab.v4.objects.ProjectCommit, Commit]`: the commit that
+the submodule points to (if the submodule is not hosted on GitLab, it will
+be a dummy `Commit` object with a single attribute `id`)
 - `commit_is_exact: bool`: `True` most of the time, `False` only if the commit
 had to be guessed via the `get_latest_commit_possible_if_not_found` option
 
@@ -195,6 +220,7 @@ hosted on Gitlab.
 submodule_to_subproject(
     gitmodules_submodule: Submodule,
     gl: Union[Gitlab, ProjectManager],
+    self_managed_gitlab_host: Optional[str] = None,
     get_latest_commit_possible_if_not_found: bool = False,
     get_latest_commit_possible_ref: Optional[str] = None
 ) -> Subproject
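
The README text in this diff says the package reads and parses the `.gitmodules` file, but the diff does not show how that is done. The following is only a minimal sketch of one possible approach, assuming the standard library `configparser` is acceptable for this file format; `parse_gitmodules` and the sample content are hypothetical and are not taken from the package.

```python
import configparser
import re
from typing import Dict, List

# Hypothetical sample; a real .gitmodules sits at the root of the repository.
SAMPLE_GITMODULES = (
    '[submodule "libs/shared"]\n'
    '\tpath = libs/shared\n'
    '\turl = ../shared.git\n'
    '[submodule "vendor/tool"]\n'
    '\tpath = vendor/tool\n'
    '\turl = https://gitlab.com/group/tool.git\n'
)


def parse_gitmodules(content: str) -> List[Dict[str, str]]:
    """Return one dict (name, path, url) per [submodule "..."] section."""
    # interpolation=None so that a '%' in a URL is kept verbatim
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(content)
    submodules = []
    for section in parser.sections():
        match = re.fullmatch(r'submodule "(.+)"', section)
        if match is None:
            continue  # skip anything that is not a submodule section
        submodules.append({
            'name': match.group(1),
            'path': parser.get(section, 'path'),
            'url': parser.get(section, 'url'),
        })
    return submodules


submodules = parse_gitmodules(SAMPLE_GITMODULES)
```

The sample deliberately mixes a relative URL and an absolute GitLab URL, the two cases the URL-parsing changes in this commit distinguish.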

gitlab_submodule/__init__.py

Lines changed: 1 addition & 2 deletions
@@ -7,8 +7,7 @@
 
 __all__ = [
     'Submodule', 'Subproject',
-    'list_submodules',
-    'iterate_submodules',
+    'list_submodules', 'iterate_submodules',
     'submodule_to_subproject',
     'iterate_subprojects', 'list_subprojects'
 ]

gitlab_submodule/gitlab_submodule.py

Lines changed: 19 additions & 13 deletions
@@ -23,16 +23,21 @@ def _get_project_manager(
 def submodule_to_subproject(
         gitmodules_submodule: Submodule,
         gl: Union[Gitlab, ProjectManager],
+        self_managed_gitlab_host: Optional[str] = None,
         get_latest_commit_possible_if_not_found: bool = False,
         get_latest_commit_possible_ref: Optional[str] = None
 ) -> Subproject:
-    submodule_project = submodule_to_project(gitmodules_submodule,
-                                             _get_project_manager(gl))
+    submodule_project = submodule_to_project(
+        gitmodules_submodule,
+        _get_project_manager(gl),
+        self_managed_gitlab_host
+    )
     submodule_commit, commit_is_exact = get_submodule_commit(
         gitmodules_submodule,
         submodule_project,
         get_latest_commit_possible_if_not_found,
-        get_latest_commit_possible_ref)
+        get_latest_commit_possible_ref
+    )
     return Subproject(
         gitmodules_submodule,
         submodule_project,
@@ -45,20 +50,21 @@ def iterate_subprojects(
         project: Project,
         gl: Union[Gitlab, ProjectManager],
         ref: Optional[str] = None,
+        only_gitlab_subprojects: bool = False,
+        self_managed_gitlab_host: Optional[str] = None,
         get_latest_commit_possible_if_not_found: bool = False,
         get_latest_commit_possible_ref: Optional[str] = None
 ) -> Generator[Subproject, None, None]:
     for gitmodules_submodule in iterate_submodules(project, ref):
-        try:
-            yield submodule_to_subproject(
-                gitmodules_submodule,
-                _get_project_manager(gl),
-                get_latest_commit_possible_if_not_found,
-                get_latest_commit_possible_ref)
-        except ValueError:
-            continue
-        except Exception:
-            raise
+        subproject: Subproject = submodule_to_subproject(
+            gitmodules_submodule,
+            _get_project_manager(gl),
+            self_managed_gitlab_host,
+            get_latest_commit_possible_if_not_found,
+            get_latest_commit_possible_ref
+        )
+        if not (only_gitlab_subprojects and not subproject.project):
+            yield subproject
 
 
 def list_subprojects(*args, **kwargs) -> List[Subproject]:
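
The new yield condition in `iterate_subprojects` is a double negation, so it may help to see it in isolation. The sketch below reuses the same boolean expression on stand-in objects; `filter_subprojects` and the `SimpleNamespace` objects are hypothetical helpers for illustration and do not exist in the package.

```python
from types import SimpleNamespace
from typing import Iterable, Iterator


def filter_subprojects(subprojects: Iterable,
                       only_gitlab_subprojects: bool = False) -> Iterator:
    """Apply the same yield condition as the new iterate_subprojects loop."""
    for subproject in subprojects:
        # Skip a subproject only when filtering is requested AND it has
        # no GitLab project attached (project is None).
        if not (only_gitlab_subprojects and not subproject.project):
            yield subproject


# Stand-ins: `project` is None when the submodule is not hosted on GitLab.
on_gitlab = SimpleNamespace(name='on-gitlab', project=object())
elsewhere = SimpleNamespace(name='elsewhere', project=None)

everything = [s.name for s in filter_subprojects([on_gitlab, elsewhere])]
gitlab_only = [s.name for s in filter_subprojects(
    [on_gitlab, elsewhere], only_gitlab_subprojects=True)]
```

With the default `only_gitlab_subprojects=False`, nothing is filtered out, which is a behaviour change from the removed `try`/`except ValueError: continue` block that silently skipped non-GitLab submodules.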

gitlab_submodule/objects.py

Lines changed: 9 additions & 7 deletions
@@ -1,5 +1,9 @@
+from typing import Union, Optional
+
 from gitlab.v4.objects import Project, ProjectCommit
 
+from gitlab_submodule.string_utils import lstrip
+
 
 class Submodule:
 
@@ -48,18 +52,16 @@ def __repr__(self):
         )
 
 
-def lstrip(string: str, pattern: str) -> str:
-    if string[:len(pattern)] == pattern:
-        return string[len(pattern):]
-    else:
-        return string
+class Commit:
+    def __init__(self, _id) -> None:
+        self.id = _id
 
 
 class Subproject:
     def __init__(self,
                  submodule: Submodule,
-                 project: Project,
-                 commit: ProjectCommit,
+                 project: Optional[Project],
+                 commit: Union[ProjectCommit, Commit],
                  commit_is_exact: bool):
         self.submodule = submodule
         self.project = project

gitlab_submodule/string_utils.py

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+def lstrip(string: str, pattern: str) -> str:
+    if string[:len(pattern)] == pattern:
+        return string[len(pattern):]
+    else:
+        return string
+
+
+def rstrip(string: str, pattern: str) -> str:
+    if string[-len(pattern):] == pattern:
+        return string[:-len(pattern)]
+    else:
+        return string
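
A likely reason for these helpers, rather than the built-in `str.lstrip`/`str.rstrip`, is that the built-ins treat their argument as a set of characters, not as a prefix or suffix. The sketch below copies the two helpers from this file and contrasts them with the built-in; the example strings are made up for illustration. (`str.removeprefix`/`str.removesuffix` do the same job but only exist from Python 3.9, while the README states Python 3.7 support.)

```python
def lstrip(string: str, pattern: str) -> str:
    # Remove `pattern` once, only if it is an exact prefix.
    if string[:len(pattern)] == pattern:
        return string[len(pattern):]
    else:
        return string


def rstrip(string: str, pattern: str) -> str:
    # Remove `pattern` once, only if it is an exact suffix.
    if string[-len(pattern):] == pattern:
        return string[:-len(pattern)]
    else:
        return string


# The built-in strips any trailing '.', 'g', 'i' or 't' characters,
# so it also eats the final 't' of the repository name.
builtin_result = 'widget.git'.rstrip('.git')

# The helper removes only the exact '.git' suffix.
helper_result = rstrip('widget.git', '.git')

leading_slash_removed = lstrip('/group/repo', '/')
```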

gitlab_submodule/submodule_commit.py

Lines changed: 16 additions & 13 deletions
@@ -1,18 +1,18 @@
-from typing import Optional, Tuple
+from typing import Optional, Tuple, Union
 
 import re
 
 from gitlab.v4.objects import Project, ProjectCommit
 
-from gitlab_submodule.objects import Submodule
+from gitlab_submodule.objects import Submodule, Commit
 
 
 def get_submodule_commit(
         submodule: Submodule,
-        submodule_project: Project,
+        submodule_project: Optional[Project] = None,
         *args,
         **kwargs
-) -> Tuple[ProjectCommit, bool]:
+) -> Tuple[Union[ProjectCommit, Commit], bool]:
     commit_id, is_exact = _get_submodule_commit_id(
         submodule.parent_project,
         submodule.path,
@@ -21,7 +21,10 @@ def get_submodule_commit(
         *args,
         **kwargs
     )
-    commit = submodule_project.commits.get(commit_id)
+    if submodule_project is not None:
+        commit = submodule_project.commits.get(commit_id)
+    else:
+        commit = Commit(commit_id)
     return commit, is_exact
 
 
@@ -76,15 +79,15 @@ def _get_submodule_commit_id(
     # was created before this date.
     # This requires a Project object for the submodule so if it wasn't
     # passed we cannot guess anything.
-    if not get_latest_commit_possible_if_not_found:
+    if not (get_latest_commit_possible_if_not_found
+            and submodule_project is not None):
         raise ValueError(
             f'Could not find commit id for submodule {submodule_path} of '
             f'project {project.path_with_namespace}.')
-    else:
-        last_subproject_commits = submodule_project.commits.list(
-            ref_name=(get_latest_commit_possible_ref
-                      if get_latest_commit_possible_ref
-                      else submodule_project.default_branch),
-            until=update_submodule_commit.created_at
-        )
+
+    last_subproject_commits = submodule_project.commits.list(
+        ref_name=(get_latest_commit_possible_ref
+                  if get_latest_commit_possible_ref
+                  else submodule_project.default_branch),
+        until=update_submodule_commit.created_at)
     return last_subproject_commits[0].id, False
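
The change to `get_submodule_commit` boils down to one branch: look the commit up through the API when a project is available, otherwise wrap the bare id in a dummy object. The sketch below isolates that branch with stand-ins so it runs without `python-gitlab`; `resolve_commit`, `FakeProject` and `FakeCommits` are hypothetical names, and the local `Commit` class only imitates the one added in `objects.py`.

```python
from typing import Any, Optional


class Commit:
    """Stand-in for the dummy Commit: a single `id` attribute."""

    def __init__(self, commit_id: str) -> None:
        self.id = commit_id


class FakeCommits:
    """Imitates `project.commits` with a `get` method (no network access)."""

    def get(self, commit_id: str) -> dict:
        return {'id': commit_id, 'source': 'api'}


class FakeProject:
    commits = FakeCommits()


def resolve_commit(commit_id: str,
                   submodule_project: Optional[Any] = None) -> Any:
    # With a project, fetch the full commit object through its manager;
    # without one, fall back to a dummy object carrying only the id.
    if submodule_project is not None:
        return submodule_project.commits.get(commit_id)
    return Commit(commit_id)


dummy = resolve_commit('abc123')
fetched = resolve_commit('abc123', FakeProject())
```

Callers that only read `.id` can treat both return types alike; anything beyond `.id` (author, message, dates) is available only when the submodule is a GitLab project.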
Lines changed: 50 additions & 26 deletions
@@ -1,49 +1,73 @@
 from typing import Optional
+import logging
 
 from posixpath import join, normpath
-from giturlparse import parse
+from giturlparse import parse, GitUrlParsed
 
 from gitlab.v4.objects import Project, ProjectManager
 
 from gitlab_submodule.objects import Submodule
+from gitlab_submodule.string_utils import lstrip, rstrip
 
+logger = logging.getLogger(__name__)
 
-def submodule_to_project(submodule: Submodule,
-                         project_manager: ProjectManager) -> Project:
+
+def submodule_to_project(
+        submodule: Submodule,
+        project_manager: ProjectManager,
+        self_managed_gitlab_host: Optional[str] = None) -> Optional[Project]:
     submodule_project_path_with_namespace = \
         _submodule_url_to_path_with_namespace(submodule.url,
-                                              submodule.parent_project)
+                                              submodule.parent_project,
+                                              self_managed_gitlab_host)
     if not submodule_project_path_with_namespace:
-        raise ValueError(
-            f'submodule at {submodule.url} is not hosted on Gitlab')
+        return None
     submodule_project = project_manager.get(
         submodule_project_path_with_namespace)
     return submodule_project
 
 
 def _submodule_url_to_path_with_namespace(
         url: str,
-        parent_project: Project
-) -> Optional[str]:
+        parent_project: Project,
+        self_managed_gitlab_host: Optional[str] = None) -> Optional[str]:
     """Returns a path pointing to a Gitlab project, or None if the submodule
     is hosted elsewhere
     """
-    try:
-        parsed = parse(url)
-        if parsed.platform != 'gitlab':
-            return None
-        if parsed.groups:
-            to_join = [parsed.owner, join(*parsed.groups), parsed.repo]
-        else:
-            to_join = [parsed.owner, parsed.repo]
-        path_with_namespace = join(*to_join)
+    # check if the submodule url is a relative path to the project path
+    if url.startswith('./') or url.startswith('../'):
+        # we build the path of the submodule project using the path of
+        # the current project
+        url = rstrip(url, '.git')
+        path_with_namespace = normpath(
+            join(parent_project.path_with_namespace, url))
         return path_with_namespace
-    except Exception:
-        # check if the submodule url is a relative path to the project path
-        if url.startswith('./') or url.startswith('../'):
-            # we build the path of the submodule project using the path of
-            # the current project
-            path_with_namespace = normpath(
-                join(parent_project.path_with_namespace, url))
-            return path_with_namespace
-        return None
+
+    parsed: GitUrlParsed = parse(url)
+    if not parsed.valid:
+        logger.warning(f'submodule git url does not seem to be valid: {url}')
+        return None
+
+    # even if the parent project is hosted on a self-managed gitlab host,
+    # it can still use submodules hosted on gitlab.com
+    gitlab_hosts = ['gitlab']
+    if self_managed_gitlab_host:
+        gitlab_hosts.append(self_managed_gitlab_host)
+
+    # giturlparse.GitUrlParsed.platform is too permissive and will be set to
+    # 'gitlab' for some non-gitlab urls, for instance:
+    # https://opensource.ncsa.illinois.edu/bitbucket/scm/u3d/3dutilities.git
+    if (parsed.platform != 'gitlab'
+            or all([host not in parsed.host for host in gitlab_hosts])):
+        logger.warning(f'submodule git url is not hosted on gitlab: {url}')
+        return None
+
+    # Format to python-gitlab path_with_namespace:
+    # rewrite to https format then split by host and keep & cut the right part.
+    # I find it more robust than trying to rebuild the path from the different
+    # attributes of giturlparse.GitUrlParsed objects
+    https_url = parsed.url2https
+    path_with_namespace = https_url.split(parsed.host)[1]
+    path_with_namespace = lstrip(path_with_namespace, '/')
+    path_with_namespace = rstrip(path_with_namespace, '.git')
+    return path_with_namespace
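
Both path-building branches above can be exercised without `giturlparse` or a GitLab connection. The sketch below reproduces the relative-URL branch (`posixpath.join` + `normpath`) and the "split the https URL on the host" step using plain strings; `resolve_relative_submodule`, `path_from_https_url` and `strip_suffix` are hypothetical names, and the project paths and URLs are invented for illustration.

```python
from posixpath import join, normpath


def strip_suffix(string: str, suffix: str) -> str:
    # Same behaviour as the rstrip helper in string_utils.py.
    return string[:-len(suffix)] if string[-len(suffix):] == suffix else string


def resolve_relative_submodule(parent_path_with_namespace: str,
                               url: str) -> str:
    """Resolve a './...' or '../...' submodule URL against the parent path."""
    url = strip_suffix(url, '.git')
    return normpath(join(parent_path_with_namespace, url))


def path_from_https_url(https_url: str, host: str) -> str:
    """Keep what follows the host, then trim the leading '/' and '.git'."""
    path = https_url.split(host)[1]
    if path.startswith('/'):
        path = path[1:]
    return strip_suffix(path, '.git')


sibling = resolve_relative_submodule('group/parent', '../sibling.git')
cousin = resolve_relative_submodule('group/sub/parent', '../../other/lib.git')
child = resolve_relative_submodule('group/parent', './child.git')
absolute = path_from_https_url(
    'https://gitlab.com/group/sub/repo.git', 'gitlab.com')
```

One property of `split(host)[1]` worth keeping in mind: it returns the text between the first and second occurrence of the host string, so a path that itself contains the host name would be cut short. Whether that matters in practice depends on the repositories involved; `split(host, 1)` would be one way to avoid it.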
