[Bug][GitLab] GitLab Pagination not working again #8187

Closed
2 of 3 tasks
kkc-tonywu opened this issue Nov 12, 2024 · 5 comments · Fixed by #8201
Labels: component/plugins (This issue or PR relates to plugins), devops (Something about CI/CD), type/bug (This issue is a bug)

Comments

@kkc-tonywu

kkc-tonywu commented Nov 12, 2024

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

The number of MRs, tags, issues, deployments, and other records in our GitLab repositories exceeds 10,000.

According to the official GitLab documentation:

For performance reasons, if a query returns more than 10,000 records, GitLab excludes some headers.
ref: GitLab Documentation

As a result, the X-Total-Pages header is not returned, which causes DevLake to fetch only 100 records for the subtask.

What do you expect to happen

Regardless of the settings, DevLake should retrieve the complete data.

How to reproduce

When a project has tens of thousands of MRs, tags, issues, or deployments, collecting that data returns only 100 records.

Anything else

No response

Version

1.0.1

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@kkc-tonywu kkc-tonywu added the type/bug This issue is a bug label Nov 12, 2024
@dosubot dosubot bot added component/plugins This issue or PR relates to plugins devops Something about CI/CD (devops) labels Nov 12, 2024

dosubot bot commented Nov 12, 2024

Hello, @kkc-tonywu! I'm here to help you with any bugs, questions, or contributions. Let me know how I can assist you while you wait for a human maintainer.

I found a similar closed issue that might be relevant to your situation: "[Bug][Gitlab Plugin] Missing X-Total and X-Total-Pages headers cause data losing in gitlab plugin" (#4093). This issue was closed, and it involved the GitLab plugin handling pagination when certain headers are missing [1].

In the current implementation of the GitLab plugin in DevLake, pagination is handled by checking the x-next-page header. If this header is missing, the function assumes there are no more pages to fetch. This means that if both the x-total-pages and x-next-page headers are missing, the plugin might not fetch all available records [2].
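The next-page approach described above can be sketched as a loop that keeps requesting pages until the `X-Next-Page` header comes back empty. This is only an illustration under stated assumptions: `fetchFunc`, `collectAll`, and `fakeAPI` are hypothetical names, and the fake endpoint stands in for real HTTP calls; none of it is DevLake's actual code.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
)

// fetchFunc is a hypothetical stand-in for one HTTP request: it returns
// the records of the requested page and the response headers.
type fetchFunc func(page int) (records []string, header http.Header)

// collectAll follows X-Next-Page until the server stops sending a usable value.
// It never consults X-Total-Pages, so it works when that header is omitted.
func collectAll(fetch fetchFunc) []string {
	var all []string
	page := 1
	for {
		records, header := fetch(page)
		all = append(all, records...)

		next, err := strconv.Atoi(header.Get("X-Next-Page"))
		if err != nil || next <= page {
			// Header empty, malformed, or not advancing: stop.
			return all
		}
		page = next
	}
}

// fakeAPI simulates a 3-page endpoint that never sends X-Total-Pages.
func fakeAPI(page int) ([]string, http.Header) {
	h := http.Header{}
	if page < 3 {
		h.Set("X-Next-Page", strconv.Itoa(page+1))
	}
	return []string{fmt.Sprintf("record-from-page-%d", page)}, h
}

func main() {
	fmt.Println(collectAll(fakeAPI))
}
```

The `next <= page` guard is a defensive choice so that a misbehaving server cannot cause an endless loop.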

Unfortunately, I couldn't find specific logic in the codebase to handle cases where GitLab's API returns more than 10,000 records or when both headers are missing. You might need to consider implementing a workaround or custom logic to handle such scenarios.

To continue talking to Dosu, mention @dosu.

@kkc-tonywu kkc-tonywu changed the title [Bug][GitLab] Bug title [Bug][GitLab] GitLab Pagination not working again Nov 12, 2024
@klesh klesh self-assigned this Nov 20, 2024
@klesh
Contributor

klesh commented Nov 20, 2024

Hi, please upgrade to https://github.com/apache/incubator-devlake/releases/tag/v1.0.2-beta5 and see if the problem is fixed. Thanks.

@kkc-tonywu
Author

kkc-tonywu commented Nov 28, 2024

It has been tested, and the issue of only collecting 100 records has been fixed.

However, based on the content of the PR, it still seems to be limited to 100 pages of data:

//incubator-devlake/backend/plugins/gitlab/tasks/deployment_collector.go

apiCollector.InitCollector(helper.ApiCollectorArgs{
		RawDataSubTaskArgs: *rawDataSubTaskArgs,
		ApiClient:          data.ApiClient,
		PageSize:           100, // <-- here
		UrlTemplate:        "projects/{{ .Params.ProjectId }}/deployments",
...
// backend/helpers/pluginhelper/api/api_collector.go
		reqDataCopy := RequestData{
			Pager: &Pager{
				Page: i + 1,
				Size: collector.args.PageSize, // <-- here
				Skip: collector.args.PageSize * (i),
			},
			Input:     reqData.Input,
			InputJSON: reqData.InputJSON,
		}

For example:

I have configured DevLake to collect one year of data, and the total number of records will exceed 10,000. Does that mean only 10,000 records will be collected?

Is this scenario correct? My assumption is that regardless of the time range setting, I must be able to collect all the data.

I'm not sure if I understand this correctly. Please let me know if there is any issue.

@klesh
Contributor

klesh commented Dec 2, 2024

No, it won't. Size is the page size; it doesn't limit the total number of records collected as long as Skip keeps growing, right?
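The arithmetic behind this answer can be checked with a small sketch. The `Pager` struct below only mirrors the three fields visible in the excerpt quoted in this thread, and `pagerFor` is a hypothetical helper; how many iterations the real collector runs is decided elsewhere and is not shown in that excerpt.

```go
package main

import "fmt"

// Pager mirrors the three fields from the quoted excerpt.
// It is a local illustration, not DevLake's actual type.
type Pager struct {
	Page int
	Size int
	Skip int
}

// pagerFor builds the pager for the i-th request (zero-based),
// using the same arithmetic as the quoted code:
// Page = i + 1, Size = pageSize, Skip = pageSize * i.
func pagerFor(i, pageSize int) Pager {
	return Pager{Page: i + 1, Size: pageSize, Skip: pageSize * i}
}

func main() {
	// Size stays constant while Page and Skip keep growing with i,
	// including past the 100th request (more than 10,000 records).
	for _, i := range []int{0, 99, 100, 149} {
		p := pagerFor(i, 100)
		fmt.Printf("request %d: page=%d size=%d skip=%d\n", i, p.Page, p.Size, p.Skip)
	}
}
```

With a page size of 100, request index 100 asks for page 101 with a skip of 10,000, so the page size alone does not cap the total at 10,000 records.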

@d4x1 d4x1 closed this as completed in #8201 Dec 2, 2024
@odorT

odorT commented Dec 12, 2024

I have a similar problem: the GitLab connection only fetches 100 repos in a group that contains 175, and scrolling only duplicates the ones already listed above. Tested with the latest image available at the moment: devlake.docker.scarf.sh/apache/devlake:v1.0.2-beta5
