[Bug][CircleCI Plugin] Only collecting first page of API responses #7750
Comments
Hey there, @Nickcw6! I'm here to help you with any bugs, questions, or contributions you might have. Let's tackle this issue together! The issue you're experiencing with the CircleCI plugin collecting data only from the past 24 hours, regardless of the
Example:

### Step 2.3 - Set the Sync Policy
There are three settings for Sync Policy:
- **Data Time Range**: You can select the time range of the data you wish to collect. The default is set to the past six months. Ensure this is set to the full time range you need.
- **Sync Frequency**: Choose how often you would like to sync your data by selecting a sync frequency option or enter a cron code to specify your preferred schedule.
- **Skip Failed Tasks**: Choose to skip failed tasks to avoid spending more time running the pipeline all over again.
cmd := &cobra.Command{Use: "CircleCi"}
connectionId := cmd.Flags().Uint64P("connection", "c", 0, "CircleCi connection id")
projectSlug := cmd.Flags().StringP("project", "p", "", "CircleCi project slug")
timeAfter := cmd.Flags().StringP("timeAfter", "a", "", "collect data that are created after specified time, ie 2006-01-02T15:04:05Z")
cmd.Run = func(cmd *cobra.Command, args []string) {
	runner.DirectRun(cmd, args, PluginEntry, map[string]interface{}{
		"connectionId": *connectionId,
		"projectSlug":  *projectSlug,
	}, *timeAfter)
}
runner.RunCmd(cmd)

By setting the

Additionally, ensure that there are no API rate limits or other issues affecting the data collection. The CircleCI plugin handles pagination and rate limits, but if you encounter specific error messages or logs, they can provide insights into why data collection might be limited [5][6][7][8].
The discrepancies are valid: All However, I can't reproduce the missing-data bug. It seems to be working correctly on my end: Can you check the
Hey @klesh, thanks for your response - I think I've figured out what's happening here. I've updated the original post. Only the first page of the I think the issue is on this line in the pipeline collector - it's setting the query param as
@Nickcw6 Thanks for the information. It is very valuable. Would you like to put up a PR to fix the problem? Thanks in advance.
@klesh Happy to give it a go over the weekend - I haven't worked in Go before, which is the only reason I didn't offer originally 😅 Any advice on tackling this issue in particular, or is it as straightforward as it seems?
@Nickcw6 Nice, I think fixing the typo you found would be sufficient.
Search before asking
What happened
When running a data collection for a CircleCI connection, data only appears to be collected from the past <24 hours, irrespective of what `Time Range` is set to. The same behaviour is observed in 'full refresh mode' and normal data collection.

The behaviour seemed to differ slightly each time I tried: when originally raised on Slack, only the last ~3 hours of data was collected; however, when reproducing it again to raise this issue, it now seems to have data from the past ~24 hours.

E.g. with the time range set to the start of the year, then checking the `_tool_circleci_workflow` table:

Only 18 workflows are identified, the earliest of which occurred at `2024-07-15 10:29:09.000`. I would expect to see many more rows dating back to `2024-01-01`.

CircleCI pipeline task logs:
I also have GitHub and Jira data connections running within the same pipeline, and data is pulled through as expected for both of these plugins.
EDIT: What is actually happening is that only 20 pipelines are being collected from the CircleCI API response (i.e. the first page). This then has a knock-on effect throughout the workflows and jobs tables.
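To illustrate the first-page-only symptom, here is a minimal sketch of token-based pagination in Go. It assumes, as I understand the CircleCI API v2, that list responses carry a `next_page_token` field that must be sent back on the next request as a page token. The `page`, `fetchFunc`, `collectAll` and `fakeAPI` names are hypothetical stand-ins and are not DevLake's collector code; `fakeAPI` simulates a server in memory with 20 items per page.

```go
package main

import (
	"fmt"
	"strconv"
)

// page is a hypothetical model of a paginated response envelope.
type page struct {
	Items         []string
	NextPageToken string // empty when there are no further pages
}

// fetchFunc returns one page for the given token ("" requests the first page).
type fetchFunc func(pageToken string) (page, error)

// collectAll keeps requesting pages until the server stops returning a token.
func collectAll(fetch fetchFunc) ([]string, error) {
	var all []string
	token := ""
	for {
		p, err := fetch(token)
		if err != nil {
			return nil, err
		}
		all = append(all, p.Items...)
		if p.NextPageToken == "" {
			return all, nil
		}
		token = p.NextPageToken
	}
}

// fakeAPI simulates a server holding total items, served pageSize at a time.
// The token is simply the offset of the next item, encoded as a string.
func fakeAPI(total, pageSize int) fetchFunc {
	return func(pageToken string) (page, error) {
		start := 0
		if pageToken != "" {
			n, err := strconv.Atoi(pageToken)
			if err != nil {
				return page{}, fmt.Errorf("bad page token %q: %w", pageToken, err)
			}
			start = n
		}
		end := start + pageSize
		if end > total {
			end = total
		}
		var p page
		for i := start; i < end; i++ {
			p.Items = append(p.Items, fmt.Sprintf("pipeline-%d", i))
		}
		if end < total {
			p.NextPageToken = strconv.Itoa(end)
		}
		return p, nil
	}
}

func main() {
	items, err := collectAll(fakeAPI(45, 20))
	if err != nil {
		panic(err)
	}
	fmt.Println(len(items))
}
```

If the token never reaches the server, for example because it is sent under a query parameter name the API does not recognise, every request looks like a first-page request, and a collector that then stops ends up with only the first 20 items.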
What do you expect to happen
Data is collected from the full specified time range, e.g. starting from `2024-01-01` (or whenever specified).

How to reproduce

`_tools_circleci_workflows`, `_tools_circleci_pipelines` or `_tools_circleci_jobs` tables for expected row count, and earliest `started_at` or `created_at` timestamp (see below)

Anything else
As an aside (but potentially related): I notice there are discrepancies between the column names across the three CircleCI tool tables, e.g.

- `_tools_circleci_workflows` - `created_at` is the timestamp the workflow was triggered in CircleCI. There is no other column which could represent the start of the workflow in CircleCI.
- `_tools_circleci_jobs` - `created_at` is the timestamp the row was created in the DevLake DB, and `started_at` is the CircleCI timestamp.
- `_tools_circleci_pipelines` - `created_at` is again the timestamp of DevLake DB creation. There is `created_date`, but this always seems to be `NULL`. As with the workflows table, there doesn't appear to be any column which represents the starting timestamp in CircleCI.

Version
v1.0.0
Are you willing to submit PR?
Code of Conduct