problems importing from github.com/nixos/nixpkgs #749

Open
ghost opened this issue Feb 21, 2022 · 25 comments · May be fixed by #760
Labels
lifecycle/stale Inactive for 90d or more

Comments

@ghost

ghost commented Feb 21, 2022

Ten minutes ago, trying to git bug bridge pull the nixpkgs project's bugs (>5k bugs, >3k PRs -- a very very large set of bugs):

import error: Something went wrong while executing your query. Please include `A8DE:54BE:87BAE:9F6E7:62140406` when reporting this issue.

I have never been able to get a pull of nixpkgs' bugs to complete, and I've been trying for weeks now (see also #740).

I am using 05d73e1, which has the fix for #585.

Is there any chance that git-bug could keep fetching other bugs when it encounters a problem like this? It appears that it just aborts as soon as any API call produces an error. The nixpkgs bugset is ginormous, so it's going to take me at least a week (due to ratelimits) anyways. But right now it keeps failing after 108 bugs out of 5000+.

Is there an environment variable I can set to cause git-bug to print a backtrace or other context when it gives up like this?

@ghost
Author

ghost commented Feb 23, 2022

import error: Something went wrong while executing your query. Please include `96F8:8D83:295160:2A2B42:6214AE3E` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `D1D4:02E4:B8460B:160070D:6214BB7A` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `D25A:179B:23FA97:64C0C7:6214BD6F` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `B1D0:CA69:188D36:19C1B9:6214CAB9` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `DC06:4AA8:86762:96A45:6214D808` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `BCAC:8D83:653F02:675761:6214DA0A` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `CA7A:6EAD:9E8FE8:A15B35:6214E738` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `D704:58D9:193918:19A2BB:6214F484` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `9C32:41F8:BF7CB1:172FF12:621501C1` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `9C86:62D0:471B40:B08239:621503C8` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `EB3C:8E88:466FFE:47B0D0:621510E9` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `A71A:5B83:1C51E:27EA5:62151E39` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `A7B8:4750:A7461:B3EE1:62152057` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `9332:81CD:3B1F92:3CC6F8:62152D6C` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `CC06:2824:17F801:56B3B8:62153AAA` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `A6E8:CEC2:8A2504:8CDFBB:621549CE` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `CD30:3456:5C995B:5F2A3F:6215571F` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `9E18:0AF2:CF4F53:D8D530:62156454` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `EA86:4003:1E512C:681416:62156642` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `E05E:13CE1:1049D1D:108CD2E:62157381` when reporting this issue.

@rng-dynamics
Collaborator

The error is from the github server. But I can tell you what I would try:

  1. You should put the nixpkgs repo in a tmpfs (mount -t tmpfs) because there is a large number of issues in nixpkgs and git-bug writes more data than necessary on your disk while importing. Using a tmpfs might help speed up the import.

  2. You can try to decrease the size of the queries to the github server here: https://github.com/MichaelMure/git-bug/blob/b2f0e126a10b5f030bfc35cac4b0cabcf083b589/bridge/github/import_mediator.go#L12-L15
    These are the numbers of items requested from the server in one query. You could start by reducing NumIssues. If that doesn't work, try reducing the other ones, too. This will slow the import down, but it might let you get a full import without errors. If this method works for you, please let me know which numbers you used.

  3. You can increase the number of retries in case the github server has an error:
    https://github.com/MichaelMure/git-bug/blob/b2f0e126a10b5f030bfc35cac4b0cabcf083b589/bridge/github/client.go#L110

I hope that helps. Let us know how it is going.

@MichaelMure
Collaborator

git-bug writes more data than necessary on your disk while importing.

We should address that at some point, any idea why that is?

@rng-dynamics
Collaborator

We should address that at some point, any idea why that is?

Do you remember that discussion: #585 (comment)? As far as I understand, .git/git-bug/bug-cache and .git/git-bug/identity-cache are rewritten after every imported issue, comment, edit, and so on.

@MichaelMure
Collaborator

Ha yes, I forgot about that, thanks.

@ghost
Author

ghost commented Mar 1, 2022

The error is from the github server. But I can tell you what I would try:

Hey, thank you for your reply. Just to clarify:

  1. You should put the nixpkgs repo in a tmpfs

Oh, the reason it's slow is not git-bug; it's github's API ratelimit. I did not mean to imply that git-bug was not performant.

  2. You can try to decrease the size of the queries to the github server here:
  3. You can increase the number of retries in case the github server has an error:

Thanks, I will give those a try tonight or tomorrow.

@ghost
Author

ghost commented Mar 1, 2022

Thanks, I will give those a try tonight or tomorrow.

Testing now.

Is there any way to get git-bug bridge pull to print verbosely what it is doing? Right now it only prints to the console when it discovers new data. There is a long silent delay at the start of the pull where it isn't clear what it's doing.

@ghost
Author

ghost commented Mar 1, 2022

Hrm, with latest HEAD and NumIssues=10 I get:

rate limiting: Github GraphQL API rate limit. This process will sleep until 2022-03-01 04:12:20 +0000 UTC.
rate limiting: Github GraphQL API rate limit. This process will sleep until 2022-03-01 04:12:20 +0000 UTC.
rate limiting: Github GraphQL API rate limit. This process will sleep until 2022-03-01 04:12:20 +0000 UTC.
import error: API rate limit exceeded for user ID [redacted].

Apparently git-bug's API ratelimit calculation disagrees with github's?

NumIssues=20 did not solve the problem.
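
The "sleep until" log lines above amount to computing a wait time from the reset timestamp the server reports. A minimal Python sketch of that computation follows. It is not git-bug's Go code: the function name, the `margin` buffer, and the `now` value are all hypothetical, and nothing in this thread confirms why the two rate-limit calculations disagreed.

```python
from datetime import datetime, timedelta, timezone


def seconds_until_reset(reset_at, now, margin=timedelta(seconds=30)):
    """Return how long to sleep before retrying, never a negative number.

    margin is a hypothetical safety buffer so the client does not wake up
    slightly before the server considers the limit reset.
    """
    wait = (reset_at - now) + margin
    return max(wait.total_seconds(), 0.0)


# Reset time taken from the log above; "now" is made up for illustration.
reset_at = datetime(2022, 3, 1, 4, 12, 20, tzinfo=timezone.utc)
now = datetime(2022, 3, 1, 4, 0, 0, tzinfo=timezone.utc)

wait = seconds_until_reset(reset_at, now)
already_past = seconds_until_reset(reset_at, reset_at + timedelta(minutes=5))
```

Passing `now` in explicitly, instead of reading the clock inside the function, keeps the calculation easy to test.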

@ghost
Author

ghost commented Mar 1, 2022

2. You can try to decrease the size of the queries to the github server here:

Decreasing these (as low as 20) did not help.

3. You can increase the number of retries in case the github server has an error:

Increasing this (as high as 10) did not help.

Is there any way to get more information about what git-bug was in the middle of doing when the error happened? Like a stack trace maybe?

@rng-dynamics
Collaborator

I will have a look at it, but I will need a few days. If I have not come back to this issue in about a week, please feel free to ping me.

@MichaelMure
Collaborator

@rng-dynamics it might make sense to implement another ImportResult/ExportResult for simple logs that the bridges could use to spit out more detailed information on what's happening. Those could be made visible with a --verbose flag or something like that.

At this point it would make sense to rename those to ImportEvent/ExportEvent.

@rng-dynamics rng-dynamics linked a pull request Mar 9, 2022 that will close this issue
@rng-dynamics
Collaborator

rng-dynamics commented Mar 9, 2022

@a-m-joseph, thanks for the bug report. The errors seem to be arbitrary errors on GitHub's side. If you want, you can use my draft in #760. With #760, the bridge will continue importing the other issues even if GitHub errors occur, and you can run the import again to fetch the remaining issues. Have a look at the file git-bug-import after running the import; its content should be self-explanatory.

@ghost
Author

ghost commented Mar 11, 2022

Thanks, I'm trying this right now.

@ghost
Author

ghost commented Mar 12, 2022

It just finished. I'll investigate the results tomorrow (but wow, the issue count looks correct!)

...
new issue: e0612737cebde57cb49b3f596ce045235a260258f70037d0f9860759ceba3e04
changed label: 682fcc468648af894d56930f714d36124db5ad5348e250ddb6ad42ce1b5b7bd0
import error: context deadline exceeded
imported 24645 issues and 6644 identities with default bridge

@ghost
Author

ghost commented Mar 13, 2022

Appears to be working!

Thank you so much for taking the time to look into this.

@ghost ghost closed this as completed Mar 13, 2022
@ghost ghost changed the title from "import error: Something went wrong while executing your query. Please include A8DE:54BE:87BAE:9F6E7:62140406 when reporting this issue." to "problems importing from github.com/nixos/nixpkgs" Mar 16, 2022
@ghost
Author

ghost commented Mar 16, 2022

I'm very sorry to trouble you again, I really appreciate the time you've put into this already.

Unfortunately after taking a closer look at the results of the import, I have the correct number of bugs, but:

  1. After the initial import, git bug bridge pull (with or without -n) no longer picks up new changes. Three days after the initial import (with plenty of new bugs opened in the interim):
$ ../gitbug/git-bug bridge pull
imported 0 issues and 0 identities with default bridge
$ ../gitbug/git-bug bridge pull -n
imported 0 issues and 0 identities with default bridge
  2. Even the initial import seems to be missing a lot of activity. Strangely, both git bug ls --by edit and git-bug ls --by creation show the same bug as being most recent:
$ git-bug ls --by creation | tail
52ba089 open    Home Assistant module: Use Postgresql            ◼      21stce (21stce) 
5c52993 open    nixos manual: missing warning about firmware     ◼      Björn Gohla (b…   4
6a9acf2 open    qutebrowser.aarch64-linux broken due to pyqt5 n… ◼      Collin Arnett … 
9095ca7 open    mate-utils pulls inkscape to the system          ◼      ilya-fedin (il…  10
9d6cf34 open    Keepmenu                                         ◼      Stefan Machmei…   1
fdd78d6 open    python3: Enabling optimisations as documented… ◼ ◼      David Nadlinge… 
5a037d8 open    Slack fails to install on Mac OS Monterey        ◼      Tyler Levine (… 
e061273 open    G'Mic missing from Krita                         ◼      Aidan Gauland … 
9241e1b open    CUPS web interface hangs at 'add printer' 4/5… ◼ ◼      Alain Zscheile… 
66a48f2 open    qt5base: apple silicon: fails to build       ◼ ◼ ◼      Matthew Leach … 
$ git-bug ls --by edit | tail
52ba089 open    Home Assistant module: Use Postgresql            ◼      21stce (21stce) 
5c52993 open    nixos manual: missing warning about firmware     ◼      Björn Gohla (b…   4
6a9acf2 open    qutebrowser.aarch64-linux broken due to pyqt5 n… ◼      Collin Arnett … 
9095ca7 open    mate-utils pulls inkscape to the system          ◼      ilya-fedin (il…  10
9d6cf34 open    Keepmenu                                         ◼      Stefan Machmei…   1
fdd78d6 open    python3: Enabling optimisations as documented… ◼ ◼      David Nadlinge… 
5a037d8 open    Slack fails to install on Mac OS Monterey        ◼      Tyler Levine (… 
e061273 open    G'Mic missing from Krita                         ◼      Aidan Gauland … 
9241e1b open    CUPS web interface hangs at 'add printer' 4/5… ◼ ◼      Alain Zscheile… 
66a48f2 open    qt5base: apple silicon: fails to build       ◼ ◼ ◼      Matthew Leach … 

The 66a48f2 bug in my import is this one; not sure if that helps.

Even weirder, the bridge is importing dates correctly, but they don't seem to be used when sorting! For example, git bug ls considers 66a48f2 to be newest, yet there are plenty of bugs with larger create_time and edit_time values:

$ git-bug show --format=json 66a48f2 | grep -A1 time
    "create_time": {
        "timestamp": 1626299686,
        "time": "2021-07-14T14:54:46-07:00"
    },
    "edit_time": {
        "timestamp": 1626328489,
        "time": "2021-07-14T22:54:49-07:00"
    },
$ git-bug show --format=json 52ccab0 | grep -A1 time
    "create_time": {
        "timestamp": 1646198867,
        "time": "2022-03-01T21:27:47-08:00"
    },
    "edit_time": {
        "timestamp": 1646969884,
        "time": "2022-03-10T19:38:04-08:00"
    },
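
For comparison, here is a small Python sketch of the ordering these timestamps imply. It only sorts the two records quoted above by their `timestamp` fields; the helper name `oldest_first` is hypothetical, and the sketch says nothing about how git-bug's own `ls` sorting is implemented.

```python
import json

# The two records shown above, reduced to their id and timestamps.
shown = json.loads("""
[
  {"id": "66a48f2",
   "create_time": {"timestamp": 1626299686},
   "edit_time": {"timestamp": 1626328489}},
  {"id": "52ccab0",
   "create_time": {"timestamp": 1646198867},
   "edit_time": {"timestamp": 1646969884}}
]
""")


def oldest_first(bugs, field):
    """Return bug ids sorted oldest to newest on 'create_time' or 'edit_time'."""
    return [b["id"] for b in sorted(bugs, key=lambda b: b[field]["timestamp"])]


by_creation = oldest_first(shown, "create_time")
by_edit = oldest_first(shown, "edit_time")
```

Under either field, 52ccab0 sorts after 66a48f2, so a listing ordered by these values would not end with 66a48f2.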

Again, I'm really sorry to pester you over this, I feel like I've already really pushed the boundaries of your generosity with your time. Unfortunately I don't know go, so debugging this myself is not really feasible.

In the event that this isn't something you can spend any more time on (which I completely understand), do you consider git-bug's repository format to be stable enough that it is okay for people to write bridges which aren't distributed as part of the git-bug codebase? I can probably spare enough time to write a really good github importer, one that also imports pull requests (which github apparently treats as a special kind of bug). Unfortunately I can't really take on learning a whole new programming language and ecosystem right now, that's a much larger time commitment.

@ghost ghost reopened this Mar 16, 2022
@rng-dynamics
Collaborator

Regarding the github bridge: The Github API is quite fragile and we are trying to keep up with its repeatedly changing quirks. The code which you used is only a draft pull request. For my own convenience, I used a file named git-bug-import to keep track of the progress. If that file is present, the bridge imports only the issues listed in it. If you want to import all issues, just delete the git-bug-import file; the bridge will then import all issues and write a new file. Of course, you can also write your own git-bug-import file. For example, if you want to import issues 23, 45, and 65, create the file git-bug-import with the following content and start the import.

# file: git-bug-import
23
45
65

The bridge will update the file according to the import status. After the import the file might look as follows.

# file: git-bug-import
# 23 # imported # Mon 01 Jan 07:03:25 2022
# 45 # imported # Mon 01 Jan 07:03:42 2022
65 # import error # Mon 01 Jan 07:03:56 2022

If you ran the import again, it would read the file again and try to import only issue 65, which failed in the previous import.
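
A rough Python sketch of how such a tracking file could be read follows. The parsing rules are an assumption inferred only from the two examples above, not from the #760 source: a line starting with `#` is treated as done or as a comment, and any other non-empty line is taken to start with an issue number, optionally followed by `# status # date`. The function name `pending_issues` is hypothetical.

```python
def pending_issues(text):
    """Return the issue numbers that still need to be imported."""
    pending = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank, header comment, or already imported
        number = line.split("#", 1)[0].strip()  # drop "# status # date"
        pending.append(int(number))
    return pending


before = """# file: git-bug-import
23
45
65
"""

after = """# file: git-bug-import
# 23 # imported # Mon 01 Jan 07:03:25 2022
# 45 # imported # Mon 01 Jan 07:03:42 2022
65 # import error # Mon 01 Jan 07:03:56 2022
"""

todo_before = pending_issues(before)
todo_after = pending_issues(after)
```

With the two example files, the first run would have three issues to import and the second run only issue 65.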

The other problems which you describe are probably not caused by the importer but some other component in git-bug.

I would actually really appreciate input on how to improve the importer. My main headaches with the importer are (1) the fragility and changing error-behaviour of the GitHub GraphQL API, and (2) the user interface/interaction in case of errors during the import. How could we get closer to that really good importer?

@ghost
Author

ghost commented Mar 19, 2022

Regarding the github bridge: The Github API is quite fragile and we are trying to keep up with its repeatedly changing quirks.

My main headaches with the importer are (1) the fragility and changing error-behaviour of the GitHub GraphQL API

Thank you for explaining this.

Personally, I much prefer the sort of workflow the Linux kernel uses, where inter-developer communication uses simple protocols and the developers themselves select the complex tools that suit them best.

Whenever I bring this up in discussions, people always come back with "But GitHub has a usable and reliable API that you can use if you want that!" I've long suspected that this was not, in fact, true. I appreciate your confirmation of this, as someone who has worked on a major integration project with this API.

How could we get closer to that really good importer?

Mainly I would add several levels of debugging output so I can see what's failing, including: (1) a "here's what I'm doing" log and (2) a wire-protocol dump (interleaved with (1)).
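
To make the two levels concrete, here is a minimal Python sketch using the standard `logging` module. It is an illustration only, not git-bug code: the `WIRE` level, the logger name, `fetch_page`, and the payload string are all hypothetical.

```python
import io
import logging

WIRE = 5  # hypothetical level below DEBUG, used for wire-protocol dumps
logging.addLevelName(WIRE, "WIRE")


def make_logger(stream, level):
    """Build a logger that writes 'LEVEL message' lines to the given stream."""
    logger = logging.getLogger("importer-sketch")
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)
    return logger


def fetch_page(logger, cursor):
    """Pretend to fetch one page, logging the step and the raw request."""
    logger.info("fetching issues after cursor %s", cursor)  # (1) activity log
    logger.log(WIRE, ">>> cursor=%s", cursor)               # (2) wire dump


verbose_out = io.StringIO()
fetch_page(make_logger(verbose_out, WIRE), "abc")
verbose_lines = verbose_out.getvalue().splitlines()

quiet_out = io.StringIO()
fetch_page(make_logger(quiet_out, logging.INFO), "abc")
quiet_lines = quiet_out.getvalue().splitlines()
```

Because both kinds of message go through one logger, the wire dump appears interleaved with the activity log when the lower level is enabled, and disappears otherwise.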

@MichaelMure
Collaborator

Anyone know if it's still an issue?

@ghost
Author

ghost commented Nov 23, 2022

Anyone know if it's still an issue?

I don't think that "hope the problem goes away" is really a solution. The problem still exists at 70bd737.

$ ../gitbug/git-bug bridge pull
import error: non-200 OK status code: 502 Bad Gateway body: "{\n   \"data\": null,\n
   \"errors\":[\n      {\n         \"message\":\"Something went wrong while executing your 
query. This may be the result of a timeout, or it could be a GitHub bug. Please include
 `AD12:087E:226594A:22E4D90:637D629A` when reporting this issue.\"\n      }\n   ]\n}\n"

Note that the error above is a github error. Clearly they are trying to say "we need you to rephrase your query, but we are not interested in giving you any hints about how to do that. neener neener."

I guess "stop using github for large projects" is really the only viable solution to the fact that github's api does not scale, nor does github care at all about that fact.

@MichaelMure
Collaborator

It looks like a temporary failure on the GitHub side to me (like a burst of requests causing a CPU overload, cascading into a failure to handle the request). Basically the P99999 that is hard for cloud engineers to both track down and fix.

git-bug makes so many requests in that situation that it increases the likelihood of that happening. The problem is that instead of retrying, git-bug fails entirely. We need to both:

  • have better retries for this, maybe failing fully only if several of those errors occur, with a gradually increasing delay between retries
  • have a better "resume" mechanism
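
The first bullet describes retrying with a growing delay and giving up only after repeated errors. A minimal Python sketch of that idea follows. It is not git-bug's Go implementation: `TransientError`, `call_with_retry`, and the doubling schedule are assumptions chosen for illustration.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying, such as a 502 response."""


def call_with_retry(request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request() until it succeeds or max_attempts transient errors occur.

    The delay doubles after each failed attempt: base_delay, 2x, 4x, ...
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # give up only after repeated failures
            sleep(base_delay * 2 ** attempt)


# Usage: a fake request that fails twice and then succeeds.
calls = {"n": 0}
delays = []


def flaky_request():
    calls["n"] += 1
    if calls["n"] <= 2:
        raise TransientError("502 Bad Gateway")
    return "ok"


# Record the delays instead of really sleeping.
result = call_with_retry(flaky_request, sleep=delays.append)
```

Injecting `sleep` lets the example record the delays instead of waiting; a real caller would leave the default `time.sleep`.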

@ghost
Author

ghost commented Nov 23, 2022

It looks like a temporary failure

I assure you, it is not temporary. I left it running for several days in a loop a while back and it never finished.

@MichaelMure
Collaborator

I meant that it's a transient failure on GitHub's side: the exact same request would succeed later, which means git-bug is making a valid request.

The problem is in how we handle those random failures.

@ghost
Author

ghost commented Nov 25, 2022

Not if some aspect of the failure is really an undocumented ratelimit/cpulimit.

I strongly suspect that is the case.


This bot triages untriaged issues and PRs according to the following rules:

  • After 90 days of inactivity, the lifecycle/stale label is applied
  • After 30 days of inactivity since lifecycle/stale was applied, the issue is closed

To remove the stale status, you can:

  • Remove the lifecycle/stale label
  • Comment on this issue

@github-actions github-actions bot added the lifecycle/stale Inactive for 90d or more label Jul 23, 2024