-
Notifications
You must be signed in to change notification settings - Fork 357
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve error messages from CLI invocations #592
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable, though I think Andrew or Robert should take a look too.
listeners: { | ||
stdout: (data: Buffer) => { | ||
output += data.toString(); | ||
}, | ||
stderr: (data: Buffer) => { | ||
const toRead = Math.min(maxErrorSize - error.length, data.length); | ||
error += data.toString("utf8", 0, toRead); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately I don't think we're guaranteed UTF-8 encoded output. But that's hopefully rare, and partially incomplete errors are better than no errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think it's reasonable to assume most of the errors we encounter will be UTF8. Certainly I think anything the CLI outputs typically is, unless a string we include in the error (file path, snippet from a QL file, etc.) isn't. I think it's okay if those cases don't get a perfect error message in the status reports.
* (2) It avoids us hitting the limit of how much data we can send in our | ||
* status reports on GitHub.com. | ||
*/ | ||
const maxErrorSize = 20_000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big deal, but is there a reason why you chose 20,000? Do you know what the max size for a status report is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The max size of fields in the status report is 100kb and UTF8 characters can be up to four bytes so the max length of the message is 25k characters (assuming we're really unlucky and all the characters happen to require four bytes). I subtracted 5k to allow other space for the command invocation and arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. Thanks for the explanation. Could you add something like that to the comment above just so this context doesn't get lost?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The max size of fields in the status report is 100kb
Just curious, where did this information come from? For my own benefit really as I didn't know what the limit was or where it comes from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I misunderstood your comment on the issue as meaning that you knew 100kb was the limit rather than it being a suggestion. The database column allows storing up to 2GB apparently, but I'm not sure how much data the endpoint that receives status reports will accept.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, no worries. 25k characters is probably fine anyway, though you could increase the limit and it would be fine.
The API should accept large volumes. Not up to 2GB but I think the limit is around 400MB so easily high enough. There's also protobuf in the pipeline there but the limit on that is also big enough for anything we might want to send.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is strictly better than we had before, and so I think it should be merged, but do you know if this fixes the error reporting problems discussed earlier?
It will certainly make the error message better than what we had before. Whether they'll be sufficiently better to actually gleam information from I think is something we'll have to establish once the data starts flowing in. Perhaps, we'll start to see patterns of common errors and we'll want to make special cases for them to distinguish them - for instance, clearly marking certain errors as user/configuration errors and others as CLI ones. At the moment, however, we can't really do this because all the messages in status reports say the exact same thing (CLI failed with exit code 2) so we don't really know what errors we need to distinguish. I suggest we wait a week or two from where this is merged and then take a look at what errors we have gotten and if/how we can improve them further. |
89d93cd
to
3b83544
Compare
This PR improves the error messages we get from CodeQL CLI invocations that went wrong. In particular, it captures the stderr of the program (up to a limit of 20,000 characters), and creates an error message that includes this as well as the command being run and the exit code. This error will bubble up to the top level where it should be caught and included in the status reports that get uploaded.
Merge / deployment checklist