fix: Unstable search issue behavior #178
According to the GitHub API specification, a query consists of one or more KEYWORDs and one or more QUALIFIERs, as follows:

```
KEYWORD_1 KEYWORD_2 QUALIFIER_1 QUALIFIER_2
```

The character limit on the query does not apply to QUALIFIERs, so only the part excluding the QUALIFIERs needs to be validated. See also: https://docs.github.com/en/rest/search/search?apiVersion=2022-11-28#limitations-on-query-length
tagpr.go
```go
// Also, from the results of the experiment, it is possible that when counting
// the number of characters in the keyword part, one space character is counted
// as three characters (possibly '%20').
if len(strings.Join(keywords, "%20") + "%20" + sha) >= 256 {
```
I would have expected something like `len(keywords) * (3+len(sha)) >= 256`, or something close to it.
While I understand the logic of your code, I'm not confident it will be easy to maintain.
Any special character added to the query later could break this way of counting and lead to errors.
I would have kept the old code and checked the length of the URL-encoded string instead.
So instead of checking `len(query)+1+len(sha) >= 256`, I would concatenate `tmpQuery := query + " " + sha` and then use `if len(url.QueryEscape(tmpQuery)) >= 256`.
If the condition is true, the code is unchanged.
If the condition is false, you set `query = tmpQuery`.
Thank you for your feedback.
I was actually wondering whether I should use the `url` package here.
I will explain my concerns later, but first, let me comment on your suggested fix.
`url.QueryEscape()` escapes spaces to `+` (not `%20`).
To achieve the expected result, we should use `url.PathEscape()` here.
https://go.dev/play/p/0Xk56v95-Ol?v=goprev
However, I wasn't confident that `url.PathEscape()` was the best choice here.
There are two reasons for this:
- The awkwardness of applying `PathEscape` to query escaping.
- The fact that the GitHub API specification does not disclose how the number of characters in this part is counted (I determined it by black-box testing, so there is no certainty).
  - In other words, I can't determine whether the GitHub API's query character count matches the `PathEscape` specification.

Considering these points, do you still think it's better to use the `url` package? I'd like to hear your opinion on this.
To be honest, I don't think it matters much whether we use the `url` package here or not.
Also, I have limited experience with Go, so I appreciate feedback like yours.
Thanks
@ccoVeille May I ask your opinion about the above?
I didn't receive a notification 🤦♂️
I understand it's quite a mess to cope with this.
I thought it was something that was used in URL encoding.
The solution you published today is simpler and safer.
I hope you won't face more trouble.
The changes might lead to a higher number of requests, so rate limiting may be hit more often.
Maybe you could wait a bit longer between calls.
Thanks. There is no need to aim for the maximum possible length (256), and it would be better to count characters in a way that allows for spaces being URL-escaped to three characters.
```go
query := queryBase
func buildChunkSearchIssuesQuery(qualifiers string, shasStr string) (chunkQueries []string) {
	// array of SHAs
	keywords := make([]string, 0, 25)
```
📝 The variable has been renamed to make clear that the character limit applies only to the keyword part of the query, excluding the qualifiers.
The following changes were made:
tagpr.go
```go
// However, although not explicitly stated in the documentation, the space separating
// keywords is counted as one or more characters, so it is possible to exceed 256
// characters if the text is filled to the very limit of 256 characters.
// For this reason, the maximum number of chars in the KEYWORD section is limited here to 200.
tempKeywords := append(keywords, sha)
if len(strings.Join(tempKeywords, " ")) >= 200 {
```
Please add a constant and put this explanation next to it.
That would make clear that the comment is about the value, not the algorithm.
I fixed it. How about this?
OK, let's give it a try.
I checked the repository where the problem had previously appeared, and the problem has been resolved. 👍
Issue
GitHub's issue search API may fail because of a query character count validation error.
How to reproduce the problem
For example, with the current calculation method, API validation generates an error if the following conditions are met:
Reproduced repository: https://github.com/snaka/a
How to fix it
I think there are two problems with the current logic:
Wrong count targets
According to the GitHub API specification, a query consists of one or more KEYWORDs and one or more QUALIFIERs, in the form `KEYWORD_1 KEYWORD_2 QUALIFIER_1 QUALIFIER_2`.
The character limit on the query does not apply to QUALIFIERs, so only the part excluding the QUALIFIERs needs to be validated.
See also: https://docs.github.com/en/rest/search/search?apiVersion=2022-11-28#limitations-on-query-length
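A sketch of keeping the two parts of a query separate, so that the length limit is checked on the keyword part only. The `searchQuery` type and its methods are hypothetical illustrations, not tagpr code; as written, separators are counted as one character each.

```go
package main

import (
	"fmt"
	"strings"
)

// searchQuery is a hypothetical type holding the KEYWORD and QUALIFIER parts separately.
type searchQuery struct {
	keywords   []string // e.g. commit SHAs
	qualifiers []string // e.g. "repo:owner/name", "is:pr"
}

// String renders the query as "KEYWORD_1 KEYWORD_2 ... QUALIFIER_1 QUALIFIER_2 ...".
// It copies the keywords first so the struct's slices are never modified.
func (q searchQuery) String() string {
	parts := append(append([]string{}, q.keywords...), q.qualifiers...)
	return strings.Join(parts, " ")
}

// keywordLen is the length the query limit applies to: the keyword part only.
func (q searchQuery) keywordLen() int {
	return len(strings.Join(q.keywords, " "))
}

func main() {
	q := searchQuery{
		keywords:   []string{"abc1234", "def5678"},
		qualifiers: []string{"repo:owner/name", "is:pr", "is:closed"},
	}
	fmt.Println(q.String())                      // abc1234 def5678 repo:owner/name is:pr is:closed
	fmt.Println(len(q.String()), q.keywordLen()) // 47 15
}
```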
Inconsistent delimiter counting
By running a script that tried different contents for the `keywords` variable, I suspect that each space used as a delimiter is counted as three characters instead of one.
The number 3 matches the fact that a space character can be represented as `%20`. The API documentation is not clear on this, but as long as the character count assumes each space is `%20`, API validation does not seem to fail.
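A short calculation shows why the delimiter count matters: n keywords of length k with separators of length s take n·k + (n−1)·s characters, so fewer keywords fit when a space counts as three characters. The sketch below assumes 7-character abbreviated SHAs and the 256-character limit purely for illustration; the PR does not state the SHA length used.

```go
package main

import "fmt"

// maxKeywords returns how many keywords of keywordLen characters fit into a
// keyword part that must stay strictly below limit characters, when each
// separator between two keywords counts as sepLen characters.
//
// n keywords take n*keywordLen + (n-1)*sepLen characters, so the condition
// n*(keywordLen+sepLen) - sepLen < limit gives n <= (limit+sepLen-1)/(keywordLen+sepLen).
func maxKeywords(keywordLen, sepLen, limit int) int {
	if keywordLen+sepLen <= 0 || limit <= 0 {
		return 0
	}
	return (limit + sepLen - 1) / (keywordLen + sepLen)
}

func main() {
	// Assumption: abbreviated 7-character SHAs.
	fmt.Println(maxKeywords(7, 1, 256)) // 32 when a space counts as 1 character
	fmt.Println(maxKeywords(7, 3, 256)) // 25 when a space counts as 3 ("%20")
}
```

Under these assumptions, a query with 26 to 32 SHAs passes a one-character-per-space check but would be rejected if the API counts each space as `%20`.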