fix: Unstable search issue behavior #178

snaka · 2024-08-02T14:07:33Z

Issue

GitHub's Issue search API may fail due to a query character count validation error.

GET https://api.github.com/search/issues?q=... (snip) ... : 422 Validation Failed [{Resource:Search Field:q Code:invalid Message:The search is longer than 256 characters.}]

How to reproduce problem

For example, with the current calculation method, API validation fails with an error when the following conditions are met:

The repository name is extremely short
Many commits since the last release

Reproduced repository: https://github.com/snaka/a

How to fix it

I think there are two problems with the current logic:

Wrong count targets.
- The count of query characters includes qualifiers.
Inconsistency between the API delimiter counting method and the implementation.
- The SHA delimiter is assumed to be a single space character, but the API counts delimiters as approximately three characters (note that there is no mention of this in the API documentation and it is a guess based on trial and error).

Wrong count targets

According to the GitHub API specification, a query consists of one or more KEYWORDs and one or more QUALIFIERs, as follows.

KEYWORD_1 KEYWORD_2 QUALIFIER_1 QUALIFIER_2

The character limit of query does not apply to QUALIFIERs, so the part excluding the QUALIFIERs must be validated.

See also: https://docs.github.com/en/rest/search/search?apiVersion=2022-11-28#limitations-on-query-length
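
A minimal sketch of that rule, assuming the query is assembled from a slice of keywords plus a separate qualifier string. The helper name `keywordLen` is hypothetical and is not taken from tagpr; the point is only that the qualifiers never enter the measured string.

```go
package main

import (
	"fmt"
	"strings"
)

// keywordLen returns the length of the keyword part of a search query.
// Qualifiers are deliberately not passed in, so they cannot affect the count.
// (Hypothetical helper for illustration.)
func keywordLen(keywords []string) int {
	return len(strings.Join(keywords, " "))
}

func main() {
	qualifiers := "repo:snaka/a is:pr is:closed"
	keywords := []string{"a3dbce5", "0efe8b2", "133b284"}

	query := qualifiers + " " + strings.Join(keywords, " ")
	fmt.Println("whole query length:", len(query))
	fmt.Println("keyword part length:", keywordLen(keywords))
}
```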

Inconsistency of delimiter counting method

I ran the following script while switching the contents of the keywords variable.
From the results, I suspect that each delimiter space is counted as three characters instead of one.

#!/bin/bash

# GitHub API endpoint
url="https://api.github.com/search/issues"

# search query
qualifiers="repo:snaka/a is:pr is:closed"
keywords="a3dbce50efe8b2133b284157329cfae236ffc154ae2ce4d1f9571a330629635150d269e597c8765b591fbacc2876ecb195db6032a13ba8c8dfdce965c1e99b3dbccbab22911dc13da15916e3d8808eb384bccdaa6b1172badbb216965c1e99b3dbccbab22911dc13da15916e3d8808eb384bccdaa6b1172badbb21691234 1" # valid
# keywords="a3dbce50efe8b2133b284157329cfae236ffc154ae2ce4d1f9571a330629635150d269e597c8765b591fbacc2876ecb195db6032a13ba8c8dfdce965c1e99b3dbccbab22911dc13da15916e3d8808eb384bccdaa6b1172badbb216965c1e99b3dbccbab22911dc13da15916e3d8808eb384bccdaa6b1172badbb21691234 12" # invalid
# keywords="a3dbce50efe8b2133b284157329cfae236ffc154ae2ce4d1f9571a330629635150d269e597c8765b591fbacc2876ecb195db6032a13ba8c8dfdce965c1e99b3dbccbab22911dc13da15916e3d8808eb384bccdaa6b1172badbb216965c1e99b3dbccbab22911dc13da15916e3d8808eb384bccdaa6b1172badbb216912341234" # valid
encoded_kw=$(printf '%s' "$keywords" | jq -sRr @uri)

echo "keywords (${#keywords}): $keywords"
echo "encoded keywords (${#encoded_kw}): $encoded_kw"
echo "---------------------------------"

query="$qualifiers $keywords"

# URL encode
encoded_query=$(printf '%s' "$query" | jq -sRr @uri)

# run curl command
curl -H "Accept: application/vnd.github.v3+json" \
  -H "Authorization: token $GH_TOKEN" \
  "$url?q=$encoded_query"

The number three matches the length of %20, the percent-encoded form of a space character.

The API documentation is not clear on this, but as long as the number of characters is counted assuming the whitespace character is %20, the API validation does not seem to cause an error.
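
As a rough sketch of that counting rule, the keyword part can be measured with every separator written as `%20`. That the API really counts this way is only the guess described above, and `escapedKeywordLen` is a hypothetical name used for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// escapedKeywordLen counts the keyword part as if every separating space
// were written as "%20" (three characters). That the API counts this way
// is an inference from experiments, not documented behavior.
func escapedKeywordLen(keywords []string) int {
	return len(strings.Join(keywords, "%20"))
}

func main() {
	keywords := []string{"a3dbce5", "0efe8b2"}
	fmt.Println("plain count:  ", len(strings.Join(keywords, " ")))
	fmt.Println("escaped count:", escapedKeywordLen(keywords))
}
```

Each separator adds two characters over the plain count, so the gap grows with the number of keywords rather than with their length.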

ccoVeille · 2024-08-02T17:00:33Z

tagpr.go

+		// Also, from the results of the experiment, it is possible that when counting
+		// the number of characters in the keyword part, one space character is counted
+		//  as three characters (possibly '%20').
+		if len(strings.Join(keywords, "%20") + "%20" + sha) >= 256 {


I would have expected something like this

len(keywords) * (3+len(sha)) >= 256 or something close to this.

While I understand the logic of your code, I'm not confident about how this code could be maintained.

Any extra special character added later might break the way to count and may lead to errors.

I would have kept the old code and checked the length of the URL-encoded string instead.

So instead of checking len(query)+1+len(sha) >= 256

I would concat a tmpQuery := query + " " + sha

Then use if len(url.QueryEscape(tmpQuery)) >= 256

If the condition is true, the code is unchanged.

If condition is false, you set

query = tmpQuery
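
A sketch of the steps just described, under stated assumptions: `appendIfFits` and its `limit` parameter are hypothetical names, and the check uses `url.QueryEscape` exactly as proposed. Note that `url.QueryEscape` encodes a space as `+`, a single character, and also escapes characters such as `:` and `/`.

```go
package main

import (
	"fmt"
	"net/url"
)

// appendIfFits tries to add sha to query. It measures the URL-escaped
// candidate and keeps the old query when the candidate reaches the limit.
// (Hypothetical helper; limit would be 256 in the discussion above.)
func appendIfFits(query, sha string, limit int) (string, bool) {
	tmpQuery := query + " " + sha
	if len(url.QueryEscape(tmpQuery)) >= limit {
		return query, false // condition is true: query is unchanged
	}
	return tmpQuery, true // condition is false: adopt the longer query
}

func main() {
	query := "a3dbce5"
	for _, sha := range []string{"0efe8b2", "133b284"} {
		var ok bool
		query, ok = appendIfFits(query, sha, 256)
		fmt.Println(query, ok)
	}
}
```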

snaka · 2024-08-03

Thank you for your feedback.

I was actually wondering if I should use the url package here.

I will explain my concerns later, but first, let me comment on your suggested fix.

url.QueryEscape() escapes spaces to + (not %20).
To achieve the expected result, we should use url.PathEscape() here.

https://go.dev/play/p/0Xk56v95-Ol?v=goprev
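
For reference, a small self-contained comparison of the two functions on a string containing one space. The helper `escapedLens` is a hypothetical name added only to make the length difference easy to inspect.

```go
package main

import (
	"fmt"
	"net/url"
)

// escapedLens returns the length of s after url.QueryEscape and after
// url.PathEscape. (Hypothetical helper for illustration.)
func escapedLens(s string) (queryLen, pathLen int) {
	return len(url.QueryEscape(s)), len(url.PathEscape(s))
}

func main() {
	s := "abc1234 def5678"
	fmt.Println(url.QueryEscape(s)) // abc1234+def5678
	fmt.Println(url.PathEscape(s))  // abc1234%20def5678

	q, p := escapedLens(s)
	fmt.Println(q, p) // 15 17
}
```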

However, I wasn't confident that url.PathEscape() was the best choice here.
There are two reasons for this:

The awkwardness of applying PathEscape to query escaping.

The fact that the GitHub API specification does not disclose how the number of characters in this part is counted (this is determined by black-box testing, but there is no certainty).

In other words, I can't determine if the GitHub API's query character count specification matches the PathEscape specification

Considering these points, do you still think it's better to use the url package? I'd like to hear your opinion on this.

To be honest, I think it doesn't matter whether we use the url package here or not.

Also, I have limited experience with Go language, so I appreciate your feedback like this.

Thanks

snaka · 2024-08-06

@ccoVeille
May I ask your opinion about the above?

ccoVeille · 2024-08-07

I didn't receive a notification 🤦‍♂️

I understand it's quite a mess to cope with this.

I thought it was something that was used in URL encoding.

The solution you published today is simpler and safer.

I hope you won't face more trouble.

The change might result in a higher number of requests, so rate limiting may be hit more often.

Maybe you could/should wait a bit more between calls

Songmu · 2024-08-07T00:51:29Z

Thanks. There is no need to aim for the last possible length (256), and it would be better to do a character count that considers the possibility of spaces being URL-escaped to three characters.

snaka · 2024-08-07T12:04:47Z

tagpr.go

-	query := queryBase
+func buildChunkSearchIssuesQuery(qualifiers string, shasStr string) (chunkQueries []string) {
+	// array of SHAs
+	keywords := make([]string, 0, 25)


📝 To clarify that the character limit applies only to the part of the string that makes up the query, excluding the qualifier, the variable name has been changed.

snaka · 2024-08-07T12:20:57Z

@Songmu @ccoVeille

The following changes were made:

Removed the escaping of spaces, which was based on guesswork.
Instead, the maximum number of characters was given more leeway.
- (7 + 1) x 25 = 200

ccoVeille · 2024-08-07T12:30:28Z

tagpr.go

+		// However, although not explicitly stated in the documentation, the space separating
+		// keywords is counted as one or more characters, so it is possible to exceed 256
+		// characters if the text is filled to the very limit of 256 characters.
+		// For this reason, the maximum number of chars in the KEYWORD section is limited here to 200.
+		tempKeywords := append(keywords, sha)
+		if len(strings.Join(tempKeywords, " ")) >= 200 {


Please add a constant and move this explanation next to it.

That would make it clear that the comment is about the value, not the algorithm.
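
A hedged sketch of what that could look like: the limit becomes a named constant that carries the explanation, and the chunking loop only refers to the name. `maxKeywordsLen` and `chunkQueries` are hypothetical names, not necessarily those used in tagpr.go.

```go
package main

import (
	"fmt"
	"strings"
)

// maxKeywordsLen is a conservative cap on the KEYWORD part of a search query.
// The documented limit is 256 characters; the margin allows for the separating
// spaces possibly being counted as more than one character.
const maxKeywordsLen = 200

// chunkQueries splits shas into groups whose space-joined length stays below
// maxKeywordsLen and prefixes each group with the qualifiers.
func chunkQueries(qualifiers string, shas []string) []string {
	var queries, keywords []string
	for _, sha := range shas {
		// Copy before appending so the candidate never aliases keywords.
		candidate := append(append([]string{}, keywords...), sha)
		if len(keywords) > 0 && len(strings.Join(candidate, " ")) >= maxKeywordsLen {
			queries = append(queries, qualifiers+" "+strings.Join(keywords, " "))
			keywords = keywords[:0]
		}
		keywords = append(keywords, sha)
	}
	if len(keywords) > 0 {
		queries = append(queries, qualifiers+" "+strings.Join(keywords, " "))
	}
	return queries
}

func main() {
	var shas []string
	for i := 0; i < 30; i++ {
		shas = append(shas, fmt.Sprintf("%07x", i))
	}
	for _, q := range chunkQueries("repo:snaka/a is:pr is:closed", shas) {
		fmt.Println(len(q), q)
	}
}
```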

snaka · 2024-08-07

I fixed it, how about this?

Songmu · 2024-08-11T12:30:57Z

OK; let's give it a try.

snaka · 2024-08-15T00:35:50Z

I checked with the repository where the problem had previously appeared, and the problem has been resolved. 👍

snaka marked this pull request as ready for review August 2, 2024 15:26

ccoVeille suggested changes Aug 2, 2024

snaka changed the title ~~fix: Unstable search behaviour~~ fix: Unstable search issue behavior Aug 3, 2024

h-yon mentioned this pull request Aug 5, 2024

GitHub API error: The search is longer than 256 characters. x-motemen/git-pr-release#103

Open

No escapes, but give margins for upper limits

bbcd48b

snaka commented Aug 7, 2024

snaka requested a review from ccoVeille August 7, 2024 12:09

ccoVeille approved these changes Aug 7, 2024

Declare the character limit as a constant

0b66380

snaka requested a review from ccoVeille August 7, 2024 12:54

ccoVeille approved these changes Aug 7, 2024

Songmu merged commit 9138386 into Songmu:main Aug 11, 2024
3 checks passed

temporary-token-issuer-for-tagpr bot mentioned this pull request Aug 10, 2024

Release for v1.4.0 #176

Merged

Songmu added the minor label Aug 11, 2024

snaka deleted the fix-unstable-search-api-behavior branch August 13, 2024 23:31

MH4GF mentioned this pull request Aug 15, 2024

build(deps): bump Songmu/tagpr from 1.3.0 to 1.4.0 route06/actions#68

Merged
