Drone-Webhook fails after update to 1.22.3 #32241

Closed
JumpingScript opened this issue Oct 11, 2024 · 17 comments
Labels
issue/needs-feedback For bugs, we need more details. For features, the feature must be described in more detail

Comments

@JumpingScript

JumpingScript commented Oct 11, 2024

Description

After I updated my Gitea instance to 1.22.3, my Drone webhook started failing with
Delivery: Post "https://ci.example.de/hook?secret=SECRET": context deadline exceeded (Client.Timeout exceeded while awaiting headers).
The log showed
[EDIT, more log entries]

2024/10/11 15:10:47 ...eb/routing/logger.go:102:func1() [I] router: completed POST /org/project1/settings/hooks/21/test for 132.187.50.164:0, 200 OK in 26.7ms @ setting/webhook.go:647(setting.TestWebhook)
2024/10/11 15:10:48 ...eb/routing/logger.go:102:func1() [I] router: completed POST /login/oauth/access_token for 10.87.181.26:0, 200 OK in 105.1ms @ auth/oauth.go:638(auth.AccessTokenOAuth)
2024/10/11 15:10:48 ...eb/routing/logger.go:102:func1() [I] router: completed GET /api/v1/repos/org/project1/raw/a770584dec278ed7ff344add85deb6457c87f8f3/.drone.yml for 10.87.181.26:0, 403 Forbidden in 2.0ms @ v1/api.go:762(v1.Routes.verifyAuthWithOptions)
2024/10/11 15:10:52 ...eb/routing/logger.go:102:func1() [I] router: completed GET /org/project1/settings/hooks/21 for 132.187.50.164:0, 200 OK in 36.3ms @ setting/webhook.go:632(setting.WebHooksEdit)
2024/10/11 15:10:53 ...s/webhook/webhook.go:101:handler() [E] Unable to deliver webhook task[599]: unable to deliver webhook task[599] in https://ci.example.de/hook?secret=SECRET due to error in http client: Post "https://ci.example.de/hook?secret=SECRET": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

I already checked the webhook.ALLOWED_HOST_LIST setting; it contains the correct URL.
Also, the Gitea server is reachable by ping from the CI server.

Gitea Version

1.22.3

Can you reproduce the bug on the Gitea demo site?

No

Log Gist

No response

Screenshots

No response

Git Version

2.43.0

Operating System

openSUSE

How are you running Gitea?

go-binary

Database

MySQL/MariaDB

@wxiaoguang
Contributor

> Delivery: Post "https://ci.example.de/hook?secret=SECRET": context deadline exceeded (Client.Timeout exceeded while awaiting headers).

Just FYI: it means that Gitea can't make a connection to https://ci.example.de, maybe you need to check your network and/or proxy setting, etc.

@wxiaoguang wxiaoguang added issue/needs-feedback For bugs, we need more details. For features, the feature must be described in more detail and removed type/bug labels Oct 11, 2024
@JumpingScript
Author

JumpingScript commented Oct 11, 2024

> Just FYI: it means that Gitea can't make a connection to https://ci.example.de, maybe you need to check your network and/or proxy setting, etc.

Uh, ping works from gitea to the ci server, so that's probably not it?

@wxiaoguang
Contributor

Yup, that's weird (and doesn't seem to be related to Gitea at the moment).

A working ping doesn't mean that HTTPS also works.

I think you could try to figure out why the https request failed, eg:

  1. try `curl https://ci.example.de/hook?secret=SECRET` on your Gitea server
  2. try to figure out whether there are proxy settings (environment variables)
  3. use tcpdump or wireshark to see the real network packets.
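
A minimal sketch of step 1 in Python, assuming it is run on the Gitea host. It first checks the plain TCP connection (which ping does not cover) and then sends an empty JSON POST, since a plain `curl` sends GET and the hook endpoint answers that with 405. The function name `probe_webhook` and the empty payload are illustrative assumptions, not what Gitea actually sends.

```python
import socket
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def probe_webhook(url: str, timeout: float = 5.0) -> str:
    """Report how far a webhook-style POST to `url` gets (hypothetical helper)."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)

    # Step 1: plain TCP connect. A successful ping (ICMP) says nothing about this.
    try:
        socket.create_connection((parts.hostname, port), timeout=timeout).close()
    except OSError as exc:
        return f"tcp-connect-failed: {exc}"

    # Step 2: POST an empty JSON body and wait at most `timeout` seconds.
    req = urllib.request.Request(
        url,
        data=b"{}",
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return f"http-{resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, just not with a 2xx status.
        return f"http-{exc.code}"
    except OSError as exc:
        # Covers TLS errors, resets, and timeouts while waiting for the response.
        return f"no-response: {exc}"
```

Calling `probe_webhook("https://ci.example.de/hook?secret=SECRET")` with the default 5 second timeout roughly mirrors the DELIVER_TIMEOUT default mentioned in this thread; a `no-response` result would point at the receiver rather than at basic connectivity.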

@wxiaoguang
Contributor

ps: and there is a setting DELIVER_TIMEOUT which defaults to 5 (seconds)

@JumpingScript
Author

JumpingScript commented Oct 11, 2024

> 1. try `curl https://ci.example.de/hook?secret=SECRET` on your Gitea server

Returns "405 method not allowed" after adding -v

> 2. try to figure out whether there are proxy settings (environment variables)

There are no proxies as far as I am aware.

> 3. use tcpdump or wireshark to see the real network packets.

I'll need to check that later, sorry about that!

> ps: and there is a setting DELIVER_TIMEOUT which defaults to 5 (seconds)

Just to make sure, you think that the operation is taking too long and runs into timeout?

Also, I've added a bit more log entries I found - hope that helps more.

@wxiaoguang
Contributor

> > ps: and there is a setting DELIVER_TIMEOUT which defaults to 5 (seconds)

> Just to make sure, you think that the operation is taking too long and runs into timeout?

Yup, that's also a possible problem.

"exceeded while awaiting headers" could also mean that Gitea has sent the request to Drone, but Drone is not able to respond in time, so Gitea cancels the request after the 5-second timeout (still, just a guess). So if there are logs from the Drone side that show whether the requests have reached Drone, that would help a lot.
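
To illustrate that guess, here is a small self-contained simulation, not Gitea or Drone code: a local stand-in receiver accepts the POST but takes a while before sending any header, and the sender gives up once its own timeout expires. The names `SlowHook`, `start_slow_server`, and `deliver` are made up for this sketch.

```python
import socket
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class SlowHook(BaseHTTPRequestHandler):
    """Stand-in receiver that is busy for `delay` seconds before replying."""

    delay = 1.0

    def do_POST(self):
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        time.sleep(self.delay)  # simulated work before any header is sent
        try:
            self.send_response(200)
            self.end_headers()
        except OSError:
            pass  # the sender already gave up and closed the connection

    def log_message(self, *args):
        pass  # keep the demo quiet


def start_slow_server(delay: float):
    """Start the stand-in receiver on a free local port; return (server, url)."""
    handler = type("Handler", (SlowHook,), {"delay": delay})
    server = ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_address[1]}/hook"


def deliver(url: str, timeout: float) -> str:
    """POST to `url` and give up if no response arrives within `timeout`."""
    req = urllib.request.Request(url, data=b"{}", method="POST")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return f"delivered ({resp.status})"
    except OSError as exc:
        reason = getattr(exc, "reason", exc)
        if isinstance(reason, socket.timeout):
            return "timeout awaiting response"
        return f"failed: {exc}"
```

With a receiver delay of 1 second, a sender timeout of 0.2 seconds times out even though the connection itself was fine, while a sender timeout of 5 seconds succeeds. That is the same shape as a webhook receiver that needs longer than DELIVER_TIMEOUT to answer.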

@wxiaoguang
Contributor

wxiaoguang commented Oct 11, 2024

> Also, I've added a bit more log entries I found - hope that helps more.

From the new logs, if I understand correctly, my best guess is this:

  1. TestWebhook tries to make Gitea send a webhook request to Drone
  2. Drone receives the request and tries to access /login/oauth/access_token and /api/v1/repos/org/project1/raw/a770584dec278ed7ff344add85deb6457c87f8f3/.drone.yml
    • BUT Drone fails to access .drone.yml (it doesn't pass Gitea's permission check: 403 Forbidden)
  3. Drone fails to process the webhook request
  4. Gitea fails to complete the webhook request because it receives no response within 5 seconds; it then redirects to the "webhook edit" page and shows an error message.

If the guess is right, the key problem is in step 2
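
Step 2 could be checked on its own by replaying the request that Drone makes, roughly as sketched below. The URL layout is taken from the log lines in this issue; sending the OAuth access token as an `Authorization: Bearer` header is an assumption about how Drone authenticates, and `raw_file_status` is a hypothetical helper name.

```python
import urllib.error
import urllib.request


def raw_file_status(base_url, owner, repo, ref, path, token, timeout=10.0):
    """Return the HTTP status Gitea gives for a raw-file request with `token`."""
    url = f"{base_url.rstrip('/')}/api/v1/repos/{owner}/{repo}/raw/{ref}/{path}"
    req = urllib.request.Request(
        url,
        # Assumption: the OAuth2 access token is sent as a bearer token.
        headers={"Authorization": f"Bearer {token}"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # 403 here would match the "403 Forbidden" seen in the Gitea log.
        return exc.code
```

If the same token yields 403 for `.drone.yml` but works for other API calls, that would support the token scope/permission theory; if it yields 200, the problem is more likely on the Drone side.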

@wxiaoguang
Contributor

If the guess is right and it is related to the token scope/permission, it may be related to "Fix bug when a token is given public only (#32204)" (#32218), and you could try to fix the permissions of the token that Drone uses.

@JumpingScript
Author

There's a 500 logged by Drone for the POST to /hook?secret.

This could be caused by Drone not being able to access /api/v1/repos/org/project1/raw/a770584dec278ed7ff344add85deb6457c87f8f3/.drone.yml, if I am interpreting this right.

> If the guess is right and it is related to the token scope/permission, maybe it is related to this: Fix bug when a token is given public only (#32204) #32218 and maybe you could try to fix the token's permission which is used by Drone.

Well, Drone uses a service account and is authenticated through OAuth - and there's no way to actually change the permissions assigned to an OAuth app, is there?

@wxiaoguang
Contributor

> > If the guess is right and it is related to the token scope/permission, maybe it is related to this: Fix bug when a token is given public only (#32204) #32218 and maybe you could try to fix the token's permission which is used by Drone.

> Well, Drone uses a service account and is authenticated through OAuth - and there's no way to actually change the permissions assigned to an OAuth app, is there?

"Fix bug when a token is given public only (#32204) #32218" fixed a permission (security) bug, that's the only possible related change I can recall but I didn't read the details about it. Maybe others could have some ideas .....

@wxiaoguang
Contributor

Hmm, one more thing: were you using the Drone webhook in a private repo? If yes, could you try to set up a public repo that uses the Drone webhook? If the public one works but the private one doesn't, then it makes "Fix bug when a token is given public only (#32204) #32218" more suspicious.

@JumpingScript
Author

Making the repo public sadly does not change anything - it's still not working :(

@wxiaoguang
Contributor

That's really strange .... to be honest I have no idea at the moment either.

Could you try to downgrade to 1.22.2 or the last usable version? Maybe finding the breaking point would help.

@JumpingScript
Author

I am pretty sure that the breaking point is the last version 1.22.3, since the upgrade was made on the 9th and the webhook still worked on the 8th.
The upgrade from 1.22.2 to 1.22.3 is also the only change that happened during that period in that section of the network (gitea, drone).

I'll need to clear a downgrade with my admin, if the points above don't convince you :)

@wxiaoguang
Contributor

wxiaoguang commented Oct 14, 2024

> I am pretty sure that the breaking point is the last version 1.22.3, since the upgrade was made on the 9th and the webhook still worked on the 8th. The upgrade from 1.22.2 to 1.22.3 is also the only change that happened during that period in that section of the network (gitea, drone).

Yup, I agree that's the most suspicious part, though there is still a small chance that some other unknown bug or operation causes the problem (e.g. it could turn out that downgrading to 1.22.2 doesn't work now either).

Think about some cases:

  1. There is a bug between 1.22.2 and 1.22.3 (maybe it is the most suspicious case)
    • Somebody could guess right soon and propose a fix.
    • Very difficult to catch the real problem:
      • Some maintainers also use Drone and they also encounter the bug and they will fix it.
      • You could provide a reproducible setup with detailed steps so that others can debug and fix it.
      • Some people could help to build a special binary with enough debug logs and you convince the admin to deploy it and collect more debug information.
  2. There is an old bug before 1.22.3, or some unknown operations (there could still be a small chance to hit)
    • Some operation triggered the bug during the upgrade, e.g. editing tokens, changing settings, etc.
    • Figuring out how it happens would still need the same steps as under "Very difficult to catch the real problem" above.

> I'll need to clear a downgrade with my admin, if the points above don't convince you :)

I am not an expert and can only guess, and I do not use Drone myself; maybe others with more experience can offer more ideas. Actually, "Drone CI/CD stopped working" is a strong argument for convincing the admin.


If the error logs are right:

2024/10/11 15:10:48 ...eb/routing/logger.go:102:func1() [I] router: completed GET /api/v1/repos/org/project1/raw/a770584dec278ed7ff344add85deb6457c87f8f3/.drone.yml for 10.87.181.26:0, 403 Forbidden in 2.0ms @ v1/api.go:762(v1.Routes.verifyAuthWithOptions)

It means that the request fails in verifyAuthWithOptions (https://github.com/go-gitea/gitea/blob/release/v1.22/routers/api/v1/api.go#L761), which seems to only check the user status.
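
For picking such lines out of a longer Gitea log, a small parser along these lines may help. It is a sketch based only on the router log format visible in this issue; the regular expression and the helper name `failed_requests` are illustrative and may need adjusting for other log configurations.

```python
import re

# Matches the "router: completed ..." lines as they appear in this issue.
ROUTER_LINE = re.compile(
    r"router: completed (?P<method>\S+) (?P<path>\S+) for (?P<client>\S+), "
    r"(?P<status>\d{3}) (?P<reason>.+?) in (?P<duration>\S+) @ (?P<handler>\S+)"
)


def failed_requests(lines):
    """Return the parsed fields of every router line with a status >= 400."""
    failures = []
    for line in lines:
        match = ROUTER_LINE.search(line)
        if match and int(match["status"]) >= 400:
            failures.append(match.groupdict())
    return failures
```

Feeding it the log excerpt from the issue description would single out the GET for `.drone.yml`, together with the handler location where the request was rejected.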

@JumpingScript
Author

... I just disabled the project in Drone and then re-activated it, and everything's working again.

Sorry to have bothered you, and thank you for the help.

@wxiaoguang
Contributor

Well, that's really difficult to guess ......

@go-gitea go-gitea locked as resolved and limited conversation to collaborators Jan 13, 2025