Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v3.5.7+ changes some http response status codes from GET Workflow #13558

Open
3 of 4 tasks
pmareke opened this issue Sep 5, 2024 · 12 comments
Open
3 of 4 tasks

v3.5.7+ changes some http response status codes from GET Workflow #13558

pmareke opened this issue Sep 5, 2024 · 12 comments
Assignees
Labels
area/api Argo Server API type/bug type/regression Regression from previous behavior (a specific type of bug)

Comments

@pmareke
Copy link
Contributor

pmareke commented Sep 5, 2024

Pre-requisites

  • I have double-checked my configuration
  • I have tested with the :latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
  • I have searched existing issues and could not find a match for this bug
  • I'd like to contribute the fix myself (see contributing guide)

What happened? What did you expect to happen?

In the latest images some endpoints change their status responses.

When I call this endpoints:

  • Calling the API to get a running workflow with an invalid token.
  • Calling the API to get a running workflow with a token without enough permissions.

I start returning a 404 when in previous version (3.4.x) the status code was a 401.

Version(s)

v3.5.7+

Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflows that uses private images.

It's an http call, not related with the workflow itself.

Logs from the workflow controller

Nothing to show.

Logs from in your workflow's wait container

Nothing to show.
@agilgur5 agilgur5 added area/api Argo Server API type/regression Regression from previous behavior (a specific type of bug) labels Sep 5, 2024
@pmareke pmareke changed the title v3.5.5 changes the http response status codes v3.5.x changes the http response status codes Sep 5, 2024
@pmareke pmareke changed the title v3.5.x changes the http response status codes v3.5.x changes some http response status codes Sep 5, 2024
@agilgur5
Copy link
Member

agilgur5 commented Sep 5, 2024

Could you give some example API requests to make this easier to reproduce and debug?

As far as I know, the API logic is mostly unchanged in 3.5 other than the Unified Workflows List (i.e. #11121)

@pmareke
Copy link
Contributor Author

pmareke commented Sep 6, 2024

Sure @agilgur5:

  • GET/workflows/{self.namespace}/{workflow_name} sending an invalid Bearer token -> 401 in v3.4.x and 404 in v3.5.x
  • GET /workflows/{self.namespace}/{workflow_name} sending a valid Bearer token without enough permissions -> 401 in v3.4.x and 404 in v3.5.x

It also change the response of the following endpoints:

  • PUT /workflows/{self.namespace}/{workflow_name}/resume with a valid token and a non existing workflows -> 404 in v3.4.x
  • PUT /workflows/{self.namespace}/{workflow_name}/stop with a valid token and a non existing workflows -> 404 in v3.4.x

@agilgur5
Copy link
Member

agilgur5 commented Sep 6, 2024

Some more follow-up questions:

  1. Are the first two also non-existent Workflows?
  2. Do the namespaces exist?
  3. What is the response to the last two? A 200?
  4. Also what auth mode(s) are you using?

@pmareke
Copy link
Contributor Author

pmareke commented Sep 6, 2024

  1. Are the first two also non-existent Workflows?

NO

  1. Do the namespaces exist?

YES

  1. What is the response to the last two? A 200?

Not sure, I could't see, I need to rollback the upgrade as fast as possible 😞.

  1. Also what auth mode(s) are you using?

The Authorization header with the Bearer token.

The most important info I can give you is that without changing nothing on my side a lot of tests, which validates the ArgoWorkflow API responses, start to fail.

Rolling back to the v3.4 version everything back to work again.

@agilgur5
Copy link
Member

agilgur5 commented Sep 6, 2024

NO

Ok, so it's giving a 404 instead of a 401 even when the Workflow exists 🤔

Not sure, I could't see, I need to rollback the upgrade as fast as possible 😞.

I imagine you were able to test locally or on a staging/dev env if you tried :latest, so that's where you could repro.

The Authorization header with the Bearer token.

There's two of those; client is Bearer **** and sso is Bearer v2:****
Also you can have multiple enabled, so using one does not preclude the other. That's why I asked, as a response from one auth method could be different in another auth method (see also #13372 et al). In particular, server is effectively no auth (if your Server has appropriate RBAC)

Rolling back to the v3.4 version everything back to work again.

Yea I got that, and I did mark this as a regression according to that, I'm just not recalling any intentional change to API responses, so it's likely an unintentional bug somewhere, which is a lot harder to track down 🫠 More information helps with a repro to try to debug that

@Joibel
Copy link
Member

Joibel commented Sep 6, 2024

My best guess would be this is an unintentional side effect of #13021.

@agilgur5
Copy link
Member

agilgur5 commented Sep 6, 2024

That's v3.5.8+ (#12736 is v3.5.7+) and only affects the list endpoint, vs this is happening on individual workflows, and even on stops. So I would think #11121 would have caused it rather

@pmareke
Copy link
Contributor Author

pmareke commented Sep 7, 2024

Ok, so it's giving a 404 instead of a 401 even when the Workflow exists 🤔

Yes.

I imagine you were able to test locally or on a staging/dev env if you tried :latest, so that's where you could repro.

Yes, sure. I'll update again run the tests and give you more info.

There's two of those; client is Bearer **** and sso is Bearer v2:****

In my case I'm using the Bearer **** one.

I'm just not recalling any intentional change to API responses, so it's likely an unintentional bug somewhere, which is a lot harder to track down 🫠

Yes, I was taking a look to the latest releases notes and I didn't see too any change related 😔.

@pmareke
Copy link
Contributor Author

pmareke commented Sep 7, 2024

Ok so in the following endpoints:

  • PUT /workflows/{self.namespace}/{workflow_name}/resume with a valid token and a non existing workflow.
  • PUT /workflows/{self.namespace}/{workflow_name}/stop with a valid token and a non existing workflow.

The response now does not include the workflow name as it did in previous versions:

image

But I don't think this is nothing to worry about, my tests are validating the text error and that's an implementation detail and I need to change it.

The changes then are related with:

  • GET/workflows/{self.namespace}/{workflow_name} sending an invalid Bearer token -> 401 in v3.4.x and 404 in v3.5.x
  • GET /workflows/{self.namespace}/{workflow_name} sending a valid Bearer token without enough permissions -> 401 in v3.4.x and 404 in v3.5.x

Taking a look to the code I have the feeling that maybe the changes comes from the cli, does it makes any sense to you?

@agilgur5
Copy link
Member

agilgur5 commented Sep 7, 2024

I have the feeling that maybe the changes comes from the cli, does it makes any sense to you?

No. The CLI doesn't affect responses from the API, it's just a client.

@agilgur5 agilgur5 added this to the v3.5.x patches milestone Sep 7, 2024
@agilgur5
Copy link
Member

agilgur5 commented Sep 8, 2024

Ok after digging through the blame for the GetWorkflow method, pretty sure I found the root cause. My intuition was roughly correct; there was a precursor to #11121, which was #11055 and #11674. Those two PRs mean archive unification apparently affected GetWorkflow too and the logic there is a little incorrect with the archive fallback.

I'll follow-up more tomorrow and this week, there might be some other bugs related to that as well.

@agilgur5 agilgur5 self-assigned this Sep 19, 2024
@agilgur5
Copy link
Member

agilgur5 commented Sep 28, 2024

Correction, #11674 was correct, but #13021 / #12736 indeed made some alterations to the same exact place apparently, these lines specifically. This does mean it affects v3.5.7+ specifically

I should have a fix out soon

@agilgur5 agilgur5 changed the title v3.5.x changes some http response status codes v3.5.8+ changes some http response status codes from GET Workflow Sep 28, 2024
@agilgur5 agilgur5 changed the title v3.5.8+ changes some http response status codes from GET Workflow v3.5.7+ changes some http response status codes from GET Workflow Sep 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/api Argo Server API type/bug type/regression Regression from previous behavior (a specific type of bug)
Projects
None yet
Development

No branches or pull requests

3 participants