Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fleet] Agent Upgrade functionality throws an error when only 1 Agent is selected #80045

Closed
EricDavisX opened this issue Oct 8, 2020 · 15 comments
Assignees
Labels
bug Fixes for quality problems that affect the customer experience impact:critical This issue should be addressed immediately due to a critical level of impact on the product. Team:Fleet Team label for Observability Data Collection Fleet team v7.10.0

Comments

@EricDavisX
Copy link
Contributor

EricDavisX commented Oct 8, 2020

verison info:
8.0 snapshot tested on cloud

edavis-mbp:kibana_elastic edavis$ git show -s 43495d8
commit 43495d8
Date: Wed Oct 7 18:53:19 2020 -0600

Agent:
elastic-agent-8.0.0-SNAPSHOT-windows-x86_64 - from 10-06 with hash af3cda3c

  • I was hoping to upgrade it to
    elastic-agent-8.0.0-SNAPSHOT-windows-x86_64 - from 10-08 of hash 362958b3

Browser version:
Chrome on macOS

Describe the bug:
The new upgrade functionality of Agent in coming release is not working as we had hoped. it throws an error in the UI, the API returns a 400 on the POST.

the POST returns a 400:
https://9a98a19f33d84272960c5792357a35e5.us-central1.gcp.foundit.no:9243/api/fleet/agents/120619cb-d917-4c45-b5c8-e3be5074c4a4/upgrade
with body: {“version”:“8.0.0-SNAPSHOT”}

Steps to reproduce:

  • install 8.0 latest
  • install the most recent prior version agent (as noted above)
  • Add Endpoint integration to a new policy (may be unrelated?)
  • change Agent to the new policy
  • then i selected the Agent on the left in Agents list in Fleet, with checkboxes.
  • then the bulk action option list had ‘upgrade’ as an option.

I used the new 'install' command to get the Windows x64 Win 7 Agent up and running. akin to this command:
.\elastic-agent.exe install -f --kibana-url=https://1238a19f33d84272960c5792357a35e5.us-central1.gcp.foundit.no:443 --enrollment-token=123abcNYVUJnZ2tDQnhxdW41Zzk6NzBXNlpUeWJReHl4bEthTHlVd0piQQ==

Expected behavior:
it should upgrade Agent to the newer available snapshot and start it up

Screenshots (if relevant):
browser-console
ui-in-kibana
error

@EricDavisX EricDavisX added the Team:Fleet Team label for Observability Data Collection Fleet team label Oct 8, 2020
@elasticmachine
Copy link
Contributor

Pinging @elastic/ingest-management (Team:Ingest Management)

@EricDavisX EricDavisX changed the title [Fleet] Agent Upgrade functionality throws an error on 8.0.0-SNAPSHOT usage [Fleet] Agent Upgrade functionality throws an error when only 1 Agent is selected Oct 8, 2020
@EricDavisX
Copy link
Contributor Author

pretty sure I found the bug at this stage. compare the 2 browser console POSTS - the single select didn't try to pass in the agent id, which makes sense that it is a bad post! the multi-select did, and worked (to kick off the upgrade, I can't say if it worked fully, yet)

see new screenshot:
browser-console-multi-select-has-agent-ids-and-works

@ph ph added bug Fixes for quality problems that affect the customer experience impact:critical This issue should be addressed immediately due to a critical level of impact on the product. labels Oct 8, 2020
@neptunian
Copy link
Contributor

neptunian commented Oct 8, 2020

For the single agent error, I think this is probably okay. If a user selects a bunch of agents and wants to perform a bulk action, we dont know which action they will try to perform, so they can still attempt an upgrade. When its just a single agent from a bulk action we direct to the the single upgrade: /agents/upgrade. It turns out this agent is not upgradeable, hence the response. If they had selected some agents that were upgradeable and some not, agents not upgradeable would be silently ignored and filtered out (i do think you found a bug in the bulk upgrade if not upgradeable agents, which we should open a diff issue for). Other actions like bulk unenroll do the same. We could do the same for a single agent in a bulk action (ignore it), and not provide any message, but I think having this message is probably better.

@EricDavisX
Copy link
Contributor Author

I will re-read your post a few times, I think I got it.. but to be sure, I would appreciate the help to log the bug you noted. @neptunian thank you.

I see that the Windows Agent came back and said it wasn't upgradeable - I think that is a bug on the Agent side (why wouldn't it be upgradeable) and I'll log it separately.

@neptunian
Copy link
Contributor

neptunian commented Oct 8, 2020

There are a couple checks for upgradeable. First checking whether the agent says its upgradeable. You you can see this value in the fleet-agents saved object. Then kibana does another check to see whether this agent version is less than the kibana version its running on. If you're running kibana 8.0.0 and the agent is at 8.0.0, then its not upgradeable. I haven't been able to test on a working snapshot build yet, so I'll test that.

Here is the new issue: #80070

@EricDavisX
Copy link
Contributor Author

I'd like to suggest that we enhance and change the POST response here to be a 406 (Not Acceptable), or 412 (Precondition Failed) or 405 (Method Not Allowed) code maybe. 400 is fairly generically 'bad' and this would help distinguish that the request was formed correctly and was an ok request, and that it was just rejected for not being able to be met which would represent the state of play a little better?

Maybe I'm thinking too hard about it, but wanted to mention. @ph if you think it is worthwhile? Else I suppose we can close this as working as designed. We may end up opening a special 'test support' ticket to help us implement a test jig to validate this, as we may not currently have a good way to do it in snapshot. I posted a proposal via slack, here is the gist, tracked at least somewhere in git now:

I don’t know if we need to make modifications to both Agent and Kibana side, but we need to build a back door in for ourselves. Kibana knows it was a ‘snapshot’ version, it says so in the UI (I suppose coming from Agent?). We could, hardcode Agent to just *always report that its upgradeable, when it is a snapshot version. As a hack.

Whats the benefit of not doing that? What would it hurt? In this way… the previous minus 1 version would attempt an upgrade, and then find the latest snapshot and do it and we could prove it with a version check on Agent side (and in logs, etc). Worst case scenario is that it re-installs the same versoin it has already (which is not a terrible outcome for a snapshot build)

@jfsiii
Copy link
Contributor

jfsiii commented Oct 9, 2020

@EricDavisX perhaps we could keep 400 and make the message better. I agree that 400 is generic "something about your request isn't right" but each of those codes you mentioned have specific meanings that aren't appropriate in this case. e.g. 405 means we don't support POST on that resource, 406 would be that we don't know how to answer in a format they understand, and so on.

@EricDavisX
Copy link
Contributor Author

@jfsiii ok - sounds good to me about the error codes.

@neptunian hi, do you or @ruflin / @ph have any thoughts about the testability of this and how we can verify it on the BC builds (and on regular snapshots for automation)? I didn't get to today, not sure if anyone made progress on testing post-merge end to end on this?

@ph
Copy link
Contributor

ph commented Oct 14, 2020

@EricDavisX Did we have testing information/guideline could we add to this issue?

@ph ph added the v7.10.0 label Oct 14, 2020
@EricDavisX
Copy link
Contributor Author

We're still compiling how to test it, I've started putting details and open questions into the main ticket -> elastic/beats#20205 (comment)

and the automation ticket has details of how to trigger it:
elastic/e2e-testing#341 (comment)

@rahulgupta-qasource the feature is testable, so you can play with it with the above in mind (cannot trigger via the UI). Fixes are nearly done, but there are open questions. You could create / update test cases with what we know. Thanks

@EricDavisX
Copy link
Contributor Author

@ph I'm seeing different behaviors on 8.0 vs 7.11 vs 7.10 so its hard to navigate... Sandra, Michal, and Blake were tracking (each) at least 1 item to merge for the feature, I don't think we have specific tickets for all of it yet, working on it.

@EricDavisX
Copy link
Contributor Author

EricDavisX commented Oct 19, 2020

There were some engineering questions we got organized - basic testing thru the UI with single and bulk update feature should be possible after a few fixes go in, hopefully in BC3 but need to track which feature relate and when they are merged.

Windows Agent doesn't always report as 'upgradeable':
elastic/beats#21884

Agent needs fix to restart correct Agent after upgrade
elastic/beats#21835
elastic/beats#21935

Agent needs fix to allow upgrade testing in UI
elastic/beats#21971

This last one is the only piece I don't know of a PR for yet, but its in progress by the team.

@EricDavisX
Copy link
Contributor Author

The BC4 or newer, and a 7.11 or 8.0 build after today will be enough to test this out.

@EricDavisX
Copy link
Contributor Author

I've tested on the 7.11 branch and find this fixed - there is still one bug in play, but it is Agent side and unrelated to exactly this scenario.

screenshot of POST:
Screen Shot 2020-10-26 at 1 21 10 PM

@ghost
Copy link

ghost commented Dec 1, 2020

Bug Conversion:

Below 01 Testcase already exists for this ticket:
https://elastic.testrail.io/index.php?/cases/view/34211

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Fixes for quality problems that affect the customer experience impact:critical This issue should be addressed immediately due to a critical level of impact on the product. Team:Fleet Team label for Observability Data Collection Fleet team v7.10.0
Projects
None yet
Development

No branches or pull requests

5 participants