[Fleet]: Hosted fleet server not available on upgrading from 8.13.2>8.14.0. #3483

Closed
sukhwindersingh-qasource opened this issue Apr 22, 2024 · 13 comments
Assignees
Labels
bug, impact:high, QA:Validated, Team:Fleet

Comments

@sukhwindersingh-qasource

sukhwindersingh-qasource commented Apr 22, 2024

Kibana Build details:

VERSION: 8.13.2
BUILD: 72154
COMMIT: d4d06bdf0d1d5dcb4532f00d2cbaa83fc61bb877
Artifact Link: https://www.elastic.co/downloads/past-releases/elastic-agent-8-13-2

VERSION: 8.14.0
BUILD: 73520
COMMIT: c1513cd7e5a00eab209ba02d30cafd6945d75470

Preconditions:

  • An 8.13.2 Kibana cloud environment should be available.
  • 3 policies should be present, each with these 2 integrations: Defend and Osquery Manager.
  • Agents should be installed on 3 or more endpoints.

Steps to reproduce:

  • Navigate to Edit deployment.
  • Upgrade the deployment from 8.13.2 to 8.14.0.
  • Wait for the upgrade to complete.
  • Navigate to the Fleet tab.
  • Observe that the hosted Fleet Server is not available after the upgrade.

Screen Recording:

Agents.-.Fleet.-.Elastic.Mozilla.Firefox.2024-04-22.15-53-16.mp4

Expected Result:

  • Hosted Fleet Server should be available after upgrading from 8.13.2 to 8.14.0.

What's Working:

  • The upgrade works when we upgrade an empty 8.13.2 instance to 8.14.0.
  • The issue is resolved after we force restart the Integrations Server.
    image
@sukhwindersingh-qasource sukhwindersingh-qasource added the bug, Team:Fleet, and impact:high labels Apr 22, 2024
@sukhwindersingh-qasource
Author

@amolnater-qasource Kindly review this
Thanks

@amolnater-qasource amolnater-qasource removed their assignment Apr 22, 2024
@amolnater-qasource
Collaborator

Secondary review for this ticket is Done.

@cmacknz
Member

cmacknz commented Apr 22, 2024

Hmm, looks like Fleet server was reported as offline. Do you have the deployment ID so we can look up logs?

I tried this and it succeeded, so it is likely intermittent.

@sukhwindersingh-qasource
Author

Hi @cmacknz,

The deployment ID of the build is 9b9eb14524214a609c66172a4b05ca33. Please let us know if anything else is required.

Thanks!

@cmacknz
Member

cmacknz commented Apr 23, 2024

Possibly this is the same as #3328

@juliaElastic
Contributor

juliaElastic commented Apr 24, 2024

This doesn't seem to be the same issue. I'm seeing policy docs with coordinator_idx: 1 well before the time of the upgrade:
2024-04-22T09:55:55.181Z

image
   {
        "_index": ".fleet-policies-7",
        "_id": "El0cBY8BUevs6eRkOcSy",
        "_score": null,
        "_source": {
          "coordinator_idx": 1,
          "policy_id": "policy-elastic-agent-on-cloud",
          "revision_idx": 5,
          "@timestamp": "2024-04-22T09:21:06.634Z"
        },
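For anyone repeating this check on another deployment, here is a minimal sketch of the comparison described above: keep the policy docs that have coordinator_idx: 1 and a @timestamp earlier than the upgrade. The helper names (parse_ts, coordinated_before) are hypothetical, the hit is reduced to the fields quoted in this comment, and treating 2024-04-22T09:55:55.181Z as the upgrade time is an assumption based on how the timestamp is quoted here.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse a UTC timestamp such as 2024-04-22T09:21:06.634Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


def coordinated_before(hits, upgrade_time: str):
    """Return the hits whose policy doc has coordinator_idx == 1 and a
    @timestamp earlier than upgrade_time."""
    cutoff = parse_ts(upgrade_time)
    return [
        hit for hit in hits
        if hit["_source"].get("coordinator_idx") == 1
        and parse_ts(hit["_source"]["@timestamp"]) < cutoff
    ]


# The hit quoted in this comment, reduced to the fields that are shown.
hits = [
    {
        "_index": ".fleet-policies-7",
        "_id": "El0cBY8BUevs6eRkOcSy",
        "_source": {
            "coordinator_idx": 1,
            "policy_id": "policy-elastic-agent-on-cloud",
            "revision_idx": 5,
            "@timestamp": "2024-04-22T09:21:06.634Z",
        },
    }
]

# Assumption: 2024-04-22T09:55:55.181Z is the time of the upgrade.
early = coordinated_before(hits, "2024-04-22T09:55:55.181Z")
print([hit["_id"] for hit in early])
```

A non-empty result means the policy was already coordinated before the upgrade started, which is the observation this comment relies on.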

@juliaElastic
Contributor

juliaElastic commented Apr 24, 2024

I noticed that it's also slower for fleet-server to be healthy on upgrade/new deployment creation. The observability-perf tests consistently fail on this branch when expecting a healthy fleet-server in the Functional Test: https://github.com/elastic/observability-perf/pull/771

I added a sleep of 2m, after which fleet-server is healthy. I also tried 1m, which wasn't enough.
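A fixed sleep has to be sized for the slowest run, so a polling wait with a timeout may be a better fit for a test like this. The following is only a sketch, not what the observability-perf tests actually do: wait_until_healthy and fake_check are hypothetical names, the 180-second default timeout is an arbitrary choice above the 2m that worked here, and the real health check is left as a caller-supplied function.

```python
import time


def wait_until_healthy(check, timeout_s=180.0, interval_s=5.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns True or timeout_s elapses.

    Returns the seconds waited on success; raises TimeoutError otherwise.
    """
    start = clock()
    while True:
        if check():
            return clock() - start
        if clock() - start >= timeout_s:
            raise TimeoutError(f"fleet-server not healthy after {timeout_s}s")
        sleep(interval_s)


# Demo with a fake check that reports healthy on the third poll.
# A real check would query the deployment (hypothetical, not shown here).
polls = {"count": 0}


def fake_check():
    polls["count"] += 1
    return polls["count"] >= 3


waited = wait_until_healthy(fake_check, timeout_s=1.0, interval_s=0.01)
print(f"healthy after {polls['count']} polls")
```

The wait returns as soon as the check passes, so fast upgrades do not pay the full 2m, while slow ones still fail with a clear timeout.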

I took diagnostics from a cloud fleet-server that was upgraded from 8.13.2 to 8.14.0. The fleet-server was in the Updating state for a few minutes after the stack upgrade.
elastic-agent-diagnostics-2024-04-24T14-55-56Z-00.zip
Instance: https://admin.staging.foundit.no/deployments/ee8ec0f629cf4fc7940a0d5459910e97/activity

@kpollich
Member

kpollich commented Apr 24, 2024

> I noticed that it's also slower for fleet-server to be healthy on upgrade/new deployment creation

I noticed this as well while executing our 8.14 test plan. The managed agent was initially updating for a few minutes before eventually going healthy. We should retry this on the next 8.14 BC with #3495 merged.

@juliaElastic
Contributor

Tested upgrading a cloud deployment from 8.13-SNAPSHOT to 8.14-SNAPSHOT. The upgrade was successful, and fleet-server came online quickly.
https://staging.found.no/deployments/661e87af059d4cca9a1114f8e4f6e950/activity
image

@kpollich
Member

Seems likely that this was fixed by this revert then, right? #3495

@juliaElastic
Contributor

> Seems likely that this was fixed by this revert then, right? #3495

Yes, it seems so.

@kpollich
Member

Great! Thanks for testing. I'll close this. @amolnater-qasource @sukhwindersingh-qasource would you mind retesting? Feel free to reopen if there are further issues. Thank you!

@amolnater-qasource amolnater-qasource added the QA:Ready For Testing label Apr 25, 2024
@sukhwindersingh-qasource
Author

Hi @juliaElastic ,

We have validated this ticket on the latest 8.14.0 build after upgrading from the 8.13.2 build and found the issue fixed. ✔️

Observations:

  • Hosted Fleet Server is now available after upgrading from 8.13.2 to 8.14.0.

Build Details:
VERSION: 8.14.0
BUILD: 73626
COMMIT: bcf6960778ae270d0894a8aab07f10197ee9b97f

Screenshot:

image

Hence, we are marking this issue as QA Validated.

Thanks!!

@sukhwindersingh-qasource sukhwindersingh-qasource added the QA:Validated label and removed the QA:Ready For Testing label Apr 30, 2024

5 participants