
Unique limiters for each API listener #1904

Merged (13 commits, Sep 29, 2022)

@michel-laterman (Contributor) commented Sep 22, 2022

What is the problem this PR solves?

Because the internal and external listeners use the same limiter, fleet-server may send a 429 response to the elastic-agent it is running under when it is under load. This can cause the agent and the fleet-server instance to be flagged as unhealthy and shut down.

How does this PR solve the problem?

Refactor limit.Limiter so it can wrap the separate API httprouter endpoints. Limiter.WrapX() calls take the handler and a stats incrementer used for metrics/error counting. api.Run() is replaced with Router.Run(), which generates an httprouter for each listener so that each httprouter can be associated with its own Limiter.

How to test

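Start an elastic-agent that uses the fleet-server integration and view the fleet-server logs. A separate limiter should be created for each listener address (here 0.0.0.0:8220 and localhost:8221):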

{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","addr":"0.0.0.0:8220","limits":{"MaxAgents":0,"PolicyThrottle":500000,"MaxHeaderByteSize":8192,"MaxConnections":52428800,"CheckinLimit":{"Interval":1000000,"Burst":1000,"Max":0,"MaxBody":1048576},"ArtifactLimit":{"Interval":5000000,"Burst":25,"Max":50,"MaxBody":0},"EnrollLimit":{"Interval":10000000,"Burst":100,"Max":50,"MaxBody":524288},"AckLimit":{"Interval":10000000,"Burst":100,"Max":50,"MaxBody":2097152},"StatusLimit":{"Interval":5000000,"Burst":25,"Max":50,"MaxBody":0}},"@timestamp":"2022-09-28T21:14:25.355Z","message":"fleet-server creating new limiter"}
...
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","addr":"localhost:8221","limits":{"MaxAgents":0,"PolicyThrottle":500000,"MaxHeaderByteSize":8192,"MaxConnections":52428800,"CheckinLimit":{"Interval":1000000,"Burst":1000,"Max":0,"MaxBody":1048576},"ArtifactLimit":{"Interval":5000000,"Burst":25,"Max":50,"MaxBody":0},"EnrollLimit":{"Interval":10000000,"Burst":100,"Max":50,"MaxBody":524288},"AckLimit":{"Interval":10000000,"Burst":100,"Max":50,"MaxBody":2097152},"StatusLimit":{"Interval":5000000,"Burst":25,"Max":50,"MaxBody":0}},"@timestamp":"2022-09-28T21:14:25.385Z","message":"fleet-server creating new limiter"}
...
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","addr":"localhost:8221","@timestamp":"2022-09-28T21:14:25.385Z","message":"fleet-server routes set up"}
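To check this output mechanically, here is a small sketch that collects the `addr` of every "fleet-server creating new limiter" entry. It assumes one JSON object per line, as in the samples above; `limiterAddrs` is a hypothetical helper, not part of fleet-server, and the sample lines in `main` are trimmed versions of the ones shown.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// limiterAddrs scans newline-delimited JSON log output and returns, sorted,
// each distinct "addr" that logged "fleet-server creating new limiter".
func limiterAddrs(logs string) ([]string, error) {
	seen := map[string]bool{}
	sc := bufio.NewScanner(strings.NewReader(logs))
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // log lines can be long
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, "{") {
			continue // skip "..." and other non-JSON lines
		}
		var entry struct {
			Addr    string `json:"addr"`
			Message string `json:"message"`
		}
		if err := json.Unmarshal([]byte(line), &entry); err != nil {
			return nil, err
		}
		if entry.Message == "fleet-server creating new limiter" {
			seen[entry.Addr] = true
		}
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	addrs := make([]string, 0, len(seen))
	for a := range seen {
		addrs = append(addrs, a)
	}
	sort.Strings(addrs)
	return addrs, nil
}

func main() {
	logs := `{"log.level":"info","addr":"0.0.0.0:8220","message":"fleet-server creating new limiter"}
...
{"log.level":"info","addr":"localhost:8221","message":"fleet-server creating new limiter"}
{"log.level":"info","addr":"localhost:8221","message":"fleet-server routes set up"}`
	addrs, err := limiterAddrs(logs)
	if err != nil {
		panic(err)
	}
	fmt.Println(addrs) // [0.0.0.0:8220 localhost:8221]
}
```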

Checklist

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

@michel-laterman michel-laterman added bug Something isn't working Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team backport-v8.5.0 Automated backport with mergify labels Sep 22, 2022
@michel-laterman michel-laterman requested a review from a team as a code owner September 22, 2022 19:30
mergify bot (Contributor) commented Sep 22, 2022

This pull request is now in conflicts. Could you fix it, @michel-laterman? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b separate-limiters upstream/separate-limiters
git merge upstream/main
git push upstream separate-limiters

elasticmachine (Contributor) commented Sep 22, 2022

💚 Build Succeeded


Build stats

  • Start Time: 2022-09-29T16:51:21.337+0000

  • Duration: 11 min 28 sec

Test stats 🧪

Test Results

  • Failed: 0
  • Passed: 351
  • Skipped: 1
  • Total: 352

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

Comment on lines -60 to -77
	{
		limit.ErrRateLimit,
		HTTPErrResp{
			http.StatusTooManyRequests,
			"RateLimit",
			"exceeded the rate limit",
			zerolog.DebugLevel,
		},
	},
	{
		limit.ErrMaxLimit,
		HTTPErrResp{
			http.StatusTooManyRequests,
			"MaxLimit",
			"exceeded the max limit",
			zerolog.DebugLevel,
		},
	},

Contributor Author

Rate-limiting responses are now sent by the wrapping limiter.

Comment on lines -110 to -112
// Metrics; serenity now.
dfunc := cntAcks.IncStart()
defer dfunc()

Contributor Author

To keep the structure of the stats we collect unchanged while still tracking rate-limit errors per route, I've moved the incrementer/decrementer into the rate-limiter wrapper. This means that total calls (and active calls) will always be tracked, even when authentication fails.
Currently, if auth fails on an endpoint like this, the total count does not increase (but the error count might).
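A toy model of the two orderings, assuming a single error counter for both limiter rejections and authentication failures (a simplification). `stats`, `incStart`, `handle`, and `run` are hypothetical; `incStart` only loosely mirrors `StatIncer.IncStart`.

```go
package main

import "fmt"

// stats is a hypothetical per-route counter set.
type stats struct {
	total, active, errors int
}

// incStart counts a call as started and returns the matching "done" func.
func (s *stats) incStart() func() {
	s.total++
	s.active++
	return func() { s.active-- }
}

// handle simulates one request. With countFirst, the call is counted before
// the limiter/auth checks (new ordering); otherwise it is only counted once
// both checks pass (original ordering).
func handle(s *stats, countFirst, limited, authOK bool) {
	if countFirst {
		defer s.incStart()() // incStart runs now; its done func runs on return
	}
	if limited || !authOK {
		s.errors++
		return
	}
	if !countFirst {
		defer s.incStart()()
	}
	// ... the real handler work would run here ...
}

// run replays the same three requests under one ordering.
func run(countFirst bool) *stats {
	s := &stats{}
	handle(s, countFirst, false, true)  // accepted
	handle(s, countFirst, true, true)   // rejected by the limiter
	handle(s, countFirst, false, false) // authentication failure
	return s
}

func main() {
	o, n := run(false), run(true)
	fmt.Printf("original: total=%d errors=%d\n", o.total, o.errors) // original: total=1 errors=2
	fmt.Printf("new: total=%d errors=%d\n", n.total, n.errors)      // new: total=3 errors=2
}
```

With the counter incremented first, every rejected call is also a counted call, so a route's error count can never exceed its total; with the original ordering it can.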

@scunningham left a comment

Looks ok.

internal/pkg/limit/limiter.go
	if l.maxLimit != nil {
		l.maxLimit.Release(1)
	}
}

func (l *limiter) wrap(logger zerolog.Logger, level zerolog.Level, h httprouter.Handle, i StatIncer) httprouter.Handle {
	return func(w http.ResponseWriter, r *http.Request, p httprouter.Params) {
		dfunc := i.IncStart()

For better or worse, this logic differs from the original implementation. In the original implementation, the counter was not incremented until:

  1. The limiter semaphore was acquired
  2. The transaction was authorized

Contributor Author

Yeah, I made a comment about that; we can definitely restore that behaviour if we need to.
But tracking every attempt means that the error count associated with a route can't be higher than its total call count.

Just metrics; doesn't affect logic so doesn't matter much.

@@ -27,30 +33,41 @@ const (
)

type Router struct {
-	ctx context.Context
+	ctx context.Context // used only by handleEnroll

Maybe add a comment about where/why this is initialized.

bctx := func(net.Listener) context.Context { return ctx }

errChan := make(chan error)
cancelCtx, cancel := context.WithCancel(ctx) // TODO should we set rt.ctx = cancelCtx?

No. The cancel context wraps the lifetime of the server. We use the global context here in handleEnroll to do a rollback. It might be clearer to rename ctx in the router to "baseCtx" or "appCtx".

internal/pkg/api/handleAck.go (outdated, resolved)
internal/pkg/limit/error.go (outdated, resolved; 3 threads)
internal/pkg/limit/error_test.go (outdated, resolved)
mergify bot (Contributor) commented Sep 26, 2022

This pull request is now in conflicts. Could you fix it, @michel-laterman? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b separate-limiters upstream/separate-limiters
git merge upstream/main
git push upstream separate-limiters

@michel-laterman michel-laterman mentioned this pull request Sep 27, 2022
@michel-laterman michel-laterman merged commit c99ccd8 into elastic:main Sep 29, 2022
mergify bot pushed a commit that referenced this pull request Sep 29, 2022
* Unique limiters for each API listener

Refactor the limit.Limiter so it can wrap the separate API httprouter
endpoints. Limiter.WrapX() calls take the handler and stats incrementer
for metrics/error counting. api.Run() replaced with Router.Run(), which
will generate an httprouter for each listener in order to be able to
associate the httprouter with a unique Limiter.

* Add listener address labeled logs to limiter

* Review feedback

* Apply suggestions from code review

Co-authored-by: Anderson Queiroz <me@andersonq.me>

* review feedback

* fix import

* Fix test

Co-authored-by: Anderson Queiroz <me@andersonq.me>
(cherry picked from commit c99ccd8)
@michel-laterman michel-laterman deleted the separate-limiters branch September 29, 2022 17:19
mergify bot added a commit that referenced this pull request Sep 29, 2022
* Unique limiters for each API listener

Refactor the limit.Limiter so it can wrap the separate API httprouter
endpoints. Limiter.WrapX() calls take the handler and stats incrementer
for metrics/error counting. api.Run() replaced with Router.Run(), which
will generate an httprouter for each listener in order to be able to
associate the httprouter with a unique Limiter.

* Add listener address labeled logs to limiter

* Review feedback

* Apply suggestions from code review

Co-authored-by: Anderson Queiroz <me@andersonq.me>

* review feedback

* fix import

* Fix test

Co-authored-by: Anderson Queiroz <me@andersonq.me>
(cherry picked from commit c99ccd8)

Co-authored-by: Michel Laterman <82832767+michel-laterman@users.noreply.github.com>
blakerouse added a commit that referenced this pull request Nov 8, 2022
* [Automation] Update elastic stack version to 8.5.0-6b9f92c0 for testing (#1756)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.5.0-0616acda for testing (#1760)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.5.0-dd6f2bb0 for testing (#1765)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.5.0-feb644de for testing (#1768)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.5.0-7783a03c for testing (#1776)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.5.0-17b8a62d for testing (#1780)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.5.0-9aed3b11 for testing (#1784)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.5.0-440e0896 for testing (#1788)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.5.0-fedc3e60 for testing (#1791)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.5.0-b5001a6d for testing (#1795)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* ci: move to fleet-ci (#1199)

* Fic path to the packaging (#1806)

* Fix gcs credentials for packaging (#1807)

* [Automation] Update elastic stack version to 8.5.0-de69302b for testing (#1822)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.5.0-1bd77fc1 for testing (#1826)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.5.0-167dfc80 for testing (#1831)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.5.0-6b7dda2d for testing (#1835)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* Allow multiple ES outputs as long as they are the same ES (#1684)

* add 'outputs' field to the ES agent schema to store the API key data and permission hash for each ES output

* add output name to API key metadata

* add v8.5 migration to migration.go

* add migration docs and improve logging

* group migration functions per version

* [Automation] Update elastic stack version to 8.5.0-4140365c for testing (#1837)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* updating upgrade_status: completed (#1833)

* updating upgrade_status: completed

* updated schema.json and regenerated schema.go

* updated license headers

* Fix v8.5.0 migration painless script (#1839)

* fix v8.5.0 migration painless script

* [Automation] Update elastic stack version to 8.5.0-8e906f9f for testing (#1843)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* ci: rename dra staging for release dra release staging (#1840)

* Remove events from agent checkin body. (#1842)

Remove the events attribute from the agent checkin body. Note that
removal of the attribute will not stop the server from issuing a 400 if
the response body is too long. The removal is so that the checkin code
on the fleet-server and agent remain comparable.

Co-authored-by: Blake Rouse <blake.rouse@elastic.co>

* [Automation] Update elastic stack version to 8.5.0-589a4a10 for testing (#1852)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.5.0-37418cf3 for testing (#1855)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.5.0-fcf3d4c2 for testing (#1862)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.5.0-c7913db3 for testing (#1868)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* Add error detail to catch-all HTTP response (#1854)

* Make authc log debug and add cache hit field (#1870)

* Document Go 1.18 certificate change in changelog. (#1871)

* Revert "Fix v8.5.0 migration painless script" (#1878)

* Revert "Fix v8.5.0 migration painless script (#1839)"

This reverts commit de5d74b.

* Revert "Allow multiple ES outputs as long as they are the same ES (#1684)"

This reverts commit 63fdcbf.

* [Automation] Update elastic stack version to 8.5.0-56d2c52d for testing (#1880)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* Bulk API Keys update (#1779)

Bulk API Keys update (#1779)

* Fix and reintroduce "Allow multiple ES outputs as long as they are the same ES" (#1879)

* Revert "Revert "Fix v8.5.0 migration painless script" (#1878)"
  This reverts commit ef9ca2b.

* Revert "Revert "Allow multiple ES outputs as long as they are the same ES (#1684)""
  This reverts commit bb696ac.

* avoid new API keys being marked for invalidation

Co-authored-by: Michal Pristas <michal.pristas@gmail.com>
  He fixed the merge conflicts after Bulk API Keys update (#1779), commit 46ac14b, got merged

* [Automation] Update elastic stack version to 8.5.0-7dc445a0 for testing (#1888)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* Update pre-sets limits to avoid overlap. (#1891)

Update file max limits and env_defaults_test.go running make defaults to generate the new one

* [Release] add-backport-next (#1892)

* Bump version to 8.6.0 (#1895)

* Catch error in waitBulkAction. Add bulk.WithRetryOnConflict(3) in multiple places. (#1896)

* Catch error in waitBulkAction. Add bulk.WithRetryOnConflict(3) in multiple places.

* Add changelog entry.

* Update CHANGELOG.next.asciidoc

Co-authored-by: Craig MacKenzie <craig.mackenzie@elastic.co>

Co-authored-by: Craig MacKenzie <craig.mackenzie@elastic.co>

* Update apikey.cache_hit log field name to match convention (#1900)

* [Automation] Update elastic stack version to 8.6.0-21651da3 for testing (#1908)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* LoadLimits does not override existing values (#1912)

Fleet-server will use any specified cache or server limit values over
whatever is returned by the default/agent number loader. For example, if
A max body size is specifically set to a value such as 5MB, and the
default returned by the LoadLimits is 1MB, the 5MB value is used.

* [Automation] Update elastic stack version to 8.6.0-326f84b0 for testing (#1916)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-df00693f for testing (#1925)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-a2f4f140 for testing (#1928)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* Revert "updating upgrade_status: completed (#1833)" (#1920)

* Revert "updating upgrade_status: completed (#1833)"

This reverts commit 23be42a.

* Leaving in upgrade_status field for retry functionality

* Storing checkin message in last_checkin_message (#1932)

* Storing checkin message in last_checkin_message

* added changelog

* fixed tests

* Unique limiters for each API listener (#1904)

* Unique limiters for each API listener

Refactor the limit.Limiter so it can wrap the separate API httprouter
endpoints. Limiter.WrapX() calls take the handler and stats incrementer
for metrics/error counting. api.Run() replaced with Router.Run(), which
will generate an httprouter for each listener in order to be able to
associate the httprouter with a unique Limiter.

* Add listener address labeled logs to limiter

* Review feedback

* Apply suggestions from code review

Co-authored-by: Anderson Queiroz <me@andersonq.me>

* review feedback

* fix import

* Fix test

Co-authored-by: Anderson Queiroz <me@andersonq.me>

* Cleanup cmd/fleet/main.go (#1886)

* Replace cache.Config with config.Cache

* Move server setup from cmd/fleet to new pkg/server

* Move constants

* Fix imports and integration tests

* fix linter

* [Automation] Update elastic stack version to 8.6.0-158a13db for testing (#1938)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [8.6](forwardport) Add extra protection against accessing null fields to 8.5 migration (#1921) (#1926)

* [Automation] Update elastic stack version to 8.6.0-aea1c645 for testing (#1942)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-0fca2953 for testing (#1948)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-e4c15f15 for testing (#1954)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* Conditional log level for api key read (#1946)

Conditional log level for api key read (#1946)

* Updated migration query to match items with deprecated field present (#1959)

Co-authored-by: Anderson Queiroz <anderson.queiroz@elastic.co>

* Fix fleet.migration.total log key overlap (#1951)

Co-authored-by: Anderson Queiroz <anderson.queiroz@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-d939cfde for testing (#1964)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-7c9f25a9 for testing (#1969)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-c49fac70 for testing (#1976)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* Update to Go 1.18.7. (#1978)

* [Automation] Update elastic stack version to 8.6.0-5a8d757d for testing (#1981)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-40086bc7 for testing (#1987)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-233dc5d4 for testing (#1990)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-54a302f0 for testing (#1995)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* Don't send POLICY_CHANGE actions retrieved from index to agent. (#1963)

* Don't send POLICY_CHANGE actions retrieved from index to agent.

The fleet-server should not send any policy change actions that are
written to the actions index to an agent on checkin. The server will
remove these actions in the convert method and emit a warning message.
The ack token that is used is not altered in this case. Policy change
actions are dynamically generated by the fleet-server when it detects
that the agent is not running an up to date version of the policy.

* move filtering to its own method

* Fix linter, tests, fix file name

* [Automation] Update elastic stack version to 8.6.0-cae815eb for testing (#2000)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-6545f2df for testing (#2005)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-055acc83 for testing (#2011)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-baf193e8 for testing (#2016)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-22d60ec9 for testing (#2020)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* Allow upgrade action to signal retry (#1887)

* Allow upgrade action to signal retry

Allow the ack of an upgrade action to set the upgrade status to
retrying.

* fix tests set failed state

* Fix broken test

* nil upgrade status by default

* Set agent to healthy in case of upgrade failure

* fix upgrade fields

* Fix tests

* [Automation] Update elastic stack version to 8.6.0-b8b35931 for testing (#2024)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-a892f234 for testing (#2030)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Add GH action to add issues to ingest board

Issues in this repo labeled with `Team:Fleet` will be added to the ingest board automatically w/ the `Fleet Server` area.

* Update add-issues-to-ingest-board.yml

* [Automation] Update elastic stack version to 8.6.0-89d224d2 for testing (#2034)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-949a38d2 for testing (#2039)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-26dc1164 for testing (#2045)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* Add active filter for enrollment key queries. (#2044)

* Add active filter for enrollment key queries.

Add an active: true filter to enrollment key queries. This allows
fleet-server to handle cases where there may be 10+ inactive keys
associated with a policy.

* review feedback

* fix linter

* fix tests

* Fix test cases

* [Automation] Update elastic stack version to 8.6.0-4765d2b0 for testing (#2048)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-8a615646 for testing (#2050)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-3f5f98b7 for testing (#2051)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-f20b7179 for testing (#2056)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* Run mod tidy.

* Run make notice.

* Fix intergration tests.

* Run go mod tidy and make notice.

* Fix path to fleet-server.yml in integration test.

* Fix race condition.

* Fix try 2.

* Fix race.

* Fix race try 2.

Co-authored-by: apmmachine <58790750+apmmachine@users.noreply.github.com>
Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>
Co-authored-by: Victor Martinez <victormartinezrubio@gmail.com>
Co-authored-by: Anderson Queiroz <anderson.queiroz@elastic.co>
Co-authored-by: Julia Bardi <90178898+juliaElastic@users.noreply.github.com>
Co-authored-by: Michel Laterman <82832767+michel-laterman@users.noreply.github.com>
Co-authored-by: Josh Dover <1813008+joshdover@users.noreply.github.com>
Co-authored-by: Craig MacKenzie <craig.mackenzie@elastic.co>
Co-authored-by: Michal Pristas <michal.pristas@gmail.com>
Co-authored-by: Julien Lind <julien.lind@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Anderson Queiroz <me@andersonq.me>
Co-authored-by: Kyle Pollich <kyle.pollich@elastic.co>
blakerouse added a commit that referenced this pull request Nov 9, 2022
* Support for Elastic Agent V2 status (#1747)

* Support for Elastic Agent V2 status

* Make 'make check-ci' happy

* Add a check that 'components' is valid array

* Rename variable to better reflect it's meaning

* [v2] Switch to Elastic Agent v2 control protocol (#1751)

* Switch to new client.V2 for communication with Elastic Agent.

* Fix tests.

* Fix integration tests.

* Update go.sum.

* Fix some lint issues.

* Fix panic with agentInfo.

* Fix panic in logger reconfigure.

* Fixes for switching units.

* updated version (#2014)

* Update the elastic-agent-client to latest version. (#2061)

* [v2] Merge main as of Nov 7 (#2062)

* [Automation] Update elastic stack version to 8.5.0-6b9f92c0 for testing (#1756)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.5.0-0616acda for testing (#1760)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.5.0-dd6f2bb0 for testing (#1765)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.5.0-feb644de for testing (#1768)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.5.0-7783a03c for testing (#1776)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.5.0-17b8a62d for testing (#1780)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.5.0-9aed3b11 for testing (#1784)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.5.0-440e0896 for testing (#1788)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.5.0-fedc3e60 for testing (#1791)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.5.0-b5001a6d for testing (#1795)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* ci: move to fleet-ci (#1199)

* Fic path to the packaging (#1806)

* Fix gcs credentials for packaging (#1807)

* [Automation] Update elastic stack version to 8.5.0-de69302b for testing (#1822)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.5.0-1bd77fc1 for testing (#1826)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.5.0-167dfc80 for testing (#1831)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.5.0-6b7dda2d for testing (#1835)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* Allow multiple ES outputs as long as they are the same ES (#1684)

* add 'outputs' field to the ES agent schema to store the API key data and permission hash for each ES output

* add output name to API key metadata

* add v8.5 migration to migration.go

* add migration docs and improve logging

* group migration functions per version

* [Automation] Update elastic stack version to 8.5.0-4140365c for testing (#1837)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* updating upgrade_status: completed (#1833)

* updating upgrade_status: completed

* updated schema.json and regenerated schema.go

* updated license headers

* Fix v8.5.0 migration painless script (#1839)

* fix v8.5.0 migration painless script

* [Automation] Update elastic stack version to 8.5.0-8e906f9f for testing (#1843)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* ci: rename dra staging for release dra release staging (#1840)

* Remove events from agent checkin body. (#1842)

Remove the events attribute from the agent checkin body. Note that
removal of the attribute will not stop the server from issuing a 400 if
the response body is too long. The removal is so that the checkin code
on the fleet-server and agent remain comparable.

Co-authored-by: Blake Rouse <blake.rouse@elastic.co>

* [Automation] Update elastic stack version to 8.5.0-589a4a10 for testing (#1852)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.5.0-37418cf3 for testing (#1855)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.5.0-fcf3d4c2 for testing (#1862)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.5.0-c7913db3 for testing (#1868)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* Add error detail to catch-all HTTP response (#1854)

* Make authc log debug and add cache hit field (#1870)

* Document Go 1.18 certificate change in changelog. (#1871)

* Revert "Fix v8.5.0 migration painless script" (#1878)

* Revert "Fix v8.5.0 migration painless script (#1839)"

This reverts commit de5d74b.

* Revert "Allow multiple ES outputs as long as they are the same ES (#1684)"

This reverts commit 63fdcbf.

* [Automation] Update elastic stack version to 8.5.0-56d2c52d for testing (#1880)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* Bulk API Keys update (#1779)

Bulk API Keys update (#1779)

* Fix and reintroduce "Allow multiple ES outputs as long as they are the same ES" (#1879)

* Revert "Revert "Fix v8.5.0 migration painless script" (#1878)"
  This reverts commit ef9ca2b.

* Revert "Revert "Allow multiple ES outputs as long as they are the same ES (#1684)""
  This reverts commit bb696ac.

* avoid new API keys being marked for invalidation

Co-authored-by: Michal Pristas <michal.pristas@gmail.com>
  He fixed the merge conflicts after Bulk API Keys update (#1779), commit 46ac14b, got merged

* [Automation] Update elastic stack version to 8.5.0-7dc445a0 for testing (#1888)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* Update pre-sets limits to avoid overlap. (#1891)

Update file max limits and env_defaults_test.go running make defaults to generate the new one

* [Release] add-backport-next (#1892)

* Bump version to 8.6.0 (#1895)

* Catch error in waitBulkAction. Add bulk.WithRetryOnConflict(3) in multiple places. (#1896)

* Catch error in waitBulkAction. Add bulk.WithRetryOnConflict(3) in multiple places.

* Add changelog entry.

* Update CHANGELOG.next.asciidoc

Co-authored-by: Craig MacKenzie <craig.mackenzie@elastic.co>

Co-authored-by: Craig MacKenzie <craig.mackenzie@elastic.co>

* Update apikey.cache_hit log field name to match convention (#1900)

* [Automation] Update elastic stack version to 8.6.0-21651da3 for testing (#1908)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* LoadLimits does not override existing values (#1912)

Fleet-server will use any specified cache or server limit values over
whatever is returned by the default/agent number loader. For example, if
a max body size is explicitly set to a value such as 5MB, and the
default returned by LoadLimits is 1MB, the 5MB value is used.

* [Automation] Update elastic stack version to 8.6.0-326f84b0 for testing (#1916)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-df00693f for testing (#1925)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-a2f4f140 for testing (#1928)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* Revert "updating upgrade_status: completed (#1833)" (#1920)

* Revert "updating upgrade_status: completed (#1833)"

This reverts commit 23be42a.

* Leaving in upgrade_status field for retry functionality

* Storing checkin message in last_checkin_message (#1932)

* Storing checkin message in last_checkin_message

* added changelog

* fixed tests

* Unique limiters for each API listener (#1904)

* Unique limiters for each API listener

Refactor the limit.Limiter so it can wrap the separate API httprouter
endpoints. Limiter.WrapX() calls take the handler and stats incrementer
for metrics/error counting. api.Run() replaced with Router.Run(), which
will generate an httprouter for each listener in order to be able to
associate the httprouter with a unique Limiter.

* Add listener address labeled logs to limiter

* Review feedback

* Apply suggestions from code review

Co-authored-by: Anderson Queiroz <me@andersonq.me>

* review feedback

* fix import

* Fix test

Co-authored-by: Anderson Queiroz <me@andersonq.me>

* Cleanup cmd/fleet/main.go (#1886)

* Replace cache.Config with config.Cache

* Move server setup from cmd/fleet to new pkg/server

* Move constants

* Fix imports and integration tests

* fix linter

* [Automation] Update elastic stack version to 8.6.0-158a13db for testing (#1938)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [8.6](forwardport) Add extra protection against accessing null fields to 8.5 migration (#1921) (#1926)

* [Automation] Update elastic stack version to 8.6.0-aea1c645 for testing (#1942)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-0fca2953 for testing (#1948)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-e4c15f15 for testing (#1954)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* Conditional log level for api key read (#1946)

Conditional log level for api key read (#1946)

* Updated migration query to match items with deprecated field present (#1959)

Co-authored-by: Anderson Queiroz <anderson.queiroz@elastic.co>

* Fix fleet.migration.total log key overlap (#1951)

Co-authored-by: Anderson Queiroz <anderson.queiroz@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-d939cfde for testing (#1964)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-7c9f25a9 for testing (#1969)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-c49fac70 for testing (#1976)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* Update to Go 1.18.7. (#1978)

* [Automation] Update elastic stack version to 8.6.0-5a8d757d for testing (#1981)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-40086bc7 for testing (#1987)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-233dc5d4 for testing (#1990)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-54a302f0 for testing (#1995)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* Don't send POLICY_CHANGE actions retrieved from index to agent. (#1963)

* Don't send POLICY_CHANGE actions retrieved from index to agent.

The fleet-server should not send any policy change actions that are
written to the actions index to an agent on checkin. The server will
remove these actions in the convert method and emit a warning message.
The ack token that is used is not altered in this case. Policy change
actions are dynamically generated by the fleet-server when it detects
that the agent is not running an up-to-date version of the policy.

* move filtering to its own method

* Fix linter, tests, fix file name

* [Automation] Update elastic stack version to 8.6.0-cae815eb for testing (#2000)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-6545f2df for testing (#2005)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-055acc83 for testing (#2011)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-baf193e8 for testing (#2016)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-22d60ec9 for testing (#2020)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* Allow upgrade action to signal retry (#1887)

* Allow upgrade action to signal retry

Allow the ack of an upgrade action to set the upgrade status to
retrying.

* fix tests set failed state

* Fix broken test

* nil upgrade status by default

* Set agent to healthy in case of upgrade failure

* fix upgrade fields

* Fix tests

* [Automation] Update elastic stack version to 8.6.0-b8b35931 for testing (#2024)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-a892f234 for testing (#2030)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Add GH action to add issues to ingest board

Issues in this repo labeled with `Team:Fleet` will be added to the ingest board automatically w/ the `Fleet Server` area.

* Update add-issues-to-ingest-board.yml

* [Automation] Update elastic stack version to 8.6.0-89d224d2 for testing (#2034)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-949a38d2 for testing (#2039)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-26dc1164 for testing (#2045)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* Add active filter for enrollment key queries. (#2044)

* Add active filter for enrollment key queries.

Add an active: true filter to enrollment key queries. This allows
fleet-server to handle cases where there may be 10+ inactive keys
associated with a policy.

* review feedback

* fix linter

* fix tests

* Fix test cases
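The `active: true` filter from #2044 can be illustrated as an Elasticsearch bool query, presumably so that a query returning a limited number of hits is not filled up by inactive keys. This sketch only builds the JSON body; `enrollmentKeyQuery` is a hypothetical helper, and the `policy_id` field name is an assumption (only the `active` field is named in the commit).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// enrollmentKeyQuery builds a hypothetical Elasticsearch query that matches
// only active enrollment keys for one policy. The "policy_id" field name is
// an assumption of this sketch.
func enrollmentKeyQuery(policyID string) ([]byte, error) {
	q := map[string]any{
		"query": map[string]any{
			"bool": map[string]any{
				"filter": []any{
					map[string]any{"term": map[string]any{"policy_id": policyID}},
					map[string]any{"term": map[string]any{"active": true}},
				},
			},
		},
	}
	return json.Marshal(q)
}

func main() {
	b, err := enrollmentKeyQuery("policy-1")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
	// {"query":{"bool":{"filter":[{"term":{"policy_id":"policy-1"}},{"term":{"active":true}}]}}}
}
```

Using `filter` rather than `must` keeps these exact-match clauses out of relevance scoring.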

* [Automation] Update elastic stack version to 8.6.0-4765d2b0 for testing (#2048)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-8a615646 for testing (#2050)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-3f5f98b7 for testing (#2051)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* [Automation] Update elastic stack version to 8.6.0-f20b7179 for testing (#2056)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* Run mod tidy.

* Run make notice.

* Fix integration tests.

* Run go mod tidy and make notice.

* Fix path to fleet-server.yml in integration test.

* Fix race condition.

* Fix try 2.

* Fix race.

* Fix race try 2.

Co-authored-by: apmmachine <58790750+apmmachine@users.noreply.github.com>
Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>
Co-authored-by: Victor Martinez <victormartinezrubio@gmail.com>
Co-authored-by: Anderson Queiroz <anderson.queiroz@elastic.co>
Co-authored-by: Julia Bardi <90178898+juliaElastic@users.noreply.github.com>
Co-authored-by: Michel Laterman <82832767+michel-laterman@users.noreply.github.com>
Co-authored-by: Josh Dover <1813008+joshdover@users.noreply.github.com>
Co-authored-by: Craig MacKenzie <craig.mackenzie@elastic.co>
Co-authored-by: Michal Pristas <michal.pristas@gmail.com>
Co-authored-by: Julien Lind <julien.lind@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Anderson Queiroz <me@andersonq.me>
Co-authored-by: Kyle Pollich <kyle.pollich@elastic.co>

Co-authored-by: Aleksandr Maus <aleksandr.maus@elastic.co>
Co-authored-by: Michal Pristas <michal.pristas@gmail.com>
Co-authored-by: apmmachine <58790750+apmmachine@users.noreply.github.com>
Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>
Co-authored-by: Victor Martinez <victormartinezrubio@gmail.com>
Co-authored-by: Anderson Queiroz <anderson.queiroz@elastic.co>
Co-authored-by: Julia Bardi <90178898+juliaElastic@users.noreply.github.com>
Co-authored-by: Michel Laterman <82832767+michel-laterman@users.noreply.github.com>
Co-authored-by: Josh Dover <1813008+joshdover@users.noreply.github.com>
Co-authored-by: Craig MacKenzie <craig.mackenzie@elastic.co>
Co-authored-by: Julien Lind <julien.lind@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Anderson Queiroz <me@andersonq.me>
Co-authored-by: Kyle Pollich <kyle.pollich@elastic.co>
Labels
backport-v8.5.0 (Automated backport with mergify)
bug (Something isn't working)
Team:Elastic-Agent-Control-Plane (Label for the Agent Control Plane team)
Development

Successfully merging this pull request may close these issues.

Load limit on fleet server also limits its own Elastic Agent
4 participants