-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Elastic-Agent] Use internal port for local fleet server #28993
[Elastic-Agent] Use internal port for local fleet server #28993
Conversation
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane) |
💚 Build Succeeded
Expand to view the summary
Build stats
Test stats 🧪
💚 Flaky test reportTests succeeded. 🤖 GitHub commentsTo re-run your PR in the CI, just comment with:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you share your thoughts on why you made port and host configurable?
@@ -308,6 +311,24 @@ func enroll(streams *cli.IOStreams, cmd *cobra.Command, args []string) error { | |||
|
|||
ctx := handleSignal(context.Background()) | |||
|
|||
if localFleetServer := fServer != ""; localFleetServer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you move this to a separate function and then have some unit tests on it to validate the behaviour?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that port should be configurable. What if the user is already using that port for any other process running?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Further down a random port is selected that is free. So this conflict should never happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is really hard to understand. Can you break it up somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i made it configurable to make the option available for configuring port. if user wants to have it on some managed port. i can take it out if that's something not important atm.
@@ -308,6 +311,24 @@ func enroll(streams *cli.IOStreams, cmd *cobra.Command, args []string) error { | |||
|
|||
ctx := handleSignal(context.Background()) | |||
|
|||
if localFleetServer := fServer != ""; localFleetServer { | |||
if fInternalHost == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the use case where we would not use localhost? Should we even support something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fleet-server and its own Elastic Agent are always gonna be on the same host aren't they?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It must be on the same host. Elastic Agent starts fleet-server as a process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then it will always be a localhost communication. Gotcha.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should always be on localhost, however making it overridable could be useful for testing.
@@ -308,6 +311,24 @@ func enroll(streams *cli.IOStreams, cmd *cobra.Command, args []string) error { | |||
|
|||
ctx := handleSignal(context.Background()) | |||
|
|||
if localFleetServer := fServer != ""; localFleetServer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is really hard to understand. Can you break it up somehow?
@@ -308,6 +311,24 @@ func enroll(streams *cli.IOStreams, cmd *cobra.Command, args []string) error { | |||
|
|||
ctx := handleSignal(context.Background()) | |||
|
|||
if localFleetServer := fServer != ""; localFleetServer { | |||
if fInternalHost == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should always be on localhost, however making it overridable could be useful for testing.
@@ -384,3 +407,19 @@ func mapFromEnvList(envList []string) map[string]string { | |||
} | |||
return m | |||
} | |||
|
|||
func getRandomPort() (uint16, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works on all os's? Seems risky to me. Assuming that the OS will allow a rebind in a short period of time and/or won't decide to be aggressive and reuse port elsewhere.
Can we just default to 8221 or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i was thinking also about reversal flow, and if we agree that having this configurable is ok.
8221 is ok as well, seems it's not used by a lot of services. initially i wanted to avoid low ports, but i'm ok with this as well
@@ -32,6 +32,8 @@ rules: | |||
selectors: | |||
- fleet.server.host | |||
- fleet.server.port | |||
- fleet.server.internal_host |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with exposing the internal fleet-server port as it can help for testing. But lets make sure we note that it is unsupported in production in our docs and the log files. Users should NOT use this flag.
@@ -401,6 +405,11 @@ func (c *enrollCmd) prepareFleetTLS() error { | |||
if c.options.URL == "" { | |||
return errors.New("url is required when a certificate is provided") | |||
} | |||
|
|||
if c.options.FleetServer.InternalPort > 0 { | |||
c.options.InternalURL = fmt.Sprintf("%s:%d", defaultFleetServerInternalHost, c.options.FleetServer.InternalPort) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a long entry here which internal port is used and that this is not "officially" supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i can put it there, but as soon as you have something on a machine which binds this port there needs to be official workaround and i think this is it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets wait until this happens. Having the log message and user pinging us on it will also help us detect if this might become a common problem and the port we selected is not ideal.
@@ -865,6 +878,9 @@ func createFleetServerBootstrapConfig( | |||
if port == 0 { | |||
port = defaultFleetServerPort | |||
} | |||
if internalPort <= 0 { | |||
port = defaultFleetServerInternalPort |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be if internalPort > 0 { internalPort = defaultFleetServerInternalPort }
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
your condition sets default for a case when somebody tried to configure it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤦♂️ Yes. Why is it that on line 878 we have ==
and here <=
?
But I think the second part still holds, this should be assigned to internalPort
and not port
? Maybe I'm blind here too :-D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, fixed
defaultFleetServerHost = "0.0.0.0" | ||
defaultFleetServerPort = 8220 | ||
defaultFleetServerInternalHost = "localhost" | ||
defaultFleetServerInternalPort = 8221 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets make sure we document this internal as needed by fleet-server. @jlind23 Can you make sure we follow up on this? We also need to have this in the changelog as if by chance someone uses this port already on the machine, it would break things (hopefully unlikely).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ruflin Where should it be documented though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two ideas:
- https://www.elastic.co/guide/en/fleet/current/fleet-faq.html
- https://www.elastic.co/guide/en/fleet/master/fleet-troubleshooting.html
@bmorelli25 Where would these internal docs around elastic-agent and fleet-server fit best?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is @dedemorton's territory. DeDe, can you take a look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so it sounds like you don't want to document the internal port setting in the main docs (command reference + Fleet Server settings) because users will not set it in a production environment. It also sounds like we're covered for the situation where the port is already in use (the code will select an unused port when there's a conflict).
I'd like to understand specific use cases for this setting, though, because I'm always suspicious of hidden settings that we don't want to document for users. We have a lot of users who are on the support team and might look in the reference first. My preference is to have reference docs that are complete, but to make the usage clear in the description.
Sounds like it also makes sense to cover this setting in the troubleshooting docs, but we need to explain when/why to change this setting during testing.
We could also mention in the Fleet Server docs that Fleet Server uses port 8221 internally, but will look for another port if that one isn't available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will not look for antoher port. We might want to use the port for testing purpose or as an escape hatch if indeed a user has a conflict.
There might we 2 things we should document:
- Architecture / Internals: What port is used for internal communication -> public
- Hidden config to change it -> developer focus, mentioned that not officially supported
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Only comment left is around the log message.
…ws-on-file-changes * upstream/master: override host on statsd metricset (elastic#29103) Skip config check in autodiscover for duplicated configurations (elastic#29048) Change "filebeat.config.modules.enabled" to "true" (elastic#28769) Remove deprecated spool queue from Beats (elastic#28869) Add `beat` field back to beat.stats (elastic#29094) Revert "Move labels and annotations under kubernetes.namespace. (elastic#27917)" (elastic#29069) heartbeat: remove w2008 in the CI (elastic#29093) Remove deprecated `--template` and `--index-policy` flags (elastic#28870) Fix parsing of apache trace log levels (elastic#28717) [Elastic-Agent] IUse itnernal port for local fleet server (elastic#28993) [Heartbeat] Log error on dupe monitor ID instead of strict req (elastic#29041) Enable pprof for elastic-agent and beats (elastic#28983)
What does this PR do?
What this PR solves is a problem when agent got unenrolled on heavier load when agent managing fleet server cannot checkin to it's own server so it will fallback to unenroll.
Problem is solved by adding internal endpoint which is used for communication on local network (with agent handling fleet server)
It lets FS to spin up 2 set of handlers, one on public 8220 and one on port defined in config.
Why is it important?
Prevents unwanted un-enrolls of the most important agent in deployment
Checklist
CHANGELOG.next.asciidoc
orCHANGELOG-developer.next.asciidoc
.How to test this PR locally
This needs to be tested with work on
fleet-server
Link: elastic/fleet-server#8808220
port, there should be no comm