fix direct agent configuration #5380

Merged: stuartnelson3 merged 10 commits into elastic:master from fix-direct-agent on Jun 3, 2021

Conversation

stuartnelson3 (Contributor)

Motivation/summary

Manual testing with elastic/kibana#100744 revealed
errors in the current apm-server logic. This PR introduces changes so that
apm-server correctly switches to direct agent configuration when running under
managed mode.

depends on elastic/kibana#100744


How to test these changes

  1. Start elasticsearch using apm-integration-testing
  2. Start a version of kibana including this PR
  3. Start elastic-agent (not managed by fleet) with a version of apm-server
    including this PR
  4. From the kibana UI:
  • Add apm-server as an integration to a fleet policy
  • Create an agent config in APM / Settings / Agent Config
  • Verify that the agent_config exists in the fleet policy and is nested under
    apm-server
  5. Update the running elastic-agent.yml with the generated policy from the
    previous step
  6. Verify that cURLing the config endpoint returns your agent config (see the
    example response after this list)
  • e.g. curl 'http://localhost:8200/config/v1/agents?service.name=all'
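
A successful response returns the settings created in Kibana as a flat JSON object. A sketch of what to expect, assuming an agent config that sets transaction_sample_rate to 0.1 (the exact headers and Etag value will differ):

$ curl -i 'http://localhost:8200/config/v1/agents?service.name=all'
HTTP/1.1 200 OK
Cache-Control: max-age=30, must-revalidate
Etag: "..."
Content-Type: application/json

{"transaction_sample_rate":"0.1"}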

selecting "all" for service name or environment in
the kibana ui marshals service.name and
service.environment as empty strings, which is
valid
stuartnelson3 requested a review from a team on May 31, 2021 14:29
beater/beater.go: review thread (outdated, resolved)
stuartnelson3 (Contributor, Author)

Note: The current nesting in elastic/kibana#100744 has agent_configs on the same level as apm-server. In order for the config to work correctly, agent_configs needs to be nested underneath it, i.e. apm-server.agent_configs.
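
A minimal sketch of the corrected nesting in the generated policy. Only the apm-server/agent_configs relationship comes from this thread; the fields inside the entry are illustrative:

apm-server:
  agent_configs:
    - service.name: ""        # "all" in the kibana UI marshals to an empty string
      service.environment: ""
      config:
        transaction_sample_rate: "0.1"   # hypothetical setting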

stuartnelson3 (Contributor, Author)

Note: If we're running in managed mode (Kibana.Enabled == false), but no agent configs exist, then this code branch will be executed and attempts to query the agent config will fail:

if !cfg.Kibana.Enabled && cfg.AgentConfigs == nil {
        msg := "Agent remote configuration is disabled. " +
                "Configure the `apm-server.kibana` section in apm-server.yml to enable it. " +
                "If you are using a RUM agent, you also need to configure the `apm-server.rum` section. " +
                "If you are not using remote configuration, you can safely ignore this error."
        mw = append(mw, middleware.KillSwitchMiddleware(cfg.Kibana.Enabled, msg))
}

I could thread some sort of "managed" bool through to help make this decision, but I'm not sure if there might be some better way to handle it. Any suggestions?
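
For illustration only, the guard could take a hypothetical managed bool (threaded in from wherever Elastic Agent management is detected) so the kill switch is skipped in managed mode even when no agent configs have been pushed down yet:

// Sketch only: "managed" is a hypothetical flag, not an existing field.
// In managed mode the agent configs may simply not have arrived yet,
// so the kill switch should not be installed.
if !cfg.Kibana.Enabled && cfg.AgentConfigs == nil && !managed {
        msg := "Agent remote configuration is disabled. " +
                "Configure the `apm-server.kibana` section in apm-server.yml to enable it."
        mw = append(mw, middleware.KillSwitchMiddleware(cfg.Kibana.Enabled, msg))
}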

apmmachine (Contributor) commented May 31, 2021

💚 Build Succeeded


Build stats

  • Build Cause: Pull request #5380 updated

  • Start Time: 2021-06-03T06:40:09.143+0000

  • Duration: 41 min 57 sec

  • Commit: 40a67c4

Test stats 🧪

  • Failed: 0
  • Passed: 6142
  • Skipped: 120
  • Total: 6262


mergify bot (Contributor) commented Jun 1, 2021

This pull request is now in conflict. Could you fix it @stuartnelson3? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b fix-direct-agent upstream/fix-direct-agent
git merge upstream/master
git push upstream fix-direct-agent

stuartnelson3 (Contributor, Author)

jenkins run the tests

stuartnelson3 merged commit 9980296 into elastic:master on Jun 3, 2021
stuartnelson3 deleted the fix-direct-agent branch on June 3, 2021 07:27
mergify bot pushed a commit that referenced this pull request Jun 3, 2021
* disable kibana if running in managed mode

* remove agent_config service validation

selecting "all" for service name or environment in
the kibana ui marshals service.name and
service.environment as empty strings, which is
valid

* remove kibana api key

this was used temporarily to support central
config via kibana, which is not necessary for 7.14

(cherry picked from commit 9980296)
stuartnelson3 added a commit that referenced this pull request Jun 3, 2021
* disable kibana if running in managed mode

* remove agent_config service validation

selecting "all" for service name or environment in
the kibana ui marshals service.name and
service.environment as empty strings, which is
valid

* remove kibana api key

this was used temporarily to support central
config via kibana, which is not necessary for 7.14

(cherry picked from commit 9980296)

Co-authored-by: stuart nelson <stuartnelson3@gmail.com>
axw self-assigned this on Jul 9, 2021
mergify bot pushed a commit that referenced this pull request Jul 9, 2021
* disable kibana if running in managed mode

* remove agent_config service validation

selecting "all" for service name or environment in
the kibana ui marshals service.name and
service.environment as empty strings, which is
valid

* remove kibana api key

this was used temporarily to support central
config via kibana, which is not necessary for 7.14

(cherry picked from commit 9980296)

# Conflicts:
#	apmpackage/apm/0.3.0/agent/input/template.yml.hbs
axw (Member) commented Jul 9, 2021

Verified with 7.14.0-BC2. I ran apm-integration-testing with --apm-server-managed, and then ran this Go program:

package main

import (
        "context"
        "fmt"

        "go.elastic.co/apm/apmconfig"
        "go.elastic.co/apm/transport"
)

func main() {
        // Watch central config for a service named "main" and print every change.
        var args apmconfig.WatchParams
        args.Service.Name = "main"
        transport, _ := transport.NewHTTPTransport() // error ignored for brevity
        changes := transport.WatchConfig(context.Background(), args)
        for change := range changes {
                fmt.Println("changed:", change)
        }
}

(This is using internals of the Go Agent.)

Then setting/updating a central config rule for service "main", I observed the changes in the config-watching program.

There is something wrong with how apm-server is handling config changes generally. On some changes, I observed apm-server would become unresponsive. Then looking inside the container I found the process had exited, but there was no sign of any errors in the logs. I'll continue digging into that separately.
