Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create Agent policies per each test execution #1866

Merged
merged 32 commits into from
Jun 6, 2024

Conversation

mrodm
Copy link
Contributor

@mrodm mrodm commented May 24, 2024

Relates #787

This PR adds a new Agent Policy that will be created and configured per each system test execution.

This new Agent Policy is going to help us:

After each test run defined that new Agent Policy is deleted as well as the Data Stream created for it.

@mrodm mrodm self-assigned this May 24, 2024
@mrodm
Copy link
Contributor Author

mrodm commented May 24, 2024

/test

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

History

cc @mrodm

Comment on lines 850 to 853
if !r.options.RunIndependentElasticAgent {
// keep same behaviour as previously when Elastic Agent of the stack is used.
return false
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought to keep the same behaviour when using Elastic Agent of the stack. WDYT ?

Comment on lines +283 to +286
tdErr := r.tearDownTest(ctx)
if tdErr != nil {
logger.Errorf("failed to tear down runner: %s", tdErr.Error())
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Required to delete the policy created just for testing and also the related Data Streams.

@mrodm mrodm marked this pull request as ready for review May 29, 2024 09:13
@mrodm mrodm requested a review from a team May 29, 2024 09:13
@mrodm
Copy link
Contributor Author

mrodm commented May 29, 2024

test integrations

@mrodm
Copy link
Contributor Author

mrodm commented Jun 5, 2024

test integrations

@elasticmachine
Copy link
Collaborator

Created or updated PR in integrations repository to test this version. Check elastic/integrations#10005

Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the concept, but I wonder if we need as many policies as we have. We are creating three kinds of policies now.

Comment on lines 1012 to 1013
policyToAssignDatastreamTests := policyToTest
if r.shouldCreateNewAgentPolicyForTest() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the policyToTest? Couldn't we always use the policy created here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I just update the policyToTest to be like this one, that means that the namespace is random, it would work just while the system tests are not run per stages (--setup, --no-provision, --tear-down).

When these flags are used it would be used the same data stream created in the first run, retrieved from the service state:

policyToTest = &serviceStateData.CurrentPolicy

That would mean that the method to delete old documents from the data stream could not be delete since that data stream would contain documents for sure.

Not tested, maybe it could be created always the policyToTest and just used the one from the state file to assign back the policy to the agent in the scenarios where the tests are performed in stages.

And for those test runs without these flags (common usage), there would be data streams created for each run:
data streams created

It could be added the logic to remove those data streams in one of the handlers.

In this PR these 3 policies are used as following:

  • one for enrolling without any package policy
    • there were having issues when the stack is running for more than X minutes that agents were not able to enroll using the default policy.
  • one to assign the package policy to the agent
    • this one would be used to assign back when running per stages after each --no-provision run
  • one for the actual testing
    • this policy is the one that would be using the Elastic Agent to run the system tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll give it a try to remove one agent policy. I'll need to test with system test running per stages too.

internal/testrunner/runners/system/runner.go Show resolved Hide resolved
Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I think we can go further in the direction of using an isolated policy and data stream for testing, so we don't need to clean documents at any moment. But we can incrementally go in this direction.

@@ -920,7 +919,7 @@ func (r *runner) prepareScenario(ctx context.Context, config *testConfig, svcInf
policyEnroll := kibana.Policy{
Name: fmt.Sprintf("ep-test-system-enroll-%s-%s-%s", r.options.TestFolder.Package, r.options.TestFolder.DataStream, testTime),
Description: fmt.Sprintf("test policy created by elastic-package to enroll agent for data stream %s/%s", r.options.TestFolder.Package, r.options.TestFolder.DataStream),
Namespace: "ep",
Namespace: "enrollep",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be also randomized, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll update it.
This policy has no package policy assigned so it will not create any data stream in any case, but better to have it randomized to avoid future issues.

Comment on lines 1153 to 1156
r.resetAgentLogLevelHandler = func(ctx context.Context) error {
if r.options.RunTestsOnly {
return nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit. In these cases, could we directly avoid adding the handler?

Suggested change
r.resetAgentLogLevelHandler = func(ctx context.Context) error {
if r.options.RunTestsOnly {
return nil
}
if r.options.RunTestsOnly {
r.resetAgentLogLevelHandler = func(ctx context.Context) error {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be done, but there are handlers that have more conditions to not run. For instance

	r.removeAgentHandler = func(ctx context.Context) error {
		if r.runTestsOnly {
			return nil
		}
		// When not using independent agents, service deployers like kubernetes or custom agents create new Elastic Agent
		if !r.runIndependentElasticAgent && !svcInfo.Agent.Independent {
			return nil
		}

I think I'll keep all the handlers as they are to keep the same pattern in all of them.

@mrodm
Copy link
Contributor Author

mrodm commented Jun 6, 2024

test integrations

Comment on lines 972 to 976
logger.Debug("creating test policy...")
policyToAssignDatastreamTests := kibana.Policy{
Name: fmt.Sprintf("ep-test-system-%s-%s-%s", r.testFolder.Package, r.testFolder.DataStream, testTime),
Description: fmt.Sprintf("test policy created by elastic-package test system for data stream %s/%s", r.testFolder.Package, r.testFolder.DataStream),
Namespace: "ep",
Namespace: common.CreateTestRunID(),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, every execution will create a new Agent Policy for testing, except when running with --tear-down. This tear down stage does not run any tests, so it is not needed.

@mrodm mrodm requested a review from jsoriano June 6, 2024 12:18
@elasticmachine
Copy link
Collaborator

Created or updated PR in integrations repository to test this version. Check elastic/integrations#10005

@mrodm
Copy link
Contributor Author

mrodm commented Jun 6, 2024

test integrations

@elasticmachine
Copy link
Collaborator

Created or updated PR in integrations repository to test this version. Check elastic/integrations#10092

Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great change 👍

if r.resetAgentPolicyHandler != nil {
if err := r.resetAgentPolicyHandler(cleanupCtx); err != nil {
return err
}
r.resetAgentPolicyHandler = nil
}

// Shutting down the service should be run one of the first actions
// to ensure that resources created by terraform are deleted even if other
// errors fail.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should run all handlers, and report all returned errors if any. (For a future change).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point.

There could be handlers that depend on others, but if so there will be another error. Or it could be detected the error and no execute the handler.

As you mentioned, for a future change.

// There are some issues when the stack is running for some time,
// agents cannot enroll with the default policy
// This enroll policy must be created even if independent Elastic Agents are not used. Agents created
// in Kubernetes or Custom Agents require this enroll policy too (service deployer).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to comment on this, I didn't remember the issues after running for some time 👍

// This allows us to ensure that the Agent Policy used for testing is
// assigned to the agent with all the required changes (e.g. Package DataStream)
logger.Debug("creating test policy...")
policyToAssignDatastreamTests := kibana.Policy{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit. I find this name confusing. Why do we have to mention that this policy is going to have data streams assigned? As this is only used in this reduced scope, could we just call it policy?

Suggested change
policyToAssignDatastreamTests := kibana.Policy{
policy := kibana.Policy{

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just wrote that name while developing the feature and I didn't re-check it 😅
I'll rename it with just policy 👍

return err
}
return nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏 🔥

I think we are going to avoid some flakiness cases by removing this. And probably tests will be slightly faster.

@mrodm mrodm enabled auto-merge (squash) June 6, 2024 14:59
@mrodm mrodm merged commit 3cc9d99 into elastic:main Jun 6, 2024
3 checks passed
@mrodm mrodm deleted the create-policy-per-test branch June 6, 2024 15:13
@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

History

cc @mrodm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants