-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add GCP test package #701
Add GCP test package #701
Conversation
.ci/Jenkinsfile
Outdated
@@ -371,9 +371,15 @@ def withCloudTestEnv(Closure body) { | |||
if (!aws.containsKey('secret_key')) { | |||
error("${AWS_ACCOUNT_SECRET} doesn't contain 'secret_key'") | |||
} | |||
// GCP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, it's not so simple in this case, but no worries, we'll sort it out :)
Could you please try something similar to this:
withCredentials([string(credentialsId: "${env.JOB_GCS_BUCKET_INTERNAL}", variable: 'GOOGLE_CREDENTIALS')]) {
maskedVars.add([var: 'GOOGLE_CREDENTIALS', password: env.GOOGLE_CREDENTIALS]);
}
The idea is to pull the env from credentials and place it in the masked vars. Be careful not to leak the confidential data with some "print" :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, I tried :D I updated the code with your suggestion, still I do not understand who is going to set GOOGLE_CREDENTIALS
variable in the GCP project :) (If there is any docs about this, happy to read them)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Elastic-package uses the env.yml file to pass selected env variables to the Terraform docker container.
I see that you also use GCP_PROJECT_ID. Is it required? If so, then we need to set it here too (hardcode?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@endorama My bad, try this one: JOB_GCS_EXT_CREDENTIALS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GCP_PROJECT_ID
is required, and is the project where resources will be created into, and where the credentials passed are valid. Should it be a masked var as well? Which project should I use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@v1v We're having a hard time figuring out why the Jenkins pipeline fails here. Apparently, there isn't any error logged. We want to read the credentials for the elastic-observability
account and store them in GOOGLE_CREDENTIALS.
Would you mind taking a look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed and I added GCP_PROJECT_ID
, but CI is still red without errors. I'm confused because there are no errors in the CI tasks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit of clarity about debugging errors:
- Genuine errors are reported correctly in the Blue Ocean UI
- Pipeline errors are not reported correctly in the Blue Ocean UI
The problem is related to the credentials type:
11:09:05 ERROR: Credentials 'beats-ci-gcs-plugin-file-credentials' is of type 'Secret file' where 'org.jenkinsci.plugins.plaincredentials.StringCredentials' was expected
The below command helped to debug the error:
curl -s https://beats-ci.elastic.co/job/Ingest-manager/job/elastic-package/job/PR-701/5/consoleText | grep -i error
I'll raise an issue to be able to show errors related to 2)
since the ones for 1)
are reported in the GitHub comment, but the ones for 2)
are not (I don't know if I'll be success)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, sorry that you had some issues while debugging these errors. It's not a great user experience!
The traditional jenkins UI is the last resort to debug errors that are not easy to catch
Out of this scope, OTOH, with the Observing the CI/CD solution, for every build in the CI there will be a distributed trace, there is a direct link in the GitHub comment with the build status
Although, some spans are not shown (maybe the APM Server didn't cope with all the load) but I found the error in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, @v1v for bringing it up!
What we're trying to achieve here is to extract GOOGLE_CREDENTIALS for the elastic-observability
account and pass it to the system tests runner (elastic-package Terraform service deployer). We have already done with AWS credentials (access key, secret access key) and it works well.
Now, it's time to fetch credentials for GCP. The GOOGLE_CREDENTIALS
env will be passed to the Terraform service deployer (Docker container) to execute TF scenario and create GCP resources.
PS. Probably same story for GCP_PROJECT_ID
.
I got this error now:
I'm wondering which credentials is the TF using. |
Hold on a minute! I'll write here since those credentials are not the ones to be used |
.ci/Jenkinsfile
Outdated
// GCP | ||
withCredentials([file(credentialsId: env.JOB_GCS_EXT_CREDENTIALS, variable: 'FILE_CREDENTIAL')]) { | ||
maskedVars.add([var: 'GOOGLE_CREDENTIALS', password: readFile(file: env.FILE_CREDENTIAL)]); | ||
maskedVars.add([var: 'GCP_PROJECT_ID', password: env.GCP_PROJECT_ID]) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need this section but a shared library step to wrap the command that requires the GCP connection:
This step allows to use the credentials stored in vault, this is much more dynamic. You can see an example in:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can experiment with withGCPEnv
, but all we need is to sniff the GOOGLE_CREDENTIALS and pass it to the TF container (only this ENV, maybe also GCP_PROJECT_ID). We don't need full initialization of withGCPEnv
. Once we pull the credentials, we can disable the withGCPEnv
's context.
Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just in case: I adapted the PR to use withGCP
I fixed the issue with Google Cloud credentials. Now the problem is:
|
Error is most likely related to: elastic/kibana#125625 |
/test |
internal/install/stack_version.go
Outdated
@@ -6,5 +6,5 @@ package install | |||
|
|||
const ( | |||
// DefaultStackVersion is the default version of the stack | |||
DefaultStackVersion = "8.0.0" | |||
DefaultStackVersion = "8.0.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
8.0.1-SNAPSHOT, unless the 8.0.1 has been already published.
@endorama CI failed due to a different reason than the misconfigured GCP terraform deployer. I suggest keeping digging in internal filebeat/metricbeat logs or policy. |
I fixed the issue with metricbeat configuration, the current errors are (log simplified for easier reading):
They suggest an issue with GCP credentials. I noticed that credentials are in a different project than the target for resource creation ( |
Where did you find that it uses a different project? Could you please double-check that Terraform uses the right one, Maybe current credentials don't allow for listing metrics ( I will try to help you with this. |
In the metricbeat logs I found this:
The From the logs I can confirm Terraform instance creation works and resource are created in
I'm digging into which credentials are used at the moment (as it seems is using default credentials but I'm not sure which IAM account that is linked to) |
Question: given the permission issue we are observing, I'm wondering the benefits of running the CI worker in a project and creating test resources in a different project. Is what we usually do? Should we continue trying to do this or should we just use the same GCP project? /cc @v1v |
@endorama I'm working (with Victor's help) on setting up a different service account. This way we can control permissions. I don't have preferences here, but I bet that we don't need to create a new project. Let's focus on merging/closing this PR first. Then, we can talk about improvements. |
@@ -0,0 +1,6 @@ | |||
# wait_for_data_timeout: 10m | |||
vars: | |||
project_id: elastic-obs-integrations-dev |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it be inherited, for example {{GCP_PROJECT_ID}}
?
/test |
Updated permission for test SA. Triggering tests /test |
/test |
Nice, next level:
|
- TF_VAR_GCP_PROJECT_ID=${GCP_PROJECT_ID} | ||
- GCP_PROJECT_ID=${GCP_PROJECT_ID} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it necessary to define it twice here? You said that Terraform creates resources in the right project. Did you mean to include it in the test config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TF_VAR
is Terraform specific (as is the way Terraform accepts variables through environment variables); to prevent a dependency in test-default-config
I think is better to have defined the pass through variable GCP_PROJECT_ID
(which is a convention and how you would expect it to be called) and the TF_VAR
format.
Even if it's true that the value is the same I would not consider this duplication as the variables have different usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I also wanted to quickly check if this was the root cause, so open to discussion)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: is this ENV (GCP_PROJECT_ID
) required by any part of Terraform code? If not, let's follow KISS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No is not (but is used in test-default-config
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it means that you don't need it in this file.
/test |
I paired with @gpop63 and checked all the wires. It looks like we need at least 8.1.0 to use |
Luckily |
As per elastic/integrations#2712 `credentials_json` presented some parsing issues when not quoted in the agent template. Leverage suggestion by @andrewkroh on raw escaping the credentials string. #701 (comment)
Currently if period is less that the minimum the metricbeat reported initialization fails. See elastic/beats#30434
Tests are failing due to an empty
What am I missing? |
Well, the Terraform runner reads the right project ID and it isn't hardcoded there, so Jenkins sets it correctly. We need to trace it down to the Metricbeat. EDIT: Last week we checked that the EDIT2: Error:
... so what you have to do now is to try running it locally and check the agent's policy with |
@endorama Please do not use the force-push mode as it erases the history. We won't be able to figure out why it was working (collecting metrics) before and now. Please use rebase/pull instead. Here is my branch: https://github.com/mtojek/elastic-package/tree/gcp-test-package-2, which AFAIR worked well, collected metrics. You can use it to find out the differences. Proof:
|
@mtojek I have a backup branch for these cases, but I was hoping to push a clean history to reuse it as example in the future. I misinterpreted your comment
Is not needed but |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused now as we managed to make it running a few days ago. Did you try to copy over files and figure out the diff?
test/packages/parallel/gcp/data_stream/compute/_dev/deploy/tf/env.yml
Outdated
Show resolved
Hide resolved
Without `GCP_PROJECT_ID` tests fail. I see no reason why this is happening, but `TF_VAR_gcp_project_id` is not enough for `project_id` to be correctly filled in Agent configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm glad that the issue is fixed. I left few observations around the test configuration, mostly nit-picks/improvements.
} | ||
|
||
resource "google_compute_instance" "default" { | ||
name = "test-${var.TEST_RUN_ID}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a prefix identifying the instance with elastic-package. The test
prefix may be too vague.
|
||
variable "zone" { | ||
type = string | ||
// NOTE: if you change this value you **must** change it also for test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it possible to read it from the env.yml
config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can pass it like the project id with TF_VAR_zone
, but I would prefer having a default value so if someone runs it locally the value is already present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can define default variables like this.
It was designed to treat Analyzing your case: services:
terraform:
environment:
- GCP_PROJECT_ID=${GCP_PROJECT_ID}
# pass project id to Terraform
- TF_VAR_gcp_project_id=${GCP_PROJECT_ID}
- GOOGLE_CREDENTIALS=${GOOGLE_CREDENTIALS} From Terraform's perspective, it should be ok to set only the - GCP_PROJECT_ID=${GCP_PROJECT_ID} to read the |
Great that makes perfect sense! So the current code is correct, may I ask for the final review then? |
I guess you have to run the formatter now? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this!
Adds
gcp
test package undertest/packages/parallel
with a basic test.This PR splits from #662, where we deemed necessary proper testing to ensure correct functionality. To reduce complexity of #662 this PR only focus in having a single test running successfully in the CI.
Change:
@mtojek I don't know how to get GCP credentials usable by the CI, but I edited the Jenkinsfile to support GCP too.