
Use GCP ADC for auth in Terraform Deployer #662

Closed

wants to merge 13 commits into from

Conversation

endorama
Member

The Terraform Google Provider allows passing the credentials file content in the GOOGLE_CREDENTIALS env var.

More about this in the Terraform docs: https://registry.terraform.io/providers/hashicorp/google/latest/docs/guides/provider_reference#full-reference

Unfortunately this does not seem to work with other GCP Cloud SDK components,
as I've not been able to find references to that environment variable elsewhere.

As both the Terraform provider and the other components support ADC (Application Default
Credentials), this commit changes how the Terraform Deployer sets up
credentials, using ADC instead of the Terraform-specific variable.
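
For context, a minimal sketch of the ADC-based flow this commit implements (the commands mirror the diff discussed below; the path is the container's default ADC location):

# Write the service account key passed via GOOGLE_CREDENTIALS to the
# well-known ADC location, then authenticate gcloud against that file.
export GOOGLE_APPLICATION_CREDENTIALS=/root/.config/gcloud/application_default_credentials.json
echo "$GOOGLE_CREDENTIALS" > "$GOOGLE_APPLICATION_CREDENTIALS"
gcloud auth login --cred-file "$GOOGLE_APPLICATION_CREDENTIALS"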

@elasticmachine
Collaborator

elasticmachine commented Jan 24, 2022

💚 Build Succeeded


Build stats

  • Start Time: 2022-02-11T09:06:19.218+0000

  • Duration: 27 min 55 sec

Test stats 🧪

  • Failed: 0

  • Passed: 548

  • Skipped: 0

  • Total: 548

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@endorama endorama requested a review from mtojek January 24, 2022 16:19
@endorama
Member Author

/test

1 similar comment
@endorama
Member Author

/test

Contributor

@mtojek mtojek left a comment

If we decide to extend the terraform deployer with such a change, we have to add a test package here and make sure the flow works.

Let's first decide if this is the direction we'd like to take.

@endorama Could you please clarify (describe) your use case? Is there any PR we can look at?

@@ -16,6 +16,15 @@ cleanup() {
}
trap cleanup EXIT INT TERM

# Save GCP credentials on disk and perform authentication
Contributor

The Terraform deployer is not used just for GCP but also for other cloud providers like AWS, so you can't add provider-specific code in a common place without any conditions.

Member Author

Right, sorry for the bad implementation :) This is an easy fix: adding a guard clause for the presence of the GOOGLE_CREDENTIALS variable. Before adding it I'd like to discuss the other points.
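
A minimal sketch of the guard being discussed, assuming the gcp_auth function introduced later in this PR (the exact form was never pushed here):

# Run GCP-specific authentication only when GCP credentials were provided,
# so the common deployer keeps working for other cloud providers.
if [ -n "${GOOGLE_CREDENTIALS:-}" ]; then
  gcp_auth
fi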

Contributor

It isn't a matter of adding a guard, but is it really necessary to put it here? see: #662 (comment)

@@ -16,6 +16,15 @@ cleanup() {
}
trap cleanup EXIT INT TERM

# Save GCP credentials on disk and perform authentication
# NOTE: this is required for bq (and maybe other gcloud related tools) to authenticate
Contributor

Another thought: have you considered using gcloud emulators? We're already using them here.

echo "$GOOGLE_CREDENTIALS" > "$GOOGLE_APPLICATION_CREDENTIALS"
gcloud auth login --cred-file "$GOOGLE_APPLICATION_CREDENTIALS"
# NOTE: Terraform supports authentication through GOOGLE_CREDENTIALS and the usual gcloud ADC, but other
# tools (like bq) don't support the former, so we always rely on gcloud ADC.
Contributor

Could you please point me to the integration/system test, where you intend to use bq? I'm wondering if something changed since that time and maybe we're fine to use a terraform module.

Member Author

It is being used in this PR: elastic/integrations#2312

Member Author

I found a Terraform resource, google_bigquery_job, that would allow doing this without the bq CLI, but it requires the source data to be in a GCS bucket.
This approach has multiple disadvantages: it requires more infrastructure to be created, which increases complexity; it prevents use of the emulator; and there is no support for waiting for the load job to complete.

I tried asking in the GCP Community Slack if they were considering local file support but did not receive an answer.
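
For comparison, a rough sketch of the bq-based flow (the dataset and table names are illustrative, not taken from the actual test package):

# bq can load test data straight from a local file, which the
# google_bigquery_job resource cannot do (it requires the data in GCS).
bq mk --dataset "my_project:test_dataset"   # hypothetical project/dataset
bq load --source_format=NEWLINE_DELIMITED_JSON \
  "test_dataset.billing" ./test-data.ndjson ./billing-schema.json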

Contributor

This approach has multiple disadvantages: it requires more infrastructure to be created, which increases complexity

What do you mean by "more infrastructure"? It sounds easier to load data on GCS than to refactor the service deployer :)

it prevents use of the emulator and there is no support for waiting for the load job to complete.

This is really unfortunate, and without a proper waiter it won't be possible to perform it.

Thanks for the explanation.

Member Author

It sounds easier to load data on GCS than to refactor the service deployer :)

😅 totally true.

If google_bigquery_job was working as expected I would have chosen that way, because at least it is predictable and all within Terraform, but I think we could only have something fragile at the moment.

# NOTE: this is required for bq (and maybe other gcloud related tools) to authenticate
export "GOOGLE_APPLICATION_CREDENTIALS=/root/.config/gcloud/application_default_credentials.json"
echo "$GOOGLE_CREDENTIALS" > "$GOOGLE_APPLICATION_CREDENTIALS"
gcloud auth login --cred-file "$GOOGLE_APPLICATION_CREDENTIALS"
Contributor

If this is required by bq, maybe we can execute it just before running bq?

Member Author

In case we are going to use other tools, something possible given the new Docker image, I think it makes sense to unify authentication so all tools leverage the same credentials and we don't authenticate multiple times (with possible concurrency risks).
Unifying authentication also prevents Terraform and the other tools from running with different credentials, something that would make the CI harder to debug.

Member Author

I reviewed the pros and cons of moving authentication to where bq is used. The main con I see is how this affects authentication globally: gcloud auth state is shared across all instances of gcloud (and related tools), so doing that within Terraform code would create side effects that may be unintuitive or difficult to debug.

What is the downside of the current approach? (given that it does not trigger always but only when it makes sense)

Contributor

@mtojek mtojek Jan 27, 2022

What is the downside of the current approach? (given that it does not trigger always but only when it makes sense)

I think you included the answer here. If there is a way of not introducing the coupling on auth, why should we do this? Please remember that it isn't a dedicated GCP service.

gcloud auth state is globally shared across all instances of gcloud (and related tools). So doing that within terraform code would create side effects that may be unintuitive or difficult to debug.

It means that you can call it this way:

gcloud auth login
bq
gcloud auth revoke

Member Author

You are at risk of race conditions if multiple resources are using these commands: Terraform runs in parallel, and these commands affect a global state.

If there is a way of not introducing the coupling on auth, why should we do this?

My reasoning is that the coupling is already present, as in "Terraform requires credentials to run": if I use it with GCP resources it is coupled with GCP auth, and if I use it with AWS it is coupled with AWS auth. How could we avoid this? I agree that ideally credentials would be passed in env vars and we should not rely on other tools.

Please remember that it isn't a dedicated GCP service.

Could we consider having a dedicated deployer for GCP? Would this help untangle this?

# NOTE: this is required for bq (and maybe other gcloud related tools) to authenticate
export "GOOGLE_APPLICATION_CREDENTIALS=/root/.config/gcloud/application_default_credentials.json"
echo "$GOOGLE_CREDENTIALS" > "$GOOGLE_APPLICATION_CREDENTIALS"
gcloud auth login --cred-file "$GOOGLE_APPLICATION_CREDENTIALS"
Contributor

Is there any default location, so that --cred-file doesn't need to be specified?

Member Author

I did not find any documentation about a default location; the argument is required, as it is the equivalent of providing a username/password.

Contributor

Ok, so it looks like we need to confirm that it works as expected. It would be great if you prepared a test package similar to the special aws one: it's the AWS integration with stripped data streams, with only one left, which has a system test using the Terraform service deployer to create the EC2 machine.

Here is my proposal for the plan:

You need to do a similar exercise with GCP. You can use this PR to push the sample package, but please make sure it's stripped and that CI discovers it. Then you can execute the system test, but I'm afraid it may fail due to the configured environment variables; that means modifying the Jenkinsfile to enable those envs. Then we should observe some issues around bq and the Terraform service deployer, and eliminate them.

Member Author

Ok, so it looks like we need to confirm that it works as expected.

Could you clarify what should be confirmed?

What's the goal of this test package? Is it to verify that tests can be executed correctly through the Terraform deployer?

Contributor

@mtojek mtojek Jan 25, 2022

Yes, that's the goal - to verify if the terraform service deployer works correctly with Google Cloud: auth, bq, Jenkins changes.

Otherwise, we will end up with ping-backs between both repositories (Integrations and elastic-package), when something doesn't work as expected.

Member Author

@mtojek I added the test package and the pipeline is green, but is it really running any tests? https://beats-ci.elastic.co/blue/organizations/jenkins/Ingest-manager%2Felastic-package/detail/PR-662/8/pipeline/279

@mtojek mtojek requested a review from a team January 25, 2022 10:15
@endorama endorama self-assigned this Jan 27, 2022
@mtojek mtojek mentioned this pull request Feb 2, 2022
@mtojek
Contributor

mtojek commented Feb 2, 2022

@endorama Please pull the latest main branch.

@endorama
Member Author

endorama commented Feb 8, 2022

The CI is failing because the GCP package does not pass the lint check. This is the error shown:

Error: checking package failed: linting package failed: found 4 validation errors:
   1. item [.gitignore] is not allowed in folder [.../test/packages/parallel/gcp/data_stream/billing/_dev/deploy/tf]
   2. item [.terraform.lock.hcl] is not allowed in folder [.../test/packages/parallel/gcp/data_stream/billing/_dev/deploy/tf]
   3. item [billing-schema.json] is not allowed in folder [.../test/packages/parallel/gcp/data_stream/billing/_dev/deploy/tf]
   4. item [test-data.ndjson.tftpl] is not allowed in folder [.../test/packages/parallel/gcp/data_stream/billing/_dev/deploy/tf]

Files 3 and 4 are mandatory.
File 2 has been added because Terraform suggests doing so for dependency management; in my opinion this is a great idea, as it makes the build reproducible.
File 1 has been added to prevent local folders or state files from being included.

@mtojek I'll open an issue in https://github.com/elastic/package-spec to discuss this further.

@endorama
Member Author

I updated github.com/elastic/package-spec to v1.4.1 for testing while waiting for #693

Terraform Google Provider allows passing the credentials file content in the
GOOGLE_CREDENTIALS env var.

More about this in the Terraform docs: https://registry.terraform.io/providers/hashicorp/google/latest/docs/guides/provider_reference#full-reference

Unfortunately this does not seem to work with other GCP Cloud SDK components,
as I've not been able to find references to that environment variable elsewhere.

As both the Terraform provider and the other components support ADC (Application Default
Credentials), this commit changes how the Terraform Deployer sets up
credentials, using ADC instead of the Terraform-specific variable.
With the addition of the `gcp_auth` function, the logic flow was split
into multiple pieces.

This commit introduces a main-like area where all the logic is handled.
.gitignore is not currently supported in package-spec, so package
validation fails.

See elastic/package-spec#273
fi
}

if [[ "${BASH_SOURCE[0]}" = "$0" ]]; then
Contributor

Why is this condition required here?

Member Author

@endorama endorama Feb 11, 2022

It is not required, but it avoids running the code in case the file is sourced, and it identifies a "main"-like area, like the main function in Go or C. From that point onward, all code is executed only when the file is explicitly executed (bash file.sh or ./file.sh). It helps group the relevant "main" code in the same area, preventing arbitrary code from being added between functions.

Consider it a good way of writing Bash files, as it aids reading them (like __name__ == "__main__" does for Python)
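
A minimal sketch of the pattern, with gcp_auth standing in for the functions defined earlier in the script (generic, not the exact deployer code):

#!/usr/bin/env bash
set -euo pipefail

gcp_auth() {
  # ... authentication logic lives in named functions ...
  :
}

# "main"-like area: runs only when the script is executed directly
# (bash file.sh or ./file.sh), not when it is sourced.
if [[ "${BASH_SOURCE[0]}" = "$0" ]]; then
  gcp_auth
fi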

Contributor

@mtojek mtojek Feb 11, 2022

I understand your point of view, but please keep it simple, as it was before. This file is intended to be called by the container engine.

Member Author

It is run by the container engine, but it is still read by developers :) The point was to simplify reading the code flow. Anyway, I removed it (will push once rebased onto main).

@@ -2,25 +2,41 @@

Contributor

I don't see any container logs for the GCP test package in beats-ci-temp-internal/Ingest-manager/elastic-package/PR-662-16/insecure-logs/gcp. Is it intended?

Member Author

No it is not; from the pipeline logs it seems it is not running system tests (the command starts at line 244, hidden by the task header).

I set up the env.yml file in data_stream/billing/_dev/deploy/tf. Is this not enough to run system tests?

Contributor

I checked your branch and you haven't configured any test policies. Test policies are used by elastic-package during system tests; otherwise elastic-package won't know what you're trying to test. It's covered in our manual.

Please take a look at the AWS test package and ec2_metrics tests.

Member Author

A test package with the first test case (not requiring changes from this PR) was added in #701

@mtojek mtojek marked this pull request as draft February 11, 2022 15:05
@mtojek
Contributor

mtojek commented Feb 11, 2022

@endorama I converted this PR to draft as it's still in progress (missing tests, unaddressed comments).

@mtojek mtojek requested a review from jsoriano February 11, 2022 15:06
@endorama endorama mentioned this pull request Feb 15, 2022
@mtojek
Contributor

mtojek commented Mar 16, 2022

@endorama Are you going to rebase this PR as we merged the other one?

@mtojek
Contributor

mtojek commented Sep 23, 2022

Closing it for now as it's a stale PR.

@mtojek mtojek closed this Sep 23, 2022