[2] Configure firewall rules and policies for the federated learning use case #74

Merged
arueth merged 1 commit into int-federated-learning from fl-firewall
Jan 14, 2025

Conversation

ferrarimarco
Member

ferrarimarco commented Dec 16, 2024

  • Configure firewall policies for the federated learning use case
  • Fix an issue with the core teardown.sh script: remove the backend configuration only when destroying the core initialize terraservice. Previously, the script removed it unconditionally after destroying the terraservices in CORE_TERRASERVICES_DESTROY, even though CORE_TERRASERVICES_DESTROY might not include initialize.
  • Fix an issue with the federated learning use case teardown.sh script: ensure that terraservices are destroyed in order.
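
The two teardown fixes above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `destroy_terraservice`, `remove_backend_configuration`, and `teardown_terraservices` are hypothetical names, the stubs only print what they would do, and running the backend cleanup right after (rather than before) destroying initialize is an assumption.

```shell
# Hypothetical stand-ins for the real teardown steps; they only print actions.
destroy_terraservice() {
  echo "destroy ${1}"
}

remove_backend_configuration() {
  echo "remove backend configuration"
}

# Destroy the given terraservices strictly in the order listed.
# The backend configuration is removed only when "initialize" is destroyed,
# because the list might not include it.
# Assumption: the cleanup runs right after the destroy of "initialize".
teardown_terraservices() {
  for terraservice in "$@"; do
    destroy_terraservice "${terraservice}"
    if [ "${terraservice}" = "initialize" ]; then
      remove_backend_configuration
    fi
  done
}

# Example: "initialize" is not in the list, so the backend configuration is kept.
teardown_terraservices workloads networking
```

Keeping the cleanup inside the loop ties it to the terraservice that owns the backend, so a partial destroy list leaves the backend configuration in place.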

ferrarimarco changed the base branch from main to int-federated-learning December 16, 2024 12:17
ferrarimarco force-pushed the fl-firewall branch 2 times, most recently from c65e608 to d1ac6dc December 17, 2024 06:59
ferrarimarco self-assigned this Dec 20, 2024
arueth force-pushed the int-federated-learning branch from 9123bf4 to e5385e4 December 20, 2024 15:51
ferrarimarco force-pushed the fl-firewall branch 2 times, most recently from a869035 to 8f05a35 December 24, 2024 11:17
ferrarimarco changed the base branch from int-federated-learning to feature-fl-service-accounts December 24, 2024 11:18
ferrarimarco force-pushed the fl-firewall branch 4 times, most recently from 865fa3a to 0f461fe December 27, 2024 10:34
ferrarimarco requested a review from arueth December 27, 2024 11:55
ferrarimarco marked this pull request as ready for review December 27, 2024 11:55
ferrarimarco changed the title from "Configure firewall rules and policies for the federated learning use case" to "[2] Configure firewall rules and policies for the federated learning use case" Jan 7, 2025
Base automatically changed from feature-fl-service-accounts to int-federated-learning January 9, 2025 08:30
arueth force-pushed the int-federated-learning branch from c3ecb28 to 3783ea1 January 9, 2025 15:36
ferrarimarco force-pushed the fl-firewall branch 2 times, most recently from 052b3bf to 4d73609 January 10, 2025 08:15
@arueth
Collaborator

arueth commented Jan 13, 2025

Can you remove the - "-e" line from https://github.com/GoogleCloudPlatform/accelerated-platforms/blob/int-federated-learning/test/ci-cd/cloudbuild/uc-federated-learning-terraform.yaml#L37?

It prevents the teardown script from running on a deploy failure.

ferrarimarco force-pushed the fl-firewall branch 2 times, most recently from c68a80e to 8bdcb96 January 13, 2025 17:13
@ferrarimarco
Member Author

> Can you remove the - "-e" line from https://github.com/GoogleCloudPlatform/accelerated-platforms/blob/int-federated-learning/test/ci-cd/cloudbuild/uc-federated-learning-terraform.yaml#L37?
>
> It prevents the teardown script from running on a deploy failure.

Done!
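
For context on why the flag matters: with `-e` (errexit), the shell exits at the first failing command, so a teardown that comes later in the same script never runs. A minimal sketch of the alternative, assuming hypothetical `deploy` and `teardown` functions that stand in for the real scripts, is to record the deploy status, always run the teardown, and then report the original status:

```shell
# Hypothetical stand-ins for the deploy and teardown scripts.
deploy() {
  echo "deploy"
  return 1 # simulate a deploy failure
}

teardown() {
  echo "teardown"
}

# Record the deploy status instead of letting the failure abort the shell,
# run the teardown regardless, then return the deploy status so the build
# is still marked as failed.
run_deploy_then_teardown() {
  deploy_status=0
  deploy || deploy_status=$?
  teardown
  return "${deploy_status}"
}

run_deploy_then_teardown || echo "deploy failed with status $?"
```

Returning the saved status keeps the build red on a deploy failure while still cleaning up the resources.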

@arueth
Collaborator

arueth commented Jan 13, 2025

It seems like the build is silently failing on the "Provisioning the core platform" step:
https://console.cloud.google.com/cloud-build/builds;region=us-central1/36ded629-3cdb-494e-9a10-838a3203ac10?e=13803378&mods=monitoring_api_prod&project=accelerated-platforms#L2499

It doesn't seem to reach the "Provisioning the use case resources" step and just goes straight to the destroy.

@arueth
Collaborator

arueth commented Jan 13, 2025

I was able to get it to deploy, but it looks like the destroy is failing on "Destroying the services that the core platform depends on"

https://console.cloud.google.com/cloud-build/builds;region=us-central1/0fd89e00-03bf-4311-8c10-4d03e464c5c5?e=13803378&mods=monitoring_api_prod&project=accelerated-platforms#L6722

@ferrarimarco
Member Author

ferrarimarco commented Jan 14, 2025

> It seems like the build is silently failing on the "Provisioning the core platform" step: https://console.cloud.google.com/cloud-build/builds;region=us-central1/36ded629-3cdb-494e-9a10-838a3203ac10?e=13803378&mods=monitoring_api_prod&project=accelerated-platforms#L2499
>
> It doesn't seem to reach the "Provisioning the use case resources" step and just goes straight to the destroy.

This is the error:

    Error: Error creating Feature: googleapi: Error 409: Resource 'projects/accelerated-platforms/locations/global/features/policycontroller' already exists
    Details:
    [
      {
        "@type": "type.googleapis.com/google.rpc.ResourceInfo",
        "resourceName": "projects/PROJECT_NAME/locations/global/features/policycontroller"
      }
    ]

I think this happens because we cannot enable a global feature that is already enabled. Importing it into the Terraform state would work around this, but the teardown script would then also disable the feature. We can sync about this.
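
As a rough illustration of the import workaround mentioned above, the sketch below only prints the command instead of running it. The resource address `google_gke_hub_feature.policycontroller` is a hypothetical placeholder, and the import ID is assumed to follow the resource name format shown in the error; both would need to be checked against the actual Terraform configuration and provider documentation.

```shell
# Build the import ID from the resource name format shown in the error.
# The project ID is passed as the first argument.
policycontroller_feature_id() {
  echo "projects/${1}/locations/global/features/policycontroller"
}

# Print (do not run) the import command.
# Assumption: the resource address below is a placeholder, not the address
# used in this repository.
print_import_command() {
  echo "terraform import google_gke_hub_feature.policycontroller $(policycontroller_feature_id "${1}")"
}

print_import_command "PROJECT_NAME"
```

Once the feature is in the state, a later destroy would manage it as well, which is the drawback described above.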

> I was able to get it to deploy, but it looks like the destroy is failing on "Destroying the services that the core platform depends on"
>
> https://console.cloud.google.com/cloud-build/builds;region=us-central1/0fd89e00-03bf-4311-8c10-4d03e464c5c5?e=13803378&mods=monitoring_api_prod&project=accelerated-platforms#L6722

The cause of this issue was that the use case deploy script initialized the core platform configuration files after running the core platform deploy script, so the initialize service didn't configure the backend for the use case services. It should be fixed now; I checked the logs.

arueth merged commit 2bac60b into int-federated-learning Jan 14, 2025
14 checks passed
arueth deleted the fl-firewall branch January 14, 2025 16:37
arueth pushed a commit that referenced this pull request Jan 14, 2025
- configure firewall for federated learning
- configure iam roles and service accounts
- configure dedicated node pools
- configure policy controller and policies
- configure dedicated Kubernetes namespaces
arueth pushed a commit that referenced this pull request Jan 17, 2025