
Support reusing existing storage containers across task providers #687

Merged: tasdomas merged 13 commits into master from feature-bucket-reuse on Oct 21, 2022

Conversation

@tasdomas (Contributor) commented Oct 5, 2022

This closes #299.

The storage type is inferred from the provider.

The `storage` config section is extended with the following additional options:

```hcl
storage {
  container      = <container-id>
  container_path = <subdirectory in container>
  container_opts = {
  }
}
```
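
For illustration, a hypothetical task definition using the new fields might look like the sketch below (the cloud, machine, bucket name, and path are placeholder values, not taken from this PR):

```hcl
resource "iterative_task" "example" {
  cloud   = "aws"  # the provider implies the storage type (S3 in this case)
  machine = "m"

  storage {
    workdir        = "."
    output         = "results"
    container      = "my-existing-bucket"  # placeholder: name of a pre-allocated S3 bucket
    container_path = "experiments/run-1"   # placeholder: subdirectory inside the bucket
  }

  script = <<-END
    #!/bin/bash
    echo "hello" > results/greeting.txt
  END
}
```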

tasdomas and others added 5 commits September 8, 2022 10:30
* Update config schema.

* AWS support for existing S3 buckets.

* Existing bucket support for gcp.

* Add mocks and tests to existing bucket resource in aws.

* Update docs with new pre-allocated container fields.
* AWS support for existing S3 buckets.

* Existing bucket support for gcp.

* Fix subdirectory in rclone remote.

* Blob container generates the rclone connection string.

* Introduce a type for generating rclone connection strings.

* Azure support for reusable blob containers.

* Update docs.

* Fix path prefix.

* Initialize s3 existing bucket with RemoteStorage struct.

* Update gcp and aws existing bucket data sources to align with the azure data source.

Use common.RemoteStorage to initialize the data sources.
Using rclone to verify storage during Read.
Remove aws s3 client mocks and tests that rely on them.

* Fix comment.
* K8s support for specifying an existing persistent volume claim.

Co-authored-by: Helio Machado <0x2b3bfa0+git@googlemail.com>
@0x2b3bfa0 0x2b3bfa0 self-requested a review October 5, 2022 14:46
@0x2b3bfa0 0x2b3bfa0 changed the title Support reusing existing storage containers across task providers. Support reusing existing storage containers across task providers Oct 6, 2022
@0x2b3bfa0 0x2b3bfa0 added the enhancement, storage and cloud-common labels Oct 10, 2022
@tasdomas tasdomas force-pushed the feature-bucket-reuse branch from b900f4b to 670f214 October 19, 2022 07:58
@tasdomas (Contributor, Author) commented Oct 19, 2022

> How does `leo list` work when using this feature? Worth shifting from storage enumeration to e.g. compute enumeration and creating the latter first?

`leo list` will need to be updated. But I think we should rethink the approach instead of switching to listing compute instances. Unless I'm mistaken, resource groups are supported by all major cloud providers; we could use (and list) those.

@0x2b3bfa0 (Member) commented:

> Unless I'm mistaken, resource groups are supported by all major cloud providers

It depends on your definition of resource group.¹ 😅

Footnotes

1. Tangentially related to https://github.com/iterative/terraform-provider-iterative/issues/289#issue-1062601339

@tasdomas tasdomas requested a review from 0x2b3bfa0 October 20, 2022 12:19
@tasdomas tasdomas requested a review from a team October 21, 2022 08:40
@tasdomas tasdomas merged commit 110a740 into master Oct 21, 2022
@tasdomas tasdomas deleted the feature-bucket-reuse branch October 21, 2022 10:03
@0x2b3bfa0 (Member) commented:

@tasdomas, would you mind opening an issue to track #687 (comment)?

@tasdomas (Contributor, Author) commented:

> > Unless I'm mistaken, resource groups are supported by all major cloud providers
>
> It depends on your definition of resource group.¹ 😅


Done. Thanks for reminding me!

Comment on lines +286 to +306
```diff
 ### Generic
 
 A set of "permissions" assigned to the `task` instance, format depends on the cloud provider
 
-#### Amazon Web Services
+### Cloud-specific
+
+#### Kubernetes
+
+The name of a service account in the current namespace.
+
+### Amazon Web Services
 
 An [instance profile `arn`](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html), e.g.:
 `permission_set = "arn:aws:iam:1234567890:instance-profile/rolename"`
 
-#### Google Cloud Platform
+### Google Cloud Platform
 
 A service account email and a [list of scopes](https://cloud.google.com/sdk/gcloud/reference/alpha/compute/instances/set-scopes#--scopes), e.g.:
 `permission_set = "sa-name@project_id.iam.gserviceaccount.com,scopes=storage-rw"`
 
-#### Microsoft Azure
+### Microsoft Azure
```
Review comment (Contributor):

I don't get this title reworking. There are no generic options for `permission_set`, so there shouldn't be a `### Generic` section?

```md
A comma-separated list of [user-assigned identity](https://docs.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/overview) ARM resource ids, e.g.:
`permission_set = "/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/{identityName}"`

## Pre-allocated blob container

### Generic
```
Review comment (Contributor):

again, would remove `### Generic` here since there are no generic options for `container`, right? (for example, `machine="m"` is generic, `machine="t2.micro"` is not).

```diff
@@ -65,6 +65,8 @@ resource "iterative_task" "example" {
 - `storage.workdir` - (Optional) Local working directory to upload and use as the `script` working directory.
 - `storage.output` - (Optional) Results directory (**relative to `workdir`**) to download (default: no download).
 - `storage.exclude` - (Optional) List of files and globs to exclude from transferring. Excluded files are neither uploaded to cloud storage nor downloaded from it. Exclusions are defined relative to `storage.workdir`.
+- `storage.container` - (Optional) Pre-allocated container to use for storage of task data, results and status.
```
Review comment (Contributor):

slightly wary of clashing with "docker containers", but willing to accept owing to the "storage" namespace.

Comment on lines +68 to +69
```diff
+- `storage.container` - (Optional) Pre-allocated container to use for storage of task data, results and status.
+- `storage.container_opts` - (Optional) Block of cloud-specific container settings.
```
Review comment (Contributor):

what about storage.container_(options|conf(ig)|settings)?
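
For context, a hypothetical `storage` block combining the options documented above might look like the following sketch (all values are placeholders; `container_opts` is left empty because its keys are cloud-specific and not shown in this PR excerpt):

```hcl
storage {
  workdir   = "."
  output    = "results"
  exclude   = [".git", "*.tmp"]        # placeholder globs, relative to workdir
  container = "pre-allocated-bucket"   # placeholder: an existing bucket or blob container
  container_opts = {
    # cloud-specific settings would go here; omitted in this sketch
  }
}
```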

Labels: cloud-common (Applies to every cloud vendor), enhancement (New feature or request), storage

Successfully merging this pull request may close these issues: task bucket usage vs "directory" within a bucket (#299)

4 participants