Allow local filesystem state stores (to aid CI pull-request workflows) #6465

ari-becker · 2019-02-13T15:49:52Z

Allows the usage of state storage based on the local filesystem, based on the file:// prefix/protocol, as documented.

k8s-ci-robot · 2019-02-13T15:50:00Z

Hi @ari-becker. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

chrisz100 · 2019-02-13T18:03:23Z

/ok-to-test

ari-becker · 2019-02-14T07:46:52Z

Test failure is about the CI server not having credentials to create a cluster in ap-south-1? Seems unrelated?

/retest

zetaab · 2019-02-14T08:00:17Z

how this is supposed to work? Currently when people create cluster, the s3 will contain all information what nodes is supposed to fetch. I mean that when instance is started - the cloudinit script will fetch the configuration from the state store. Now if this state store is located in your local machine, the instance cannot fetch that information?

ari-becker · 2019-02-14T08:24:07Z

how this is supposed to work? Currently when people create cluster, the s3 will contain all information what nodes is supposed to fetch. I mean that when instance is started - the cloudinit script will fetch the configuration from the state store. Now if this state store is located in your local machine, the instance cannot fetch that information?

@zetaab As I said in #6463, the use case is for running dry-runs on pull requests in a CI environment. The CI build for the pull request downloads the state to the local filesystem, runs kops replace --force on the local filesystem state (thus not affecting the shared S3 remote state), and then runs kops update cluster (but intentionally not kops update cluster --yes), so that the dry-run output can be reviewed manually. After the pull request is approved and merged, the master build runs kops replace --force and kops update cluster --yes on the normal shared S3 remote state. The fetchability of a local state store by instances is completely irrelevant since only a dry-run is being executed with local filesystem state stores.

Smirl · 2019-02-14T11:30:14Z

+1 to the PR.

I was thinking of raise something similar myself. Perhaps to make sure people don't "accidentally" use this in prod. We could instead add a CLI flag such as --allow-local-state. This could override the IsClusterReadable, or be optionally used in https://github.com/kubernetes/kops/blob/master/cmd/kops/util/factory.go#L108:12

I also am working on a pipeline for kops and it changing the state store during a pull request is a bit of a pain. I am fortunately using the cloudformation output which means that instances do not need to pull much from the state store on start up as the cluster.spec is in the UserData script.

Smirl · 2019-02-26T17:47:21Z

Hi @zetaab @ari-becker is there any update on this?

I initially thought that I could get around not having this for our CI pipeline because when using cloudformation, like in my case, I assumed that nodeup would use the cluster.spec in the UserData. However, that cluster.spec is only used to invalidate the UserData, and all state is pulled from S3 on start up by nodeup.

Therefore, I ran into a situation, where changes from kops replace -f <new_cluster.config> (added a new hook) were picked up by a new worker node before I had updated the cloudformation. This is ok, but it means that "dryruns" during CI/CD are not possible at all.

I don't mind implementing the --allow-local-state or similar if you do not want to accept this PR as it is 😄

ari-becker · 2019-02-26T20:04:47Z

@Smirl I ran into an issue about a week ago where, upon a dry-run of a fresh cluster creation with a file:// state VFS, Kops would try to create an IAM role to grant access for instances in an instance group to access the state bucket, and it failed since file:// doesn't correspond to an S3 bucket.

I worked around it by creating IAM profiles outside of Kops which are fixed to point to the relevant state bucket, and therefore not dependent on what's in configBase. Doing so allows Kops to skip the problematic code section entirely. Unfortunately it also made it clear that getting this to work isn't a one-line code change, more fundamental work needs to be done to fix Kops's tight coupling to S3.

chrisz100 · 2019-03-16T16:26:03Z

I’d really appreciate some more documentation about the use case and it’s implications (like that it won’t work for real clusters and tests only). It will confuse our users otherwise.
Other then that it’s fine

chrisz100 · 2019-03-16T16:26:11Z

/retest

justinsb · 2019-03-18T03:53:40Z

One thing we could do is put this behind a feature-flag ... (git grep featureflag for examples).

I'm going to move this out of 1.12, because as we discussed in office hours we really want to prioritize getting 1.12 out. But if it's behind a featureflag, I think it could still go in to 1.12 as it's a fairly small change.

/milestone 1.13

ari-becker · 2019-03-18T07:59:28Z

I’d really appreciate some more documentation about the use case and it’s implications (like that it won’t work for real clusters and tests only). It will confuse our users otherwise.
Other then that it’s fine

@chrisz100 I've added some documentation.

fejta-bot · 2019-06-16T09:24:03Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

ari-becker · 2019-06-16T11:47:27Z

/remove-lifecycle stale

We've worked around this issue by rethinking how we built our infrastructure pipelines, but I imagine that this is still a common usecase for many CI workflows, so it is not stale.

colinhoglund · 2019-06-16T13:28:14Z

@ari-becker Would you mind sharing your current workflow? I’ve also been looking into various ways to achieve similar functionality.

ari-becker · 2019-06-17T10:44:49Z

@colinhoglund we have a pipeline that can re-create our infrastructure from scratch, so we made the pipeline a function of various inputs to change relevant values for a given infrastructure run (we use Dhall to make the configurations/scripts functional). Now, instead of a PR build telling us what to expect when we run the infrastructure scripts for real, we have a PR build create an entire environment using those same infrastructure scripts. Because infrastructure is versioned, we can be reasonably confident that a test that shows success doing an automatic, real migration from X->Y (where X is the current production infrastructure version) in a non-production environment means that an automatic, real migration from X->Y in production will be successful.

So far we haven't had any production outages yet from this approach (mostly thanks to the safety which Dhall provides), but I think most organizations would think of such an approach as being too aggressive and unsafe.

rifelpet · 2019-07-16T14:56:30Z

It would be great if we could get this merged

/test pull-kops-e2e-kubernetes-aws

rifelpet · 2019-07-24T12:46:17Z

/test pull-kops-e2e-kubernetes-aws

omeid · 2019-08-17T03:52:54Z

@robinpercy @rdrgmnzs Could you guys please look into this? I have been waiting for this for a while now. It is also helpful for setting up terraform config before actually deploying it.

rifelpet · 2019-09-19T19:14:03Z

/test pull-kops-verify-gomod

elblivion · 2019-09-26T21:25:26Z

Hi, I've built kops 1.13.1 with this patch and worked it into our pipeline, using the --state flag after an aws s3 sync when planning changes:

Syncing S3 Kops state to local...                                                                                                                                          
kops --state file://.local-state replace -f cluster.yaml                                                                                              
kops --state file://.local-state replace -f instance_groups/nodes-a.yaml; kops --state file://.local-state replace -f instance_groups/nodes-b.yaml; kops --state file://.lo
cal-state replace -f instance_groups/nodes-c.yaml; kops --state file://.local-state replace -f instance_groups/master-a.yaml; kops --state file://.local-state replace -f i
nstance_groups/master-b.yaml; kops --state file://.local-state replace -f instance_groups/master-c.yaml;       
RO_KUBECONFIG=1 KOPS_BASE_URL=https://my-bucket/kops/1.13.1 kops --state file://.local-state update cluster --target=terraform --name my-cluster --out=.

it prevents the config file from being modified but a bunch of other files are updated in S3 (listing only the files modified today):

2019-09-26 23:01:01       2580 my-cluster/addons/bootstrap-channel.yaml
2019-09-26 23:01:00         61 my-cluster/addons/core.addons.k8s.io/v1.4.0.yaml
2019-09-26 23:01:01       2030 my-cluster/addons/dns-controller.addons.k8s.io/k8s-1.12.yaml
2019-09-26 23:01:01       2156 my-cluster/addons/dns-controller.addons.k8s.io/k8s-1.6.yaml
2019-09-26 23:01:01       1098 my-cluster/addons/dns-controller.addons.k8s.io/pre-k8s-1.6.yaml
2019-09-26 23:01:01       6297 my-cluster/addons/kube-dns.addons.k8s.io/k8s-1.12.yaml
2019-09-26 23:01:01       6479 my-cluster/addons/kube-dns.addons.k8s.io/k8s-1.6.yaml
2019-09-26 23:01:01       5010 my-cluster/addons/kube-dns.addons.k8s.io/pre-k8s-1.6.yaml
2019-09-26 23:01:01        295 my-cluster/addons/kubelet-api.rbac.addons.k8s.io/k8s-1.9.yaml
2019-09-26 23:01:01        150 my-cluster/addons/limit-range.addons.k8s.io/v1.5.0.yaml
2019-09-26 23:01:01        403 my-cluster/addons/rbac.addons.k8s.io/k8s-1.8.yaml
2019-09-26 23:01:01        458 my-cluster/addons/storage-aws.addons.k8s.io/v1.6.0.yaml
2019-09-26 23:01:01        448 my-cluster/addons/storage-aws.addons.k8s.io/v1.7.0.yaml
2019-09-26 23:01:00      18947 my-cluster/cluster.spec
2019-09-26 10:17:55      14927 my-cluster/config
2019-09-26 23:01:00        380 my-cluster/instancegroup/master-a
2019-09-26 23:01:00        380 my-cluster/instancegroup/master-b
2019-09-26 23:01:00        380 my-cluster/instancegroup/master-c
2019-09-26 23:01:00        545 my-cluster/instancegroup/nodes-a
2019-09-26 23:01:00        545 my-cluster/instancegroup/nodes-b
2019-09-26 23:01:00        545 my-cluster/instancegroup/nodes-c
2019-09-26 23:01:01       1934 my-cluster/manifests/etcd/events.yaml
2019-09-26 23:01:01       1908 my-cluster/manifests/etcd/main.yaml

Am I doing something wrong or is this the expected behaviour?

johngmyers · 2019-12-29T08:05:43Z

/test pull-kops-e2e-kubernetes-aws

kpucynski · 2020-01-03T14:56:06Z

bump. This is really useful in specific cases.

rifelpet · 2020-01-03T23:01:13Z

I just tested this out myself and it works as expected. kops replace -f cluster.yaml --state file://foo; kops update cluster $cluster --stage file://foo is able to report dryrun changes without touching any files in the real s3 state store bucket.

@justinsb @mikesplain do you think this needs to be behind a feature flag? It is new functionality: no one can currently use file:// state store paths, so I dont think anyone will accidentally find themselves using this feature. Perhaps we could have a warning printed but I don't think anything beyond that is necessary. Will wait for their feedback before approving.

/lgtm

and sorry for the delay!

rifelpet · 2020-01-17T18:05:51Z

/approve

I think we can cherry-pick this back to 1.17. I will try to add a warning message for this usage in a separate PR as well.

k8s-ci-robot · 2020-01-17T18:06:15Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ari-becker, rifelpet

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [rifelpet]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

#8368-upstream-release-1.17 Automated cherry pick of #6465: Allow local filesystem state stores #8368: Add a warning when using file:// state store

MPV · 2020-03-23T07:48:22Z

Just a short mention/reminder that the docs/continuous_integration.md still mentions this PR as open, and the lack of this functionality as a "current limitation".

Might it be worth updating to reflect what is now possible (and how one would use this)? 😍

rifelpet · 2020-03-24T00:22:41Z

I haven't been using it yet but once we upgrade to Kops 1.17 and start using it I'll be sure to update that docs page, unless someone else gets to it first :)

Update wording to note that kubernetes#6465 was merged and add an AWS example of a dry run.

k8s-ci-robot added the cncf-cla: yes label Feb 13, 2019

k8s-ci-robot added needs-ok-to-test size/XS labels Feb 13, 2019

k8s-ci-robot requested review from rdrgmnzs and robinpercy February 13, 2019 15:50

k8s-ci-robot added ok-to-test and removed needs-ok-to-test labels Feb 13, 2019

ari-becker changed the title ~~Allow local filesystem state stores~~ Allow local filesystem state stores (to aid CI pull-request workflows) Feb 14, 2019

justinsb added this to the 1.12 milestone Mar 14, 2019

justinsb modified the milestones: 1.12, 1.13 Mar 18, 2019

k8s-ci-robot added size/M and removed size/XS labels Mar 18, 2019

k8s-ci-robot added the lifecycle/stale label Jun 16, 2019

k8s-ci-robot removed the lifecycle/stale label Jun 16, 2019

Seb-Solon mentioned this pull request Jul 23, 2019

Honor documentation regarding state : enable file state store #6463 #7306

Closed

k8s-ci-robot added the needs-rebase label Dec 22, 2019

Ari Becker added 2 commits December 29, 2019 09:12

Allow local filesystem state stores

3236ba1

Added documentation about acceptable usecases for file:// state store

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

GPG key ID: 4AEE18F83AFDEB23
Expired

Verified
Learn about vigilant mode

5e1dbf6

k8s-ci-robot removed the needs-rebase label Dec 29, 2019

k8s-ci-robot assigned rifelpet Jan 3, 2020

k8s-ci-robot added the lgtm label Jan 3, 2020

k8s-ci-robot added the approved label Jan 17, 2020

k8s-ci-robot merged commit b356bd4 into kubernetes:master Jan 17, 2020

k8s-ci-robot modified the milestones: v1.13, v1.18 Jan 17, 2020

This was referenced Jan 17, 2020

Automated cherry pick of #6465: Allow local filesystem state stores #8365

Closed

Automated cherry pick of #6465: Allow local filesystem state stores #8368: Add a warning when using file:// state store #8369

Merged

ari-becker deleted the bugfix/allow-local-filesystem-state-store branch January 18, 2020 08:44

yurrriq added a commit to yurrriq/kops that referenced this pull request Jul 26, 2022

Update continuous_integration.md

4bcd240

Update wording to note that kubernetes#6465 was merged and add an AWS example of a dry run.

yurrriq mentioned this pull request Jul 26, 2022

Update continuous_integration.md #14032

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow local filesystem state stores (to aid CI pull-request workflows) #6465

Allow local filesystem state stores (to aid CI pull-request workflows) #6465

ari-becker commented Feb 13, 2019 •

edited

Loading

k8s-ci-robot commented Feb 13, 2019

chrisz100 commented Feb 13, 2019

ari-becker commented Feb 14, 2019

zetaab commented Feb 14, 2019

ari-becker commented Feb 14, 2019 •

edited

Loading

Smirl commented Feb 14, 2019

Smirl commented Feb 26, 2019

ari-becker commented Feb 26, 2019

chrisz100 commented Mar 16, 2019

chrisz100 commented Mar 16, 2019

justinsb commented Mar 18, 2019

ari-becker commented Mar 18, 2019

fejta-bot commented Jun 16, 2019

ari-becker commented Jun 16, 2019

colinhoglund commented Jun 16, 2019

ari-becker commented Jun 17, 2019

rifelpet commented Jul 16, 2019

rifelpet commented Jul 24, 2019

omeid commented Aug 17, 2019

rifelpet commented Sep 19, 2019

elblivion commented Sep 26, 2019

johngmyers commented Dec 29, 2019

kpucynski commented Jan 3, 2020

rifelpet commented Jan 3, 2020

rifelpet commented Jan 17, 2020

k8s-ci-robot commented Jan 17, 2020

MPV commented Mar 23, 2020 •

edited

Loading

rifelpet commented Mar 24, 2020

Allow local filesystem state stores (to aid CI pull-request workflows) #6465

Allow local filesystem state stores (to aid CI pull-request workflows) #6465

Conversation

ari-becker commented Feb 13, 2019 • edited Loading

k8s-ci-robot commented Feb 13, 2019

chrisz100 commented Feb 13, 2019

ari-becker commented Feb 14, 2019

zetaab commented Feb 14, 2019

ari-becker commented Feb 14, 2019 • edited Loading

Smirl commented Feb 14, 2019

Smirl commented Feb 26, 2019

ari-becker commented Feb 26, 2019

chrisz100 commented Mar 16, 2019

chrisz100 commented Mar 16, 2019

justinsb commented Mar 18, 2019

ari-becker commented Mar 18, 2019

fejta-bot commented Jun 16, 2019

ari-becker commented Jun 16, 2019

colinhoglund commented Jun 16, 2019

ari-becker commented Jun 17, 2019

rifelpet commented Jul 16, 2019

rifelpet commented Jul 24, 2019

omeid commented Aug 17, 2019

rifelpet commented Sep 19, 2019

elblivion commented Sep 26, 2019

johngmyers commented Dec 29, 2019

kpucynski commented Jan 3, 2020

rifelpet commented Jan 3, 2020

rifelpet commented Jan 17, 2020

k8s-ci-robot commented Jan 17, 2020

MPV commented Mar 23, 2020 • edited Loading

rifelpet commented Mar 24, 2020

ari-becker commented Feb 13, 2019 •

edited

Loading

ari-becker commented Feb 14, 2019 •

edited

Loading

MPV commented Mar 23, 2020 •

edited

Loading