
pkg/asset/machines: Convert Master to a WriteableAsset #1211

Merged: 2 commits, Feb 12, 2019

Conversation

@wking (Member) commented Feb 8, 2019

Since 3326b6b (#792), we've been consuming structured master configurations to generate Terraform variables. But before this commit, the Openshift asset was the WriteableAsset responsible for the master configs, giving you the following issue:

  1. openshift-install create manifests

    1. Master asset populated with structured data
    2. Openshift asset pulls in Master, flattens to YAML, and writes to disk.
  2. openshift-install create cluster

    1. Openshift asset reads master YAML from disk.
    2. TerraformVariables asset pulls in Master, but it's empty.
    3. Panic (#1205, "create manifests then create cluster panic").

With this commit, responsibility for reading and writing master manifests to disk gets shifted to the Master asset itself.

Unpacking the YAML data into the appropriate structures is a bit hairy. I'm not familiar with this code, though; maybe there's a better way. It will help once we complete the shift to the OpenShift machine-API types started in cf60daa (#1175).

I've also taken the opportunity to drop the list. It saves us a few lines of code for (de)serializing, and there's no upside to collecting the Machine configs together in a single file.

The "glob, but not the master files" logic in the Openshift loader is also a bit ugly. Moving forward, I expect us to push the remaining Openshift assets out into their own assets as well, which would allow us to tighten down on the wildcard there.

@staebler, can you check the asset logic here? I think I have it right, but I'm doing more pattern-matching than understanding ;).

Fixes #1205
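For context, the asset plumbing under discussion can be sketched as follows. This is a condensed, illustrative version, not the installer's exact pkg/asset definitions; the glob pattern, the fetch callback shape, and the simplified types are assumptions.

```go
package main

import "fmt"

// File is a minimal stand-in for the installer's asset.File.
type File struct {
	Filename string
	Data     []byte
}

// WriteableAsset sketches the interface this PR makes Master implement:
// Files reports what to write to disk, and Load rebuilds the asset from
// files already on disk, so state survives a second invocation.
type WriteableAsset interface {
	Files() []*File
	Load(fetch func(pattern string) ([]*File, error)) (found bool, err error)
}

// Master owns its machine manifests end to end; before this PR, the
// Openshift asset wrote them and Master's structured data was lost on load.
type Master struct {
	files []*File
}

func (m *Master) Files() []*File { return m.files }

func (m *Master) Load(fetch func(string) ([]*File, error)) (bool, error) {
	// The pattern is a plausible example, not the exact upstream filename.
	files, err := fetch("openshift/99_openshift-cluster-api_master-machines-*.yaml")
	if err != nil || len(files) == 0 {
		return false, err
	}
	m.files = files
	return true, nil
}

func main() {
	m := &Master{}
	found, _ := m.Load(func(string) ([]*File, error) {
		return []*File{{Filename: "master-machines-0.yaml", Data: []byte("kind: Machine")}}, nil
	})
	fmt.Println(found, len(m.Files())) // true 1
}
```

Roughly, the asset store consults Load before regenerating, so an asset that finds its files on disk keeps that state rather than being rebuilt empty.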

@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Feb 8, 2019
@dgoodwin (Contributor) commented Feb 8, 2019

Is there anywhere in the code you could slip in a unit test that covers the install-config-on-disk use case without requiring an actual provision?

@wking (Member, author) commented Feb 8, 2019

Is there anywhere in the code you could slip in a unit test that covers the install-config-on-disk use case without requiring an actual provision?

#552 has a framework for something like that, although the issue here is manifests on disk, not install-config on disk.

@wking (Member, author) commented Feb 8, 2019

I've pushed 7aaa902 -> 7f61d57 to fix the golint issue. I'm not sure how to handle the unit-test failure:

--- FAIL: TestCreatedAssetsAreNotDirty (10.29s)
    --- FAIL: TestCreatedAssetsAreNotDirty/manifests (2.99s)
    	assetcreate_test.go:108: 
    			Error Trace:	assetcreate_test.go:108
    			Error:      	Not equal: 
    			            	expected: &machines.Master{Machines:[]v1beta1.Machine{}, MachinesDeprecated:[]v1alpha1.Machine(nil), FileList:[]*asset.File{(*asset.File)(0xc4207303f0)}}
    			            	actual  : &machines.Master{Machines:[]v1beta1.Machine(nil), MachinesDeprecated:[]v1alpha1.Machine(nil), FileList:[]*asset.File{(*asset.File)(0xc420b292c0)}}
    			            	
    			            	Diff:
    			            	--- Expected
    			            	+++ Actual
    			            	@@ -1,4 +1,3 @@
    			            	 (*machines.Master)({
    			            	- Machines: ([]v1beta1.Machine) {
    			            	- },
    			            	+ Machines: ([]v1beta1.Machine) <nil>,
    			            	  MachinesDeprecated: ([]v1alpha1.Machine) <nil>,
    			Test:       	TestCreatedAssetsAreNotDirty/manifests
    			Messages:   	fetched and generated asset "Master Machines" are not equal

More on that issue here, but I'm currently distinguishing between "set to the empty list" and "unset" when consuming the data while we have both Machines and MachinesDeprecated. Maybe I have to give up on that and switch to len(master.Machines) instead, unless @staebler has a way to coerce to nil on load?

@staebler (Contributor) commented Feb 8, 2019

unless @staebler has a way to coerce to nil on load?

I don't have a better solution for you. I had the same problem.

// This is needed to ensure that roundtrip generate-load tests pass when
// comparing this value. Otherwise, generate will use a nil value while
// load will use an empty slice.
m.Machines = []machineapi.Machine{}

@wking (Member, author) commented Feb 8, 2019

I don't have a better solution for you...

Ok. Should be fixed using your approach with 7f61d57 -> d5c01c0.

@staebler (Contributor) left a review comment

I agree with this approach. I was planning on doing this same thing to close the loop on https://bugzilla.redhat.com/show_bug.cgi?id=1662119.

Review threads on pkg/asset/machines/master.go (resolved)
@wking force-pushed the fix-machine-loads branch 3 times, most recently from 955cf95 to 1e32e76 on February 8, 2019 at 20:52
@staebler (Contributor) commented Feb 8, 2019

Wondering why hive was having problems with this panic led me to the realization that this PR does not completely solve the problem. The following will still panic, because loading from the state file will keep the provider config as raw json.

openshift-install create ignition-config
openshift-install create cluster

I am disheartened that this isn't caught by the unit tests that verify round-trip loading. And I am not sure why it isn't caught.

@wking (Member, author) commented Feb 8, 2019

The following will still panic, because loading from the state file will keep the provider config as raw json.

Do we have hooks to deserialize when we load from the state file? If not, maybe we just need to throw in the towel on structured asset properties (which will make @abhinavdahiya sad ;).

@staebler (Contributor) commented Feb 8, 2019

The following will still panic, because loading from the state file will keep the provider config as raw json.

Do we have hooks to deserialize when we load from the state file? If not, maybe we just need to throw in the towel on structured asset properties (which will make @abhinavdahiya sad ;).

No hooks.

@staebler (Contributor) commented Feb 8, 2019

The following will still panic, because loading from the state file will keep the provider config as raw json.

Do we have hooks to deserialize when we load from the state file? If not, maybe we just need to throw in the towel on structured asset properties (which will make @abhinavdahiya sad ;).

What if we remove the Machines and MachinesDeprecated fields and instead have Machines and MachinesDeprecated functions that do the loading from the raw bytes when needed? It is maybe not as neat, but it should be roughly the same amount of code.
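A sketch of that proposal, with encoding/json standing in for the installer's YAML handling and a deliberately minimal Machine type; the names and shapes here are illustrative:

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Machine struct {
	Name string `json:"name"`
}

// Master keeps only the serialized files; there are no structured fields
// left to go stale across generate, disk load, or state-file restore.
type Master struct {
	FileContents [][]byte // one serialized Machine per file
}

// Machines deserializes on demand, so the result is the same no matter
// which path produced the stored bytes.
func (m *Master) Machines() ([]Machine, error) {
	machines := make([]Machine, 0, len(m.FileContents))
	for _, data := range m.FileContents {
		var machine Machine
		if err := json.Unmarshal(data, &machine); err != nil {
			return nil, err
		}
		machines = append(machines, machine)
	}
	return machines, nil
}

func main() {
	m := &Master{FileContents: [][]byte{
		[]byte(`{"name":"master-0"}`),
		[]byte(`{"name":"master-1"}`),
	}}
	machines, err := m.Machines()
	if err != nil {
		panic(err)
	}
	fmt.Println(machines[0].Name, machines[1].Name) // master-0 master-1
}
```

The trade-off is that the generate-then-provision path now pays an extra serialize/deserialize round trip, since even freshly generated structures are read back out of their serialized form.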

@wking (Member, author) commented Feb 8, 2019

What if we remove the Machines and MachinesDeprecated fields and instead have Machines and MachinesDeprecated functions that do the loading from the raw bytes when needed?

I don't have strong opinions on whether the deserializing code lives in Master methods or locally in the TerraformVariables asset or the TFVars conversion code that is our only consumer of the structured data. And I'm personally fine with this approach, but @abhinavdahiya had previous concerns about the inefficiency of washing to bytes and back for the case when the user is going straight from the structured generated code into Terraform variable generation (e.g. here).

@wking (Member, author) commented Feb 9, 2019

Ok, I've pushed 1e32e76 -> 3fd7e2b shifting to unmarshalling Master methods. I still need to work in some runtime.Decode changes @abhinavdahiya suggested, but I thought I'd push so folks could look at the new approach (runtime.Decode will just be a local change in the unmarshalling methods).

@openshift-ci-robot openshift-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 9, 2019
@wking (Member, author) commented Feb 9, 2019

e2e-aws was close:

Flaky tests:

[Feature:Builds][Conformance] s2i build with a quota  Building from a template should create an s2i build with a quota and run it [Suite:openshift/conformance/parallel/minimal]

Failing tests:

[Feature:Platform] Managed cluster should have no crashlooping pods in core namespaces over two minutes [Suite:openshift/conformance/parallel]

I've just pushed 3fd7e2b -> 6b0bd4b with the pivot to runtime.Decode, CC @abhinavdahiya.

@openshift-ci-robot openshift-ci-robot removed the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Feb 9, 2019
@wking force-pushed the fix-machine-loads branch 2 times, most recently from 3e19bea to 2f16fa5 on February 9, 2019 at 08:08
@dgoodwin (Contributor) commented

/test e2e-aws

@abhinavdahiya (Contributor) commented

Will #1223 affect the Decode?

@wking (Member, author) commented Feb 11, 2019

will #1223 affect the Decode ?

Sigh. Maybe? I guess for that I need to bump our vendor around openshift/cluster-api-provider-aws#152.

Since 3326b6b (pkg/tfvars: Pull from []Machine instead of
InstallConfig, 2018-11-28, openshift#792), we've been consuming structured
master configurations to generate Terraform variables.  But before
this commit, the Openshift asset was the WriteableAsset responsible
for the master configs, giving you the following issue:

1. openshift-install create manifests
  i. Master asset populated with structured data
  ii. Openshift asset pulls in Master, flattens to YAML, and writes to
      disk.

2. openshift-install create cluster
  i. Openshift asset reads master YAML from disk.
  ii. TerraformVariables asset pulls in Master, but it's empty.
  iii. Panic [1].

With this commit, responsibility for reading and writing master
manifests to disk gets shifted to the Master asset itself.

I've dropped the structured Master properties (Machines and
MachineDeprecated) in favor of new methods on the asset.  There are
three possible paths towards recovering the asset that we feed in to
TerraformVariables:

a. After rendering it with Generate.  This information had been
   structured before this commit, so no problems here.
b. After reading from the disk.  This information could be
   unmarshalled in Master.Load(), but...
c. After loading it from the state file.  There are no hooks for
   custom unmarshalling in this pathway, so while the generic YAML
   unmarshal would give us a structured machine object, it would not
   unmarshal the RawExtension that holds the provider spec.

The lack of custom-unmarshal hooks for (c) means there's no reliable
way to use Master properties to feed TerraformVariables structured
provider information.  With this commit, we use Master methods
instead.  That's just as efficient for the (b) and (c) cases, and
while it's an unnecessary (de)serialize cycle for (a), that's not too
expensive.

Unpacking the YAML data into the appropriate structures is a bit
hairy.  I'm not familiar with this code though, maybe there's a better
way.  It will help once we complete the shift to the OpenShift
machine-API types started in cf60daa (Pivot aws from cluster.k8s.io
into machine.openshift.io, 2019-02-01, openshift#1175).

I've also taken the opportunity to drop the list.  It saves us a few
lines of code for (de)serializing, and there's no upside to collecting
the Machine configs together in a single file.

The "glob, but not the master files" logic in the Openshift loader is
also a bit ugly.  Moving forward, I expect us to push the remaining
Openshift assets out into their own assets as well, which would allow
us to tighten down on the wildcard there.

There's also a bit of dancing to ensure our Machine filenames are
ordered by increasing index.  I'm padding the filenames with %02d (for
example) to get ...-01.yaml, etc. so they come back in the right order
on load (filepath.Glob sorts its output [2]).  To avoid hard-coding a
pad size, I format the largest index, measure its length, and use that
length to construct padFormat.

[1]: openshift#1205
[2]: https://github.com/golang/go/blob/go1.11.5/src/path/filepath/match.go#L325
And shift the AWS provider's commit to land after they fixed
registration [1] and also after their pivot to v1beta1 [2] (because
3a125a6, Change awsproviderconfig.k8s.io/v1alpha1 API version to
awsproviderconfig.openshift.io/v1beta1, 2019-02-11, openshift#1223 already
started us along that pivot).  Generated (after the Gopkg.toml bump)
with:

  $ dep ensure

using:

  $ dep version
  dep:
   version     : v0.5.0-31-g73b3afe
   build date  : 2019-02-08
   git hash    : 73b3afe
   go version  : go1.10.3
   go compiler : gc
   platform    : linux/amd64
   features    : ImportDuringSolve=false

[1]: openshift/cluster-api-provider-aws#149
[2]: openshift/cluster-api-provider-aws#152
@wking (Member, author) commented Feb 11, 2019

Rebased around #1223 and bumped our vendored AWS-provider code from v1alpha1 to v1beta1 with 2f16fa5 -> d983eae. I haven't tested it, though I expect CI will help with that.

@abhinavdahiya (Contributor) commented

API requests are getting rate-limited:

level=error msg="Error: Error applying plan:"
level=error
level=error msg="5 errors occurred:"
level=error msg="\t* module.vpc.aws_lb.api_external: 1 error occurred:"
level=error msg="\t* aws_lb.api_external: timeout while waiting for state to become 'active' (last state: 'provisioning', timeout: 10m0s)"
level=error
level=error
level=error msg="\t* module.vpc.aws_lb.api_internal: 1 error occurred:"
level=error msg="\t* aws_lb.api_internal: timeout while waiting for state to become 'active' (last state: 'provisioning', timeout: 10m0s)"
level=error
level=error
level=error msg="\t* module.vpc.aws_route_table_association.route_net[2]: 1 error occurred:"
level=error msg="\t* aws_route_table_association.route_net.2: timeout while waiting for state to become 'success' (timeout: 5m0s)"
level=error
level=error
level=error msg="\t* module.vpc.aws_route_table_association.route_net[3]: 1 error occurred:"
level=error msg="\t* aws_route_table_association.route_net.3: timeout while waiting for state to become 'success' (timeout: 5m0s)"
level=error
level=error
level=error msg="\t* module.vpc.aws_route.igw_route: 1 error occurred:"
level=error msg="\t* aws_route.igw_route: Error creating route: timeout while waiting for state to become 'success' (timeout: 2m0s)"

will retest in some time.

@dgoodwin (Contributor) commented

/test e2e-aws

Hive CI still totally blocked on this. :(

@abhinavdahiya (Contributor) commented

/retest

@abhinavdahiya (Contributor) commented

The tests are green.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 12, 2019
@openshift-ci-robot commented

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [abhinavdahiya,wking]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 0f8e6b8 into openshift:master Feb 12, 2019
@wking wking deleted the fix-machine-loads branch February 13, 2019 22:28
wking added a commit to wking/openshift-installer that referenced this pull request Mar 27, 2019
Like the earlier 7a396d9 (pkg/asset/machines: Convert Master to a
WriteableAsset, 2019-02-07, openshift#1211), but for Worker.  This is
groundwork for extracting configured availability zones when
generating AWS Terraform variables, which will in turn allow us to
avoid provisioning availability zones unless we are going to populate
them with machines.
vrutkovs pushed a commit to vrutkovs/installer that referenced this pull request Apr 1, 2019