Check if intermediate cert needs to be renewed. #6835
Conversation
Force-pushed from a805de9 to 372ef03.
I'm not very familiar with the Connect CA, so I won't approve the PR, but I have a couple of comments/questions:
- The definition for waitForActiveCARoot in TestLeader_SecondaryCA_IntermediateRenew is missing.
- Would using syscall.Getrlimit cause problems if someone runs the test suite on a non-Unix platform like Windows?
@freddygv thanks for the review
Force-pushed from 8d391e3 to 1855f53.
Great job @i0rek!
Overall I think this is great and super close. I have one significant suggestion about how we trigger this, which I think will also help you solve the test flakiness by not relying on blocking semantics at all.
agent/consul/leader_connect.go (outdated snippet):

        intermediateCert.NotAfter) {
        needsNewIntermediate = true
    }
}
This currently relies on our blocking query implementation, which times out every 10 minutes, to implement a "periodic check". I'm not sure that's the best approach, for a few reasons:
- When we upgrade this to streaming there will be no periodic timeouts.
- We are using an implementation detail (that this method is called regularly even when there are no changes) in an unexpected/undocumented way. Even without streaming it would be easy to miss this assumption and, for example, add a line to the replication loop that skips calling this in the case of a timeout where the index is the same. I doubt that's plausibly going to happen in practice, but it's an example of why it feels wrong to implicitly assume regular iterations of this loop in the steady state.

Could we instead have a dedicated timer loop that somehow triggers the renewal? I realise it's more complicated having two different goroutines call this, as they could do so concurrently, but I don't think it's that hard to either trigger it via the same goroutine as now (just have it select on either a watch update or a timer update) or else just synchronize them with a mutex.
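The single-goroutine alternative suggested here could be sketched roughly as follows. Note this is an illustrative sketch, not Consul's actual code: `rootUpdates`, `checkIntermediate`, and the one-hour interval are all assumed names/values.

```go
package main

import "time"

// renewalLoop is a hypothetical sketch of the suggested design: one
// goroutine owns all intermediate-cert maintenance and selects on either
// a root watch update or a periodic renewal timer, so the two triggers
// can never fire concurrently and no mutex is needed.
func renewalLoop(rootUpdates <-chan struct{}, stop <-chan struct{}, checkIntermediate func(reason string)) {
	ticker := time.NewTicker(time.Hour) // renewal check interval (assumed)
	defer ticker.Stop()

	for {
		select {
		case <-rootUpdates:
			checkIntermediate("root changed")
		case <-ticker.C:
			checkIntermediate("periodic timer")
		case <-stop:
			return
		}
	}
}
```

Because both triggers funnel through one `select`, the renewal check stops depending on the blocking query's 10-minute timeout as an implicit ticker.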
That's a good point. It required me to redo most of the work, but it is better now!
    logf("INFO", "blockSize %d too big for system limit %d. Adjusting...", blockSize, limit)
    blockSize = limit - 3
}
Is this solving the annoying Darwin warning in tests? If so, cool :) I wasn't sure whether it's relevant to this PR or not.
I don't think so. But it solves the case where your system limits are lower than the lower bound of freeport, which makes freeport's own test fail instantly on OS X.
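The adjustment being discussed might look roughly like this (Unix-only, which is exactly the Windows concern raised earlier). The function name is hypothetical; the log message and the `- 3` headroom mirror the snippet quoted above.

```go
package main

import (
	"fmt"
	"syscall"
)

// clampBlockSize is a sketch (not the actual freeport implementation)
// of shrinking a requested port block size to fit under the process's
// open-file limit instead of failing outright when the limit is low.
func clampBlockSize(blockSize int) (int, error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, err
	}
	// Compare in uint64 space so RLIM_INFINITY never clamps anything.
	if uint64(blockSize) > rl.Cur {
		fmt.Printf("INFO: blockSize %d too big for system limit %d. Adjusting...\n", blockSize, rl.Cur)
		blockSize = int(rl.Cur) - 3 // leave a little headroom for stdio fds
	}
	return blockSize, nil
}
```

On Windows, `syscall.Getrlimit` does not exist, so real code would need a build-tagged fallback.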
Force-pushed from e0c9e51 to 5e8bb04.
Force-pushed from 5e8bb04 to ea92d8c.
Tracking test failures atm.
Force-pushed from 63a1502 to 4839682.
Hey Hans, looking super good!
I think most of this is just naming bikesheds. I really like the split of some of the code that was getting kinda hairy, though, once the names are clear!
The main blocker I see is that the renewal can race with noticing a new root, which could corrupt the CA config state just like the issue we had the other day. I don't think we need to do anything really sophisticated here: maybe just have a mutex held for the entire process of checking renewal, and hold the same mutex for any other secondary CA reconfiguration, so we can be sure that there is only one reconfig going at once.
Thank you for the review @banks! The provider reconfiguration now also has its own lock. Despite the existence of
Force-pushed from 56c44d2 to 88b90ad.
This looks great, thanks Hans! 🎉
@@ -173,6 +177,9 @@ func (s *Server) initializeCA() error {
 	if err != nil {
 		return err
 	}
+
+	s.caProviderReconfigurationLock.Lock()
+	defer s.caProviderReconfigurationLock.Unlock()
 	s.setCAProvider(provider, nil)
Might be a good idea to document in the comment for setCAProvider that it is often called while holding caProviderReconfigurationLock, and so must never take that lock or call anything that does.
I think it's unlikely it would, but it's not super obvious that doing so would cause a deadlock, which might bite us eventually.
I added that warning in a bunch of places.
If system limits are lower than the block size, it fails outright. Even though the size is adjusted now, it is still possible for it to fail, but at least the failure is not as obvious anymore.
Force-pushed from 9bc417f to 03a95a5.
Fixes #6542.
Currently, when using the built-in CA provider for Connect, root certificates are valid for 10 years, but secondary DCs get intermediates that are valid for only 1 year. There is currently no mechanism, short of rotating the root in the primary, that will cause the secondary DCs to renew their intermediates.
This PR adds a check that renews the cert once it is halfway through its validity period.
In order to be able to test these changes, a new configuration option, IntermediateCertTTL, was added, which is set extremely low in the tests.
The commits for this PR are arranged so that you can read them one by one and get a better overview. They contain some bugfixes and refactorings; only the last commit has the new feature!