-
Notifications
You must be signed in to change notification settings - Fork 15
implement (re)store of proxy's state #107
base: master
Are you sure you want to change the base?
Conversation
@dvoytik sorry but we are kind of rushing those days because we have some milestones to reach. And unfortunately, what you implemented is not in the scope of our P1 issues right now, that's why the slow reviews. |
Hi @sboeuf, |
abbbdb5
to
7a2f972
Compare
proxy.go
Outdated
@@ -124,6 +125,33 @@ func newClient(proxy *proxy, conn net.Conn) *client { | |||
} | |||
} | |||
|
|||
func (proxy *proxy) restoreTokens(vm *vm, tokens []string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be tokens []Token
? Also, what if vm == nil
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately proxy.allocateTokens() returns []string. I'd like to avoid writing a conversion loop from []string to []Token in registerVM() and revert conversion loop in storeVMState().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha.
proxy.go
Outdated
return storeStateDir + "vm_" + id + ".json" | ||
} | ||
|
||
func storeVMState(vm *vm, tokens []string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, tokens []Token
?
proxy.go
Outdated
} | ||
} | ||
|
||
func delVMAndState(proxy *proxy, vm *vm) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if proxy
and vm
are nil
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add checks.
proxy.go
Outdated
} | ||
} | ||
|
||
func restoreVMState(proxy *proxy, containerID string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing checks on params. Also, this is a very long function - I think it must be close the the cyclomatic complexity limit so I'd consider refactoring it into smaller chunks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, will do.
proxy.go
Outdated
@@ -188,6 +216,207 @@ func (proxy *proxy) releaseToken(token Token) (*tokenInfo, error) { | |||
return info, nil | |||
} | |||
|
|||
var storeStateDir = "/var/lib/clear-containers/proxy/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this to the top of the file? Its value should also be passed in from Makefile
so that at the top it's something like:
// XXX: set by build
var storeStateDir string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Will do. Should it be set up to "/var/lib/clear-containers/proxy/" in Makefile
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it should probably be set to $PKGLIBDIR/proxy/
(feel free to steal that chunk of code from the runtime's Makefile
;)
proxy.go
Outdated
} | ||
|
||
var proxyState proxyStateOnDisk | ||
err = json.Unmarshal(fdata, &proxyState) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something to consider: what happens when we change the proxy state file format? The 3 main approaches are:
- Don't handle it and just error if the on-disk state doesn't match what the proxy expects.
- Add a version field to
vmStateOnDisk
to allow the proxy to determine how to deal with a different format state file. - Make the code "sniff" the state file data and see which fields it contains and their types to determine how to handle such state file differences (the "
autoconf
approach" ;)
Clearly, here, there is only 1 possible on-disk format, but I think it would be worth thinking about this and documenting the strategy for how we plan to handle any changes to vmStateOnDisk
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO adding a version field to the format would be easiest way to handle this kind of issues.
proxy.go
Outdated
if proxy.socketPath, err = getSocketPath(); err != nil { | ||
return fmt.Errorf("couldn't get a rigth socket path: %v", err) | ||
if proxy.socketPath, err = getSocketPath(); err != nil { | ||
return fmt.Errorf("couldn't get a rigth socket path: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo that could be fixed now I've spotted it ;-) "rigth" => "right".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, will do.
proxy.go
Outdated
proxy.Lock() | ||
delete(proxy.vms, vm.containerID) | ||
proxy.Unlock() | ||
proxy.storeState() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This take a new lock. I don't know if there is an issue here in that the state could change between the call to Unlock()
and the call to storeState()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here proxy.vms hash-map is protected. I regularly run code with -race parameter to see if data races appear.
proxy_test.go
Outdated
|
||
// clean up a possible state | ||
os.RemoveAll(storeStateDir) | ||
assert.Equal(t, rig.proxy.restoreState(), false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a major problem, but in the runtime for new code we've started creating a assert
variable to avoid having to hass the testing.T
each time:
assert := assert.New(t)
:
:
assert.Equal(rig.proxy.restoreState(), false)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thanks. Will do.
proxy_test.go
Outdated
@@ -900,3 +901,29 @@ func TestHyperstartResponse(t *testing.T) { | |||
|
|||
rig.Stop() | |||
} | |||
|
|||
func TestStoreRestore(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some new tests where the restore fails due to an invalid state file (zero size, invalid contents (non-json, pure text (hello world
or something ;-)))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will do.
Hi @dvoytik - thanks for raising this! It looks good. I've done a first-pass review and given some suggestions. There are 2 other areas to mention:
|
@jodh-intel, thank you for the review! |
Hi @dvoytik - no problem! And enjoy your holiday! 😎 |
@jodh-intel, thanks! 😃 |
Hi @dvoytik - hope you had a good holiday. Are you still planning to update this branch? Let me know if there is anything I can do to help. |
@jodh-intel, thanks! Sorry, for delaying on the PR. I'll rework it and address the comments today after my main job. Unfortunately for the time being I can finish this only in my spare time :( |
Hi @dvoytik - understood. I'm happy to help with this if you are tied up with other things. |
HI @jodh-intel, thank you! I feel responsibility to finish this anyway, so I'm going to work on the PR today evening. Sorry again for delaying this. |
d703819
to
3e0c350
Compare
Hi @jodh-intel, I've update the PR. Could you please take a look at it again? |
proxy.go
Outdated
@@ -188,6 +216,207 @@ func (proxy *proxy) releaseToken(token Token) (*tokenInfo, error) { | |||
return info, nil | |||
} | |||
|
|||
var storeStateDir = "/var/lib/clear-containers/proxy/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it should probably be set to $PKGLIBDIR/proxy/
(feel free to steal that chunk of code from the runtime's Makefile
;)
proxy.go
Outdated
// returns false if it's a clean start (i.e. no state is stored) or restoring failed | ||
func (proxy *proxy) restoreState() bool { | ||
proxyStateFilePath := storeStateDir + "proxy_state.json" | ||
if _, err := os.Stat(storeStateDir); os.IsNotExist(err) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah - sorry, yes I see now.
proxy.go
Outdated
@@ -84,6 +92,20 @@ type proxy struct { | |||
wg sync.WaitGroup | |||
} | |||
|
|||
// proxyStateOnDisk is used to re(store) proxy state on disk | |||
type proxyStateOnDisk struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a pretty significant feature which I think justifies moving all the state handling code into a state.go
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll do it.
proxy.go
Outdated
@@ -124,6 +125,33 @@ func newClient(proxy *proxy, conn net.Conn) *client { | |||
} | |||
} | |||
|
|||
func (proxy *proxy) restoreTokens(vm *vm, tokens []string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha.
proxy.go
Outdated
} | ||
|
||
for _, token := range tokens { | ||
token, err := vm.AllocateTokenAs(Token(token)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to assert that token != ""
here? AllocateTokenAs()
will handle it I see, but it's not an expected scenario I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Thanks!
proxy.go
Outdated
|
||
func restoreVMState(proxy *proxy, containerID string) { | ||
if proxy == nil { | ||
proxyLog.Errorf("proxy parameter must be not nil") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Can you add a check for
containerID == ""
. - Can you add the
containerID
to the log message to aid debugging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do, thanks.
proxy.go
Outdated
proxyLog.Warn("Recovering proxy state from: ", proxyStateFilePath) | ||
if proxyState.Version != Version { | ||
proxyLog.Warnf("Stored state version (%s) mismatches proxy"+ | ||
" version (%s). Aborting", proxyState.Version, Version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't deal with the scenario where proxyState.Version > Version
as that implies the state is from a newer version of the proxy than is running (forced downgrade), but I think we do need to handle the scenario where proxyState.Version <= Version
as the running proxy should be able to handle the state of all previous versions.
I'm still not 100% convinced that a version field is the right approach: even if we have one, the code will still need to sanity-check all the state values read off disk to ensure:
- they are of the correct type.
- they have a "reasonable" value.
However, it doesn't hurt to include a version.
proxy.go
Outdated
proxy.vms[regVM.ContainerID] = vm | ||
proxy.Unlock() | ||
|
||
proxyLog.Infof("restoreVMState(containerId=%s,ctlSerial=%s,ioSerial=%s,console=%s)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great if you could convert all the new log messages on this PR to use the structured logging approach. For example, this one could become:
proxyLog.WithFields(logrus.Fields{
"container": regVM.ContainerID,
"control-channel": regVM.CtlSerial,
"io-channel": regVM.IoSerial,
"console": regVM.Console,
}).Info("restoring state")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, will do.
proxy.go
Outdated
} | ||
|
||
proxyState := &proxyStateOnDisk{ | ||
Version: Version, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the storing the protocol version. We need a separate version to describe the state file version as that might need to change more frequently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I'll rework this part.
proxy_test.go
Outdated
@@ -929,3 +930,93 @@ func TestGetSocketPath(t *testing.T) { | |||
assert.NotNil(t, err, err) | |||
assert.Equal(t, p, "") | |||
} | |||
|
|||
func TestStoreRestore(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add tests for all the functions added? (This might necessitate splitting them into smaller functions to make them more easily testable).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll try :)
Hi @dvoytik - thank you again raising this PR. It's a very significant new feature and because of its sensitivity, I'm afraid my pedantic knob is up to 11 😄 A couple of general points not already mentioned in the review:
As I've said before, I'm more than happy to help out with any of this. This is a big feature so if we can find a way to logically sub-divide the work, just ping me so we can work together on this. |
Hi @jodh-intel, thank you again for reviewing the PR! No problems with a pedantic attitude at all! Actually I value this a lot, especially when it's reasonable. All your points are valid, I agree. I'll rework the PR with your comments in mind. If you don't mind I'll try to finish this PR. It's just going to take a little longer because for the time being I can only work on this at evenings. IMHO, |
Thanks @dvoytik! ;) |
9673679
to
c2a705c
Compare
The PR has been updated. |
Thanks @dvoytik - reviewing and testing with clearcontainers/shim#54 ... |
state.go
Outdated
} | ||
|
||
func logContID(containerID string) *logrus.Entry { | ||
return proxyLog.WithField("container", containerID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cute! 😄
} | ||
|
||
proxy.Lock() | ||
proxy.tokenToVM[token] = &tokenInfo{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not wholly clear on this. Should the token be removed if findSessionByToken()
fails below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Practically if vm.findSessionByToken(token)
fails here then the vm object, which owns tokens, is being released later by delVMAndState, thus tokens are being also released.
But maybe I should release them here with such code:
vm.Lock()
for token := range vm.tokenToSession {
_ = vm.freeTokenUnlocked(token)
delete(vm.tokenToSession, token)
}
vm.Unlock()
Slightly more complicated code but more future-proof?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think that looks good. What do you think @sboeuf?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes fine with me.
@jodh-intel sure I'll take a look |
state.go
Outdated
|
||
// returns false if it's a clean start (i.e. no state is stored) or restoring | ||
// failed | ||
func (proxy *proxy) restoreState() bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for clarity this should be called something like restoreAllState()
because it is not doing the opposite job to storeState()
:
storeState()
: only stores proxy state.restoreState()
: restores both proxy state and state of all VMs.
(alternatively, you could split the VM restore code in restoreState()
into a separate function I guess).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll rename it.
proxy.go
Outdated
logContID(vm.containerID).Errorf( | ||
"couldn't store a VM state: %v", err) | ||
} | ||
if err := proxy.storeState(); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing blank line above the if
(here and in various other places).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops :) I'll fix these, thanks!
proxy.go
Outdated
"couldn't store a VM state: %v", err) | ||
} | ||
if err := proxy.storeState(); err != nil { | ||
proxyLog.Errorf("couldn't store proxy's state: %v", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd put the err
on this line as it looks a bit odd down there on its own (same comment for a few other occurrences).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'll fix these.
} | ||
|
||
proxy.Lock() | ||
proxy.tokenToVM[token] = &tokenInfo{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think that looks good. What do you think @sboeuf?
97a65e3
to
11d2371
Compare
Sorry I could not review that, but I'll do that tomorrow first thing! And I know I am gonna spend some time since that's a pretty huge PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dvoytik thanks for this huge PR. I have several comments, but I have also one concern (unless I am missing something). From my understanding, you basically STORE during RegisterVM()
and RESTORE during UnregisterVM()
. The issue is that you're not covering all cases here. You will miss all the tokens/sessionID created during a pod lifecycle. For instance, after the VM has been started and that we have stored all the info, if we create new containers/processes, new tokens/sessionIDs are gonna be generated by the proxy. Am I missing something or this is something expected ? Cause that means that we are not covering a lot of crashing cases otherwise.
@sameo could you please also take a look since this is a significant PR/feature.
Makefile
Outdated
@@ -16,6 +16,7 @@ LOCALSTATEDIR := /var | |||
|
|||
SOURCES := $(shell find . 2>&1 | grep -E '.*\.(c|h|go)$$') | |||
PROXY_SOCKET := $(LOCALSTATEDIR)/run/clear-containers/proxy.sock | |||
STORE_STATE_DIR := $(LOCALSTATEDIR)/lib/clear-containers/proxy/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$(LOCALSTATEDIR)/run/clear-containers/proxy/
would be more appropriate since we're gonna store some data generated at runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I agree. I'll change this. Thanks.
@@ -53,7 +54,7 @@ all: cc-proxy $(UNIT_FILES) | |||
|
|||
cc-proxy: $(SOURCES) Makefile | |||
$(QUIET_GOBUILD)go build -i -o $@ -ldflags \ | |||
"-X main.DefaultSocketPath=$(PROXY_SOCKET) -X main.Version=$(VERSION)" | |||
"-X main.DefaultSocketPath=$(PROXY_SOCKET) -X main.Version=$(VERSION) -X main.storeStateDir=$(STORE_STATE_DIR)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you could fix the other variables as they don't need to be upper case since they don't need to be exported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like not to mix other refactoring in one patch if you don't mind. But if you insist, I could write a separate patch and make it as a part of this PR though.
state.go
Outdated
|
||
// Returns false if it's a clean start (i.e. no state is stored) or restoring | ||
// failed. | ||
func (proxy *proxy) restoreAllState() bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should make the difference between the three cases: clean start
/ restoration succeeded
/ restoration failed
. IMO, that's the caller responsibility to decide what to do with the different output combinations.
func (proxy *proxy) restoreAllState() (bool, error) {
Or we could have one function to detect if we are in a clean start
case by checking if we have something in .../run/clear-containers/proxy/...
, and another function for the actual restoration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, you should not define methods belonging to the proxy object outside of proxy.go
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll rework this function.
proxy.go
Outdated
@@ -622,12 +632,14 @@ func (proxy *proxy) init() error { | |||
// Force a coredump + full stacktrace on internal error | |||
debug.SetTraceback("crash") | |||
|
|||
// flags | |||
proxy.enableVMConsole = logrus.GetLevel() == logrus.DebugLevel | |||
if !proxy.restoreAllState() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not clear enough. See my comment about restoreAllState()
function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, thanks. I'll rework this.
state.go
Outdated
} | ||
|
||
func vmStateFilePath(id string) string { | ||
return storeStateDir + "vm_" + id + ".json" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using path/filepath
package is usually clearer/cleaner when creating/modifying a path.
filepath.Join(storeStateDir, "vm_" + id + ".json")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the hint! I'll rework this function as you suggested.
state.go
Outdated
Tokens []string `json:"tokens"` | ||
} | ||
|
||
func logContID(containerID string) *logrus.Entry { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very generic, it should go into log.go or utils.go, but I don't think it should stay here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm OK to move it to a distinct file, but maybe it'd be better to do this as a separate refactoring PR? If you insist I can rework this as a part of my PR.
state.go
Outdated
} | ||
|
||
// On success returns nil, otherwise an error string message. | ||
func (proxy *proxy) restoreTokens(vm *vm, tokens []string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should not define methods belonging to the proxy object outside of proxy.go
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll change this.
} | ||
|
||
proxy.Lock() | ||
proxy.tokenToVM[token] = &tokenInfo{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes fine with me.
state.go
Outdated
|
||
storeFile := vmStateFilePath(vm.containerID) | ||
|
||
err = ioutil.WriteFile(storeFile, o, proxyStateFilesPerm) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use the format
if err := ...; err != nil {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem, as you wish ;)
state.go
Outdated
} | ||
|
||
// On success returns nil, otherwise an error string message. | ||
func storeVMState(vm *vm, tokens []string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think you should define a storeState()
and fetchState()
methods both for vm
and proxy
. That way, it would be more consistent with the overall naming of those new functions that you're introducing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally the code was in proxy.go but @jodh-intel asked me to move it out to a separate file. If I define the methods on vm and proxy then I have to move them to proxy.go and vm.go back according to what you ask :)
Hi @sboeuf, thank you for the review!
No, restoring is done only when proxy starts and detects a stored state on disk. During unregisterVM() we call delVMAndState(), which in turn updates proxy's state and deletes the requested vm's state on disk.
Tokens are stored in a state. Please see |
Yes sorry I used wrong explanation, but basically you're saving things only at the
Yes sure, but you're doing the STORE only when the VM is registered. And according to a full pod lifecycle, you're gonna miss almost everything. I mean that you're only be able to RESTORE the very first configuration that you saved when the VM has been created... Lots of containers and process could be pending at the time the proxy had crashed, and you would miss all of that. I think that you need to store in lot more places in the code. Does that make sense ? |
Oops, actually you are right! I've just checked again all API methods and totally forgot about attachVM(). Originally in my previous versions it was there and somehow disappeared :) I remember I was testing attaching manually and it worked. |
11d2371
to
885e85b
Compare
@sboeuf, thank you for the review again! |
@sboeuf - ping. |
Hi All Currently:
With a redesign:
in that way containers will not be affected if the cc-proxy dies since part of the logic of the cc-proxy will be moved to cc-agent |
@devimc that could work, sure. But the drawback of such a solution is that we would get a larger attack surface because you would create as much serial ports as we have processes. |
@jodh-intel @dvoytik I'll look into this today |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have noticed that you're missing to save all the container IDs (the one you can find into ioSession
structure). We make a difference between the containerID in the vm
structure which means the podID and the containerID in the ioSession
structure which means a containerID. There is one session per container and one VM per pod.
I think that you need to figure out a way to save the containers related to each pod/vm, otherwise we could not be able to perform proper action like killing the container process after the proxy restarted.
Take a look here: https://github.com/clearcontainers/proxy/blob/master/vm.go#L363 and here: https://github.com/clearcontainers/proxy/blob/master/vm.go#L572-L581
state.go
Outdated
@@ -0,0 +1,387 @@ | |||
// Copyright (c) 2017 Dmitry Voytik <dmitry.voytik@huawei.com> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dvoytik have you made the change here ?
if err := os.Remove(proxyStateFilePath); err != nil { | ||
return fmt.Errorf("couldn't remove file %s: %v", | ||
proxyStateFilePath, err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to your logic, if there's no VM, we should not have a state file, I am fine with that, but this means you should return nil after you properly removed the file, right ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Thank you. I'll add here return nil
.
state.go
Outdated
Version: stateFileFormatVersion, | ||
SocketPath: proxy.socketPath, | ||
EnableVMConsole: proxy.enableVMConsole, | ||
ContainerIDs: make([]string, 0, len(proxy.vms)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to allocate dynamically the size of the buffer. Calling into append()
will do that for you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I know that. It looks like a premature optimization, but on the other hand does it really hurt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well I don't consider this as an optimization since you could directly remove this line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you insist, I'll remove this line, no problem. My intention was to avoid re-allocations on each append operation.
state.go
Outdated
// On success returns nil, otherwise an error string message. | ||
func storeVMState(vm *vm) error { | ||
vm.Lock() | ||
tokens := make([]string, 0, len(vm.tokenToSession)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does not need dynamic allocation since you're using append right after.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
@sboeuf, thank you again for the review. |
@dvoytik no problem ! Let me know when you have something ready. |
b932d09
to
e7ebeda
Compare
Introduce the high availability feature of cc-proxy by implementing store/restore of proxy's state to/from disk. This feature depends on the ability of shim to reconnect to cc-proxy if connection is lost. Fixes clearcontainers#4. Signed-off-by: Dmitry Voytik <dmitry.voytik@huawei.com>
e7ebeda
to
b4df8d5
Compare
Hi @sboeuf, could you please take a look? |
Introduce the high availability feature of cc-proxy by implementing
store/restore of proxy's state to/from disk. This feature depends
on the ability of shim to reconnect to cc-proxy if connection is lost.
Fixes #4.
Signed-off-by: Dmitry Voytik dmitry.voytik@huawei.com