Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

commit confirm support #1118

Open
rbcollins123 opened this issue Feb 6, 2020 · 30 comments
Open

commit confirm support #1118

rbcollins123 opened this issue Feb 6, 2020 · 30 comments

Comments

@rbcollins123
Copy link
Contributor

I'm interested in contributing a PR to add this ability (and subsequently pass it upstream to napalm_ansible via another PR also), but wanted to gauge interest from maintainers on accepting something along those lines? We have a number of use-cases where it would be advantageous to have the commit auto-reverted quickly if connectivity to the target EOS device was lost due to a change in the config-session (this is a lot cleaner than scheduling and cancelling a reload), but I'd like to understand if efforts to add would be well received here or if we should fork to do what we need?

This feature was added back in EOS v4.18.0F which already went end-of-support in August 2019.
So, it could be made the default operation method for all commits and the min-supported release of EOS for Napalm could be raised from 4.15.0F to 4.18.0F fairly safely these days. Or, leave min release at 4.15.0F and add optional keyword var to enable/disable it. In all cases new keyword vars would be needed to control the timer value at minimum. Ideally that would be via something like revert_in to control when it auto-reverts and confirm_in to control how long you wait before confirming the session (which would default to 0 secs but allow you to add some delay if you want to allow for network convergence events before confirming the commit).

I see there was similar work in #550 on this previously for some non-EOS drivers, but it was abandoned and I couldn't see why from the issue comments?

@tlund42
Copy link

tlund42 commented Feb 6, 2020

This would be fantastically useful for us. We currently do the scheduled reload trick to revert in case we make a bad deploy and it seems a bit messy.

@cdorry
Copy link

cdorry commented Feb 6, 2020

Agreed this would be so much cleaner than the scheduled reload and also reduce the impact as config replace would be so much quicker than device reload.

@rbcollins123
Copy link
Contributor Author

Bump! @ktbyers i noticed you did a lot of work on #550. Any thoughts on the below?

I'm interested in contributing a PR to add this ability (and subsequently pass it upstream to napalm_ansible via another PR also), but wanted to gauge interest from maintainers on accepting something along those lines? We have a number of use-cases where it would be advantageous to have the commit auto-reverted quickly if connectivity to the target EOS device was lost due to a change in the config-session (this is a lot cleaner than scheduling and cancelling a reload), but I'd like to understand if efforts to add would be well received here or if we should fork to do what we need?

This feature was added back in EOS v4.18.0F which already went end-of-support in August 2019.
So, it could be made the default operation method for all commits and the min-supported release of EOS for Napalm could be raised from 4.15.0F to 4.18.0F fairly safely these days. Or, leave min release at 4.15.0F and add optional keyword var to enable/disable it. In all cases new keyword vars would be needed to control the timer value at minimum. Ideally that would be via something like revert_in to control when it auto-reverts and confirm_in to control how long you wait before confirming the session (which would default to 0 secs but allow you to add some delay if you want to allow for network convergence events before confirming the commit).

I see there was similar work in #550 on this previously for some non-EOS drivers, but it was abandoned and I couldn't see why from the issue comments?

@ktbyers
Copy link
Contributor

ktbyers commented Feb 11, 2020

Yes, we tried to do this in late 2017 and it eventually got abandoned due to how much work it was (abandoned was mostly be me i.e. implementing it + tests and documentation was going to take too much time).

We could open up discussions on it, but some caveats:

  1. Any solution has to apply to all drivers (excluding NX-OS which I don't think has any viable solution for this). This doesn't mean we have to have an implementation for all drivers at the the beginning, but the methods/arguments we decide on need to consider all the core platforms (excluding NX-OS).
  2. Probably would start by agreeing on new methods and new arguments.
  3. After that tests would need built out for all the new methods and new arguments.
  4. At that point we would stub out all of the drivers with the methods and with a NotImplementedError (but that would get all of the underlying structure and tests in place).
  5. commit_config() would default to its current behavior (i.e. no commit confirm by default). Reason for that is a lot of people don't wouldn't want commit confirm to be the default behavior and since NX-OS has no way of supporting this consistency across drivers would require commit_config() to remain as it is.

Definitely open to more discussions on this, but it is a lot of work.

@ktbyers ktbyers changed the title Interest in using EOS commit timer for EOSDriver.commit_config operations? commit confirm support Feb 11, 2020
@ktbyers
Copy link
Contributor

ktbyers commented Feb 11, 2020

Here is relevant info from issue #550


Did some more research and review (which also helped me remember past research):

IOS - Should be able to do reasonable system-wide solution.

NX-OS - Does not have any reasonable solution (AFAIK). The only thing I see is a kludge that we shouldn't do. This is also what I found in past research. We can ask in Cisco channel or NAPALM channel and see if anyone know of a reasonable solution, but I am not seeing one.

EOS - I think there is support, but a specific version is required. I believe I had this discussion with @bewing previously.

Junos - should be fine

IOS-XR - Can you commit-confirm in a different SSH session on IOS-XR? I wasn't seeing a good way of doing this in brief testing.

@rbcollins123
Copy link
Contributor Author

Yes, we tried to do this in late 2017 and it eventually got abandoned due to how much work it was (abandoned was mostly be me i.e. implementing it + tests and documentation was going to take too much time).

We could open up discussions on it, but some caveats:

  1. Any solution has to apply to all drivers (excluding NX-OS which I don't think has any viable solution for this). This doesn't mean we have to have an implementation for all drivers at the the beginning, but the methods/arguments we decide on need to consider all the core platforms (excluding NX-OS).
  2. Probably would start by agreeing on new methods and new arguments.
  3. After that tests would need built out for all the new methods and new arguments.
  4. At that point we would stub out all of the drivers with the methods and with a NotImplementedError (but that would get all of the underlying structure and tests in place).
  5. commit_config() would default to its current behavior (i.e. no commit confirm by default). Reason for that is a lot of people don't wouldn't want commit confirm to be the default behavior and since NX-OS has no way of supporting this consistency across drivers would require commit_config() to remain as it is.

Definitely open to more discussions on this, but it is a lot of work.

Thanks @ktbyers for the feedback. My main concern was if I could avoid the implementation for the other drivers as we need the EOS use-case 1st. I'm fine with just adding NotImplemented stubs for all the other core drivers and tests, etc. I'll start taking a look at it next week and making some plans for initial implementation

@ktbyers
Copy link
Contributor

ktbyers commented Feb 12, 2020

Yes, I think we need to consider all of the core NAPALM platforms as far as the behavior and methods we define (so that as they eventually get implemented for other platforms the methods make sense/don't need to be changed).

That being said we should not require all of the platforms to actually be implemented i.e. we should just implement them one at a time (as someone who has interests in that platform decides to do a PR). This was probably the biggest problem we did in 2017 (i.e. tried to say we needed N-platforms to have this support before we integrated the PR).

@mayuresh82
Copy link
Contributor

so why not just add it to junos to begin with ? Junos has supported this since day 1, all the other vendors are playing catch-up. Why are we imposing the need for additional methods here ?
A revert_commit() method is not required, as it defeats the purpose of doing a commit confirmed which by definition, will revert the commit anyway after the given time. The correct way of doing is is by simply requesting a rollback(1) before the timer expires, which should happen in client code.
A has_pending_commit() method is also a nice-to-have and I see no reason why this is being imposed as a requirement. It doesnt make sense to impose the burden of figuring out pending commits on the library when it should be done by client by other means (e.g running a show|compare on junos)
Lets not block an essential feature for the sake of it especially when this can be easily achieved by simply passing an optional argument similar to how ignore_warning was implemented for junos.

@ktbyers
Copy link
Contributor

ktbyers commented Mar 4, 2020

@mayuresh82 This already exists on Juniper's PyEZ and consequently is accessible by NAPALM (i.e. you can access the underlying PyEZ driver via NAPALM). Consequently, if your primary concern is Juniper, then this is already available.

A main purpose of napalm (probably the main purpose) is to make a common set of methods across different dissimilar platforms. This implies that the we need to consider the other platforms when creating the methods (including which methods should exist or not exist).

@rbcollins123
Copy link
Contributor Author

Sorry, took me a while to get back to this. I took a 1st pass at the base methods/args from the past work from @ktbyers and threw PR #1179 up as initial code for discussion. Let me know thoughts or suggestions for edits. Once we agree on the base implementation I'll submit a PR for the EOS driver to implement all of these and then others can do the same or other drivers. Thanks!

@ktbyers
Copy link
Contributor

ktbyers commented Apr 27, 2020

@rbcollins123 Nag me if I don't start looking at this by mid-May. I am worried it might just slip through the cracks...

@rbcollins123
Copy link
Contributor Author

@rbcollins123 Nag me if I don't start looking at this by mid-May. I am worried it might just slip through the cracks...

OK will do @ktbyers!

@mirceaulinic
Copy link
Member

Just wanted to add a note to make sure this doesn't lead to false assumptions. For example, I noticed this:

We currently do the scheduled reload trick to revert in case we make a bad deploy and it seems a bit messy.

If you do a bad deploy and you lost access to the device because of it, NAPALM won't be able to save you unless you're using a platform that supports commit confirm natively. In other words, if you're on, for example, Cisco IOS or others similar, NAPALM can at most revert to the previous config IF and only if still able to connect to the device.

@rbcollins123
Copy link
Contributor Author

rbcollins123 commented Apr 30, 2020 via email

@rbcollins123
Copy link
Contributor Author

@rbcollins123 Nag me if I don't start looking at this by mid-May. I am worried it might just slip through the cracks...

Hey @ktbyers, bumping this and PR #1179 on your stack. Just looking for thoughts on the methods/args/naming with everything set to raise NotImplementedError for now. Then we will do small individual PRs for each OS driver once this is merged.

@ktbyers
Copy link
Contributor

ktbyers commented May 11, 2020

Correcting something here (from statement above)...Cisco IOS can do this (I basically implemented it in a POC at one point).

Will try to start working on this a bit today through Wednesday.

@ktbyers
Copy link
Contributor

ktbyers commented May 14, 2020

@rbcollins123 Haven't forgotten...just haven't got to it yet.

@gcotone
Copy link

gcotone commented Jun 10, 2020

👍 for this feature. We've implemented this in a napalm-community driver by passing the confirm (optional) argument.

It would be nice if we could standardize behavior/naming as @rbcollins123 was suggesting.

@ktbyers
Copy link
Contributor

ktbyers commented Jun 10, 2020

@gcotone Relevant PR is here:

#1179

@gcotone
Copy link

gcotone commented Dec 1, 2020

Description for method confirm_commit() should be updated since commit_confirm=True is not used at all:

Confirm the changes requested via commit_config when commit_confirm=True.

@ktbyers
Copy link
Contributor

ktbyers commented Dec 2, 2020

Yes, that needs updated... @gcotone

Regards, Kirk

@mirceaulinic
Copy link
Member

As this has been merged into the develop branch, before releasing rc1, would anyone mind adding some documentation around the new features? Thanks.

@mirceaulinic
Copy link
Member

@ktbyers @rbcollins123 any thoughts? I'd like to release 3.3.0 soon, but I can't do this without having this major feature documented; I'll check back in a couple of weeks - and unless anyone that has been in the loop contributes with some documentation on this, I'm afraid I'll revert it from the develop branch, and leave it open as a PR till someone will come up with the docs. Sounds good?

@ktbyers
Copy link
Contributor

ktbyers commented Feb 8, 2021

I will see if I can write some docs on it. Give me 10 days though. Sound reasonable?

@mirceaulinic
Copy link
Member

Sure, thanks @ktbyers - as long as you need, just a confirmation is all I needed. 😄

@ktbyers
Copy link
Contributor

ktbyers commented Feb 15, 2021

First pass at docs are here:

#1379

@Kircheneer
Copy link
Contributor

Kircheneer commented Jun 9, 2022

This comment can be ignored, work is being done in #1538.

I would be willing to give commit confirm for config replacement for the
IOS driver a shot. I see that there was some work on that done by @DanSheps here but the effort seems to have slowed down.

Proposed solution

Use the built-in Cisco IOS feature to accomplish automatic rollback on error/timeout:

config replace flash:candidate.cfg force revert trigger error time $SOMETHING

Caveats

See here.

This solution comes with the following caveats:

  • Config archival needs to be set up (which AFAIK NAPALM already requires for configuration replacement)
  • Configuration merging is not supported
    (it would probably be possible to implement this, but it is not needed for my use case)
  • A minimum IOS version is needed to make use of this feature,
    see last page of this

Any thoughts on this?

@DanSheps
Copy link
Contributor

DanSheps commented Jun 9, 2022

I am not sure what you mean by slowed down. I am pretty sure what I provided does exactly what it needs to, plus, unlike your proposal it also covers merging and not just replace.

I believe that PR just needs to be reviewed for completeness by the team.

@DanSheps
Copy link
Contributor

DanSheps commented Jun 9, 2022

Nevermind, I see that it failed on a check and I didn't notice, I will see if I can fix that up shortly.

@Kircheneer
Copy link
Contributor

Very sorry, I did not see that there was a PR for it - I thought that you just had commits on a branch and never submitted a PR. In that case you can ignore that comment. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants