Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Geneve link support #565

Closed
wants to merge 1 commit into from
Closed

Conversation

shassard
Copy link
Contributor

Heavily based on the existing Gretap support

@shassard shassard mentioned this pull request Sep 14, 2020
@shassard shassard force-pushed the master branch 2 times, most recently from 14baa29 to 6a46481 Compare September 14, 2020 17:44
Heavily based on the existing Gretap support
@ddyzhang
Copy link

ddyzhang commented Sep 22, 2020

@shassard Has this PR been abandoned? I'm interested in Geneve support here as well.

Happy to pick this up if this is no longer needed from your side.

@shassard
Copy link
Contributor Author

shassard commented Sep 22, 2020

The automated tests proved that the geneve VNIs setting seemed to not work properly. Testing of creating interfaces seems to work just fine though, so I'm unsure if something is obviously wrong in the geneve interface setup. I haven't had sufficient time to dive into the resolution yet.

If you can figure out what I'm missing to get the VNI unit tests working for geneve, please feel free to run with what I have here.

@ddyzhang
Copy link

Thanks for the context! I'll see if I can find out what's going on and if I get the tests fixed up, I'll put something out.

Appreciate your work on this so far.

@shassard
Copy link
Contributor Author

@ddyzhang
Copy link

I took a look at your PR locally and noticed that on L2465 of link_linux.go, we aren't assigning the result of the VNI parsing back into geneve.ID: https://github.com/vishvananda/netlink/pull/565/files#diff-350920f2216bdd28aa0ddd41f71bcab0R2465

This would probably explain your test failures at L305-307 in compareGeneve().

That being said, I'm having troubles replicating your test failure. In my dev environment, I can get all the other tests to run successfully, but TestLinkAddDelGeneve fails for me with a numerical result out of range error. Tracing this error through the call stack, I found out that the error was emitted here. For some reason, when I run my unit tests (even with the geneve.ID fix), the netlink response returns [222, 255, 255, 255] in the first 4 m.Data array values. When we try to convert this into a uint32, we get a massive unsigned 32-bit integer. This then causes the cast to int32 to freak out.

I'll keep looking into this, but hopefully this helps.

@shassard
Copy link
Contributor Author

Thanks! That helps a ton. I'll run with this a bit and see if I can figure out the last piece.

@shassard
Copy link
Contributor Author

shassard commented Sep 23, 2020

Can you give this a try:
shassard@18fa705

A cleanup to the VNI conversion and specficying the encap type for the tests seems to make things generally work.

The linux kernel changed the format of the struct between 4.19 and 5.8. The header here is for 4.19. I'm not sure if/how we can deal with that here ..?

@ddyzhang
Copy link

That worked great! The only thing that I had to do was temporarily remove the assertion against the destination port. That assertion was failing because we never assigned a Dport to the expected Geneve object, so the default Geneve destination port of 6081 was chosen by Linux.

Regarding the struct change between kernel versions, that's a tricky one and a great call-out. If we want to simultaneously support both 4.19-style and 5.8-style structs, we would need to make sure both the input to netlink as well as the de-serialization of the netlink response are compatible with both versions. Since the de-serialization doesn't care too much about missing fields if the kernel is on 4.19, I'm guessing addGeneveAttrs would need to be kernel-version-aware if we want to fully support this on both kernel versions.

That being said, I'm definitely not a networking guru, so I'm not sure if any of this is even accurate. Curious to hear your thoughts on this.

@shassard
Copy link
Contributor Author

This fixes up the port check:
shassard@5fab7f7

And the tests all pass on my arch linux 5.8.10 kernel.

It doesn't seem like their is precidence in the code base for performing runtime kernel version detection and adjusting the struct ordering. Given that my very new kernel didn't fun into nasty issues, we can probably punt any significant design change related to the kernel versioning. It seems like PR'ing this in the current state is worth it, as it's working for at least basic use cases.

@ddyzhang
Copy link

Confirmed that this test is passing on 4.14 as well, thanks so much for the fixes!

Since neither 5.8 nor 4.14 seemed to have issues with this current implementation, I'm also fine with leaving things as-is and PRing this. Really appreciate it!

@shassard
Copy link
Contributor Author

I've cleaned up the commits and have pushed a new PR:
#567

@shassard
Copy link
Contributor Author

shassard commented Sep 24, 2020

Thanks again for your help @ddyzhang on getting the last wrinkles sorted.

@shassard shassard mentioned this pull request Sep 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants