CNI meeting notes

Note: the notes are checked in after every meeting to https://github.com/containernetworking/meeting-notes

An editable copy is hosted at https://hackmd.io/jU7dQ49dQ86ugrXBx1De9w. Feel free to add agenda items there

2024-11-25: no meeting (thanksgiving holiday)

2024-11-18

2024-11-11

  • Q: Skip for KubeCon week?
    • A: Casey/Tomo regrets, unrelated reasons

2024-11-04

  • [Zappa] cni.dev is not updated with the latest spec. I can do this if it's an oversight; this is just a reminder for me
  • [Lionel] Validation
    • containernetworking/cni#1132
    • [tomo] need to discuss
      • where to implement (plugin or other)
      • how we define (in spec?)
      • example: define a JSON schema in a different file (in a different repo or directory)
    • Likely conclusion: we could keep a schema (e.g. a JSON schema) in the repository and recommend using it for validation. No need to update the SPEC. It would be good to generate such a schema file from the current Golang types (see the sketch at the end of this section)
      • TODO: come up with a way to generate the schema file
  • [Zappa] do we have any call outs for kubecon that I should include?
    • Casey: remind everyone that CNI composes in two dimensions: multiple interfaces, and multiple plugins for the same interface. Thus, a single gRPC call cannot represent the current API surface
  • [Swagat] bridge CNI plugin containernetworking/plugins#1107
    • related to isolating containers connected to the same bridge; there is similar functionality in Docker
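For the validation discussion above: a minimal sketch of what schema-based validation could look like, assuming a hypothetical schema file generated from the Go config types and the third-party gojsonschema library. Nothing here is mandated by the spec; it is only an illustration of the idea.

package main

import (
    "fmt"
    "os"

    "github.com/xeipuuv/gojsonschema"
)

// validateConf checks a CNI network configuration against a JSON Schema
// file. Both the schema path and the idea of shipping such a schema are
// assumptions from the discussion above, not part of the current spec.
func validateConf(schemaPath, confPath string) error {
    confBytes, err := os.ReadFile(confPath)
    if err != nil {
        return err
    }
    schema := gojsonschema.NewReferenceLoader("file://" + schemaPath)
    doc := gojsonschema.NewBytesLoader(confBytes)

    result, err := gojsonschema.Validate(schema, doc)
    if err != nil {
        return fmt.Errorf("could not run schema validation: %w", err)
    }
    if !result.Valid() {
        for _, e := range result.Errors() {
            fmt.Fprintln(os.Stderr, e.String())
        }
        return fmt.Errorf("%s does not match the schema", confPath)
    }
    return nil
}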

2024-10-28

Regrets: Casey, Tomo

  • [Lionel/Antonio] CNI DRA Driver
    • How to validate values for CNI plugins.
    • e.g. a VALIDATE call to make sure a config is "valid" (a plugin-side sketch follows this list)
      • Where "valid" means not just structure (which we already check/validate)
      • but also that the values of the config are within the bounds the plugin will accept
      • to catch/minimize schedule-time ADD failures
    • CHECK is for containers that already are scheduled, and can't work for this
    • STATUS doesn't currently take a config at all.
    • We do not currently require plugins to define value ranges they accept, so ADD would just fail today.
    • There would likely still be some sort of practical gap between VALID = TRUE and ADD = FAIL, due to node/plugin runtime state?
    • Could also be used for scheduling feedback in K8S?
  • [Dylan/Isovalent] Delegated IPAM issue
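As a rough illustration of the VALIDATE idea above: a hypothetical plugin-side handler that parses the config from stdin and rejects values outside the bounds it would accept at ADD time. The verb, the config fields, and the bounds are all assumptions made for the sake of the sketch; nothing like this exists in the spec today.

package main

import (
    "encoding/json"
    "fmt"
    "io"
    "os"
)

// hypothetical config for a plugin that only accepts a bounded MTU
type validateConf struct {
    CNIVersion string `json:"cniVersion"`
    Name       string `json:"name"`
    Type       string `json:"type"`
    MTU        int    `json:"mtu,omitempty"`
}

// cmdValidate would be invoked for a hypothetical VALIDATE verb: it checks
// values (not just structure), so that bad configs fail before any container
// is scheduled rather than at ADD time.
func cmdValidate(stdin io.Reader) error {
    var conf validateConf
    if err := json.NewDecoder(stdin).Decode(&conf); err != nil {
        return fmt.Errorf("failed to parse config: %w", err)
    }
    if conf.MTU != 0 && (conf.MTU < 576 || conf.MTU > 9000) {
        return fmt.Errorf("mtu %d is outside the range this plugin accepts (576-9000)", conf.MTU)
    }
    return nil
}

func main() {
    if err := cmdValidate(os.Stdin); err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
}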

2024-10-21

  • [Lionel/Antonio] CNI DRA Driver
  • Casey: CNI and DRA look, from a high level, identical. Both are privileged hooks into Pod and PodSandbox lifecycle. The only big difference is composition (i.e. chaining).
    • The nice thing that DRA brings to the table is declarative in-cluster configuration (A Real K8S API)
    • The nice thing that CNI brings to the table is vendor-and-user-extensible networking hooks

2024-10-14

2024-10-07

2024-09-30

  • plugins release: which PRs should get in?

  • [Zappa] propose Lionel/Ben as CNI maintainers

    • heck yeah!
  • [tomo] CNI2.0 requirement brainstorming

    • Revisit CNI 1.x requirements and check whether each is still applicable - do people want to accomplish these tasks without explicit support in 2.0, and if so, how would they do so? Perhaps frame this as a back-compat problem.

      • Chaining
      • Cached state (for GC)
      • Return type w/ interfaces & addresses
      • init
      • de-init (e.g. events for bridge deletion)
      • capture interface events? (i.e. v6 SLAAC events)
      • dynamic changing attribute without ADD/DEL
        • Route
        • IP address
        • MTU
        • and so on
        • How
          • Adding API verb (CHANGE)?
          • ADD/DEL for attribute?
  • [lionel] Please continue to review/comment on containernetworking/plugins#1096

2024-09-23

2024-09-16

2024-09-09

  • [zappa] moved agenda to next week

2024-09-02

2024-08-26

  • [cdc] can't make it, on train
  • [zappa/lionel] CNI 2.0/DRA brainstorming
  • [tomo] Reviews...
    • Bridge CNI PR request.

2024-08-19

  • [danwinship] nftables containernetworking/plugins#935
    • cdc to review
    • minor questions about /sbin/nft vs. direct netlink APIs
  • [zappa/antonio] 2.0 proto design discussion
  • [zappa] CNI 2.0 paradigm shift to RPC
  • [cdc] Anything else stalled on review? assign it to @squeed as reviewer

2024-08-12

  • [cdc] poor bandwidth, back next week!
  • [zappa] CNI 2.0 paradigm shift to RPC (moved to next week)

2024-08-05

2024-07-29

  • skipped

2024-07-22

2024-07-15

  • [Tomo] cannot join, due to national holiday

2024-07-08

  • [tomo] Multi-gc containernetworking/cni#1091
    • [cdc]
      • what's the difference between current vs. new?
      • what about delegation?
  • [cdc] containernetworking/cni#1103
    • "cni.dev/attachments" vs. "cni.dev/valid-attachments"
    • Oops, we used different keys in SPEC vs libcni
    • Should we change SPEC or libcni?
    • Decision: valid-attachments is a clearer name in this case.

2024-07-01

2024-06-24

2024-06-17

2024-06-10

  • [Tomo] containernetworking/cni#1097
  • We merge some PRs for v1.1.1
    • We will cut v1.1.1 shortly, once PRs merge and deps are bumped
    • Also need wording for "Also mention that current GC is mainly for single CNI config and needs to design GC that supports multiple CNI config"
  • 1.1 runtime checkin
  • 1.2 wishes milestone
    • drop-ins
    • multi-network GC?
    • INIT
    • DEINIT
    • metadata for interface (and more, e.g. address?)
    • capability/runtimeConfig for deviceID?

2024-06-03

  • [cdc / lionel] oops, overriding JSONMarshal() breaks types that embed it -- containernetworking/plugins#1050 (review)
  • [jaime] cri-o GC call
  • [Tomo] GC Improvement discussion
    • Gist: should we add GC for 'CNI Configs', not 'a CNI config', for runtimes with multiple CNI configs (e.g. multus or containerd)
    • Note: Containerd supports multiple CNI Configs with the following config:
      [plugins."io.containerd.grpc.v1.cri".cni]
        max_conf_num = 2
      
      With the above config, containerd picks two CNI configs from the CNI directory.
    • Current risk:
      • Per the CNI spec, if a plugin identifies attachment uniqueness by "CONTAINER_ID" and "IFNAME", then the current GC (where a validAttachment is identified by "CONTAINER_ID", "IFNAME" AND "CNI network name") may remove valid attachments unexpectedly...
        • NetA: VA1, VA2
          • VB1, VB2 seems to be invalid from NetA
        • NetB: VB1, VB2
          • VA1, VA2 seems to be invalid from NetB
    • Thoughts:
      • keep current one for single CNI
      • Just add new API GCNetworkLists(ctx context.Context, net []*NetworkConfigList, args *GCArgs) error (see the sketch after this list)
        • get CNI Configs
        • gather each network's validAttachments
        • for each CNI plugin, call it
      • This API also optimizes GC call (i.e. less 'Stop the world')
      • Required for SPEC:
    • Alternate solution: Just recommend not to use GC in multiple CNI config environment
    • AI:
      • Add words to the SPEC to mention that GC cares about not only CONTAINER_ID and IF_NAME but also the "CNI network name" (i.e. 'name' in the CNI config)
      • Also mention that the current GC is mainly for a single CNI config, and that a GC supporting multiple CNI configs still needs to be designed
  • [mz] kubecon roll call!?
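A sketch of the proposed GCNetworkLists helper referenced above, assuming libcni's existing GCNetworkList/GCArgs API from the spec v1.1 work. The plural function does not exist yet; this is only illustrative of the proposal.

package main

import (
    "context"
    "fmt"

    "github.com/containernetworking/cni/libcni"
)

// GCNetworkLists is the proposed plural variant of libcni's GCNetworkList:
// the runtime gathers the valid attachments of *all* configured networks up
// front and passes the combined set to the GC of every network, so that a
// plugin which keys attachments only by (CONTAINER_ID, IFNAME) does not
// delete attachments that belong to another network.
func GCNetworkLists(ctx context.Context, cni *libcni.CNIConfig, lists []*libcni.NetworkConfigList, args *libcni.GCArgs) error {
    for _, list := range lists {
        if err := cni.GCNetworkList(ctx, list, args); err != nil {
            return fmt.Errorf("GC of network %q failed: %w", list.Name, err)
        }
    }
    return nil
}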

2024-05-27

2024-05-20

  • Casey on holiday, but we will have the call without him
  • Tagged plugins v1.5.0 (contains "CNI version output" fix)
  • Extend tuning plugin to also support ethtool configuration
    • [tomo] how about moving this to another CNI plugin (an 'ethtool' plugin?), because the tuning plugin now has a lot of features...
    • Could you please file an issue in GitHub!
  • Extend tuning plugin to make configuration also on the host side for veth
    • Could you please file an issue in GitHub!
  • containerd: PR is pending, stuck on failed CI
  • [Ben] CNI 1.2 - dropin (Reviewed+Approved - probably waiting on Casey to merge) containernetworking/cni#1052
  • SBR Table ID: https://www.cni.dev/plugins/current/meta/sbr/#future-enhancements-and-known-limitations
  • [miguel] how is GC supposed to be used?? Any docs w/ examples anywhere? (a usage sketch follows this list)
  • [Jaime] Q: multus GC status
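In answer to the GC usage question above, a minimal sketch of how a runtime could drive GC through libcni, assuming the GCNetworkList/GCArgs/GCAttachment API that accompanied spec v1.1. The paths and attachment data are placeholders.

package main

import (
    "context"
    "log"

    "github.com/containernetworking/cni/libcni"
    "github.com/containernetworking/cni/pkg/types"
)

func main() {
    ctx := context.Background()
    cni := libcni.NewCNIConfig([]string{"/opt/cni/bin"}, nil)

    // Load the network configuration the runtime is using (placeholder path).
    list, err := libcni.ConfListFromFile("/etc/cni/net.d/10-mynet.conflist")
    if err != nil {
        log.Fatal(err)
    }

    // The runtime tells libcni which attachments are still alive; everything
    // else cached for this network is considered stale and gets a DEL, and
    // the plugins are sent a GC with the same valid-attachment list.
    args := &libcni.GCArgs{
        ValidAttachments: []types.GCAttachment{
            {ContainerID: "abc123", IfName: "eth0"},
        },
    }
    if err := cni.GCNetworkList(ctx, list, args); err != nil {
        log.Fatal(err)
    }
}

Per the discussion above, a runtime could run this on startup and again on a timer.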

2024-05-13

  • Check in on CNI v1.1 runtime implementations
    • Multus: "primary" network GC, STATUS in progress, no big hurdles. secondary networks trickier (need discussion)
    • cri-o: oci-cni support has merged, Jaime working on cri-o GC.
      • Question: when to issue a GC? Answer: on startup at least, and on a timer if you like. It could also be fun to GC on CNI DEL failure. May need to be able to disable GC by explicit config

2024-05-06

2024-04-29

  • regrets: Tomo (national holiday). Pls ping me @K8s slack

2024-04-22

  • We chat about k8s and the DRA

    • Antonio working on adding netdevs to oci spec
    • [ben] thinking aloud, if netdev management was done here, would that mean that CNI plugins might become like FINALIZE variants (more or less), basically?
  • GC doubts: should we add a DisableGC option?

    • use-case: one network, multiple runtimes
    • [Tomo]+1
  • dropin (updated, PR review) containernetworking/cni#1052

  • ocicni STATUS and GC PRs: cri-o/ocicni#196, cri-o/ocicni#197

2024-04-15

  • Tagging v1.1: minor cleanup needed
  • v1.2 milestone:
  • FINALIZE discussion
    • use-cases:
      • ensuring routes
      • inserting iptables rules (e.g. service-mesh proxy)
      • ECMP (eaugh)
    • lifecycle:
      1. ADD (network 1)
      2. ADD (network 2)
      3. FINALIZE (network 1)
      4. FINALIZE (network 2)
      5. later... CHECK
    • Configuration source:
      • in-line from config
      • specific FINALIZE configuration?
        • maybe not needed.
        • cri-o / containerd could have a magic dropin directory?
      • What if a configuration only has FINALIZE plugins?
        • then we don't ADD, just FINALIZE
    • What is passed to the plugin(s)?
      • We could pass all results of all networks
        • Tomo: this is complicated (and plugin could get that by netlink), let's not
        • fair enough
      • CNI_IFNAME?
      • Standard prevResult?
    • What is returned?
      • Not allowed to produce result?
    • Philosophical question: is FINALIZE "network-level" or "container-level"
      • does it get IFNAME? PrevResult?
      • Homework: come up with more use-cases.
  • [minor] readme PR: containernetworking/cni#1081
  • [minor] licensing question containernetworking/plugins#1021

2024-04-08

PR

  • containernetworking/cni#1054 [approved]
  • Merge today! (or reject today), it is so big and hard to keep in PR list...: containernetworking/plugins#921 [merged]
  • containernetworking/cni#1052 [for 1.2]
    • Q: should this be a config or not? containernetworking/cni#1052 (comment)
    • A: Consensus seems to be leave in the flag, change the name, maybe add docs around how libcni implements the spec with file loading.
    • [dougbtv] working on NetworkPlumbing WG proposal to attach secondary networks more granularly
    • [ben] that sounds good to me, I want to keep this very tightly scoped to avoid main config file contention, more granular behaviors should be handled elsewhere, 100%

Discussion

  • Should 'ready' be both CNI network configuration and binaries present? Right now it's just the network config. [Zappa]
    • [tomo] agreed, but we need to consider UX (how to tell this to the user)
      • the k8s node object should have error messages
    • [ben] since we can't check much about the binary, this is necessarily a simplistic check that a file exists at the binary path - it could still fail to execute, or be partially copied, etc
    • [cdc] I wrote the .Validate() libcni function some time ago for this, use it :-) (see the sketch after this list)
  • CNI X.Y [Zappa/Casey]
  • Finalize/Init Verb [Zappa]
  • Loopback fun [Zappa]
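A rough sketch of the libcni check cdc mentions above, assuming libcni's ValidateNetworkList API (which checks that the plugins referenced by a conflist can be found on disk and report a compatible version). The config path is a placeholder.

package main

import (
    "context"
    "log"

    "github.com/containernetworking/cni/libcni"
)

func main() {
    ctx := context.Background()
    cni := libcni.NewCNIConfig([]string{"/opt/cni/bin"}, nil)

    // Placeholder config path; a runtime would iterate over its config dir.
    list, err := libcni.ConfListFromFile("/etc/cni/net.d/10-mynet.conflist")
    if err != nil {
        log.Fatal(err)
    }

    // ValidateNetworkList execs each plugin's VERSION command, so "ready"
    // here means "config parses, binaries exist and speak a compatible
    // version", not "the plugin will definitely succeed at ADD".
    if _, err := cni.ValidateNetworkList(ctx, list); err != nil {
        log.Fatalf("network %q is not ready: %v", list.Name, err)
    }
}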

2024-04-01

No CNI calls today due to Easter holiday, dismiss!

  • Headsup:
    • [squeed] Last call for tagging libcni v1.1.0; let's conclude at next week's meeting, 4/8!!

2024-03-25

KubeCon update:

CNI v1.2 ideas:

  • drop-in directories
  • Interface metadata
    • TODO: file issue
  • FINALIZE verb
    • Problem: we have no inter-container lifecycle guarantees
    • use-cases:
      • Ensuring route table is in a defined state
      • inserting iptables rules for proxy sidecar (e.g. Envoy) chaining
    • Biggest problem: how to configure?
      • /etc/cni/net.d/_finalize.d/00_istio.{fconf,conf}
      • Do we use a standardized directory that applies to all plugins?
      • Do we have finalizers per-network, or finalizers after all networks?
        • Ben: Do we need both, or could we get away with just global finalizers?
        • Casey, others: We might (for some use cases) actually need per-network.
        • Which one is less footgunny? Would running finalizers per-network "hide" global state that might make finalizers more likely to break things?
      • Multus is also trying to add a finalizer pattern, for multiple CNIs - consider how this works as well.
/etc/cni/net.d/00-foobar.conf
{
  // usual CNI
  plugins: [],
  finalize_plugins: [{"type":"force_route"}] // type a: in CNI config
}

// type b: drop-in directory
// should we change '_finalize.d' as configurable?

/etc/cni/net.d/_finalize.d/00-foobar.conf
{
  // one finalizer
}

/etc/cni/net.d/_finalize.d/99-istio.conf
{
  // put istio, that wants to be final!
}

/etc/cni/net.d/_finalize.d/999-barfoo.conf
{
  // oh, sorry, it is the actual final guy!
}

2024-03-11

Work on outline for Kubecon project update.

CNI: update, what's next

  • CNI basic overview
    • CNI is an execution protocol, run by the runtime
  • CNI 1.0
    • '.conf' file (i.e. single plugin conf) is removed!
    • addresses in the result no longer have a 'version' field
  • new version CNI 1.1 Update!
  • new verbs
    • STATUS: replaces "write or delete CNI config file" to signify plugin readiness
    • GC: clean up leftover resources
  • new fields
    • route MTU, table, scope
    • interface MTU, PCIID, socket path
  • cniVersions array
    • Configuration files can now support multiple supported versions
  • Will be released shortly
    • implementing v1.1 in plugins now
    • cri-o, containerd also in-progress
    • Community question: Should the CRI more closely reflect the CNI?
      • a.k.a. how can I use these shiny new fields?
      • our opinion: heck yeah! Let's make the CRI more fun
      • Should we expand v1/PodStatus? MultiNetwork WG is proposing this, 👍
  • device capability (not 1.2, but whatever)
    • Standardized way for devices to be passed down from kubelet -> runtime -> plugin (e.g. SR-IOV)
    • Still no way for plugins to say they need a certain device
      • Looking for ways to tie config back in to the scheduler
      • complicated! help wanted!

what's next (for v1.2)

  • drop-in directories (definite)
    • no more manipulating CNI configuration files
    • Istio community contribution
  • Interface metadata, (likely)
    • We prefer to have these as fields
      • We generally have a low threshold for adding a field
    • But some things are just too weird even for us :p
  • FINALIZE (maybe??)
    • some kind of post-ADD "finalize" plugin?
    • called after every ADD of every interface
    • possible use case: resolve route conflicts
  • INIT (stretch)
    • Called before first ADD
    • Not container-specific
    • Really sorry about this one, Multus
    • Nasty lifecycle leaks
  • Dynamic reconfiguration (vague idea)
    • Spec says ADD is non-idempotent
    • But there's no reason this has to be the case
    • Do you want this? Get involved!!

So, what about KNI?

  • KNI is, and is not, a replacement for CNI
    • e.g. KNI proposed to be responsible for isolation domain management
  • KNI extensibility is still a work in progress
  • If KNI merges, the default impl will be containerd / cri-o wrapping CNI

Calls to action:

  • This is a dynamic area of k8s right now, lots of things are being proposed
  • CNI fits in a complicated ecosystem (k8s, CNI, CRI, runtimes)
  • There is a lot of room for improvement, but it reaches across a lot of concerns
  • We are all busy, we can't be in all projects at all times
  • Reach out! Let's make features people use!

2024-03-04

  • PR Review:

  • Discussions:

    • CNI cached directory questions
      • Why is the cached directory not on volatile storage?
        • because we try and pass the same values to DEL as to ADD, even after a reboot
      • But we sometimes fail to delete because of invalid cache :-)
        • We should handle this case gracefully, same as a missing cache file
      • Casey wonders: How do we handle this for GC?????
    • [Zappa] go-cni PR for Status [draft]
    • [Zappa] go-cni PR for additional fields [draft]
    • need same work for CRI-O, where are you, Jaime?
    • [cdc] Device conventions: containernetworking/cni#1070
    • [cdc] working on CNI v1.1 for plugins, slowly
    • Kubecon Talk: What did we do????????
      • CNI v1.0
        • no .conf files
        • CHECK
      • CNI v1.1 -- lots of new features
        • GC, STATUS
          • more types
      • KNI?
      • What do people actually want? What verbs should come next?
        • FINALIZE?
      • Hey multi-networking, please figure out how to configure CNI via k8s API plz thanks

2024-02-26

2024-02-19

2024-02-12

2024-02-05

  • PR's

    • Support loading plugins from subdirectories: containernetworking/cni#1052
      • Comments addressed, this now adds a new opt-in config flag rather than forcing more drastic changes to the config spec.
      • PTAL, need Casey/Dan to do a final pass
  • discuss:

    • SocketPath/DeviceID aka metadata (need Casey/Dan)
    • KNI? (need Casey/Dan)
    • CNI 2.0 / Multi-Network / KNI / DRA / NRI / ? meetup at KubeCon?
      • Plan/strategy? KNI will probably need a plugin strategy, will probably support CNI plugins, but going forward could support a better/smarter plugin interface.
    • containernetworking/cni#1061
    • containernetworking/cni#598

2024-01-29

  • PR's
  • (if time, otherwise defer) Conf vs Conflist in libcni [bl]
    • only conflist is supported by current spec, and that has been true for some time.
    • looking for historical context on current state
    • should we mark conf as deprecated, and remove on major bump? Given the above, that seems reasonable.
    • Decision: Even though the pre-1.0.0 format is deprecated, we cannot remove it yet.
  • Metadata
    • Socket Path
    • pciid
  • Convention for results from device plugins [cdc, others]
    • Should we make it easy for containerd / crio to pass devices to plugins?
    • Prefer a phased approach
      • phase1: Just formalize in cni repo and change multus/sr-iov
      • phase2: Integrate cni runtime as well as container runtime
        • CDI
  • Add MTU to the interface results in the CNI
  • annoying that spec v1.0 is library v1.1
    • Do we split spec repo and library?
    • What if we skip v1.1 and move to v1.2?
  • KNI

2024-01-22

  • PR's
  • (if time, otherwise defer) Conf vs Conflist in libcni [bl]
    • only conflist is supported by current spec, and that has been true for some time.
    • looking for historical context on current state
    • should we mark conf as deprecated, and remove on major bump? Given the above, that seems reasonable.
  • Metadata
    • Socket Path
    • pciid

2024-01-15

  • US Holiday, cdc out too

2024-01-08

  • Welcome to the New Year!
  • PR:
    • containernetworking/plugins#844
      • from the last comment: It's unfortunate that this has been pending so long, it seems like the maintainers are ignoring this or don't find value in this PR 😭
      • May need a decision (to include / not include?)
    • containernetworking/plugins#921 (local bandwidth)
      • Tomo and Mike approved
      • We will ship this in the next release.
  • CNI 1.1
    • tag -rc1?
    • implement in plugins
    • implement in runtimes (go-cni cri-o)
  • What belongs in CNI v1.2:
  • Metadata proposal
    • conclusion: this seems worthy, let's explore it
    • come up with some use cases, draft a SPEC change
  • multi-ADD / idempotent ADD / reconfigure
    • wellllll, k8s doesn't have network configuration, so how can we reconfigure what we don't have?
    • Pete to write up proposal?
  • [bl] QQ: Config versioning
    • Do we distinguish between config file and config (in-mem) schema versions, or are they always 1:1 in the spec?

2023-12-18

2023-12-11

2023-12-04

2023-11-27

  • CNI 2.0 Note Comparison [Zappa]
    • KNI Design proposal
    • Mike Z is going to do further work on gRPC in the container runtime to see if it fits conceptually once it's in there.
    • Key points:
      • Something like this exists in some private forks
  • Extra metadata on CNI result [Zappa]
  • KubeCon EU Presentation? (skipped / next time) (From Tomo)
    • Tomo is out but Doug said he'd bring this up.
    • usual (not community one) CFP deadline is Nov 26.
    • CNI1.1 talk?
    • No one has the dates for the maintainer track CFP due date, but Casey's going to ask around about it.
  • CNI 2.0 requirements discussion (We should still discuss 1.1)
  • [mz] Extra metadata on CNI result

2023-11-20

2023-11-13

2023-11-06

  • Do some PR reviews (it's kubecon week)

2023-10-30

2023-10-23

  • [Tomo] (TODO)Writing DNS Doc...
    • Discovery: DNS type is not currently called out as optional, it should be
  • [Ed] Request from kubevirt community
    1. containernetworking/plugins#951 (activateInterface option for bridge CNI plugin)
    2. support non-interface-specific sysctl params in tuning (as a stand-alone, not a meta plugin)
    • note: tuning currently allows anything in /proc/sys/net
  • [Casey] Status PRs are ready for review

2023-10-16

NOTE: jitsi died, https://meet.google.com/hpm-ifun-ujm

  • [PeterW] Multi-network update
    • KEP-EP heading towards Alpha
  • GC is merged!

2023-10-09

  • We get distracted talking about multi-network and DNS responses
    • AI: Tomo will create doc (problem statement)
  • [Tomo] containernetworking/plugins#951
    • should we support that?
    • config name / semantic both are weird...
    • Todo: Ask them about that in next
  • [Tomo] Request from kubevirt community
    • support non-interface-specific sysctl params in tuning (as a stand-alone, not a meta plugin)
    • Todo: Ask them about that in next
    • note: tuning currently allows anything in /proc/sys/net
  • [cdc] re-review GC (containernetworking/cni#1022)

2023-10-02

2023-09-25

  • Attendance: Doug, Michael Cambria, Antonio, Dan Williams, Dan Winship
  • Tomo: maintainer
    • If not discussed this meeting Tomo will open a PR for github discussion
    • resolved: files PR to add to maintainer list
    • containernetworking/cni#1024
  • Doug: High level question, what do you all think about K8s native multi-networking?
    • Giving a talk with Maciej Skrocki (Google) at Kubecon NA on K8s native multi-networking
    • I want to address "what's the position from a CNI viewpoint?"
    • My point is: we kind of "ignore CNI as an implementation detail"
      • But! ...It's an important ecosystem.
    • Multus is kind of a "kubernetes enabled CNI runtime" -- or at least, users treat it that way
      • Should it continue to function in that role?
      • Should CNI evolve to have the "kubernetes enabled" functionality?
      • What do you all think?
    • CNI has always supported multinetworking (especially: rkt)
      • And K8s has taken almost 10 years!
      • Mike brings up that it's really the runtime that insisted on doing only one interface.
      • Doug asks whether the runtime should be enabled with this functionality
        • What about dynamic reconfiguration
      • Mike Z mentions he's working on the CRI side, executing multiple cni
        • Pod sandbox status, to relay multiple IPs back, and the network names
        • Node Resource Interface (NRI) doesn't have a network domain
        • Network domain hooks pre-and-post
        • This is happening outside of the k8s space.
        • Use cases outside of Kubernetes, as well.
        • Custom schedulers for BGP, OSPF, etc.
      • Mike C brings up consideration of scheduling a pod with knowledge of which networks will be available.
    • Re: STATUS
      • Programmatically distributing network configuration, and
        • that problem appears in single-network setups as well, and relates to STATUS
    • Antonio brings up: what percentage of the community benefits from multiple interfaces?
    • CNI has been surprisingly static in the face of other changes (e.g. kubenet -> [...])
  • Doug: Also any updates in Kubecon NA maintainer's summit?
    • no maintainers going :-(
  • Back to STATUS?
    • Sticking point: how do you know whether or not to rely on STATUS -- as a plugin that automatically writes a configuration file
    • Idea: version negotiation when cniVersion is empty
      • This works if CRI-O / containerd ignore conflist files w/o a cniVersion
      • Casey to experiment
    • Still doesn't solve the problem of plugins knowing which value to use
  • How do we know that a node supports v1.1 (and thus uses STATUS)?
    • sweet, the ContainerD / CRI-O version is exported in the Node object
    • Ugly, but heuristics will work
  • draft GC PR: containernetworking/cni#1022
    • of interest: deprecate PluginMain(add, del, check) b/c signature changes stink

2023-09-18

  • Attendance: Tomo, Antonio, Henry Wang, Dan Williams, Dan Winship
  • Network Ready
    • Now: have a CNI config file on-disk
    • Container Runtimes: containerd and crio reply NetworkReady through CRI based on the existence of that file
    • When the CNI plugin can't add interfaces to new Pods, we want the node to be no-schedule. TODO(aojea): if condition Network notReady => tainted
      • Only way to currently indicate this is the CNI config file on-disk
      • Can't really remove the config file to indicate readiness (though libcni does cache config for DEL)
    • One option: enforce STATUS in plugin by always writing out CNI config with CNIVersion that includes status
      • Runtimes that don't know STATUS won't parse your config and will ignore your plugin
      • Downside: you have to know the runtime supports STATUS
      • Downside: in OpenShift upgrades, old CRIO runs with new plugin until node reboot, this would break that. You'd have to have a window where the runtime supported STATUS but your plugin didn't use it yet. Then 2 OpenShift releases later you can flip to requiring STATUS.

2023-09-11

2023-09-04 (Labour day in US/Canada)

  • Attendance: Casey, Peter, Tomo
  • Question for the US people: KubeCon maintainer's summit?
    • Definite topic for next week
  • Tomo: maintainer
    • will discuss next week

2023-08-28

  • Attendance: Antonio, MikeZ, Tomo
  • Discussion about NRI/multi-network design

2023-08-21

  • Attendance: Antonio, Peter, Dan Winship, Tomo
  • Mike Zappa is likely to write down a KEP for kubernetes CRI/CNI/NRI to handle cases like multi-network
  • Current multinetwork approach for the KEP is focusing on API phases
  • Follow STATUS PR: containernetworking/cni#1003

2023-08-14

2023-08-07

  • Reviving version negotiation
    • Casey's proposal: a configlist without a version uses VERSION to pick the highest one (a sketch of the selection follows this list)
    • This means that administrators don't have to pick a version, which otherwise requires understanding too many disparate components
  • Can we rely on IPs never being reused?
    • Nope, you have to use the ContainerID
    • No good way around it
  • CNCF graduation?
    • MikeZ reached out about security audit, need to add the "best practices" badge
  • We merge the GC spec
    • woohoo!
  • Casey looks at PR https://github.com/containernetworking/plugins/pull/936/files and is a bit surprised at how many bridge VLAN settings there are
    • Could we get some holistic documentation of these options?
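To illustrate the version-negotiation proposal above, a self-contained sketch of the selection logic: pick the highest spec version supported by both the runtime and the plugin's VERSION output. This is not something libcni does automatically today.

package main

import (
    "fmt"
    "strconv"
    "strings"
)

// parse turns "major.minor[.patch]" into comparable integers.
func parse(v string) (out [3]int) {
    for i, p := range strings.SplitN(v, ".", 3) {
        n, _ := strconv.Atoi(p)
        out[i] = n
    }
    return out
}

func less(a, b [3]int) bool {
    for i := range a {
        if a[i] != b[i] {
            return a[i] < b[i]
        }
    }
    return false
}

// negotiate picks the highest spec version supported by both sides, or
// errors if there is no overlap.
func negotiate(runtimeVersions, pluginVersions []string) (string, error) {
    best := ""
    for _, rv := range runtimeVersions {
        for _, pv := range pluginVersions {
            if rv == pv && (best == "" || less(parse(best), parse(rv))) {
                best = rv
            }
        }
    }
    if best == "" {
        return "", fmt.Errorf("no common CNI version between runtime %v and plugin %v", runtimeVersions, pluginVersions)
    }
    return best, nil
}

func main() {
    // Example: versions the runtime knows vs. the plugin's VERSION output.
    v, err := negotiate(
        []string{"0.4.0", "1.0.0", "1.1.0"},
        []string{"0.3.1", "0.4.0", "1.0.0"},
    )
    if err != nil {
        panic(err)
    }
    fmt.Println("negotiated cniVersion:", v) // prints 1.0.0
}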

2023-07-31

  • Discussion w.r.t.: containernetworking/cni#927
    • Background: how do we tell what version of config to install?
    • We talk about adding more information to the VERSION command; could do things like discovering capabilities
    • Dream about executing containers instead of binaries on disk
      • (wow, it's like a shitty PodSpec! But still very interesting)
    • Remove as much as possible from CNI configuration, make it easy for administrators
    • Hope that multi-networking will make it easier for admins to push out network changes
    • How does one feel about version autonegotiation
      • let's do it
  • CNCF graduated project?
  • nftables! (FYI) containernetworking/plugins#935
    • We talk about whether it is safe to rely on IP addresses being cleaned up between DEL and ADD
    • libcni always deletes chained plugins last-to-first to avoid this very issue... except not quite
    • Thus, it is potentially safe to delete map entries solely by IP address
  • Is it safe to rely on IP addresses always being cleaned up?

2023-07-24

  • Multinetwork report from Pete White
    • MikeZ is meditating on how it fits in with the CRI
  • STATUS verb (PR 1003) (Issue 859)
  • The problem: plugins don't know whether they should use legacy (write file when ready) behavior versus rely on STATUS
  • Potential solutions:
    • CRI signals whether or not it supports STATUS via config file or something (discussed in issue 927)
      • Biggest blocker for a feature file is downgrades
    • Add an additional directory, "cniv1.1", that is only read by cni clients
    • Plugins write a file that is invalid for v1.0, but valid for v1.1, when status is failing
    • Switch to a new directory entirely
    • New filename suffix (.conflistv1.1)
      • Not a terrible idea
  • reviewers wanted: containernetworking/plugins#921
  • Review of PRs, looking in pretty good shape

2023-07-17

  • Regrets: Tomo

2023-07-03

2023-06-26

  • Let's review some PRs
  • multi-network chit chat

2023-06-19

  • Continuing STATUS editing
  • Tomo asks about version divergence between a plugin and its delegate. We talk about version negotiation.
  • Circle back for CNI+CRI
  • Update: we file containernetworking/cni#1003

2023-06-12

Status

Questions:

  1. What do we return? Just non-zero exit code? OR JSON type?
    • We should return a list of conditions
    • Conditions: (please better names please)
      • AddReady
      • RoutingReady (do we need this)?
      • ContainerReady
      • NetworkReady
  2. Should we return 0 or non-zero?
    • after a lot of discussion, we come back to returning nothing on success and just an error otherwise

Draft spec:

STATUS: Check plugin status

STATUS is a way for a runtime to determine the readiness of a network plugin.

A plugin must exit with a zero (success) return code if the plugin is ready to service ADD requests. If the plugin is not able to service ADD requests, it must exit with a non-zero return code and output an error on standard out (see below).

The following error codes are defined in the context of STATUS:

  • 50: The plugin is not available (i.e. cannot service ADD requests)
  • 51: The plugin is not available, and existing containers in the network may have limited connectivity.

Plugin considerations:

  • STATUS is purely informational. A plugin MUST NOT rely on STATUS being called.
  • Plugins should always expect other CNI operations (like ADD, DEL, etc.) even if STATUS returns an error. STATUS does not prevent other runtime requests.
  • If a plugin relies on a delegated plugin (e.g. IPAM) to service ADD requests, it must also execute a STATUS request to that plugin. If the delegated plugin returns an error result, the executing plugin should return an error result. (A minimal plugin-side sketch follows the references below.)

Input:

The runtime will provide a json-serialized plugin configuration object (defined below) on standard in.

Optional environment parameters:

  • CNI_PATH

References:

CRI API https://github.com/kubernetes/cri-api/blob/e5515a56d18bcd51b266ad9e3a7c40c7371d3a6f/pkg/apis/runtime/v1/api.proto#L1480C1-L1502

message RuntimeCondition {
    // Type of runtime condition.
    string type = 1;
    // Status of the condition, one of true/false. Default: false.
    bool status = 2;
    // Brief CamelCase string containing reason for the condition's last transition.
    string reason = 3;
    // Human-readable message indicating details about last transition.
    string message = 4;
}
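As an illustration only: a minimal sketch of a plugin-side STATUS handler under the draft above, assuming the CNIFuncs/PluginMainFuncs helpers in the skel package that accompanied the spec v1.1 work. The readiness check itself is a placeholder.

package main

import (
    "github.com/containernetworking/cni/pkg/skel"
    "github.com/containernetworking/cni/pkg/types"
    "github.com/containernetworking/cni/pkg/version"
)

// cmdStatus exits 0 when the plugin could service an ADD right now, and
// otherwise returns error code 50 ("plugin is not available") from the
// draft above. ipamReady stands in for whatever state the plugin actually
// depends on (daemon socket, IPAM pool, delegated plugin STATUS, ...).
func cmdStatus(_ *skel.CmdArgs) error {
    if !ipamReady() {
        return types.NewError(50, "plugin is not available", "IPAM pool is not configured yet")
    }
    return nil
}

func ipamReady() bool {
    // placeholder readiness check
    return true
}

func main() {
    skel.PluginMainFuncs(skel.CNIFuncs{
        Add:    cmdAdd,
        Del:    cmdDel,
        Check:  cmdCheck,
        Status: cmdStatus,
    }, version.All, "example STATUS-aware plugin v0.0.1")
}

// cmdAdd/cmdDel/cmdCheck are stubbed for brevity; they share the signature.
func cmdAdd(_ *skel.CmdArgs) error   { return nil }
func cmdDel(_ *skel.CmdArgs) error   { return nil }
func cmdCheck(_ *skel.CmdArgs) error { return nil }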

2023-06-05

  • PTAL: containernetworking/cni.dev#119
    • cdc observes we're due for some website maintenance
  • [aojea] - more on CNI status checks - dryRun option?
  • We would really like the STATUS verb
    • It would solve an annoying user situation
    • Let's do it.
    • Next week we'll sit down and hammer out the spec.
    • containernetworking/cni#859
      • strawman approach: kubelet (networkReady) -- CRI --> container_runtime -- (exec) --> CNI STATUS
      • runtimes should use the config's version to decide whether to use the new verb
  • See if GC spec needs any changes: containernetworking/cni#981
    • We need wording for parallelization:
    • The container runtime must not invoke parallel operations for the same container, but is allowed to invoke parallel operations for different containers. This includes across multiple attachments.
    • Exception: The runtime must exclusively execute either gc or add and delete. The runtime must ensure that no add or delete operations are in progress before executing gc, and must wait for gc to complete before issuing new add or delete commands.

2023-05-22

2023-05-15

  • [cdc] we cut a release! yay!
  • [cdc] Chatting with Multus implementers about version negotiation
    • Should we just do this automatically? We already have the VERSION command...
    • Everyone is uncomfortable with "magic" happening without someone asking for it
    • Original proposal was for cniVersions array.
    • What if we added "auto" as a possible cni version?
    • Concern: how do we expose what version we decided to use?
    • Or what if we just autonegotiate all the time
      • YOU GET A NEW VERSION! YOU GET A NEW VERSION!

2023-05-08

Agenda:

  • [tomo] What should GC(de-INIT) do if INIT fails
  • What parallel operations should be allowed?
    • Can you GC and ADD / DEL at the same time?
    • No way; the runtime has to "stop-the-world"
    • This makes sense, "network-wide" operations can be thought of as touching "every" attachment, and we don't allow parallel operations on an attachment.
  • [cdc] Sorry, I owe a release
  • [aojea] Evolve CNI to support new demands containernetworking/cni#785 (comment)
    • We talk about the difference between the "configuration" API vs. the plugin API
    • Everyone seems to settle on CNI via the CRI
    • Casey drafted a version of this: https://hackmd.io/@squeed/cri-cni

2023-05-01

  • Labor day

2023-04-24

2023-04-17

  • KubeCon week, kind of quiet.
  • PR 873 is merged.
    • Let's try and do a release in the next few weeks.
  • containernetworking/cni#981 has feedback, let's address it
  • We button up some of the wording for GC
  • Review some PRs. Merge some of them.

2023-04-10

Agenda:

2023-04-03

Agenda:

2023-03-27

Agenda:

  • INIT/DE-INIT discussion
    • Should INIT/DEINIT be per-network, or per-plugin?
      • Probably per-network... but resources shared across networks? (see DEINIT discussion)
    • Serialization
      • Should the runtime be required to serialize calls to a plugin?
      • E.g., can the runtime invoke INIT for two networks on the same plugin simultaneously, or not?
    • Tomo asked about ordering guarantees; we shouldn't have double-INIT or double-DEINIT
    • If the config changes, do we DEINIT and then INIT with the new config? That could be very problematic.
      • Do we need an UPDATE for config change case?
      • What if the chain itself changes, plugins added/removed?
    • What if a plugin in the chain fails INIT?
      • What is the failure behavior if INIT fails?
      • When does the runtime retry INIT?
    • What if DEINIT fails?
      • Should GC be called?
    • Timing is pretty vague; when should DEINIT be called?
      • when the network config disappears (deleted from disk, removed from Kube API, etc)
      • when config disappears and all containers using the network are gone?
      • How should plugins handle deleting resources common to all networks? (eg plugin iptable chain)
        • Should we require that networks use unique resources to prevent this issue?
        • And/or punt to plugins that they just have to track/handle this kind of thing
    • "How do I uninstall a CNI plugin?"
      • CNI spec doesn't talk about any of this
      • (partly because we let the runtime decide where config is actually stored, even though libcni implements one method for doing this -- on-disk)
    • When config gets deleted, how do we invoke DEINIT with the now-deleted config?
      • Use cached config?
      • libcni would need to keep cached config after DEL; currently it doesn't
      • Keep a new kind of "network"-level config for this?
  • PR review

2023-03-20

Agenda:

  • Brief discussion about some sort of SYNC
    • usual chained plugin issue
  • Let's try and write the spec for GC.
  • We do! containernetworking/cni#981

2023-03-13

Agenda:

round of introductions

  • STATUS for v1.1?
  • Multi-network? Doug not present
  • Network Plugin for containerd(Henry)
    • initial problem: trying to solve leaking resources (sometimes cleanup fails)
    • led to GC proposal, as well as GC() method on libcni
    • containerd/containerd#7947
  • Does it make sense for some kind of idempotent Sync()
    • Challenges:
      • hard to make fast / high overhead
      • chained plugins make this difficult, might have flapping interfaces
      • pushes a lot of overhead on the plugins
    • Does INIT solve this? Not really; runtime might not call INIT when it's needed
  • What do we do on failed CHECK?
    • Should we allow for ADD after failed CHECK
    • Chained plugins make this difficult, but we could change the spec
  • (tomo) Consider a bridge - when should we delete it?
    • even though no container interface in bridge, user may add some physical interface to the pod
    • the bridge plugin does not have a lock mechanism for multiple containers
    • we considered a DEINIT verb, but it didn't seem useful
  • Let's do some reviews. Oops, we run out of time
  • Should we formalize "how to interact with libcni"?
    • What are the expectations for how configuration files are dropped in? (e.g. permission error)

2023-03-06

CNI v1.1 roadmap: https://github.com/containernetworking/cni/milestone/9

Agenda:

Tomo asks for clarification about GC and INIT

INIT: runtime calls INIT on a configuration but without a container. It means "please prepare to receive CNI ADDs". For example, a plugin could create a bridge interface.

GC: two aspects to this discussion. Most of the GC logic would actually be in libcni, which already maintains a cache of configuration and attachments. The runtime would pass a list of still-valid attachments. Libcni could synthesize DEL for any "stale" attachments.

Separately, there could be a spec verb, GC, that would tell plugins to delete any stale resources

We do some reviews.

Next week: Woah, there are a lot of PRs to review. Oof.