diff --git a/keps/sig-network/3866-nftables-proxy/README.md b/keps/sig-network/3866-nftables-proxy/README.md new file mode 100644 index 000000000000..3c48975af7ea --- /dev/null +++ b/keps/sig-network/3866-nftables-proxy/README.md @@ -0,0 +1,1342 @@ +# KEP-3866: An nftables-based kube-proxy backend + + +- [Release Signoff Checklist](#release-signoff-checklist) +- [Summary](#summary) +- [Motivation](#motivation) + - [The iptables kernel subsystem has unfixable performance problems](#the-iptables-kernel-subsystem-has-unfixable-performance-problems) + - [Upstream development has moved on from iptables to nftables](#upstream-development-has-moved-on-from-iptables-to-nftables) + - [The ipvs mode of kube-proxy will not save us](#the--mode-of-kube-proxy-will-not-save-us) + - [The nf_tables mode of /sbin/iptables will not save us](#the--mode-of--will-not-save-us) + - [The iptables mode of kube-proxy has grown crufty](#the--mode-of-kube-proxy-has-grown-crufty) + - [We will hopefully be able to trade 2 supported backends for 1](#we-will-hopefully-be-able-to-trade-2-supported-backends-for-1) + - [Writing a new kube-proxy mode may help with our "KPNG" goals](#writing-a-new-kube-proxy-mode-may-help-with-our-kpng-goals) + - [Goals](#goals) + - [Non-Goals](#non-goals) +- [Proposal](#proposal) + - [Notes/Constraints/Caveats](#notesconstraintscaveats) + - [Risks and Mitigations](#risks-and-mitigations) +- [Design Details](#design-details) + - [High level](#high-level) + - [Low level](#low-level) + - [Tables](#tables) + - [Communicating with the kernel nftables subsystem](#communicating-with-the-kernel-nftables-subsystem) + - [Versioning and compatibility](#versioning-and-compatibility) + - [NAT rules](#nat-rules) + - [General Service dispatch](#general-service-dispatch) + - [Masquerading](#masquerading) + - [Session affinity](#session-affinity) + - [Filter rules](#filter-rules) + - [Dropping or rejecting packets for services with no endpoints](#dropping-or-rejecting-packets-for-services-with-no-endpoints) + - [Dropping traffic rejected by LoadBalancerSourceRanges](#dropping-traffic-rejected-by-) + - [Forcing traffic on HealthCheckNodePorts to be accepted](#forcing-traffic-on--to-be-accepted) + - [Future improvements](#future-improvements) + - [Test Plan](#test-plan) + - [Prerequisite testing updates](#prerequisite-testing-updates) + - [Unit tests](#unit-tests) + - [Integration tests](#integration-tests) + - [e2e tests](#e2e-tests) + - [Graduation Criteria](#graduation-criteria) + - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) + - [Version Skew Strategy](#version-skew-strategy) +- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) + - [Feature Enablement and Rollback](#feature-enablement-and-rollback) + - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) + - [Monitoring Requirements](#monitoring-requirements) + - [Dependencies](#dependencies) + - [Scalability](#scalability) + - [Troubleshooting](#troubleshooting) +- [Implementation History](#implementation-history) +- [Drawbacks](#drawbacks) +- [Alternatives](#alternatives) + - [Continue to improve the iptables mode](#continue-to-improve-the--mode) + - [Fix up the ipvs mode](#fix-up-the--mode) + - [Use an existing nftables-based kube-proxy implementation](#use-an-existing-nftables-based-kube-proxy-implementation) + - [Create an eBPF-based proxy implementation](#create-an-ebpf-based-proxy-implementation) + + +## Release Signoff Checklist + +Items marked with (R) are required 
*prior to targeting to a milestone / release*. + +- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) +- [ ] (R) KEP approvers have approved the KEP status as `implementable` +- [ ] (R) Design details are appropriately documented +- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors) + - [ ] e2e Tests for all Beta API Operations (endpoints) + - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) + - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free +- [ ] (R) Graduation criteria is in place + - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) +- [ ] (R) Production readiness review completed +- [ ] (R) Production readiness review approved +- [ ] "Implementation History" section is up-to-date for milestone +- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] +- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes + +[kubernetes.io]: https://kubernetes.io/ +[kubernetes/enhancements]: https://git.k8s.io/enhancements +[kubernetes/kubernetes]: https://git.k8s.io/kubernetes +[kubernetes/website]: https://git.k8s.io/website + +## Summary + +The default kube-proxy implementation on Linux is currently based on +iptables. IPTables was the preferred packet filtering and processing +system in the Linux kernel for many years (starting with the 2.4 +kernel in 2001). However, problems with iptables led to the +development of a successor, nftables, first made available in the 3.13 +kernel in 2014, and growing increasingly featureful and usable as a +replacement for iptables since then. Development on iptables has +mostly stopped, with new features and performance improvements +primarily going into nftables instead. + +This KEP proposes the creation of a new official/supported nftables +backend for kube-proxy. While it is hoped that this backend will +eventually replace both the `iptables` and `ipvs` backends and become +the default kube-proxy mode on Linux, that replacement/deprecation +would be handled in a separate future KEP. + +## Motivation + +There are currently two officially supported kube-proxy backends for +Linux: `iptables` and `ipvs`. (The original `userspace` backend was +deprecated several releases ago and removed from the tree in 1.25.) + +The `iptables` mode of kube-proxy is currently the default, and it is +generally considered "good enough" for most use cases. Nonetheless, +there are good arguments for replacing it with a new `nftables` mode. + +### The iptables kernel subsystem has unfixable performance problems + +Although much work has been done to improve the performance of the +kube-proxy `iptables` backend, there are fundamental +performance-related problems with the implementation of iptables in +the kernel, both on the "control plane" side and on the "data plane" +side: + + - The control plane is problematic because the iptables API does not + support making incremental changes to the ruleset. 
If you want to + add a single iptables rule, the iptables binary must acquire a lock, + download the entire ruleset from the kernel, find the appropriate + place in the ruleset to add the new rule, add it, re-upload the + entire ruleset to the kernel, and release the lock. This becomes + slower and slower as the ruleset increases in size (ie, as the + number of Kubernetes Services grows). If you want to replace a large + number of rules (as kube-proxy does frequently), then simply the + time that it takes `/sbin/iptables-restore` to parse all of the + rules becomes substantial. + + - The data plane is problematic because (for the most part), the + number of iptables rules used to implement a set of Kubernetes + Services is directly proportional to the number of Services. And + every packet going through the system then needs to pass through + all of these rules, slowing down the traffic. + +IPTables is the bottleneck in kube-proxy performance, and it always +will be until we stop using it. + +### Upstream development has moved on from iptables to nftables + +In large part due to its unfixable problems, development on iptables +in the kernel has slowed down and mostly stopped. New features are not +being added to iptables, because nftables is supposed to do everything +iptables does, but better. + +Although there is no plan to remove iptables from the upstream kernel, +that does not guarantee that iptables will remain supported by +_distributions_ forever. In particular, Red Hat has declared that +[iptables is deprecated in RHEL 9] and is likely to be removed +entirely in RHEL 10, a few years from now. Other distributions have +made smaller steps in the same direction; for instance, [Debian +removed `iptables` from the set of "required" packages] in Debian 11 +(Bullseye). + +The RHEL deprecation in particular impacts Kubernetes in two ways: + + 1. Many Kubernetes users run RHEL or one of its downstreams, so in a + few years when RHEL 10 is released, they will be unable to use + kube-proxy in `iptables` mode (or, for that matter, in `ipvs` or + `userspace` mode, since those modes also make heavy use of the + iptables API). + + 2. Several upstream iptables bugs and performance problems that + affect Kubernetes have been fixed by Red Hat developers over the + past several years. With Red Hat no longer making any effort to + maintain iptables, it is less likely that upstream iptables bugs + that affect Kubernetes in the future would be fixed promptly, if + at all. + +[iptables is deprecated in RHEL 9]: https://access.redhat.com/solutions/6739041 +[Debian removed `iptables` from the set of "required" packages]: https://salsa.debian.org/pkg-netfilter-team/pkg-iptables/-/commit/c59797aab9 + +### The `ipvs` mode of kube-proxy will not save us + +Because of the problems with iptables, some developers added an `ipvs` +mode to kube-proxy in 2017. It was generally hoped that this could +eventually solve all of the problems with the `iptables` mode and +become its replacement, but this never really happened. It's not +entirely clear why... [kubeadm #817], "Track when we can enable the +ipvs mode for the kube-proxy by default" is perhaps a good snapshot of +the initial excitement followed by growing disillusionment with the +`ipvs` mode: + + - "a few issues ... re: the version of iptables/ipset shipped in the + kube-proxy container image" + - "clearly not ready for defaulting" + - "complications ... 
with IPVS kernel modules missing or disabled on + user nodes" + - "we are still lacking tests" + - "still does not completely align with what [we] support in + iptables mode" + - "iptables works and people are familiar with it" + - "[not sure that it was ever intended for IPVS to be the default]" + +Additionally, the kernel IPVS APIs alone do not provide enough +functionality to fully implement Kubernetes services, and so the +`ipvs` backend also makes heavy use of the iptables API. Thus, if we +are worried about iptables deprecation, then in order to switch to +using `ipvs` as the default mode, we would have to port the iptables +parts of it to use nftables anyway. But at that point, there would be +little excuse for using IPVS for the core load-balancing part, +particularly given that IPVS, like iptables, is no longer an +actively-developed technology. + +[kubeadm #817]: https://github.com/kubernetes/kubeadm/issues/817 +[not sure that it was ever intended for IPVS to be the default]: https://en.wikipedia.org/wiki/The_Fox_and_the_Grapes + +### The `nf_tables` mode of `/sbin/iptables` will not save us + +In 2018, with the 1.8.0 release of the iptables client binaries, a new +mode was added to the binaries, to allow them to use the nftables API +in the kernel rather than the legacy iptables API, while still +preserving the "API" of the original iptables binaries. As of 2022, +most Linux distributions now use this mode, so the legacy iptables +kernel API is mostly dead. + +However, this new mode does not add any new _syntax_, and so it is not +possible to use any of the new nftables features (like maps) that are +not present in iptables. + +Furthermore, the compatibility constraints imposed by the user-facing +API of the iptables binaries themselves prevent them from being able +to take advantage of many of the performance improvements associated +with nftables. + +### The `iptables` mode of kube-proxy has grown crufty + +Because `iptables` is the default kube-proxy mode, it is subject to +strong backward-compatibility constraints which mean that certain +"features" that are now considered to be bad ideas cannot be removed +because they might break some existing users. A few examples: + + - It allows NodePort services to be accessed on `localhost`, which + requires it to set a sysctl to a value that may introduce security + holes on the system. More generally, it defaults to having + NodePort services be accessible on _all_ node IPs, when most users + would probably prefer them to be more restricted. + + - It implements the `LoadBalancerSourceRanges` feature for traffic + addressed directly to LoadBalancer IPs, but not for traffic + redirected to a NodePort by an external LoadBalancer. + + - Some new functionality only works correctly if the administrator + passes certain command-line options to kube-proxy (eg, + `--cluster-cidr`), but we cannot make those options be mandatory, + since that would break old clusters that aren't passing them. + +A new kube-proxy, which existing users would have to explicitly opt +into, could revisit these and other decisions. + +### We will hopefully be able to trade 2 supported backends for 1 + +Right now SIG Network is supporting both the `iptables` and `ipvs` +backends of kube-proxy, and does not feel like it can ditch `ipvs` +because of performance issues with `iptables`. 
If we create a new +backend which is as functional and non-buggy as `iptables` but as +performant as `ipvs`, then we could (eventually) deprecate both of the +existing backends and only have one backend to support in the future. + +### Writing a new kube-proxy mode may help with our "KPNG" goals + +The [KPNG] (Kube-Proxy Next Generation) working group has been working +on the future of kube-proxy's underlying architecture. They have +recently proposed a [kube-proxy library KEP]. Creating a new proxy +mode which will be officially supported, but which does not (yet) have +the same compatibility and non-bugginess requirements as the +`iptables` and `ipvs` modes should help with that project, because we +can target the new backend to the new library without worrying about +breaking the old backends. + +[KPNG]: https://github.com/kubernetes-sigs/kpng +[kube-proxy library KEP]: https://github.com/kubernetes/enhancements/pull/3649 + +### Goals + +- Design and implement an `nftables` mode for kube-proxy. + + - Drop support for localhost nodeports + + - Ensure that all configuration which is _required_ for full + functionality (eg, `--cluster-cidr`) is actually required, + rather than just logging warnings about missing functionality. + + - Consider other fixes to legacy `iptables` mode behavior. + +- Come up with at least a vague plan to eventually make `nftables` the + default backend. + +- Decide whether we can/should deprecate or even remove the `iptables` + and/or `ipvs` backends. (Perhaps they can be pushed out of tree, a + la `cri-dockerd`.) + +- Take advantage of kube-proxy-related work being done by the kpng + working group. + +### Non-Goals + +- Falling into the same traps as the `ipvs` backend, to the extent + that we can identify what those traps were. + +## Proposal + +### Notes/Constraints/Caveats + +At least three nftables-based kube-proxy implementations already +exist, but none of them seems suitable either to adopt directly or to +use as a starting point: + +- [kube-nftlb]: This is built on top of a separate nftables-based load + balancer project called [nftlb], which means that rather than + translating Kubernetes Services directly into nftables rules, it + translates them into nftlb load balancer objects, which then get + translated into nftables rules. Besides making the code more + confusing for users who aren't already familiar with nftlb, this + also means that in many cases, new Service features would need to + have features added to the nftlb core first before kube-nftld could + consume them. (Also, it has not been updated in two years.) + +- [nfproxy]: Its README notes that "nfproxy is not a 1:1 copy of + kube-proxy (iptables) in terms of features. nfproxy is not going to + cover all corner cases and special features addressed by + kube-proxy". (Also, it has not been updated in two years.) + +- [kpng's nft backend]: This was written as a proof of concept and is + mostly a straightforward translation of the iptables rules to + nftables, and doesn't make good use of nftables features that would + let it reduce the total number of rules. It also makes heavy use of + kpng's APIs, like "DiffStore", which there is not consensus about + adopting upstream. 
+ +[kube-nftlb]: https://github.com/zevenet/kube-nftlb +[nftlb]: https://github.com/zevenet/nftlb +[nfproxy]: https://github.com/sbezverk/nfproxy +[kpng's nft backend]: https://github.com/kubernetes-sigs/kpng/tree/master/backends/nft + +### Risks and Mitigations + +The primary risk of the proposal is feature regressions, which will be +addressed by testing, and by a slow, optional, rollout of the new proxy +mode. + +The `nftables` mode should not pose any new security issues relative +to the `iptables` mode. + +## Design Details + +### High level + +At a high level, the new mode should have the same architecture as the +existing modes; it will use the service/endpoint-tracking code in +`k8s.io/kubernetes/pkg/proxy` (or its eventual replacement from kpng) +to watch for changes, and update rules in the kernel accordingly. + +### Low level + +Some details will be figured out as we implement it. We may start with +an implementation that is architecturally closer to the `iptables` +mode, and then rewrite it to take advantage of additional nftables +features over time. + +#### Tables + +Unlike iptables, nftables does not have any reserved/default tables or +chains (eg, `nat`, `PREROUTING`). Users are expected to create their +own tables and chains for their own purposes. An nftables table can +only contain rules for a single "family" (`ip` (v4), `ip6`, `inet` +(both IPv4 and IPv6), `arp`, `bridge`, or `netdev`), but unlike in +iptables, you can have both "filter"-type chains and "NAT"-type chains +in the same table. + +So, we will create a single `kube_proxy` table in the `ip` family, and +another in the `ip6` family. All of our chains, sets, maps, etc, will +go into those tables. Other system components (eg, firewalld) should +ignore our table, so we should not need to worry about watching for +other people deleting our rules like we have to in the `iptables` +backend. + +(In theory, instead of creating one table each in the `ip` and `ip6` +families, we could create a single table in the `inet` family and put +both IPv4 and IPv6 chains/rules there. However, this wouldn't really +result in much simplification, because we would still need separate +sets/maps to match IPv4 addresses and IPv6 addresses. (There is no +data type that can store/match either an IPv4 address or an IPv6 +address.) Furthermore, because of how Kubernetes Services evolved in +parallel with the existing kube-proxy implementation, we have ended up +with a dual-stack Service semantics that is most easily implemented by +handling IPv4 and IPv6 completely separately anyway.) + +#### Communicating with the kernel nftables subsystem + +At least initially, we will use the `nft` command-line tool to read +and write rules, much like how we use command-line tools in the +`iptables` and `ipvs` backends. However, the `nft` tool is mostly just +a thin wrapper around `libnftables`, and it would be possible to use +that directly instead in the future, given a cgo wrapper. + +When reading data from the kernel (`nft list ...`), `nft` outputs the +data in a nested "object" form: + +``` +table ip kube_proxy { + comment "Kubernetes service proxying rules"; + + chain services { + ip daddr . ip protocol . th dport vmap @service_ips + } +} +``` + +(This is the "native" nftables syntax, but the tools also support a +JSON syntax that may be easier for us to work with...) 
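+
+For example, the same data retrieved with
+`nft -j list table ip kube_proxy` comes back as a single JSON object,
+roughly like the following (abridged, and not verified against any
+particular nft release; the full schema is documented in
+`libnftables-json(5)`, and rules appear as structured "expr" arrays
+rather than as strings):
+
+```
+{
+  "nftables": [
+    { "table": { "family": "ip", "name": "kube_proxy" } },
+    { "chain": { "family": "ip", "table": "kube_proxy", "name": "services" } }
+  ]
+}
+```
+
+Consuming and generating this form would spare us from parsing the
+native syntax, at the cost of somewhat more verbose handling of
+individual rules.
+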
+ +When writing data to the kernel, `nft` accepts the data in either the +same "object" form used by `nft list`, or in the form of a set of +`nft` command lines without the leading "`nft`" (which are then +executed atomically): + +``` +add table ip kube_proxy { comment "Kubernetes service proxying rules"; } +add chain ip kube_proxy services +add rule ip kube_proxy services ip daddr . ip protocol . th dport vmap @service_ips +``` + +The "object" form is more logical and easy to understand, but the +"command" form is better for dynamic usage. In particular, it allows +you to add and remove individual chains, rules, map/set elements, etc, +without needing to also include the chains/rules/elements that you are +not modifying. + +The examples below all show the "object" form of data, but it should +be understood that these are examples of what would be seen in `nft +list` output after kube-proxy creates the rules (with additional +`#`-preceded comments added to help the KEP reader), not examples of +the data we will actually be passing to `nft`. + +The examples below are also all IPv4-specific, for simplicity. When +actually writing out rules for nft, we will need to switch between, +e.g., "`ip daddr`" and "`ip6 daddr`" appropriately, to match an IPv4 +or IPv6 destination address. This will actually be fairly simple +because the `nft` command lets you create "variables" (really +constants) and substitute their values into the rules. Thus, we can +just always have the rule-generating code write "`$IP daddr`", and +then pass either "`-D IP=ip`" or "`-D IP=ip6`" to `nft` to fix it up.) + +(Also, most of the examples below have not actually been tested and +may have syntax errors. Caveat lector.) + +#### Versioning and compatibility + +Since nftables is subject to much more development than iptables has +been recently, we will need to pay more attention to kernel and tool +versions. + +The `nft` command has a `--check` option which can be used to check if +a command could be run successfully; it parses the input, and then +(assuming success), uploads the data to the kernel and asks the kernel +to check it (but not actually act on it) as well. Thus, with a few +`nft --check` runs at startup we should be able to confirm what +features are known to both the tooling and the kernel. + +It is not yet clear what the minimum kernel or `nft` command-line +versions needed by the `nftables` backend will be. The newest feature +used in the examples below was added in Linux 5.6, released in March +2020 (though they could be rewritten to not need that feature). + +It is possible some users will not be able to upgrade from the +`iptables` and `ipvs` backends to `nftables`. (Certainly the +`nftables` backend will not support RHEL 7, which some people are +still using Kubernetes with.) + +#### NAT rules + +##### General Service dispatch + +For ClusterIP and external IP services, we will use an nftables +"verdict map" to store the logic about where to dispatch traffic, +based on destination IP, protocol, and port. We will then need only a +single actual rule to apply the verdict map to all inbound traffic. +(Or it may end up making more sense to have separate verdict maps for +ClusterIP, ExternalIP, and LoadBalancer IP?) Likewise, for NodePort +traffic, we will use a verdict map matching only on destination +protocol / port, with the rules set up to only check the `nodeports` +map for packets addressed to a local IP. 
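+
+Before looking at the full ruleset, note what this structure buys us
+on the control plane side: adding or removing a Service becomes a
+small incremental transaction against a few maps and chains, rather
+than an `iptables-restore`-style rewrite of the entire ruleset. A
+hypothetical (and untested) transaction for adding one ClusterIP
+Service with a NodePort, reusing the chain and map names from the
+example below, might look like:
+
+```
+add chain ip kube_proxy svc_4SW47YFZTEDKD3PK
+# (the per-endpoint sep_... chains would be added here as well, in
+# the same transaction)
+add rule ip kube_proxy svc_4SW47YFZTEDKD3PK numgen random mod 2 vmap { 0 : goto sep_UKSFD7AGPMPPLUHC, 1 : goto sep_C6EBXVWJJZMIWKLZ }
+add element ip kube_proxy service_ips { 172.30.0.44 . tcp . 80 : goto svc_4SW47YFZTEDKD3PK }
+add element ip kube_proxy service_nodeports { tcp . 3001 : goto svc_4SW47YFZTEDKD3PK }
+```
+
+Deleting the Service would be a similarly small transaction of
+`delete element` and `delete chain` commands, leaving the rest of the
+ruleset untouched. The object-form listing below shows the complete
+set of maps and chains that this scheme produces:
+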
+```
+map service_ips {
+    comment "ClusterIP, ExternalIP and LoadBalancer IP traffic";
+
+    # The "type" clause defines the map's datatype; the key type is to
+    # the left of the ":" and the value type to the right. The map key
+    # in this case is a concatenation (".") of three values; an IPv4
+    # address, a protocol (tcp/udp/sctp), and a port (aka
+    # "inet_service"). The map value is a "verdict", which is one of a
+    # limited set of nftables actions. In this case, the verdicts are
+    # all "goto" statements.
+
+    type ipv4_addr . inet_proto . inet_service : verdict;
+
+    elements {
+        172.30.0.44 . tcp . 80 : goto svc_4SW47YFZTEDKD3PK,
+        192.168.99.33 . tcp . 80 : goto svc_4SW47YFZTEDKD3PK,
+        ...
+    }
+}
+
+map service_nodeports {
+    comment "NodePort traffic";
+    type inet_proto . inet_service : verdict;
+
+    elements {
+        tcp . 3001 : goto svc_4SW47YFZTEDKD3PK,
+        ...
+    }
+}
+
+chain prerouting {
+    jump services
+    jump nodeports
+}
+
+chain services {
+    # Construct a key from the destination address, protocol, and port,
+    # then look that key up in the `service_ips` vmap and take the
+    # associated action if it is found.
+
+    ip daddr . ip protocol . th dport vmap @service_ips
+}
+
+chain nodeports {
+    # Return if the destination IP is non-local, or if it's localhost.
+    fib daddr type != local return
+    ip daddr == 127.0.0.1 return
+
+    # If --nodeport-addresses was in use then the above would instead be
+    # something like:
+    # ip daddr != { 192.168.1.5, 192.168.3.10 } return
+
+    # dispatch on the service_nodeports vmap
+    ip protocol . th dport vmap @service_nodeports
+}
+
+# Example per-service chain
+chain svc_4SW47YFZTEDKD3PK {
+    # Send to random endpoint chain using an inline vmap
+    numgen random mod 2 vmap {
+        0 : goto sep_UKSFD7AGPMPPLUHC,
+        1 : goto sep_C6EBXVWJJZMIWKLZ
+    }
+}
+
+# Example per-endpoint chain
+chain sep_UKSFD7AGPMPPLUHC {
+    # masquerade hairpin traffic
+    ip saddr 10.180.0.4 jump mark_for_masquerade
+
+    # send to selected endpoint
+    dnat to 10.180.0.4:8000
+}
+```
+
+##### Masquerading
+
+The example rules above include
+
+```
+    ip saddr 10.180.0.4 jump mark_for_masquerade
+```
+
+to masquerade hairpin traffic, as in the `iptables` proxier. This
+assumes the existence of a `mark_for_masquerade` chain, not shown.
+
+nftables has the same constraints on DNAT and masquerading as
+iptables does; you can only DNAT from the "prerouting" stage and you
+can only masquerade from the "postrouting" stage. Thus, as with
+`iptables`, the `nftables` proxy will have to handle DNAT and
+masquerading at separate times. One possibility would be to simply
+copy the existing logic from the `iptables` proxy, using the packet
+mark to communicate from the prerouting chains to the postrouting
+ones.
+
+However, it should be possible to do this in nftables without using
+the mark or any other externally-visible state; we can just create an
+nftables `set`, and use that to communicate information between the
+chains. Something like:
+
+```
+# Set of 5-tuples of connections that need masquerading
+set need_masquerade {
+    type ipv4_addr . inet_service . ipv4_addr . inet_service . inet_proto;
+    flags timeout ; timeout 5s ;
+}
+
+chain mark_for_masquerade {
+    update @need_masquerade { ip saddr . th sport . ip daddr . th dport . ip protocol }
+}
+
+chain postrouting_do_masquerade {
+    # We use "ct original ip daddr" and "ct original proto-dst" here
+    # since the packet may have been DNATted by this point.
+
+    ip saddr . th sport . ct original ip daddr . ct original proto-dst . ip protocol @need_masquerade masquerade
+}
+```
+
+This is not yet tested, but some kernel nftables developers have
+confirmed that it ought to work.
+
+##### Session affinity
+
+Session affinity can be done in roughly the same way as in the
+`iptables` proxy, just using the more general nftables "set"
+framework rather than the affinity-specific version of sets provided
+by the iptables `recent` module. In fact, since nftables allows
+arbitrary set keys, we can optimize relative to `iptables`, and only
+have a single affinity set per service, rather than one per endpoint.
+(And we also have the flexibility to change the affinity key in the
+future if we want to, eg to key on source IP+port rather than just
+source IP.)
+
+```
+set affinity_4SW47YFZTEDKD3PK {
+    # Source IP . Destination IP . Destination Port
+    type ipv4_addr . ipv4_addr . inet_service;
+    flags timeout; timeout 3h;
+}
+
+chain svc_4SW47YFZTEDKD3PK {
+    # Check for existing session affinity against each endpoint
+    ip saddr . 10.180.0.4 . 80 @affinity_4SW47YFZTEDKD3PK goto sep_UKSFD7AGPMPPLUHC
+    ip saddr . 10.180.0.5 . 80 @affinity_4SW47YFZTEDKD3PK goto sep_C6EBXVWJJZMIWKLZ
+
+    # Send to random endpoint chain
+    numgen random mod 2 vmap {
+        0 : goto sep_UKSFD7AGPMPPLUHC,
+        1 : goto sep_C6EBXVWJJZMIWKLZ
+    }
+}
+
+chain sep_UKSFD7AGPMPPLUHC {
+    # Mark the source as having affinity for this endpoint
+    update @affinity_4SW47YFZTEDKD3PK { ip saddr . 10.180.0.4 . 80 }
+
+    ip saddr 10.180.0.4 jump mark_for_masquerade
+    dnat to 10.180.0.4:8000
+}
+
+# likewise for other endpoint(s)...
+```
+
+#### Filter rules
+
+The `iptables` mode uses the `filter` table for three kinds of rules:
+
+##### Dropping or rejecting packets for services with no endpoints
+
+As with service dispatch, this is easily handled with a verdict map:
+
+```
+map no_endpoint_services {
+    type ipv4_addr . inet_proto . inet_service : verdict
+    elements = {
+        192.168.99.22 . tcp . 80 : drop,
+        172.30.0.46 . tcp . 80 : goto reject_chain,
+        1.2.3.4 . tcp . 80 : drop
+    }
+}
+
+chain filter {
+    ...
+    ip daddr . ip protocol . th dport vmap @no_endpoint_services
+    ...
+}
+
+# helper chain needed because "reject" is not a "verdict" and so can't
+# be used directly in a verdict map
+chain reject_chain {
+    reject
+}
+```
+
+##### Dropping traffic rejected by `LoadBalancerSourceRanges`
+
+The implementation of LoadBalancer source ranges will be similar to
+the ipset-based implementation in the `ipvs` kube-proxy: we use one
+set to recognize "traffic that is subject to source ranges", and then
+another to recognize "traffic that is _accepted_ by its service's
+source ranges". Traffic which matches the first set but not the
+second gets dropped:
+
+```
+set firewall {
+    comment "destinations that are subject to LoadBalancerSourceRanges";
+    type ipv4_addr . inet_proto . inet_service
+}
+set firewall_allow {
+    comment "destination+sources that are allowed by LoadBalancerSourceRanges";
+    type ipv4_addr . inet_proto . inet_service . ipv4_addr
+}
+
+chain filter {
+    ...
+    ip daddr . ip protocol . th dport @firewall jump firewall_check
+    ...
+}
+
+chain firewall_check {
+    ip daddr . ip protocol . th dport . ip saddr @firewall_allow return
+    drop
+}
+```
+
+Where, eg, adding a Service with LoadBalancer IP `10.1.2.3`, port
+`80`, and source ranges `["192.168.0.3/32", "192.168.1.0/24"]` would
+result in:
+
+```
+add element ip kube_proxy firewall { 10.1.2.3 . tcp . 80 }
+add element ip kube_proxy firewall_allow { 10.1.2.3 . tcp . 80 . 192.168.0.3/32 }
+add element ip kube_proxy firewall_allow { 10.1.2.3 . tcp . 80 . 192.168.1.0/24 }
+```
+
+##### Forcing traffic on `HealthCheckNodePorts` to be accepted
+
+The `iptables` mode adds rules to ensure that traffic to NodePort
+services' health check ports is allowed through the firewall. eg:
+
+```
+-A KUBE-NODEPORTS -m comment --comment "ns2/svc2:p80 health check node port" -m tcp -p tcp --dport 30000 -j ACCEPT
+```
+
+(There are also rules to accept any traffic that has already been
+tagged by conntrack.)
+
+This cannot be done reliably in nftables; the `accept` and `drop`
+rules work differently than they do in iptables, and so if there is a
+firewall that would drop traffic to that port, then there is no
+guaranteed way to "sneak behind its back" like you can in iptables;
+we would need to actually properly configure _that firewall_ to
+accept the packets.
+
+However, these sorts of rules are somewhat legacy anyway; they work
+(in the `iptables` proxy) to bypass a _local_ firewall, but they
+would do nothing to bypass a firewall implemented at the cloud
+network layer, which is perhaps a more common configuration these
+days anyway. Administrators using non-local firewalls are already
+required to configure those firewalls correctly to allow Kubernetes
+traffic through, and it is reasonable for us to just extend that
+requirement to administrators using local firewalls as well.
+
+Thus, the `nftables` backend will not attempt to replicate these
+`iptables`-backend rules.
+
+#### Future improvements
+
+Further improvements are likely possible.
+
+For example, it would be nice to not need a separate "hairpin" check
+for every endpoint. There is no way to ask directly "does this packet
+have the same source and destination IP?", but the proof-of-concept
+[kpng nftables backend] does this instead:
+
+```
+set hairpin {
+    type ipv4_addr . ipv4_addr;
+    elements {
+        10.180.0.4 . 10.180.0.4,
+        10.180.0.5 . 10.180.0.5,
+        ...
+    }
+}
+
+chain ... {
+    ...
+    ip saddr . ip daddr @hairpin jump mark_for_masquerade
+}
+```
+
+More efficiently, if nftables eventually got the ability to call eBPF
+programs as part of rule processing (like iptables's `-m ebpf`) then
+we could write a trivial eBPF program to check "source IP equals
+destination IP" and then call that rather than needing the giant set
+of redundant IPs.
+
+If we do this, then we don't need the per-endpoint hairpin check
+rules. If we could also get rid of the per-endpoint affinity-updating
+rules, then we could get rid of the per-endpoint chains entirely,
+since `dnat to ...` is an allowed vmap verdict:
+
+```
+chain svc_4SW47YFZTEDKD3PK {
+    # FIXME handle affinity somehow
+
+    # Send to random endpoint
+    numgen random mod 2 vmap {
+        0 : dnat to 10.180.0.4:8000,
+        1 : dnat to 10.180.0.5:8000
+    }
+}
+```
+
+With the current set of nftables functionality, it does not seem
+possible to do this (in the case where affinity is in use), but
+future features may make it possible.
+
+It is not yet clear what the tradeoffs of such rewrites are, either
+in terms of runtime performance, or of
+admin/developer-comprehensibility of the ruleset.
+
+[kpng nftables backend]: https://github.com/kubernetes-sigs/kpng/tree/master/backends/nft
+
+### Test Plan
+
+[X] I/we understand the owners of the involved components may require
+updates to existing tests to make this code solid enough prior to
+committing the changes necessary to implement this enhancement.
+ +##### Prerequisite testing updates + + + +##### Unit tests + +We will add unit tests for the `nftables` mode that are equivalent to +the ones for the `iptables` mode. In particular, we will port over the +tests that feed Services and EndpointSlices into the proxy engine, +dump the generated ruleset, and then mock running packets through the +ruleset to determine how they would behave. + +The `cmd/kube-proxy/app` tests mostly only test configuration parsing, +and we will extend them to understand the new mode and its associated +configuration options, but there will not be many changes made there. + + + +- ``: `` - `` + +##### Integration tests + +Kube-proxy does not have integration tests. + +##### e2e tests + +Most of the e2e testing of kube-proxy is backend-agnostic. Initially, +we will need a separate e2e job to test the nftables mode (like we do +with ipvs). Eventually, if nftables becomes the default, then this +would be flipped around to having a legacy "iptables" job. + +The handful of e2e tests that specifically examine iptables rules will +need to be updated to be able to work with either backend. + + + +- : + +### Graduation Criteria + + + +### Upgrade / Downgrade Strategy + +The new mode should not introduce any upgrade/downgrade problems, +excepting that you can't downgrade or feature-disable a cluster using +the new kube-proxy mode without switching it back to `iptables` or +`ipvs` first. + +When rolling out or rolling back the feature, it should be safe to +enable the feature gate and change the configuration at the same time, +since nothing cares about the feature gate except for kube-proxy +itself. Likewise, it is expected to be safe to roll out the feature in +a live cluster, even though this will result in different proxy modes +running on different nodes, because Kubernetes service proxying is +defined in such a way that no node needs to be aware of the +implementation details of the service proxy implementation on any +other node. + +(However, see the notes below in [Feature Enablement and +Rollback](#feature-enablement-and-rollback) about stale rule cleanup +when switching modes.) + +### Version Skew Strategy + +The feature is isolated to kube-proxy and does not introduce any API +changes, so the versions of other components do not matter. + +## Production Readiness Review Questionnaire + + + +### Feature Enablement and Rollback + + + +###### How can this feature be enabled / disabled in a live cluster? + +The administrator must enable the feature gate to make the feature +available, and then must run kube-proxy with the +`--proxy-mode=nftables` flag. + +Kube-proxy does not delete its rules on exit (to avoid service +interruptions when restarting/upgrading kube-proxy, or if it crashes). +This means that when switching between proxy modes, it is necessary +for the administrator to ensure that the rules created by the old +proxy mode get deleted. (Failure to do so may result in stale service +rules being left behind for an arbitrarily long time.) The simplest +way to do this is to reboot each node when switching from one proxy +mode to another, but it is also possible to run kube-proxy in "cleanup +and exit" mode, eg: + +``` +kube-proxy --proxy-mode=iptables --cleanup +``` + +- [X] Feature gate (also fill in values in `kep.yaml`) + - Feature gate name: NFTablesKubeProxy + - Components depending on the feature gate: + - kube-proxy +- [X] Other + - Describe the mechanism: + - See above + - Will enabling / disabling the feature require downtime of the control + plane? 
+ - No + - Will enabling / disabling the feature require downtime or reprovisioning + of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled). + - See above + +###### Does enabling the feature change any default behavior? + +Enabling the feature gate does not change any behavior; it just makes +the `--proxy-mode=nftables` option available. + +Switching from `--proxy-mode=iptables` or `--proxy-mode=ipvs` to +`--proxy-mode=nftables` will likely change some behavior, depending +on what we decide to do about certain un-loved kube-proxy features +like localhost nodeports. + +###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? + +Yes, though the same caveat about rebooting or running `kube-proxy +--cleanup` applies as in the "enabling" case. + +Of course, if the user is rolling back, that suggests that the +`nftables` mode was not working correctly, in which case the +`--cleanup` option may _also_ not work correctly, so rebooting the +node is safer. + +###### What happens if we reenable the feature if it was previously rolled back? + +It should just work. + +###### Are there any tests for feature enablement/disablement? + + + +### Rollout, Upgrade and Rollback Planning + + + +###### How can a rollout or rollback fail? Can it impact already running workloads? + + + +###### What specific metrics should inform a rollback? + + + +###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? + + + +###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? + + + +### Monitoring Requirements + + + +###### How can an operator determine if the feature is in use by workloads? + +The feature is used by the cluster as a whole, and the operator would +know that it was in use from looking at the cluster configuration. + +###### How can someone using this feature know that it is working for their instance? + +- [X] Other (treat as last resort) + - Details: If Services still work then the feature is working + +###### What are the reasonable SLOs (Service Level Objectives) for the enhancement? + + + +###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? + +- [X] Metrics + - Metric names: + - ... + - Components exposing the metric: + - kube-proxy + +###### Are there any missing metrics that would be useful to have to improve observability of this feature? + + + +### Dependencies + + + +###### Does this feature depend on any specific services running in the cluster? + +It may require a newer kernel than some current users have. It does +not depend on anything else in the cluster. + +### Scalability + + + +###### Will enabling / using this feature result in any new API calls? + +Probably not; kube-proxy will still be using the same +Service/EndpointSlice-monitoring code, it will just be doing different +things locally with the results. + +###### Will enabling / using this feature result in introducing new API types? + +No + +###### Will enabling / using this feature result in any new calls to the cloud provider? + +No + +###### Will enabling / using this feature result in increasing size or count of the existing API objects? + +No + +###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? + +No + +###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components? 
+ +It is not expected to... + +### Troubleshooting + + + +###### How does this feature react if the API server and/or etcd is unavailable? + +The same way that kube-proxy currently does; updates stop being +processed until the apiserver is available again. + +###### What are other known failure modes? + + + +###### What steps should be taken if SLOs are not being met to determine the problem? + +## Implementation History + +- Initial proposal: 2023-02-01 + +## Drawbacks + +Adding a new officially-supported kube-proxy implementation implies +more work for SIG Network (especially if we are not able to deprecate +either of the existing backends soon). + +Replacing the default kube-proxy implementation will affect many +users. + +However, doing nothing would result in a situation where, eventually, +many users would be unable to use the default proxy implementation. + +## Alternatives + +### Continue to improve the `iptables` mode + +We have made many improvements to the `iptables` mode, and could make +more. In particular, we could make the `iptables` mode use IP sets +like the `ipvs` mode does. + +However, even if we could solve literally all of the performance +problems with the `iptables` mode, there is still the looming +deprecation issue. + +(See also "[The iptables kernel subsystem has unfixable performance +problems](#the-iptables-kernel-subsystem-has-unfixable-performance-problems)".) + +### Fix up the `ipvs` mode + +Rather than implementing an entirely new `nftables` kube-proxy mode, +we could try to fix up the existing `ipvs` mode. + +However, the `ipvs` mode makes extensive use of the iptables API in +addition to the IPVS API. So while it solves the performance problems +with the `iptables` mode, it does not address the deprecation issue. +So we would at least have to rewrite it to be IPVS+nftables rather +than IPVS+iptables. + +(See also "[The ipvs mode of kube-proxy will not save +us](#the--mode-of-kube-proxy-will-not-save-us)".) + +### Use an existing nftables-based kube-proxy implementation + +Discussed in [Notes/Constraints/Caveats](#notesconstraintscaveats). + +### Create an eBPF-based proxy implementation + +Another possibility would be to try to replace the `iptables` and +`ipvs` modes with an eBPF-based proxy backend, instead of an an +nftables one. eBPF is very trendy, but it is also notoriously +difficult to work with. + +One problem with this approach is that the APIs to access conntrack +information from eBPF programs only exist in the very newest kernels. +In particular, the API for NATting a connection from eBPF was only +added in the recently-released 6.1 kernel. It will be a long time +before a majority of Kubernetes users have a kernel new enough that we +can depend on that API. + +Thus, an eBPF-based kube-proxy implementation would initially need a +number of workarounds for missing functionality, adding to its +complexity (and potentially forcing architectural choices that would +not otherwise be necessary, to support the workarounds). + +One interesting eBPF-based approach for service proxying is to use +eBPF to intercept the `connect()` call in pods, and rewrite the +destination IP before the packets are even sent. In this case, eBPF +conntrack support is not needed (though it would still be needed for +non-local service connections, such as connections via NodePorts). 
One +nice feature of this approach is that it integrates well with possible +future "multi-network Service" ideas, in which a pod might connect to +a service IP that resolves to an IP on a secondary network which is +only reachable by certain pods. In the case of a "normal" service +proxy that does destination IP rewriting in the host network +namespace, this would result in a packet that was undeliverable +(because the host network namespace has no route to the isolated +secondary pod network), but a service proxy that does `connect()`-time +rewriting would rewrite the connection before it ever left the pod +network namespace, allowing the connection to proceed. + +The multi-network effort is still in the very early stages, and it is +not clear that it will actually adopt a model of multi-network +Services that works this way. (It is also _possible_ to make such a +model work with a mostly-host-network-based proxy implementation; it's +just more complicated.) + diff --git a/keps/sig-network/3866-nftables-proxy/kep.yaml b/keps/sig-network/3866-nftables-proxy/kep.yaml new file mode 100644 index 000000000000..0549e182a374 --- /dev/null +++ b/keps/sig-network/3866-nftables-proxy/kep.yaml @@ -0,0 +1,39 @@ +title: An nftables-based kube-proxy backend +kep-number: 3866 +authors: + - "@danwinship" +owning-sig: sig-network +status: provisional +creation-date: 2023-02-01 +reviewers: + - "@thockin" + - "@dcbw" + - "@aojea" +approvers: + - "@thockin" + +# The target maturity stage in the current dev cycle for this KEP. +stage: alpha + +# The most recent milestone for which work toward delivery of this KEP has been +# done. This can be the current (upcoming) milestone, if it is being actively +# worked on. +latest-milestone: "v1.27" + +# The milestone at which this feature was, or is targeted to be, at each stage. +milestone: + alpha: "v1.28" + beta: "v1.30" + stable: "v1.32" + +# The following PRR answers are required at alpha release +# List the feature gate name and the components for which it must be enabled +feature-gates: + - name: NFTablesKubeProxy + components: + - kube-proxy +disable-supported: true + +# The following PRR answers are required at beta release +metrics: + - ...