ipvlan: Initial check-in of the IPVLAN driver.
This driver is very similar to the macvlan driver except that it uses L3 on the frame to determine the logical interface while functioning as a packet dispatcher. It inherits L2 of the master device, hence packets on the wire will have the same L2 for all packets originating from all virtual devices off of the same master device.

This driver was developed keeping the namespace use-case in mind. Hence most of the examples given here take that as the base setup, where the main device belongs to the default-ns and virtual devices are assigned to the additional namespaces.

The device operates in two different modes, and the difference between these two modes is primarily on the TX side.

(a) L2 mode: In this mode, the device behaves as an L2 device. TX processing up to L2 happens on the stack of the namespace associated with the virtual device. Packets are then switched into the main device (default-ns) and queued for xmit.

RX processing is simple: all multicast, broadcast (if applicable), and unicast belonging to the address(es) are delivered to the virtual devices.

(b) L3 mode: In this mode, the device behaves like an L3 device. TX processing up to L3 happens on the stack of the namespace associated with the virtual device. Packets are switched to the main device (default-ns) for the L2 processing. Hence the routing table of the default-ns will be used in this mode.

RX processing is somewhat similar to the L2 mode except that in this mode only unicast packets are delivered to the virtual device, while the main device handles all other packets.
The devices can be added using the "ip" command from the iproute2 package -

ip link add link <master> <virtual> type ipvlan mode [ l2 | l3 ]

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Laurent Chavey <chavey@google.com>
Cc: Tim Hockin <thockin@google.com>
Cc: Brandon Philips <brandon.philips@coreos.com>
Cc: Pavel Emelianov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
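The L3-based dispatch described in the commit message can be pictured as a hash-table lookup keyed on the destination IP address. The following is a minimal userspace sketch in C, not the driver's code. Every `sketch_*` name is hypothetical, the XOR-fold hash is a stand-in chosen for brevity, addresses are assumed to be IPv4 in host byte order, and the 256-bucket table only mirrors the size suggested by `IPVLAN_HASH_SIZE` in the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes; they mirror (1 << BITS_PER_BYTE) from the header. */
#define SKETCH_HASH_SIZE (1u << 8)
#define SKETCH_HASH_MASK (SKETCH_HASH_SIZE - 1)

/* One address owned by a logical (virtual) interface. */
struct sketch_addr {
    uint32_t ip4;              /* IPv4 address, host byte order (assumption) */
    int slave_id;              /* stand-in for the owning virtual device */
    struct sketch_addr *next;  /* next entry in the same bucket */
};

/* Per-master state: one chain of addresses per hash bucket. */
struct sketch_port {
    struct sketch_addr *buckets[SKETCH_HASH_SIZE];
};

/* Stand-in hash: fold the 32-bit address down to a bucket index. */
static unsigned int sketch_hash(uint32_t ip4)
{
    uint32_t h = ip4 ^ (ip4 >> 16);

    h ^= h >> 8;
    return h & SKETCH_HASH_MASK;
}

/* Register an address so that later lookups can find its owner. */
static void sketch_addr_add(struct sketch_port *port, struct sketch_addr *a)
{
    unsigned int b;

    assert(port != NULL && a != NULL);
    b = sketch_hash(a->ip4);
    a->next = port->buckets[b];
    port->buckets[b] = a;
}

/* Find the entry that owns a destination address, or NULL if none does. */
static const struct sketch_addr *
sketch_addr_lookup(const struct sketch_port *port, uint32_t ip4)
{
    const struct sketch_addr *a;

    for (a = port->buckets[sketch_hash(ip4)]; a != NULL; a = a->next)
        if (a->ip4 == ip4)
            return a;
    return NULL;
}
```

In this sketch a caller would register one entry per address configured on a virtual device, then call `sketch_addr_lookup()` with the destination address of each incoming packet. A `NULL` result corresponds to a packet that belongs to no virtual device.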
Showing 9 changed files with 1,678 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,107 @@ | ||
|
||
IPVLAN Driver HOWTO | ||
|
||
Initial Release: | ||
Mahesh Bandewar <maheshb AT google.com> | ||
|
||
1. Introduction: | ||
This is conceptually very similar to the macvlan driver with one major | ||
exception of using L3 for mux-ing / demux-ing among slaves. This property makes | ||
the master device share the L2 with its slave devices. I have developed this | ||
driver in conjunction with network namespaces and am not sure if there is a use | ||
case outside of it. | ||
|
||
|
||
2. Building and Installation: | ||
In order to build the driver, please select the config item CONFIG_IPVLAN. | ||
The driver can be built into the kernel (CONFIG_IPVLAN=y) or as a module | ||
(CONFIG_IPVLAN=m). | ||
|
||
|
||
3. Configuration: | ||
There are no module parameters for this driver and it can be configured | ||
using the iproute2 "ip" utility. | ||
|
||
ip link add link <master-dev> <slave-dev> type ipvlan mode { l2 | l3 } | ||
|
||
e.g. ip link add link eth0 ipvl0 type ipvlan mode l2 | ||
|
||
|
||
4. Operating modes: | ||
IPvlan has two modes of operation - L2 and L3. For a given master device, | ||
you can select one of these two modes and all slaves on that master will | ||
operate in the same (selected) mode. The RX mode is almost identical except | ||
that in L3 mode the slaves won't receive any multicast / broadcast traffic. | ||
L3 mode is more restrictive since routing is controlled from the other (mostly) | ||
default namespace. | ||
|
||
4.1 L2 mode: | ||
In this mode TX processing happens on the stack instance attached to the | ||
slave device and packets are switched and queued to the master device to send | ||
out. In this mode the slaves will RX/TX multicast and broadcast (if applicable) | ||
as well. | ||
|
||
4.2 L3 mode: | ||
In this mode TX processing up to L3 happens on the stack instance attached | ||
to the slave device and packets are switched to the stack instance of the | ||
master device for the L2 processing and routing from that instance will be | ||
used before packets are queued on the outbound device. In this mode the slaves | ||
can neither receive nor send multicast / broadcast traffic. | ||
|
||
|
||
5. What to choose (macvlan vs. ipvlan)? | ||
These two devices are very similar in many regards and the specific use | ||
case could very well define which device to choose. If one of the following | ||
situations defines your use case then you can choose to use ipvlan - | ||
(a) The Linux host that is connected to the external switch / router has | ||
a policy configured that allows only one MAC per port. | ||
(b) The number of virtual devices created on a master exceeds the MAC capacity, | ||
which puts the NIC in promiscuous mode, and degraded performance is a concern. | ||
(c) The slave device is to be put into a hostile / untrusted network | ||
namespace where L2 on the slave could be changed / misused. | ||
|
||
|
||
6. Example configuration: | ||
|
||
+=============================================================+ | ||
| Host: host1 | | ||
| | | ||
| +----------------------+ +----------------------+ | | ||
| | NS:ns0 | | NS:ns1 | | | ||
| | | | | | | ||
| | | | | | | ||
| | ipvl0 | | ipvl1 | | | ||
| +----------#-----------+ +-----------#----------+ | | ||
| # # | | ||
| ################################ | | ||
| # eth0 | | ||
+==============================#==============================+ | ||
|
||
|
||
(a) Create two network namespaces - ns0, ns1 | ||
ip netns add ns0 | ||
ip netns add ns1 | ||
|
||
(b) Create two ipvlan slaves on eth0 (master device) | ||
ip link add link eth0 ipvl0 type ipvlan mode l2 | ||
ip link add link eth0 ipvl1 type ipvlan mode l2 | ||
|
||
(c) Assign slaves to the respective network namespaces | ||
ip link set dev ipvl0 netns ns0 | ||
ip link set dev ipvl1 netns ns1 | ||
|
||
(d) Now switch to the namespace (ns0 or ns1) to configure the slave devices | ||
- For ns0 | ||
(1) ip netns exec ns0 bash | ||
(2) ip link set dev ipvl0 up | ||
(3) ip link set dev lo up | ||
(4) ip -4 addr add 127.0.0.1 dev lo | ||
(5) ip -4 addr add $IPADDR dev ipvl0 | ||
(6) ip -4 route add default via $ROUTER dev ipvl0 | ||
- For ns1 | ||
(1) ip netns exec ns1 bash | ||
(2) ip link set dev ipvl1 up | ||
(3) ip link set dev lo up | ||
(4) ip -4 addr add 127.0.0.1 dev lo | ||
(5) ip -4 addr add $IPADDR dev ipvl1 | ||
(6) ip -4 route add default via $ROUTER dev ipvl1 |
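The RX rules from section 4 of the HOWTO above (L2-mode slaves see multicast and broadcast, L3-mode slaves see only unicast addressed to them) can be condensed into a single predicate. This is a minimal sketch of the rule as documented, not the driver's receive path. The enum and function names are hypothetical, and the `addr_matches` flag stands in for whatever address lookup the driver performs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; none come from the driver. */
enum sketch_mode { SKETCH_MODE_L2, SKETCH_MODE_L3 };
enum sketch_pkt { SKETCH_UNICAST, SKETCH_MULTICAST, SKETCH_BROADCAST };

/*
 * Returns true if a received packet would be handed to a slave device.
 * addr_matches: the destination address belongs to that slave
 * (only meaningful for unicast).
 */
static bool sketch_slave_receives(enum sketch_mode mode, enum sketch_pkt pkt,
                                  bool addr_matches)
{
    assert(mode == SKETCH_MODE_L2 || mode == SKETCH_MODE_L3);

    /* Unicast is delivered in both modes when the address matches. */
    if (pkt == SKETCH_UNICAST)
        return addr_matches;

    /* Multicast and broadcast reach the slaves only in L2 mode. */
    return mode == SKETCH_MODE_L2;
}
```

Under this reading, anything the predicate rejects in L3 mode is left for the master device to handle, which matches the commit message's statement that the main device handles all non-unicast packets.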
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# | ||
# Makefile for the Ethernet Ipvlan driver | ||
# | ||
|
||
obj-$(CONFIG_IPVLAN) += ipvlan.o | ||
|
||
ipvlan-objs := ipvlan_core.o ipvlan_main.o |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,130 @@ | ||
/* | ||
* Copyright (c) 2014 Mahesh Bandewar <maheshb@google.com> | ||
* | ||
* This program is free software; you can redistribute it and/or | ||
* modify it under the terms of the GNU General Public License as | ||
* published by the Free Software Foundation; either version 2 of | ||
* the License, or (at your option) any later version. | ||
* | ||
*/ | ||
#ifndef __IPVLAN_H | ||
#define __IPVLAN_H | ||
|
||
#include <linux/kernel.h> | ||
#include <linux/types.h> | ||
#include <linux/module.h> | ||
#include <linux/init.h> | ||
#include <linux/rculist.h> | ||
#include <linux/notifier.h> | ||
#include <linux/netdevice.h> | ||
#include <linux/etherdevice.h> | ||
#include <linux/if_arp.h> | ||
#include <linux/if_link.h> | ||
#include <linux/if_vlan.h> | ||
#include <linux/ip.h> | ||
#include <linux/inetdevice.h> | ||
#include <net/rtnetlink.h> | ||
#include <net/gre.h> | ||
#include <net/route.h> | ||
#include <net/addrconf.h> | ||
|
||
#define IPVLAN_DRV "ipvlan" | ||
#define IPV_DRV_VER "0.1" | ||
|
||
#define IPVLAN_HASH_SIZE (1 << BITS_PER_BYTE) | ||
#define IPVLAN_HASH_MASK (IPVLAN_HASH_SIZE - 1) | ||
|
||
#define IPVLAN_MAC_FILTER_BITS 8 | ||
#define IPVLAN_MAC_FILTER_SIZE (1 << IPVLAN_MAC_FILTER_BITS) | ||
#define IPVLAN_MAC_FILTER_MASK (IPVLAN_MAC_FILTER_SIZE - 1) | ||
|
||
typedef enum { | ||
IPVL_IPV6 = 0, | ||
IPVL_ICMPV6, | ||
IPVL_IPV4, | ||
IPVL_ARP, | ||
} ipvl_hdr_type; | ||
|
||
struct ipvl_pcpu_stats { | ||
u64 rx_pkts; | ||
u64 rx_bytes; | ||
u64 rx_mcast; | ||
u64 tx_pkts; | ||
u64 tx_bytes; | ||
struct u64_stats_sync syncp; | ||
u32 rx_errs; | ||
u32 tx_drps; | ||
}; | ||
|
||
struct ipvl_port; | ||
|
||
struct ipvl_dev { | ||
struct net_device *dev; | ||
struct list_head pnode; | ||
struct ipvl_port *port; | ||
struct net_device *phy_dev; | ||
struct list_head addrs; | ||
int ipv4cnt; | ||
int ipv6cnt; | ||
struct ipvl_pcpu_stats *pcpu_stats; | ||
DECLARE_BITMAP(mac_filters, IPVLAN_MAC_FILTER_SIZE); | ||
netdev_features_t sfeatures; | ||
u32 msg_enable; | ||
u16 mtu_adj; | ||
}; | ||
|
||
struct ipvl_addr { | ||
struct ipvl_dev *master; /* Back pointer to master */ | ||
union { | ||
struct in6_addr ip6; /* IPv6 address on logical interface */ | ||
struct in_addr ip4; /* IPv4 address on logical interface */ | ||
} ipu; | ||
#define ip6addr ipu.ip6 | ||
#define ip4addr ipu.ip4 | ||
struct hlist_node hlnode; /* Hash-table linkage */ | ||
struct list_head anode; /* logical-interface linkage */ | ||
struct rcu_head rcu; | ||
ipvl_hdr_type atype; | ||
}; | ||
|
||
struct ipvl_port { | ||
struct net_device *dev; | ||
struct hlist_head hlhead[IPVLAN_HASH_SIZE]; | ||
struct list_head ipvlans; | ||
struct rcu_head rcu; | ||
int count; | ||
u16 mode; | ||
}; | ||
|
||
static inline struct ipvl_port *ipvlan_port_get_rcu(const struct net_device *d) | ||
{ | ||
return rcu_dereference(d->rx_handler_data); | ||
} | ||
|
||
static inline struct ipvl_port *ipvlan_port_get_rtnl(const struct net_device *d) | ||
{ | ||
return rtnl_dereference(d->rx_handler_data); | ||
} | ||
|
||
static inline bool ipvlan_dev_master(struct net_device *d) | ||
{ | ||
return d->priv_flags & IFF_IPVLAN_MASTER; | ||
} | ||
|
||
static inline bool ipvlan_dev_slave(struct net_device *d) | ||
{ | ||
return d->priv_flags & IFF_IPVLAN_SLAVE; | ||
} | ||
|
||
void ipvlan_adjust_mtu(struct ipvl_dev *ipvlan, struct net_device *dev); | ||
void ipvlan_set_port_mode(struct ipvl_port *port, u32 nval); | ||
void ipvlan_init_secret(void); | ||
unsigned int ipvlan_mac_hash(const unsigned char *addr); | ||
rx_handler_result_t ipvlan_handle_frame(struct sk_buff **pskb); | ||
int ipvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev); | ||
void ipvlan_ht_addr_add(struct ipvl_dev *ipvlan, struct ipvl_addr *addr); | ||
bool ipvlan_addr_busy(struct ipvl_dev *ipvlan, void *iaddr, bool is_v6); | ||
struct ipvl_addr *ipvlan_ht_addr_lookup(const struct ipvl_port *port, | ||
const void *iaddr, bool is_v6); | ||
void ipvlan_ht_addr_del(struct ipvl_addr *addr, bool sync); | ||
#endif /* __IPVLAN_H */ |