Commit 2ad7bf3

Mahesh Bandewar authored and davem330 committed
ipvlan: Initial check-in of the IPVLAN driver.
This driver is very similar to the macvlan driver except that it uses L3 on the frame to determine the logical interface while functioning as a packet dispatcher. It inherits L2 of the master device, hence packets on the wire will have the same L2 for all packets originating from all virtual devices off of the same master device.

This driver was developed keeping the namespace use-case in mind. Hence most of the examples given here take that as the base setup, where the main device belongs to the default-ns and the virtual devices are assigned to additional namespaces.

The device operates in two different modes, and the difference between these two modes is primarily on the TX side.

(a) L2 mode: In this mode, the device behaves as an L2 device. TX processing up to L2 happens on the stack of the namespace that the virtual device is assigned to. Packets are then switched into the main device (default-ns) and queued for xmit.

RX processing is simple: all multicast, broadcast (if applicable), and unicast traffic belonging to the address(es) is delivered to the virtual devices.

(b) L3 mode: In this mode, the device behaves like an L3 device. TX processing up to L3 happens on the stack of the namespace that the virtual device is assigned to. Packets are then switched to the main device (default-ns) for the L2 processing. Hence the routing table of the default-ns will be used in this mode.

RX processing is somewhat similar to the L2 mode, except that in this mode only unicast packets are delivered to the virtual device while the main device handles all other packets.

The devices can be added using the "ip" command from the iproute2 package -

ip link add link <master> <virtual> type ipvlan mode [ l2 | l3 ]

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Laurent Chavey <chavey@google.com>
Cc: Tim Hockin <thockin@google.com>
Cc: Brandon Philips <brandon.philips@coreos.com>
Cc: Pavel Emelianov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
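The RX behaviour described in the commit message can be condensed into a small decision function. What follows is a userspace sketch only, not code from the driver: the `sketch_` names and both enums are hypothetical, and it assumes the caller has already worked out whether the destination L3 address belongs to the virtual device.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; these are not the driver's identifiers. */
enum sketch_mode { SKETCH_MODE_L2, SKETCH_MODE_L3 };
enum sketch_pkt  { SKETCH_UNICAST, SKETCH_MULTICAST, SKETCH_BROADCAST };

/*
 * Decide whether a received packet is handed to a virtual device.
 * addr_match: the destination L3 address belongs to that virtual device
 * (only consulted for unicast packets).
 */
static bool sketch_rx_to_virtual(enum sketch_mode mode, enum sketch_pkt kind,
                                 bool addr_match)
{
	assert(mode == SKETCH_MODE_L2 || mode == SKETCH_MODE_L3);

	/* Both modes: unicast goes to the device that owns the address. */
	if (kind == SKETCH_UNICAST)
		return addr_match;

	/* Multicast / broadcast: delivered in L2 mode, left to the main device in L3 mode. */
	return mode == SKETCH_MODE_L2;
}
```

The TX-side difference between the modes (where L2 processing and routing take place) is deliberately left out of this sketch.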
1 parent 2bbea0a commit 2ad7bf3

9 files changed

+1678
-0
lines changed

Documentation/networking/ipvlan.txt

Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
+
+                            IPVLAN Driver HOWTO
+
+Initial Release:
+	Mahesh Bandewar <maheshb AT google.com>
+
+1. Introduction:
+	This is conceptually very similar to the macvlan driver with one major
+exception of using L3 for mux-ing / demux-ing among slaves. This property makes
+the master device share the L2 with its slave devices. I have developed this
+driver in conjunction with network namespaces and am not sure if there is a
+use case outside of it.
+
+
+2. Building and Installation:
+	In order to build the driver, please select the config item CONFIG_IPVLAN.
+The driver can be built into the kernel (CONFIG_IPVLAN=y) or as a module
+(CONFIG_IPVLAN=m).
+
+
+3. Configuration:
+	There are no module parameters for this driver and it can be configured
+using the iproute2 ip utility.
+
+	ip link add link <master-dev> <slave-dev> type ipvlan mode { l2 | l3 }
+
+	e.g. ip link add link eth0 ipvl0 type ipvlan mode l2
+
+
+4. Operating modes:
+	IPvlan has two modes of operation - L2 and L3. For a given master device,
+you can select one of these two modes and all slaves on that master will
+operate in the same (selected) mode. The RX mode is almost identical except
+that in L3 mode the slaves won't receive any multicast / broadcast traffic.
+L3 mode is more restrictive since routing is controlled from the other (mostly)
+default namespace.
+
+4.1 L2 mode:
+	In this mode TX processing happens on the stack instance attached to the
+slave device and packets are switched and queued to the master device to send
+out. In this mode the slaves will RX/TX multicast and broadcast (if applicable)
+as well.
+
+4.2 L3 mode:
+	In this mode TX processing up to L3 happens on the stack instance attached
+to the slave device and packets are switched to the stack instance of the
+master device for the L2 processing and routing from that instance will be
+used before packets are queued on the outbound device. In this mode the slaves
+can neither receive nor send multicast / broadcast traffic.
+
+
+5. What to choose (macvlan vs. ipvlan)?
+	These two devices are very similar in many regards and the specific use
+case could very well define which device to choose. If one of the following
+situations defines your use case then you can choose to use ipvlan -
+	(a) The Linux host that is connected to the external switch / router has
+policy configured that allows only one MAC per port.
+	(b) The number of virtual devices created on a master exceeds the MAC
+capacity and puts the NIC in promiscuous mode, and degraded performance is a concern.
+	(c) The slave device is to be put into a hostile / untrusted network
+namespace where L2 on the slave could be changed / misused.
+
+
+6. Example configuration:
+
+  +=============================================================+
+  |  Host: host1                                                |
+  |                                                             |
+  |   +----------------------+      +----------------------+    |
+  |   |   NS:ns0             |      |  NS:ns1              |    |
+  |   |                      |      |                      |    |
+  |   |                      |      |                      |    |
+  |   |        ipvl0         |      |         ipvl1        |    |
+  |   +----------#-----------+      +-----------#----------+    |
+  |              #                              #               |
+  |              ################################               |
+  |                              # eth0                         |
+  +==============================#==============================+
+
+
+	(a) Create two network namespaces - ns0, ns1
+		ip netns add ns0
+		ip netns add ns1
+
+	(b) Create two ipvlan slaves on eth0 (master device)
+		ip link add link eth0 ipvl0 type ipvlan mode l2
+		ip link add link eth0 ipvl1 type ipvlan mode l2
+
+	(c) Assign slaves to the respective network namespaces
+		ip link set dev ipvl0 netns ns0
+		ip link set dev ipvl1 netns ns1
+
+	(d) Now switch to the namespace (ns0 or ns1) to configure the slave devices
+		- For ns0
+			(1) ip netns exec ns0 bash
+			(2) ip link set dev ipvl0 up
+			(3) ip link set dev lo up
+			(4) ip -4 addr add 127.0.0.1 dev lo
+			(5) ip -4 addr add $IPADDR dev ipvl0
+			(6) ip -4 route add default via $ROUTER dev ipvl0
+		- For ns1
+			(1) ip netns exec ns1 bash
+			(2) ip link set dev ipvl1 up
+			(3) ip link set dev lo up
+			(4) ip -4 addr add 127.0.0.1 dev lo
+			(5) ip -4 addr add $IPADDR dev ipvl1
+			(6) ip -4 route add default via $ROUTER dev ipvl1
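The three situations listed in section 5 of the HOWTO can be read as a simple checklist. The sketch below encodes that checklist in userspace C purely as an illustration. The struct, its field names and the function are hypothetical, and the "degraded performance is a concern" condition is modelled as a plain flag the caller sets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a deployment; none of these names come from the driver. */
struct sketch_site {
	bool one_mac_per_port;         /* (a) switch / router policy allows one MAC per port */
	unsigned int n_virtual;        /* (b) number of virtual devices on the master */
	unsigned int nic_mac_capacity; /* (b) MAC filter entries the NIC supports */
	bool promisc_cost_matters;     /* (b) degraded performance in promiscuous mode is a concern */
	bool untrusted_ns;             /* (c) slave goes into a hostile / untrusted namespace */
};

/* Returns true when any of the HOWTO's situations (a), (b) or (c) applies. */
static bool sketch_prefer_ipvlan(const struct sketch_site *s)
{
	assert(s != NULL);

	if (s->one_mac_per_port)
		return true;
	if (s->n_virtual > s->nic_mac_capacity && s->promisc_cost_matters)
		return true;
	return s->untrusted_ns;
}
```

When none of the conditions hold, the HOWTO leaves the choice to the specific use case, so a `false` result here means only that no listed reason to pick ipvlan applies.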

drivers/net/Kconfig

Lines changed: 18 additions & 0 deletions
@@ -145,6 +145,24 @@ config MACVTAP
 	  To compile this driver as a module, choose M here: the module
 	  will be called macvtap.
 
+
+config IPVLAN
+	tristate "IP-VLAN support"
+	---help---
+	  This allows one to create virtual devices off of a main interface
+	  and packets will be delivered based on the dest L3 (IPv6/IPv4 addr)
+	  on packets. All interfaces (including the main interface) share L2
+	  making it transparent to the connected L2 switch.
+
+	  Ipvlan devices can be added using the "ip" command from the
+	  iproute2 package starting with the iproute2-X.Y.ZZ release:
+
+	  "ip link add link <main-dev> [ NAME ] type ipvlan"
+
+	  To compile this driver as a module, choose M here: the module
+	  will be called ipvlan.
+
+
 config VXLAN
 	tristate "Virtual eXtensible Local Area Network (VXLAN)"
 	depends on INET

drivers/net/Makefile

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@
 # Networking Core Drivers
 #
 obj-$(CONFIG_BONDING) += bonding/
+obj-$(CONFIG_IPVLAN) += ipvlan/
 obj-$(CONFIG_DUMMY) += dummy.o
 obj-$(CONFIG_EQUALIZER) += eql.o
 obj-$(CONFIG_IFB) += ifb.o

drivers/net/ipvlan/Makefile

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+#
+# Makefile for the Ethernet Ipvlan driver
+#
+
+obj-$(CONFIG_IPVLAN) += ipvlan.o
+
+ipvlan-objs := ipvlan_core.o ipvlan_main.o

drivers/net/ipvlan/ipvlan.h

Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
+/*
+ * Copyright (c) 2014 Mahesh Bandewar <maheshb@google.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ */
+#ifndef __IPVLAN_H
+#define __IPVLAN_H
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/rculist.h>
+#include <linux/notifier.h>
+#include <linux/netdevice.h>
+#include <linux/etherdevice.h>
+#include <linux/if_arp.h>
+#include <linux/if_link.h>
+#include <linux/if_vlan.h>
+#include <linux/ip.h>
+#include <linux/inetdevice.h>
+#include <net/rtnetlink.h>
+#include <net/gre.h>
+#include <net/route.h>
+#include <net/addrconf.h>
+
+#define IPVLAN_DRV "ipvlan"
+#define IPV_DRV_VER "0.1"
+
+#define IPVLAN_HASH_SIZE (1 << BITS_PER_BYTE)
+#define IPVLAN_HASH_MASK (IPVLAN_HASH_SIZE - 1)
+
+#define IPVLAN_MAC_FILTER_BITS 8
+#define IPVLAN_MAC_FILTER_SIZE (1 << IPVLAN_MAC_FILTER_BITS)
+#define IPVLAN_MAC_FILTER_MASK (IPVLAN_MAC_FILTER_SIZE - 1)
+
+typedef enum {
+	IPVL_IPV6 = 0,
+	IPVL_ICMPV6,
+	IPVL_IPV4,
+	IPVL_ARP,
+} ipvl_hdr_type;
+
+struct ipvl_pcpu_stats {
+	u64 rx_pkts;
+	u64 rx_bytes;
+	u64 rx_mcast;
+	u64 tx_pkts;
+	u64 tx_bytes;
+	struct u64_stats_sync syncp;
+	u32 rx_errs;
+	u32 tx_drps;
+};
+
+struct ipvl_port;
+
+struct ipvl_dev {
+	struct net_device *dev;
+	struct list_head pnode;
+	struct ipvl_port *port;
+	struct net_device *phy_dev;
+	struct list_head addrs;
+	int ipv4cnt;
+	int ipv6cnt;
+	struct ipvl_pcpu_stats *pcpu_stats;
+	DECLARE_BITMAP(mac_filters, IPVLAN_MAC_FILTER_SIZE);
+	netdev_features_t sfeatures;
+	u32 msg_enable;
+	u16 mtu_adj;
+};
+
+struct ipvl_addr {
+	struct ipvl_dev *master; /* Back pointer to master */
+	union {
+		struct in6_addr ip6; /* IPv6 address on logical interface */
+		struct in_addr ip4; /* IPv4 address on logical interface */
+	} ipu;
+#define ip6addr ipu.ip6
+#define ip4addr ipu.ip4
+	struct hlist_node hlnode; /* Hash-table linkage */
+	struct list_head anode; /* logical-interface linkage */
+	struct rcu_head rcu;
+	ipvl_hdr_type atype;
+};
+
+struct ipvl_port {
+	struct net_device *dev;
+	struct hlist_head hlhead[IPVLAN_HASH_SIZE];
+	struct list_head ipvlans;
+	struct rcu_head rcu;
+	int count;
+	u16 mode;
+};
+
+static inline struct ipvl_port *ipvlan_port_get_rcu(const struct net_device *d)
+{
+	return rcu_dereference(d->rx_handler_data);
+}
+
+static inline struct ipvl_port *ipvlan_port_get_rtnl(const struct net_device *d)
+{
+	return rtnl_dereference(d->rx_handler_data);
+}
+
+static inline bool ipvlan_dev_master(struct net_device *d)
+{
+	return d->priv_flags & IFF_IPVLAN_MASTER;
+}
+
+static inline bool ipvlan_dev_slave(struct net_device *d)
+{
+	return d->priv_flags & IFF_IPVLAN_SLAVE;
+}
+
+void ipvlan_adjust_mtu(struct ipvl_dev *ipvlan, struct net_device *dev);
+void ipvlan_set_port_mode(struct ipvl_port *port, u32 nval);
+void ipvlan_init_secret(void);
+unsigned int ipvlan_mac_hash(const unsigned char *addr);
+rx_handler_result_t ipvlan_handle_frame(struct sk_buff **pskb);
+int ipvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev);
+void ipvlan_ht_addr_add(struct ipvl_dev *ipvlan, struct ipvl_addr *addr);
+bool ipvlan_addr_busy(struct ipvl_dev *ipvlan, void *iaddr, bool is_v6);
+struct ipvl_addr *ipvlan_ht_addr_lookup(const struct ipvl_port *port,
+					const void *iaddr, bool is_v6);
+void ipvlan_ht_addr_del(struct ipvl_addr *addr, bool sync);
+#endif /* __IPVLAN_H */
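The header sizes the per-port address table as `1 << BITS_PER_BYTE`, i.e. 256 buckets with a mask of 255, and declares `ipvlan_ht_addr_add()` / `ipvlan_ht_addr_lookup()` for it. The userspace sketch below illustrates the general shape of such a table for IPv4 only. It is not the driver's implementation: every `sketch_` name is hypothetical, the byte-folding hash is a stand-in (the real hash function is not part of this header), and a plain singly linked chain replaces the kernel's `hlist` and RCU machinery.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_HASH_SIZE (1u << 8)              /* mirrors 1 << BITS_PER_BYTE = 256 */
#define SKETCH_HASH_MASK (SKETCH_HASH_SIZE - 1) /* 255 */

/* One address owned by a logical interface (hypothetical userspace stand-in). */
struct sketch_addr {
	uint32_t ip4;             /* IPv4 address as a plain 32-bit value */
	int dev_id;               /* stand-in for the owning logical interface */
	struct sketch_addr *next; /* bucket chain, in place of hlist linkage */
};

/* Per-master table of addresses, indexed by hash bucket. */
struct sketch_port {
	struct sketch_addr *buckets[SKETCH_HASH_SIZE];
};

/* Toy hash: fold the four address bytes together and mask to a bucket index. */
static unsigned int sketch_hash(uint32_t ip4)
{
	return (ip4 ^ (ip4 >> 8) ^ (ip4 >> 16) ^ (ip4 >> 24)) & SKETCH_HASH_MASK;
}

/* Link an address at the head of its bucket chain. */
static void sketch_addr_add(struct sketch_port *port, struct sketch_addr *a)
{
	unsigned int h;

	assert(port != NULL && a != NULL);
	h = sketch_hash(a->ip4);
	a->next = port->buckets[h];
	port->buckets[h] = a;
}

/* Walk one bucket chain; returns NULL when no logical interface owns the address. */
static struct sketch_addr *sketch_addr_lookup(const struct sketch_port *port,
					      uint32_t ip4)
{
	struct sketch_addr *a;

	for (a = port->buckets[sketch_hash(ip4)]; a != NULL; a = a->next)
		if (a->ip4 == ip4)
			return a;
	return NULL;
}
```

On receive, a dispatcher built this way would look up the packet's destination address and hand the packet to the interface identified by the matching entry. A miss corresponds to traffic that stays with the main device.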
