openvswitch: Add original direction conntrack tuple to sw_flow_key.
Add the fields of the conntrack original direction 5-tuple to struct
sw_flow_key.  The new fields are initially marked as non-existent, and
are populated whenever a conntrack action is executed and either finds
or generates a conntrack entry.  This means that these fields exist
for all packets that were not rejected by conntrack as untrackable.
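Below is a minimal sketch of the "initially non-existent" convention. It uses a hypothetical, trimmed-down `struct demo_ct_key` in place of `struct sw_flow_key`. Like `__ovs_ct_update_key()` and `ovs_ct_fill_key()` in this patch, it treats a zero `orig_proto` as meaning that no original direction tuple is present. The `demo_*` helpers are illustrative only and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the conntrack part of struct sw_flow_key. */
struct demo_ct_key {
	uint8_t state;          /* conntrack state bits */
	uint8_t orig_proto;     /* 0 => original direction tuple absent */
	uint16_t orig_src_port; /* host byte order, for readability only */
	uint16_t orig_dst_port;
};

/* Initial state: no conntrack action has run on this packet yet. */
static void demo_ct_key_init(struct demo_ct_key *key)
{
	memset(key, 0, sizeof(*key));
}

/* Models a conntrack action that found or created an entry. */
static void demo_ct_key_set_orig(struct demo_ct_key *key, uint8_t proto,
				 uint16_t sport, uint16_t dport)
{
	assert(proto != 0); /* 0 is reserved as the "absent" marker here */
	key->orig_proto = proto;
	key->orig_src_port = sport;
	key->orig_dst_port = dport;
}

static bool demo_ct_orig_present(const struct demo_ct_key *key)
{
	return key->orig_proto != 0;
}
```

In the patch itself, `ovs_ct_put_key()` performs the same check (`if (swkey->ct.orig_proto)`) before emitting an original tuple attribute.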

The original tuple fields in the sw_flow_key are filled from the
original direction tuple of the conntrack entry relating to the
current packet, or from the original direction tuple of the master
conntrack entry, if the current conntrack entry has a master.
Generally, connections that are expected by a connection with an
assigned helper (e.g., FTP) have a master conntrack entry.
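The master lookup can be sketched roughly as follows. The sketch uses hypothetical `struct demo_conn` and `struct demo_tuple` types instead of `struct nf_conn` and `struct nf_conntrack_tuple`. As in `__ovs_ct_update_key()`, only one level of `master` is followed. Addresses are kept in host byte order purely for readability.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified conntrack tuple (host byte order). */
struct demo_tuple {
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
	uint8_t proto;
};

/* Hypothetical simplified conntrack entry. */
struct demo_conn {
	const struct demo_conn *master; /* non-NULL for expected connections */
	struct demo_tuple orig;         /* original direction tuple */
};

/* Return the tuple that would be copied into the flow key: the master's
 * original tuple if there is a master, otherwise the entry's own. */
static const struct demo_tuple *demo_key_orig_tuple(const struct demo_conn *ct)
{
	assert(ct != NULL);
	if (ct->master)
		ct = ct->master;
	return &ct->orig;
}
```

For an FTP data connection, this returns the control connection's tuple. A rule written against the control connection's original 5-tuple therefore also applies to the related data packets.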

The main purpose of the new conntrack original tuple fields is to
allow matching on them for policy decision purposes, with the premise
that the admissibility of a tracked connection's reply packets (as well
as its original direction packets), and of packets in both directions
of any related connections, may be based on ACL rules applying to the
master connection's original direction 5-tuple.  This also makes it
easier to make policy decisions when the actual packet headers might
have been transformed by NAT, as the original direction 5-tuple
represents the packet headers before any such transformation.

When matching on the original direction 5-tuple, the admissibility of
return and/or related packets need not be based on the mere existence
of a conntrack entry, which allows admission policy to be separated
from the established conntrack state.  While the existence of a
conntrack entry is required for admitting return or related packets,
a policy change can cause connections that were initially admitted to
be rejected or dropped afterwards.  If the admission of return and
related packets were based on conntrack state alone (e.g., the
connection being in the established state), a policy change that makes
the connection rejected or dropped would need to find and delete all
conntrack entries affected by the change.  When matching on the
original direction 5-tuple, the affected conntrack entries can instead
be allowed to time out, as the established state of the connection no
longer needs to be the basis for packet admission.
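The separation of admission policy from conntrack state can be sketched as follows. The state bits, the `struct demo_policy` allow list, and `demo_admit()` are hypothetical simplifications. The bits are loosely modeled on the OVS `ct_state` flags but do not use their real values. In this model, a packet is admitted only if conntrack tracks it as established or related and the current policy still allows the original direction tuple.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state bits (not the real OVS_CS_F_* values). */
#define DEMO_CS_TRACKED     0x1u
#define DEMO_CS_ESTABLISHED 0x2u
#define DEMO_CS_RELATED     0x4u

struct demo_tuple {
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
	uint8_t proto;
};

/* Hypothetical policy: allowed (dst_ip, dst_port, proto) combinations. */
struct demo_policy {
	const struct demo_tuple *allowed;
	size_t n_allowed;
};

static bool demo_policy_allows(const struct demo_policy *p,
			       const struct demo_tuple *orig)
{
	for (size_t i = 0; i < p->n_allowed; i++) {
		const struct demo_tuple *a = &p->allowed[i];

		if (a->dst_ip == orig->dst_ip && a->dst_port == orig->dst_port &&
		    a->proto == orig->proto)
			return true;
	}
	return false;
}

/* Conntrack state is necessary, but the current policy applied to the
 * original direction tuple makes the final decision. */
static bool demo_admit(uint32_t ct_state, const struct demo_policy *p,
		       const struct demo_tuple *orig)
{
	assert(p != NULL && orig != NULL);
	if (!(ct_state & DEMO_CS_TRACKED) ||
	    !(ct_state & (DEMO_CS_ESTABLISHED | DEMO_CS_RELATED)))
		return false;
	return demo_policy_allows(p, orig);
}
```

After the policy change in this model, the conntrack entry is untouched and still established. It is no longer sufficient for admission, so it can simply be left to time out.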

It should be noted that the directionality of related connections may
be the same as or different from that of the master connection, and
neither the original direction 5-tuple nor the conntrack state bits
carry this information.  If needed, the directionality of the master
connection can be stored in the master's conntrack mark or labels,
which are automatically inherited by the expected related connections.
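Recording the master's direction in its mark can be modeled as shown below. `DEMO_MARK_FROM_INSIDE` and `demo_new_related()` are hypothetical. The sketch only mimics the mark inheritance by an expected connection described above; it is not the conntrack implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mark bit: the master connection was initiated from the
 * "inside" (e.g., by an internal client). */
#define DEMO_MARK_FROM_INSIDE 0x1u

struct demo_conn {
	const struct demo_conn *master;
	uint32_t mark;
};

/* Simplified model of an expected connection being set up: it inherits
 * the master's mark. */
static struct demo_conn demo_new_related(const struct demo_conn *master)
{
	assert(master != NULL);
	struct demo_conn c = { master, master->mark };
	return c;
}

static bool demo_master_from_inside(const struct demo_conn *ct)
{
	return (ct->mark & DEMO_MARK_FROM_INSIDE) != 0;
}
```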

Since neither ARP nor ND packets are trackable by conntrack, the ARP/ND
fields and the new conntrack original tuple fields are mutually
exclusive.  Hence, the IP addresses are overlaid in a union with the
ARP and ND fields.  This keeps sw_flow_key from growing much due to
this patch, but it also means that we must be careful never to use the
new key fields with ARP or ND packets.  ARP is easy to distinguish and
keep mutually exclusive based on the Ethernet type, but ND, being an
ICMPv6 protocol, requires a bit more attention.
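The overlay and the ND guard can be sketched in user space as follows. `struct demo_ipv6_key` is a hypothetical cut-down key. Its ICMPv6 type and code sit in network byte order in the transport port fields, as in `sw_flow_key_is_nd()` from this patch. The `DEMO_*` constants restate the ICMPv6 values (next header 58, types 135 and 136) instead of including kernel headers.

```c
#include <arpa/inet.h> /* htons() */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_ETH_P_IPV6   0x86DD
#define DEMO_NEXTHDR_ICMP 58  /* ICMPv6 */
#define DEMO_ND_NS        135 /* Neighbor Solicitation */
#define DEMO_ND_NA        136 /* Neighbor Advertisement */

/* Conntrack original addresses share storage with the ND fields. */
union demo_ipv6_overlay {
	struct { uint8_t src[16], dst[16]; } ct_orig;
	struct { uint8_t target[16], sll[6], tll[6]; } nd;
};

/* The union only costs the difference between its members (32 vs 28). */
static_assert(sizeof(union demo_ipv6_overlay) == 32, "two IPv6 addresses");

struct demo_ipv6_key {
	uint16_t eth_type; /* network byte order */
	uint8_t ip_proto;
	uint16_t tp_src;   /* ICMPv6 type, network byte order */
	uint16_t tp_dst;   /* ICMPv6 code, network byte order */
	union demo_ipv6_overlay u;
};

/* Same test as sw_flow_key_is_nd() in the patch. */
static bool demo_key_is_nd(const struct demo_ipv6_key *k)
{
	return k->eth_type == htons(DEMO_ETH_P_IPV6) &&
	       k->ip_proto == DEMO_NEXTHDR_ICMP &&
	       k->tp_dst == 0 &&
	       (k->tp_src == htons(DEMO_ND_NS) || k->tp_src == htons(DEMO_ND_NA));
}

/* Only write u.ct_orig when it cannot clobber ND fields. */
static bool demo_may_set_ct_orig(const struct demo_ipv6_key *k)
{
	return k->eth_type == htons(DEMO_ETH_P_IPV6) && !demo_key_is_nd(k);
}
```

`demo_may_set_ct_orig()` corresponds to the `!sw_flow_key_is_nd(key)` condition in `__ovs_ct_update_key()` and to the `-EINVAL` check in `ovs_flow_key_extract_userspace()`.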

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarno Rajahalme authored and davem330 committed Feb 10, 2017
1 parent 09aa98a commit 9dd7f89
Showing 8 changed files with 246 additions and 47 deletions.
20 changes: 19 additions & 1 deletion include/uapi/linux/openvswitch.h
@@ -1,6 +1,6 @@

/*
* Copyright (c) 2007-2013 Nicira, Inc.
* Copyright (c) 2007-2017 Nicira, Inc.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public
@@ -331,6 +331,8 @@ enum ovs_key_attr {
OVS_KEY_ATTR_CT_ZONE, /* u16 connection tracking zone. */
OVS_KEY_ATTR_CT_MARK, /* u32 connection tracking mark */
OVS_KEY_ATTR_CT_LABELS, /* 16-octet connection tracking label */
OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4, /* struct ovs_key_ct_tuple_ipv4 */
OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6, /* struct ovs_key_ct_tuple_ipv6 */

#ifdef __KERNEL__
OVS_KEY_ATTR_TUNNEL_INFO, /* struct ip_tunnel_info */
@@ -472,6 +474,22 @@ struct ovs_key_ct_labels {

#define OVS_CS_F_NAT_MASK (OVS_CS_F_SRC_NAT | OVS_CS_F_DST_NAT)

struct ovs_key_ct_tuple_ipv4 {
__be32 ipv4_src;
__be32 ipv4_dst;
__be16 src_port;
__be16 dst_port;
__u8 ipv4_proto;
};

struct ovs_key_ct_tuple_ipv6 {
__be32 ipv6_src[4];
__be32 ipv6_dst[4];
__be16 src_port;
__be16 dst_port;
__u8 ipv6_proto;
};

/**
* enum ovs_flow_attr - attributes for %OVS_FLOW_* commands.
* @OVS_FLOW_ATTR_KEY: Nested %OVS_KEY_ATTR_* attributes specifying the flow
2 changes: 2 additions & 0 deletions net/openvswitch/actions.c
@@ -1074,6 +1074,8 @@ static int execute_masked_set_action(struct sk_buff *skb,
case OVS_KEY_ATTR_CT_ZONE:
case OVS_KEY_ATTR_CT_MARK:
case OVS_KEY_ATTR_CT_LABELS:
case OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4:
case OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6:
err = -EINVAL;
break;
}
86 changes: 80 additions & 6 deletions net/openvswitch/conntrack.c
@@ -147,6 +147,20 @@ static void ovs_ct_get_labels(const struct nf_conn *ct,
memset(labels, 0, OVS_CT_LABELS_LEN);
}

static void __ovs_ct_update_key_orig_tp(struct sw_flow_key *key,
const struct nf_conntrack_tuple *orig,
u8 icmp_proto)
{
key->ct.orig_proto = orig->dst.protonum;
if (orig->dst.protonum == icmp_proto) {
key->ct.orig_tp.src = htons(orig->dst.u.icmp.type);
key->ct.orig_tp.dst = htons(orig->dst.u.icmp.code);
} else {
key->ct.orig_tp.src = orig->src.u.all;
key->ct.orig_tp.dst = orig->dst.u.all;
}
}

static void __ovs_ct_update_key(struct sw_flow_key *key, u8 state,
const struct nf_conntrack_zone *zone,
const struct nf_conn *ct)
@@ -155,6 +169,35 @@ static void __ovs_ct_update_key(struct sw_flow_key *key, u8 state,
key->ct.zone = zone->id;
key->ct.mark = ovs_ct_get_mark(ct);
ovs_ct_get_labels(ct, &key->ct.labels);

if (ct) {
const struct nf_conntrack_tuple *orig;

/* Use the master if we have one. */
if (ct->master)
ct = ct->master;
orig = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;

/* IP version must match with the master connection. */
if (key->eth.type == htons(ETH_P_IP) &&
nf_ct_l3num(ct) == NFPROTO_IPV4) {
key->ipv4.ct_orig.src = orig->src.u3.ip;
key->ipv4.ct_orig.dst = orig->dst.u3.ip;
__ovs_ct_update_key_orig_tp(key, orig, IPPROTO_ICMP);
return;
} else if (key->eth.type == htons(ETH_P_IPV6) &&
!sw_flow_key_is_nd(key) &&
nf_ct_l3num(ct) == NFPROTO_IPV6) {
key->ipv6.ct_orig.src = orig->src.u3.in6;
key->ipv6.ct_orig.dst = orig->dst.u3.in6;
__ovs_ct_update_key_orig_tp(key, orig, NEXTHDR_ICMP);
return;
}
}
/* Clear 'ct.orig_proto' to mark the non-existence of conntrack
* original direction key fields.
*/
key->ct.orig_proto = 0;
}

/* Update 'key' based on skb->_nfct. If 'post_ct' is true, then OVS has
@@ -208,24 +251,55 @@ void ovs_ct_fill_key(const struct sk_buff *skb, struct sw_flow_key *key)
ovs_ct_update_key(skb, NULL, key, false, false);
}

int ovs_ct_put_key(const struct sw_flow_key *key, struct sk_buff *skb)
#define IN6_ADDR_INITIALIZER(ADDR) \
{ (ADDR).s6_addr32[0], (ADDR).s6_addr32[1], \
(ADDR).s6_addr32[2], (ADDR).s6_addr32[3] }

int ovs_ct_put_key(const struct sw_flow_key *swkey,
const struct sw_flow_key *output, struct sk_buff *skb)
{
if (nla_put_u32(skb, OVS_KEY_ATTR_CT_STATE, key->ct.state))
if (nla_put_u32(skb, OVS_KEY_ATTR_CT_STATE, output->ct.state))
return -EMSGSIZE;

if (IS_ENABLED(CONFIG_NF_CONNTRACK_ZONES) &&
nla_put_u16(skb, OVS_KEY_ATTR_CT_ZONE, key->ct.zone))
nla_put_u16(skb, OVS_KEY_ATTR_CT_ZONE, output->ct.zone))
return -EMSGSIZE;

if (IS_ENABLED(CONFIG_NF_CONNTRACK_MARK) &&
nla_put_u32(skb, OVS_KEY_ATTR_CT_MARK, key->ct.mark))
nla_put_u32(skb, OVS_KEY_ATTR_CT_MARK, output->ct.mark))
return -EMSGSIZE;

if (IS_ENABLED(CONFIG_NF_CONNTRACK_LABELS) &&
nla_put(skb, OVS_KEY_ATTR_CT_LABELS, sizeof(key->ct.labels),
&key->ct.labels))
nla_put(skb, OVS_KEY_ATTR_CT_LABELS, sizeof(output->ct.labels),
&output->ct.labels))
return -EMSGSIZE;

if (swkey->ct.orig_proto) {
if (swkey->eth.type == htons(ETH_P_IP)) {
struct ovs_key_ct_tuple_ipv4 orig = {
output->ipv4.ct_orig.src,
output->ipv4.ct_orig.dst,
output->ct.orig_tp.src,
output->ct.orig_tp.dst,
output->ct.orig_proto,
};
if (nla_put(skb, OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4,
sizeof(orig), &orig))
return -EMSGSIZE;
} else if (swkey->eth.type == htons(ETH_P_IPV6)) {
struct ovs_key_ct_tuple_ipv6 orig = {
IN6_ADDR_INITIALIZER(output->ipv6.ct_orig.src),
IN6_ADDR_INITIALIZER(output->ipv6.ct_orig.dst),
output->ct.orig_tp.src,
output->ct.orig_tp.dst,
output->ct.orig_proto,
};
if (nla_put(skb, OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6,
sizeof(orig), &orig))
return -EMSGSIZE;
}
}

return 0;
}

10 changes: 8 additions & 2 deletions net/openvswitch/conntrack.h
@@ -32,7 +32,8 @@ int ovs_ct_execute(struct net *, struct sk_buff *, struct sw_flow_key *,
const struct ovs_conntrack_info *);

void ovs_ct_fill_key(const struct sk_buff *skb, struct sw_flow_key *key);
int ovs_ct_put_key(const struct sw_flow_key *key, struct sk_buff *skb);
int ovs_ct_put_key(const struct sw_flow_key *swkey,
const struct sw_flow_key *output, struct sk_buff *skb);
void ovs_ct_free_action(const struct nlattr *a);

#define CT_SUPPORTED_MASK (OVS_CS_F_NEW | OVS_CS_F_ESTABLISHED | \
@@ -79,9 +80,14 @@ static inline void ovs_ct_fill_key(const struct sk_buff *skb,
key->ct.zone = 0;
key->ct.mark = 0;
memset(&key->ct.labels, 0, sizeof(key->ct.labels));
/* Clear 'ct.orig_proto' to mark the non-existence of original
* direction key fields.
*/
key->ct.orig_proto = 0;
}

static inline int ovs_ct_put_key(const struct sw_flow_key *key,
static inline int ovs_ct_put_key(const struct sw_flow_key *swkey,
const struct sw_flow_key *output,
struct sk_buff *skb)
{
return 0;
34 changes: 29 additions & 5 deletions net/openvswitch/flow.c
@@ -765,7 +765,7 @@ static int key_extract_mac_proto(struct sk_buff *skb)
int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
struct sk_buff *skb, struct sw_flow_key *key)
{
int res;
int res, err;

/* Extract metadata from packet. */
if (tun_info) {
@@ -792,25 +792,33 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
key->phy.priority = skb->priority;
key->phy.in_port = OVS_CB(skb)->input_vport->port_no;
key->phy.skb_mark = skb->mark;
ovs_ct_fill_key(skb, key);
key->ovs_flow_hash = 0;
res = key_extract_mac_proto(skb);
if (res < 0)
return res;
key->mac_proto = res;
key->recirc_id = 0;

return key_extract(skb, key);
err = key_extract(skb, key);
if (!err)
ovs_ct_fill_key(skb, key); /* Must be after key_extract(). */
return err;
}

int ovs_flow_key_extract_userspace(struct net *net, const struct nlattr *attr,
struct sk_buff *skb,
struct sw_flow_key *key, bool log)
{
const struct nlattr *a[OVS_KEY_ATTR_MAX + 1];
u64 attrs = 0;
int err;

err = parse_flow_nlattrs(attr, a, &attrs, log);
if (err)
return -EINVAL;

/* Extract metadata from netlink attributes. */
err = ovs_nla_get_flow_metadata(net, attr, key, log);
err = ovs_nla_get_flow_metadata(net, a, attrs, key, log);
if (err)
return err;

@@ -824,5 +832,21 @@ int ovs_flow_key_extract_userspace(struct net *net, const struct nlattr *attr,
*/

skb->protocol = key->eth.type;
return key_extract(skb, key);
err = key_extract(skb, key);
if (err)
return err;

/* Check that we have conntrack original direction tuple metadata only
* for packets for which it makes sense. Otherwise the key may be
* corrupted due to overlapping key fields.
*/
if (attrs & (1 << OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4) &&
key->eth.type != htons(ETH_P_IP))
return -EINVAL;
if (attrs & (1 << OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6) &&
(key->eth.type != htons(ETH_P_IPV6) ||
sw_flow_key_is_nd(key)))
return -EINVAL;

return 0;
}
49 changes: 38 additions & 11 deletions net/openvswitch/flow.h
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2007-2014 Nicira, Inc.
* Copyright (c) 2007-2017 Nicira, Inc.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public
@@ -107,34 +107,61 @@ struct sw_flow_key {
__be32 src; /* IP source address. */
__be32 dst; /* IP destination address. */
} addr;
struct {
u8 sha[ETH_ALEN]; /* ARP source hardware address. */
u8 tha[ETH_ALEN]; /* ARP target hardware address. */
} arp;
union {
struct {
__be32 src;
__be32 dst;
} ct_orig; /* Conntrack original direction fields. */
struct {
u8 sha[ETH_ALEN]; /* ARP source hardware address. */
u8 tha[ETH_ALEN]; /* ARP target hardware address. */
} arp;
};
} ipv4;
struct {
struct {
struct in6_addr src; /* IPv6 source address. */
struct in6_addr dst; /* IPv6 destination address. */
} addr;
__be32 label; /* IPv6 flow label. */
struct {
struct in6_addr target; /* ND target address. */
u8 sll[ETH_ALEN]; /* ND source link layer address. */
u8 tll[ETH_ALEN]; /* ND target link layer address. */
} nd;
union {
struct {
struct in6_addr src;
struct in6_addr dst;
} ct_orig; /* Conntrack original direction fields. */
struct {
struct in6_addr target; /* ND target address. */
u8 sll[ETH_ALEN]; /* ND source link layer address. */
u8 tll[ETH_ALEN]; /* ND target link layer address. */
} nd;
};
} ipv6;
};
struct {
/* Connection tracking fields. */
u8 state;
u8 orig_proto; /* CT orig tuple IP protocol. */
u16 zone;
u32 mark;
u8 state;
struct {
__be16 src; /* CT orig tuple tp src port. */
__be16 dst; /* CT orig tuple tp dst port. */
} orig_tp;

struct ovs_key_ct_labels labels;
} ct;

} __aligned(BITS_PER_LONG/8); /* Ensure that we can do comparisons as longs. */

static inline bool sw_flow_key_is_nd(const struct sw_flow_key *key)
{
return key->eth.type == htons(ETH_P_IPV6) &&
key->ip.proto == NEXTHDR_ICMP &&
key->tp.dst == 0 &&
(key->tp.src == htons(NDISC_NEIGHBOUR_SOLICITATION) ||
key->tp.src == htons(NDISC_NEIGHBOUR_ADVERTISEMENT));
}

struct sw_flow_key_range {
unsigned short int start;
unsigned short int end;