 Open vSwitch datapath developer documentation
 =============================================

 The Open vSwitch kernel module allows flexible userspace control over
 flow-level packet processing on selected network devices.  It can be
 used to implement a plain Ethernet switch, network device bonding,
 VLAN processing, network access control, flow-based network control,
 and so on.

 The kernel module implements multiple "datapaths" (analogous to
 bridges), each of which can have multiple "vports" (analogous to ports
 within a bridge).  Each datapath also has associated with it a "flow
 table" that userspace populates with "flows" that map from keys based
 on packet headers and metadata to sets of actions.  The most common
 action forwards the packet to another vport; other actions are also
 implemented.

 When a packet arrives on a vport, the kernel module processes it by
 extracting its flow key and looking it up in the flow table.  If there
 is a matching flow, it executes the associated actions.  If there is
 no match, it queues the packet to userspace for processing (as part of
 its processing, userspace will likely set up a flow to handle further
 packets of the same type entirely in-kernel).
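
 The receive path described above can be sketched as a toy model.  This
 is only an illustration, not the kernel implementation: the Datapath
 class, its methods, and the "output:2" action string are hypothetical
 names, and key extraction is reduced to the input vport plus the
 Ethernet type.

```python
# Toy model of the datapath receive path; all names are hypothetical.

class Datapath:
    def __init__(self):
        self.flow_table = {}   # flow key -> list of actions
        self.upcalls = []      # (key, packet) pairs queued for userspace

    def extract_key(self, in_port, packet):
        # Stand-in for real header parsing: the key is just the input
        # vport plus the Ethernet type found in the packet.
        return (("in_port", in_port), ("eth_type", packet["eth_type"]))

    def receive(self, in_port, packet):
        key = self.extract_key(in_port, packet)
        actions = self.flow_table.get(key)
        if actions is None:
            # Miss: hand the packet and its parsed key to userspace.
            self.upcalls.append((key, packet))
            return None
        # Hit: the caller would execute these actions.
        return actions


dp = Datapath()
pkt = {"eth_type": 0x0800}
first = dp.receive(1, pkt)          # miss, queued to userspace
key, _ = dp.upcalls[0]
dp.flow_table[key] = ["output:2"]   # userspace installs a flow
second = dp.receive(1, pkt)         # hit, handled without an upcall
```

 Only the first packet reaches userspace in this model; once the flow
 is installed, later packets with the same key are matched directly.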


 Flow key compatibility
 ----------------------

 Network protocols evolve over time.  New protocols become important
 and existing protocols lose their prominence.  For the Open vSwitch
 kernel module to remain relevant, it must be possible for newer
 versions to parse additional protocols as part of the flow key.  It
 might even be desirable, someday, to drop support for parsing
 protocols that have become obsolete.  Therefore, the Netlink interface
 to Open vSwitch is designed to allow carefully written userspace
 applications to work with any version of the flow key, past or future.

 To support this forward and backward compatibility, whenever the
 kernel module passes a packet to userspace, it also passes along the
 flow key that it parsed from the packet.  Userspace then extracts its
 own notion of a flow key from the packet and compares it against the
 kernel-provided version:

     - If userspace's notion of the flow key for the packet matches the
       kernel's, then nothing special is necessary.

     - If the kernel's flow key includes more fields than the userspace
       version of the flow key, for example if the kernel decoded IPv6
       headers but userspace stopped at the Ethernet type (because it
       does not understand IPv6), then again nothing special is
       necessary.  Userspace can still set up a flow in the usual way,
       as long as it uses the kernel-provided flow key to do it.

     - If the userspace flow key includes more fields than the
       kernel's, for example if userspace decoded an IPv6 header but
       the kernel stopped at the Ethernet type, then userspace can
       forward the packet manually, without setting up a flow in the
       kernel.  This case is bad for performance because every packet
       that the kernel considers part of the flow must go to userspace,
       but the forwarding behavior is correct.  (If userspace can
       determine that the values of the extra fields would not affect
       forwarding behavior, then it could set up a flow anyway.)
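
 The three cases above can be sketched as a small decision function.
 This is a simplified illustration under stated assumptions: flow keys
 are modelled as dicts from attribute name to value, only the sets of
 attribute names are compared, and the function name and return strings
 are hypothetical.

```python
def classify(kernel_key, user_key):
    """Return what userspace should do, following the three cases above.

    Keys are modelled as dicts mapping attribute name -> value.
    """
    kernel_attrs = set(kernel_key)
    user_attrs = set(user_key)
    if user_attrs <= kernel_attrs:
        # Same fields, or the kernel parsed further than userspace did:
        # a flow can be set up, using the kernel-provided key.
        return "install flow using kernel key"
    # Userspace understands fields the kernel did not parse, so the
    # conservative choice is to forward each packet manually.
    return "forward in userspace"


kernel_view = {"eth": "...", "eth_type": 0x86DD, "ipv6": "..."}
user_view = {"eth": "...", "eth_type": 0x86DD}
```

 The sketch does not model the optimization mentioned in the last case,
 where userspace sets up a flow anyway because the extra fields cannot
 affect forwarding.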

 How flow keys evolve over time is important to making this work, so
 the following sections go into detail.


 Flow key format
 ---------------

 A flow key is passed over a Netlink socket as a sequence of Netlink
 attributes.  Some attributes represent packet metadata, defined as any
 information about a packet that cannot be extracted from the packet
 itself, e.g. the vport on which the packet was received.  Most
 attributes, however, are extracted from headers within the packet,
 e.g. source and destination addresses from Ethernet, IP, or TCP
 headers.

 The <linux/openvswitch.h> header file defines the exact format of the
 flow key attributes.  For informal explanatory purposes here, we write
 them as comma-separated strings, with parentheses indicating arguments
 and nesting.  For example, the following could represent a flow key
 corresponding to a TCP packet that arrived on vport 1:

     in_port(1), eth(src=e0:91:f5:21:d0:b2, dst=00:02:e3:0f:80:a4),
     eth_type(0x0800), ipv4(src=172.16.0.20, dst=172.18.0.52, proto=6, tos=0,
     frag=no), tcp(src=49163, dst=80)

 Often we ellipsize arguments not important to the discussion, e.g.:

     in_port(1), eth(...), eth_type(0x0800), ipv4(...), tcp(...)
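
 To make the informal notation concrete, the following sketch renders a
 list of attributes as such a string.  The data model is an assumption
 made for illustration (a list of (name, value) pairs, where a value is
 a scalar, a dict of named arguments, or a nested list); it is not the
 Netlink encoding defined in <linux/openvswitch.h>.

```python
def format_key(attrs):
    """Render (name, value) attributes in the informal notation.

    A value may be a scalar, a dict of named arguments, or a list of
    nested attributes.
    """
    parts = []
    for name, value in attrs:
        if isinstance(value, list):
            inner = format_key(value)          # nested attributes
        elif isinstance(value, dict):
            inner = ", ".join(f"{k}={v}" for k, v in value.items())
        else:
            inner = str(value)
        parts.append(f"{name}({inner})")
    return ", ".join(parts)


key = [
    ("in_port", 1),
    ("eth_type", "0x0800"),
    ("tcp", {"src": 49163, "dst": 80}),
]
text = format_key(key)
# text == "in_port(1), eth_type(0x0800), tcp(src=49163, dst=80)"
```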


 Basic rule for evolving flow keys
 ---------------------------------

 Some care is needed to really maintain forward and backward
 compatibility for applications that follow the rules listed under
 "Flow key compatibility" above.

 The basic rule is obvious:

     ------------------------------------------------------------------
     New network protocol support must only supplement existing flow
     key attributes.  It must not change the meaning of already defined
     flow key attributes.
     ------------------------------------------------------------------

 This rule does have less-obvious consequences, so it is worth working
 through a few examples.  Suppose, for example, that the kernel module
 did not already implement VLAN parsing.  Instead, it just interpreted
 the 802.1Q TPID (0x8100) as the Ethertype then stopped parsing the
 packet.  The flow key for any packet with an 802.1Q header would look
 essentially like this, ignoring metadata:

     eth(...), eth_type(0x8100)

 Naively, to add VLAN support, it makes sense to add a new "vlan" flow
 key attribute to contain the VLAN tag, then continue to decode the
 encapsulated headers beyond the VLAN tag using the existing field
 definitions.  With this change, a TCP packet in VLAN 10 would have a
 flow key much like this:

     eth(...), vlan(vid=10, pcp=0), eth_type(0x0800), ip(proto=6, ...), tcp(...)

 But this change would negatively affect a userspace application that
 has not been updated to understand the new "vlan" flow key attribute.
 The application could, following the flow compatibility rules above,
 ignore the "vlan" attribute that it does not understand and therefore
 assume that the flow contained IP packets.  This is a bad assumption
 (the flow only contains IP packets if one parses and skips over the
 802.1Q header) and it could cause the application's behavior to change
 across kernel versions even though it follows the compatibility rules.

 The solution is to use a set of nested attributes.  This is, for
 example, why 802.1Q support uses nested attributes.  A TCP packet in
 VLAN 10 is actually expressed as:

     eth(...), eth_type(0x8100), vlan(vid=10, pcp=0), encap(eth_type(0x0800),
     ip(proto=6, ...), tcp(...))

 Notice how the "eth_type", "ip", and "tcp" flow key attributes are
 nested inside the "encap" attribute.  Thus, an application that does
 not understand the "vlan" key will not see any of those attributes
 and therefore will not misinterpret them.  (Also, the outer eth_type
 is still 0x8100, not changed to 0x0800.)
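
 The difference between the naive and the nested layout can be
 illustrated with a sketch of an application that predates VLAN support
 and skips every attribute it does not know.  The KNOWN set, the
 visible() helper, and the list-of-pairs data model are hypothetical and
 exist only for this example.

```python
# Attributes understood by a hypothetical pre-VLAN application.
KNOWN = {"eth", "eth_type", "ip", "tcp"}

def visible(attrs):
    """Top-level attributes left after skipping the unknown ones."""
    return [(name, value) for name, value in attrs if name in KNOWN]


# Naive layout: "vlan" sits next to the decoded inner headers.
naive = [("eth", "..."), ("vlan", "vid=10"), ("eth_type", 0x0800),
         ("ip", "proto=6"), ("tcp", "...")]

# Nested layout: the inner headers are hidden inside "encap".
nested = [("eth", "..."), ("eth_type", 0x8100), ("vlan", "vid=10"),
          ("encap", [("eth_type", 0x0800), ("ip", "proto=6"),
                     ("tcp", "...")])]

naive_view = dict(visible(naive))     # looks like an untagged IP/TCP flow
nested_view = dict(visible(nested))   # only eth and eth_type 0x8100 remain
```

 With the naive layout the old application wrongly concludes that the
 flow carries plain IP packets; with the nested layout it sees the same
 key it would have seen before VLAN parsing was added.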

 Handling malformed packets
 --------------------------

 Don't drop packets in the kernel for malformed protocol headers, bad
 checksums, etc.  This would prevent userspace from implementing a
 simple Ethernet switch that forwards every packet.

 Instead, in such a case, include an attribute with "empty" content.
 It doesn't matter if the empty content could be valid protocol values,
 as long as those values are rarely seen in practice, because userspace
 can always have all packets with those values sent to userspace and
 handle them individually.

 For example, consider a packet that contains an IP header that
 indicates protocol 6 for TCP, but which is truncated just after the IP
 header, so that the TCP header is missing.  The flow key for this
 packet would include a tcp attribute with all-zero src and dst, like
 this:

     eth(...), eth_type(0x0800), ip(proto=6, ...), tcp(src=0, dst=0)

 As another example, consider a packet with an Ethernet type of 0x8100,
 indicating that a VLAN TCI should follow, but which is truncated just
 after the Ethernet type.  The flow key for this packet would include
 an all-zero-bits vlan and an empty encap attribute, like this:

     eth(...), eth_type(0x8100), vlan(0), encap()

 Unlike a TCP packet with source and destination ports 0, an
 all-zero-bits VLAN TCI is not that rare, so the CFI bit (aka
 VLAN_TAG_PRESENT inside the kernel) is ordinarily set in a vlan
 attribute expressly to allow this situation to be distinguished.
 Thus, the flow key in this second example unambiguously indicates a
 missing or malformed VLAN TCI.
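
 The role of the CFI bit can be sketched as follows.  The helper names
 are hypothetical, a truncated TCI is modelled as None, and the constant
 0x1000 is the position of the CFI bit within the 16-bit 802.1Q TCI;
 treat this as an illustration rather than the kernel's code.

```python
VLAN_TAG_PRESENT = 0x1000   # CFI bit within the 16-bit TCI

def vlan_attr(tci):
    """Value placed in the vlan attribute for a parsed TCI.

    tci is None when the packet was truncated before the TCI.
    """
    if tci is None:
        return 0                      # "empty" content: all-zero bits
    return tci | VLAN_TAG_PRESENT     # mark a TCI that was really parsed

def is_missing_or_malformed(vlan_value):
    """True if the vlan attribute does not describe a real TCI."""
    return not (vlan_value & VLAN_TAG_PRESENT)


truncated = vlan_attr(None)   # 0x0000
zero_tci = vlan_attr(0)       # 0x1000: a genuine all-zero-bits TCI
vlan_10 = vlan_attr(10)       # 0x100a
```

 A genuine all-zero-bits TCI and a missing TCI therefore produce
 different attribute values, which is the distinction the text relies
 on.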

 Other rules
 -----------

 The other rules for flow keys are much less subtle:

     - Duplicate attributes are not allowed at a given nesting level.

     - Ordering of attributes is not significant.

     - When the kernel sends a given flow key to userspace, it always
       composes it the same way.  This allows userspace to hash and
       compare entire flow keys that it may not be able to fully
       interpret.
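
 Two of these rules lend themselves to a short sketch: rejecting
 duplicate attributes within one nesting level, and hashing a
 kernel-provided key as opaque bytes.  Both helpers are hypothetical,
 the list-of-pairs model is an assumption made for illustration, and
 SHA-256 is just one possible choice of hash.

```python
import hashlib

def check_no_duplicates(attrs):
    """Raise ValueError if a name repeats within one nesting level."""
    seen = set()
    for name, value in attrs:
        if name in seen:
            raise ValueError(f"duplicate attribute: {name}")
        seen.add(name)
        if isinstance(value, list):
            # Each nesting level is checked separately, so an inner
            # eth_type does not clash with the outer one.
            check_no_duplicates(value)

def key_digest(raw_key: bytes) -> str:
    """Hash an opaque kernel-provided key without interpreting it."""
    return hashlib.sha256(raw_key).hexdigest()


ok_key = [("eth_type", 0x8100), ("encap", [("eth_type", 0x0800)])]
check_no_duplicates(ok_key)   # passes: the repeats are on different levels
```

 Hashing the raw bytes works only because the kernel always composes a
 given flow key the same way; userspace need not understand every
 attribute to use the key as a table index.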