linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2025-01-13 18:36:05 +07:00

Author	SHA1	Message	Date
Joachim Eastwood	4ed2d8fca7	stmmac: clean up platform/of_match data retrieval Refactor code to clearly separate probing non-dt versus dt. In the non-dt case platform data must be supplied to probe successfully. For dt the platform data structure is created and match data is copied into it. Note that support for supplying platform data in dt from AUXDATA is dropped as no users in mainline does this. This change will allow dt dwmac-* drivers to call the config_dt() function from probe to create the needed platform data struct and retrieve common dt properties. Signed-off-by: Joachim Eastwood <manabian@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:45:56 -07:00
Joachim Eastwood	0dacf3f664	stmmac: use of_device_get_match_data to retrieve of match data By using of_device_get_match_data() the code that retrieve match data can be simplified quite a bit. Signed-off-by: Joachim Eastwood <manabian@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:45:56 -07:00
David S. Miller	7781e5d10a	Merge branch 'tipc-separate-link-and-aggregation' Jon Maloy says: ==================== tipc: separate link and link aggregation layer This is the first batch of a longer series that has two main objectives: o Finer lock granularity during message sending and reception, especially regarding usage of the node spinlock. o Better separation between the link layer implementation and the link aggregation layer, represented by node.c::struct tipc_node. Hopefully these changes also make this part of code somewhat easier to comprehend and maintain. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:16 -07:00
Jon Paul Maloy	d999297c3d	tipc: reduce locking scope during packet reception We convert packet/message reception according to the same principle we have been using for message sending and timeout handling: We move the function tipc_rcv() to node.c, hence handling the initial packet reception at the link aggregation level. The function grabs the node lock, selects the receiving link, and accesses it via a new call tipc_link_rcv(). This function appends buffers to the input queue for delivery upwards, but it may also append outgoing packets to the xmit queue, just as we do during regular message sending. The latter will happen when buffers are forwarded from the link backlog, or when retransmission is requested. Upon return of this function, and after having released the node lock, tipc_rcv() delivers/tranmsits the contents of those queues, but it may also perform actions such as link activation or reset, as indicated by the return flags from the link. This reduces the number of cpu cycles spent inside the node spinlock, and reduces contention on that lock. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:16 -07:00
Jon Paul Maloy	1a20cc254e	tipc: introduce node contact FSM The logics for determining when a node is permitted to establish and maintain contact with its peer node becomes non-trivial in the presence of multiple parallel links that may come and go independently. A known failure scenario is that one endpoint registers both its links to the peer lost, cleans up it binding table, and prepares for a table update once contact is re-establihed, while the other endpoint may see its links reset and re-established one by one, hence seeing no need to re-synchronize the binding table. To avoid this, a node must not allow re-establishing contact until it has confirmation that even the peer has lost both links. Currently, the mechanism for handling this consists of setting and resetting two state flags from different locations in the code. This solution is hard to understand and maintain. A closer analysis even reveals that it is not completely safe. In this commit we do instead introduce an FSM that keeps track of the conditions for when the node can establish and maintain links. It has six states and four events, and is strictly based on explicit knowledge about the own node's and the peer node's contact states. Only events leading to state change are shown as edges in the figure below. +--------------+ \| SELF_UP/ \| +---------------->\| PEER_COMING \|-----------------+ SELF_ \| +--------------+ \|PEER_ ESTBL_ \| \| \|ESTBL_ CONTACT\| SELF_LOST_CONTACT \| \|CONTACT \| v \| \| +--------------+ \| \| PEER_ \| SELF_DOWN/ \| SELF_ \| \| LOST_ +--\| PEER_LEAVING \|<--+ LOST_ v +-------------+ CONTACT \| +--------------+ \| CONTACT +-----------+ \| SELF_DOWN/ \|<----------+ +----------\| SELF_UP/ \| \| PEER_DOWN \|<----------+ +----------\| PEER_UP \| +-------------+ SELF_ \| +--------------+ \| PEER_ +-----------+ \| LOST_ +--\| SELF_LEAVING/\|<--+ LOST_ A \| CONTACT \| PEER_DOWN \| CONTACT \| \| +--------------+ \| \| A \| PEER_ \| PEER_LOST_CONTACT \| \|SELF_ ESTBL_ \| \| \|ESTBL_ CONTACT\| +--------------+ \|CONTACT +---------------->\| PEER_UP/ \|-----------------+ \| SELF_COMING \| +--------------+ Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:16 -07:00
Jon Paul Maloy	8a1577c96f	tipc: move link supervision timer to node level In our effort to move control of the links to the link aggregation layer, we move the perodic link supervision timer to struct tipc_node. The new timer is shared between all links belonging to the node, thus saving resources, while still kicking the FSM on both its pertaining links at each expiration. The current link timer and corresponding functions are removed. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:16 -07:00
Jon Paul Maloy	333ef69ed2	tipc: simplify link timer implementation We create a second, simpler, link timer function, tipc_link_timeout(). The new function makes use of the new FSM function introduced in the previous commit, and just like it, takes a buffer queue as parameter. It returns an event bit field and potentially a link protocol packet to the caller. The existing timer function, link_timeout(), is still needed for a while, so we redesign it to become a wrapper around the new function. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:16 -07:00
Jon Paul Maloy	6ab30f9cbe	tipc: improve link FSM implementation The link FSM implementation is currently unnecessarily complex. It sometimes checks for conditional state outside the FSM data before deciding next state, and often performs actions directly inside the FSM logics. In this commit, we create a second, simpler FSM implementation, that as far as possible acts only on states and events that it is strictly defined for, and postpone any actions until it is finished with its decisions. It also returns an event flag field and an a buffer queue which may potentially contain a protocol message to be sent by the caller. Unfortunately, we cannot yet make the FSM "clean", in the sense that its decisions are only based on FSM state and event, and that state changes happen only here. That will have to wait until the activate/reset logics has been cleaned up in a future commit. We also rename the link states as follows: WORKING_WORKING -> TIPC_LINK_WORKING WORKING_UNKNOWN -> TIPC_LINK_PROBING RESET_UNKNOWN -> TIPC_LINK_RESETTING RESET_RESET -> TIPC_LINK_ESTABLISHING The existing FSM function, link_state_event(), is still needed for a while, so we redesign it to make use of the new function. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:15 -07:00
Jon Paul Maloy	426cc2b86d	tipc: introduce new link protocol msg create function As a preparation for later changes, we introduce a new function tipc_link_build_proto_msg(). Instead of actually sending the created protocol message, it only creates it and adds it to the head of a skb queue provided by the caller. Since we still need the existing function tipc_link_protocol_xmit() for a while, we redesign it to make use of the new function. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:15 -07:00
Jon Paul Maloy	d3504c3449	tipc: clean up definitions and usage of link flags The status flag LINK_STOPPED is not needed any more, since the mechanism for delayed deletion of links has been removed. Likewise, LINK_STARTED and LINK_START_EVT are unnecessary, because we can just as well start the link timer directly from inside tipc_link_create(). We eliminate these flags in this commit. Instead of the above flags, we now introduce three new link modes, TIPC_LINK_OPEN, TIPC_LINK_BLOCKED and TIPC_LINK_TUNNEL. The values indicate whether, and in the case of TIPC_LINK_TUNNEL, which, messages the link is allowed to receive in this state. TIPC_LINK_BLOCKED also blocks timer-driven protocol messages to be sent out, and any change to the link FSM. Since the modes are mutually exclusive, we convert them to state values, and rename the 'flags' field in struct tipc_link to 'exec_mode'. Finally, we move the #defines for link FSM states and events from link.h into enums inside the file link.c, which is the real usage scope of these definitions. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:15 -07:00
Jon Paul Maloy	af9b028e27	tipc: make media xmit call outside node spinlock context Currently, message sending is performed through a deep call chain, where the node spinlock is grabbed and held during a significant part of the transmission time. This is clearly detrimental to overall throughput performance; it would be better if we could send the message after the spinlock has been released. In this commit, we do instead let the call revert on the stack after the buffer chain has been added to the transmission queue, whereafter clones of the buffers are transmitted to the device layer outside the spinlock scope. As a further step in our effort to separate the roles of the node and link entities we also move the function tipc_link_xmit() to node.c, and rename it to tipc_node_xmit(). Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:15 -07:00
Jon Paul Maloy	22d85c7942	tipc: change sk_buffer handling in tipc_link_xmit() When the function tipc_link_xmit() is given a buffer list for transmission, it currently consumes the list both when transmission is successful and when it fails, except for the special case when it encounters link congestion. This behavior is inconsistent, and needs to be corrected if we want to avoid problems in later commits in this series. In this commit, we change this to let the function consume the list only when transmission is successful, and leave the list with the sender in all other cases. We also modifiy the socket code so that it adapts to this change, i.e., purges the list when a non-congestion error code is returned. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:15 -07:00
Jon Paul Maloy	36e78a463b	tipc: use bearer index when looking up active links struct tipc_node currently holds two arrays of link pointers; one, indexed by bearer identity, which contains all links irrespective of current state, and one two-slot array for the currently active link or links. The latter array contains direct pointers into the elements of the former. This has the effect that we cannot know the bearer id of a link when accessing it via the "active_links[]" array without actually dereferencing the pointer, something we want to avoid in some cases. In this commit, we do instead store the bearer identity in the "active_links" array, and use this as an index to find the right element in the overall link entry array. This change should be seen as a preparation for the later commits in this series. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:14 -07:00
Jon Paul Maloy	d39bbd445d	tipc: move link input queue to tipc_node At present, the link input queue and the name distributor receive queues are fields aggregated in struct tipc_link. This is a hazard, because a link might be deleted while a receiving socket still keeps reference to one of the queues. This commit fixes this bug. However, rather than adding yet another reference counter to the critical data path, we move the two queues to safe ground inside struct tipc_node, which is already protected, and let the link code only handle references to the queues. This is also in line with planned later changes in this area. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:14 -07:00
Jon Paul Maloy	d3a43b907a	tipc: move link creation from neighbor discoverer to node As a step towards turning links into node internal entities, we move the creation of links from the neighbor discovery logics to the node's link control logics. We also create an additional entry for the link's media address in the newly introduced struct tipc_link_entry, since this is where it is needed in the upcoming commits. The current copy in struct tipc_link is kept for now, but will be removed later. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:14 -07:00
Jon Paul Maloy	9d13ec65ed	tipc: introduce link entry structure to struct tipc_node struct 'tipc_node' currently contains two arrays for link attributes, one for the link pointers, and one for the usable link MTUs. We now group those into a new struct 'tipc_link_entry', and intoduce one single array consisting of such enties. Apart from being a cosmetic improvement, this is a starting point for the strict master-slave relation between node and link that we will introduce in the following commits. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:14 -07:00
Jiri Benc	6acc232660	net: remove skb_frag_add_head It's not used anywhere. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:38:49 -07:00
David S. Miller	bd265242c1	Merge branch 'offload_fwd_mark' Scott Feldman says: ==================== switchdev: avoid duplicate packet forwarding v3: - Per Nicolas Dichtel review: remove errant empty union. v2: - Per davem review: in sk_buff, union fwd_mark with secmark to save space since features appear to be mutually exclusive. - Per Simon Horman review: - fix grammar in switchdev.txt wrt fwd_mark - remove some unrelated changes that snuck in v1: This patchset was previously submitted as RFC. No changes from the last version (v2) sent under RFC. Including RFC version history here for reference. RFC v2: - s/fwd_mark/offload_fwd_mark - use consume_skb rather than kfree_skb when dropping pkt on egress. - Use Jiri's suggestion to use ifindex of one of the ports in a group as the mark for all the ports in the group. This can be done with no additional storage (no hashtable from v1). To pull it off, we need some simple recursive routines to walk the netdev tree ensuring all leaves in the tree (ports) in the same group (e.g. bridge) belonging to the same switch device will have the same offload fwd mark. Maybe someone sees a better design for the recusive routines? They're not too bad, and should cover the stacked driver cases. RFC v1: With switchdev support for offloading L2/L3 forwarding data path to a switch device, we have a general problem where both the device and the kernel may forward the packet, resulting in duplicate packets on the wire. Anytime a packet is forwarded by the device and a copy is sent to the CPU, there is potential for duplicate forwarding, as the kernel may also do a forwarding lookup and send the packet on the wire. The specific problem this patch series is interested in solving is avoiding duplicate packets on bridged ports. There was a previous RFC from Roopa (http://marc.info/?l=linux-netdev&m=142687073314252&w=2) to address this problem, but didn't solve the problem of mixed ports in the bridge from different devices; there was no way to exclude some ports from forwarding and include others. This RFC solves that problem by tagging the ingressing packet with a unique mark, and then comparing the packet mark with the egress port mark, and skip forwarding when there is a match. For the mixed ports bridge case, only those ports with matching marks are skipped. The switchdev port driver must do two things: 1) Generate a fwd_mark for each switch port, using some unique key of the switch device (and optionally port). This is done when the port netdev is registered or if the port's group membership changes (joins/leaves a bridge, for example). 2) On packet ingress from port, mark the skb with the ingress port's fwd_mark. If the device supports it, it's useful to only mark skbs which were already forwarded by the device. If the device does not support such indication, all skbs can be marked, even if they're local dst. Two new 32-bit fields are added to struct sk_buff and struct netdevice to hold the fwd_mark. I've wrapped these with CONFIG_NET_SWITCHDEV for now. I tried using skb->mark for this purpose, but ebtables can overwrite the skb->mark before the bridge gets it, so that will not work. In general, this fwd_mark can be used for any case where a packet is forwarded by the device and a copy is sent to the CPU, to avoid the kernel re-forwarding the packet. sFlow is another use-case that comes to mind, but I haven't explored the details. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 18:32:45 -07:00
Scott Feldman	a48037e7c6	switchdev: update documentation for offload_fwd_mark Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 18:32:45 -07:00
Scott Feldman	3f98a8e636	rocker: add offload_fwd_mark support If device flags ingress packet as "fwd offload", mark the skb->offlaod_fwd_mark using the ingress port's dev->offlaod_fwd_mark. This will be the hint to the kernel that this packet has already been forwarded by device to egress ports matching skb->offlaod_fwd_mark. For rocker, derive port dev->offlaod_fwd_mark based on device switch ID and port ifindex. If port is bridged, use the bridge ifindex rather than the port ifindex. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 18:32:45 -07:00
Scott Feldman	1a3b2ec93d	switchdev: add offload_fwd_mark generator helper skb->offload_fwd_mark and dev->offload_fwd_mark are 32-bit and should be unique for device and may even be unique for a sub-set of ports within device, so add switchdev helper function to generate unique marks based on port's switch ID and group_ifindex. group_ifindex would typically be the container dev's ifindex, such as the bridge's ifindex. The generator uses a global hash table to store offload_fwd_marks hashed by {switch ID, group_ifindex} key. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 18:32:44 -07:00
Scott Feldman	d754f98b50	net: add phys ID compare helper to test if two IDs are the same Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 18:32:44 -07:00
Scott Feldman	0c4f691ff6	net: don't reforward packets already forwarded by offload device Just before queuing skb for xmit on port, check if skb has been marked by switchdev port driver as already fordwarded by device. If so, drop skb. A non-zero skb->offload_fwd_mark field is set by the switchdev port driver/device on ingress to indicate the skb has already been forwarded by the device to egress ports with matching dev->skb_mark. The switchdev port driver would assign a non-zero dev->offload_skb_mark for each device port netdev during registration, for example. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 18:32:44 -07:00
Simon Horman	8254973fa3	rocker: forward packets to CPU when port is joined to openvswitch Teach rocker to forward packets to CPU when a port is joined to Open vSwitch. There is scope to later refine what is passed up as per Open vSwitch flows on a port. This does not change the behaviour of rocker ports that are not joined to Open vSwitch. Signed-off-by: Simon Horman <simon.horman@netronome.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 18:26:03 -07:00
Nikolay Aleksandrov	a7ce45a74b	bridge: mcast: fix br_multicast_dev_del warn when igmp snooping is not defined Fix: net/bridge/br_if.c: In function 'br_dev_delete': >> net/bridge/br_if.c:284:2: error: implicit declaration of function >> 'br_multicast_dev_del' [-Werror=implicit-function-declaration] br_multicast_dev_del(br); ^ cc1: some warnings being treated as errors when igmp snooping is not defined. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 16:19:03 -07:00
Phil Sutter	a0a9f33bdf	net/ipv6: update flowi6_oif in ip6_dst_lookup_flow if not set Newly created flows don't have flowi6_oif set (at least if the associated socket is not interface-bound). This leads to a mismatch in __xfrm6_selector_match() for policies which specify an interface in the selector (sel->ifindex != 0). Backtracing shows this happens in code-paths originating from e.g. ip6_datagram_connect(), rawv6_sendmsg() or tcp_v6_connect(). (UDP was not tested for.) In summary, this patch fixes policy matching on outgoing interface for locally generated packets. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 12:59:32 -07:00
Nikolay Aleksandrov	22f94e6256	bonding: trivial: remove unused variables Get rid of these: drivers/net/bonding//bond_main.c: In function ‘bond_update_slave_arr’: drivers/net/bonding//bond_main.c:3754:6: warning: variable ‘slaves_in_agg’ set but not used [-Wunused-but-set-variable] int slaves_in_agg; ^ CC [M] drivers/net/bonding//bond_3ad.o drivers/net/bonding//bond_3ad.c: In function ‘ad_marker_response_received’: drivers/net/bonding//bond_3ad.c:1870:61: warning: parameter ‘marker’ set but not used [-Wunused-but-set-parameter] static void ad_marker_response_received(struct bond_marker marker, ^ drivers/net/bonding//bond_3ad.c:1871:19: warning: parameter ‘port’ set but not used [-Wunused-but-set-parameter] struct port port) ^ Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 12:50:32 -07:00
David S. Miller	5b1d0d8f52	Merge branch 'bridge-temp-and-perm' Nikolay Aleksandrov says: ==================== bridge: multicast: temp and perm entries behaviour enhancements Patch 01 adds a notify when a group is deleted via br_multicast_del_pg() (on expire, on device delete or on device down). Patch 02 changes how bridge device and bridge port delete and down/up are handled. Until now on bridge down all groups were flushed, now only the temp ones are (same for port), perm entries are flushed only on port or bridge removal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 12:49:11 -07:00
Satish Ashok	e10177abf8	bridge: multicast: fix handling of temp and perm entries When the bridge (or port) is brought down/up flush only temp entries and leave the perm ones. Flush perm entries only when deleting the bridge device or the associated port. Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 12:49:10 -07:00
Nikolay Aleksandrov	ef8299de7e	bridge: multicast: notify on group delete Group notifications were not sent when a group expired or was deleted due to bridge/port device being deleted. So add br_mdb_notify() to br_multicast_del_pg(). Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 12:49:10 -07:00
David S. Miller	03b6dc7d17	Merge branch 'bpf_cgroup_classid' Daniel Borkmann says: ==================== BPF update This small helper allows for accessing net_cls cgroups classid. Please see individual patches for more details. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 12:41:30 -07:00
Daniel Borkmann	8d20aabe1c	ebpf: add helper to retrieve net_cls's classid cookie It would be very useful to retrieve the net_cls's classid from an eBPF program to allow for a more fine-grained classification, it could be directly used or in conjunction with additional policies. I.e. docker, but also tooling such as cgexec, can easily run applications via net_cls cgroups: cgcreate -g net_cls:/foo echo 42 > foo/net_cls.classid cgexec -g net_cls:foo <prog> Thus, their respecitve classid cookie of foo can then be looked up on the egress path to apply further policies. The helper is desigend such that a non-zero value returns the cgroup id. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Thomas Graf <tgraf@suug.ch> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 12:41:30 -07:00
Daniel Borkmann	b87a173e25	cls_cgroup: factor out classid retrieval Split out retrieving the cgroups net_cls classid retrieval into its own function, so that it can be reused later on from other parts of the traffic control subsystem. If there's no skb->sk, then the small helper returns 0 as well, which in cls_cgroup terms means 'could not classify'. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 12:41:30 -07:00
Govindarajulu Varadarajan	d9382bda4e	enic: allow adaptive coalesce setting for msi/legacy intr * Allow setting of adaptive coalescing setting for all types of interrupt. * In msi & legacy intr, we use single interrupt for rx & tx. In this case tx_coalesce_usecs is invalid. We should use only rx_coalesce_usecs. Do not display tx_coal values for msi/intx. And do not allow user to set this as well. * Driver supports only tx/rx_coalesce_usec and adaptive coalesce settings. For other values, driver does not return error. So ethtool succeeds for unsupported values. Introduce enic_coalesce_valid() function to validate the coalescing values. * If user requests for coalesce value greater than what adaptor supports, driver uses the max value. We should at least log this. Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 12:39:34 -07:00
Govindarajulu Varadarajan	fc865d6b4a	enic: add adaptive coalescing intr for intx and msi poll Adaptive interrupt coalescing is available for msix. This patch adds the support for msi poll. Interface for adaptive interrupt coalescing is already added in driver. We just did not enable it for legacy intr & msi. enic_calc_int_moderation() & enic_set_int_moderation() are defined as static after enic_poll. Since enic_poll needs it, move both of these function definitions above enic_poll. No change in functionality. Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 12:39:34 -07:00
YOSHIFUJI Hideaki	c15df306fc	ipv6: Remove unused arguments for __ipv6_dev_get_saddr(). Signed-off-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-16 01:00:56 -07:00
David S. Miller	0d0578815b	Merge branch 'protodown' Anuradha Karuppiah says: ==================== net: Introduce protodown flag. User space daemons can detect errors in the network that need to be notified to the switch device drivers. Drivers can react to this error state by doing a phy-down on the switch-port which would result in a carrier-off locally and on the directly connected switch. Doing that would prevent loops and black-holes in the network. One such use case is the multi-chassis LAG application - 1. The MLAG application runs on peer switches (say Switch0 and Switch1) synchronizing states, forwarding entries etc. between the two switches over the peer-link (this is a link directly connecting the two switches). 2. An MLAG election process designates one of the switches as a primary (for e.g. Switch0 is primary and Switch1 is secondary). 3. The peer link plays a critical role in allowing Switch0-Switch1 to function as a single LAG partner to the downstream dual-connected servers. When the peer-link between the switches goes down we have a split-brain situation. Switch0 and Switch1 are no longer in sync and are acting independently. This can result in traffic loops and traffic black-holing in the network. 4. To prevent these problems the MLAG application on the secondary switch phy-downs the MLAG ports on detecting the peer-link down. This will be seen as a carrier down on servers that are dual-connected to Switch0 and Switch1. 5. Specifically a dual-connected server will see a carrier-down on the port connected to the MLAG secondary, Switch1, and will stop using that port for traffic TX. So traffic black holing is prevented. v6 to v7: Removed some unnecessary code in response to review comments. v5 to v6: Replaced proto_flags with a simple proto_down boolean attribute in response to Dave's comments. v4 to v5: Changed the ip link display format for protodown to match the set as recommended by Stephen. v3 to v4: I have moved protodown out of IFF_XXX and introduced a separate proto_flags field with IF_PROTOF_DOWN bit being used by apps to notify switch port errors. This is in response to Stephen's comments that adding a new IFF_XXX may break user space. I have used rocker as the sample switch driver. And to test this functionality I used the qemu-rocker patch that Scott sent out in response to the v3 posting (needed to set link up/down when phy is enabled/disabled). v1 to v2: Based on Dave's suggestion I have moved out aggregating of error bits across applications to a user space framework. This patch now simply notifies an aggregated error bit to drivers enabling them to handle the error gracefully. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-15 21:39:40 -07:00
Anuradha Karuppiah	c305524617	rocker: Handle protodown notifications. protodown can be set by user space applications like MLAG on detecting errors on a switch port. This patch provides sample switch driver changes for handling protodown. Rocker PHYS disables the port in response to protodown. Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-15 21:39:40 -07:00
Anuradha Karuppiah	88d6378bd6	netlink: changes for setting and clearing protodown via netlink. Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-15 21:39:40 -07:00
Anuradha Karuppiah	d746d707a8	net core: Add protodown support. This patch introduces the proto_down flag that can be used by user space applications to notify switch drivers that errors have been detected on the device. The switch driver can react to protodown notification by doing a phys down on the associated switch port. Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-15 21:39:40 -07:00
Thomas Falcon	07e6a97da1	ibmveth: add support for TSO6 This patch adds support for a new method of signalling the firmware that TSO packets are being sent. The new method removes the need to alter the ip and tcp checksums and allows TSO6 support. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-15 21:34:56 -07:00
Haiyang Zhang	2de8530ba0	hv_netvsc: Add close of RNDIS filter into change mtu call The current change mtu call only stops tx before removing RNDIS filter. In case ringbufer is not empty, the rndis_filter_device_remove() may hang on removing the buffers. This patch adds close of RNDIS filter before removing it, also a gradual waiting loop until the ring is empty. The change_mtu hang issue under heavy traffic is solved by this patch. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-15 21:11:31 -07:00
YOSHIFUJI Hideaki/吉藤英明	c0b8da1e76	ipv6: Fix finding best source address in ipv6_dev_get_saddr(). Commit `9131f3de2` ("ipv6: Do not iterate over all interfaces when finding source address on specific interface.") did not properly update best source address available. Plus, it introduced possible NULL pointer dereference. Bug was reported by Erik Kline <ek@google.com>. Based on patch proposed by Hajime Tazaki <thehajime@gmail.com>. Fixes: `9131f3de24` ("ipv6: Do not iterate over all interfaces when finding source address on specific interface.") Signed-off-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Acked-by: Hajime Tazaki <thehajime@gmail.com> Acked-by: Erik Kline <ek@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-15 21:06:13 -07:00
David S. Miller	9243b25b23	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2015-07-14 This series contains updates to i40e and i40evf only. Joe Stringer and Jesse Gross add a ndo_features_check function to ensure that the i40e driver does not try to offload packets that exceed 80 bytes in length. Anjali adds additional stats to track flow director ATR and SB current state and flow director flush count which will help the need for verbose debug logs with respect to flow director. Also refines an error message to avoid confusion, so that it indicates what may have really happened when the init_shared_code() call possibly fails. Pawel adds new fields to the capabilities structures to handle Flex-10 device/function capabilities which is needed to support Flex-10 configs. Jesse improves the transmit performance by added a prefetch for the next transmit descriptor to be used when we know there are more coming. Mitch modifies i40evf driver to handle/allow an abundance of vectors. Currently the driver only maps transmit and receive queues to a single MSI-X vector per queue if there are exactly enough vectors for this, but if we have too many vectors, it will fail and allocate queues to vectors in a suboptimal manner. So change the condition check to allow for an excess number of vectors and won't use the extras. Also update the driver to just return success if the user attempts to set a port VLAN on a VF that already has the same port VLAN configured, instead of going through unnecessary filter removals & adds. Fix the MAC filters for VFs, which were being programmed with 0 for the VLAN value when there was no VLAN assigned. Instead, we must use -1 to indicate that no VLAN is in use. Fix the VF disable code, which was not properly cleaning up the VF and would leave the VF in an indeterminate state, so fix this by notifying the VF and then call the normal VF reset routine. Fix the logic in the driver so that MAC filters are added and removed correctly and added a check for the driver's hardware MAC address so that this filter does not get removed incorrectly. Carolyn removes incorrect #ifdef's which should not have been added in the first place and with the #ifdef's removed, make the necessary changes in the driver to resolve compile errors. Greg updates the admin queue command header defines. v2: fix indentation in patch 12 based on feedback from Sergei Shtylyov ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-15 17:31:14 -07:00
Andrea Parri	40bdc5360d	pkt_sched: sch_qfq: remove unused member of struct qfq_sched The member (u32) "num_active_agg" of struct qfq_sched has been unused since its introduction in `462dbc9101` "pkt_sched: QFQ Plus: fair-queueing service at DRR cost" and (AFAICT) there is no active plan to use it; this removes the member. Signed-off-by: Andrea Parri <parri.andrea@gmail.com> Acked-by: Paolo Valente <paolo.valente@unimore.it> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-15 17:21:31 -07:00
Christophe Jaillet	e29dd44325	net: qlcnic: Deletion of unnecessary memset There is no need to memset memory allocated with vzalloc. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-15 17:20:33 -07:00
David S. Miller	9061cb0235	Merge branch 'gianfar_rx_sg' Claudiu Manoil says: ==================== gianfar: Add Rx S/G This patch-set introduces scatter/gather support on the Rx side, addressing Rx path performance issues in the driver. Thanks. As an example, two boards connected back-to-back were used to measure the throughput, running the same kernel 4.1, before and after applying these patches. The netperf UDP_STREAM results below show that the bottleneck lies on the Rx side BEFORE applying the patches, and that the Rx throughput is even lower with a larger MTU. AFTER applying the patches the Rx bottleneck is gone (Rx throughput matches the Tx one) and the RX throughput is not influenced by MTU size any longer (as expected). BEFORE: 1) MTU 1500 (default) root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 512 MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET Socket Message Elapsed Messages CPU Service Size Size Time Okay Errors Throughput Util Demand bytes bytes secs # # 10^6bits/sec % SS us/KB 163840 512 150.00 20119124 0 549.4 100.00 14.911 163840 150.00 14057349 383.9 100.00 14.911 root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 64 MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET Socket Message Elapsed Messages CPU Service Size Size Time Okay Errors Throughput Util Demand bytes bytes secs # # 10^6bits/sec % SS us/KB 163840 64 150.00 23654013 0 80.7 100.00 101.463 163840 150.00 15875288 54.2 100.00 101.463 2) MTU 8000 root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 512 MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET Socket Message Elapsed Messages CPU Service Size Size Time Okay Errors Throughput Util Demand bytes bytes secs # # 10^6bits/sec % SS us/KB 163840 512 150.00 20067232 0 548.0 100.00 14.950 163840 150.00 6113498 166.9 99.95 14.942 root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 64 MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET Socket Message Elapsed Messages CPU Service Size Size Time Okay Errors Throughput Util Demand bytes bytes secs # # 10^6bits/sec % SS us/KB 163840 64 150.00 23621279 0 80.6 100.00 101.604 163840 150.00 5868602 20.0 99.96 101.563 AFTER: (both MTU 1500 and MTU 8000) root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 512 MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET Socket Message Elapsed Messages CPU Service Size Size Time Okay Errors Throughput Util Demand bytes bytes secs # # 10^6bits/sec % SS us/KB 163840 512 150.00 19914969 0 543.8 100.00 15.064 163840 150.00 19914969 543.8 99.35 14.966 root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 64 MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET Socket Message Elapsed Messages CPU Service Size Size Time Okay Errors Throughput Util Demand bytes bytes secs # # 10^6bits/sec % SS us/KB 163840 64 150.00 23433989 0 80.0 100.00 102.416 163840 150.00 23433989 80.0 99.62 102.023 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-15 17:13:24 -07:00
Claudiu Manoil	75354148ce	gianfar: Add paged allocation and Rx S/G The eTSEC h/w is capable of scatter/gather on the receive side too if MAXFRM > MRBLR, when the allowed maximum Rx frame size is set to be greater than the maximum Rx buffer size (MRBLR). It's about time the driver makes use of this h/w capability, by supporting fixed buffer sizes and Rx S/G. The buffer size given to eTSEC for reception is fixed to 1536B (must be multiple of 64), which is the same default buffer size as before, used to accommodate standard MTU (1500B) size frames. As before, eTSEC can receive frames of up to 9600B. Individual Rx buffers are mapped to page halves (page size for eTSEC systems is 4KB). The skb is built around the first buffer of a frame (using build_skb()). In case the frame spans multiple buffers, the trailing buffers are added as Rx fragments to the skb. The last buffer in frame is marked by the L status flag. A mechanism is in place to reuse the pages owned by the driver (for Rx) for subsequent receptions. Supporting fixed size buffers allows the implementation of Rx S/G, which in turn removes the memory pressure issues the driver had before when MTU was set for jumbo frame reception. Also, in most cases, the Rx path becomes faster due to Rx page reusal, since the overhead of allocating new rx buffers is removed from the fast path. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-15 17:13:24 -07:00
Claudiu Manoil	f23223f15f	gianfar: Use ndev, more Rx path cleanup Use "ndev" instead of "dev", as the rx queue back pointer to a net_device struct, to avoid name clashing with a "struct device" reference. This prepares the addition of a "struct device" back pointer to the rx queue structure. Remove duplicated rxq registration in the process. Move napi_gro_receive() outside gfar_process_frame(). Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-15 17:13:24 -07:00
Claudiu Manoil	f966082e20	gianfar: Fix and cleanup rxbd status handling There are several (long standing) problems about how the status field of the rx buffer descriptor (rxbd) is currently handled on the error path: - too many unnecessary 16bit reads of the two halves of the rxbd status field (32bit), also resulting in overuse of endianness convesion macros; - "bdp->status = RXBD_LARGE" makes no sense, since the "large" flag is read only (only eTSEC can write it), and trying to clear the other status bits is also error prone in this context (most of the rx status bits are read only anyway). This is fixed with a single 32bit read of the "status" field, and then the appropriate 16bit shifting is applied to access the various status bits or the rx frame length. Also corrected the use of the RXBD_LARGE flag. Additional fix: "rx_over_errors" stat is incremented instead of "rx_crc_errors" in case of RXBD_OVERRUN occurrence. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-15 17:13:24 -07:00

1 2 3 4 5 ...

533539 Commits