linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-28 11:18:45 +07:00

Author	SHA1	Message	Date
Sowmini Varadhan	e97656d03c	rds: tcp: allow progress of rds_conn_shutdown if the rds_connection is marked ERROR by an intervening FIN rds_conn_shutdown() runs in workq context, and marks the rds_connection as DISCONNECTING before quiescing Tx/Rx paths. However, after all I/O has quiesced, we may still find the rds_connection state to be RDS_CONN_ERROR if an intervening FIN was processed in softirq context. This is not a fatal error: rds_conn_shutdown() should continue the shutdown, and there is no need to log noisy messages about this event. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-02 19:41:00 -07:00
Eric Dumazet	d3fbff306c	sock: correctly test SOCK_TIMESTAMP in sock_recv_ts_and_drops() It seems the code does not match the intent. This broke packetdrill, and probably other programs. Fixes: `6c7c98bad4` ("sock: avoid dirtying sk_stamp, if possible") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-02 19:34:55 -07:00
Andrew Morton	e270e96686	drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c: fix build with gcc-4.4.4 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c: In function 'mlx5e_set_rxfh': drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:1067: error: unknown field 'rss' specified in initializer drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:1067: warning: missing braces around initializer drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:1067: warning: (near initialization for 'rrp.<anonymous>') drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:1068: error: unknown field 'rss' specified in initializer drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:1069: warning: excess elements in struct initializer drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:1069: warning: (near initialization for 'rrp') gcc-4.4.4 has issues with anonymous union initializers. Work around this. Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-02 19:32:57 -07:00
Andrew Morton	956327913c	drivers/net/ethernet/mellanox/mlx5/core/en_main.c: fix build with gcc-4.4.4 drivers/net/ethernet/mellanox/mlx5/core/en_main.c: In function 'mlx5e_redirect_rqts': drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2210: error: unknown field 'rqn' specified in initializer drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2211: warning: missing braces around initializer drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2211: warning: (near initialization for 'direct_rrp.<anonymous>') drivers/net/ethernet/mellanox/mlx5/core/en_main.c: In function 'mlx5e_redirect_rqts_to_channels': drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2227: error: unknown field 'rss' specified in initializer drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2227: warning: missing braces around initializer drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2227: warning: (near initialization for 'rrp.<anonymous>') drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2227: warning: initialization makes integer from pointer without a cast drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2228: error: unknown field 'rss' specified in initializer drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2229: warning: excess elements in struct initializer drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2229: warning: (near initialization for 'rrp') drivers/net/ethernet/mellanox/mlx5/core/en_main.c: In function 'mlx5e_redirect_rqts_to_drop': drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2238: error: unknown field 'rqn' specified in initializer drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2239: warning: missing braces around initializer drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2239: warning: (near initialization for 'drop_rrp.<anonymous>') gcc-4.4.4 has issues with anonymous union initializers. Work around this. 
Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-02 19:32:57 -07:00
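The two gcc-4.4.4 build fixes above concern designated initializers that reach into an anonymous union. A minimal sketch of two common workarounds follows. `struct redirect_param` and its fields are hypothetical stand-ins, not the mlx5 definitions, and the log does not say which form the actual patches use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a parameter struct with an anonymous union. */
struct redirect_param {
	bool is_rss;
	union {
		uint32_t rqn;            /* valid when is_rss is false */
		struct {
			uint8_t hfunc;   /* valid when is_rss is true */
		} rss;
	};
};

/*
 * Form that old compilers may reject, because the designator names a
 * member of the anonymous union directly:
 *
 *	struct redirect_param p = { .is_rss = false, .rqn = rqn };
 */

/* Workaround A: give the anonymous union its own pair of braces. */
static struct redirect_param make_direct_braced(uint32_t rqn)
{
	struct redirect_param p = {
		.is_rss = false,
		{
			.rqn = rqn,
		},
	};
	return p;
}

/* Workaround B: initialize the plain members, then assign into the union. */
static struct redirect_param make_rss_assigned(uint8_t hfunc)
{
	struct redirect_param p = { .is_rss = true };

	p.rss.hfunc = hfunc;
	return p;
}

/* Accessor that checks the discriminant before reading the union. */
static uint32_t param_rqn(const struct redirect_param *p)
{
	assert(!p->is_rss);
	return p->rqn;
}
```

Both helpers produce the same object a direct `.rqn = ...` initializer would; they only avoid naming the union member at the top level of the initializer list.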
Joao Pinto	44781fef13	net: stmmac: fix cbs configuration Sending again, because forgot to include net-dev. The QoS IP does not accept AVB capabilities to default/queue 0, this way we guarantee 75% bandwidth for AVB. This patch assures that only queues >= 1 get CBS configured. Additional info was also added to stmmac.txt. Reported-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: Joao Pinto <jpinto@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-02 19:30:06 -07:00
David S. Miller	a6fc09dff2	Merge branch 'mpls-more-labels' David Ahern says: ==================== net: mpls: Allow users to configure more labels per route Increase the maximum number of new labels for MPLS routes from 2 to 30. To keep memory consumption in check, the labels array is moved to the end of mpls_nh and mpls_iptunnel_encap structs as a 0-sized array. Allocations use the maximum number of labels across all nexthops in a route for LSR and the number of labels configured for LWT. The mpls_route layout is changed to: +----------------------+ \| mpls_route \| +----------------------+ \| mpls_nh 0 \| +----------------------+ \| alignment padding \| 4 bytes for odd number of labels; 0 for even +----------------------+ \| via[rt_max_alen] 0 \| +----------------------+ \| alignment padding \| via's aligned on sizeof(unsigned long) +----------------------+ \| ... \| Meaning the via follows its mpls_nh providing better locality as the number of labels increases. UDP_RR tests with namespaces shows no impact to a modest performance increase with this layout for 1 or 2 labels and 1 or 2 nexthops. mpls_route allocation size is limited to 4096 bytes allowing on the order of 30 nexthops with 30 labels (or more nexthops with fewer labels). LWT encap shares same maximum number of labels as mpls routing. v3 - initialize n_labels to 0 in case RTA_NEWDST is not defined; detected by the kbuild test robot v2 - updates per Eric's comments + added patch to ensure all reads of rt_nhn_alive and nh_flags in the packet path use READ_ONCE and all writes via event handlers use WRITE_ONCE + limit mpls_route size to 4096 (PAGE_SIZE for most arch) + mostly killed use of MAX_NEW_LABELS; it exists only for common limit between lwt and routing paths ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 20:21:45 -07:00
David Ahern	1511009cd6	net: mpls: Increase max number of labels for lwt encap Allow users to push down more labels per MPLS encap. Similar to LSR case, move label array to the end of mpls_iptunnel_encap and allocate based on the number of labels for the route. For consistency with the LSR case, re-use the same maximum number of labels. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 20:21:44 -07:00
David Ahern	a4ac8c986d	net: mpls: bump maximum number of labels Allow users to push down more labels per MPLS route. With the previous patches, no memory allocations are based on MAX_NEW_LABELS; the limit is only used to keep userspace in check. At this point MAX_NEW_LABELS is only used for mpls_route_config (copying route data from userspace) and processing nexthops looking for the max number of labels across the route spec. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 20:21:44 -07:00
David Ahern	df1c631648	net: mpls: Limit memory allocation for mpls_route Limit memory allocation size for mpls_route to 4096. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 20:21:44 -07:00
David Ahern	59b209667a	net: mpls: change mpls_route layout Move labels to the end of mpls_nh as a 0-sized array and within mpls_route move the via for a nexthop after the mpls_nh. The new layout becomes: +----------------------+ \| mpls_route \| +----------------------+ \| mpls_nh 0 \| +----------------------+ \| alignment padding \| 4 bytes for odd number of labels; 0 for even +----------------------+ \| via[rt_max_alen] 0 \| +----------------------+ \| alignment padding \| via's aligned on sizeof(unsigned long) +----------------------+ \| ... \| +----------------------+ \| mpls_nh n-1 \| +----------------------+ \| via[rt_max_alen] n-1 \| +----------------------+ Memory allocated for nexthop + via is constant across all nexthops and their via. It is based on the maximum number of labels across all nexthops and the maximum via length. The size is saved in the mpls_route as rt_nh_size. Accessing a nexthop becomes rt->rt_nh + index * rt->rt_nh_size. The offset of the via address from a nexthop is saved as rt_via_offset so that given an mpls_nh pointer the via for that hop is simply nh + rt->rt_via_offset. With prior code, memory allocated per mpls_route with 1 nexthop: via is an ethernet address - 64 bytes via is an ipv4 address - 64 via is an ipv6 address - 72 With this patch set, memory allocated per mpls_route with 1 nexthop and 1 or 2 labels: via is an ethernet address - 56 bytes via is an ipv4 address - 56 via is an ipv6 address - 64 The 8-byte reduction is due to the previous patch; the change introduced by this patch has no impact on the size of allocations for 1 or 2 labels. Performance impact of this change was examined using network namespaces with veth pairs connecting namespaces. ns0 inserts the packet to the label-switched path using an lwt route with encap mpls. ns1 adds 1 or 2 labels depending on test, ns2 (and ns3 for 2-label test) pops the label and forwards. ns3 (or ns4) for a 2-label is the destination. 
Similar series of namespaces used for 2-nexthop test. Intent is to measure changes to latency (overhead in manipulating the packet) in the forwarding path. Tests used netperf with UDP_RR. IPv4: current patches 1 label, 1 nexthop 29908 30115 2 label, 1 nexthop 29071 29612 1 label, 2 nexthop 29582 29776 2 label, 2 nexthop 29086 29149 IPv6: current patches 1 label, 1 nexthop 24502 24960 2 label, 1 nexthop 24041 24407 1 label, 2 nexthop 23795 23899 2 label, 2 nexthop 23074 22959 In short, the change has no effect to a modest increase in performance. This is expected since this patch does not really have an impact on routes with 1 or 2 labels (the current limit) and 1 or 2 nexthops. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 20:21:44 -07:00
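The pointer arithmetic described in the layout commit above (a nexthop at `rt->rt_nh + index * rt->rt_nh_size`, its via at `nh + rt->rt_via_offset`) can be sketched in userspace C. The structs below are simplified, hypothetical stand-ins, so their sizes and padding will not match the kernel's `mpls_route` and `mpls_nh`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Round x up to a multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/* Hypothetical nexthop: the labels sit at the end as a flexible array. */
struct nh {
	uint8_t  n_labels;
	uint8_t  via_alen;
	uint32_t label[];
};

/* Hypothetical route: fixed header followed by equally sized slots. */
struct route {
	uint8_t nhn;            /* number of nexthops */
	size_t  nh_size;        /* bytes reserved per nexthop plus its via */
	size_t  via_offset;     /* offset of the via from its nexthop */
	unsigned long slots[];  /* storage, aligned on sizeof(unsigned long) */
};

/* Size every slot for the largest label count and the longest via. */
static struct route *route_alloc(uint8_t nhn, uint8_t max_labels,
				 uint8_t max_alen)
{
	size_t via_offset = ALIGN_UP(sizeof(struct nh) +
				     max_labels * sizeof(uint32_t),
				     sizeof(unsigned long));
	size_t nh_size = via_offset + ALIGN_UP(max_alen, sizeof(unsigned long));
	struct route *rt = calloc(1, sizeof(*rt) + (size_t)nhn * nh_size);

	if (!rt)
		return NULL;
	rt->nhn = nhn;
	rt->nh_size = nh_size;
	rt->via_offset = via_offset;
	return rt;
}

/* Nexthop i lives at a constant stride from the first slot. */
static struct nh *route_nh(struct route *rt, unsigned int index)
{
	assert(index < rt->nhn);
	return (struct nh *)((unsigned char *)rt->slots + index * rt->nh_size);
}

/* The via for a nexthop sits at a constant offset from that nexthop. */
static unsigned char *nh_via(struct route *rt, struct nh *nh)
{
	return (unsigned char *)nh + rt->via_offset;
}
```

Because every slot has the same size, looking up nexthop `i` is a single multiply and add, and each via stays adjacent to the nexthop it belongs to.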
David Ahern	77ef013aad	net: mpls: Convert number of nexthops to u8 Number of nexthops and number of alive nexthops are tracked using an unsigned int. A route should never have more than 255 nexthops so convert both to u8. Update all references and intermediate variables to consistently use u8 as well. Shrinks the size of mpls_route from 32 bytes to 24 bytes with a 2-byte hole before the nexthops. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 20:21:44 -07:00
David Ahern	39eb8cd175	net: mpls: rt_nhn_alive and nh_flags should be accessed using READ_ONCE The number of alive nexthops for a route (rt->rt_nhn_alive) and the flags for a next hop (nh->nh_flags) are modified by netdev event handlers. The event handlers run with rtnl_lock held so updates are always done with the lock held. The packet path accesses the fields under the rcu lock. Since those fields can change at any moment in the packet path, both fields should be accessed using READ_ONCE. Updates to both fields should use WRITE_ONCE. Update mpls_select_multipath (packet path) and mpls_ifdown and mpls_ifup (event handlers) accordingly. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 20:21:44 -07:00
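The access pattern described in the commit above can be approximated in userspace as follows. The macros are simplified volatile-cast versions of `READ_ONCE`/`WRITE_ONCE` (GNU C `__typeof__`), and the structs, the `NH_DEAD` flag value, and the function names are hypothetical. This is a sketch of the idea, not the kernel's `mpls_select_multipath` or `mpls_ifdown`.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified userspace approximations of the kernel macros. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

#define NH_DEAD 0x1u	/* hypothetical flag value */

struct nh {
	unsigned int flags;
};

struct route {
	uint8_t nhn;		/* total nexthops */
	uint8_t nhn_alive;	/* nexthops currently usable */
	struct nh nh[4];
};

/* Packet path: snapshot the alive count once, then work from the copy. */
static int select_nexthop(struct route *rt, uint32_t hash)
{
	uint8_t alive = READ_ONCE(rt->nhn_alive);
	unsigned int n, i;

	if (alive == 0)
		return -1;
	if (alive == rt->nhn)
		return (int)(hash % rt->nhn);

	n = hash % alive;
	for (i = 0; i < rt->nhn; i++) {
		if (READ_ONCE(rt->nh[i].flags) & NH_DEAD)
			continue;
		if (n == 0)
			return (int)i;
		n--;
	}
	return -1;
}

/* Event-handler side: writers are serialized elsewhere (rtnl_lock in the
 * commit text); WRITE_ONCE only keeps each store to a single access. */
static void nexthop_down(struct route *rt, unsigned int i)
{
	unsigned int flags;

	assert(i < rt->nhn);
	flags = READ_ONCE(rt->nh[i].flags);
	if (flags & NH_DEAD)
		return;
	WRITE_ONCE(rt->nh[i].flags, flags | NH_DEAD);
	WRITE_ONCE(rt->nhn_alive, rt->nhn_alive - 1);
}
```

Taking a single snapshot of `nhn_alive` matters because the modulo and the loop must agree on one value even if an event handler changes the field while a packet is being processed.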
Paolo Abeni	3d8417d79e	udp: use sk_protocol instead of pcflag to detect udplite sockets In the udp_sock struct, the 'forward_deficit' and 'pcflag' fields share the same cacheline. While the first is dirtied by udp_recvmsg, the latter is read, possibly several times, by the bottom half processing to discriminate between udp and udplite sockets. With this patch, sk->sk_protocol is used to check if the socket is really a udplite one, avoiding some cache misses per packet and improving the performance under udp_flood with small packets by up to 10%. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 20:11:36 -07:00
Tobias Regnery	768bfa2a06	net: dsa: fix build error with devlink build as module After commit `96567d5dac` ("net: dsa: dsa2: Add basic support of devlink") I see the following link error with CONFIG_NET_DSA=y and CONFIG_NET_DEVLINK=m: net/built-in.o: In function 'dsa_register_switch': (.text+0xe226b): undefined reference to `devlink_alloc' net/built-in.o: In function 'dsa_register_switch': (.text+0xe2284): undefined reference to `devlink_register' net/built-in.o: In function 'dsa_register_switch': (.text+0xe243e): undefined reference to `devlink_port_register' net/built-in.o: In function 'dsa_register_switch': (.text+0xe24e1): undefined reference to `devlink_port_register' net/built-in.o: In function 'dsa_register_switch': (.text+0xe24fa): undefined reference to `devlink_port_type_eth_set' net/built-in.o: In function 'dsa_dst_unapply.part.8': dsa2.c:(.text.unlikely+0x345): undefined reference to 'devlink_port_unregister' dsa2.c:(.text.unlikely+0x36c): undefined reference to 'devlink_port_unregister' dsa2.c:(.text.unlikely+0x38e): undefined reference to 'devlink_port_unregister' dsa2.c:(.text.unlikely+0x3f2): undefined reference to 'devlink_unregister' dsa2.c:(.text.unlikely+0x3fb): undefined reference to 'devlink_free' Fix this by adding a dependency on MAY_USE_DEVLINK so that CONFIG_NET_DSA get switched to be build as module when CONFIG_NET_DEVLINK=m. Fixes: `96567d5dac` ("net: dsa: dsa2: Add basic support of devlink") Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 20:10:20 -07:00
David S. Miller	88f913f5ee	Merge branch 'phylib-EEE-updates' Russell King says: ==================== phylib EEE updates This series of patches depends on the previous set of changes, and is therefore net-next material. While testing the EEE code, I discovered a number of issues: 1. It is possible to enable advertisment of EEE modes which are not supported by the hardware. We omit to check the supported modes and mask off those modes that are not supported before writing the EEE advertisment register. 2. We need to restart autonegotiation after a change of the EEE advertisment, otherwise the link partner does not see the updated EEE modes. 3. SGMII connected PHYs are also capable of supporting EEE. Through discussion with Florian, it has been decided to remove the check for the PHY interface mode in patch (3). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 20:04:04 -07:00
Russell King	32d751412b	net: phy: allow EEE with any interface mode EEE is able to work in any PHY interface mode, there is nothing which fundamentally restricts it to only a few modes. For example, EEE works in SGMII mode with the Marvell 88E1512. Rather than just adding SGMII mode to the list, Florian suggests removing the list of interface modes entirely: It actually sounds like we should just kill the check entirely, it does not appear that any of the interface mode would not fundamentally be able to support EEE, because the "lowest" mode we support is MII, and even there it's quite possible to support EEE. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 20:04:03 -07:00
Russell King	f75abeb833	net: phy: restart phy autonegotiation after EEE advertisment change When the EEE advertisment is changed, we should restart autonegotiation to update the link partner with the new EEE settings. Add this trigger but only if the advertisment has changed. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 20:04:03 -07:00
Russell King	83ea067fe2	net: phy: avoid setting unsupported EEE advertisments We currently allow userspace to set any EEE advertisments it desires, whether or not the PHY supports them. For example: # ethtool --set-eee eth1 advertise 0xffffffff # ethtool --show-eee eth1 EEE Settings for eth1: EEE status: disabled Tx LPI: disabled Supported EEE link modes: 100baseT/Full 1000baseT/Full 10000baseT/Full Advertised EEE link modes: 100baseT/Full 1000baseT/Full 1000baseKX/Full 10000baseT/Full 10000baseKX4/Full 10000baseKR/Full Clearly, this is not sane, we should only allow link modes that are supported to be advertised (as we do elsewhere.) Ensure that we mask the MDIO_AN_EEE_ADV value with the capabilities retrieved from the MDIO_PCS_EEE_ABLE register. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 20:04:03 -07:00
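The last two phylib commits above describe masking the requested EEE advertisement with the supported modes and restarting autonegotiation only when the advertisement actually changes. A minimal sketch of that logic follows. The bit values, `struct phy_eee`, and `eee_set_adv` are hypothetical and do not reflect the real MDIO register layout or the phylib API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mode bits, for illustration only. */
#define EEE_100TX    (1u << 0)
#define EEE_1000T    (1u << 1)
#define EEE_1000KX   (1u << 2)
#define EEE_10GT     (1u << 3)

struct phy_eee {
	uint16_t supported;	/* what the hardware reports it can do */
	uint16_t adv;		/* what is currently advertised */
};

/*
 * Apply a requested advertisement. Unsupported modes are masked off.
 * Returns true when the advertisement changed, which is the case in
 * which autonegotiation would need to be restarted.
 */
static bool eee_set_adv(struct phy_eee *p, uint16_t requested)
{
	uint16_t adv = requested & p->supported;
	bool changed = (adv != p->adv);

	p->adv = adv;
	/* Invariant: nothing unsupported is ever advertised. */
	assert((p->adv & ~p->supported) == 0);
	return changed;
}
```

With this shape, a request such as `0xffff` collapses to the supported set, and repeating the same request reports no change, so no second restart is triggered.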
David S. Miller	eefe06e8ce	Merge branch 'bpf-prog-testing-framework' Alexei Starovoitov says: ==================== bpf: program testing framework Development and testing of networking bpf programs is quite cumbersome. Especially tricky are XDP programs that attach to real netdevices and program development feels like working on the car engine while the car is in motion. Another problem is ongoing changes to upstream llvm core that can introduce an optimization that verifier will not recognize. llvm bpf backend tests have no ability to run the programs. To improve this situation introduce BPF_PROG_TEST_RUN command to test and performance benchmark bpf programs. It achieves several goals: - development of xdp and skb based bpf programs can be done in a canned environment with unit tests - program performance optimizations can be benchmarked outside of networking core (without driver and skb costs) - continuous testing of upstream changes is finally practical Patches 4,5,6 add C based test cases of various complexity to cover some sched_cls and xdp features. More tests will be added in the future. The tests were run on centos7 only. For now the framework supports only skb and xdp programs. In the future it can be extended to socket_filter and tracing program types. More details are in individual patches. v1->v2: - rename bpf_program_test_run->bpf_prog_test_run - add missing #include <linux/bpf.h> since libbpf.h shouldn't depend on prior includes - reordered patches 3 and 4 to keep bisect clean ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 12:45:58 -07:00
Alexei Starovoitov	3782161362	selftests/bpf: add l4 load balancer test based on sched_cls this l4lb demo is a comprehensive test case for LLVM codegen and kernel verifier. It's using fully inlined jhash(), complex packet parsing and multiple map lookups of different types to stress llvm and verifier. The map sizes, map population and test vectors are artificial to exercise different paths through the bpf program. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 12:45:57 -07:00
Alexei Starovoitov	8d48f5e427	selftests/bpf: add a test for basic XDP functionality add C test for xdp_adjust_head(), packet rewrite and map lookups Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 12:45:57 -07:00
Alexei Starovoitov	6882804c91	selftests/bpf: add a test for overlapping packet range checks add simple C test case for llvm and verifier range check fix from commit `b1977682a3` ("bpf: improve verifier packet range checks") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 12:45:57 -07:00
Alexei Starovoitov	dd26b7f54a	tools/lib/bpf: expose bpf_program__set_type() expose bpf_program__set_type() to set program type Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 12:45:57 -07:00
Alexei Starovoitov	3084887378	tools/lib/bpf: add support for BPF_PROG_TEST_RUN command add support for BPF_PROG_TEST_RUN command to libbpf.a Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Wang Nan <wangnan0@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 12:45:57 -07:00
Alexei Starovoitov	1cf1cae963	bpf: introduce BPF_PROG_TEST_RUN command development and testing of networking bpf programs is quite cumbersome. Despite availability of user space bpf interpreters the kernel is the ultimate authority and execution environment. Current test frameworks for TC include creation of netns, veth, qdiscs and use of various packet generators just to test functionality of a bpf program. XDP testing is even more complicated, since qemu needs to be started with gro/gso disabled and precise queue configuration, transferring of xdp program from host into guest, attaching to virtio/eth0 and generating traffic from the host while capturing the results from the guest. Moreover analyzing performance bottlenecks in XDP program is impossible in virtio environment, since cost of running the program is tiny comparing to the overhead of virtio packet processing, so performance testing can only be done on physical nic with another server generating traffic. Furthermore ongoing changes to user space control plane of production applications cannot be run on the test servers leaving bpf programs stubbed out for testing. Last but not least, the upstream llvm changes are validated by the bpf backend testsuite which has no ability to test the code generated. To improve this situation introduce BPF_PROG_TEST_RUN command to test and performance benchmark bpf programs. Joint work with Daniel Borkmann. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 12:45:57 -07:00
Florian Fainelli	98cd1552ea	net: dsa: Mock-up driver This patch adds support for a DSA mock-up driver which essentially does the following: - registers/unregisters 4 fixed PHYs to the slave network devices - uses eth0 (configurable) as the master netdev - registers the switch as a fixed MDIO device against the fixed MDIO bus at address 31 - includes dynamic debug prints for dsa_switch_ops functions that can be enabled to get call traces This is a good way to test modular builds as well as exercise the DSA APIs without requiring access to real hardware. This does not test the data-path, although this could be added later on. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 12:39:32 -07:00
David S. Miller	772c3bdad1	Merge branch 'mv88e6xxx-cross-chip-bridging' Vivien Didelot says: ==================== net: dsa: mv88e6xxx: program cross-chip bridging The purpose of this patch series is to bring hardware cross-chip bridging configuration to the DSA layer and the mv88e6xxx DSA driver. Most recent Marvell switch chips have a Cross-chip Port Based VLAN Table (PVT) used to restrict to which internal destination port an arbitrary external source port is allowed to egress frames to. The current behavior of the mv88e6xxx driver is to program this table table with all ones, allowing any external ports to egress frames on any internal ports. This means that carefully crafted Ethernet frames can potentially bypass the user bridging configuration. Patches 1 to 7 prepare the setup of this table and factorize the common bits of both in-chip and cross-chip Marvell bridging code. Patch 8 adds new optional cross-chip bridging operations to DSA switch. Patch 9 switches the current behavior to program the table according to the user bridging configuration when (cross-chip) ports get (un)bridged. On a ZII Rev B board, bridging together the 3 user ports of both 88E6352 will result in the following PVTs on respectively switch 0 and switch 1: External Internal Ports Dev Port 0 1 2 3 4 5 6 1 0 * * * - - * * 1 1 * * * - - * * 1 2 * * * - - * * 1 3 - - - - - * * 1 4 - - - - - * * 1 5 * * * * * * * 1 6 * * * * * * * 0 0 * * * - - * * 0 1 * * * - - * * 0 2 * * * - - * * 0 3 - - - - - * * 0 4 - - - - - * * 0 5 * * * * * * * 0 6 * * * * * * * Changes since v2: - Define MV88E6XXX_MAX_PVT_SWITCHES and MV88E6XXX_MAX_PVT_PORTS - use mv88e6xxx_g2_misc_4_bit_port instead of the 5-bit variant - add Andrew's tags and reword commit 6/9 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 12:22:59 -07:00
Vivien Didelot	aec5ac88d3	net: dsa: mv88e6xxx: add cross-chip bridging Implement the DSA cross-chip bridging operations by remapping the local ports an external source port can egress frames to, when this cross-chip port joins or leaves a bridge. The PVT is no longer configured with all ones allowing any external frame to egress any local port. Only DSA and CPU ports, as well as bridge group members, can egress frames on local ports. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 12:22:57 -07:00
Vivien Didelot	40ef2c9339	net: dsa: add cross-chip bridging operations Introduce crosschip_bridge_{join,leave} operations in the dsa_switch_ops structure, which can be used by switches supporting interconnection. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 12:22:57 -07:00
Vivien Didelot	e96a6e0275	net: dsa: mv88e6xxx: remap existing bridge members When a local port of a switch chip becomes a member of a bridge group, we need to reprogram the Cross-chip Port Based VLAN Table (PVT) to allow existing cross-chip bridge members to egress frames on the new ports. There is no functional changes yet, since the PVT is still programmed with all ones, allowing any external port to egress frames locally. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 12:22:57 -07:00
Vivien Didelot	240ea3ef70	net: dsa: mv88e6xxx: factorize in-chip bridge map Factorize the code in the DSA port_bridge_{join,leave} routines used to program the port VLAN map of all local ports of a given bridge group. At the same time shorten the _mv88e6xxx_port_based_vlan_map to get rid of the old underscore prefix naming convention. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 12:22:57 -07:00
Vivien Didelot	e5887a2a11	net: dsa: mv88e6xxx: rework in-chip bridging All ports -- internal and external, for chips featuring a PVT -- have a mask restricting to which internal ports a frame is allowed to egress. Now that DSA exposes the number of ports and their bridge devices, it is possible to extract the code generating the VLAN map and make it generic so that it can be shared later with the cross-chip bridging code. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 12:22:57 -07:00
Vivien Didelot	73b1204d07	net: dsa: mv88e6xxx: allocate the number of ports The current code allocates DSA_MAX_PORTS ports for a Marvell dsa_switch structure. Provide the exact number of ports so the corresponding ds->num_ports is accurate. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 12:22:57 -07:00
Vivien Didelot	17a1594e2d	net: dsa: mv88e6xxx: program the PVT with all ones The Cross-chip Port Based VLAN Table (PVT) is currently initialized with all ones, allowing any external ports to egress frames on local ports. This commit implements the PVT access functions and programs the PVT with all ones for the local switch ports only, instead of using the Init operation. The current behavior is unchanged for the moment. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 12:22:57 -07:00
Vivien Didelot	812289960f	net: dsa: mv88e6xxx: use 4-bit port for PVT data The Cross-chip Port Based VLAN Table (PVT) supports two indexing modes, one using 5-bit for device and 4-bit for port, the other using 4-bit for device and 5-bit for port, configured via the Global 2 Misc register. Only 4 bits for the source port are needed when interconnecting 88E6xxx switch devices since they all support less than 16 physical ports. The full 5 bits are needed when interconnecting a device with 98DXxxx switch devices since they support more than 16 physical ports. Add a mv88e6xxx_pvt_setup helper to set the 4-bit port PVT mode, which will be extended later to also initialize the PVT content. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 12:22:57 -07:00
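The two PVT indexing modes in the commit above split an index between device and port bits. The sketch below assumes the index is formed with the device number in the high bits and the port number in the low bits; that bit order is an assumption, and the helper names are hypothetical rather than taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* 5-bit device, 4-bit port: enough when every port number is below 16. */
static uint16_t pvt_index_4bit_port(uint8_t dev, uint8_t port)
{
	assert(dev < 32 && port < 16);
	return (uint16_t)((dev << 4) | port);
}

/* 4-bit device, 5-bit port: for devices with 16 or more ports. */
static uint16_t pvt_index_5bit_port(uint8_t dev, uint8_t port)
{
	assert(dev < 16 && port < 32);
	return (uint16_t)((dev << 5) | port);
}
```

Both modes span the same 9-bit index space (512 entries). The choice only moves one bit between the device field and the port field.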
Vivien Didelot	f364565221	net: dsa: mv88e6xxx: move PVT description in info Not all Marvell switch chips feature a Cross-chip Port VLAN Table (PVT). Chips with a PVT use the same implementation, so a new mv88e6xxx_ops member won't be necessary yet. Add a "pvt" boolean member to the mv88e6xxx_info structure and kill the obsolete MV88E6XXX_FLAGS_PVT flag. Add a mv88e6xxx_has_pvt helper to wrap future checks of that condition. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 12:22:57 -07:00
Madalin Bucur	58b7bd0f4b	dpaa_eth: use AVOIDBLOCK for Tx confirmation queues The AVOIDBLOCK flag determines the Tx confirmation queues processing to be redirected to any available CPU when the current one is slow in processing them. This may result in a higher Tx confirmation interrupt count but may reduce pressure on a certain CPU that with the previous setting would process all Tx confirmation frames. Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 12:03:31 -07:00
Madalin Bucur	b07e675b06	fsl/fman: take into account all RGMII modes Accept the internal delay RGMII variants. Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 11:49:14 -07:00
Felix Manlunas	d6acfeb17d	vxlan: vxlan dev should inherit lowerdev's gso_max_size vxlan dev currently ignores lowerdev's gso_max_size, which adversely affects TSO performance of liquidio if it's the lowerdev. Egress TCP packets' skb->len often exceed liquidio's advertised gso_max_size. This may happen on other NIC drivers. Fix it by assigning lowerdev's gso_max_size to that of vxlan dev. Might as well do likewise for gso_max_segs. Single flow TSO throughput of liquidio as lowerdev (using iperf3): Before the patch: 139 Mbps After the patch : 8.68 Gbps Percent increase: 6,144 % Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: Satanand Burla <satananda.burla@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 11:43:38 -07:00
Paolo Abeni	6c7c98bad4	sock: avoid dirtying sk_stamp, if possible sock_recv_ts_and_drops() unconditionally set sk->sk_stamp for every packet, even if the SOCK_TIMESTAMP flag is not set in the related socket. If selinux is enabled, this causes a cache miss for every packet since sk->sk_stamp and sk->sk_security share the same cacheline. With this change sk_stamp is set only if the SOCK_TIMESTAMP flag is set, and is cleared for the first packet, so that the user perceived behavior is unchanged. This gives up to 5% speed-up under udp-flood with small packets. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-30 20:05:24 -07:00
David S. Miller	7801a3225e	Merge branch 'ibmvnic-cleanup-resource-handling' Nathan Fontenot says: ==================== ibmvnic: Cleanup resource handling In order to better manage the resources of the ibmvnic driver, this set of patches creates a set of initialization and release routines for the drivers resources. Additionally, some patches do some re-naming of the affected routines so that there is a common naming scheme in the driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-30 15:58:43 -07:00
Nathan Fontenot	1b8955ee5f	ibmvnic: Cleanup failure path in ibmvnic_open Now that ibmvnic_release_resources will clean up all of our resources properly, even if they were not allocated, we can just call this for failures in ibmvnic_open. This patch also moves the ibmvnic_release_resources() routine up in the file to avoid creating a forward declaration and re-names it to drop the ibmvnic prefix. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-30 15:58:43 -07:00
Nathan Fontenot	7bbc27a496	ibmvnic: Create init/release routines for stats token Create an initialization and a release routine for the stats token used by the ibmvnic driver. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-30 15:58:43 -07:00
Nathan Fontenot	b510888f96	ibmvnic: Merge the two release_sub_crq_queue routines Keeping two routines for releasing sub crqs, one for when irqs are not initialized and one for when they are, is a bit of overkill. Merge the two routines to a common release routine that will check for an irq and release it if needed. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-30 15:58:42 -07:00
Nathan Fontenot	0ffe2cb790	ibmvnic: Create init and release routines for the rx pool Move the initialization and the release of the rx pool to their own routines, and update them to do validation. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-30 15:58:42 -07:00
Nathan Fontenot	c657e32cd0	ibmvnic: Create init and release routines for the tx pool Move the initialization and the release of the tx pool to their own routines, and update them to do validation. This also adds validation to the release of the long term buffer. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-30 15:58:42 -07:00
Nathan Fontenot	f0b8c96cbc	ibmvnic: Create init and release routines for the bounce buffer Move the handling of initialization and releasing the bounce buffer to their own init and release routines. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-30 15:58:42 -07:00
Nathan Fontenot	f992887c34	ibmvnic: Update main crq initialization and release Update the initialization and release routines for the crq queue so that we validate the crq queue. Additionally this updates the naming of the init and release routines for the crq queue to drop the ibmvnic prefix. This matches the naming for similar routines in the driver. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-30 15:58:42 -07:00
Gao Feng	1935299d9c	net: tcp: Refine the __tcp_select_window 1. Move the "window = tp->rcv_wnd;" assignment into the condition block without tp->rx_opt.rcv_wscale, because it is unnecessary when wscale is enabled; 2. Use the macro ALIGN instead of two statements. The two statements are used to align the window to 1<<wscale. Using ALIGN is clearer. 3. Use rounddown to make the code clearer. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-30 15:41:32 -07:00
Vivien Didelot	bae76dd95b	net: dsa: mv88e6xxx: debug ATU Age Time The ATU ageing time value programmed in the switch is rounded up to the nearest multiple of its coefficient (variable depending on the model.) Add a debug message to inform the user about the exact programmed value. On 6352, "brctl setageing br0 18" gives "AgeTime set to 0x01 (15000 ms)" while on 6390 we get "AgeTime set to 0x05 (18750 ms)". Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-30 15:35:23 -07:00
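
The two AgeTime values quoted in bae76dd95b can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not the driver's code: the helper names, the `struct age_time` type and the clamp to an 8-bit register field are assumptions made for illustration. The coefficients 15000 ms and 3750 ms are inferred from the quoted output (0x01 gives 15000 ms, 0x05 gives 18750 ms), and the quoted values are consistent with rounding to the closest multiple of the coefficient.

```c
#include <stdint.h>

/* Hypothetical result type: raw register value plus the time it encodes. */
struct age_time {
	uint8_t reg;        /* value that would be written to the switch */
	unsigned int msecs; /* ageing time actually programmed, in ms */
};

/* Divide and round to the closest integer (unsigned operands only). */
static unsigned int div_round_closest(unsigned int x, unsigned int divisor)
{
	return (x + divisor / 2) / divisor;
}

/*
 * Convert a requested ageing time to the nearest multiple of the
 * per-model coefficient. Clamping to 0xff is an assumption of this
 * sketch; a real driver may reject out-of-range requests instead.
 */
static struct age_time atu_age_time(unsigned int requested_ms,
				    unsigned int coeff_ms)
{
	struct age_time at;
	unsigned int steps = div_round_closest(requested_ms, coeff_ms);

	if (steps > 0xff)
		steps = 0xff;

	at.reg = (uint8_t)steps;
	at.msecs = steps * coeff_ms;
	return at;
}
```

With a request of 18000 ms ("brctl setageing br0 18"), `atu_age_time(18000, 15000)` yields 0x01 and 15000 ms, and `atu_age_time(18000, 3750)` yields 0x05 and 18750 ms, matching the two debug messages in the commit description.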

663219 Commits