linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-28 04:45:12 +07:00

Author	SHA1	Message	Date
Linus Torvalds	f9da455b93	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: 1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov. 2) Multiqueue support in xen-netback and xen-netfront, from Andrew J Benniston. 3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn Mork. 4) BPF now has a "random" opcode, from Chema Gonzalez. 5) Add more BPF documentation and improve test framework, from Daniel Borkmann. 6) Support TCP fastopen over ipv6, from Daniel Lee. 7) Add software TSO helper functions and use them to support software TSO in mvneta and mv643xx_eth drivers. From Ezequiel Garcia. 8) Support software TSO in fec driver too, from Nimrod Andy. 9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli. 10) Handle broadcasts more gracefully over macvlan when there are large numbers of interfaces configured, from Herbert Xu. 11) Allow more control over fwmark used for non-socket based responses, from Lorenzo Colitti. 12) Do TCP congestion window limiting based upon measurements, from Neal Cardwell. 13) Support busy polling in SCTP, from Neal Horman. 14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru. 15) Bridge promisc mode handling improvements from Vlad Yasevich. 16) Don't use inetpeer entries to implement ID generation any more, it performs poorly, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits) rtnetlink: fix userspace API breakage for iproute2 < v3.9.0 tcp: fixing TLP's FIN recovery net: fec: Add software TSO support net: fec: Add Scatter/gather support net: fec: Increase buffer descriptor entry number net: fec: Factorize feature setting net: fec: Enable IP header hardware checksum net: fec: Factorize the .xmit transmit function bridge: fix compile error when compiling without IPv6 support bridge: fix smatch warning / potential null pointer dereference via-rhine: fix full-duplex with autoneg disable bnx2x: Enlarge the dorq threshold for VFs bnx2x: Check for UNDI in uncommon branch bnx2x: Fix 1G-baseT link bnx2x: Fix link for KR with swapped polarity lane sctp: Fix sk_ack_backlog wrap-around problem net/core: Add VF link state control policy net/fsl: xgmac_mdio is dependent on OF_MDIO net/fsl: Make xgmac_mdio read error message useful net_sched: drr: warn when qdisc is not work conserving ...	2014-06-12 14:27:40 -07:00
David S. Miller	f666f87b94	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/xen-netback/netback.c net/core/filter.c A filter bug fix overlapped some cleanups and a conversion over to some new insn generation macros. A xen-netback bug fix overlapped the addition of multi-queue support. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-05 16:22:02 -07:00
Fabio Estevam	d4c642ea12	gianfar: Call netif_carrier_off() prior to registration Quoting David Miller: "At the moment you call register_netdev() the device is visible, notifications are sent to userspace, and userland tools can try to bring the interface up and see the incorrect link state, before you do the netif_carrier_off(). Said another way, between the register_netdev() and netif_carrier_off() call, userspace can see the device in an inconsistent state." So call netif_carrier_off() prior to register_netdev(). Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-05 15:03:46 -07:00
Xiubo Li	898157ed74	gianfar: Fix the section mismatch warnings. Building with CONFIG_DEBUG_SECTION_MISMATCH enabled, the following WARNING is occured: LD drivers/net/built-in.o WARNING: drivers/net/built-in.o(.text+0xcd4c): Section mismatch in reference from the function gfar_probe() to the function .init.text:gfar_init_addr_hash_table() The function gfar_probe() references the function __init gfar_init_addr_hash_table(). This is often because gfar_probe lacks a __init annotation or the annotation of gfar_init_addr_hash_table is wrong. Signed-off-by: Xiubo Li <Li.Xiubo@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 14:55:23 -07:00
Linus Torvalds	776edb5931	Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next Pull core locking updates from Ingo Molnar: "The main changes in this cycle were: - reduced/streamlined smp_mb__() interface that allows more usecases and makes the existing ones less buggy, especially in rarer architectures - add rwsem implementation comments - bump up lockdep limits" 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) rwsem: Add comments to explain the meaning of the rwsem's count field lockdep: Increase static allocations arch: Mass conversion of smp_mb__() arch,doc: Convert smp_mb__() arch,xtensa: Convert smp_mb__() arch,x86: Convert smp_mb__() arch,tile: Convert smp_mb__() arch,sparc: Convert smp_mb__() arch,sh: Convert smp_mb__() arch,score: Convert smp_mb__() arch,s390: Convert smp_mb__() arch,powerpc: Convert smp_mb__() arch,parisc: Convert smp_mb__() arch,openrisc: Convert smp_mb__() arch,mn10300: Convert smp_mb__() arch,mips: Convert smp_mb__() arch,metag: Convert smp_mb__() arch,m68k: Convert smp_mb__() arch,m32r: Convert smp_mb__() arch,ia64: Convert smp_mb__() ...	2014-06-03 12:57:53 -07:00
Florian Fainelli	be40364544	gianfar: use the new fixed PHY helpers of_phy_connect_fixed_link() is becoming obsolete, and also required platform code to register the fixed PHYs at the specified addresses for those to be usable. Get rid of it and use the new of_phy_is_fixed_link() plus of_phy_register_fixed_link() helpers to transition over the new scheme. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-22 15:16:43 -04:00
Claudiu Manoil	6ce29b0e2a	gianfar: Avoid unnecessary reg accesses in adjust_link() For phy devices that don't issue interrupts upon link state changes, phylib polls the link state resulting in repeated calls to adjust_link(), even if the link state didn't change. As a result, some mac registers are repeatedly read and written with the same values, which is not ok. To fix this, adjust_link() has been refactored to check first whether the link state has changed and to take action only if needed, updating mac registers and local state variables. The 'new_state' local flag, set if one of the link params changed (link, speed or duplex), has been rendered useless and removed by this refactoring. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-30 16:12:23 -04:00
Peter Zijlstra	4e857c58ef	arch: Mass conversion of smp_mb__() Mostly scripted conversion of the smp_mb__ barriers. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-arch@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-04-18 14:20:48 +02:00
Claudiu Manoil	c65d753372	gianfar: Fix P1010 config regression (SQ polling) The P1010 device tree restricts the number of supported interrupt groups to 1, although the eth controller can support 2 interrupt groups and the driver assumes the Multi-Group mode ("fsl,etsec2" model). So, in this case the assumption that the Multi-Group mode (MQ_MG_MODE) devices always support 2 interrupt groups is false. To fix this, a check for the actual number of interrupt groups enabled in the board's device tree has been added in gfar_probe for the "fsl,etsec2" devices. Without this fix, P1010 based boards claim support for 2 Tx queues to the net stack but only one is actually allocated, leading to NULL access in xmit. This issue was introduced by enabling Single-Queue polling for the P1010 devices. (`71ff9e3` gianfar: Use Single-Queue polling for "fsl,etsec2") Fixes: `71ff9e3df7` Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-24 00:40:44 -04:00
Eric W. Biederman	c9974ad4ae	gianfar: Carefully free skbs in functions called by netpoll. netpoll can call functions in hard irq context that are ordinarily called in lesser contexts. For those functions use dev_kfree_skb_any and dev_consume_skb_any so skbs are freed safely from hard irq context. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-12 16:22:13 -04:00
Claudiu Manoil	b338ce270e	gianfar: Fix multi-queue support checks @probe() priv is not instantiated at gfar_of_init() time, when parsing the DT for info on supported HW queues. Before the netdev can be allocated, the number of supported queues must be known. Because the number of supported queues depends on device type, move the compatibility checks before netdev allocation. Local vars are used to hold the operation mode info before netdev allocation. This fixes the null accesses for priv->.., in gfar_of_init. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-12 00:47:52 -04:00
Claudiu Manoil	71ff9e3df7	gianfar: Use Single-Queue polling for "fsl,etsec2" For the "fsl,etsec2" compatible models the driver currently supports 8 Tx and Rx DMA rings (aka HW queues). However, there are only 2 pairs of Rx/Tx interrupt lines, as these controllers are integrated in low power SoCs with 2 CPUs at most. As a result, there are at most 2 NAPI instances that have to service multiple Tx and Rx queues for these devices. This complicates the NAPI polling routine having to iterate over the mutiple Rx/Tx queues hooked to the same interrupt lines. And there's also an overhead at HW level, as the controller needs to service all the 8 Tx rings in a round robin manner. The combined overhead shows up for multi parallel Tx flows transmitted by the kernel stack, when the driver usually starts returning NETDEV_TX_BUSY leading to NETDEV WATCHDOG Tx timeout triggering if the Tx path is congested for too long. As an alternative, this patch makes the driver support only one Tx/Rx DMA ring per NAPI instance (per interrupt group or pair of Tx/Rx interrupt lines) by default. The simplified single queue polling routine (gfar_poll_sq) will be the default napi poll routine for the etsec2 devices too. Some adjustments needed to be made to link the Tx/Rx HW queues with each NAPI instance (2 in this case). The gfar_poll_sq() is already successfully used by older SQ_SG_MODE (single interrupt group) controllers. This patch fixes Tx timeout triggering under heavy Tx traffic load (i.e. iperf -c -P 8) for the "fsl,etsec2" (currently the only MQ_MG_MODE devices). There's also a significant memory footprint reduction by supporting 2 Rx/Tx DMA rings (at most), instead of 8, for these devices. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-10 13:17:22 -04:00
Claudiu Manoil	aeb12c5ef7	gianfar: Separate out the Tx interrupt handling (Tx NAPI) There are some concurrency issues on devices w/ 2 CPUs related to the handling of Rx and Tx interrupts. eTSEC has separate interrupt lines for Rx and Tx but a single imask register to mask these interrupts and a single NAPI instance to handle both Rx and Tx work. As a result, the Rx and Tx ISRs are identical, both are invoking gfar_schedule_cleanup(), however both handlers can be entered at the same time when the Rx and Tx interrupts are taken by different CPUs. In this case spurrious interrupts (SPU) show up (in /proc/interrupts) indicating a concurrency issue. Also, Tx overruns followed by Tx timeout have been observed under heavy Tx traffic load. To address these issues, the schedule cleanup ISR part has been changed to handle the Rx and Tx interrupts independently. The patch adds a separate NAPI poll routine for Tx cleanup to be triggerred independently by the Tx confirmation interrupts only. Existing poll functions are modified to handle only the Rx path processing. The Tx poll routine does not need a budget, since Tx processing doesn't consume NAPI budget, and hence it is registered with minimum NAPI weight. NAPI scheduling does not require locking since there are different NAPI instances between the Rx and Tx confirmation paths now. So, the patch fixes the occurence of spurrious Rx/Tx interrupts. Tx overruns also occur less frequently now. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-10 13:17:22 -04:00
Claudiu Manoil	f19015baa2	gianfar: Fix Tx int miss, dont write IC on-the-fly Programming the interrupt coalescing (IC) registers while the controller/DMA is on may incur the loss of one Tx confirmation interrupt, under certain conditions. This is a subtle hw race because it does not occur during a burst of Tx packets. It has been observed on p2020 devices that, if just one packet is being xmit'ed, the Tx confirmation doesn't trigger and BQL evetually blocks the Tx queues, followed by Tx timeout and an un-responsive device. This issue was not apparent prior to introducing BQL support, as a late Tx confirmation was not an issue back then and the next burst of Tx frames would have triggered the Tx confirmation/ Tx ring cleanup anyway. Bottom line, the hw specifications state that the IC registers should not be programmed while the Rx/Tx blocks (the DMA) are enabled. Further more, these registers are currently re-written with the same values on the processing path, over and over again. To fix this, rewriting the IC registers has been removed from the processing path (napi poll). A complete MAC reset procedure has been implemented for the ethtool -c option instead, to reliably update these registers while the controller is stopped. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-24 19:38:20 -05:00
Claudiu Manoil	0851133bb5	gianfar: Fix device reset races (oops) for Tx The device reset procedure, stop_gfar()/startup_gfar(), has concurrency issues. "Kernel access of bad area" oopses show up during Tx timeout device reset or other reset cases (like changing MTU) that happen while the interface still has traffic. The oopses happen in start_xmit and clean_tx_ring when accessing tx_queue-> tx_skbuff which is NULL. The race comes from de-allocating the tx_skbuff while transmission and napi processing are still active. Though the Tx queues get temoprarily stopped when Tx timeout occurs, they get re-enabled as a result of Tx congestion handling inside the napi context (see clean_tx_ring()). Not disabling the napi during reset is also a bug, because clean_tx_ring() will try to access tx_skbuff while it is being de-alloc'ed and re-alloc'ed. To fix this, stop_gfar() needs to disable napi processing after stopping the Tx queues. However, in order to prevent clean_tx_ring() to re-enable the Tx queue before the napi gets disabled, the device state DOWN has been introduced. It prevents the Tx congestion management from re-enabling the de-congested Tx queue while the device is brought down. An additional locking state, RESETTING, has been introduced to prevent simultaneous resets or to prevent configuring the device while it is resetting. The bogus 'rxlock's (for each Rx queue) have been removed since their purpose is not justified, as they don't prevent nor are suited to prevent device reset/reconfig races (such as this one). Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-24 19:38:20 -05:00
Claudiu Manoil	80ec396cb6	gianfar: Don't free/request irqs on device reset Resetting the device (stop_gfar()/startup_gfar()) should be fast and to the point, in order to timely recover from an error condition (like Tx timeout) or during device reconfig. The irq free/ request routines are just redundant here, and they should be part of the device close/ open routines instead. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-24 19:38:20 -05:00
Claudiu Manoil	88302648be	gianfar: Fix on-the-fly vlan and mtu updates The RCTRL and TCTRL registers should not be changed on-the-fly, while the controller is running, otherwise unexpected behaviour occurs. But that's exactly what gfar_vlan_mode() does, updating the VLAN acceleration bits inside RCTRL/TCTRL. The attempt to lock these operations doesn't help, but only adds to the confusion. There's also a dependency for Rx FCB insertion (activating /de-activating the TOE offload block on Rx) which might change the required rx buffer size. This makes matters worse as gfar_vlan_mode() ends up calling gfar_change_mtu(), though the MTU size remains the same. Note that there are other situations that may affect the required rx buffer size, like changing RXCSUM or rx hw timestamping, but errorneously the rx buffer size is not recomputed/ updated in the process. To fix this, do the vlan updates properly inside the MAC reset and reconfiguration procedure, which takes care of the rx buffer size dependecy and the rx TOE block (PRSDEP) activation/deactivation as well (in the correct order). As a consequence, MTU/ rx buff size updates are done now by the same MAC reset and reconfig procedure, so that out of context updates to MAXFRM, MRBLR, and MACCFG inside change_mtu() are no longer needed. The rx buffer size dependecy to Rx FCB is now handled for the other cases too (RXCSUM and rx hw timestamping). Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-24 19:38:20 -05:00
Claudiu Manoil	a328ac92d3	gianfar: Implement MAC reset and reconfig procedure The main MAC config registers like: RCTRL/TCTRL, MRBLR, MAXFRM, RXIC/TXIC, most fields of MACCFG1/2, should not be changed on-the-fly, but at least after stopping the DMA and disabling the Rx/Tx blocks and, for increased reliability, after a MAC soft reset. Impelement a complete MAC soft reset and reconfig procedure following the latest HW advisories - gfar_mac_reset() - to replace gfar_mac_init() and (the confusing) init_registers() functions. Factor out separate config functions for RCTRL and TCTRL, insure programming order of the relevant config regs after MAC soft reset. Split gfar_hw_init() into gfar_mac_reset() and the remaining global regs that don't need to be reconfigured after MAC soft reset (FIFOCFG, ATTRELI, HW counters a.s.o). As gfar_hw_init() now makes all the register writes @probe() time, based on all the device flags and config options, it must be moved further down, just before register_netdev(), as the last config step when the config values are comitted to HW. Also, move netif_carrier_off() after register_netdev(), because it has no effect if called before. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-24 19:38:20 -05:00
Claudiu Manoil	c10650b661	gianfar: Add missing graceful reset steps and fixes gfar_halt() and gfar_start() are responsible for stopping and starting the DMA and the Rx/Tx hw rings. They implement the support for the "graceful Rx/Tx stop/start" hw procedure, and also disable/enable eTSEC's hw interrupts in the process. The GRS/GTS procedure requires however to have the RQUEUE/TQUEUE registers cleared first and to wait for a period of time for the current frame to pass through the interface (around ~10ms for a jumbo frame). Only then may the GTS and GRS bits from DMACTRL be set to shut down the DMA, and finally the Tx_EN and Rx_EN bits in MACCFG1 may be cleared to disable the Tx/Rx blocks. The same register programming order applies to start the Rx/Tx: enabling the RQUEUE/TQUEUE before clearing the GRS/GTS bits. This is a HW recommendation in order to avoid a possible controller "lock up" during graceful reset. Cleanup the gfar_halt()/start() prototypes, to take priv instead of ndev as their purpose is to operate on HW. Enabling the RQUEUE/TQUEUE in the hw_init() is not needed anymore since that's the job of gfar_start(). Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-18 15:03:02 -05:00
Claudiu Manoil	efeddce7ea	gianfar: Factor out enabling/disabling of hw interrupts Throughout the code there are places where the controller's hw interrupt sources need to get disabled/enabled (masked/ un-masked) all at once. The recommendation for disabling the interrupts is to clear the ievent first then the imask register (not the other way around). Use the gfar_ints_enable/disable() helpers to make these operations consistent. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-18 15:03:02 -05:00
Claudiu Manoil	532c37bcb7	gianfar: Remove useless HAS_PADDING device flag The RCTRL updates of the FSL_GIANFAR_DEV_HAS_PADDING device flag get overriden by the FSL_GIANFAR_DEV_HAS_TIMER flag settings, which impose a Rx padding alignment of 8 bytes. As all the eTSEC devices that set HAS_PADDING also set the HAS_TIMER flag, the HAS_PADDING flag is now obsolete. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-18 15:03:02 -05:00
Claudiu Manoil	34018fd419	gianfar: Remove sysfs stubs for FIFOCFG and stashing Removing the sysfs stubs for the Tx FIFOCFG and ATTRELI (stashing) config registers, as these registers may only be configured after a MAC reset, with the controller stopped (i.e. during hw init, at probe() time). The current sysfs stubs allow on-the-fly updates of these registers (the locking measures are useless and only add unecessary code). Changing these registers is discouraged. Only the default values will be used instead. Moreover, the stashing (ATTRELI) configuration options were effectively disabled (didn't get to the hw anyway if changed) because the stashing device_flags (HAS_BD_STASHING\|HAS_BUF_STASHING) were "accidentally" cleared during probe(). Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-18 15:03:02 -05:00
Claudiu Manoil	208627883e	gianfar: Cleanup/Fix gfar_probe and the hw init code Factor out gfar_hw_init() to contain all the controller hw initialization steps for a better control of register writes, and to significantly simplify the tangled code from gfar_probe(). This results in code size and stack usage reduction (besides code readability). Fix memory leak on device removal, by freeing the rx_/tx_queue structures. Replace custom bit swapping function with a library one (bitrev8). Move allocation of rx_/tx_queue struct arrays before the group structure init, because in order to assign Rx/Tx queues to groups we need to have the queues first. This also allows earlier bail out of gfar_probe(), in case the memory allocation fails. The flow control checks for maccfg1 were removed from gfar_probe(), since flow control is disabled at probe time (priv->rx_/tx_pause_en are 0). Redundant initializations (by 0) also removed. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-18 15:03:01 -05:00
Paul Gortmaker	a81ab36bf5	drivers/net: delete non-required instances of include <linux/init.h> None of these files are actually using any __init type directives and hence don't need to include <linux/init.h>. Most are just a left over from __devinit and __cpuinit removal, or simply due to code getting copied from one driver to the next. This covers everything under drivers/net except for wireless, which has been submitted separately. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-16 11:53:26 -08:00
Ben Hutchings	ca0c88c289	gianfar: Implement the SIOCGHWTSTAMP ioctl This is untested. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>	2013-11-21 17:17:38 +00:00
Linus Torvalds	42a2d923cc	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: 1) The addition of nftables. No longer will we need protocol aware firewall filtering modules, it can all live in userspace. At the core of nftables is a, for lack of a better term, virtual machine that executes byte codes to inspect packet or metadata (arriving interface index, etc.) and make verdict decisions. Besides support for loading packet contents and comparing them, the interpreter supports lookups in various datastructures as fundamental operations. For example sets are supports, and therefore one could create a set of whitelist IP address entries which have ACCEPT verdicts attached to them, and use the appropriate byte codes to do such lookups. Since the interpreted code is composed in userspace, userspace can do things like optimize things before giving it to the kernel. Another major improvement is the capability of atomically updating portions of the ruleset. In the existing netfilter implementation, one has to update the entire rule set in order to make a change and this is very expensive. Userspace tools exist to create nftables rules using existing netfilter rule sets, but both kernel implementations will need to co-exist for quite some time as we transition from the old to the new stuff. Kudos to Patrick McHardy, Pablo Neira Ayuso, and others who have worked so hard on this. 2) Daniel Borkmann and Hannes Frederic Sowa made several improvements to our pseudo-random number generator, mostly used for things like UDP port randomization and netfitler, amongst other things. In particular the taus88 generater is updated to taus113, and test cases are added. 3) Support 64-bit rates in HTB and TBF schedulers, from Eric Dumazet and Yang Yingliang. 4) Add support for new 577xx tigon3 chips to tg3 driver, from Nithin Sujir. 5) Fix two fatal flaws in TCP dynamic right sizing, from Eric Dumazet, Neal Cardwell, and Yuchung Cheng. 6) Allow IP_TOS and IP_TTL to be specified in sendmsg() ancillary control message data, much like other socket option attributes. From Francesco Fusco. 7) Allow applications to specify a cap on the rate computed automatically by the kernel for pacing flows, via a new SO_MAX_PACING_RATE socket option. From Eric Dumazet. 8) Make the initial autotuned send buffer sizing in TCP more closely reflect actual needs, from Eric Dumazet. 9) Currently early socket demux only happens for TCP sockets, but we can do it for connected UDP sockets too. Implementation from Shawn Bohrer. 10) Refactor inet socket demux with the goal of improving hash demux performance for listening sockets. With the main goals being able to use RCU lookups on even request sockets, and eliminating the listening lock contention. From Eric Dumazet. 11) The bonding layer has many demuxes in it's fast path, and an RCU conversion was started back in 3.11, several changes here extend the RCU usage to even more locations. From Ding Tianhong and Wang Yufen, based upon suggestions by Nikolay Aleksandrov and Veaceslav Falico. 12) Allow stackability of segmentation offloads to, in particular, allow segmentation offloading over tunnels. From Eric Dumazet. 13) Significantly improve the handling of secret keys we input into the various hash functions in the inet hashtables, TCP fast open, as well as syncookies. From Hannes Frederic Sowa. The key fundamental operation is "net_get_random_once()" which uses static keys. Hannes even extended this to ipv4/ipv6 fragmentation handling and our generic flow dissector. 14) The generic driver layer takes care now to set the driver data to NULL on device removal, so it's no longer necessary for drivers to explicitly set it to NULL any more. Many drivers have been cleaned up in this way, from Jingoo Han. 15) Add a BPF based packet scheduler classifier, from Daniel Borkmann. 16) Improve CRC32 interfaces and generic SKB checksum iterators so that SCTP's checksumming can more cleanly be handled. Also from Daniel Borkmann. 17) Add a new PMTU discovery mode, IP_PMTUDISC_INTERFACE, which forces using the interface MTU value. This helps avoid PMTU attacks, particularly on DNS servers. From Hannes Frederic Sowa. 18) Use generic XPS for transmit queue steering rather than internal (re-)implementation in virtio-net. From Jason Wang. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits) random32: add test cases for taus113 implementation random32: upgrade taus88 generator to taus113 from errata paper random32: move rnd_state to linux/random.h random32: add prandom_reseed_late() and call when nonblocking pool becomes initialized random32: add periodic reseeding random32: fix off-by-one in seeding requirement PHY: Add RTL8201CP phy_driver to realtek xtsonic: add missing platform_set_drvdata() in xtsonic_probe() macmace: add missing platform_set_drvdata() in mace_probe() ethernet/arc/arc_emac: add missing platform_set_drvdata() in arc_emac_probe() ipv6: protect for_each_sk_fl_rcu in mem_check with rcu_read_lock_bh vlan: Implement vlan_dev_get_egress_qos_mask as an inline. ixgbe: add warning when max_vfs is out of range. igb: Update link modes display in ethtool netfilter: push reasm skb through instead of original frag skbs ip6_output: fragment outgoing reassembled skb properly MAINTAINERS: mv643xx_eth: take over maintainership from Lennart net_sched: tbf: support of 64bit rates ixgbe: deleting dfwd stations out of order can cause null ptr deref ixgbe: fix build err, num_rx_queues is only available with CONFIG_RPS ...	2013-11-13 17:40:34 +09:00
Linus Torvalds	10d0c9705e	DeviceTree updates for 3.13. This is a bit larger pull request than usual for this cycle with lots of clean-up. - Cross arch clean-up and consolidation of early DT scanning code. - Clean-up and removal of arch prom.h headers. Makes arch specific prom.h optional on all but Sparc. - Addition of interrupts-extended property for devices connected to multiple interrupt controllers. - Refactoring of DT interrupt parsing code in preparation for deferred probe of interrupts. - ARM cpu and cpu topology bindings documentation. - Various DT vendor binding documentation updates. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJSgPQ4AAoJEMhvYp4jgsXif28H/1WkrXq5+lCFQZF8nbYdE2h0 R8PsfiJJmAl6/wFgQTsRel+ScMk2hiP08uTyqf2RLnB1v87gCF7MKVaLOdONfUDi huXbcQGWCmZv0tbBIklxJe3+X3FIJch4gnyUvPudD1m8a0R0LxWXH/NhdTSFyB20 PNjhN/IzoN40X1PSAhfB5ndWnoxXBoehV/IVHVDU42vkPVbVTyGAw5qJzHW8CLyN 2oGTOalOO4ffQ7dIkBEQfj0mrgGcODToPdDvUQyyGZjYK2FY2sGrjyquir6SDcNa Q4gwatHTu0ygXpyphjtQf5tc3ZCejJ/F0s3olOAS1ahKGfe01fehtwPRROQnCK8= =GCbY -----END PGP SIGNATURE----- Merge tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree updates from Rob Herring: "DeviceTree updates for 3.13. This is a bit larger pull request than usual for this cycle with lots of clean-up. - Cross arch clean-up and consolidation of early DT scanning code. - Clean-up and removal of arch prom.h headers. Makes arch specific prom.h optional on all but Sparc. - Addition of interrupts-extended property for devices connected to multiple interrupt controllers. - Refactoring of DT interrupt parsing code in preparation for deferred probe of interrupts. - ARM cpu and cpu topology bindings documentation. - Various DT vendor binding documentation updates" * tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (82 commits) powerpc: add missing explicit OF includes for ppc dt/irq: add empty of_irq_count for !OF_IRQ dt: disable self-tests for !OF_IRQ of: irq: Fix interrupt-map entry matching MIPS: Netlogic: replace early_init_devtree() call of: Add Panasonic Corporation vendor prefix of: Add Chunghwa Picture Tubes Ltd. vendor prefix of: Add AU Optronics Corporation vendor prefix of/irq: Fix potential buffer overflow of/irq: Fix bug in interrupt parsing refactor. of: set dma_mask to point to coherent_dma_mask of: add vendor prefix for PHYTEC Messtechnik GmbH DT: sort vendor-prefixes.txt of: Add vendor prefix for Cadence of: Add empty for_each_available_child_of_node() macro definition arm/versatile: Fix versatile irq specifications. of/irq: create interrupts-extended property microblaze/pci: Drop PowerPC-ism from irq parsing of/irq: Create of_irq_parse_and_map_pci() to consolidate arch code. of/irq: Use irq_of_parse_and_map() ...	2013-11-12 16:52:17 +09:00
David S. Miller	c3fa32b976	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/usb/qmi_wwan.c include/net/dst.h Trivial merge conflicts, both were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-23 16:49:34 -04:00
Claudiu Manoil	3ba405db1c	gianfar: Simplify MQ polling to avoid soft lockup Under certain low traffic conditions, the single core devices with multiple Rx/Tx queues (MQ mode) may reach soft lockup due to gfar_poll not returning in proper time. The following exception was obtained using iperf on a 100Mbit half-duplex link, for a p1010 single core device: BUG: soft lockup - CPU#0 stuck for 23s! [iperf:2847] Modules linked in: CPU: 0 PID: 2847 Comm: iperf Not tainted 3.12.0-rc3 #16 task: e8bf8000 ti: eeb16000 task.ti: ee646000 NIP: c0255b6c LR: c0367ae8 CTR: c0461c18 REGS: eeb17e70 TRAP: 0901 Not tainted (3.12.0-rc3) MSR: 00029000 <CE,EE,ME> CR: 44228428 XER: 20000000 GPR00: c0367ad4 eeb17f20 e8bf8000 ee01f4b4 00000008 ffffffff ffffffff 00000000 GPR08: 000000c0 00000008 000000ff ffffffc0 000193fe NIP [c0255b6c] find_next_bit+0xb8/0xc4 LR [c0367ae8] gfar_poll+0xc8/0x1d8 Call Trace: [eeb17f20] [c0367ad4] gfar_poll+0xb4/0x1d8 (unreliable) [eeb17f70] [c0422100] net_rx_action+0xa4/0x158 [eeb17fa0] [c003ec6c] __do_softirq+0xcc/0x17c [eeb17ff0] [c000c28c] call_do_softirq+0x24/0x3c [ee647cc0] [c0004660] do_softirq+0x6c/0x94 [ee647ce0] [c003eb9c] local_bh_enable+0x9c/0xa0 [ee647cf0] [c0454fe8] tcp_prequeue_process+0xa4/0xdc [ee647d10] [c0457e44] tcp_recvmsg+0x498/0x96c [ee647d80] [c047b630] inet_recvmsg+0x40/0x64 [ee647da0] [c040ca8c] sock_recvmsg+0x90/0xc0 [ee647e30] [c040edb8] SyS_recvfrom+0x98/0xfc To prevent this, the outer while() loop has been removed allowing gfar_poll() to return faster even if there's still budget left. Also, there's no need to recompute the budget per Rx queue anymore. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-18 15:54:43 -04:00
Rob Herring	5af5073004	drivers: clean-up prom.h implicit includes Powerpc is a mess of implicit includes by prom.h. Add the necessary explicit includes to drivers in preparation of prom.h cleanup. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Acked-by: Grant Likely <grant.likely@linaro.org>	2013-10-09 20:04:04 -05:00
Claudiu Manoil	53fad77375	gianfar: Enable eTSEC-20 erratum w/a for P2020 Rev1 Enable workaround for P2020/P2010 erratum eTSEC 20, "Excess delays when transmitting TOE=1 large frames". The impact is that frames lager than 2500-bytes for which TOE (i.e. TCP/IP hw accelerations like Tx csum) is enabled may see excess delay before start of transmission. This erratum was fixed in Rev 2.0. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-09 14:02:02 -04:00
Claudiu Manoil	2969b1f725	gianfar: Use mpc85xx support for errata detection Use the macros and defines from mpc85xx.h to simplify and prevent errors in identifying a mpc85xx based SoC for errata detection. This should help enabling (and identifying) workarounds for various mpc85xx based chips and revisions. For instance, express MPC8548 Rev.2 as: (SVR_SOC_VER(svr) == SVR_8548) && (SVR_REV(svr) == 0x20) instead of: (pvr == 0x80210020 && mod == 0x8030 && rev == 0x0020) Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-09 14:02:02 -04:00
Claudiu Manoil	ad3660c242	gianfar: Enable eTSEC-A002 erratum w/a for all parts A002 is still in "no plans to fix" state, and applies to all the current P1/P2 parts as well, so it's resonable to enable its workaround by default, for all the soc's with etsec. The impact of not enabling this workaround for affected parts is that under certain conditons (runt frames or even frames with RX error detected at PHY level) during controller reset, the controller might fail to indicate Rx reset (GRS) completion. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-09 14:02:02 -04:00
Claudiu Manoil	50ad076ba4	gianfar: Fix reported number of sent bytes to BQL Fix the amount of sent bytes reported to BQL by reporting the number of bytes on wire in the xmit routine, and recording that value for each skb in order to be correctly confirmed on Tx confirmation cleanup. Reporting skb->len to BQL just before exiting xmit is not correct due to possible insertions of TOE block and alignment bytes in the skb->data, which are being stripped off by the controller before transmission on wire. This led to mismatch of (incorrectly) reported bytes to BQL b/w xmit and Tx confirmation, resulting in Tx timeout firing, for the h/w tx timestamping acceleration case. There's no easy way to obtain the number of bytes on wire in the Tx confirmation routine, so skb->cb is used to convey that information from xmit to Tx confirmation, for now (as proposed by Eric). Revived the currently unused GFAR_CB() construct for that purpose. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-03 22:14:28 -04:00
Claudiu Manoil	23402bddf9	gianfar: Add flow control support eTSEC has Rx and Tx flow control capabilities that may be enabled through MACCFG1[Rx_Flow, Tx_Flow] bits. These bits must not be set however when eTSEC is operated in Half-Duplex mode. Unfortunately, the driver currently sets these bits unconditionally. This patch adds the proper handling of the PAUSE frame capability register bits by implementing the ethtool -A interface. When pause autoneg is enabled, the controller uses the phy's capability to negotiate PAUSE frame settings with the link partner and reconfigures its Rx_Flow and Tx_Flow settings to match the capabilities of the link partner. If pause autoneg is off, the PAUSE frame generation may be forced manually (ethtool -A). Flow control is disabled by default now. This implementation is inspired by the tg3 driver. Signed-off-by: Lutz Jaenicke <ljaenicke@innominate.com> Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-13 15:28:53 -07:00
Claudiu Manoil	0d0cffdcc6	gianfar: Cleanup TxFCB insertion on xmit Cleanup gfar_start_xmit()'s fast path by factoring out "redundant" FCB insertion code (repeated gfar_add_fcb() calls and related) and by reducing the number of if() clauses (i.e. if(fcb) checks). Improve maintainability (e.g. there's less code and easier to read) also by introducing do_csum and do_vlan to mark the other 2 Tx TOE functionalities, following the same model as do_tstamp. fcb_len may also be 0 now, to mark that Tx FCB insertion conditions (do_csum, do_vlan, do_tstamp) have not been met. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-05 12:29:15 -07:00
Claudiu Manoil	02d88fb4fb	gianfar: Fix Tx csum generation errata handling Both [eTSEC76] and [eTSEC12] errata relate to Tx checksum generation (for some MPC83xx and MCP8548 older revisions). They require the same workaround: manual checksum computation and insertion, and disabling the H/W Tx csum acceleration feature (per frame) through Tx FCB (Frame Control Block) csum offload settings. The workaround for [eTSEC76] needs to be fixed because it currently fails to disable H/W Tx csum insertion via FCB. This patch fixes it and provides a common workaround implementation for both Tx csum errata. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-05 12:29:15 -07:00
Claudiu Manoil	84915c6462	gianfar: Remove unused field grp_id from gfar_priv_grp grp->grp_id is obsolete. It has no use in the current driver. Remove it from gfar_priv_grp and put the 'rstat' member in its place, in the 2nd cache line, as rstat needs fast access. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-01 13:16:23 -07:00
Claudiu Manoil	5eaedf3131	gianfar: Add backwards compatible Single Queue mode polling Older Single Queue (SQ_SG_MODE) devices like TSEC (i.e. mpc83xx) don't feature the frame receive indication bits (RXF) in RSTAT. For these and for the rest of the SQ_SG_MODE devices, provide the appropiate polling routine that handles a single pair of Rx/Tx BD rings, removing the overhead incurred by the multiple queues/ multiple interrupt group devices (veTSEC/ eTSEC2.0 devices). So this is primarily a fix for the TSEC devices. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-06-12 03:16:20 -07:00
Jingoo Han	8513fbd880	net: ethernet: use platform_{get,set}_drvdata() Use the wrapper functions for getting and setting the driver data using platform_device instead of using dev_{get,set}_drvdata() with &pdev->dev, so we can directly pass a struct platform_device. Also, unnecessary dev_set_drvdata() is removed, because the driver core clears the driver data to NULL after device_release or on probe failure. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-05-25 21:27:58 -07:00
David S. Miller	e5905c8352	net: Fix some __vlan_hwaccel_put_tag() callers. Several call sites were missed when the protocol argument was added to __vlan_hwaccel_put_tag() in commit `86a9bad3ab` ("net: vlan: add protocol argument to packet tagging functions"). Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-22 19:24:19 -04:00
Patrick McHardy	f646968f8f	net: vlan: rename NETIF_F_HW_VLAN_* feature flags to NETIF_F_HW_VLAN_CTAG_* Rename the hardware VLAN acceleration features to include "CTAG" to indicate that they only support CTAGs. Follow up patches will introduce 802.1ad server provider tagging (STAGs) and require the distinction for hardware not supporting acclerating both. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 14:45:26 -04:00
Claudiu Manoil	953d276847	gianfar: Remove superfluous kernel_dropped local counter The GRO_DROP return code is handled by the core network layer. The current kernel approach is to factorize this kind of statistics into the upper layers, instead of having all the drivers maintaining them. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-21 12:01:34 -04:00
Claudiu Manoil	c6e1160ed6	gianfar: Cleanup dead code and minor formatting Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-21 12:01:34 -04:00
Claudiu Manoil	39c0a0d5be	gianfar: Remove 'maybe-uninitialized' compile warning Warning message: warning: 'budget_per_q' may be used uninitialized in this function budget_per_q won't be used uninitialized since the only time it doesn't get initialized is when entering gfar_poll with num_act_queues == 0, meaning rstat_rxf == 0, in which case budget_per_q is not utilized (as it has no meaning). Inititalize budget_per_q to 0 though to suppress this compile warning. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-21 12:01:34 -04:00
Claudiu Manoil	800c644bcd	gianfar: Refactor config coalescing calls for all queues The only place where gfar_configure_coalescing is called with an actual bitmask (other than 0xff) is in gfar_poll (on the hot path). So make gfar_configure_coalescing() static for the buffer processing path, and export gfar_configure_coalescing_all() for the remaining cases that require to set coalescing for all the queues at once (on the slow path). Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 13:21:53 -04:00
Claudiu Manoil	5d9657d83a	gianfar: Remove redundant programming of [rt]xic registers For Multi Q Multi Group (MQ_MG_MODE) mode, the Rx/Tx colescing registers [rt]xic are aliased with the [rt]xic0 registers (coalescing setting regs for Q0). This avoids programming twice in a row the coalescing registers for the Rx/Tx hw Q0. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 13:21:52 -04:00
Claudiu Manoil	6be5ed3fef	gianfar: Poll only active Rx queues Split the napi budget fairly among the active queues only, instead of dividing it by the total number of Rx queues assigned to the given interrupt group. Use the h/w indication field RXFi in rstat (receive status register) to identify the active rx queues from the current interrupt group (i.e. receive event occured on ring i, if ring i is part of the current interrupt group). This indication field in rstat, RXFi i=0..7, allows us to find out on which queues of the same interrupt group do we have incomming traffic once we entered the polling routine for the given interrupt group. After servicing the ring i, the corresponding bit RXFi will be written with 1 to clear the active queue indication for that ring. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 13:21:52 -04:00
Claudiu Manoil	c233cf4074	gianfar: Fix tx napi polling There are 2 issues with the current napi poll routine, with regards to tx ring cleanup: 1) for multi-queue devices (MQ_MG_MODE), should tx_bit_map != rx_bit_map, which is possible (and supported in h/w) if the DT property "fsl,tx-bit-map" holds a different value than rx_bit_map, the current polling routine will service the wrong Tx queues in this case (i.e. the interrupt group will receive interrupts from tx queues that it will not service) 2) Tx cleanup completion consumes napi budget, whereas the napi budget should be reserved for Rx work only. The patch fixes these issues and provides a clean napi polling routine. Napi poll completion is reached when all the Rx queues have been serviced and there is no Tx work to do. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 13:21:52 -04:00
Joe Perches	d0320f7500	drivers:net: Remove dma_alloc_coherent OOM messages I believe these error messages are already logged on allocation failure by warn_alloc_failed and so get a dump_stack on OOM. Remove the unnecessary additional error logging. Around these deletions: o Alignment neatening. o Remove unnecessary casts of dma_alloc_coherent. o Hoist assigns from ifs. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-15 08:56:58 -04:00

1 2

95 Commits