linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-11-30 07:26:46 +07:00

Author	SHA1	Message	Date
Yuchung Cheng	0f1c28ae74	tcp: usec resolution SYN/ACK RTT Currently SYN/ACK RTT is measured in jiffies. For LAN the SYN/ACK RTT is often measured as 0ms or sometimes 1ms, which would affect RTT estimation and min RTT sampling used by some congestion control. This patch improves SYN/ACK RTT to be usec resolution if platform supports it. While the timestamping of SYN/ACK is done in request sock, the RTT measurement is carefully arranged to avoid storing another u64 timestamp in tcp_sock. For regular handshake w/o SYNACK retransmission, the RTT is sampled right after the child socket is created and right before the request sock is released (tcp_check_req() in tcp_minisocks.c). For Fast Open the child socket is already created when SYN/ACK was sent, the RTT is sampled in tcp_rcv_state_process() after processing the final ACK and right before the request socket is released. If the SYN/ACK was retransmitted or SYN-cookie was used, we rely on TCP timestamps to measure the RTT. The sample is taken at the same place in tcp_rcv_state_process() after the timestamp values are validated in tcp_validate_incoming(). Note that we do not store TS echo value in request_sock for SYN-cookies, because the value is already stored in tp->rx_opt used by tcp_ack_update_rtt(). One side benefit is that the RTT measurement now happens before initializing congestion control (of the passive side). Therefore the congestion control can use the SYN/ACK RTT. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-21 16:19:01 -07:00
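The resolution problem described in the tcp commit above can be sketched in a few lines of C. This is only an illustration, not kernel code: `SKETCH_HZ`, `rtt_ms_from_ticks()` and `rtt_us()` are hypothetical names, and a tick rate of 1000 per second is assumed.

```c
#include <stdint.h>

/* Assumed tick rate for this sketch: 1000 ticks per second (1 ms per tick). */
#define SKETCH_HZ 1000u
#define SKETCH_US_PER_TICK (1000000u / SKETCH_HZ)

/*
 * RTT as seen by a tick-based clock: each timestamp is truncated to a
 * whole tick before the subtraction, so sub-millisecond RTTs collapse
 * to 0 ms, or to 1 ms when the two events straddle a tick boundary.
 */
static inline uint32_t rtt_ms_from_ticks(uint64_t sent_us, uint64_t acked_us)
{
    uint64_t sent_tick = sent_us / SKETCH_US_PER_TICK;
    uint64_t acked_tick = acked_us / SKETCH_US_PER_TICK;

    return (uint32_t)((acked_tick - sent_tick) * (1000u / SKETCH_HZ));
}

/* RTT from microsecond timestamps: the real interval is preserved. */
static inline uint32_t rtt_us(uint64_t sent_us, uint64_t acked_us)
{
    return (uint32_t)(acked_us - sent_us);
}
```

With these assumptions, the same 200 usec round trip reads as either 0 ms or 1 ms through the tick-based helper, depending on where it falls relative to a tick boundary, while the microsecond helper reports 200 in both cases.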
David S. Miller	21fe8af400	Merge branch 's390-next' Ursula Braun says: ==================== s390: qeth and iucv patches here is version 2 of some s390 related qeth patches for net-next. The patch by Thomas Richter adds a new feature to the qeth layer2 code; the remaining patches are minor improvements. Version 2 of patch 4 uses the desired indentation in function declarations and definitions spanning multiple lines in almost all cases. Thomas ran into a conflict with the maximum number of columns once. Thus you will still see one function definition using an earlier column before the opening parenthesis. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-21 16:03:05 -07:00
Ursula Braun	91e60eb60b	s390/iucv: do not use arrays as argument The iucv code uses arrays as arguments. Even though this does not really cause a problem, it could be misleading, since the compiler turns array arguments into just a pointer argument. To be more precise this patch changes the array arguments into pointers. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-21 16:03:04 -07:00
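The point made in the iucv commit above, that an array parameter is really a pointer parameter, can be shown with a small C sketch. The function names and the bound of 16 are hypothetical and are not taken from the iucv code.

```c
#include <stddef.h>

/*
 * Array-style parameter: the "[16]" looks like a size contract, but the
 * compiler adjusts the parameter type to "const unsigned char *".
 */
static size_t sum_array_style(const unsigned char data[16], size_t len)
{
    size_t total = 0;

    for (size_t i = 0; i < len; i++)
        total += data[i];
    return total;
}

/* Pointer-style function type, written the way the commit prefers. */
typedef size_t (*sum_ptr_style_fn)(const unsigned char *data, size_t len);

/*
 * The array-style function can be assigned to the pointer-style function
 * type without a cast, because after adjustment the two types are identical.
 */
static size_t sum_through_pointer_type(const unsigned char *data, size_t len)
{
    sum_ptr_style_fn fn = sum_array_style;

    return fn(data, len);
}
```

Because the two declarations denote the same type, spelling the parameter as a pointer changes nothing for callers; it only makes the actual contract visible.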
Thomas Richter	4d7def2a12	qeth: add layer 2 RX/TX checksum offloading Checksum offloading for send and receive is already supported for layer 3 (IP layer). This patch adds support for RX and TX hardware checksum offloading for layer 2 (MAC layer). The hardware calculates the checksum for IP UDP and TCP packets. This patch moves the hardware checksum offloading setup to the set of common functions in qeth_core_main.c. Layer 2 and layer 3 now simply call the same common functions. Also note that TX checksum offloading is always enabled. The device driver relies on the TCP/IP stack to make use of this feature. Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Reviewed-by: Eugene Crosser <Eugene.Crosser@ru.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-21 16:03:04 -07:00
Ursula Braun	239ff408dd	qeth: move OSA portname into deprecated status An OSA-Express port name was required to identify a shared OSA port. All operating system instances that shared the port had to use the same port name. This requirement no longer applies. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-21 16:03:04 -07:00
Lakhvich Dmitriy	248046ba07	qeth: no write permission for readonly sysattr User is not allowed to write into bridge_state sysfs file. Fix the attribute so that it does not mislead the user. Signed-off-by: Lakhvich Dmitriy <ldmitriy@ru.ibm.com> Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Reported-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Reviewed-by: Eugene Crosser <Eugene.Crosser@ru.ibm.com> Reviewed-by: Thomas Richter <tmricht@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-21 16:03:04 -07:00
Eugene Crosser	9846e70b9b	qeth: remove extraneous length from %pM format Length specifier in the %pM format is not supported (at least, not documented). Remove it, and also an extraneous '&' for the array. Signed-off-by: Eugene Crosser <Eugene.Crosser@ru.ibm.com> Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-21 16:03:03 -07:00
David S. Miller	5dcd246107	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2015-09-18 Here's the first bluetooth-next pull request for the 4.4 kernel: - ieee802154 cleanups & fixes - debugfs support for the at86rf230 driver - Support for quirky (seemingly counterfeit) CSR Bluetooth controllers - Power management and device config improvements for Intel controllers - Fix for devices with incorrect advertising data length - Fix for closing HCI user channel socket Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-21 16:00:44 -07:00
David S. Miller	a1ef48e1e8	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2015-09-17 This series contains updates to i40e and i40evf. Shannon provides updates to i40e and i40evf to resolve an issue with the nvmupdate utility. First renames a variable name to reduce confusion and to differentiate it from the actual user variable. Then added the ability to save the admin queue write back descriptor if a caller supplies a buffer for it to be saved into. Added a new GetStatus command so that the NVM update tool can query the current status instead of doing fake write requests to probe for readiness. Added wait states to the NVM update state machine to signify when waiting for an update operation to finish, whether we are in the middle of a set of write operations, or we are now idle but waiting. Then added a facility to run admin queue commands through the NVM update utility in order to allow the update tools to interact with the firmware and do special commands needed for updates and configuration changes. Also added a facility to recover the result of a previously run admin queue command. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-20 22:26:58 -07:00
David S. Miller	eaf9a992b7	linux-can-next-for-4.4-20150917 -----BEGIN PGP SIGNATURE----- iQEcBAABCgAGBQJV+ysPAAoJEP5prqPJtc/HP4kH/2+WqOYDXkObGQ/dsurcDu1W kRFNBZHM6m8egD1kqRs8+ipzjrkc7mCN3LUGKOMReXptUlBjS6UOno7ZRNS5dfkt vDgCaGwCQbGOXlDbPv6OujWRL6MLL4sl1CKhB653ubFJQ75TTw9kPlpL9LdQ0J9R gBgxT1+hIwbEZeJagByF7EPe8qiQ12f6IOiFXKtsPbLsXTSHxoh8HJxBT6BxPxvF W3xKVb1YzXEx9ZsmY2uNJCRmDt6PWovwMA34vVBk0mKTFG7VtKplO3o/XiFnKZT/ M0iQNJ0p2sWnchfiHtBMG4KWBsWmd/Bk1tW3fRBTIJz+bRIDYMxIVsT3Gad+fRg= =JEjX -----END PGP SIGNATURE----- Merge tag 'linux-can-next-for-4.4-20150917' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2015-09-17 this is a pull request of two patches for net-next/master. Gerhard Bertelsmann adds support for the CAN controller found on the Allwinner A10/A20 SoC. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-20 21:58:23 -07:00
Ksenija Stanojevic	22a3f9a204	rxrpc: Replace get_seconds with ktime_get_seconds Replace time_t type and get_seconds function which are not y2038 safe on 32-bit systems. The function ktime_get_seconds uses monotonic instead of real time and therefore will not cause overflow. Signed-off-by: Ksenija Stanojevic <ksenija.stanojevic@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-20 21:53:56 -07:00
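As a rough illustration of the y2038 concern in the rxrpc commit above, the following C sketch checks whether a second count still fits in a signed 32-bit field, which is what time_t is on many 32-bit systems. The helper and macro names are hypothetical and do not come from the kernel.

```c
#include <stdint.h>

/* Seconds in a 365-day year, used only for the rough estimate below. */
#define SKETCH_SECONDS_PER_YEAR (365LL * 24 * 60 * 60)

/* Returns 1 if the second count can be stored in a signed 32-bit field. */
static inline int fits_in_time32(int64_t seconds)
{
    return seconds >= INT32_MIN && seconds <= INT32_MAX;
}

/* Whole years a counter starting at zero can run before it stops fitting. */
static inline int64_t years_until_time32_overflow(void)
{
    return INT32_MAX / SKETCH_SECONDS_PER_YEAR;
}
```

A wall-clock value counts from 1970 and reaches the 32-bit limit of 2147483647 seconds in January 2038. A monotonic count starts near zero at boot, so under the same limit it has roughly 68 years of uptime before it would stop fitting.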
David S. Miller	62e3b1d01c	Merge branch 'hsilicon-net-subsys' huangdaode says: ==================== net: Hisilicon Network Subsystem support This is V2 of the Hisilicon Network Subsystem (HNS) patchset, addressing the LKML comments. Please see the change log below for the changes. This patchset is rebased on mainline kernel Linux 4.3-rc1 branch. [PATCH v2 1/5] Device Tree Binding Documentation [PATCH v2 2/5] Merge MDIO Module [PATCH v2 3/5] Hisilicon Network Acceleration Engine Framework [PATCH v2 4/5] Distributed System Area Fabric Module [PATCH v2 5/5] Basic Ethernet Driver Module Changes from V1: 1. Remove "inline" in C file (according to LKML comment, same in below). 2. Fix a bug about class_find_device. 3. Change the DTS pattern on hnae, restructure it to be compatible with the Hi1610 SoC. 4. Unified hip04_mdio and hip05_mdio into hns_mdio, which is more usual for later SoCs. V1 Patches Reference: https://lkml.org/lkml/2015/8/14/165 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-20 21:42:58 -07:00
huangdaode	b5996f11ea	net: add Hisilicon Network Subsystem basic ethernet support This is to add basic ethernet support for HNS. It is one of the ways to use the HNS acceleration engine, but most of the decoding/encoding capability of the AE cannot be used this way. This submission contains the basic features of an ethernet driver. More will be added later. Signed-off-by: huangdaode <huangdaode@hisilicon.com> Signed-off-by: Kenneth Lee <liguozhu@huawei.com> Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-20 21:42:58 -07:00
huangdaode	511e6bc071	net: add Hisilicon Network Subsystem DSAF support DSAF, namely Distributed System Area Fabric, is one of the HNS acceleration engine implementations. This patch adds the DSAF driver to the system. hns_ae_adapt: the adaptor for registering the driver to HNAE framework hns_dsaf_mac: MAC cover interface for GE and XGE hns_dsaf_gmac: GE (10/100/1000G Ethernet) MAC function hns_dsaf_xgmac: XGE (10000+G Ethernet) MAC function hns_dsaf_main: the platform device driver for the whole hardware hns_dsaf_misc: some misc helper functions, such as LED support hns_dsaf_ppe: packet process engine function hns_dsaf_rcb: ring buffer function Signed-off-by: huangdaode <huangdaode@hisilicon.com> Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com> Signed-off-by: Kenneth Lee <liguozhu@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-20 21:42:58 -07:00
huangdaode	6fe6611ff2	net: add Hisilicon Network Subsystem hnae framework support HNAE (Hisilicon Network Acceleration Engine) is a framework to provide a unified ring buffer interface for Hisilicon Network Acceleration Engines. With the interface, upper layer can work as ethernet driver, ODP driver or other service driver on purpose. Signed-off-by: huangdaode <huangdaode@hisilicon.com> Signed-off-by: Kenneth Lee <liguozhu@huawei.com> Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-20 21:42:57 -07:00
huangdaode	5b904d3940	net: add Hisilicon Network Subsystem MDIO support The MDIO support for Hisilicon Network Subsystem. It is used in Hisilicon hip04, hip05 and Hi1610 SoC to control the external PHY. Signed-off-by: huangdaode <huangdaode@hisilicon.com> Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com> Signed-off-by: Kenneth Lee <liguozhu@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-20 21:42:57 -07:00
huangdaode	fc7e37c6b2	net: add Hisilicon Network Subsystem support (config and documents) The Hisilicon Network Subsystem is a long term evolution IP which is supposed to be used in Hisilicon ICT SoC. The IP, which is called hns for short, is a TCP/IP acceleration engine, which can directly decode TCP/IP streams and distribute them to different ring buffers. HNS can be configured to work in different modes for different scenarios. This patch makes use of only some of the modes to make it work as a standard ethernet NIC. The other modes will be added soon. The whole function has 4 kernel sub-modules: hnae: the HNS acceleration engine framework. It provides an abstract interface between the engine and the upper layers which make use of the engine by ring buffer. hns_enet_drv: a standard ethernet driver based on the ring buffer. hns_dsaf: one of the implementations of the HNS acceleration engine, which is applied on Hisilicon hip05, Hi1610 and other later-on SoCs hns_mdio: the mdio control to the PHY, used by acceleration engine This submission adds basic config and documents. Signed-off-by: huangdaode <huangdaode@hisilicon.com> Signed-off-by: Kenneth Lee <liguozhu@huawei.com> Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-20 21:42:57 -07:00
chas williams	812494d9a0	xen-netfront: always set num queues if possible If netfront connects with two (or more) queues and then reconnects with only one queue it fails to delete or rewrite the multi-queue-num-queues key and netback will try to use the wrong number of queues. Always write the num-queues field if the backend has multi-queue support. Signed-off-by: Chas Williams <3chas3@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-20 21:39:21 -07:00
Szymon Janc	6818375e97	Bluetooth: Fix reporting incorrect EIR in device found mgmt event Some remote devices (ie Gigaset G-Tag) misbehave with ADV data length. This can lead to incorrect EIR format in device found event when ADV_DATA and SCAN_RSP are merged (terminator field before SCAN_RSP part). Fix this by inspecting ADV_DATA and correct its length if terminator is found. > HCI Event: LE Meta Event (0x3e) plen 42 [hci0] 32.172182 LE Advertising Report (0x02) Num reports: 1 Event type: Connectable undirected - ADV_IND (0x00) Address type: Public (0x00) Address: 7C:2F:80:94:97:5A (Gigaset Communications GmbH) Data length: 30 Flags: 0x06 LE General Discoverable Mode BR/EDR Not Supported Company: Gigaset Communications GmbH (384) Data: 021512348094975abbc5 16-bit Service UUIDs (partial): 1 entry Battery Service (0x180f) RSSI: -65 dBm (0xbf) > HCI Event: LE Meta Event (0x3e) plen 27 [hci0] 32.172191 LE Advertising Report (0x02) Num reports: 1 Event type: Scan response - SCAN_RSP (0x04) Address type: Public (0x00) Address: 7C:2F:80:94:97:5A (Gigaset Communications GmbH) Data length: 15 Name (complete): Gigaset G-tag RSSI: -59 dBm (0xc5) Note "Data length: 30" in ADV_DATA which results in 9 extra zero bytes after Battery Service UUID. Terminator field present in the middle of EIR in Device Found event resulted in userspace stop parsing EIR and skipping device name. @ Device Found: 7C:2F:80:94:97:5A (1) rssi -59 flags 0x0000 02 01 06 0d ff 80 01 02 15 12 34 80 94 97 5a bb ..........4...Z. c5 03 02 0f 18 00 00 00 00 00 00 00 00 00 0e 09 ................ 47 69 67 61 73 65 74 20 47 2d 74 61 67 Gigaset G-tag With this fix EIR with merged ADV_DATA and SCAN_RSP in device found event is properly formatted: @ Device Found: 7C:2F:80:94:97:5A (1) rssi -59 flags 0x0000 02 01 06 0d ff 80 01 02 15 12 34 80 94 97 5a bb ..........4...Z. 
c5 03 02 0f 18 0e 09 47 69 67 61 73 65 74 20 47 .......Gigaset G 2d 74 61 67 -tag Signed-off-by: Szymon Janc <ext.szymon.janc@tieto.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-09-18 09:53:20 +02:00
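The length correction described in the Bluetooth EIR commit above can be sketched as a walk over length-prefixed fields that stops at the first zero-length (terminator) field. This is a minimal sketch and not the kernel's implementation: `eir_significant_len()` is a hypothetical helper, and stopping at a field that would run past the buffer is an assumption made here.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Each EIR/advertising field is one length byte followed by that many
 * bytes (type + payload). A length byte of 0 marks the terminator.
 * Returns the number of bytes that precede the terminator.
 */
static size_t eir_significant_len(const uint8_t *data, size_t len)
{
    size_t pos = 0;

    while (pos < len) {
        uint8_t field_len = data[pos];

        if (field_len == 0)                     /* terminator found */
            break;
        if (pos + 1 + (size_t)field_len > len)  /* assumed: stop at a truncated field */
            break;
        pos += 1 + (size_t)field_len;
    }
    return pos;
}
```

Applied to the 30 bytes of ADV_DATA shown in the commit message (21 bytes of fields followed by 9 zero bytes), the helper returns 21, which is the length after which the SCAN_RSP data can be appended without a terminator in the middle.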
Szymon Janc	e781b7f7fc	Bluetooth: Add BT_ERR_RATELIMITED This patch adds ratelimited version of the BT_ERR macro. Signed-off-by: Szymon Janc <ext.szymon.janc@tieto.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-09-18 09:53:19 +02:00
Eric Dumazet	47bbbb30b4	sch_dsmark: improve memory locality Memory placement in sch_dsmark is silly : Better place mask/value in the same cache line. Also, we can embed small arrays in the first cache line and remove a potential cache miss. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 22:37:19 -07:00
David S. Miller	25354001d0	Merge branch 'bcmgenet-irq-coalesce' Florian Fainelli says: ==================== net: bcmgenet: Interrupt coalescing This patch series adds support for interrupt coalescing for GENET adapters. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 22:17:14 -07:00
Florian Fainelli	4a29645bfe	net: bcmgenet: Implement RX coalescing control knobs Add support for the ethtool rx-frames coalescing parameter which allows defining the number of RX interrupts per frames received. The RDMA engine supports a configurable timeout with a resolution of approximately 8.192 us. We can no longer enable the BDONE/PDONE interrupts as those would fire for each packet/buffer received, which would defeat the MBDONE interrupt purpose. The MBDONE interrupt is guaranteed to correspond to a PDONE/BDONE interrupt when the threshold is set to 1. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 22:17:14 -07:00
Florian Fainelli	2f9130709d	net: bcmgenet: Implement TX coalescing control knobs Configuring the ethtool tx-frames property, which translates into N packets before a TX interrupt, is the simplest configuration scheme because it requires no locking at either the software or the hardware level, and is completely independent of the link speed. Since ethtool does not allow per-tx queue coalescing parameters, we apply the same setting to any transmit queue. We can no longer enable the BDONE/PDONE interrupts as those would fire for each packet/buffer received, which would defeat the MBDONE interrupt purpose. The MBDONE interrupt is guaranteed to correspond to a PDONE/BDONE interrupt when the threshold is set to 1, but offers interrupt coalescing when the value is > 1. Since the HW is configured to generate an interrupt when the ring becomes empty, we have to deny any timeout/timer settings coming from user-space to indicate we can only generate an interrupt every <N> packets. While we are at it, fix the DMA_INTR_THRESHOLD_MASK value which was off by one bit (0xff vs. 0x1ff). Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 22:17:14 -07:00
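The effect of a threshold mask that is one bit too narrow, as mentioned at the end of the bcmgenet TX commit above, can be illustrated with a short C sketch. The macro and function names are hypothetical, and the sketch assumes that the wider value, 0x1ff, is the correct one for the field.

```c
#include <stdint.h>

/* The two mask values named in the commit message. */
#define SKETCH_NARROW_MASK 0xffu   /* 8 bits: largest value 255 */
#define SKETCH_WIDE_MASK   0x1ffu  /* 9 bits: largest value 511 */

/* Keep only the bits of the requested frame threshold that the mask allows. */
static inline uint32_t masked_threshold(uint32_t frames, uint32_t mask)
{
    return frames & mask;
}
```

Under that assumption, a requested threshold of 256 frames survives the 9-bit mask unchanged but is silently reduced to 0 by the 8-bit mask, and 300 becomes 44.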
Woojung.Huh@microchip.com	9110fe4a17	lan78xx: Remove not defined MAC_CR_GMII_EN_ bit from MAC_CR. Remove not defined MAC_CR_GMII_EN_ bit from MAC_CR. Signed-off-by: Woojung Huh <woojung.huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 22:15:37 -07:00
Woojung.Huh@microchip.com	758c5c1174	lan78xx: Create lan78xx_get_mdix_status() and lan78xx_set_mdix_status() for MDIX control. Create lan78xx_get_mdix_status() and lan78xx_set_mdix_status() for MDIX control. Signed-off-by: Woojung Huh <woojung.huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 22:15:37 -07:00
Woojung.Huh@microchip.com	bdfba55e0d	lan78xx: Remove phy defines in lan78xx.h and use defines in include/linux/microchipphy.h Remove phy defines in lan78xx.h and use defines in include/linux/microchipphy.h. Signed-off-by: Woojung Huh <woojung.huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 22:15:37 -07:00
Woojung.Huh@microchip.com	ce85e13ad6	lan78xx: Update to use phylib instead of mii_if_info. Update to use phylib instead of mii_if_info. Signed-off-by: Woojung Huh <woojung.huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 22:15:36 -07:00
Woojung.Huh@microchip.com	05fe68c008	lan78xx: Add PHYLIB and MICROCHIP_PHY as default config. Add PHYLIB and MICROCHIP_PHY as default configuration for lan78xx. Signed-off-by: Woojung Huh <woojung.huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 22:15:36 -07:00
Woojung.Huh@microchip.com	6c595b03b1	lan78xx: Check device ready bit (PMT_CTL_READY_) after resetting the PHY Check device ready bit (PMT_CTL_READY_) after resetting the PHY. The device may not be ready even if PHY_RST_ is cleared, depending on configuration. Signed-off-by: Woojung Huh <woojung.huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 22:15:36 -07:00
David Ahern	bde6f9ded1	net: Initialize table in fib result Sergey, Richard and Fabio reported an oops in ip_route_input_noref. e.g., from Richard: [ 0.877040] BUG: unable to handle kernel NULL pointer dereference at 0000000000000056 [ 0.877597] IP: [<ffffffff8155b5e2>] ip_route_input_noref+0x1a2/0xb00 [ 0.877597] PGD 3fa14067 PUD 3fa6e067 PMD 0 [ 0.877597] Oops: 0000 [#1] SMP [ 0.877597] Modules linked in: virtio_net virtio_pci virtio_ring virtio [ 0.877597] CPU: 1 PID: 119 Comm: ifconfig Not tainted 4.2.0+ #1 [ 0.877597] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 0.877597] task: ffff88003fab0bc0 ti: ffff88003faa8000 task.ti: ffff88003faa8000 [ 0.877597] RIP: 0010:[<ffffffff8155b5e2>] [<ffffffff8155b5e2>] ip_route_input_noref+0x1a2/0xb00 [ 0.877597] RSP: 0018:ffff88003ed03ba0 EFLAGS: 00010202 [ 0.877597] RAX: 0000000000000046 RBX: 00000000ffffff8f RCX: 0000000000000020 [ 0.877597] RDX: ffff88003fab50b8 RSI: 0000000000000200 RDI: ffffffff8152b4b8 [ 0.877597] RBP: ffff88003ed03c50 R08: 0000000000000000 R09: 0000000000000000 [ 0.877597] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003fab6f00 [ 0.877597] R13: ffff88003fab5000 R14: 0000000000000000 R15: ffffffff81cb5600 [ 0.877597] FS: 00007f6de5751700(0000) GS:ffff88003ed00000(0000) knlGS:0000000000000000 [ 0.877597] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 0.877597] CR2: 0000000000000056 CR3: 000000003fa6d000 CR4: 00000000000006e0 [ 0.877597] Stack: [ 0.877597] 0000000000000000 0000000000000046 ffff88003fffa600 ffff88003ed03be0 [ 0.877597] ffff88003f9e2c00 697da8c0017da8c0 ffff880000000000 000000000007fd00 [ 0.877597] 0000000000000000 0000000000000046 0000000000000000 0000000400000000 [ 0.877597] Call Trace: [ 0.877597] <IRQ> [ 0.877597] [<ffffffff812bfa1f>] ? 
cpumask_next_and+0x2f/0x40 [ 0.877597] [<ffffffff8158e13c>] arp_process+0x39c/0x690 [ 0.877597] [<ffffffff8158e57e>] arp_rcv+0x13e/0x170 [ 0.877597] [<ffffffff8151feec>] __netif_receive_skb_core+0x60c/0xa00 [ 0.877597] [<ffffffff81515795>] ? __build_skb+0x25/0x100 [ 0.877597] [<ffffffff81515795>] ? __build_skb+0x25/0x100 [ 0.877597] [<ffffffff81521ff6>] __netif_receive_skb+0x16/0x70 [ 0.877597] [<ffffffff81522078>] netif_receive_skb_internal+0x28/0x90 [ 0.877597] [<ffffffff8152288f>] napi_gro_receive+0x7f/0xd0 [ 0.877597] [<ffffffffa0017906>] virtnet_receive+0x256/0x910 [virtio_net] [ 0.877597] [<ffffffffa0017fd8>] virtnet_poll+0x18/0x80 [virtio_net] [ 0.877597] [<ffffffff815234cd>] net_rx_action+0x1dd/0x2f0 [ 0.877597] [<ffffffff81053228>] __do_softirq+0x98/0x260 [ 0.877597] [<ffffffff8164969c>] do_softirq_own_stack+0x1c/0x30 The root cause is use of res.table uninitialized. Thanks to Nikolay for noticing the uninitialized use amongst the maze of gotos. As Nikolay pointed out the second initialization is not required to fix the oops, but rather to fix a related problem where a valid lookup should be invalidated before creating the rth entry. Fixes: `b7503e0cdb` ("net: Add FIB table id to rtable") Reported-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Reported-by: Richard Alpe <richard.alpe@ericsson.com> Reported-by: Fabio Estevam <festevam@gmail.com> Tested-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 21:34:08 -07:00
David S. Miller	41a9802fd8	Merge branch 'bpf_avoid_clone' Alexei Starovoitov says: ==================== bpf: performance improvements v1->v2: dropped redundant iff_up check in patch 2 At plumbers we discussed different options on how to get rid of skb_clone from bpf_clone_redirect(), the patch 2 implements the best option. Patch 1 adds 'integrated exts' to cls_bpf to improve performance by combining simple actions into bpf classifier. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 21:09:07 -07:00
Alexei Starovoitov	27b29f6305	bpf: add bpf_redirect() helper Existing bpf_clone_redirect() helper clones skb before redirecting it to RX or TX of destination netdev. Introduce bpf_redirect() helper that does that without cloning. Benchmarked with two hosts using 10G ixgbe NICs. One host is doing line rate pktgen. Another host is configured as: $ tc qdisc add dev $dev ingress $ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \ action bpf run object-file tcbpf1_kern.o section clone_redirect_xmit drop so it receives the packet on $dev and immediately xmits it on $dev + 1 The section 'clone_redirect_xmit' in tcbpf1_kern.o file has the program that does bpf_clone_redirect() and performance is 2.0 Mpps $ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \ action bpf run object-file tcbpf1_kern.o section redirect_xmit drop which is using bpf_redirect() - 2.4 Mpps and using cls_bpf with integrated actions as: $ tc filter add dev $dev root pref 10 \ bpf run object-file tcbpf1_kern.o section redirect_xmit integ_act classid 1 performance is 2.5 Mpps To summarize: u32+act_bpf using clone_redirect - 2.0 Mpps u32+act_bpf using redirect - 2.4 Mpps cls_bpf using redirect - 2.5 Mpps For comparison linux bridge in this setup is doing 2.1 Mpps and ixgbe rx + drop in ip_rcv - 7.8 Mpps Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 21:09:07 -07:00
Daniel Borkmann	045efa82ff	cls_bpf: introduce integrated actions Often cls_bpf classifier is used with single action drop attached. Optimize this use case and let cls_bpf return both classid and action. For backwards compatibility reasons enable this feature under TCA_BPF_FLAG_ACT_DIRECT flag. Then more interesting programs like the following are easier to write: int cls_bpf_prog(struct __sk_buff *skb) { /* classify arp, ip, ipv6 into different traffic classes * and drop all other packets */ switch (skb->protocol) { case htons(ETH_P_ARP): skb->tc_classid = 1; break; case htons(ETH_P_IP): skb->tc_classid = 2; break; case htons(ETH_P_IPV6): skb->tc_classid = 3; break; default: return TC_ACT_SHOT; } return TC_ACT_OK; } Joint work with Daniel Borkmann. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 21:09:06 -07:00
Junwei Zhang	f6c53334d6	net: only check perm protocol when register proto The permanent protocol nodes are at the head of the list, so only these nodes need to be checked. Whether or not the new node is permanent, insert it after the last permanent protocol node. If the new node conflicts with an existing permanent node, return an error. Signed-off-by: Martin Zhang <martinbj2008@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 21:02:59 -07:00
Eric Dumazet	4b1b865e4e	bonding: use l4 hash if available If skb carries a l4 hash, no need to perform a flow dissection. Performance is slightly better : lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100 2.39012e+06 lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100 2.39393e+06 lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100 2.39988e+06 After patch : lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100 2.43579e+06 lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100 2.44304e+06 lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100 2.44312e+06 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <tom@herbertland.com> Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 21:01:05 -07:00
Eric Dumazet	58d607d3e5	tcp: provide skb->hash to synack packets In commit `b73c3d0e4f` ("net: Save TX flow hash in sock and set in skbuf on xmit"), Tom provided a l4 hash to most outgoing TCP packets. We'd like to provide one as well for SYNACK packets, so that all packets of a given flow share same txhash, to later enable bonding driver to also use skb->hash to perform slave selection. Note that a SYNACK retransmit shuffles the tx hash, as Tom did in commit `265f94ff54` ("net: Recompute sk_txhash on negative routing advice") for established sockets. This has nice effect making TCP flows resilient to some kind of black holes, even at connection establish phase. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <tom@herbertland.com> Cc: Mahesh Bandewar <maheshb@google.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 21:01:04 -07:00
Catherine Sullivan	f91638af0e	i40e/i40evf: Bump i40e to 1.3.21 and i40evf to 1.3.13 Bump. Change-ID: If7ce84218361defa209142d1d8c6f69d48c2d7ad Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-09-17 17:54:32 -07:00
Shannon Nelson	b72dc7b193	i40e/i40evf: add get AQ result command to nvmupdate utility Add a facility to recover the result of a previously run AQ command. Change-ID: I21afec2c20c1a5e6ba60c7fbfcbedfff78c10e45 Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-09-17 17:52:07 -07:00
Shannon Nelson	e4c83c20f8	i40e/i40evf: add exec_aq command to nvmupdate utility Add a facility to run AQ commands through the nvmupdate utility in order to allow the update tools to interact with the FW and do special commands needed for updates and configuration changes. Change-ID: I5c41523e4055b37f8e4ee479f7a0574368f4a588 Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-09-17 17:49:42 -07:00
Shannon Nelson	2f1b5bc844	i40e/i40evf: add wait states to NVM state machine This adds wait states to the NVM update state machine to signify when waiting for an update operation to finish, whether we're in the middle of a set of Write operations, or we're now idle but waiting. Change-ID: Iabe91d6579ef6a2ea560647e374035656211ab43 Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-09-17 17:47:16 -07:00
Shannon Nelson	0af8e9db2c	i40e/i40evf: add GetStatus command for nvmupdate This adds a new GetStatus command so that the NVM update tool can query the current status instead of doing fake write requests to probe for readiness. Change-ID: I671ec6ccd4dfc9dbac3a03b964589d693fda5cd8 Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-09-17 17:44:50 -07:00
Shannon Nelson	6b5c1b89c3	i40e/i40evf: add handling of writeback descriptor If the writeback descriptor buffer was previously created, this gives it to the AQ command request to be used to save the results. Change-ID: I8c8a1af81e6ebed6d0a15ed31697fe1a6c4e3708 Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-09-17 17:42:27 -07:00
Shannon Nelson	87db27a9e2	i40e/i40evf: save aq writeback for future inspection Add the ability to save the AdminQ write back descriptor if a caller supplies a buffer for it to be saved into. Change-ID: I3d1301d26360b39a2d66dc8569e851f54133a3af Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-09-17 17:40:00 -07:00
Shannon Nelson	79afe839ab	i40e: rename variable to prevent clash of understanding This code returns something that becomes the errno value from ethtool and passes around a pointer to an errno variable. This patch changes the name slightly to differentiate it from the actual user errno variable. Change-ID: Idaa37845c069e66f4cea072e90f471bb2142454d Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-09-17 17:36:45 -07:00
David S. Miller	bbe8373138	Merge branch 'nf_hook_netns' Eric W. Biederman says: ==================== Passing net through the netfilter hooks My primary goal with this patchset and its follow-ups is to clean up the network routing paths so that we do not look at the output device to derive the network namespace. My plan is to pass the network namespace of the transmitting socket through the output path, to replace code that looks at the output network device today. Once that is done we can have routes with output devices outside of the current network namespace. Which should allow reception and transmission of packets in network namespaces to be as fast as normal packet reception and transmission with early demux disabled, because it will use the same code path. Once skb_dst(skb)->dev is a little better under control I think it will also be possible to use rcu to clean up the ancient hack that sets dst->dev to loopback_dev when a network device is removed. The work to get there is a series of code cleanups. I am starting with passing net into the netfilter hooks and into the functions that are called after the netfilter hooks. This removes from netfilter the need to guess which network namespace it is working on. To get there I perform a series of minor prep patches so the big changes at the end are possible to audit without getting lost in the noise. In particular I have a lot of patches computing net into a local variable and then using it throughout the function. So this patchset encompasses removing dead code, sorting out the _sk functions that were added last time someone pushed a prototype change through the post netfilter functions. Cleaning up individual functions' use of the network namespace. Passing net into the netfilter hooks. Passing net into the post netfilter functions. Using state->net in the netfilter code where it is available and trivially usable. Pablo, Dave I don't know whose tree this makes more sense to go through. 
I am assuming at least initially Pablo's as netfilter is involved. From what I have seen there will be a lot of back and forth between the netfilter code paths and the routing code paths. The patches are also available (against 4.3-rc1) at: git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/net-next.git master ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:38 -07:00
Eric W. Biederman	be10de0a32	netfilter: Add blank lines in callers of netfilter hooks In code review it was noticed that I had failed to add some blank lines in places where they are customarily used. Taking a second look at the code I have to agree blank lines would be nice so I have added them here. Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:37 -07:00
Eric W. Biederman	0c4b51f005	netfilter: Pass net into okfn This is immediately motivated by the bridge code that chains functions that call into netfilter. Without passing net into the okfns the bridge code would need to guess about the best expression for the network namespace to process packets in. As net is frequently one of the first things computed in continuation functions after netfilter has done its job, passing in the desired network namespace is in many cases a code simplification. To support this change the function dst_output_okfn is introduced to simplify passing dst_output as an okfn. For the moment dst_output_okfn just silently drops the struct net. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:37 -07:00
Eric W. Biederman	9dff2c966a	netfilter: Use nf_hook_state.net Instead of saying "net = dev_net(state->in?state->in:state->out)" just say "state->net". As that information is now available, much less confusing and much less error prone. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:37 -07:00
Eric W. Biederman	29a26a5680	netfilter: Pass struct net into the netfilter hooks Pass a network namespace parameter into the netfilter hooks. At the call site of the netfilter hooks the path a packet is taking through the network stack is well known which allows the network namespace to be easily and reliably determined. This allows the replacement of magic code like "dev_net(state->in?:state->out)" that appears at the start of most netfilter hooks with "state->net". In almost all cases the network namespace passed in is derived from the first network device passed in, guaranteeing those paths will not see any changes in practice. The exceptions are: xfrm/xfrm_output.c:xfrm_output_resume() xs_net(skb_dst(skb)->xfrm) ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont() ip_vs_conn_net(cp) ipvs/ip_vs_xmit.c:ip_vs_send_or_cont() ip_vs_conn_net(cp) ipv4/raw.c:raw_send_hdrinc() sock_net(sk) ipv6/ip6_output.c:ip6_xmit() sock_net(sk) ipv6/ndisc.c:ndisc_send_skb() dev_net(skb->dev) not dev_net(dst->dev) ipv6/raw.c:raw6_send_hdrinc() sock_net(sk) br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev In all cases these exceptions seem to be a better expression for the network namespace the packet is being processed in than the historic "dev_net(in?in:out)". I am documenting them in case something odd pops up and someone starts trying to track down what happened. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:37 -07:00
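
The difference between deriving the namespace from a device and taking it from the hook state, as described in 9dff2c966a and 29a26a5680, can be sketched in userspace C. This is an illustration only, not kernel code: the struct definitions are simplified stand-ins, and hook_state, net_from_devices and net_from_state are hypothetical names chosen for this sketch. The dev_net helper here is a mock that only mimics the idea of the kernel function of the same name.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel types (assumptions for illustration). */
struct net {
    int id;                     /* hypothetical identifier for a namespace */
};

struct net_device {
    struct net *nd_net;         /* namespace the device lives in */
};

/* Hypothetical, reduced version of the state handed to a hook. */
struct hook_state {
    struct net_device *in;      /* input device, may be NULL */
    struct net_device *out;     /* output device, may be NULL */
    struct net *net;            /* namespace supplied by the hook's caller */
};

/* Mock helper: return the namespace a device belongs to. */
struct net *dev_net(const struct net_device *dev)
{
    return dev->nd_net;
}

/* Old pattern: guess the namespace from whichever device is set. */
struct net *net_from_devices(const struct hook_state *state)
{
    return dev_net(state->in ? state->in : state->out);
}

/* New pattern: use the namespace the caller passed in explicitly. */
struct net *net_from_state(const struct hook_state *state)
{
    return state->net;
}
```

When the only device set is an output device that sits in a different namespace from the one the caller passes in, the two functions return different namespaces. That is the situation the merge message above wants to allow for routes whose output device is outside the current network namespace.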

546358 Commits