Commit Graph

546310 Commits

Author SHA1 Message Date
Johannes Berg
1939089300 iwlwifi: mvm: remove PHY RX from handlers
Treat PHY RX specially, since it's actually pretty frequent,
doesn't need all the notication etc. code, and will have a
different handler in future hardware.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2015-10-05 14:33:26 +03:00
Assaf Krauss
5ac15be8fa iwlwifi: mvm: Improve debugfs tof robustness
Return a proper error when wrong parameters are passed to debugfs
tof_range_request.

Signed-off-by: Assaf Krauss <assaf.krauss@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2015-10-05 14:33:25 +03:00
Gregory Greenman
9ef2b8befb iwlwifi: mvm: ToF - fill bssid of responder configuration
The command needs to have the AP interfaces BSSID (which corresponds
to its address).

Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2015-10-05 14:33:25 +03:00
Assaf Krauss
3e0fa50575 iwlwifi: mvm: Fix tof debugfs formats (dec vs. hex)
Make some input formats more natural, e.g. bandwidth and periods
are more natural in decimal than in hexadecimal.

Signed-off-by: Assaf Krauss <assaf.krauss@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2015-10-05 14:33:24 +03:00
Nicolas Iooss
bb74360937 iwlwifi: mvm: fix tof.h header guard
Commit ce7929186a ("iwlwifi: mvm: add basic Time of Flight (802.11mc
FTM) support") created drivers/net/wireless/iwlwifi/mvm/tof.h with a
broken header guard:

    #ifndef __tof
    #define __tof_h__

    ...

    #endif /* __tof_h__ */

Use __tof_h__ in the first line.

Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2015-10-05 12:29:13 +03:00
Ilan Peer
f82c83397b iwlwifi: mvm: Correctly update MAC context on add/del station
Commit "iwlwifi: mvm: don't ask beacons when AP vif and no
assoc sta" directly called iwl_mvm_mac_ctxt_cmd_ap() to update the
MAC context when adding/removing a station. However, this ignores
the case that the vif is actually a P2P GO.

Fix this by calling iwl_mvm_mac_ctxt_changed() that handles P2P GO
case as well.

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2015-10-05 12:29:09 +03:00
Liad Kaufman
69191afef3 iwlwifi: mvm: fix default disabled aggs in sta
For the ADD_STA command, when the flag for aggregation
disabling is set, there is a bitmap indicated what TIDs
are disabling aggregations and what aren't. Currently, by
default, all TIDs allow for aggregations since the value
we begin with is 0.

Change this default value to 0xffff so all TIDs don't
allow aggregations until explicitly turned on.

Signed-off-by: Liad Kaufman <liad.kaufman@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2015-10-05 12:29:04 +03:00
Johannes Berg
acf9de3dfb iwlwifi: enable tracing by default
Tracing, if disabled at runtime, has very low overhead with
great returns on debugging. It therefore makes sense to have
it enabled by default (if the kernel enables EVENT_TRACING).

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2015-10-05 12:29:00 +03:00
Johannes Berg
7b9aebf0e5 MAINTAINERS: iwlwifi: update contact email
The ilw@linux.intel.com address is being phased out, replace
it with the new address.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2015-10-05 12:28:56 +03:00
Johannes Berg
a644912651 iwlwifi: mvm: correct skip-over-DTIM implementation
The formula used in D0i3 should also be used in D3, instead of
the hardcoded value.

Additionally, the formula is actually wrong - if the calculation
yields 0 then 1 should be used instead of disabling entirely.
Also need to add 1 since the firmware needs 3 to skip 2, etc.

To make all this clearer, centralize the calculation into a
single function.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2015-10-05 12:28:50 +03:00
Moshe Harel
bb35dc1418 iwlwifi: nvm: force 1x1 antenna in Series 8000
This is a workaround to an OTP bug. In Series 8000 1x1, the OTP
0xA052 defines 2x2 antenna configuration. This workaround overrides
the decision based on HW id and MIMO disabled bit which is correct
in the OTP and set to disabled.

Signed-off-by: Moshe Harel <moshe.harel@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2015-10-05 12:28:44 +03:00
Aviya Erenfeld
09eef3307e iwlwifi: mvm: move DTS command and notification to new group
Move the DTS measurement command and notification from short
command header to the new PHY command group for firmware
supporting the extended command headers.

Signed-off-by: Aviya Erenfeld <aviya.erenfeld@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2015-10-05 12:28:39 +03:00
Kalle Valo
8f6c5b079c * some debugfs improvements;
* fix signedness in beacon statistics;
 * deinline some functions to reduce size when device tracing is enabled;
 * filter beacons out in AP mode when no stations are associated;
 * deprecate firmwares version -12;
 * fix a runtime PM vs. legacy suspend race;
 * one-liner fix for a ToF bug;
 * clean-ups in the rx code;
 * small debugging improvement;
 * fix WoWLAN with new firmware versions;
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWAGfdAAoJEKFHnKIaPMX6K+cP/1QSMKT7GULl6qZ6cNx3kb//
 exoxdXVp/10YTPvNWrvO1Pv73I75wpVrtn11yzizAel/6hARkeC/z5c+C57u+7Cl
 +tKtrGtrMd7GHIQU5/DZC0zSD70HYOdj5EfY58lNZvLItw4XaEUZCWBwTGkIUDpe
 bd+AcUN83m5PtP8cq73LbLUqnhMAQUy3nZMiqVYsxGQR6hwgDOiaHaaP2qmJAx/Q
 fASQe4xwOOyOTGn5r37AVLfk96URMMmKJBi4fgRCP2aw5+5MiI3x/lj6ctwr20+D
 2oGUyKAUrYtf/e8wvauuVxsC8Zp/n/pO4gaJOXlxrWH0TdZ+NNr3i1vvvaIA3p+6
 eh5KmC4htEDr7IR1CE7bzwj97HHSCLQFCGAQhtINIFiqmI3Ii1S0Gq51JhSpc+RR
 D+2bNuv/5Ca4hJgFBTTwqCnwaMx/K7pcNOEE4Z8M5sSLcNLquRu2OoWfqj5p5IfC
 rzmFNhLIxl4YzoyKzr2CeUqR1yEC9WmB9yitCEPJqxI0lm7UgYAiKwfxzO372B2a
 aPM3FSihlLJBICEUvNwFbImoQ1EDD2DYObtMw9ulHqw+h00kpkO3YuLnHXmDTePg
 WzT/yYfQZ+IJJwms+EAsmx2iOb33aeby1d+iNJ6M6Lnuz1xkxTT1yn2gq3/Z+Q0h
 i6IydQ1o+9mm1trfE0Uf
 =XkqB
 -----END PGP SIGNATURE-----

Merge tag 'iwlwifi-next-for-kalle-2015-09-21' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next

* some debugfs improvements;
* fix signedness in beacon statistics;
* deinline some functions to reduce size when device tracing is enabled;
* filter beacons out in AP mode when no stations are associated;
* deprecate firmwares version -12;
* fix a runtime PM vs. legacy suspend race;
* one-liner fix for a ToF bug;
* clean-ups in the rx code;
* small debugging improvement;
* fix WoWLAN with new firmware versions;
2015-09-26 21:25:18 +03:00
Eliad Peller
7c014e35a0 iwlwifi: mvm: add debug print for d0i3 exit indication
In order to verify d0i3 flow, add debug print to indicate
d0i3 exit was completed (right after tx was re-enabled),
along with the wakeup reasons.

Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2015-09-21 18:08:46 +03:00
Eliad Peller
183edd8484 iwlwifi: mvm: configure wowlan configuration only if connected
Recent fw version added assert to make sure wowlan configuration
is configured only when a station is connected.

Change the driver behavior to pass this configuration only
if we indeed have ap station id (i.e. connected).

Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2015-09-21 18:08:45 +03:00
Johannes Berg
ee6dbb2937 iwlwifi: mvm: move RX API into its own file
The RX API is currently mixed up into the general fw-api.h
file, but we're going to need to extend it significantly in
the future, so move it to its own file.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2015-09-21 18:08:45 +03:00
Johannes Berg
2df5328e78 iwlwifi: mvm: remove some unused defines from RX API
Remove some unused values from the RX API; these were used
with older firmware API that didn't have the RX energy API,
support for which was removed a long time ago.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2015-09-21 18:08:45 +03:00
Johannes Berg
abfd794c59 iwlwifi: mvm: remove pointless cfg_phy_cnt length check
Since the driver can never configure the data here, this field
will always be reported as 0 by the firmware. Even if this was
not the case, however, it wouldn't matter since the extra data
would be added beyond the end of the phy_info structure we use
in the driver, so wouldn't harm anything in this code either.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2015-09-21 18:08:44 +03:00
Johannes Berg
7f89a58efc iwlwifi: mvm: remove useless debug message from RX
This message is useless - it's in the good case that always
happens so enabling it doesn't really help. Just remove it.
There are other ways to debug this (e.g. tracing) so there's
no need to add a message in the bad case.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2015-09-21 18:08:44 +03:00
Johannes Berg
da583fdfec iwlwifi: mvm: make sure AP is operating for ToF
It's possible for an AP interface to be UP but not actually
operating (i.e. not beaconing etc.) - in this case it can't
actually do ToF, so check for it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2015-09-21 18:08:43 +03:00
Emmanuel Grumbach
38d5f66f06 iwlwifi: mvm: remove IWL_UCODE_TLV_API_STATS_V10 TLV flag
This flag is set in all supported firmwares.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
2015-09-21 18:08:43 +03:00
Emmanuel Grumbach
9d43fa4b4a iwlwifi: mvm: remove IWL_UCODE_TLV_API_ASYNC_DTM TLV flag
This flag is set in all supported firmwares.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
2015-09-21 18:08:42 +03:00
Emmanuel Grumbach
eb991c5ec1 iwlwifi: mvm: remove IWL_UCODE_TLV_API_SINGLE_SCAN_EBS TLV flag
All the supported firmwares have this flag set.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
2015-09-21 18:08:42 +03:00
Emmanuel Grumbach
4d31eed13f iwlwifi: mvm: remove IWL_UCODE_TLV_API_TX_POWER_DEV TLV flag
All the supported firmwares use the new API.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
2015-09-21 18:08:42 +03:00
Emmanuel Grumbach
89ced540eb iwlwifi: mvm: remove IWL_UCODE_TLV_API_HDC_PHASE_0 TLV flag
All the supported firwmares support the new API.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2015-09-21 18:08:34 +03:00
Oleksij Rempel
e904cf6fe2 ath9k_htc: introduce support for different fw versions
Current kernel support only one fw name with theoretically only one
fw version located in “firmware/htc_[9271|7010].fw”. Which is ok so far we
have only one fw version (1.3). After we realised new fw 1.4, we faced
compatibility problem which was decided to solve by firmware name and
location:
- new firmware is located now in
	firmware/ath9k_htc/htc_[9271|7010]-1.4.0.fw
- old version 1.3 should be on old place, so old kernel have no issues
	with it.
- new kernels including this patch should be able to try different
	supported (min..max) fw version.
- new kernel should be able to support old fw location too. At least for
	now.

At same time this patch will add new module option which should allow user
to play with development  fw version without replacing stable one. If user
will set “ath9k_htc use_dev_fw=1” module will try to find
firmware/ath9k_htc/htc_[9271|7010]-1.dev.0.fw first and if it fails, use
stable version: for example...1.4.0.fw.

Signed-off-by: Oleksij Rempel <linux@rempel-privat.de>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2015-09-18 10:40:14 +03:00
Eric Dumazet
47bbbb30b4 sch_dsmark: improve memory locality
Memory placement in sch_dsmark is silly : Better place mask/value
in the same cache line.

Also, we can embed small arrays in the first cache line and
remove a potential cache miss.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17 22:37:19 -07:00
David S. Miller
25354001d0 Merge branch 'bcmgenet-irq-coalesce'
Florian Fainelli says:

====================
net: bcmgenet: Interrupt coalescing

This patch series adds support for interrupt coalescing for GENET
adapters.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17 22:17:14 -07:00
Florian Fainelli
4a29645bfe net: bcmgenet: Implement RX coalescing control knobs
Add support for the ethtool rx-frames coalescing parameter which allows
defining the number of RX interrupts per frames received. The RDMA
engine supports a configurable timeout with a resolution of
approximately 8.192 us.

We can no longer enable the BDONE/PDONE interrupts as those would
fire for each packet/buffer received, which would defeat the MBDONE
interrupt purpose. The MBDONE interrupt is guaranteed to correspond to a
PDONE/BDONE interrupt when the threshold is set to 1.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17 22:17:14 -07:00
Florian Fainelli
2f9130709d net: bcmgenet: Implement TX coalescing control knobs
Configuring the ethtool tx-frames property, which translates into N
packets before a TX interrupt is the simplest configuration scheme
because it requires no locking neither at the softare nor hardware
level, and is completely indepedent from the link speed. Since ethtool
does not allow per-tx queue coalescing parameters, we apply the same
setting to any transmit queue.

We can no longer enable the BDONE/PDONE interrupts as those would fire
for each packet/buffer received, which would defeat the MBDONE interrupt
purpose. The MBDONE interrupt is guaranteed to correspond to a
PDONE/BDONE interrupt when the threshold is set to 1, but offers
interrupt coalescing when the value is > 1.

Since the HW is configured to generate an interrupt when the ring
becomes emtpy, we have to deny any timeout/timer settings coming from
user-space to indicate we can only generate an interrupt very <N>
packets.

While we are at it, fix the DMA_INTR_THRESHOLD_MASK value which was off
by one bit (0xff vs. 0x1ff).

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17 22:17:14 -07:00
Woojung.Huh@microchip.com
9110fe4a17 lan78xx: Remove not defined MAC_CR_GMII_EN_ bit from MAC_CR.
Remove not defined MAC_CR_GMII_EN_ bit from MAC_CR.

Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17 22:15:37 -07:00
Woojung.Huh@microchip.com
758c5c1174 lan78xx: Create lan78xx_get_mdix_status() and lan78xx_set_mdix_status() for MDIX control.
Create lan78xx_get_mdix_status() and lan78xx_set_mdix_status() for MDIX control.

Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17 22:15:37 -07:00
Woojung.Huh@microchip.com
bdfba55e0d lan78xx: Remove phy defines in lan78xx.h and use defines in include/linux/microchipphy.h
Remove phy defines in lan78xx.h and use defines in include/linux/microchipphy.h.

Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17 22:15:37 -07:00
Woojung.Huh@microchip.com
ce85e13ad6 lan78xx: Update to use phylib instead of mii_if_info.
Update to use phylib instead of mii_if_info.

Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17 22:15:36 -07:00
Woojung.Huh@microchip.com
05fe68c008 lan78xx: Add PHYLIB and MICROCHIP_PHY as default config.
Add PHYLIB and MICROCHIP_PHY as default configuration for lan78xx.

Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17 22:15:36 -07:00
Woojung.Huh@microchip.com
6c595b03b1 lan78xx: Check device ready bit (PMT_CTL_READY_) after reset the PHY
Check device ready bit (PMT_CTL_READY_) after reset the PHY.
Device may not be ready even if PHY_RST_ is cleared depends on configuration.

Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17 22:15:36 -07:00
David Ahern
bde6f9ded1 net: Initialize table in fib result
Sergey, Richard and Fabio reported an oops in ip_route_input_noref. e.g., from Richard:

[    0.877040] BUG: unable to handle kernel NULL pointer dereference at 0000000000000056
[    0.877597] IP: [<ffffffff8155b5e2>] ip_route_input_noref+0x1a2/0xb00
[    0.877597] PGD 3fa14067 PUD 3fa6e067 PMD 0
[    0.877597] Oops: 0000 [#1] SMP
[    0.877597] Modules linked in: virtio_net virtio_pci virtio_ring virtio
[    0.877597] CPU: 1 PID: 119 Comm: ifconfig Not tainted 4.2.0+ #1
[    0.877597] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[    0.877597] task: ffff88003fab0bc0 ti: ffff88003faa8000 task.ti: ffff88003faa8000
[    0.877597] RIP: 0010:[<ffffffff8155b5e2>]  [<ffffffff8155b5e2>] ip_route_input_noref+0x1a2/0xb00
[    0.877597] RSP: 0018:ffff88003ed03ba0  EFLAGS: 00010202
[    0.877597] RAX: 0000000000000046 RBX: 00000000ffffff8f RCX: 0000000000000020
[    0.877597] RDX: ffff88003fab50b8 RSI: 0000000000000200 RDI: ffffffff8152b4b8
[    0.877597] RBP: ffff88003ed03c50 R08: 0000000000000000 R09: 0000000000000000
[    0.877597] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003fab6f00
[    0.877597] R13: ffff88003fab5000 R14: 0000000000000000 R15: ffffffff81cb5600
[    0.877597] FS:  00007f6de5751700(0000) GS:ffff88003ed00000(0000) knlGS:0000000000000000
[    0.877597] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.877597] CR2: 0000000000000056 CR3: 000000003fa6d000 CR4: 00000000000006e0
[    0.877597] Stack:
[    0.877597]  0000000000000000 0000000000000046 ffff88003fffa600 ffff88003ed03be0
[    0.877597]  ffff88003f9e2c00 697da8c0017da8c0 ffff880000000000 000000000007fd00
[    0.877597]  0000000000000000 0000000000000046 0000000000000000 0000000400000000
[    0.877597] Call Trace:
[    0.877597]  <IRQ>
[    0.877597]  [<ffffffff812bfa1f>] ? cpumask_next_and+0x2f/0x40
[    0.877597]  [<ffffffff8158e13c>] arp_process+0x39c/0x690
[    0.877597]  [<ffffffff8158e57e>] arp_rcv+0x13e/0x170
[    0.877597]  [<ffffffff8151feec>] __netif_receive_skb_core+0x60c/0xa00
[    0.877597]  [<ffffffff81515795>] ? __build_skb+0x25/0x100
[    0.877597]  [<ffffffff81515795>] ? __build_skb+0x25/0x100
[    0.877597]  [<ffffffff81521ff6>] __netif_receive_skb+0x16/0x70
[    0.877597]  [<ffffffff81522078>] netif_receive_skb_internal+0x28/0x90
[    0.877597]  [<ffffffff8152288f>] napi_gro_receive+0x7f/0xd0
[    0.877597]  [<ffffffffa0017906>] virtnet_receive+0x256/0x910 [virtio_net]
[    0.877597]  [<ffffffffa0017fd8>] virtnet_poll+0x18/0x80 [virtio_net]
[    0.877597]  [<ffffffff815234cd>] net_rx_action+0x1dd/0x2f0
[    0.877597]  [<ffffffff81053228>] __do_softirq+0x98/0x260
[    0.877597]  [<ffffffff8164969c>] do_softirq_own_stack+0x1c/0x30

The root cause is use of res.table uninitialized.

Thanks to Nikolay for noticing the uninitialized use amongst the maze of
gotos.

As Nikolay pointed out the second initialization is not required to fix
the oops, but rather to fix a related problem where a valid lookup should
be invalidated before creating the rth entry.

Fixes: b7503e0cdb ("net: Add FIB table id to rtable")
Reported-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Reported-by: Richard Alpe <richard.alpe@ericsson.com>
Reported-by: Fabio Estevam <festevam@gmail.com>
Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17 21:34:08 -07:00
David S. Miller
41a9802fd8 Merge branch 'bpf_avoid_clone'
Alexei Starovoitov says:

====================
bpf: performance improvements

v1->v2: dropped redundant iff_up check in patch 2

At plumbers we discussed different options on how to get rid of skb_clone
from bpf_clone_redirect(), the patch 2 implements the best option.
Patch 1 adds 'integrated exts' to cls_bpf to improve performance by
combining simple actions into bpf classifier.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17 21:09:07 -07:00
Alexei Starovoitov
27b29f6305 bpf: add bpf_redirect() helper
Existing bpf_clone_redirect() helper clones skb before redirecting
it to RX or TX of destination netdev.
Introduce bpf_redirect() helper that does that without cloning.

Benchmarked with two hosts using 10G ixgbe NICs.
One host is doing line rate pktgen.
Another host is configured as:
$ tc qdisc add dev $dev ingress
$ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \
   action bpf run object-file tcbpf1_kern.o section clone_redirect_xmit drop
so it receives the packet on $dev and immediately xmits it on $dev + 1
The section 'clone_redirect_xmit' in tcbpf1_kern.o file has the program
that does bpf_clone_redirect() and performance is 2.0 Mpps

$ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \
   action bpf run object-file tcbpf1_kern.o section redirect_xmit drop
which is using bpf_redirect() - 2.4 Mpps

and using cls_bpf with integrated actions as:
$ tc filter add dev $dev root pref 10 \
  bpf run object-file tcbpf1_kern.o section redirect_xmit integ_act classid 1
performance is 2.5 Mpps

To summarize:
u32+act_bpf using clone_redirect - 2.0 Mpps
u32+act_bpf using redirect - 2.4 Mpps
cls_bpf using redirect - 2.5 Mpps

For comparison linux bridge in this setup is doing 2.1 Mpps
and ixgbe rx + drop in ip_rcv - 7.8 Mpps

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17 21:09:07 -07:00
Daniel Borkmann
045efa82ff cls_bpf: introduce integrated actions
Often cls_bpf classifier is used with single action drop attached.
Optimize this use case and let cls_bpf return both classid and action.
For backwards compatibility reasons enable this feature under
TCA_BPF_FLAG_ACT_DIRECT flag.

Then more interesting programs like the following are easier to write:
int cls_bpf_prog(struct __sk_buff *skb)
{
  /* classify arp, ip, ipv6 into different traffic classes
   * and drop all other packets
   */
  switch (skb->protocol) {
  case htons(ETH_P_ARP):
    skb->tc_classid = 1;
    break;
  case htons(ETH_P_IP):
    skb->tc_classid = 2;
    break;
  case htons(ETH_P_IPV6):
    skb->tc_classid = 3;
    break;
  default:
    return TC_ACT_SHOT;
  }

  return TC_ACT_OK;
}

Joint work with Daniel Borkmann.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17 21:09:06 -07:00
Junwei Zhang
f6c53334d6 net: only check perm protocol when register proto
The permanent protocol nodes are at the head of the list,
So only need check all these nodes.

No matter the new node is permanent or not,
insert the new node after the last permanent protocol node,

If the new node conflicts with existing permanent node,
return error.

Signed-off-by: Martin Zhang <martinbj2008@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17 21:02:59 -07:00
Eric Dumazet
4b1b865e4e bonding: use l4 hash if available
If skb carries a l4 hash, no need to perform a flow dissection.

Performance is slightly better :

lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100
2.39012e+06
lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100
2.39393e+06
lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100
2.39988e+06

After patch :

lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100
2.43579e+06
lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100
2.44304e+06
lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100
2.44312e+06

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17 21:01:05 -07:00
Eric Dumazet
58d607d3e5 tcp: provide skb->hash to synack packets
In commit b73c3d0e4f ("net: Save TX flow hash in sock and set in skbuf
on xmit"), Tom provided a l4 hash to most outgoing TCP packets.

We'd like to provide one as well for SYNACK packets, so that all packets
of a given flow share same txhash, to later enable bonding driver to
also use skb->hash to perform slave selection.

Note that a SYNACK retransmit shuffles the tx hash, as Tom did
in commit 265f94ff54 ("net: Recompute sk_txhash on negative routing
advice") for established sockets.

This has nice effect making TCP flows resilient to some kind of black
holes, even at connection establish phase.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17 21:01:04 -07:00
David S. Miller
bbe8373138 Merge branch 'nf_hook_netns'
Eric W. Biederman says:

====================
Passing net through the netfilter hooks

My primary goal with this patchset and it's follow ups is to cleanup the
network routing paths so that we do not look at the output device to
derive the network namespace.  My plan is to pass the network namespace
of the transmitting socket through the output path, to replace code that
looks at the output network device today.  Once that is done we can have
routes with output devices outside of the current network namespace.
Which should allow reception and transmission of packets in network
namespaces to be as fast as normal packet reception and transmission
with early demux disabled, because it will same code path.

Once skb_dst(skb)->dev is a little better under control I think it will
also be possible to use rcu to cleanup the ancient hack that sets
dst->dev to loopback_dev when a network device is removed.

The work to get there is a series of code cleanups.  I am starting with
passing net into the netfilter hooks and into the functions that are
called after the netfilter hooks.  This removes from netfilter the
need to guess which network namespace it is working on.

To get there I perform a series of minor prep patches so the big changes
at the end are possible to audit without getting lost in the noise.  In
particular I have a lot of patches computing net into a local variable
and then using it through out the function.

So this patchset encompases removing dead code, sorting out the _sk
functions that were added last time someone pushed a prototype change
through the post netfilter functions.  Cleaning up individual functions
use of the network namespace.  Passing net into the netfilter hooks.
Passing net into the post netfilter functions.  Using state->net in
the netfilter code where it is available and trivially usable.

Pablo, Dave I don't know whose tree this makes more sense to go
through.  I am assuming at least initially Pablos as netfilter is
involved.  From what I have seen there will be a lot of back and forth
between the netfilter code paths and the routing code paths.

The patches are also available (against 4.3-rc1) at:
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/net-next.git master
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17 17:18:38 -07:00
Eric W. Biederman
be10de0a32 netfilter: Add blank lines in callers of netfilter hooks
In code review it was noticed that I had failed to add some blank lines
in places where they are customarily used.  Taking a second look at the
code I have to agree blank lines would be nice so I have added them
here.

Reported-by:  Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17 17:18:37 -07:00
Eric W. Biederman
0c4b51f005 netfilter: Pass net into okfn
This is immediately motivated by the bridge code that chains functions that
call into netfilter.  Without passing net into the okfns the bridge code would
need to guess about the best expression for the network namespace to process
packets in.

As net is frequently one of the first things computed in continuation functions
after netfilter has done it's job passing in the desired network namespace is in
many cases a code simplification.

To support this change the function dst_output_okfn is introduced to
simplify passing dst_output as an okfn.  For the moment dst_output_okfn
just silently drops the struct net.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17 17:18:37 -07:00
Eric W. Biederman
9dff2c966a netfilter: Use nf_hook_state.net
Instead of saying "net = dev_net(state->in?state->in:state->out)"
just say "state->net".  As that information is now availabe,
much less confusing and much less error prone.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17 17:18:37 -07:00
Eric W. Biederman
29a26a5680 netfilter: Pass struct net into the netfilter hooks
Pass a network namespace parameter into the netfilter hooks.  At the
call site of the netfilter hooks the path a packet is taking through
the network stack is well known which allows the network namespace to
be easily and reliabily.

This allows the replacement of magic code like
"dev_net(state->in?:state->out)" that appears at the start of most
netfilter hooks with "state->net".

In almost all cases the network namespace passed in is derived
from the first network device passed in, guaranteeing those
paths will not see any changes in practice.

The exceptions are:
xfrm/xfrm_output.c:xfrm_output_resume()         xs_net(skb_dst(skb)->xfrm)
ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont()      ip_vs_conn_net(cp)
ipvs/ip_vs_xmit.c:ip_vs_send_or_cont()          ip_vs_conn_net(cp)
ipv4/raw.c:raw_send_hdrinc()                    sock_net(sk)
ipv6/ip6_output.c:ip6_xmit()			sock_net(sk)
ipv6/ndisc.c:ndisc_send_skb()                   dev_net(skb->dev) not dev_net(dst->dev)
ipv6/raw.c:raw6_send_hdrinc()                   sock_net(sk)
br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev

In all cases these exceptions seem to be a better expression for the
network namespace the packet is being processed in then the historic
"dev_net(in?in:out)".  I am documenting them in case something odd
pops up and someone starts trying to track down what happened.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17 17:18:37 -07:00
Eric W. Biederman
04eb44890e bridge: Add br_netif_receive_skb remove netif_receive_skb_sk
netif_receive_skb_sk is only called once in the bridge code, replace
it with a bridge specific function that calls netif_receive_skb.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17 17:18:37 -07:00
Eric W. Biederman
f2d74cf88c bridge: Cache net in br_nf_pre_routing_finish
This is prep work for passing net to the netfilter hooks.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17 17:18:36 -07:00