Commit Graph

71 Commits

Author SHA1 Message Date
Scott Feldman
ac28393e85 rocker: mark STP update as 'no wait' processing
We can get STP updates from the bridge driver in atomic and non-atomic
contexts.  Since we can't test what context we're getting called in,
do the STP processing as 'no wait', which will cover all cases.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-15 16:06:49 -07:00
Scott Feldman
02a9fbfc87 rocker: mark neigh update event processing as 'no wait'
Neigh update event handler runs in a context where we can't sleep, so mark
processing in driver with ROCKER_OP_FLAG_NOWAIT.  NOWAIT will use
GFP_ATOMIC for allocations and will queue cmds to the device's cmd ring but
will not wait (sleep) for cmd response back from device.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-15 16:06:48 -07:00
Scott Feldman
179f9a2590 rocker: revert back to support for nowait processes
One of the items removed from the rocker driver in the Spring Cleanup patch
series was the ability to mark processing in the driver as "no wait" for
those contexts where we cannot sleep.  Turns out, we have "no wait"
contexts where we want to program the device.  So re-add the
ROCKER_OP_FLAG_NOWAIT flag to mark such processes, and propagate flags to
mem allocator and to the device cmd executor.  With NOWAIT, mem allocs are
GFP_ATOMIC and device cmds are queued to the device, but the driver will
not wait (sleep) for the response back from the device.

My bad for removing NOWAIT support in the first place; I thought we could
swing non-sleep contexts to process context using a work queue, for
example, but there is push-back to keep processing in original context.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-15 16:06:48 -07:00
Scott Feldman
4d81db4156 rocker: fix neigh tbl index increment race
rocker->neigh_tbl_next_index is used to generate unique indices for neigh
entries programmed into the device.  The way new indices were generated was
racy with the new prepare-commit transaction model.  A simple fix here
removes the race.  The race was with two processes getting the same index,
one process using prepare-commit, the other not:

Proc A					Proc B

PREPARE phase
get neigh_tbl_next_index

					NONE phase
					get neigh_tbl_next_index
					neigh_tbl_next_index++

COMMIT phase
neigh_tbl_next_index++

Both A and B got the same index.  The fix is to store and increment
neigh_tbl_next_index in the PREPARE (or NONE) phase and use value in COMMIT
phase:

Proc A					Proc B

PREPARE phase
get neigh_tbl_next_index
neigh_tbl_next_index++

					NONE phase
					get neigh_tbl_next_index
					neigh_tbl_next_index++

COMMIT phase
// use value stashed in PREPARE phase

Reported-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-15 16:04:21 -07:00
Scott Feldman
a072031084 rocker: gaurd against NULL rocker_port when removing ports
The ports array is filled in as ports are probed, but if probing doesn't
finish, we need to stop only those ports that where probed successfully.
Check the ports array for NULL to skip un-probed ports when stopping.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-15 16:03:48 -07:00
Scott Feldman
2aa2ed0864 rocker: remove support for legacy VLAN ndo ops
Remove support for legacy ndo ops
.ndo_vlan_rx_add_vid/.ndo_vlan_rx_kill_vid.  Rocker will use
bridge_setlink/dellink exclusively for VLAN add/del operations.

The legacy ops are needed if using 8021q driver module to setup VLANs on
the port.  But an alternative exists in using bridge_setlink/delink to
setup VLANs, which doesn't depend on 8021q module.  So rocker will switch
to the newer setlink/dellink ops.  VLANs can added/delete from the port,
regardless if port is bridged or not, using the bridge commands:

	bridge vlan [add|del] vid VID dev DEV self

(Yes, I agree it's confusing to use the "bridge" command to set a VLAN on a
non-bridged port).

Using setlink/dellink over legacy ops let's us handle the stacked driver
case automatically.  It's built-in.  setlink also pass additional flags
(PVID, egress untagged) that aren't available with the legacy ops.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01 17:00:09 -07:00
Scott Feldman
027e00dc0b rocker: install/remove router MAC for untagged VLAN when joining/leaving bridge
When the port joins a bridge, the port's internal VLAN ID needs to change
to the bridge's internal VLAN ID.  Likewise, when leaving the bridge, the
internal VLAN ID reverts back the port's original internal VLAN ID.  (The
internal VLAN ID is used by device to internally mark untagged pkts with
some VLAN, which will eventually be removed on egress...think PVID).  When
the internal VLAN ID changes, we need to update the VLAN table entries and
the router MAC entries for IP/IPv6 to reflect the new internal VLAN ID.

This patch makes use of the common rocker_port_vlan_add/del functions to
make sure the tables are updated for the current internal VLAN ID.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01 17:00:09 -07:00
Scott Feldman
bcfd780144 rocker: install untagged VLAN (vid=0) support for each port
On port probe, install by default untagged VLAN support.  This is
equivalent to running the command:

	bridge vlan add vid 0 dev DEV self

A user could, if they wanted, manaully removing untagged support from the
port by running the command:

	bridge vlan del vid 0 dev DEV self

But installing it by default on port initialization gives the normal
expected behavior.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01 17:00:09 -07:00
Scott Feldman
cec04a60bc rocker: cleanup vlan table on error adding vlan
Basic house keeping: If there is an error adding the router MAC for this
vlan, removing the just installed VLAN table entry to leave device in same
state as before failure.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01 17:00:08 -07:00
Scott Feldman
27b808cbc2 rocker: zero allocate ports array
When allocating the array of rocker port pointers, zero the array values so
we can test for !NULL to see if port is allocated/registered.  We'll need
this later when installing untagged VLAN support for each port, during port
probe.  It's a long story, but to install a VLAN (vid=0 for untagged, in
this case) on a port, we'll need to scan other ports to see if the VLAN
group for that VLAN has been setup.  To scan the other ports, we need to
walk the port array.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01 17:00:08 -07:00
Simon Horman
534ba6a87d rocker: remove rocker parameter from functions that have rocker_port parameter
The rocker (switch) of a rocker_port may be trivially obtained from
the latter it seems cleaner not to pass the former to a function when
the latter is being passed anyway.

rocker_port_rx_proc() is omitted from this change as it is a hot path case.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01 16:04:52 -07:00
Simon Horman
e505464355 rocker: mark parameters and local variables as const
Mark parameters and local variables as const where possible.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25 18:17:08 -04:00
Simon Horman
0985df7390 rocker: remove unused rocker_port parameter from rocker_port_kfree
Remove unused rocker_port parameter from rocker_port_kfree.
Also remove the rocker_port parameter from callers of rocker_port_kfree
where the parameter it is now unused.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25 18:17:08 -04:00
Simon Horman
df6a206730 rocker: make rocker_port_internal_vlan_id_{get, put}() non-transactional
The motivation for this is that rocker_port_internal_vlan_id_{get,put} appear
to only partially implement the transaction model: memory allocation
and freeing is transactional, but hash and bitmap manipulation is not.

The latter could be fixed, however, as it is not currently exercised
due to trans always being SWITCHDEV_TRANS_NONE it seems cleaner
to make rocker_port_internal_vlan_id_get non-transactional.

This problem was introduced by c4f20321d9 ("rocker: support
prepare-commit transaction model").

Found by inspection.
I do not believe that this change should have any run-time effect.

Acked-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-21 17:20:55 -04:00
Simon Horman
550ecc92fe rocker: do not make neighbour entry changes when preparing transactions
rocker_port_ipv4_nh() and in turn rocker_port_ipv4_neigh() may be
be called with trans == SWITCHDEV_TRANS_PREPARE and then
trans == SWITCHDEV_TRANS_COMMIT from switchdev_port_obj_set() via
fib_table_insert().

The first time that rocker_port_ipv4_nh() is called, with
trans == SWITCHDEV_TRANS_PREPARE, _rocker_neigh_add() adds a new entry to
the neigh table.

And the second time  rocker_port_ipv4_nh() is called, with
trans == SWITCHDEV_TRANS_COMMIT, that entry is found. This causes
rocker_port_ipv4_nh() to believe it is not adding an entry and thus it
frees "entry", which is still present in rocker driver's neigh table.

This problem does not appear to affect deletion as my analysis is that
deletion is always performed with trans == SWITCHDEV_TRANS_NONE.

For completeness _rocker_neigh_{add,del,prepare} are updated not to
manipulate fib table entries if trans == SWITCHDEV_TRANS_PREPARE.

Fixes: c4f20321d9 ("rocker: support prepare-commit transaction model")
Reported-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-21 17:20:55 -04:00
Simon Horman
42e9488971 rocker: do not modify fdb table in rocker_port_fdb() when preparing transactions
rocker_port_fdb_flush() may be called be called with
trans == SWITCHDEV_TRANS_PREPARE and then trans == SWITCHDEV_TRANS_COMMIT from
switchdev_port_attr_set() via switchdev_port_obj_add().

Adding the new entry to the FDB table when trans == SWITCHDEV_TRANS_PREPARE
may result in a memory leak because when trans == SWITCHDEV_TRANS_PREPARE
rocker_flow_tbl_bridge() will allocate memory when called via
rocker_port_fdb_learn(). However, when trans == SWITCHDEV_TRANS_COMMIT
the presence of the FDB entry in the FDB table causes
rocker_port_fdb() to set the ROCKER_OP_FLAG_REFRESH flag which results
in rocker_port_fdb_learn() skipping the call to rocker_flow_tbl_bridge()
which would free the memory allocated by it when
trans == SWITCHDEV_TRANS_PREPARE.

ip link add br0 type bridge
ip link set up dev eth0
ip link set dev eth0 master br0
bridge fdb add 52:54:00:12:35:08 dev eth0
bridge fdb add 52:54:00:12:35:09 dev eth0
[    2.600730] ------------[ cut here ]------------
[    2.601002] kernel BUG at drivers/net/ethernet/rocker/rocker.c:4369!
[    2.601373] invalid opcode: 0000 [#1] SMP
[    2.601963] Modules linked in:
[    2.602355] CPU: 0 PID: 64 Comm: bridge Not tainted 4.1.0-rc3-01048-g6d0f50c50211-dirty #1075
[    2.602721] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.0-0-g4c59f5d-20150219_092859-nilsson.home.kraxel.org 04/01/2014
[    2.602721] task: ffff880019facef0 ti: ffff88001f96c000 task.ti: ffff88001f96c000
[    2.602721] RIP: 0010:[<ffffffff811f1470>]  [<ffffffff811f1470>] rocker_port_obj_add+0x150/0x160
[    2.602721] RSP: 0018:ffff88001f96fa98  EFLAGS: 00000212
[    2.602721] RAX: ffff880019d4fa68 RBX: ffff88001f96fb18 RCX: 0000000000000000
[    2.602721] RDX: ffff880019d4f000 RSI: ffff88001f96fb18 RDI: ffff880019d4f000
[    2.602721] RBP: 0000000000000001 R08: 0000000000000000 R09: ffff88001f904620
[    2.602721] R10: ffff88001f96fb60 R11: ffff880019e9d100 R12: ffff88001f96fb18
[    2.602721] R13: ffff880019d4f680 R14: ffff88001f904610 R15: ffff8800198f7b80
[    2.602721] FS:  00007f3eee917700(0000) GS:ffff88001b000000(0000) knlGS:0000000000000000
[    2.602721] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.602721] CR2: 00007f3eee4a15cb CR3: 000000001f933000 CR4: 00000000000006b0
[    2.602721] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    2.602721] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
[    2.602721] Stack:
[    2.602721]  0000000000000000 ffff88001f96fb18 ffff880019d4f000 ffff88001f96fb18
[    2.602721]  ffff880019d4f000 ffffffff81332105 ffff88001f96fb50 ffffffff814464c0
[    2.602721]  ffff88001f96fb18 ffff88001f904600 ffff880019d4f000 ffffffff813326e5
[    2.602721] Call Trace:
[    2.602721]  [<ffffffff81332105>] ? __switchdev_port_obj_add+0x25/0x90
[    2.602721]  [<ffffffff813326e5>] ? switchdev_port_obj_add+0x25/0xc0
[    2.602721]  [<ffffffff813327b1>] ? switchdev_port_fdb_add+0x31/0x40
[    2.602721]  [<ffffffff8123911f>] ? rtnl_fdb_add+0xff/0x1e0
[    2.602721]  [<ffffffff81237d8e>] ? rtnetlink_rcv_msg+0x7e/0x250
[    2.602721]  [<ffffffff8121d1ce>] ? __skb_recv_datagram+0xfe/0x4b0
[    2.602721]  [<ffffffff81237d10>] ? rtnetlink_rcv+0x30/0x30
[    2.602721]  [<ffffffff81247958>] ? netlink_rcv_skb+0xa8/0xd0
[    2.602721]  [<ffffffff81237cff>] ? rtnetlink_rcv+0x1f/0x30
[    2.602721]  [<ffffffff81247220>] ? netlink_unicast+0x150/0x200
[    2.602721]  [<ffffffff81247714>] ? netlink_sendmsg+0x374/0x3e0
[    2.602721]  [<ffffffff8120f8df>] ? sock_sendmsg+0xf/0x30
[    2.602721]  [<ffffffff8120ffd3>] ? ___sys_sendmsg+0x1f3/0x200
[    2.602721]  [<ffffffff812100e5>] ? ___sys_recvmsg+0x105/0x140
[    2.602721]  [<ffffffff810a36f0>] ? SyS_readahead+0x90/0x90
[    2.602721]  [<ffffffff81098dfd>] ? filemap_map_pages+0x1ed/0x210
[    2.602721]  [<ffffffff810b77fc>] ? handle_mm_fault+0x5fc/0xe50
[    2.602721]  [<ffffffff81210ef9>] ? __sys_sendmsg+0x39/0x70
[    2.602721]  [<ffffffff8133ce17>] ? system_call_fastpath+0x12/0x6a
[    2.602721] Code: b7 8f a0 06 00 00 48 83 bf 88 06 00 00 00 74 1d 48 83 c4 08 89 ee 4c 89 ef 5b 5d 41 5c 41 5d 0f b7 c9 45 31 c0 e9 51 db ff ff 90 <0f> 0b b8 ea ff ff ff e9 cf fe ff ff 0f 1f 40 00 41 57 41 56 b9
[    2.602721] RIP  [<ffffffff811f1470>] rocker_port_obj_add+0x150/0x160
[    2.602721]  RSP <ffff88001f96fa98>
[    2.615848] ---[ end trace 4f7b4f1c98077108 ]---

The above is resolved by not adding the new FDB entry to the FDB table
if trans == SWITCHDEV_TRANS_PREPARE.

For symmetry this patch also skips deleting FDB entries from the FDB
table trans == SWITCHDEV_TRANS_PREPARE. However, my analysis is that
this never occurs as trans is always SWITCHDEV_TRANS_NONE when removing
FDB entries.

Fixes: c4f20321d9 ("rocker: support prepare-commit transaction model")
Acked-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-21 17:20:54 -04:00
Simon Horman
3098ac3963 rocker: do not delete fdb entries in rocker_port_fdb_flush() when preparing transactions
rocker_port_fdb_flush() is called by rocker_port_stp_update() which in
turn may be called with trans == SWITCHDEV_TRANS_PREPARE and then
trans == SWITCHDEV_TRANS_COMMIT from switchdev_port_attr_set() via
br_set_state().

When rocker_port_fdb_flush() is called with trans == SWITCHDEV_TRANS_PREPARE
it calls rocker_port_fdb_learn() for each entry in the FDB table which in
turn calls rocker_flow_tbl_bridge() which will allocate memory using
rocker_port_kzalloc(). rocker_port_fdb_learn() will then remove the entry
from the FDB table.

Then when rocker_port_fdb_learn() is called with
trans == SWITCHDEV_TRANS_PREPARE no calls are made to rocker_port_fdb_learn()
because there are no longer any entries present in the FDB table. Thus the
memory previously allocated by rocker_port_fdb_learn() is leaked resulting
in the kernel BUG() below.

Furthermore, it looks like the driver ends up with an incorrect view of the
fdb table as the FDB entries are purged from the driver's table but not the
hardware's table.

ip link add br0 type bridge
ip link set up dev eth0
sleep 1
ip link set dev eth0 master br0
[    3.704360] ------------[ cut here ]------------
[    3.704611] kernel BUG at drivers/net/ethernet/rocker/rocker.c:4289!
[    3.704962] invalid opcode: 0000 [#1] SMP
[    3.705537] Modules linked in:
[    3.705919] CPU: 0 PID: 63 Comm: ip Not tainted 4.1.0-rc3-01046-gb9fbe709de4d #1044
[    3.706191] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.0-0-g4c59f5d-20150219_092859-nilsson.home.kraxel.org 04/01/2014
[    3.706820] task: ffff880019f70150 ti: ffff88001f92c000 task.ti: ffff88001f92c000
[    3.707138] RIP: 0010:[<ffffffff811f0080>]  [<ffffffff811f0080>] rocker_port_attr_set+0xe0/0xf0
[    3.707990] RSP: 0018:ffff88001f92f808  EFLAGS: 00000212
[    3.708200] RAX: ffff880019d4fa68 RBX: ffff880019d4f000 RCX: 0000000000000000
[    3.708471] RDX: 000000000000000c RSI: ffff88001f92f890 RDI: ffff880019d4f680
[    3.708740] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000004
[    3.708999] R10: ffff880000034024 R11: 0000000000000000 R12: ffff88001f92f890
[    3.709276] R13: ffff88001f8f1c00 R14: 000000000000000b R15: 0000000000000000
[    3.709303] FS:  00007f8ab66bd700(0000) GS:ffff88001b000000(0000) knlGS:0000000000000000
[    3.709303] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.709303] CR2: 0000000000654988 CR3: 000000001f8f3000 CR4: 00000000000006b0
[    3.709303] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    3.709303] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
[    3.709303] Stack:
[    3.709303]  ffff88001f8f1c00 000000000000000b ffff88001f92f890 ffff880019d4f000
[    3.709303]  ffff88001f92f890 ffffffff813332f5 ffff88001f92f880 0000000000000000
[    3.709303]  ffff88001f92f890 0000000000000001 ffff880019d4f000 ffffffff81333627
[    3.709303] Call Trace:
[    3.709303]  [<ffffffff813332f5>] ? __switchdev_port_attr_set+0x25/0x90
[    3.709303]  [<ffffffff81333627>] ? switchdev_port_attr_set+0x27/0x120
[    3.709303]  [<ffffffff81318e86>] ? br_set_state+0x36/0x50
[    3.709303]  [<ffffffff8131795c>] ? br_add_if+0x37c/0x400
[    3.709303]  [<ffffffff81238ce1>] ? do_setlink+0x7e1/0x800
[    3.709303]  [<ffffffff8111f980>] ? radix_tree_lookup_slot+0x10/0x30
[    3.709303]  [<ffffffff81136fba>] ? nla_parse+0xaa/0x110
[    3.709303]  [<ffffffff81239c98>] ? rtnl_newlink+0x548/0x870
[    3.709303]  [<ffffffff8111f900>] ? __radix_tree_lookup+0x40/0xb0
[    3.709303]  [<ffffffff81136f3e>] ? nla_parse+0x2e/0x110
[    3.709303]  [<ffffffff81237d7e>] ? rtnetlink_rcv_msg+0x7e/0x250
[    3.709303]  [<ffffffff8121d1be>] ? __skb_recv_datagram+0xfe/0x4b0
[    3.709303]  [<ffffffff81237d00>] ? rtnetlink_rcv+0x30/0x30
[    3.709303]  [<ffffffff81247948>] ? netlink_rcv_skb+0xa8/0xd0
[    3.709303]  [<ffffffff81237cef>] ? rtnetlink_rcv+0x1f/0x30
[    3.709303]  [<ffffffff81247210>] ? netlink_unicast+0x150/0x200
[    3.709303]  [<ffffffff81247704>] ? netlink_sendmsg+0x374/0x3e0
[    3.709303]  [<ffffffff8120f8cf>] ? sock_sendmsg+0xf/0x30
[    3.709303]  [<ffffffff8120ffc3>] ? ___sys_sendmsg+0x1f3/0x200
[    3.709303]  [<ffffffff812100d5>] ? ___sys_recvmsg+0x105/0x140
[    3.709303]  [<ffffffff812228d9>] ? dev_get_by_name_rcu+0x69/0x90
[    3.709303]  [<ffffffff812228d9>] ? dev_get_by_name_rcu+0x69/0x90
[    3.709303]  [<ffffffff81217b7d>] ? skb_dequeue+0x4d/0x60
[    3.709303]  [<ffffffff81217bb0>] ? skb_queue_purge+0x20/0x30
[    3.709303]  [<ffffffff810ebdcf>] ? __inode_wait_for_writeback+0x5f/0xb0
[    3.709303]  [<ffffffff810648b0>] ? autoremove_wake_function+0x30/0x30
[    3.709303]  [<ffffffff81210ee9>] ? __sys_sendmsg+0x39/0x70
[    3.709303]  [<ffffffff8133e097>] ? system_call_fastpath+0x12/0x6a
[    3.709303] Code: bb 90 06 00 00 48 c7 04 24 00 00 00 00 45 31 c9 45 31 c0 48 c7 c1 c0 b7 1e 81 89 ea e8 da da ff ff eb 95 0f 1f 84 00 00 00 00 00 <0f> 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 48 83 fe 15 75
[    3.709303] RIP  [<ffffffff811f0080>] rocker_port_attr_set+0xe0/0xf0
[    3.709303]  RSP <ffff88001f92f808>
[    3.721409] ---[ end trace b7481fcb7cb032aa ]---
Segmentation fault

Fixes: c4f20321d9 ("rocker: support prepare-commit transaction model")
Acked-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-21 17:20:54 -04:00
Samudrala, Sridhar
45d4122ca7 switchdev: add support for fdb add/del/dump via switchdev_port_obj ops.
- introduce port fdb obj and generic switchdev_port_fdb_add/del/dump()
- use switchdev_port_fdb_add/del/dump in rocker/team/bonding ndo ops.
- add support for fdb obj in switchdev_port_obj_add/del/dump()
- switch rocker to implement fdb ops via switchdev_ops

v3: updated to sync with named union changes.

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-17 22:49:09 -04:00
Ying Xue
4133fc0952 rocker: fix a neigh entry leak issue
Once we get a neighbour through looking up arp cache or creating a
new one in rocker_port_ipv4_resolve(), the neighbour's refcount is
already taken. But as we don't put the refcount again after it's
used, this makes the neighbour entry leaked.

Suggested-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-15 16:58:32 -04:00
Scott Feldman
42275bd8fc switchdev: don't use anonymous union on switchdev attr/obj structs
Older gcc versions (e.g.  gcc version 4.4.6) don't like anonymous unions
which was causing build issues on the newly added switchdev attr/obj
structs.  Fix this by using named union on structs.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-13 14:20:59 -04:00
Scott Feldman
7a7ee5312d switchdev: sparse warning: pass ipv4 fib dst as network-byte order
And let driver convert it to host-byte order as needed.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-13 12:26:27 -04:00
Scott Feldman
4725ceb9b7 rocker: make checkpatch -f clean
Well almost clean: ignore the CHECKs for space after cast operator and some
longer-than-80 char cases where for readability it's better to keep as-is.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12 18:43:56 -04:00
Scott Feldman
7889cbee83 switchdev: remove NETIF_F_HW_SWITCH_OFFLOAD feature flag
Roopa said remove the feature flag for this series and she'll work on
bringing it back if needed at a later date.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12 18:43:55 -04:00
Scott Feldman
58c2cb16b1 switchdev: convert fib_ipv4_add/del over to switchdev_port_obj_add/del
The IPv4 FIB ops convert nicely to the switchdev objs and we're left with
only four switchdev ops: port get/set and port add/del.  Other objs will
follow, such as FDB.  So go ahead and convert IPv4 FIB over to switchdev
obj for consistency, anticipating more objs to come.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12 18:43:55 -04:00
Scott Feldman
85fdb95672 switchdev: cut over to new switchdev_port_bridge_getlink
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12 18:43:55 -04:00
Scott Feldman
54ba5a0bbc switchdev: cut over to new switchdev_port_bridge_dellink
Rocker, bonding and team and switch over to the new
switchdev_port_bridge_dellink to avoid duplicating code in each driver.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12 18:43:55 -04:00
Scott Feldman
fc8f40d864 switchdev: cut over to new switchdev_port_bridge_setlink
Rocker, bonding, and team can now use the switchdev bridge setlink to parse
raw netlink; no need to duplicate this code in each driver.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12 18:43:54 -04:00
Scott Feldman
6004c86718 switchdev: add bridge port flags attr
rocker: use switchdev get/set attr for bridge port flags

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12 18:43:54 -04:00
Scott Feldman
9228ad26ab rocker: use switchdev add/del obj for bridge port vlans
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12 18:43:54 -04:00
Scott Feldman
3563606258 switchdev: convert STP update to switchdev attr set
STP update is just a settable port attribute, so convert
switchdev_port_stp_update to an attr set.

For DSA, the prepare phase is skipped and STP updates are only done in the
commit phase.  This is because currently the DSA drivers don't need to
allocate any memory for STP updates and the STP update will not fail to HW
(unless something horrible goes wrong on the MDIO bus, in which case the
prepare phase wouldn't have been able to predict anyway).

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12 18:43:53 -04:00
Scott Feldman
c4f20321d9 rocker: support prepare-commit transaction model
For rocker, support prepare-commit transaction model for setting attributes
(and for adding objects).  This requires rocker to preallocate memory
needed for the commit up front in the prepare phase.  Since rtnl_lock is
held between prepare-commit, store the allocated memory on a queue hanging
off of the rocker_port.  Also, in prepare phase, do everything right up to
calling into HW.  The same code paths are tranversed in the driver for both
prepare and commit phases.  In some cases, any state modified in the
prepare phase must be reverted before returning so the commit phase makes
the same decisions.

As a consequence of holding rtnl_lock in process context for all attr sets
(and obj adds), all memory is GFP_KERNEL allocated and we don't need to
busy spin waiting for the device to complete the command.  So the bulk of
this patch is simplifying the memory allocations to only use GFP_KERNEL and
to remove the nowait flag and busy spin loop.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12 18:43:53 -04:00
Scott Feldman
f8e20a9f87 switchdev: convert parent_id_get to switchdev attr get
Switch ID is just a gettable port attribute.  Convert switchdev op
switchdev_parent_id_get to a switchdev attr.

Note: for sysfs and netlink interfaces, SWITCHDEV_ATTR_PORT_PARENT_ID is
called with SWITCHDEV_F_NO_RECUSE to limit switch ID user-visiblity to only
port netdevs.  So when a port is stacked under bond/bridge, the user can
only query switch id via the switch ports, but not via the upper devices

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12 18:43:53 -04:00
Jiri Pirko
9d47c0a2d9 switchdev: s/swdev_/switchdev_/
Turned out that "switchdev" sticks. So just unify all related terms to use
this prefix.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12 18:43:53 -04:00
Jiri Pirko
ebb9a03a59 switchdev: s/netdev_switch_/switchdev_/ and s/NETDEV_SWITCH_/SWITCHDEV_/
Turned out that "switchdev" sticks. So just unify all related terms to use
this prefix.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12 18:43:52 -04:00
David S. Miller
3715544750 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Merge net into net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-02 22:05:58 -04:00
Simon Horman
629161f649 net: rocker: Use ether_addr_equal
A small cleanup to make use of the ether_addr_equal helper.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-01 21:48:30 -04:00
Nicolas Dichtel
46c264daaa bridge/nl: remove wrong use of NLM_F_MULTI
NLM_F_MULTI must be used only when a NLMSG_DONE message is sent. In fact,
it is sent only at the end of a dump.

Libraries like libnl will wait forever for NLMSG_DONE.

Fixes: e5a55a8987 ("net: create generic bridge ops")
Fixes: 815cccbf10 ("ixgbe: add setlink, getlink support to ixgbe and ixgbevf")
CC: John Fastabend <john.r.fastabend@intel.com>
CC: Sathya Perla <sathya.perla@emulex.com>
CC: Subbu Seetharaman <subbu.seetharaman@emulex.com>
CC: Ajit Khaparde <ajit.khaparde@emulex.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: intel-wired-lan@lists.osuosl.org
CC: Jiri Pirko <jiri@resnulli.us>
CC: Scott Feldman <sfeldma@gmail.com>
CC: Stephen Hemminger <stephen@networkplumber.org>
CC: bridge@lists.linux-foundation.org
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-29 14:59:16 -04:00
Wei Yongjun
3122a92e80 rocker: fix error return code in rocker_probe()
Fix to return -EINVAL from the invalid PCI region size error
handling case instead of 0, as done elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-16 12:13:39 -04:00
David S. Miller
9f0d34bc34 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/asix_common.c
	drivers/net/usb/sr9800.c
	drivers/net/usb/usbnet.c
	include/linux/usb/usbnet.h
	net/ipv4/tcp_ipv4.c
	net/ipv6/tcp_ipv6.c

The TCP conflicts were overlapping changes.  In 'net' we added a
READ_ONCE() to the socket cached RX route read, whilst in 'net-next'
Eric Dumazet touched the surrounding code dealing with how mini
sockets are handled.

With USB, it's a case of the same bug fix first going into net-next
and then I cherry picked it back into net.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:16:53 -04:00
Simon Horman
a6e95cc718 rocker: handle non-bridge master change
Master change notifications may occur other than when joining or
leaving a bridge, for example when being added to or removed from
a bond or Open vSwitch.

Previously in those cases rocker_port_bridge_leave() was called
which results in a null-pointer dereference as rocker_port->bridge_dev
is NULL because there is no bridge device.

This patch makes provision for doing nothing in such cases.

Fixes: 6c70794500 ("rocker: implement L2 bridge offloading")
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 14:52:42 -04:00
David Ahern
db19170bc0 rocker: add support for phys_port_name
Implement the phys_port_name operation. Port names are pulled from the
rocker hardware model in qemu and default to the qemu name + port id.
e.g.,

sw1p1: flags=4098<BROADCAST,MULTICAST>  mtu 1500
        ether 52:54:00:12:35:01  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

where 'sw1' comes from the qemu command line -device rocker,name=sw1, and
'p1' is port 1.

Patch is adapted from Scott's phys_port_id patch.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18 22:30:35 -04:00
Scott Feldman
04f49faf70 rocker: replace fixed stack allocation with dynamic allocation
In hast to fix some sparse warning, I hard-coded a fix-sized array on the stack
which is probably too big for kernel standards.  Fix this by converting array
to dynamic allocation.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-16 15:56:36 -04:00
Scott Feldman
98237d433b switchdev: use new swdev ops
Move swdev wrappers over to new swdev ops (from previous ndo ops).  No
functional changes to the implementation.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>

rocker: move to new swdev ops

Signed-off-by: Scott Feldman <sfeldma@gmail.com>

dsa: move to new swdev ops

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-16 00:14:43 -04:00
Scott Feldman
f8f2147150 switchdev: add netlink flags to IPv4 FIB add op
Pass in the netlink flags (NLM_F_*) into switchdev driver for IPv4 FIB add op
to allow driver to 1) optimize hardware updates, 2) handle ip route prepend
and append commands correctly.

Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09 23:56:52 -04:00
Scott Feldman
1b5ef07e3d rocker: sparse: fix dynamic allocation on stack warning
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-06 22:01:27 -05:00
Scott Feldman
0f43deba6f rocker: quiet sparce endianess warnings
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Jonathan Toppins <jtoppins@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-06 22:01:27 -05:00
Scott Feldman
8ea696384a rocker: fix some sparse warnings
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-06 12:43:54 -05:00
Scott Feldman
c1beeef7a3 rocker: implement IPv4 fib offloading
The driver implements ndo_switch_fib_ipv4_add/del ops to add/del/mod IPv4
routes to/from switchdev device.  Once a route is added to the device, and the
route's nexthops are resolved to neighbor MAC address, the device will forward
matching pkts rather than the kernel.  This offloads the L3 forwarding path
from the kernel to the device.  Note that control and management planes are
still mananged by Linux; only the data plane is offloaded.  Standard routing
control protocols such as OSPF and BGP run on Linux and manage the kernel's FIB
via standard rtm netlink msgs...nothing changes here.

A new hash table is added to rocker to track neighbors.  The driver listens for
neighbor updates events using netevent notifier NETEVENT_NEIGH_UPDATE.  Any ARP
table updates for ports on this device are recorded in this table.  Routes
installed to the device with nexthops that reference neighbors in this table
are "qualified".  In the case of a route with nexthops not resolved in the
table, the kernel is asked to resolve the nexthop.

The driver uses fib_info->fib_priority for the priority field in rocker's
unicast routing table.

The device can only forward to pkts matching route dst to resolved nexthops.
Currently, the device only supports single-path routes (i.e. routes with one
nexthop).  Equal Cost Multipath (ECMP) route support will be added in followup
patches.

This patch is driver support for unicast IPv4 routing only.  Followup patches
will add driver and infrastructure for IPv6 routing and multicast routing.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-06 00:24:58 -05:00
David S. Miller
71a83a6db6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/rocker/rocker.c

The rocker commit was two overlapping changes, one to rename
the ->vport member to ->pport, and another making the bitmask
expression use '1ULL' instead of plain '1'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-03 21:16:48 -05:00
Dan Carpenter
5f2ebfbee6 rocker: silence shift wrapping warning
"val" is declared as a u64 so static checkers complain that this shift
can wrap.  I don't have the hardware but probably it's doesn't have over
31 ports.  Still we may as well silence the warning even if it's not a
real bug.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-27 15:53:44 -05:00