Both Eric and Paolo noticed the rcu_barrier() we use in
tcf_block_put_ext() could be a performance bottleneck when
we have a lot of tc classes.
Paolo provided the following to demonstrate the issue:
tc qdisc add dev lo root htb
for I in `seq 1 1000`; do
tc class add dev lo parent 1: classid 1:$I htb rate 100kbit
tc qdisc add dev lo parent 1:$I handle $((I + 1)): htb
for J in `seq 1 10`; do
tc filter add dev lo parent $((I + 1)): u32 match ip src 1.1.1.$J
done
done
time tc qdisc del dev root
real 0m54.764s
user 0m0.023s
sys 0m0.000s
The rcu_barrier() there is to ensure we free the block after all chains
are gone, that is, to queue tcf_block_put_final() at the tail of workqueue.
We can achieve this ordering requirement by refcnt'ing tcf block instead,
that is, the tcf block is freed only when the last chain in this block is
gone. This also simplifies the code.
Paolo reported after this patch we get:
real 0m0.017s
user 0m0.000s
sys 0m0.017s
Tested-by: Paolo Abeni <pabeni@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Schmidt says:
====================
pull-request: ieee802154-next 2017-12-04
Some update from ieee802154 to *net-next*
Jian-Hong Pan updated our docs to match the APIs in code.
Michael Hennerichs enhanced the adf7242 driver to work with adf7241
devices and reworked the IRQ and packet handling in the driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Functions nsim_bpf_create_prog and nsim_bpf_destroy_prog are local to the
source and do not need to be in global scope, so make them static.
Cleans up sparse warnings:
symbol 'nsim_bpf_create_prog' was not declared. Should it be static?
symbol 'nsim_bpf_destroy_prog' was not declared. Should it be static?
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geert Uytterhoeven says:
====================
Teach phylib hard-resetting devices
This patch series adds optional PHY reset support to phylib.
The first two patches are destined for David's net-next tree. They add
core PHY reset code, and update a driver that currently uses its own
reset code.
The last two patches are destined for Simon's renesas tree. They add
properties to describe the EthernetAVB PHY reset topology to the common
Salvator-X/XS and ULCB DTS files, which solves two issues:
1. On Salvator-XS, the enable pin of the regulator providing PHY power
is connected to PRESETn, and PSCI powers down the SoC during system
suspend. Hence a PHY reset is needed to restore network
functionality after system resume.
2. Linux should not rely on the boot loader having reset the PHY, but
should reset the PHY during driver probe.
Changes compared to v3:
- Remove Florian's Acked-by,
- Add missing #include <linux/gpio/consumer.h>,
- Re-add the gpiod check, as the dummy gpiod_set_value() for !GPIOLIB
does not ignore NULL, and calls WARN_ON(1),
- Do not reassert the reset signal if {mdio,phy}_probe() or
phy_device_register() succeeded, as that may destroy initial setup,
- Do not deassert the reset signal in {mdio,phy}_remove(), as it
should already be deasserted,
- Bring the PHY back into reset state in phy_device_remove(),
- Move/consolidate GPIO descriptor acquiring code from
of_mdiobus_register_phy() and of_mdiobus_register_device() to
mdiobus_register_device().
Note that this changes behavior slightly, in that the reset signal
is now also asserted when called from of_mdiobus_register_device().
- Add Reviewed-by,
Changes compared to v2, as sent by Sergei Shtylyov:
- Fix fwnode_get_named_gpiod() call due to added parameters (which
allowed to eliminate the gpiod_direction_output() call),
- Rebased, refreshed, reworded,
- Take over from Sergei,
- Add Acked-by,
- Remove unneeded gpiod check, as gpiod_set_value() handles NULL fine,
- Handle fwnode_get_named_gpiod() errors correctly:
- -ENOENT is ignored (the GPIO is optional), and turned into NULL,
which allowed to remove all later !IS_ERR() checks,
- Other errors (incl. -EPROBE_DEFER) are propagated,
- Extract DTS patches from series "[PATCH 0/4] ravb: Add PHY reset
support" (https://www.spinics.net/lists/netdev/msg457308.html), and
incorporate in this series, after moving reset-gpios from the
ethernet to the ethernet-phy node.
Given (1) the new reset-gpios DT property in the PHY node follows
established practises, (2) the DT binding change in the first patch has
been acked by Rob, and (3) the DTS patch does not cause any regressions
if it is applied before the PHY driver patches, the DTS patches can be
applied independently.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
With the phylib now being aware of the "reset-gpios" PHY node property,
there should be no need to frob the PHY reset in this driver anymore...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PHY devices sometimes do have their reset signal (maybe even power
supply?) tied to some GPIO and sometimes it also does happen that a boot
loader does not leave it deasserted. So far this issue has been attacked
from (as I believe) a wrong angle: by teaching the MAC driver to manipulate
the GPIO in question; that solution, when applied to the device trees, led
to adding the PHY reset GPIO properties to the MAC device node, with one
exception: Cadence MACB driver which could handle the "reset-gpios" prop
in a PHY device subnode. I believe that the correct approach is to teach
the 'phylib' to get the MDIO device reset GPIO from the device tree node
corresponding to this device -- which this patch is doing...
Note that I had to modify the AT803x PHY driver as it would stop working
otherwise -- it made use of the reset GPIO for its own purposes...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-by: Rob Herring <robh@kernel.org>
[geert: Propagate actual errors from fwnode_get_named_gpiod()]
[geert: Avoid destroying initial setup]
[geert: Consolidate GPIO descriptor acquiring code]
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Richard Leitner <richard.leitner@skidata.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move dissection of tunnel info to outside of the main flow dissection
function, __skb_flow_dissect(). The sole user of this feature, the flower
classifier, is updated to call tunnel info dissection directly, using
skb_flow_dissect_tunnel_info().
This results in a slightly less complex implementation of
__skb_flow_dissect(), in particular removing logic from that call path
which is not used by the majority of users. The expense of this is borne by
the flower classifier which now has to make an extra call for tunnel info
dissection.
This patch should not result in any behavioural change.
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces an eBPF based queue selection method. With this,
the policy could be offloaded to userspace completely through a new
ioctl TUNSETSTEERINGEBPF.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Salil Mehta says:
====================
net: hns3: Refactors "reset" handling code in HCLGE layer of HNS3 driver
This patch refactors the code of the reset feature in HCLGE layer
of HNS3 PF driver. Prime motivation to do this change is:
1. To reduce the time for which common miscellaneous Vector 0
interrupt is disabled because of the reset. Simplification
of the common miscellaneous interrupt handler routine(for
Vector 0) used to handle reset and other sources of Vector
0 interrupt.
2. Separate the task for handling the reset
3. Simplification of reset request submission and pending reset
logic.
To achieve above below few things have been done:
1. Interrupt is disabled while common miscellaneous interrupt
handler is entered and re-enabled before it is exit. This
reduces the interrupt handling latency as compared to older
interrupt handling scheme where interrupt was being disabled
in interrupt handler context and re-enabled in task context
some time later. Made Miscellaneous interrupt handler more
generic to handle all sources including reset interrupt source.
2. New reset service task has been introduced to service the
reset handling.
3. Introduces new reset service task for honoring software reset
requests like from network stack related to timeout and serving
the pending reset request(to reset the driver and associated
clients).
Change Log:
Patch V2: Addressed comment by Andrew Lunn
Link: https://lkml.org/lkml/2017/12/1/366
Patch V1: Initial Submit
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In exisiting code, the way to detect if driver/client reset should
be executed or if hardware should be be soft resetted was overly
complex.
Existing code use to read the interrupt status register from task
context to figure out if the interrupt source event was reset and
then use clear the interrupt source for reset while waiting for the
hardware to finish the reset. This behaviour again was confusing
and overly complex in terms of the flow.
This patch simplifies the handling of the requested reset and the
pending reset(i.e. reset which have already been asserted by the
software and hardware has acknowledged back to driver that it is
processing the hardware reset through interrupt)
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Existing common service task was being used to service the reset
requests. This patch tries to make the handling of reset cleaner
by separating task to handle the reset requests. This might in
turn help in adapting similar handling approach for other
interrupt events like mailbox, sharing vector 0 interrupt.
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The reset interrupt event shares common miscellaneous interrupt
Vector 0. In the existing reset interrupt handling we disable
the Vector 0 interrupt in misc interrupt handler and re-enable
them later in context to common service task.
This also means other event sources like mailbox would also be
deferred or if the interrupt event was due to mailbox(which shall
be supported for VF soon), it could delay the reset handling.
This patch reorganizes the reset interrupt handling logic and
makes it more fair to other events.
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Function gem_add_flow_filter called on line 2958 inside lock on line 2949
but uses GFP_KERNEL
Generated by: scripts/coccinelle/locks/call_kern.cocci
Fixes: ae8223de3d ("net: macb: Added support for RX filtering")
CC: Rafal Ozieblo <rafalo@cadence.com>
Signed-off-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King says:
====================
SFP/phylink updates
This series, which follows on from the fixes posted earlier, improves
the phylink/sfp support. Changes included here are:
- Merge 802.3z and SGMII modes into one "in-band" mode, using the
PHY_INTERFACE_MODE_xxx definition to determine which should be used.
This allows more flexibility as more interface modes become
available.
- Allow 2500base-X and 10GBASE-KR to be requested from SFP.
- Remove unused and unnecessary phylink_init_eee()
- Restart 802.3z autonegotiation when starting the network device to
ensure that the negotiated parameters are always correct. It has
been observed on mvneta that this is not always the case without
this change.
- Add kerneldoc documentation for phylink and sfp upstream facing APIs
and link it in to the networking documentation.
- Resolve a sparse warning in sfp-bus.c
- Convert phylink/sfp to use fwnode rather than DT so that other firmware
systems can take advantage of this - I have received a request for it
to be usable with ACPI. The exception to this is our interactions with
phylib, as phylib itself does not yet support fwnode.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert phylink to fwnode, switching phylink_create() from taking a
device_node to taking a fwnode_handle. This will allow other firmware
systems to take advantage of sfp/phylink support.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert sfp-bus to use fwnode rather than device_node internally, so
we can support more than just device tree firmware.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/phy/sfp-bus.c:298:13: warning: context imbalance in 'sfp_bus_release' - wrong count at exit
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add kernel-doc documentation for sfp kernel APIs, and link it into the
networking kapi documentation under "Network device support".
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add kernel-doc documentation for phylink kernel APIs, and link it into
the networking kapi documentation under "Network device support".
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Restart 802.3z negotiation when the net device is brought up to ensure
that the link partner has our current link modes.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
phylink_init_eee() serves no purpose, remove it.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for handling the faster 2.5G and 10G link modes when used
with SFP modules.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the handling of SGMII and 802.3z is now the same, combine the
MLO_AN_xxx constants into a single MLO_AN_INBAND, and use the PHY
interface mode to distinguish between Cisco SGMII and 802.3z.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code handling SGMII and 802.3z is essentially the same, except that
we assume 802.3z has no PHY. Re-organise the code such that these cases
are merged, and exclude 802.3z mode from having a PHY attached. This
results in the same link handling behaviour as before.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add and use phy_interface_mode_is_8023z() helper to identify the
interface modes that use 802.3z negotiation. Use it in phylink's
phylink_mac_an_restart().
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Small overlapping change conflict ('net' changed a line,
'net-next' added a line right afterwards) in flexcan.c
Signed-off-by: David S. Miller <davem@davemloft.net>
A couple of bugfixes that just became ready.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJaIW15AAoJECgfDbjSjVRprxIH/RzUfSAfgaxhIVm2j3CRarGS
G0xJvKxuXpiDnEtfu5st9QRqNo8CMuPV02kIaOb0Aqq2GfdEb0sCFMTLbfnYuXul
uECu3uquXMUbuRXffQCHnDN4qqDWKRim8STIDPUlxjnBQqkos+JFn+MQ3vF2KEI0
SeA8u1YQs5Ji10ZPjZ7FEAirbxDFiLPb0Hb0tphBGSZPXBRV3qoA4qeKNOy45EkQ
F4J/1YxX/nWca0HYjBl2NceIqc4U/HcNZpner5YcJLad6SQSU9mSfEnJJIhOrRmu
/Ivn6Q1822/YVdePDcffLIrvFvM4HprT4BUYv0Cd/cTZZRFMCa16OXmDcPrWcLo=
=Nvzh
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fixes from Michael Tsirkin:
"virtio and qemu bugfixes
A couple of bugfixes that just became ready"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio_balloon: fix increment of vb->num_pfns in fill_balloon()
virtio: release virtio index when fail to device_register
fw_cfg: fix driver remove
Pull networking fixes from David Miller:
1) Various TCP control block fixes, including one that crashes with
SELinux, from David Ahern and Eric Dumazet.
2) Fix ACK generation in rxrpc, from David Howells.
3) ipvlan doesn't set the mark properly in the ipv4 route lookup key,
from Gao Feng.
4) SIT configuration doesn't take on the frag_off ipv4 field
configuration properly, fix from Hangbin Liu.
5) TSO can fail after device down/up on stmmac, fix from Lars Persson.
6) Various bpftool fixes (mostly in JSON handling) from Quentin Monnet.
7) Various SKB leak fixes in vhost/tun/tap (mostly observed as
performance problems). From Wei Xu.
8) mvpps's TX descriptors were not zero initialized, from Yan Markman.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (57 commits)
tcp: use IPCB instead of TCP_SKB_CB in inet_exact_dif_match()
tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()
rxrpc: Fix the MAINTAINERS record
rxrpc: Use correct netns source in rxrpc_release_sock()
liquidio: fix incorrect indentation of assignment statement
stmmac: reset last TSO segment size after device open
ipvlan: Add the skb->mark as flow4's member to lookup route
s390/qeth: build max size GSO skbs on L2 devices
s390/qeth: fix GSO throughput regression
s390/qeth: fix thinko in IPv4 multicast address tracking
tap: free skb if flags error
tun: free skb in early errors
vhost: fix skb leak in handle_rx()
bnxt_en: Fix a variable scoping in bnxt_hwrm_do_send_msg()
bnxt_en: fix dst/src fid for vxlan encap/decap actions
bnxt_en: wildcard smac while creating tunnel decap filter
bnxt_en: Need to unconditionally shut down RoCE in bnxt_shutdown
phylink: ensure we take the link down when phylink_stop() is called
sfp: warn about modules requiring address change sequence
sfp: improve RX_LOS handling
...
The chip family of TILEPro and TILE-Gx was developed by Tilera, which
was eventually acquired by Mellanox. The tile architecture was added to
the kernel in 2010 and first appeared in 2.6.36.
Now at Mellanox we are developing new chips based on the ARM64
architecture; our last TILE-Gx chip (the Gx72) was released in 2013, and
our customers using tile architecture products are not, as far as we
know, looking to upgrade to newer kernel releases. In the absence of
someone in the community stepping up to take over maintainership, this
commit marks the architecture as orphaned.
Cc: Chris Metcalf <metcalf@alum.mit.edu>
Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
convert remaining users of rtnl_register to rtnl_register_module
and un-export rtnl_register.
Requested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEE4bay/IylYqM/npjQHv7KIOw4HPYFAlohXEATHG1rbEBwZW5n
dXRyb25peC5kZQAKCRAe/sog7Dgc9gWMCACMHrISR36Ht6NGRIQC0asPY3Z3aSSL
W2D0z5QysnLfd4RPen0rwvkujezMIBbDjjcbo0hI+VNe1q1SNQSGJmgL4Mm+drLw
6uqfvv8KIrhH9AEcz9AP8nKWeDFgYrjjRoYMfr0uc0Zeq3boTRWaO6bGw17rMZ1X
WWaOYA2sFUsNaiZPkiNEADg1MQGH85kfCEaS2SnQLVQJNmm0pidsC3BbcepTQ5Ys
Wx3P1jkHwkol33lHsgumSzDYDT00kNQA038k9VAVKo5lx8WH3rUWg0lb133U6IY6
rN0jmTE67MAsKspcveoIRdxAo/PDOFbZBoiHJA03j3JkdA1NIpAgesi7
=2Ddp
-----END PGP SIGNATURE-----
Merge tag 'linux-can-next-for-4.16-20171201' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2017-12-01
this is a pull request of 10 patches for net-next/master.
The first two patches are by Arnd Bergmann, they convert the peak_usb
from using "struct timeval" to "ktime_t". The error handling in the
vxcan driver is clean up by Markus Elfring's patch. Bhumika Goyal
contributes a patch for the c_can_pci driver to make the pci data const.
The six patches by Pankaj Bansal for the flexcan driver add LS1021A
support by making the endianness of the driver configurable by the
device tree.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2017-12-03
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Addition of a software model for BPF offloads in order to ease
testing code changes in that area and make semantics more clear.
This is implemented in a new driver called netdevsim, which can
later also be extended for other offloads. SR-IOV support is added
as well to netdevsim. BPF kernel selftests for offloading are
added so we can track basic functionality as well as exercising
all corner cases around BPF offloading, from Jakub.
2) Today drivers have to drop the reference on BPF progs they hold
due to XDP on device teardown themselves. Change this in order
to make XDP handling inside the drivers less error prone, and
move disabling XDP to the core instead, also from Jakub.
3) Misc set of BPF verifier improvements and cleanups as preparatory
work for upcoming BPF-to-BPF calls. Among others, this set also
improves liveness marking such that pruning can be slightly more
effective. Register and stack liveness information is now included
in the verifier log as well, from Alexei.
4) nfp JIT improvements in order to identify load/store sequences in
the BPF prog e.g. coming from memcpy lowering and optimizing them
through the NPU's command push pull (CPP) instruction, from Jiong.
5) Cleanups to test_cgrp2_attach2.c BPF sample code in oder to remove
bpf_prog_attach() magic values and replacing them with actual proper
attach flag instead, from David.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal says:
====================
rtnetlink: rework handler (un)registering
Peter Zijlstra reported (referring to commit 019a316992,
"rtnetlink: add reference counting to prevent module unload while dump is in progress"):
1) it not in fact a refcount, so using refcount_t is silly
2) there is a distinct lack of memory barriers, so we can easily
observe the decrement while the msg_handler is still in progress.
3) waiting with a schedule()/yield() loop is complete crap and subject
life-locks, imagine doing that rtnl_unregister_all() from a RT task.
In ancient times rtnetlink exposed a statically-sized table with
preset doit/dumpit handlers to be called for a protocol/type pair.
Later the rtnl_register interface was added and the table was allocated
on demand. Eventually these were also used by modules.
Problem is that nothing prevents module unload while a netlink dump
is in progress. netlink dumps can be span multiple recv calls and
netlink core saves the to-be-repeated dumper address for later invocation.
To prevent rmmod the netlink core expects callers to pass in the owning
module so a reference can be taken.
So far rtnetlink wasn't doing this, add new interface to pass THIS_MODULE.
Moreover, when converting parts of the rtnetlink handling to rcu this code
gained way too many READ_ONCE spots, remove them and the extra refcounting.
Take a module reference when running dumpit and doit callbacks
and never alter content of rtnl_link structures after they have been
published via rcu_assign_pointer.
Based partially on earlier patch from Peter.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This removes __rtnl_register and switches callers to either
rtnl_register or rtnl_register_module.
Also, rtnl_register() will now print an error if memory allocation
failed rather than panic the kernel.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
all of these can be compiled as a module, so use new
_module version to make sure module can no longer be removed
while callback/dump is in use.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add yet another rtnl_register function. It will be used by modules
that can be removed.
The passed module struct is used to prevent module unload while
a netlink dump is in progress or when a DOIT_UNLOCKED doit callback
is called.
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
rtnetlink is littered with READ_ONCE() because we can have read accesses
while another cpu can write to the structure we're reading by
(un)registering doit or dumpit handlers.
This patch changes this so that (un)registering cpu allocates a new
structure and then publishes it via rcu_assign_pointer, i.e. once
another cpu can see such pointer no modifications will occur anymore.
based on initial patch from Peter Zijlstra.
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previous patch mistakenly removed three chip-specific config settings.
Add them again.
Fixes: 80274abafc "net: phy: remove generic settings for callbacks config_aneg and read_status from drivers"
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
William Tu says:
====================
add ip6 gre and gretap collect_md mode
Similar to gre, vxlan, geneve, ipip tunnels, allow ip6gretap tunnels to
operate in collect metadata mode. The first patch adds the support to
ip6_gre.c. The second patch enables unsetting the csum for ipv6 tunnel,
when using bpf_skb_[gs]et_tunnel_key() helpers. Finally, the last patch
adds the ip6 gre and gretap tunnel test cases to BPF sample code.
The corresponding iproute2 patch:
https://marc.info/?l=linux-netdev&m=151216943128087&w=2
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend existing tests for vxlan, gre, geneve, ipip, erspan,
to include ip6 gre and gretap tunnel.
Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before the patch, BPF_F_ZERO_CSUM_TX can be used only for ipv4 tunnel.
With introduction of ip6gretap collect_md mode, the flag should be also
supported for ipv6.
Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to gre, vxlan, geneve, ipip tunnels, allow ip6 gre and gretap
tunnels to operate in collect metadata mode. bpf_skb_[gs]et_tunnel_key()
helpers can make use of it right away. OVS can use it as well in the
future.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If state is not PHY_HALTED I see no need to temporarily disable
interrupts on the device. As long as the current interrupt isn't acked
on the device no new interrupt can happen anyway.
In addition remove a unneeded enabling of interrupts in the state
machine when handling state PHY_CHANGELINK.
Tested on a Odroid-C2 with RTL8211F phy in interrupt mode.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After commits c974bdbc3e "net: phy: Use threaded IRQ, to allow IRQ from
sleeping devices" and 664fcf123a "net: phy: Threaded interrupts allow
some simplification" all relevant code pieces run in process context
anyway and I don't think we need the disabling of interrupts any longer.
Interestingly enough, latter commit already removed the comment
explaining why interrupts need to be temporarily disabled.
On my system phy interrupt mode works fine with this patch.
However I may miss something, especially in the context of shared phy
interrupts, therefore I'd appreciate if more people could test this.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2017-12-02
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix a compilation warning in xdp redirect tracepoint due to
missing bpf.h include that pulls in struct bpf_map, from Xie.
2) Limit the maximum number of attachable BPF progs for a given
perf event as long as uabi is not frozen yet. The hard upper
limit is now 64 and therefore the same as with BPF multi-prog
for cgroups. Also add related error checking for the sample
BPF loader when enabling and attaching to the perf event, from
Yonghong.
3) Specifically set the RLIMIT_MEMLOCK for the test_verifier_log
case, so that the test case can always pass and not fail in
some environments due to too low default limit, also from
Yonghong.
4) Fix up a missing license header comment for kernel/bpf/offload.c,
from Jakub.
5) Several fixes for bpftool, among others a crash on incorrect
arguments when json output is used, error message handling
fixes on unknown options and proper destruction of json writer
for some exit cases, all from Quentin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet says:
====================
tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()
James Morris reported kernel stack corruption bug that
we tracked back to commit 971f10eca1 ("tcp: better TCP_SKB_CB
layout to reduce cache line misses")
First patch needs to be backported to kernels >= 3.18,
while second patch needs to be backported to kernels >= 4.9, since
this was the time when inet_exact_dif_match appeared.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
After this fix : ("tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()"),
socket lookups happen while skb->cb[] has not been mangled yet by TCP.
Fixes: a04a480d43 ("net: Require exact match for TCP socket lookups if dif is l3mdev")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>