Ido Schimmel says:
====================
mlxsw: Reduce dependency between bridge and router code
This patch set reduces the dependency between the bridge and the router
code in preparation for RTNL removal from the route insertion path in
mlxsw.
The motivation and solution are explained in detail in patch #3. The
main idea is that we need to stop special-casing the VXLAN devices with
regards to the reference counting of the FIDs. Otherwise, we can bump
into the situation described in patch #3, where the routing code calls
into the bridge code which calls back into the routing code. After
adding a mutex to protect router data structures to remove RTNL
dependency, this can result in an AA deadlock.
Patches #1 and #2 are preparations. They convert the FIDs to use
'refcount_t' for reference counting in order to catch over/under flows
and add extack to the bridge creation function.
Patches #3-#5 reduce the dependency between the bridge and the router
code. First, by having the VXLAN device take a reference on the FID in
patch #3 and then by removing unnecessary code following the change in
patch #3.
Patches #6-#10 adjust existing selftests and add new ones to exercise
the new code paths.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Test that when two VXLAN tunnels with conflicting configurations (i.e.,
different TTL) are enslaved to the same VLAN-aware bridge, then the
enslavement of a port to the bridge is denied.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After recent changes, the VXLAN tunnel will be offloaded regardless if
any local ports are member in the FID or not. Adjust the test to make
sure the tunnel is offloaded in this case.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver supports a single VLAN-aware bridge. Test that the
enslavement of a port to the second VLAN-aware bridge fails with an
extack.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test that creation of a bridge (both VLAN-aware and VLAN-unaware) fails
with an extack when a VXLAN device with an unsupported configuration is
already enslaved to it.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The addition of a VLAN on a bridge slave prompts the driver to have the
local port in question join the FID corresponding to this VLAN.
Before recent changes, the operation of joining the FID would also mean
that the driver would enable VXLAN tunneling if a VXLAN device was also
member in the VLAN. In case the configuration of the VXLAN tunnel was
not supported, an extack error would be returned.
Since the operation of joining the FID no longer means that VXLAN
tunneling is potentially enabled, the test is no longer relevant. Remove
it.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit f40be47a3e ("mlxsw: spectrum_router: Do not force specific
configuration order") added a call from the routing code to the bridge
code in order to handle the case where VNI should be set on a FID
following the joining of the router port to the FID.
This is no longer required, as previous patches made VXLAN devices
explicitly take a reference on the FID and set VNI on it.
Therefore, remove the unnecessary call and simply have the RIF take a
reference on the FID without checking if VNI should also be set on it.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As explained in previous patch, VXLAN devices now take a reference on
the FID and not only local ports. Therefore, there is no need for local
ports to check if they need to set a VNI on the FID when they join the
FID.
Remove these unnecessary checks.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Up until now only local ports and the router port (which is also a local
port) took a reference on the corresponding FID (Filtering Identifier)
when joining a bridge. For example:
192.0.2.1/24
br0
|
+------+------+
| |
swp1 vxlan0
In this case the reference count of the FID will be '2'. Since the VXLAN
device does not take a reference on the FID, whenever a local port joins
the bridge it needs to check if a VXLAN device is already enslaved. If
the VXLAN device should be mapped to the FID in question, then the VXLAN
device's VNI is set on the FID.
Beside the fact that this scheme special-cases the VXLAN device, it also
creates an unnecessary dependency between the routing and bridge code:
1. [R] IP address is added on 'br0', which prompts the creation of a RIF
and a backing FID
2. [B] VNI is enabled on backing FID
3. [R] Host route corresponding to VXLAN device's source address is
promoted to perform NVE decapsulation
[R] - Routing code
[B] - Bridge code
This back and forth dependency will become problematic when a lock is
added in the routing code instead of relying on RTNL, as it will result
in an AA deadlock.
Instead, have the VXLAN device take a reference on the FID just like all
the other netdev members of the bridge. In order to correctly handle the
case where VXLAN devices are already enslaved to the bridge when it is
offloaded, walk the bridge's slaves and replay the configuration.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Propagate extack to bridge creation function so that error messages
could be passed to user space via netlink instead of printing them to
kernel log.
A subsequent patch will pass the new extack argument to more functions.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'refcount_t' is very useful for catching over/under flows. Convert the
FID (Filtering Identifier) objects to use it instead of 'unsigned int'
for their reference count.
A subsequent patch in the series will change the way VXLAN devices hold
/ release the FID reference, which is why the conversion is made now.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This enables ndo_dflt_bridge_getlink() to report a bridge port's
offload settings for multicast and broadcast flooding.
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree says:
====================
couple more ARFS tidy-ups
Tie up some loose ends from the recent ARFS work.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
efx_filter_rfs_expire() is a work-function, so it being inline makes no
sense. It's only ever used in efx_channels.c, so move it there.
While we're at it, clean out some related unused cruft.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prevent excessive CPU time spent running a workitem with nothing to do.
We avoid any races by keeping the same check in efx_filter_rfs_expire().
Suggested-by: Martin Habets <mhabets@solarflare.com>
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a real dev unregisters, vlan_device_event() also unregisters all
of its vlan interfaces. For each VID this ends up in __vlan_vid_del(),
which attempts to remove the VID from the real dev's VLAN filter.
But the unregistering real dev might no longer be able to issue the
required IOs, and return an error. Subsequently we raise a noisy warning
msg that is not appropriate for this situation: the real dev is being
torn down anyway, there shouldn't be any worry about cleanly releasing
all of its HW-internal resources.
So to avoid scaring innocent users, suppress this warning when the
failed deletion happens on an unregistering device.
While at it also convert the raw pr_warn() to a more fitting
netdev_warn().
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since PCI core provides a generic PCI_DEVICE_DATA() macro,
replace STMMAC_DEVICE() with former one.
No functional change intended.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov says:
====================
Remove rtnl lock dependency from flow_action infra
Currently, TC flow_action infrastructure code obtain rtnl lock before
accessing action state in tc_setup_flow_action() function and releases
it afterwards. This behavior is not supposed to impact TC filter
insertion rate because filling flow_action representation is only a
small part of creating new filter and expensive operations (hardware
offload callbacks, classifiers, cls API code that creates chains and
classifiers instances) already support unlocked execution. However,
typical vswitch implementation might need to also dump TC filters
concurrently, for example to age out unused flows or update flow
counters. TC dump is fully serialized and holds rtnl lock during its
whole execution in kernel space. As such, it can significantly impact
concurrent tasks that try to intermittently obtain rtnl lock when
filling intermediate representation for new filter offload (performance
evaluation at the end of this mail).
Refactor flow_action cls API infrastructure and its dependencies to not
rely on rtnl lock for synchronization. Patch set overview:
- Refactor tc_setup_flow_action() to obtain action tcf_lock when
accessing action state. Fix its dependencies to not obtain tcf_lock
themselves and assume that caller already holds it (needs to be done
in same patch to prevent deadlock) and not to call sleeping functions
(needs to be done in same patch to prevent "sleeping while atomic"
dmesg warnings).
- Refactor action helper functions to require tcf_lock instead of rtnl.
Internally, all of the actions already use tcf_lock for
synchronization to accommodate unlocked classifier API, so this change
relies on already existing functionality.
- Remove rtnl lock and "rtnl_held" argument from tc_setup_flow_action()
function.
To test the change, multiple concurrent TC instances are invoked with
following command:
time ls add* | xargs -n 1 -P 100 sudo tc -b
Ten batch files with following typical rules (100k each) are used:
filter add dev ens1f0_0 protocol ip ingress prio 1 handle 1 flower
src_mac e4:11:0:0:0:0 dst_mac e4:12:0:0:0:0 src_ip 192.168.111.1
dst_ip 192.168.111.2 ip_proto udp dst_port 1 src_port 1 action
tunnel_key set id 1 src_ip 2.2.2.2 dst_ip 2.2.2.3 dst_port 4789
no_percpu action mirred egress redirect dev vxlan1 no_percpu
TC dump of same device is called in infinite loop from five concurrent
instances:
while true do tc -s filter show dev $NIC ingress >/dev/null done
Results obtained on current net-next commit 9f68e3655a ("Merge tag
'drm-next-2020-01-30' of git://anongit.freedesktop.org/drm/drm"):
| net-next | this change
---------------+----------+-------------
TC add | 6.3s | 6.3s
TC add + dump | 29.3s | 6.8s
Test results confirm significant impact of concurrent TC dump. The
impact is almost fully mitigated by proposed change (differences can be
attributed to contention for chain and tp locks between add and dump TC
instances).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Refactor tc_setup_flow_action() function not to use rtnl lock and remove
'rtnl_held' argument that is no longer needed.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to remove rtnl lock dependency from flow_action representation
translator, change rtnl_dereference() to rcu_dereference_protected() in ct
action helpers that provide external access to zone and action values. This
is safe to do because the functions are not called from anywhere else
outside flow_action infrastructure which was modified to obtain tcf_lock
when accessing action data in one of previous patches in the series.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to remove rtnl lock dependency from flow_action representation
translator, change rcu_dereference_bh_rtnl() to rcu_dereference_protected()
in police action helpers that provide external access to rate and burst
values. This is safe to do because the functions are not called from
anywhere else outside flow_action infrastructure which was modified to
obtain tcf_lock when accessing action data in one of previous patches in
the series.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to remove dependency on rtnl lock, take action's tcfa_lock when
constructing its representation as flow_action_entry structure.
Refactor tcf_sample_get_group() to assume that caller holds tcf_lock and
don't take it manually. This callback is only called from flow_action infra
representation translator which now calls it with tcf_lock held, so this
refactoring is necessary to prevent deadlock.
Allocate memory with GFP_ATOMIC flag for ip_tunnel_info copy because
tcf_tunnel_info_copy() is only called from flow_action representation infra
code with tcf_lock spinlock taken.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each extracted frame on Ocelot has an IFH. The frame and IFH are extracted
by reading chuncks of 4 bytes from a register.
In case the IFH and frames were read corretly it would try to read the next
frame. In case there are no more frames in the queue, it checks if there
were any previous errors and in that case clear the queue. But this check
will always succeed also when there are no errors. Because when extracting
the IFH the error is checked against 4(number of bytes read) and then the
error is set only if the extraction of the frame failed. So in a happy case
where there are no errors the err variable is still 4. So it could be
a case where after the check that there are no more frames in the queue, a
frame will arrive in the queue but because the error is not reseted, it
would try to flush the queue. So the frame will be lost.
The fix consist in resetting the error after reading the IFH.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the correct mask for this two-bit field. This fixes setting the DAI
data format to RIGHT_J or DSP_A.
Fixes: 36c684936f ("ASoC: Add sun8i digital audio codec")
Signed-off-by: Samuel Holland <samuel@sholland.org>
Acked-by: Chen-Yu Tsai <wens@csie.org>
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20200217064250.15516-7-samuel@sholland.org
Signed-off-by: Mark Brown <broonie@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl5K0xIACgkQxWXV+ddt
WDvihhAAjTdEr8Gbt4Beu+F9I2j+1+x/qdVADsrJErBJ5s1J0prXI27Vx5auHMMD
qJViTGTTjwlIo4jFI+AtlrzcE07eIMN1zzjVb2o8efrpkFDTfgT+dAkSVR8zAX20
nycsQ93QebCrlRGpPuXsniczy+QTRXWBY+vPXe1UbfGrzb3q+1pBJgfdq/7pkt/V
0RgB78JDVWUhw+YBjLuV0Y7tjmWlANGvofT34A34CR23i/JoIjVx7GfeQyILBmHE
iuCwzMgAdIKlRpjG5WJeK60bW0Dkjmbr3HDyJexiXSa4u2UwgUlT8UsITbVCr48t
OrfYxb7aWmwEqa5fbZzIiyTCWDKSxiQ5X0XW8e6ehdkh+s9DWmNp48xt9OogpYcv
8qni+vMgjrOLZwj//DgHpr2CYrISxJ+vhCvaOLMK++8TMIH/EQZEyaYkKn5Osg/8
iSwI9b2IF8LmnnvWhkbs1p4Jn2Ca4fMhlN2D5xflwSK/lbhgJk5E6P+4h6pQ9MvG
U5bWR2mqW1xT+lCO7AD+MBq+VbF/BZik0TjuNssoYJJtx/VbTbpydrHW65SvZvQK
QT7MTHekkbtIR+SGYl66n1Sc4RG7znWNwjfoyVEbSblLIczfzDDsl1O/9ldqbia4
BoPWghEO8Wy543UTH9DyX+oMzB2Pwb/TMa4eQyNhbQMqcwXHv5Q=
=nlEW
-----END PGP SIGNATURE-----
Merge tag 'for-5.6-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"This is the fix for sleeping in a locked section bug reported by Dave
Jones, caused by a patch dependence in development and pulled
branches.
I picked the existing patch over the fixup that Filipe sent, as it's a
bit more generic fix. I've verified it with a specific test case, some
rsync stress and one round of fstests"
* tag 'for-5.6-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: don't set path->leave_spinning for truncate
Virtual engines are fleeting. They carry a reference count and may be freed
when their last request is retired. This makes them unsuitable for the
task of housing engine->retire.work so assert that it is not used.
Tvrtko tracked down an instance where we did indeed violate this rule.
In virtual_submit_request, we flush a completed request directly with
__i915_request_submit and this causes us to queue that request on the
veng's breadcrumb list and signal it. Leading us down a path where we
should not attach the retire.
Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: dc93c9b693 ("drm/i915/gt: Schedule request retirement when signaler idles")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200206204915.2636606-1-chris@chris-wilson.co.uk
(cherry picked from commit f91d8156ab)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
We lack full state readout of DSC config, which may lead to DSC enable
using a config that's all zeros, failing spectacularly. Force full
modeset and thus compute config at probe to get a sane state, until we
implement DSC state readout. Any fastset that did appear to work with
DSC at probe, worked by coincidence. [1] is an example of a change that
triggered the issue on TGL DSI DSC.
[1] http://patchwork.freedesktop.org/patch/msgid/20200212150102.7600-1-ville.syrjala@linux.intel.com
Cc: Manasi Navare <manasi.d.navare@intel.com>
Cc: Vandita Kulkarni <vandita.kulkarni@intel.com>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: stable@vger.kernel.org
Fixes: fbacb15ea8 ("drm/i915/dsc: add basic hardware state readout support")
Acked-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200213140412.32697-3-stanislav.lisovskiy@intel.com
(cherry picked from commit a4277aa398)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Voltage level depends not only on the cdclk, but also on the DDI clock.
Last time the bspec voltage level table for EHL was updated, we only
updated the cdclk requirements, but forgot to account for the new port
clock criteria.
Bspec: 21809
Fixes: d147483884 ("drm/i915/ehl: Update voltage level checks")
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200207001417.1229251-1-matthew.d.roper@intel.com
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
(cherry picked from commit 9d5fd37ed7)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
chip->allocated_banks, an array of tpm_bank_info structures, contains the
list of TPM algorithm IDs of allocated PCR banks. It also contains the
corresponding ID of the crypto subsystem, so that users of the TPM driver
can calculate a digest for a PCR extend operation.
However, if there is no mapping between TPM algorithm ID and crypto ID, the
crypto_id field of tpm_bank_info remains set to zero (the array is
allocated and initialized with kcalloc() in tpm2_get_pcr_allocation()).
Zero should not be used as value for unknown mappings, as it is a valid
crypto ID (HASH_ALGO_MD4).
Thus, initialize crypto_id to HASH_ALGO__LAST.
Cc: stable@vger.kernel.org # 5.1.x
Fixes: 879b589210 ("tpm: retrieve digest size of unknown algorithms with PCR read")
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Revert tpm_tis_spi_mod.ko back to tpm_tis_spi.ko as the rename could
break user space scripts. This can be achieved by renaming tpm_tis_spi.c
as tpm_tis_spi_main.c. Then tpm_tis_spi-y can be used inside the
makefile.
Cc: Andrey Pronin <apronin@chromium.org>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: stable@vger.kernel.org # 5.5.x
Fixes: 797c0113c9 ("tpm: tpm_tis_spi: Support cr50 devices")
Reported-by: Alexander Steffen <Alexander.Steffen@infineon.com>
Tested-by: Alexander Steffen <Alexander.Steffen@infineon.com>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Inside the intel_timeline_get_seqno(), we currently track the retirement
of the old cachelines by listening to the new request. This requires
that the new request is ready to be used and so requires a minimum bit
of initialisation prior to getting the new seqno.
Fixes: b1e3177bd1 ("drm/i915: Coordinate i915_active with its own mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200203094152.4150550-2-chris@chris-wilson.co.uk
(cherry picked from commit 855e39e65c)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
To enable non-persistent contexts, we require a means of cancelling any
inflight work from that context. This is first done "gracefully" by
using preemption to kick the active context off the engine, and then
forcefully by resetting the engine if it is active. If we are unable to
reset the engine to remove hostile userspace, we should not allow
userspace to opt into using non-persistent contexts.
If the per-engine reset fails, we still do a full GPU reset, but that is
rare and usually indicative of much deeper issues. The damage is already
done. However, the goal of the interface to allow long running compute
jobs without causing collateral damage elsewhere, and if we are unable
to support that we should make that known by not providing the
interface (and falsely pretending we can).
Fixes: a0e047156c ("drm/i915/gem: Make context persistence optional")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200130164553.1937718-1-chris@chris-wilson.co.uk
(cherry picked from commit d1b9b5f127)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
This if guards whether user-space wants a copy of the offload-jited
bytecode and whether this bytecode exists. By erroneously doing a bitwise
AND instead of a logical AND on user- and kernel-space buffer-size can lead
to no data being copied to user-space especially when user-space size is a
power of two and bigger then the kernel-space buffer.
Fixes: fcfb126def ("bpf: add new jited info fields in bpf_dev_offload and bpf_prog_info")
Signed-off-by: Johannes Krude <johannes@krude.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/20200212193227.GA3769@phlox.h.transitiv.net
The only time we actually leave the path spinning is if we're truncating
a small amount and don't actually free an extent, which is not a common
occurrence. We have to set the path blocking in order to add the
delayed ref anyway, so the first extent we find we set the path to
blocking and stay blocking for the duration of the operation. With the
upcoming file extent map stuff there will be another case that we have
to have the path blocking, so just swap to blocking always.
Note: this patch also fixes a warning after 28553fa992 ("Btrfs: fix
race between shrinking truncate and fiemap") got merged that inserts
extent locks around truncation so the path must not leave spinning locks
after btrfs_search_slot.
[70.794783] BUG: sleeping function called from invalid context at mm/slab.h:565
[70.794834] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1141, name: rsync
[70.794863] 5 locks held by rsync/1141:
[70.794876] #0: ffff888417b9c408 (sb_writers#17){.+.+}, at: mnt_want_write+0x20/0x50
[70.795030] #1: ffff888428de28e8 (&type->i_mutex_dir_key#13/1){+.+.}, at: lock_rename+0xf1/0x100
[70.795051] #2: ffff888417b9c608 (sb_internal#2){.+.+}, at: start_transaction+0x394/0x560
[70.795124] #3: ffff888403081768 (btrfs-fs-01){++++}, at: btrfs_try_tree_write_lock+0x2f/0x160
[70.795203] #4: ffff888403086568 (btrfs-fs-00){++++}, at: btrfs_try_tree_write_lock+0x2f/0x160
[70.795222] CPU: 5 PID: 1141 Comm: rsync Not tainted 5.6.0-rc2-backup+ #2
[70.795362] Call Trace:
[70.795374] dump_stack+0x71/0xa0
[70.795445] ___might_sleep.part.96.cold.106+0xa6/0xb6
[70.795459] kmem_cache_alloc+0x1d3/0x290
[70.795471] alloc_extent_state+0x22/0x1c0
[70.795544] __clear_extent_bit+0x3ba/0x580
[70.795557] ? _raw_spin_unlock_irq+0x24/0x30
[70.795569] btrfs_truncate_inode_items+0x339/0xe50
[70.795647] btrfs_evict_inode+0x269/0x540
[70.795659] ? dput.part.38+0x29/0x460
[70.795671] evict+0xcd/0x190
[70.795682] __dentry_kill+0xd6/0x180
[70.795754] dput.part.38+0x2ad/0x460
[70.795765] do_renameat2+0x3cb/0x540
[70.795777] __x64_sys_rename+0x1c/0x20
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Fixes: 28553fa992 ("Btrfs: fix race between shrinking truncate and fiemap")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add note ]
Signed-off-by: David Sterba <dsterba@suse.com>
Removed this logic because It is BIOS which needs to
power off the ACP power domian through ACP_PGFSM_CTRL
register when you De-initialize ACP Engine.
Signed-off-by: Ravulapati Vishnu vardhan rao <Vishnuvardhanrao.Ravulapati@amd.com>
Link: https://lore.kernel.org/r/1581935964-15059-1-git-send-email-Vishnuvardhanrao.Ravulapati@amd.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Since commit 057b52b4b3 ("watchdog: da9062: make restart handler atomic
safe"), the driver calls i2c functions directly. It now therefore depends
on I2C. This is a hard dependency which overrides COMPILE_TEST.
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Fixes: 057b52b4b3 ("watchdog: da9062: make restart handler atomic safe")
Cc: Marco Felsch <m.felsch@pengutronix.de>
Cc: Adam Thomson <Adam.Thomson.Opensource@diasemi.com>
Cc: Stefan Lengfeld <contact@stefanchrist.eu>
Reviewed-by: Marco Felsch <m.felsch@pengutronix.de>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
This fixes commit f6c98b0838 ("watchdog: da9062: add power management
ops"). During discussion [1] we agreed that this should be configurable
because it is a device quirk if we can't use the hw watchdog auto
suspend function.
[1] https://lore.kernel.org/linux-watchdog/20191128171931.22563-1-m.felsch@pengutronix.de/
Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
Fixes: f6c98b0838 ("watchdog: da9062: add power management ops")
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com>
Link: https://lore.kernel.org/r/20200207071518.5559-1-m.felsch@pengutronix.de
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
The da9062 hw has a minimum ping cool down phase of at least 200ms. The
driver takes that into account by setting the min_hw_heartbeat_ms to
300ms and the core guarantees that the hw limit is observed for the
ping() calls. But the core can't guarantee the required minimum ping
cool down phase if a stop() command is send immediately after the ping()
command. So it is not allowed to ping the watchdog within the stop()
command as the driver does. Remove the ping can be done without doubts
because the watchdog gets disabled anyway and a (re)start resets the
watchdog counter too.
Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20200120091729.16256-1-m.felsch@pengutronix.de
[groeck: Updated description]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
- Two fixes to the reset process of the ASIC. Without these fixes, the
reset process might take a long time and produce a kernel panic.
Alternatively, the ASIC could get stuck.
- Fix to reference counting of a command buffer object. It was kref_put
one more time than it should have been.
-----BEGIN PGP SIGNATURE-----
iQFKBAABCgA0FiEE7TEboABC71LctBLFZR1NuKta54AFAl5CcQoWHG9kZWQuZ2Fi
YmF5QGdtYWlsLmNvbQAKCRBlHU24q1rngERpB/4ssOGhoKI43eb5ARKX4NPkBiUv
GiWFYyasmN6+kWbC8gKNsWyEw1c88TYvjkgWllMURJujWrxA1bbjwt9jpaXO0nVI
gXgqDaR1rASoQRqK0LYjxc1bdZ+MtWIWBPAie3NkJaggkvXKE/URQ4NiayaTqN5j
H0kib1h0yC9mvAuDObcVyzWAAIP/dZB2MYlGhwD+IKvMQd/Jzj7zH8QCn0I7V1rR
hF/9ds4jqDrQT7sKnzVzYvLvD4mGOunvbrOTrFre3LSYD9HZf+b29R1q7fKuZA5f
TjElN9bOkHlKJ1PLiZv5QacAJos1neOibPOav42hOpiArXdY44sE50DObO8d
=B93X
-----END PGP SIGNATURE-----
Merge tag 'misc-habanalabs-fixes-2020-02-11' of git://people.freedesktop.org/~gabbayo/linux into char-misc-linus
Oded writes:
This tag contains the following fixes:
- Two fixes to the reset process of the ASIC. Without these fixes, the
reset process might take a long time and produce a kernel panic.
Alternatively, the ASIC could get stuck.
- Fix to reference counting of a command buffer object. It was kref_put
one more time than it should have been.
* tag 'misc-habanalabs-fixes-2020-02-11' of git://people.freedesktop.org/~gabbayo/linux:
habanalabs: patched cb equals user cb in device memset
habanalabs: do not halt CoreSight during hard reset
habanalabs: halt the engines before hard-reset
This patch further relaxes the need to drop an skb due to a clash with
an existing conntrack entry.
Current clash resolution handles the case where the clash occurs between
two identical entries (distinct nf_conn objects with same tuples), i.e.:
Original Reply
existing: 10.2.3.4:42 -> 10.8.8.8:53 10.2.3.4:42 <- 10.0.0.6:5353
clashing: 10.2.3.4:42 -> 10.8.8.8:53 10.2.3.4:42 <- 10.0.0.6:5353
... existing handling will discard the unconfirmed clashing entry and
makes skb->_nfct point to the existing one. The skb can then be
processed normally just as if the clash would not have existed in the
first place.
For other clashes, the skb needs to be dropped.
This frequently happens with DNS resolvers that send A and AAAA queries
back-to-back when NAT rules are present that cause packets to get
different DNAT transformations applied, for example:
-m statistics --mode random ... -j DNAT --dnat-to 10.0.0.6:5353
-m statistics --mode random ... -j DNAT --dnat-to 10.0.0.7:5353
In this case the A or AAAA query is dropped which incurs a costly
delay during name resolution.
This patch also allows this collision type:
Original Reply
existing: 10.2.3.4:42 -> 10.8.8.8:53 10.2.3.4:42 <- 10.0.0.6:5353
clashing: 10.2.3.4:42 -> 10.8.8.8:53 10.2.3.4:42 <- 10.0.0.7:5353
In this case, clash is in original direction -- the reply direction
is still unique.
The change makes it so that when the 2nd colliding packet is received,
the clashing conntrack is tagged with new IPS_NAT_CLASH_BIT, gets a fixed
1 second timeout and is inserted in the reply direction only.
The entry is hidden from 'conntrack -L', it will time out quickly
and it can be early dropped because it will never progress to the
ASSURED state.
To avoid special-casing the delete code path to special case
the ORIGINAL hlist_nulls node, a new helper, "hlist_nulls_add_fake", is
added so hlist_nulls_del() will work.
Example:
CPU A: CPU B:
1. 10.2.3.4:42 -> 10.8.8.8:53 (A)
2. 10.2.3.4:42 -> 10.8.8.8:53 (AAAA)
3. Apply DNAT, reply changed to 10.0.0.6
4. 10.2.3.4:42 -> 10.8.8.8:53 (AAAA)
5. Apply DNAT, reply changed to 10.0.0.7
6. confirm/commit to conntrack table, no collisions
7. commit clashing entry
Reply comes in:
10.2.3.4:42 <- 10.0.0.6:5353 (A)
-> Finds a conntrack, DNAT is reversed & packet forwarded to 10.2.3.4:42
10.2.3.4:42 <- 10.0.0.7:5353 (AAAA)
-> Finds a conntrack, DNAT is reversed & packet forwarded to 10.2.3.4:42
The conntrack entry is deleted from table, as it has the NAT_CLASH
bit set.
In case of a retransmit from ORIGINAL dir, all further packets will get
the DNAT transformation to 10.0.0.6.
I tried to come up with other solutions but they all have worse
problems.
Alternatives considered were:
1. Confirm ct entries at allocation time, not in postrouting.
a. will cause uneccesarry work when the skb that creates the
conntrack is dropped by ruleset.
b. in case nat is applied, ct entry would need to be moved in
the table, which requires another spinlock pair to be taken.
c. breaks the 'unconfirmed entry is private to cpu' assumption:
we would need to guard all nfct->ext allocation requests with
ct->lock spinlock.
2. Make the unconfirmed list a hash table instead of a pcpu list.
Shares drawback c) of the first alternative.
3. Document this is expected and force users to rearrange their
ruleset (e.g. by using "-m cluster" instead of "-m statistics").
nft has the 'jhash' expression which can be used instead of 'numgen'.
Major drawback: doesn't fix what I consider a bug, not very realistic
and I believe its reasonable to have the existing rulesets to 'just
work'.
4. Document this is expected and force users to steer problematic
packets to the same CPU -- this would serialize the "allocate new
conntrack entry/nat table evaluation/perform nat/confirm entry", so
no race can occur. Similar drawback to 3.
Another advantage of this patch compared to 1) and 2) is that there are
no changes to the hot path; things are handled in the udp tracker and
the clash resolution path.
Cc: rcu@vger.kernel.org
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add a new device id for the 100 devie. It has 4 interfaces like the 28
and 28L devices but a larger endpoint so more I/O pins.
Cc: Christoph Jung <jung@codemercs.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20200214161148.GA3963518@kroah.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While certain modeset operations on gv100+ need us to temporarily
disable the LUT, we make the mistake of sometimes neglecting to
reprogram the LUT after such modesets. In particular, moving a head from
one encoder to another seems to trigger this quite often. GV100+ is very
picky about having a LUT in most scenarios, so this causes the display
engine to hang with the following error code:
disp: chid 1 stat 00005080 reason 5 [INVALID_STATE] mthd 0200 data
00000001 code 0000002d)
So, fix this by always re-programming the LUT if we're clearing it in a
state where the wndw is still visible, and has a XLUT handle programmed.
Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes: facaed62b4 ("drm/nouveau/kms/gv100: initial support")
Cc: <stable@vger.kernel.org> # v4.18+
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>