Since de77ecd4ef ("bonding: improve link-status update in
mii-monitoring"), the bonding driver has utilized two separate variables
to indicate the next link state a particular slave should transition to.
Each is used to communicate to a different portion of the link state
change commit logic; one to the bond_miimon_commit function itself, and
another to the state transition logic.
Unfortunately, the two variables can become unsynchronized,
resulting in incorrect link state transitions within bonding. This can
cause slaves to become stuck in an incorrect link state until a
subsequent carrier state transition.
The issue occurs when a special case in bond_slave_netdev_event
sets slave->link directly to BOND_LINK_FAIL. On the next pass through
bond_miimon_inspect after the slave goes carrier up, the BOND_LINK_FAIL
case will set the proposed next state (link_new_state) to BOND_LINK_UP,
but the new_link to BOND_LINK_DOWN. The setting of the final link state
from new_link comes after that from link_new_state, and so the slave
will end up incorrectly in _DOWN state.
Resolve this by combining the two variables into one.
Reported-by: Aleksei Zakharov <zakharov.a.g@yandex.ru>
Reported-by: Sha Zhang <zhangsha.zhang@huawei.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Fixes: de77ecd4ef ("bonding: improve link-status update in mii-monitoring")
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2019-11-02
The following pull-request contains BPF updates for your *net* tree.
We've added 6 non-merge commits during the last 6 day(s) which contain
a total of 8 files changed, 35 insertions(+), 9 deletions(-).
The main changes are:
1) Fix ppc BPF JIT's tail call implementation by performing a second pass
to gather a stable JIT context before opcode emission, from Eric Dumazet.
2) Fix build of BPF samples sys_perf_event_open() usage to compiled out
unavailable test_attr__{enabled,open} checks. Also fix potential overflows
in bpf_map_{area_alloc,charge_init} on 32 bit archs, from Björn Töpel.
3) Fix narrow loads of bpf_sysctl context fields with offset > 0 on big endian
archs like s390x and also improve the test coverage, from Ilya Leoshkevich.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull NVMe fixes from Keith:
"We have a few late nvme fixes for a couple device removal kernel
crashes, and a compat fix for a new ioctl introduced during this merge
window."
* 'nvme-5.4-rc7' of git://git.infradead.org/nvme:
nvme: change nvme_passthru_cmd64 to explicitly mark rsvd
nvme-multipath: fix crash in nvme_mpath_clear_ctrl_paths
nvme-rdma: fix a segmentation fault during module unload
Don't swap oper and admin schedules too early, it's not correct and
causes crash.
Steps to reproduce:
1)
tc qdisc replace dev eth0 parent root handle 100 taprio \
num_tc 3 \
map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
queues 1@0 1@1 1@2 \
base-time $SOME_BASE_TIME \
sched-entry S 01 80000 \
sched-entry S 02 15000 \
sched-entry S 04 40000 \
flags 2
2)
tc qdisc replace dev eth0 parent root handle 100 taprio \
base-time $SOME_BASE_TIME \
sched-entry S 01 90000 \
sched-entry S 02 20000 \
sched-entry S 04 40000 \
flags 2
3)
tc qdisc replace dev eth0 parent root handle 100 taprio \
base-time $SOME_BASE_TIME \
sched-entry S 01 150000 \
sched-entry S 02 200000 \
sched-entry S 04 40000 \
flags 2
Do 2 3 2 .. steps more times if not happens and observe:
[ 305.832319] Unable to handle kernel write to read-only memory at
virtual address ffff0000087ce7f0
[ 305.910887] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
[ 305.919306] Hardware name: Texas Instruments AM654 Base Board (DT)
[...]
[ 306.017119] x1 : ffff800848031d88 x0 : ffff800848031d80
[ 306.022422] Call trace:
[ 306.024866] taprio_free_sched_cb+0x4c/0x98
[ 306.029040] rcu_process_callbacks+0x25c/0x410
[ 306.033476] __do_softirq+0x10c/0x208
[ 306.037132] irq_exit+0xb8/0xc8
[ 306.040267] __handle_domain_irq+0x64/0xb8
[ 306.044352] gic_handle_irq+0x7c/0x178
[ 306.048092] el1_irq+0xb0/0x128
[ 306.051227] arch_cpu_idle+0x10/0x18
[ 306.054795] do_idle+0x120/0x138
[ 306.058015] cpu_startup_entry+0x20/0x28
[ 306.061931] rest_init+0xcc/0xd8
[ 306.065154] start_kernel+0x3bc/0x3e4
[ 306.068810] Code: f2fbd5b7 f2fbd5b6 d503201f f9400422 (f9000662)
[ 306.074900] ---[ end trace 96c8e2284a9d9d6e ]---
[ 306.079507] Kernel panic - not syncing: Fatal exception in interrupt
[ 306.085847] SMP: stopping secondary CPUs
[ 306.089765] Kernel Offset: disabled
Try to explain one of the possible crash cases:
The "real" admin list is assigned when admin_sched is set to
new_admin, it happens after "swap", that assigns to oper_sched NULL.
Thus if call qdisc show it can crash.
Farther, next second time, when sched list is updated, the admin_sched
is not NULL and becomes the oper_sched, previous oper_sched was NULL so
just skipped. But then admin_sched is assigned new_admin, but schedules
to free previous assigned admin_sched (that already became oper_sched).
Farther, next third time, when sched list is updated,
while one more swap, oper_sched is not null, but it was happy to be
freed already (while prev. admin update), so while try to free
oper_sched the kernel panic happens at taprio_free_sched_cb().
So, move the "swap emulation" where it should be according to function
comment from code.
Fixes: 9c66d15646 ("taprio: Add support for hardware offloading")
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEEmvEkXzgOfc881GuFWsYho5HknSAFAl3BnFkTHG1rbEBwZW5n
dXRyb25peC5kZQAKCRBaxiGjkeSdIHOLB/9uT8j0PWXrT9rZQip8SH0H8Jx37O2B
BcBIHrLXFFSGmSrmEGs+1hn1CAiUS1ihez5sndeKYx79bJJNWKfBHTKJaIMq1D5W
UPfUnZ/ovbqCdbQST1su4P7KlDEtAfrtO+ELoMiXt/21jQHFqVBnI17RR42vFxG3
irR6sIMgaXwmvq+AfwTLjCtSYYkR3s/f0cSvH/TUweLGwF/4bfYYepU9MpLc6HSo
Qfb6Bxy+XZZGT0/E8hfFnon/aCp0OYnSvlGW8Xfdoiz6CUxSm8PA1N4cvNw0VyIX
2majH5XxDo/BXi16+g3OX3el5mFSdL+Q4rykoTJTjcCKd5DU+3Aut7zS
=OkhG
-----END PGP SIGNATURE-----
Merge tag 'linux-can-fixes-for-5.4-20191105' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2019-11-05
this is a pull request of 33 patches for net/master.
In the first patch Wen Yang's patch adds a missing of_node_put() to CAN device
infrastructure.
Navid Emamdoost's patch for the gs_usb driver fixes a memory leak in the
gs_can_open() error path.
Johan Hovold provides two patches, one for the mcba_usb, the other for the
usb_8dev driver. Both fix a use-after-free after USB-disconnect.
Joakim Zhang's patch improves the flexcan driver, the ECC mechanism is now
completely disabled instead of masking the interrupts.
The next three patches all target the peak_usb driver. Stephane Grosjean's
patch fixes a potential out-of-sync while decoding packets, Johan Hovold's
patch fixes a slab info leak, Jeroen Hofstee's patch adds missing reporting of
bus off recovery events.
Followed by three patches for the c_can driver. Kurt Van Dijck's patch fixes
detection of potential missing status IRQs, Jeroen Hofstee's patches add a chip
reset on open and add missing reporting of bus off recovery events.
Appana Durga Kedareswara rao's patch for the xilinx driver fixes the flags
field initialization for axi CAN.
The next seven patches target the rx-offload helper, they are by me and Jeroen
Hofstee. The error handling in case of a queue overflow is fixed removing a
memory leak. Further the error handling in case of queue overflow and skb OOM
is cleaned up.
The next two patches are by me and target the flexcan and ti_hecc driver. In
case of a error during can_rx_offload_queue_sorted() the error counters in the
drivers are incremented.
Jeroen Hofstee provides 6 patches for the ti_hecc driver, which properly stop
the device in ifdown, improve the rx-offload support (which hit mainline in
v5.4-rc1), and add missing FIFO overflow and state change reporting.
The following four patches target the j1939 protocol. Colin Ian King's patch
fixes a memory leak in the j1939_sk_errqueue() handling. Three patches by
Oleksij Rempel fix a memory leak on socket release and fix the EOMA packet in
the transport protocol.
Timo Schlüßler's patch fixes a potential race condition in the mcp251x driver
on after suspend.
The last patch is by Yegor Yefremov and updates the SPDX-License-Identifier to
v3.0.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Changing nvme_passthru_cmd64 to add a field: rsvd2. This field is an explicit
marker for the padding space added on certain platforms as a result of the
enlargement of the result field from 32 bit to 64 bits in size, and
fixes differences in struct size when using compat ioctl for 32-bit
binaries on 64-bit architecture.
Fixes: 65e68edce0 ("nvme: allow 64-bit results in passthru commands")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Charles Machalow <csm10495@gmail.com>
[changelog]
Signed-off-by: Keith Busch <kbusch@kernel.org>
drivers/gpu/drm/drm_dp_mst_topology.c: In function __topology_ref_save:
drivers/gpu/drm/drm_dp_mst_topology.c:1424:6: error: implicit declaration of function stack_trace_save; did you mean stack_depot_save? [-Werror=implicit-function-declaration]
n = stack_trace_save(stack_entries, ARRAY_SIZE(stack_entries), 1);
^~~~~~~~~~~~~~~~
stack_depot_save
drivers/gpu/drm/drm_dp_mst_topology.c: In function __dump_topology_ref_history:
drivers/gpu/drm/drm_dp_mst_topology.c:1513:3: error: implicit declaration of function stack_trace_snprint; did you mean acpi_trace_point? [-Werror=implicit-function-declaration]
stack_trace_snprint(buf, PAGE_SIZE, entries, nr_entries, 4);
^~~~~~~~~~~~~~~~~~~
acpi_trace_point
stack_trace_save and stack_trace_snprint are declared in <linux/stacktrace.h>,
so there is need to include it, and <linux/stackdepot.h> is already included
by practices, so just replace <linux/stackdepot.h> by <linux/stacktrace.h>.
Signed-off-by: Chenwandun <chenwandun@huawei.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1572515029-42087-1-git-send-email-chenwandun@huawei.com
In some circumstances the RC6 context can get corrupted. We can detect
this and take the required action, that is disable RC6 and runtime PM.
The HW recovers from the corrupted state after a system suspend/resume
cycle, so detect the recovery and re-enable RC6 and runtime PM.
v2: rebase (Mika)
v3:
- Move intel_suspend_gt_powersave() to the end of the GEM suspend
sequence.
- Add commit message.
v4:
- Rebased on intel_uncore_forcewake_put(i915->uncore, ...) API
change.
v5: rebased on gem/gt split (Mika)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
In BXT/APL, device 2 MMIO reads from MIPI controller requires its PLL
to be turned ON. When MIPI PLL is turned off (MIPI Display is not
active or connected), and someone (host or GT engine) tries to read
MIPI registers, it causes hard hang. This is a hardware restriction
or limitation.
Driver by itself doesn't read MIPI registers when MIPI display is off.
But any userspace application can submit unprivileged batch buffer for
execution. In that batch buffer there can be mmio reads. And these
reads are allowed even for unprivileged applications. If these
register reads are for MIPI DSI controller and MIPI display is not
active during that time, then the MMIO read operation causes system
hard hang and only way to recover is hard reboot. A genuine
process/application won't submit batch buffer like this and doesn't
cause any issue. But on a compromised system, a malign userspace
process/app can generate such batch buffer and can trigger system
hard hang (denial of service attack).
The fix is to lower the internal MMIO timeout value to an optimum
value of 950us as recommended by hardware team. If the timeout is
beyond 1ms (which will hit for any value we choose if MMIO READ on a
DSI specific register is performed without PLL ON), it causes the
system hang. But if the timeout value is lower than it will be below
the threshold (even if timeout happens) and system will not get into
a hung state. This will avoid a system hang without losing any
programming or GT interrupts, taking the worst case of lowest CDCLK
frequency and early DC5 abort into account.
Signed-off-by: Uma Shankar <uma.shankar@intel.com>
Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com>
Some of the gen instruction macros (e.g. MI_DISPLAY_FLIP) have the
length directly encoded in them. Since these are used directly in
the tables, the Length becomes part of the comparison used for
matching during parsing. Thus, if the cmd being parsed has a
different length to that in the table, it is not matched and the
cmd is accepted via the default variable length path.
Fix by masking out everything except the Opcode in the cmd tables
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
To keep things manageable, the pre-gen9 cmdparser does not
attempt to track any form of nested BB_START's. This did not
prevent usermode from using nested starts, or even chained
batches because the cmdparser is not strictly enforced pre gen9.
Instead, the existence of a nested BB_START would cause the batch
to be emitted in insecure mode, and any privileged capabilities
would not be available.
For Gen9, the cmdparser becomes mandatory (for BCS at least), and
so not providing any form of nested BB_START support becomes
overly restrictive. Any such batch will simply not run.
We make heavy use of backward jumps in igt, and it is much easier
to add support for this restricted subset of nested jumps, than to
rewrite the whole of our test suite to avoid them.
Add the required logic to support limited backward jumps, to
instructions that have already been validated by the parser.
Note that it's not sufficient to simply approve any BB_START
that jumps backwards in the buffer because this would allow an
attacker to embed a rogue instruction sequence within the
operand words of a harmless instruction (say LRI) and jump to
that.
We introduce a bit array to track every instr offset successfully
validated, and test the target of BB_START against this. If the
target offset hits, it is re-written to the same offset in the
shadow buffer and the BB_START cmd is allowed.
Note: This patch deliberately ignores checkpatch issues in the
cmdtables, in order to match the style of the surrounding code.
We'll correct the entire file in one go in a later patch.
v2: set dispatch secure late (Mika)
v3: rebase (Mika)
v4: Clear whitelist on each parse
Minor review updates (Chris)
v5: Correct backward jump batching
v6: fix compilation error due to struct eb shuffle (Mika)
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
In the next patch we will be adding a second valid
termination condition which will require a small
amount of refactoring to share logic with the BB_END
case.
Refactor all error conditions to jump to a dedicated
exit path, with 'break' reserved only for a successful
parse.
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
For gen9 we enable cmdparsing on the BCS ring, specifically
to catch inadvertent accesses to sensitive registers
Unlike gen7/hsw, we use the parser only to block certain
registers. We can rely on h/w to block restricted commands,
so the command tables only provide enough info to allow the
parser to delineate each command, and identify commands that
access registers.
Note: This patch deliberately ignores checkpatch issues in
favour of matching the style of the surrounding code. We'll
correct the entire file in one go in a later patch.
v3: rebase (Mika)
v4: Add RING_TIMESTAMP registers to whitelist (Jon)
Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
In "drm/i915: Add support for mandatory cmdparsing" we introduced the
concept of mandatory parsing. This allows the cmdparser to be invoked
even when user passes batch_len=0 to the execbuf ioctl's.
However, the cmdparser needs to know the extents of the buffer being
scanned. Refactor the code to ensure the cmdparser uses the actual
object size, instead of the incoming length, if user passes 0.
Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tyler Hicks <tyhicks@canonical.com>
Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
For Gen7, the original cmdparser motive was to permit limited
use of register read/write instructions in unprivileged BB's.
This worked by copying the user supplied bb to a kmd owned
bb, and running it in secure mode, from the ggtt, only if
the scanner finds no unsafe commands or registers.
For Gen8+ we can't use this same technique because running bb's
from the ggtt also disables access to ppgtt space. But we also
do not actually require 'secure' execution since we are only
trying to reduce the available command/register set. Instead we
will copy the user buffer to a kmd owned read-only bb in ppgtt,
and run in the usual non-secure mode.
Note that ro pages are only supported by ppgtt (not ggtt), but
luckily that's exactly what we need.
Add the required paths to map the shadow buffer to ppgtt ro for Gen8+
v2: IS_GEN7/IS_GEN (Mika)
v3: rebase
v4: rebase
v5: rebase
Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
The existing cmdparser for gen7 can be bypassed by specifying
batch_len=0 in the execbuf call. This is safe because bypassing
simply reduces the cmd-set available.
In a later patch we will introduce cmdparsing for gen9, as a
security measure, which must be strictly enforced since without
it we are vulnerable to DoS attacks.
Introduce the concept of 'required' cmd parsing that cannot be
bypassed by submitting zero-length bb's.
v2: rebase (Mika)
v2: rebase (Mika)
v3: fix conflict on engine flags (Mika)
Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
The previous patch has killed support for secure batches
on gen6+, and hence the cmdparsers master tables are
now dead code. Remove them.
Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tyler Hicks <tyhicks@canonical.com>
Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
Retroactively stop reporting support for secure batches
through the api for gen6+ so that older binaries trigger
the fallback path instead.
Older binaries use secure batches pre gen6 to access resources
that are not available to normal usermode processes. However,
all known userspace explicitly checks for HAS_SECURE_BATCHES
before relying on the secure batch feature.
Since there are no known binaries relying on this for newer gens
we can kill secure batches from gen6, via I915_PARAM_HAS_SECURE_BATCHES.
v2: rebase (Mika)
v3: rebase (Mika)
Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
We're about to introduce some new tables for later gens, and the
current naming for the gen7 tables will no longer make sense.
v2: rebase
Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
Do not support mmap in S/PDIF mode. In S/PDIF mode
the buffer has to be copied, to allow the channel status
bits insertion.
Signed-off-by: Olivier Moysan <olivier.moysan@st.com>
Link: https://lore.kernel.org/r/20191104133654.28750-1-olivier.moysan@st.com
Signed-off-by: Mark Brown <broonie@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXcGMxwAKCRCRxhvAZXjc
otEzAP9lHvP97TtRG9gP6dj6YovZ5Djdo3IscmkTqy5Nt8sVNQD+NWg1LZnSMFdJ
ExETgRVlsjF8q2sblswtn/8Ab53O6AM=
=RVGl
-----END PGP SIGNATURE-----
Merge tag 'for-linus-2019-11-05' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull clone3 stack argument update from Christian Brauner:
"This changes clone3() to do basic stack validation and to set up the
stack depending on whether or not it is growing up or down.
With clone3() the expectation is now very simply that the .stack
argument points to the lowest address of the stack and that
.stack_size specifies the initial stack size. This is diferent from
legacy clone() where the "stack" argument had to point to the lowest
or highest address of the stack depending on the architecture.
clone3() was released with 5.3. Currently, it is not documented and
very unclear to userspace how the stack and stack_size argument have
to be passed. After talking to glibc folks we concluded that changing
clone3() to determine stack direction and doing basic validation is
the right course of action.
Note, this is a potentially user visible change. In the very unlikely
case, that it breaks someone's use-case we will revert. (And then e.g.
place the new behavior under an appropriate flag.)
Note that passing an empty stack will continue working just as before.
Breaking someone's use-case is very unlikely. Neither glibc nor musl
currently expose a wrapper for clone3(). There is currently also no
real motivation for anyone to use clone3() directly. First, because
using clone{3}() with stacks requires some assembly (see glibc and
musl). Second, because it does not provide features that legacy
clone() doesn't. New features for clone3() will first happen in v5.5
which is why v5.4 is still a good time to try and make that change now
and backport it to v5.3.
I did a codesearch on https://codesearch.debian.net, github, and
gitlab and could not find any software currently relying directly on
clone3(). I expect this to change once we land CLONE_CLEAR_SIGHAND
which was a request coming from glibc at which point they'll likely
start using it"
* tag 'for-linus-2019-11-05' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
clone3: validate stack arguments
- Fix a build error in the tools used for kselftest.
- A series of reverts to bring the Intel Merrifield back to
working. We will likely unrevert the reverts for v5.5
but we can't have v5.4 broken.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEElDRnuGcz/wPCXQWMQRCzN7AZXXMFAl3BjZ8ACgkQQRCzN7AZ
XXPbTRAAxCBCegC7ZISN413NqmzAc8S9e2pHik/0+UinIiq8Kha1zHEmiWKa/GFv
NSulFytfwYwi3UVgAMds2lFMpbq6atG60D/FgGhRDeG56RHOs29sYDXOIrWvrad6
49B2U//2FQPcfMpH5PFT0O+B90cxZRYW/MhljnAuBRPUEF6F9m/BJqHlQYD6eY6v
Xn005z1N/sQbs/1EBXryNlvfHOtepKXTNC0d+HQnRRTEZsAhid8sc+iTVEvLUZh8
IttKs2I7rB26swx/HwWyiuWxE3hf5lrlwf3xxMQn2cWyyp08vJNRyiEXsEErHQf6
9eqt1j4iXARf1fh4KoLCgC6zQgqsLG0BZ9Bf28gVXIRXJZDYV1FOpBfc8YwTk29T
VhNrafUN0o+72qieXqLLKpMT4gjMnqzBhI06NRXEqA2AwIqQJQd4DHdLNHmiaICJ
cyfzI4TBTjc3qh2av+D43bESh2UwXYErImvaIT+TqFv1P2T7x2UCFL6WGuQZ/qaB
z7Gg0vh1gc+yrQnVxOh3vuUpD0x2FXo52k74fSYy55aMQlrj77y3w27hzSZ9Prgs
zIZtVZbTXXQ+SxtuujzdzZKDJ9gJ0+FWyVCe7MxbgP6vBi+7omZ6TPLn0iAS9GQG
aqLVGVgaIS0kOVvQVR6aAEonbsZZILns8ib4bpPlGIMNgQTCN2A=
=ml6d
-----END PGP SIGNATURE-----
Merge tag 'gpio-v5.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"More GPIO fixes! We found a late regression in the Intel Merrifield
driver. Oh well. We fixed it up.
- Fix a build error in the tools used for kselftest
- A series of reverts to bring the Intel Merrifield back to working.
We will likely unrevert the reverts for v5.5 but we can't have v5.4
broken"
* tag 'gpio-v5.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
Revert "gpio: merrifield: Pass irqchip when adding gpiochip"
Revert "gpio: merrifield: Restore use of irq_base"
Revert "gpio: merrifield: Move hardware initialization to callback"
tools: gpio: Use !building_out_of_srctree to determine srctree
The bd70528 watchdog driver is probed by MFD driver. Add MODULE_ALIAS
in order to allow udev to load the module when MFD sub-device cell for
watchdog is added.
Fixes: bbc88a0ec9 ("watchdog: bd70528: Initial support for ROHM BD70528 watchdog block")
Signed-off-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
SCU firmware calculates pretimeout based on current time stamp
instead of watchdog timeout stamp, need to convert the pretimeout
to SCU firmware's timeout value.
Fixes: 15f7d7fc55 ("watchdog: imx_sc: Add pretimeout support")
Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
The left time value is wrong when we get it by sysfs. The left time value
should be equal to preset timeout value minus elapsed time value. According
to the Meson-GXB/GXL datasheets which can be found at [0], the timeout value
is saved to BIT[0-15] of the WATCHDOG_TCNT, and elapsed time value is saved
to BIT[16-31] of the WATCHDOG_TCNT.
[0]: http://linux-meson.com
Fixes: 683fa50f0e ("watchdog: Add Meson GXBB Watchdog Driver")
Signed-off-by: Xingyu Chen <xingyu.chen@amlogic.com>
Acked-by: Neil Armstrong <narmstrong@baylibre.com>
Reviewed-by: Kevin Hilman <khilman@baylibre.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
When an IRQ is present in the dts, the probe function shall fail if
the interrupt can not be registered.
The probe function shall also be retried if getting the irq is being
deferred.
Signed-off-by: Jorge Ramirez-Ortiz <jorge.ramirez-ortiz@linaro.org>
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
The compat_ptr_ioctl() infrastructure did not make it into
linux-5.4, so cpwd now fails to build.
Fix it by using an open-coded version.
Fixes: 68f28b01fb ("watchdog: cpwd: use generic compat_ptr_ioctl")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
nvme_mpath_clear_ctrl_paths() iterates through
the ctrl->namespaces list while holding ctrl->scan_lock.
This does not seem to be the correct way of protecting
from concurrent list modification.
Specifically, nvme_scan_work() sorts ctrl->namespaces
AFTER unlocking scan_lock.
This may result in the following (rare) crash in ctrl disconnect
during scan_work:
BUG: kernel NULL pointer dereference, address: 0000000000000050
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 3995 Comm: nvme 5.3.5-050305-generic
RIP: 0010:nvme_mpath_clear_current_path+0xe/0x90 [nvme_core]
...
Call Trace:
nvme_mpath_clear_ctrl_paths+0x3c/0x70 [nvme_core]
nvme_remove_namespaces+0x35/0xe0 [nvme_core]
nvme_do_delete_ctrl+0x47/0x90 [nvme_core]
nvme_sysfs_delete+0x49/0x60 [nvme_core]
dev_attr_store+0x17/0x30
sysfs_kf_write+0x3e/0x50
kernfs_fop_write+0x11e/0x1a0
__vfs_write+0x1b/0x40
vfs_write+0xb9/0x1a0
ksys_write+0x67/0xe0
__x64_sys_write+0x1a/0x20
do_syscall_64+0x5a/0x130
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f8d02bfb154
Fix:
After taking scan_lock in nvme_mpath_clear_ctrl_paths()
down_read(&ctrl->namespaces_rwsem) as well to make list traversal safe.
This will not cause deadlocks because taking scan_lock never happens
while holding the namespaces_rwsem.
Moreover, scan work downs namespaces_rwsem in the same order.
Alternative: sort ctrl->namespaces in nvme_scan_work()
while still holding the scan_lock.
This would leave nvme_mpath_clear_ctrl_paths() without correct protection
against ctrl->namespaces modification by anyone other than scan_work.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
In case there are controllers that are not associated with any RDMA
device (e.g. during unsuccessful reconnection) and the user will unload
the module, these controllers will not be freed and will access already
freed memory. The same logic appears in other fabric drivers as well.
Fixes: 87fd125344 ("nvme-rdma: remove redundant reference between ib_device and tagset")
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Validate the stack arguments and setup the stack depening on whether or not
it is growing down or up.
Legacy clone() required userspace to know in which direction the stack is
growing and pass down the stack pointer appropriately. To make things more
confusing microblaze uses a variant of the clone() syscall selected by
CONFIG_CLONE_BACKWARDS3 that takes an additional stack_size argument.
IA64 has a separate clone2() syscall which also takes an additional
stack_size argument. Finally, parisc has a stack that is growing upwards.
Userspace therefore has a lot nasty code like the following:
#define __STACK_SIZE (8 * 1024 * 1024)
pid_t sys_clone(int (*fn)(void *), void *arg, int flags, int *pidfd)
{
pid_t ret;
void *stack;
stack = malloc(__STACK_SIZE);
if (!stack)
return -ENOMEM;
#ifdef __ia64__
ret = __clone2(fn, stack, __STACK_SIZE, flags | SIGCHLD, arg, pidfd);
#elif defined(__parisc__) /* stack grows up */
ret = clone(fn, stack, flags | SIGCHLD, arg, pidfd);
#else
ret = clone(fn, stack + __STACK_SIZE, flags | SIGCHLD, arg, pidfd);
#endif
return ret;
}
or even crazier variants such as [3].
With clone3() we have the ability to validate the stack. We can check that
when stack_size is passed, the stack pointer is valid and the other way
around. We can also check that the memory area userspace gave us is fine to
use via access_ok(). Furthermore, we probably should not require
userspace to know in which direction the stack is growing. It is easy
for us to do this in the kernel and I couldn't find the original
reasoning behind exposing this detail to userspace.
/* Intentional user visible API change */
clone3() was released with 5.3. Currently, it is not documented and very
unclear to userspace how the stack and stack_size argument have to be
passed. After talking to glibc folks we concluded that trying to change
clone3() to setup the stack instead of requiring userspace to do this is
the right course of action.
Note, that this is an explicit change in user visible behavior we introduce
with this patch. If it breaks someone's use-case we will revert! (And then
e.g. place the new behavior under an appropriate flag.)
Breaking someone's use-case is very unlikely though. First, neither glibc
nor musl currently expose a wrapper for clone3(). Second, there is no real
motivation for anyone to use clone3() directly since it does not provide
features that legacy clone doesn't. New features for clone3() will first
happen in v5.5 which is why v5.4 is still a good time to try and make that
change now and backport it to v5.3. Searches on [4] did not reveal any
packages calling clone3().
[1]: https://lore.kernel.org/r/CAG48ez3q=BeNcuVTKBN79kJui4vC6nw0Bfq6xc-i0neheT17TA@mail.gmail.com
[2]: https://lore.kernel.org/r/20191028172143.4vnnjpdljfnexaq5@wittgenstein
[3]: 5238e95759/src/basic/raw-clone.h (L31)
[4]: https://codesearch.debian.net
Fixes: 7f192e3cd3 ("fork: add clone3")
Cc: Kees Cook <keescook@chromium.org>
Cc: Jann Horn <jannh@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-api@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: <stable@vger.kernel.org> # 5.3
Cc: GNU C Library <libc-alpha@sourceware.org>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Aleksa Sarai <cyphar@cyphar.com>
Link: https://lore.kernel.org/r/20191031113608.20713-1-christian.brauner@ubuntu.com
copy_file_range tries to use the OSD 'copy-from' operation, which simply
performs a full object copy. Unfortunately, the implementation of this
system call assumes that stripe_count is always set to 1 and doesn't take
into account that the data may be striped across an object set. If the
file layout has stripe_count different from 1, then the destination file
data will be corrupted.
For example:
Consider a 8 MiB file with 4 MiB object size, stripe_count of 2 and
stripe_size of 2 MiB; the first half of the file will be filled with 'A's
and the second half will be filled with 'B's:
0 4M 8M Obj1 Obj2
+------+------+ +----+ +----+
file: | AAAA | BBBB | | AA | | AA |
+------+------+ |----| |----|
| BB | | BB |
+----+ +----+
If we copy_file_range this file into a new file (which needs to have the
same file layout!), then it will start by copying the object starting at
file offset 0 (Obj1). And then it will copy the object starting at file
offset 4M -- which is Obj1 again.
Unfortunately, the solution for this is to not allow remote object copies
to be performed when the file layout stripe_count is not 1 and simply
fallback to the default (VFS) copy_file_range implementation.
Cc: stable@vger.kernel.org
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
If ceph_atomic_open is handed a !d_in_lookup dentry, then that means
that it already passed d_revalidate so we *know* that it's negative (or
at least was very recently). Just return -ENOENT in that case.
This also addresses a subtle bug in dentry handling. Non-O_CREAT opens
call atomic_open with the parent's i_rwsem shared, but calling
d_splice_alias on a hashed dentry requires the exclusive lock.
If ceph_atomic_open receives a hashed, negative dentry on a non-O_CREAT
open, and another client were to race in and create the file before we
issue our OPEN, ceph_fill_trace could end up calling d_splice_alias on
the dentry with the new inode with insufficient locks.
Cc: stable@vger.kernel.org
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
The counter is removed from the pm wakeref count, but it remains intact
so that we can restore it upon resume. Ergo inside suspend, it may have
a value.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191104090158.2959-1-chris@chris-wilson.co.uk
(cherry picked from commit 83c55ee82f)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Currently we shutdown rc6 during i915_gem_resume() but this is called
during the preparation phase (i915_drm_prepare) for all suspend paths,
but we only want to shutdown rc6 for S3+. Move the actual shutdown to
i915_gem_suspend_late().
We then need to differentiate between suspend targets, to distinguish S0
(s2idle) where the device is kept awake but needs to be in a low power
mode (the same as runtime suspend) from the device suspend levels where
we lose control of HW and so must disable any HW access to dangling
memory.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111909
Fixes: c113236718 ("drm/i915: Extract GT render sleep (rc6) management")
Testcase: igt/gem_exec_suspend/power-S0
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Acked-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191101141009.15581-4-chris@chris-wilson.co.uk
(cherry picked from commit c601cb2135)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
We already track the debugfs user_forcewake on the GT, so it is natural
to pull the suspend/resume handling under gt/ as well.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191101141009.15581-3-chris@chris-wilson.co.uk
(cherry picked from commit 9ab3fe2d7d)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
As we already do reload the kernel context in intel_gt_resume, repeating
that action inside i915_gem_resume() as well is redundant.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191101141009.15581-2-chris@chris-wilson.co.uk
(cherry picked from commit c8f6cfc56f)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Assume all responsibility for operating on the HW to sanitize the GT
state upon load/resume in intel_gt_sanitize() itself.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191101141009.15581-1-chris@chris-wilson.co.uk
(cherry picked from commit 797a615357)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
The unsolicited event handler for the headphone jack on CA0132 codec
driver tries to reschedule the another delayed work with
cancel_delayed_work_sync(). It's no good idea, unfortunately,
especially after we changed the work queue to the standard global
one; this may lead to a stall because both works are using the same
global queue.
Fix it by dropping the _sync but does call cancel_delayed_work()
instead.
Fixes: 993884f6a2 ("ALSA: hda/ca0132 - Delay HP amp turnon.")
BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1155836
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20191105134316.19294-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The nsdeps script passes a list of the module source files to
generate_deps_for_ns() as a space delimited string named $mod_source_files,
which then passes it to spatch. But since $mod_source_files is not encased
in quotes, each source file in that string is treated as a separate shell
function argument (as $2, $3, $4, etc.). However, the spatch invocation
only refers to $2, so only the first file out of $mod_source_files is
processed by spatch.
This causes problems (namely, the MODULE_IMPORT_NS() statement doesn't
get inserted) when a module is composed of many source files and the
"main" module file containing the MODULE_LICENSE() statement is not the
first file listed in $mod_source_files. Fix this by encasing
$mod_source_files in quotes so that the entirety of the string is
treated as a single argument and can be referred to as $2.
In addition, put quotes in the variable assignment of mod_source_files
to prevent any shell interpretation and field splitting.
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
The final sort might get confused when the comparison is done over
bigger numbers than int like for -s time.
Check the following report for longer workloads:
$ perf report -s time -F time,overhead --stdio
Fix hist_entry__sort() to properly return int64_t and not possible cut
int.
Fixes: 043ca389a3 ("perf tools: Use hpp formats to sort final output")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org # v3.16+
Link: http://lore.kernel.org/lkml/20191104232711.16055-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The "GPL-2.0" license identifier changed to "GPL-2.0-only" in SPDX v3.0.
Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
In mcp251x_restart_work_handler() the variable to stop the interrupt
handler (priv->force_quit) is reset after the chip is restarted and thus
a interrupt might occur.
This patch fixes the potential race condition by resetting force_quit
before enabling interrupts.
Signed-off-by: Timo Schlüßler <schluessler@krause.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
trace_find_next_event() was buggy and pretty much a useless helper. As
there are no more users, just remove it.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Cc: linux-trace-devel@vger.kernel.org
Link: http://lore.kernel.org/lkml/20191017210636.224045576@goodmis.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Instead of calling a useless (and broken) helper function to get the
next event of a tep event array, just get the array directly and iterate
over it.
Note, the broken part was from trace_find_next_event() which after this
will no longer be used, and can be removed.
Committer notes:
This fixes a segfault when generating python scripts from perf.data
files with multiple tracepoint events, i.e. the following use case is
fixed by this patch:
# perf record -e sched:* sleep 1
[ perf record: Woken up 31 times to write data ]
[ perf record: Captured and wrote 0.031 MB perf.data (9 samples) ]
# perf script -g python
Segmentation fault (core dumped)
#
Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Cc: linux-trace-devel@vger.kernel.org
Link: http://lkml.kernel.org/r/20191017153733.630cd5eb@gandalf.local.home
Link: http://lore.kernel.org/lkml/20191017210636.061448713@goodmis.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since the execlists_active() is no longer protected by the
engine->active.lock, we need to protect the request pointer with RCU to
prevent it being freed as we evaluate whether or not we need to preempt.
Fixes: df40306902 ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock")
Fixes: 13ed13a4dc ("drm/i915: Don't set queue_priority_hint if we don't kick the submission")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191104090158.2959-2-chris@chris-wilson.co.uk
(cherry picked from commit 7d14863525)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
The introduction of clocksource_tsc_early broke the functionality of
"tsc=reliable" and "tsc=nowatchdog" command line parameters, since
clocksource_tsc_early is unconditionally registered with
CLOCK_SOURCE_MUST_VERIFY and thus put on the watchdog list.
This can cause the TSC to be declared unstable during boot:
clocksource: timekeeping watchdog on CPU0: Marking clocksource
'tsc-early' as unstable because the skew is too large:
clocksource: 'refined-jiffies' wd_now: fffb7018 wd_last: fffb6e9d
mask: ffffffff
clocksource: 'tsc-early' cs_now: 68a6a7070f6a0 cs_last: 68a69ab6f74d6
mask: ffffffffffffffff
tsc: Marking TSC unstable due to clocksource watchdog
The corresponding elapsed times are cs_nsec=1224152026 and wd_nsec=378942392, so
the watchdog differs from TSC by 0.84 seconds.
This happens when HPET is not available and jiffies are used as the TSC
watchdog instead and the jiffies update is not happening due to lost timer
interrupts in periodic mode, which can happen e.g. with expensive debug
mechanisms enabled or under massive overload conditions in virtualized
environments.
Before the introduction of the early TSC clocksource the command line
parameters "tsc=reliable" and "tsc=nowatchdog" could be used to work around
this issue.
Restore the behaviour by disabling the watchdog if requested on the kernel
command line.
[ tglx: Clarify changelog ]
Fixes: aa83c45762 ("x86/tsc: Introduce early tsc clocksource")
Signed-off-by: Michael Zhivich <mzhivich@akamai.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191024175945.14338-1-mzhivich@akamai.com