Commit Graph

917285 Commits

Author SHA1 Message Date
Peter Zijlstra
4b44a21dd6 irq_work, smp: Allow irq_work on call_single_queue
Currently irq_work_queue_on() will issue an unconditional
arch_send_call_function_single_ipi() and has the handler do
irq_work_run().

This is unfortunate in that it makes the IPI handler look at a second
cacheline and it misses the opportunity to avoid the IPI. Instead note
that struct irq_work and struct __call_single_data are very similar in
layout, so use a few bits in the flags word to encode a type and stick
the irq_work on the call_single_queue list.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200526161908.011635912@infradead.org
2020-05-28 10:54:15 +02:00
Peter Zijlstra
b2a02fc43a smp: Optimize send_call_function_single_ipi()
Just like the ttwu_queue_remote() IPI, make use of _TIF_POLLING_NRFLAG
to avoid sending IPIs to idle CPUs.

[ mingo: Fix UP build bug. ]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200526161907.953304789@infradead.org
2020-05-28 10:54:15 +02:00
Peter Zijlstra
afaa653c56 smp: Move irq_work_run() out of flush_smp_call_function_queue()
This ensures flush_smp_call_function_queue() is strictly about
call_single_queue.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200526161907.895109676@infradead.org
2020-05-28 10:54:15 +02:00
Peter Zijlstra
52103be07d smp: Optimize flush_smp_call_function_queue()
The call_single_queue can contain (two) different callbacks,
synchronous and asynchronous. The current interrupt handler runs them
in-order, which means that remote CPUs that are waiting for their
synchronous call can be delayed by running asynchronous callbacks.

Rework the interrupt handler to first run the synchonous callbacks.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200526161907.836818381@infradead.org
2020-05-28 10:54:15 +02:00
Peter Zijlstra
19a1f5ec69 sched: Fix smp_call_function_single_async() usage for ILB
The recent commit: 90b5363acd ("sched: Clean up scheduler_ipi()")
got smp_call_function_single_async() subtly wrong. Even though it will
return -EBUSY when trying to re-use a csd, that condition is not
atomic and still requires external serialization.

The change in kick_ilb() got this wrong.

While on first reading kick_ilb() has an atomic test-and-set that
appears to serialize the use, the matching 'release' is not in the
right place to actually guarantee this serialization.

Rework the nohz_idle_balance() trigger so that the release is in the
IPI callback and thus guarantees the required serialization for the
CSD.

Fixes: 90b5363acd ("sched: Clean up scheduler_ipi()")
Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Cc: mgorman@techsingularity.net
Link: https://lore.kernel.org/r/20200526161907.778543557@infradead.org
2020-05-28 10:54:15 +02:00
Ingo Molnar
58ef57b16d Merge branch 'core/rcu' into sched/core, to pick up dependency
We are going to rely on the loosening of RCU callback semantics,
introduced by this commit:

  806f04e9fd: ("rcu: Allow for smp_call_function() running callbacks from idle")

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-05-28 10:52:53 +02:00
Ingo Molnar
498bdcdb94 Merge branch 'sched/urgent' into sched/core, to pick up fix
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-05-28 10:52:37 +02:00
Peter Zijlstra
806f04e9fd rcu: Allow for smp_call_function() running callbacks from idle
Current RCU hard relies on smp_call_function() callbacks running from
interrupt context. A pending optimization is going to break that, it
will allow idle CPUs to run the callbacks from the idle loop. This
avoids raising the IPI on the requesting CPU and avoids handling an
exception on the receiving CPU.

Change rcu_is_cpu_rrupt_from_idle() to also accept task context,
provided it is the idle task.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Link: https://lore.kernel.org/r/20200527171236.GC706495@hirez.programming.kicks-ass.net
2020-05-28 10:50:12 +02:00
Jens Axboe
18f855e574 sched/fair: Don't NUMA balance for kthreads
Stefano reported a crash with using SQPOLL with io_uring:

  BUG: kernel NULL pointer dereference, address: 00000000000003b0
  CPU: 2 PID: 1307 Comm: io_uring-sq Not tainted 5.7.0-rc7 #11
  RIP: 0010:task_numa_work+0x4f/0x2c0
  Call Trace:
   task_work_run+0x68/0xa0
   io_sq_thread+0x252/0x3d0
   kthread+0xf9/0x130
   ret_from_fork+0x35/0x40

which is task_numa_work() oopsing on current->mm being NULL.

The task work is queued by task_tick_numa(), which checks if current->mm is
NULL at the time of the call. But this state isn't necessarily persistent,
if the kthread is using use_mm() to temporarily adopt the mm of a task.

Change the task_tick_numa() check to exclude kernel threads in general,
as it doesn't make sense to attempt ot balance for kthreads anyway.

Reported-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/865de121-8190-5d30-ece5-3b097dc74431@kernel.dk
2020-05-26 18:34:58 +02:00
Mel Gorman
2ebb177175 sched/core: Offload wakee task activation if it the wakee is descheduling
The previous commit:

  c6e7bd7afa: ("sched/core: Optimize ttwu() spinning on p->on_cpu")

avoids spinning on p->on_rq when the task is descheduling, but only if the
wakee is on a CPU that does not share cache with the waker.

This patch offloads the activation of the wakee to the CPU that is about to
go idle if the task is the only one on the runqueue. This potentially allows
the waker task to continue making progress when the wakeup is not strictly
synchronous.

This is very obvious with netperf UDP_STREAM running on localhost. The
waker is sending packets as quickly as possible without waiting for any
reply. It frequently wakes the server for the processing of packets and
when netserver is using local memory, it quickly completes the processing
and goes back to idle. The waker often observes that netserver is on_rq
and spins excessively leading to a drop in throughput.

This is a comparison of 5.7-rc6 against "sched: Optimize ttwu() spinning
on p->on_cpu" and against this patch labeled vanilla, optttwu-v1r1 and
localwakelist-v1r2 respectively.

                                  5.7.0-rc6              5.7.0-rc6              5.7.0-rc6
                                    vanilla           optttwu-v1r1     localwakelist-v1r2
Hmean     send-64         251.49 (   0.00%)      258.05 *   2.61%*      305.59 *  21.51%*
Hmean     send-128        497.86 (   0.00%)      519.89 *   4.43%*      600.25 *  20.57%*
Hmean     send-256        944.90 (   0.00%)      997.45 *   5.56%*     1140.19 *  20.67%*
Hmean     send-1024      3779.03 (   0.00%)     3859.18 *   2.12%*     4518.19 *  19.56%*
Hmean     send-2048      7030.81 (   0.00%)     7315.99 *   4.06%*     8683.01 *  23.50%*
Hmean     send-3312     10847.44 (   0.00%)    11149.43 *   2.78%*    12896.71 *  18.89%*
Hmean     send-4096     13436.19 (   0.00%)    13614.09 (   1.32%)    15041.09 *  11.94%*
Hmean     send-8192     22624.49 (   0.00%)    23265.32 *   2.83%*    24534.96 *   8.44%*
Hmean     send-16384    34441.87 (   0.00%)    36457.15 *   5.85%*    35986.21 *   4.48%*

Note that this benefit is not universal to all wakeups, it only applies
to the case where the waker often spins on p->on_rq.

The impact can be seen from a "perf sched latency" report generated from
a single iteration of one packet size:

   -----------------------------------------------------------------------------------------------------------------
    Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at       |
   -----------------------------------------------------------------------------------------------------------------

  vanilla
    netperf:4337          |  21709.193 ms |     2932 | avg:    0.002 ms | max:    0.041 ms | max at:    112.154512 s
    netserver:4338        |  14629.459 ms |  5146990 | avg:    0.001 ms | max: 1615.864 ms | max at:    140.134496 s

  localwakelist-v1r2
    netperf:4339          |  29789.717 ms |     2460 | avg:    0.002 ms | max:    0.059 ms | max at:    138.205389 s
    netserver:4340        |  18858.767 ms |  7279005 | avg:    0.001 ms | max:    0.362 ms | max at:    135.709683 s
   -----------------------------------------------------------------------------------------------------------------

Note that the average wakeup delay is quite small on both the vanilla
kernel and with the two patches applied. However, there are significant
outliers with the vanilla kernel with the maximum one measured as 1615
milliseconds with a vanilla kernel but never worse than 0.362 ms with
both patches applied and a much higher rate of context switching.

Similarly a separate profile of cycles showed that 2.83% of all cycles
were spent in try_to_wake_up() with almost half of the cycles spent
on spinning on p->on_rq. With the two patches, the percentage of cycles
spent in try_to_wake_up() drops to 1.13%

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jirka Hladky <jhladky@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: valentin.schneider@arm.com
Cc: Hillf Danton <hdanton@sina.com>
Cc: Rik van Riel <riel@surriel.com>
Link: https://lore.kernel.org/r/20200524202956.27665-3-mgorman@techsingularity.net
2020-05-25 07:04:10 +02:00
Peter Zijlstra
c6e7bd7afa sched/core: Optimize ttwu() spinning on p->on_cpu
Both Rik and Mel reported seeing ttwu() spend significant time on:

  smp_cond_load_acquire(&p->on_cpu, !VAL);

Attempt to avoid this by queueing the wakeup on the CPU that owns the
p->on_cpu value. This will then allow the ttwu() to complete without
further waiting.

Since we run schedule() with interrupts disabled, the IPI is
guaranteed to happen after p->on_cpu is cleared, this is what makes it
safe to queue early.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Jirka Hladky <jhladky@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: valentin.schneider@arm.com
Cc: Hillf Danton <hdanton@sina.com>
Cc: Rik van Riel <riel@surriel.com>
Link: https://lore.kernel.org/r/20200524202956.27665-2-mgorman@techsingularity.net
2020-05-25 07:01:44 +02:00
Huaixin Chang
d505b8af58 sched: Defend cfs and rt bandwidth quota against overflow
When users write some huge number into cpu.cfs_quota_us or
cpu.rt_runtime_us, overflow might happen during to_ratio() shifts of
schedulable checks.

to_ratio() could be altered to avoid unnecessary internal overflow, but
min_cfs_quota_period is less than 1 << BW_SHIFT, so a cutoff would still
be needed. Set a cap MAX_BW for cfs_quota_us and rt_runtime_us to
prevent overflow.

Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ben Segall <bsegall@google.com>
Link: https://lkml.kernel.org/r/20200425105248.60093-1-changhuaixin@linux.alibaba.com
2020-05-19 20:34:14 +02:00
Muchun Song
dbe9337109 sched/cpuacct: Fix charge cpuacct.usage_sys
The user_mode(task_pt_regs(tsk)) always return true for
user thread, and false for kernel thread. So it means that
the cpuacct.usage_sys is the time that kernel thread uses
not the time that thread uses in the kernel mode. We can
try get_irq_regs() first, if it is NULL, then we can fall
back to task_pt_regs().

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200420070453.76815-1-songmuchun@bytedance.com
2020-05-19 20:34:14 +02:00
Gustavo A. R. Silva
04f5c362ec sched/fair: Replace zero-length array with flexible-array
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200507192141.GA16183@embeddedor
2020-05-19 20:34:14 +02:00
Vincent Guittot
95d685935a sched/pelt: Sync util/runnable_sum with PELT window when propagating
update_tg_cfs_*() propagate the impact of the attach/detach of an entity
down into the cfs_rq hierarchy and must keep the sync with the current pelt
window.

Even if we can't sync child cfs_rq and its group se, we can sync the group
se and its parent cfs_rq with current position in the PELT window. In fact,
we must keep them sync in order to stay also synced with others entities
and group entities that are already attached to the cfs_rq.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200506155301.14288-1-vincent.guittot@linaro.org
2020-05-19 20:34:14 +02:00
Muchun Song
12aa258738 sched/cpuacct: Use __this_cpu_add() instead of this_cpu_ptr()
The cpuacct_charge() and cpuacct_account_field() are called with
rq->lock held, and this means preemption(and IRQs) are indeed
disabled, so it is safe to use __this_cpu_*() to allow for better
code-generation.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200507031039.32615-1-songmuchun@bytedance.com
2020-05-19 20:34:13 +02:00
Vincent Guittot
7d148be69e sched/fair: Optimize enqueue_task_fair()
enqueue_task_fair jumps to enqueue_throttle label when cfs_rq_of(se) is
throttled which means that se can't be NULL in such case and we can move
the label after the if (!se) statement. Futhermore, the latter can be
removed because se is always NULL when reaching this point.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Link: https://lkml.kernel.org/r/20200513135502.4672-1-vincent.guittot@linaro.org
2020-05-19 20:34:13 +02:00
Peter Zijlstra
9013196a46 Merge branch 'sched/urgent' 2020-05-19 20:34:12 +02:00
Vincent Guittot
39f23ce07b sched/fair: Fix unthrottle_cfs_rq() for leaf_cfs_rq list
Although not exactly identical, unthrottle_cfs_rq() and enqueue_task_fair()
are quite close and follow the same sequence for enqueuing an entity in the
cfs hierarchy. Modify unthrottle_cfs_rq() to use the same pattern as
enqueue_task_fair(). This fixes a problem already faced with the latter and
add an optimization in the last for_each_sched_entity loop.

Fixes: fe61468b2c (sched/fair: Fix enqueue_task_fair warning)
Reported-by Tao Zhou <zohooouoto@zoho.com.cn>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Reviewed-by: Ben Segall <bsegall@google.com>
Link: https://lkml.kernel.org/r/20200513135528.4742-1-vincent.guittot@linaro.org
2020-05-19 20:34:10 +02:00
Pavankumar Kondeti
ad32bb41fc sched/debug: Fix requested task uclamp values shown in procfs
The intention of commit 96e74ebf8d ("sched/debug: Add task uclamp
values to SCHED_DEBUG procfs") was to print requested and effective
task uclamp values. The requested values printed are read from p->uclamp,
which holds the last effective values. Fix this by printing the values
from p->uclamp_req.

Fixes: 96e74ebf8d ("sched/debug: Add task uclamp values to SCHED_DEBUG procfs")
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/1589115401-26391-1-git-send-email-pkondeti@codeaurora.org
2020-05-19 20:34:10 +02:00
Phil Auld
b34cb07dde sched/fair: Fix enqueue_task_fair() warning some more
sched/fair: Fix enqueue_task_fair warning some more

The recent patch, fe61468b2c (sched/fair: Fix enqueue_task_fair warning)
did not fully resolve the issues with the rq->tmp_alone_branch !=
&rq->leaf_cfs_rq_list warning in enqueue_task_fair. There is a case where
the first for_each_sched_entity loop exits due to on_rq, having incompletely
updated the list.  In this case the second for_each_sched_entity loop can
further modify se. The later code to fix up the list management fails to do
what is needed because se does not point to the sched_entity which broke out
of the first loop. The list is not fixed up because the throttled parent was
already added back to the list by a task enqueue in a parallel child hierarchy.

Address this by calling list_add_leaf_cfs_rq if there are throttled parents
while doing the second for_each_sched_entity loop.

Fixes: fe61468b2c ("sched/fair: Fix enqueue_task_fair warning")
Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20200512135222.GC2201@lorien.usersys.redhat.com
2020-05-19 20:34:10 +02:00
Thomas Gleixner
b1fcf9b83c rcu: Provide __rcu_is_watching()
Same as rcu_is_watching() but without the preempt_disable/enable() pair
inside the function. It is merked noinstr so it ends up in the
non-instrumentable text section.

This is useful for non-preemptible code especially in the low level entry
section. Using rcu_is_watching() there results in a call to the
preempt_schedule_notrace() thunk which triggers noinstr section warnings in
objtool.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200512213810.518709291@linutronix.de
2020-05-19 15:51:21 +02:00
Thomas Gleixner
8ae0ae6737 rcu: Provide rcu_irq_exit_preempt()
Interrupts and exceptions invoke rcu_irq_enter() on entry and need to
invoke rcu_irq_exit() before they either return to the interrupted code or
invoke the scheduler due to preemption.

The general assumption is that RCU idle code has to have preemption
disabled so that a return from interrupt cannot schedule. So the return
from interrupt code invokes rcu_irq_exit() and preempt_schedule_irq().

If there is any imbalance in the rcu_irq/nmi* invocations or RCU idle code
had preemption enabled then this goes unnoticed until the CPU goes idle or
some other RCU check is executed.

Provide rcu_irq_exit_preempt() which can be invoked from the
interrupt/exception return code in case that preemption is enabled. It
invokes rcu_irq_exit() and contains a few sanity checks in case that
CONFIG_PROVE_RCU is enabled to catch such issues directly.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134904.364456424@linutronix.de
2020-05-19 15:51:21 +02:00
Paul E. McKenney
9ea366f669 rcu: Make RCU IRQ enter/exit functions rely on in_nmi()
The rcu_nmi_enter_common() and rcu_nmi_exit_common() functions take an
"irq" parameter that indicates whether these functions have been invoked from
an irq handler (irq==true) or an NMI handler (irq==false).

However, recent changes have applied notrace to a few critical functions
such that rcu_nmi_enter_common() and rcu_nmi_exit_common() many now rely on
in_nmi().  Note that in_nmi() works no differently than before, but rather
that tracing is now prohibited in code regions where in_nmi() would
incorrectly report NMI state.

Therefore remove the "irq" parameter and inline rcu_nmi_enter_common() and
rcu_nmi_exit_common() into rcu_nmi_enter() and rcu_nmi_exit(),
respectively.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Link: https://lkml.kernel.org/r/20200505134101.617130349@linutronix.de
2020-05-19 15:51:21 +02:00
Thomas Gleixner
ff5c4f5cad rcu/tree: Mark the idle relevant functions noinstr
These functions are invoked from context tracking and other places in the
low level entry code. Move them into the .noinstr.text section to exclude
them from instrumentation.

Mark the places which are safe to invoke traceable functions with
instrumentation_begin/end() so objtool won't complain.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lkml.kernel.org/r/20200505134100.575356107@linutronix.de
2020-05-19 15:51:20 +02:00
Peter Zijlstra
0d00449c7a x86: Replace ist_enter() with nmi_enter()
A few exceptions (like #DB and #BP) can happen at any location in the code,
this then means that tracers should treat events from these exceptions as
NMI-like. The interrupted context could be holding locks with interrupts
disabled for instance.

Similarly, #MC is an actual NMI-like exception.

All of them use ist_enter() which only concerns itself with RCU, but does
not do any of the other setup that NMIs need. This means things like:

	printk()
	  raw_spin_lock_irq(&logbuf_lock);
	  <#DB/#BP/#MC>
	     printk()
	       raw_spin_lock_irq(&logbuf_lock);

are entirely possible (well, not really since printk tries hard to
play nice, but the concept stands).

So replace ist_enter() with nmi_enter(). Also observe that any nmi_enter()
caller must be both notrace and NOKPROBE, or in the noinstr text section.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Link: https://lkml.kernel.org/r/20200505134101.525508608@linutronix.de
2020-05-19 15:51:20 +02:00
Peter Zijlstra
5567d11c21 x86/mce: Send #MC singal from task work
Convert #MC over to using task_work_add(); it will run the same code
slightly later, on the return to user path of the same exception.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Link: https://lkml.kernel.org/r/20200505134100.957390899@linutronix.de
2020-05-19 15:51:19 +02:00
Thomas Gleixner
b052df3da8 x86/entry: Get rid of ist_begin/end_non_atomic()
This is completely overengineered and definitely not an interface which
should be made available to anything else than this particular MCE case.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134059.462640294@linutronix.de
2020-05-19 15:51:19 +02:00
Peter Zijlstra
f93524eb9c sched,rcu,tracing: Avoid tracing before in_nmi() is correct
If a tracer is invoked before in_nmi() becomes true, the tracer can no
longer detect it is called from NMI context and behave correctly.

Therefore change nmi_{enter,exit}() to use __preempt_count_{add,sub}()
as the normal preempt_count_{add,sub}() have a (desired) function
trace entry.

This fixes a potential issue with the current code; when the function-tracer
has stack-tracing enabled __trace_stack() will malfunction when it hits the
preempt_count_add() function entry from NMI context.

Suggested-by: Steven Rostedt (VMware) <rosted@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Link: https://lkml.kernel.org/r/20200505134101.434193525@linutronix.de
2020-05-19 15:51:18 +02:00
Peter Zijlstra
178ba00c35 sh/ftrace: Move arch_ftrace_nmi_{enter,exit} into nmi exception
SuperH is the last remaining user of arch_ftrace_nmi_{enter,exit}(),
remove it from the generic code and into the SuperH code.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: https://lkml.kernel.org/r/20200505134101.248881738@linutronix.de
2020-05-19 15:51:18 +02:00
Peter Zijlstra
e616cb8daa lockdep: Always inline lockdep_{off,on}()
These functions are called {early,late} in nmi_{enter,exit} and should
not be traced or probed. They are also puny, so 'inline' them.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Link: https://lkml.kernel.org/r/20200505134101.048523500@linutronix.de
2020-05-19 15:51:18 +02:00
Peter Zijlstra
69ea03b56e hardirq/nmi: Allow nested nmi_enter()
Since there are already a number of sites (ARM64, PowerPC) that effectively
nest nmi_enter(), make the primitive support this before adding even more.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lkml.kernel.org/r/20200505134100.864179229@linutronix.de
2020-05-19 15:51:17 +02:00
Frederic Weisbecker
28f6bf9e24 arm64: Prepare arch_nmi_enter() for recursion
When using nmi_enter() recursively, arch_nmi_enter() must also be recursion
safe. In particular, it must be ensured that HCR_TGE is always set while in
NMI context when in HYP mode, and be restored to it's former state when
done.

The current code fails this when interleaved wrong. Notably it overwrites
the original hcr state on nesting.

Introduce a nesting counter to make sure to store the original value.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lkml.kernel.org/r/20200505134100.771491291@linutronix.de
2020-05-19 15:51:17 +02:00
Peter Zijlstra
b0f51883f5 printk: Disallow instrumenting print_nmi_enter()
It happens early in nmi_enter(), no tracing, probing or other funnies
allowed. Specifically as nmi_enter() will be used in do_debug(), which
would cause recursive exceptions when kprobed.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Link: https://lkml.kernel.org/r/20200505134101.139720912@linutronix.de
2020-05-19 15:51:16 +02:00
Petr Mladek
8c4e93c362 printk: Prepare for nested printk_nmi_enter()
There is plenty of space in the printk_context variable. Reserve one byte
there for the NMI context to be on the safe side.

It should never overflow. The BUG_ON(in_nmi() == NMI_MASK) in nmi_enter()
will trigger much earlier.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Link: https://lkml.kernel.org/r/20200505134100.681374113@linutronix.de
2020-05-19 15:51:16 +02:00
Thomas Gleixner
1ed0948eea Merge tag 'noinstr-lds-2020-05-19' into core/rcu
Get the noinstr section and annotation markers to base the RCU parts on.
2020-05-19 15:50:34 +02:00
Thomas Gleixner
6553896666 vmlinux.lds.h: Create section for protection against instrumentation
Some code pathes, especially the low level entry code, must be protected
against instrumentation for various reasons:

 - Low level entry code can be a fragile beast, especially on x86.

 - With NO_HZ_FULL RCU state needs to be established before using it.

Having a dedicated section for such code allows to validate with tooling
that no unsafe functions are invoked.

Add the .noinstr.text section and the noinstr attribute to mark
functions. noinstr implies notrace. Kprobes will gain a section check
later.

Provide also a set of markers: instrumentation_begin()/end()

These are used to mark code inside a noinstr function which calls
into regular instrumentable text section as safe.

The instrumentation markers are only active when CONFIG_DEBUG_ENTRY is
enabled as the end marker emits a NOP to prevent the compiler from merging
the annotation points. This means the objtool verification requires a
kernel compiled with this option.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134100.075416272@linutronix.de
2020-05-19 15:47:20 +02:00
Linus Torvalds
b9bbe6ed63 Linux 5.7-rc6 2020-05-17 16:48:37 -07:00
Linus Torvalds
8feea6233d Convert i2c_new_device() to i2c_new_client_device()
Wolfram Sang has asked to have this included in 5.7 so the deprecated
 API can be removed next release.  There should be no functional
 difference.
 
 I think that entire this section of code can be removed; it is leftover
 from other things that have since changed, but this is the safer thing
 to do for now.  The full removal can happen next release.
 
 Thanks,
 
 -corey
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE/Q1c5nzg9ZpmiCaGYfOMkJGb/4EFAl7Bk3oACgkQYfOMkJGb
 /4FIYw//X+LJwWSyK/4CAJC/9ZkjuFTtSqTKLPTGokZOt559Ug/ukkvp66X5cnpm
 vp+VEHnR4fxDRFg/CuQHSqlUJIEiMbc92sdNRqeO6unDsgUceab7Zt8xVQ/5mL+N
 D1GWDglw6UcGxjH0W3vFh67GP+tRix3WOE+GvTIzq0dXs9mALkl76EL7w4kDuacB
 21rnetsMBBw1Yx/495ZOqnWWhzCyVTDnQZ4k38KXGTZtUvZcENBZngakEnnbzzMa
 Uqef6nAfrH8aRi3fF14GfDII5oGCBltkmbHccQdFR+q4s2ShmCi+woxw1DSVWQ9V
 kABFGn69fVnQ+e/cd8aaIoTLqcQ2WSh5m1x2w9h1+g195g+s5a+H4ryQWr1MnwRO
 RKPH/egt/y5d9cd63EwtkD25RZAJDxf6ZFXNbrEDLRRepI3cOQz9aI2MXw3h1JJY
 jlUcFJ8XGCy+rrf6U6laJwled6WGppEsaLH6sBOx2uWsaSJNuqfbFllDiTj7V4qG
 EGhGOsRbcrrZLQWbGOGgapzRnwTj9XvAER3poOXWiHjR+Krzo1y9w7N1aRyk10W/
 cXG/lg3pAUsYt32tT00m/e/LCN5opuGJ8fgitCb1vpg1xn0B1Xmbgt2KUXi9BAvg
 u9PFe6X8jJ5LacYBLwkjMPNXMhgcUXkahGCbehv9bd8H875Gyts=
 =JH+B
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.7-2' of git://github.com/cminyard/linux-ipmi

Pull IPMI update from Corey Minyard:
 "Convert i2c_new_device() to i2c_new_client_device()

  Wolfram Sang has asked to have this included in 5.7 so the deprecated
  API can be removed next release. There should be no functional
  difference.

  I think that entire this section of code can be removed; it is
  leftover from other things that have since changed, but this is the
  safer thing to do for now. The full removal can happen next release"

* tag 'for-linus-5.7-2' of git://github.com/cminyard/linux-ipmi:
  char: ipmi: convert to use i2c_new_client_device()
2020-05-17 16:07:30 -07:00
Linus Torvalds
9b1f2cbdb6 Some more clk driver fixes for the merge window and one core framework
fix:
 
  - A handful of TI driver fixes for bad of_node_put() and incorrect
    parent names
  - Rockchip rk3228 aclk_gpu* creation was interfering with lima GPU work
    so we use a composite clk now
  - Resuming from suspend on Tegra Jetson TK1 was broken because an audio
    PLL calculated an incorrect rate
  - A fix for devicetree probing on IM-PD1 by actually specifying a clk
    name which is required to pass clk registration
  - Avoid list corruption if registration fails for a critical clk
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEE9L57QeeUxqYDyoaDrQKIl8bklSUFAl7BctURHHNib3lkQGtl
 cm5lbC5vcmcACgkQrQKIl8bklSWzxA//Tc9OcFfC9QSJNxrdl38EB8NB5Q+9d1r3
 7taGgXEG1FwpzNOs5ojFPSK8kJbbMINm9jKnbEYlwu739ibA/3y/Q0ifnoZPM28s
 fDIWdzrH7heTr0+R9iqNeQFP0lD2lFRg7bye1ofjKTGaKWZuY4gFPmCk+w4xcfHH
 zveqpiiIr4B67VUvsOGtmBxmAW2Y6BT6XgXIZLfGkNQQit4MKVbJISPDTcenTaMu
 H1UPHoFSM7UMfUutGHE6jkAY6yVv/4wdGYeIckkhXK7pkF/3Ro5eaRf63qyBcA92
 gHSO4UZB90nP0+O6gdwyI3y7wPUjVdIEgP1M83+uMSdgbpbhMa/cEc0JKY7Ib80T
 L5RXerOE/eOvRmUThl8XOoITbkz5SfpHd0rfnHBU5E9bJk53o51y0Y1Aww6rcVQY
 ebUcGEnxfYOSxclKelq0FGV/XyGDxsJTInWGyaoAr1LglzhD44OQaLuaMKiIgMgV
 FMYAQJk0PljwLVYVg9Xsr79wvJPswi0kw/OOMrJ2b6C7yK2JQ92LiWFvLCH8c1dA
 tCiGQJg5I9heBF6DgIGFbjrSzSbI511GBAbf9lXz80aKt6SchxM5mCc8TyEStNO2
 1REUHPq3FSA8cvnPEnxG3U0IVMHcm+nAig7DpWIsh2cIm8CfKBpvyb0FY/OK/b64
 3TkWVm+VANM=
 =Sq8s
 -----END PGP SIGNATURE-----

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
 "Some more clk driver fixes and one core framework fix:

   - A handful of TI driver fixes for bad of_node_put() and incorrect
     parent names

   - Rockchip rk3228 aclk_gpu* creation was interfering with lima GPU
     work so we use a composite clk now

   - Resuming from suspend on Tegra Jetson TK1 was broken because an
     audio PLL calculated an incorrect rate

   - A fix for devicetree probing on IM-PD1 by actually specifying a clk
     name which is required to pass clk registration

   - Avoid list corruption if registration fails for a critical clk"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: ti: clkctrl: convert subclocks to use proper names also
  clk: ti: am33xx: fix RTC clock parent
  clk: ti: clkctrl: Fix Bad of_node_put within clkctrl_get_name
  clk: tegra: Fix initial rate for pll_a on Tegra124
  clk: impd1: Look up clock-output-names
  clk: Unlink clock if failed to prepare or enable
  clk: rockchip: fix incorrect configuration of rk3228 aclk_gpu* clocks
2020-05-17 12:33:00 -07:00
Linus Torvalds
fb27bc034d USB fixes for 5.7-rc6
Here are a number of USB fixes for 5.7-rc6
 
 The "largest" in here is a bunch of raw-gadget fixes and api changes as
 the driver just showed up in -rc1 and work has been done to fix up some
 uapi issues found with the original submission, before it shows up in a
 -final release.
 
 Other than that, a bunch of other small USB gadget fixes, xhci fixes,
 some quirks, andother tiny fixes for reported issues.
 
 All of these have been in linux-next with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXsEF2A8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynrawCggmWnCKh2vFXUwIkyfDtS2HKm6q0AoMmBH76F
 isVpqHKAVOQ+LCDNhV6U
 =WzGX
 -----END PGP SIGNATURE-----

Merge tag 'usb-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are a number of USB fixes for 5.7-rc6

  The "largest" in here is a bunch of raw-gadget fixes and api changes
  as the driver just showed up in -rc1 and work has been done to fix up
  some uapi issues found with the original submission, before it shows
  up in a -final release.

  Other than that, a bunch of other small USB gadget fixes, xhci fixes,
  some quirks, andother tiny fixes for reported issues.

  All of these have been in linux-next with no reported issues"

* tag 'usb-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (26 commits)
  USB: gadget: fix illegal array access in binding with UDC
  usb: core: hub: limit HUB_QUIRK_DISABLE_AUTOSUSPEND to USB5534B
  USB: usbfs: fix mmap dma mismatch
  usb: host: xhci-plat: keep runtime active when removing host
  usb: xhci: Fix NULL pointer dereference when enqueuing trbs from urb sg list
  usb: cdns3: gadget: make a bunch of functions static
  usb: mtu3: constify struct debugfs_reg32
  usb: gadget: udc: atmel: Make some symbols static
  usb: raw-gadget: fix null-ptr-deref when reenabling endpoints
  usb: raw-gadget: documentation updates
  usb: raw-gadget: support stalling/halting/wedging endpoints
  usb: raw-gadget: fix gadget endpoint selection
  usb: raw-gadget: improve uapi headers comments
  usb: typec: mux: intel: Fix DP_HPD_LVL bit field
  usb: raw-gadget: fix return value of ep read ioctls
  usb: dwc3: select USB_ROLE_SWITCH
  usb: gadget: legacy: fix error return code in gncm_bind()
  usb: gadget: legacy: fix error return code in cdc_bind()
  usb: gadget: legacy: fix redundant initialization warnings
  usb: gadget: tegra-xudc: Fix idle suspend/resume
  ...
2020-05-17 12:31:22 -07:00
Linus Torvalds
b48397cb75 Merge branch 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull execve fix from Eric Biederman:
 "While working on my exec cleanups I found a bug in exec that I
  introduced by accident a couple of years ago. I apparently missed the
  fact that bprm->file can change.

  Now I have a very personal motive to clean up exec and make it more
  approachable.

  The change is just moving woud_dump to where it acts on the final
  bprm->file not the initial bprm->file. I have been careful and tested
  and verify this fix works"

* 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  exec: Move would_dump into flush_old_exec
2020-05-17 12:23:37 -07:00
Linus Torvalds
ef0d5b9102 A single bugfix for the ORC unwinder to ensure that the error flag which
tells the unwinding code whether a stack trace can be trusted or not is
 always set correctly. This was messed up by a couple of changes in the
 recent past.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl7BC+gTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoWFBEACR8MiO0VM2XXNsejd7rttgs/eoC4/M
 IKM5K1hq4eRCTodwVnkWwLk6p0asAMKhzpWQ3MS5RJBNAYxLbbxnsYSGtd8zIsdV
 wk6jbNYeT2MUZq2tYkjn3b9B6+91FFMZq6q+KDOfNPqcKZyP4n5o5QSewznBvQwt
 dHvjGgegJDjrrtuhLSQKG/uvSSi2hN9S5ibSMCa004GnH6P+uk/eICpvUXwNCyjV
 ygogYTmQQqAEqnlqVNdQxo+DFYbaxKCw12VSoBeOsEySljPdc136hP/j7Tzbf2em
 rkqtyXwng1+yG0vozMCAkyP5l3uA+HUculQLdmO8/55eia5Dl/zgsp3SvW7/2ONS
 0DRfGo0ghoZgId1oDu6DGPsX80wKKskerJpTN/tHWTXQWeUXCNXrX//lhrFiwd7P
 mHiyuk+INw3LQBkTlf7XhAf28w/9/+gCm3prEGnUCmLaJOeZ8HtL0mwDzudgc9Ca
 NW/b3tdt4JU3oXKyyqywr4XAYfxlfmyf3DrBMnuHdTgccaB9PAAzugjmDnFJOuzk
 jQw/Qfd6w7ZgVcVoaNQjjeogMTryGthCOPe9DzPUgkr+jCDsMwXopCvxbhbWI9e5
 L1/U5ilka/VC2ZP7qZUvwsltCgp6RamhDb3yLZbn/2PKf0sFKVoI/j/g1qMnLNZt
 TBNjzYuWAC8Hlw==
 =4kDr
 -----END PGP SIGNATURE-----

Merge tag 'objtool-urgent-2020-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 stack unwinding fix from Thomas Gleixner:
 "A single bugfix for the ORC unwinder to ensure that the error flag
  which tells the unwinding code whether a stack trace can be trusted or
  not is always set correctly.

  This was messed up by a couple of changes in the recent past"

* tag 'objtool-urgent-2020-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/unwind/orc: Fix error handling in __unwind_start()
2020-05-17 12:20:14 -07:00
Linus Torvalds
43567139f5 A single fix for early boot crashes of kernels built with gcc10 and
stack protector enabled.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl7A+q4ACgkQEsHwGGHe
 VUpvtA/+NNPKVGSKZPdDlUm64JEPy7XrbzFJ+zigWGQjUPtZsDkAT4U33eQIvV5f
 ea7vB2u+e7iRZBExgTI1JfyjTenGpBffhubR/ueawtxeTgvZSopFajHQir/VGPlJ
 KQdtqe2wZek3Wux8BsKl8vcbqhgNH/LKgQzoG2y5P1LuA77MpFkMVkAoxKqbTDbt
 Nx7j147ffZBJHfmUHz2/nWD9r0Exu+abeSPJeO4T52ImhVkr+Pd1nFS8S+mRCHMj
 uJjxL/nB/sZmDDX+EX/zA7Du3ibaVa2po9cuhMTwNIPZIpak8Yyopl64fVm/N7jH
 w0DIc1CgEaA1IkG7lwyKSgB/T6Fsg4SQp8gM4V3BkcTgVDuhTH0J/kGrOk2+YFSc
 akk3420XBS4Q54BQ547woOImabxgQXDBvqBq+DhJFwP1qSllUXbZX7rlwZ3VQ160
 sfmItVM0c4J9bgaXqZuwqHxJdgakaIECkXWZwpksQAzVxaOKpZo7drLq6SDhX9HH
 BZdm/5AhIJ5rIGaiMXsZj5cC+H341N5TlaXA+I2b0r/vVOLtbe3it1rbSsvMoZJQ
 7WOesyqFSjSObDUpXZ0riLl1X+rdrCAfzHsm5IMwLAoxmv80973johZKNZIgqIoh
 CbPdyvaJoNK8FK6gT7bw3HNJ1ILGqk53jpWH1Gr1MlfzSzErOdQ=
 =5Xi5
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fix from Borislav Petkov:
 "A single fix for early boot crashes of kernels built with gcc10 and
  stack protector enabled"

* tag 'x86_urgent_for_v5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Fix early boot crash on gcc-10, third try
2020-05-17 11:08:29 -07:00
Eric W. Biederman
f87d1c9559 exec: Move would_dump into flush_old_exec
I goofed when I added mm->user_ns support to would_dump.  I missed the
fact that in the case of binfmt_loader, binfmt_em86, binfmt_misc, and
binfmt_script bprm->file is reassigned.  Which made the move of
would_dump from setup_new_exec to __do_execve_file before exec_binprm
incorrect as it can result in would_dump running on the script instead
of the interpreter of the script.

The net result is that the code stopped making unreadable interpreters
undumpable.  Which allows them to be ptraced and written to disk
without special permissions.  Oops.

The move was necessary because the call in set_new_exec was after
bprm->mm was no longer valid.

To correct this mistake move the misplaced would_dump from
__do_execve_file into flos_old_exec, before exec_mmap is called.

I tested and confirmed that without this fix I can attach with gdb to
a script with an unreadable interpreter, and with this fix I can not.

Cc: stable@vger.kernel.org
Fixes: f84df2a6f2 ("exec: Ensure mm->user_ns contains the execed files")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-05-17 10:48:24 -05:00
Linus Torvalds
5a9ffb954a three small cifs/smb3 fixes, one for stable
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAl7Ao9QACgkQiiy9cAdy
 T1GjGgv+L2zqdaHOFaFWFsQejY5DjQ7U7EjwMvCLBoM1RgTIPosQCdwo8EqNkPm/
 fHtHVyG7I2vHjv9zmcxPPphasHOl/WwDZf8VP9u+cRCH+/2NRTZziCqW1kFpi4ET
 q88K5DWD6FMZVZZxP+mlJKLws3Za+I0wujx3VylbRfX20mniFLNFGQyNA3TCCw8k
 gEiv4TUE9dhzX+PULLIL3/63ZIYay3IfwN3GTuLIdOMlINGj1DxfrXX1VTVeiNKb
 uuii5Sb5XbHt0+ZylU787Sbvr7t61GZXDjBwKV12o/P2kcRc2BklekkDbs20FZJZ
 g5rkmcUYY0atDEak6MYr931QE8LgotsQL9aH9n5Mb7Hra1t9/lfjoDov+KoA83S8
 7dnzBaDzrlsbij15DADYrg0ygJe/zcUHq88ETc8UH4dQUrnhJQSn5Tf+xASP8H0N
 SJgpGxIQs30rMUKstFz1/xD1yJ30M55kxhU5Lchre5WuGorGR57c7Kv/j2O2leR8
 iiddqtPE
 =aPOp
 -----END PGP SIGNATURE-----

Merge tag '5.7-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "Three small cifs/smb3 fixes, one for stable"

* tag '5.7-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix leaked reference on requeued write
  cifs: Fix null pointer check in cifs_read
  CIFS: Spelling s/EACCESS/EACCES/
2020-05-16 21:43:11 -07:00
Linus Torvalds
5d438e071f A new testcase for guest debugging (gdbstub) that exposed a bunch of
bugs, mostly for AMD processors.  And a few other x86 fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl6/0xcUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOZuwf/bQZw/SP9awLjOOVsRaSWUmwRGD4q
 6KVq9+JYsPU4CyJ7P+vdsFF39a0ixoAnKWqRe/vsXdXZrdYCDUuQxh+7X+lmjKAb
 dCQBnoqxI0w3yuxrm9Kn6Xs1AGIWibaRlZnXUKbuyn4ecFrh08OfYKGkYsEovhxK
 G4ftY4/xyM7Qvm0fq7ZmzxPrkzd74HDZBvB83R6uiyPiX3w4O9qumqkUogcVXIJX
 l3mnvSPClDDX4FOr8uhnU93varuR7Bek4Fh+Abj4uNks/F3z9ooJO9Hy9E+V5fhY
 g6Oj2IrxDwJ2G6hqyucr1kujukJC1bX2nMZ1O4gNayXsxZEU/JtI0Y26SA==
 =EzBt
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "A new testcase for guest debugging (gdbstub) that exposed a bunch of
  bugs, mostly for AMD processors. And a few other x86 fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Fix off-by-one error in kvm_vcpu_ioctl_x86_setup_mce
  KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.c
  KVM: SVM: Disable AVIC before setting V_IRQ
  KVM: Introduce kvm_make_all_cpus_request_except()
  KVM: VMX: pass correct DR6 for GD userspace exit
  KVM: x86, SVM: isolate vcpu->arch.dr6 from vmcb->save.dr6
  KVM: SVM: keep DR6 synchronized with vcpu->arch.dr6
  KVM: nSVM: trap #DB and #BP to userspace if guest debugging is on
  KVM: selftests: Add KVM_SET_GUEST_DEBUG test
  KVM: X86: Fix single-step with KVM_SET_GUEST_DEBUG
  KVM: X86: Set RTM for DB_VECTOR too for KVM_EXIT_DEBUG
  KVM: x86: fix DR6 delivery for various cases of #DB injection
  KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properly
2020-05-16 13:39:22 -07:00
Linus Torvalds
befc42e5dd powerpc fixes for 5.7 #4
A fix for unrecoverable SLB faults in the interrupt exit path, introduced by the
 recent rewrite of interrupt exit in C.
 
 Four fixes for our KUAP (Kernel Userspace Access Prevention) support on 64-bit.
 These are all fairly minor with the exception of the change to evaluate the
 get/put_user() arguments before we enable user access, which reduces the amount
 of code we run with user access enabled.
 
 A fix for our secure boot IMA rules, if enforcement of module signatures is
 enabled at runtime rather than build time.
 
 A fix to our 32-bit VDSO clock_getres() which wasn't falling back to the syscall
 for unknown clocks.
 
 A build fix for CONFIG_PPC_KUAP_DEBUG on 32-bit BookS, and another for 40x.
 
 Thanks to:
   Christophe Leroy, Hugh Dickins, Nicholas Piggin, Aurelien Jarno, Mimi Zohar,
   Nayna Jain.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl6/z6ATHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgKM8D/9ibff532jpCewfwNNXztn3SxRpboQJ
 MgSpnqBlhjoezu84oPwGTPSBYiV/m4hVs5+vUVELKN1cvSmGFkRMoxI80MD3sPEj
 DWoyKmRpG2Uz6tSbU1H/1fA2MEHmG+saYE1g8k0S2m03mRluD4hcSW7jgHiTbFvc
 dC2i/fpf9SloHjEAu/3Wq/j3/6PI8pEoOvZozeLdusipLQgH7zODcW7noV1vF/pK
 BG4me8EVd7QImRRsJrPSlq/292wlSG6G+AwIlEYQbEv4rMqVNzc1WMPlBndHC27B
 ROT+84Ol1eX+jmbq+NTGpZFoWoozmZfTyEhVIQy8IMEiJjhsrh8nBLPP/KiDXjIA
 yujFgP4xfeLbYU+jydBrxun+Kd/DQxbnAIgJ3rJZdbGG2anqt9oWK+88+Y7rnnAY
 wjong1gQVPGr+y0Hj/6ahnXEs2En7W6dLw/EOHDRzazusFRj50a4JGAB1y2WAKHd
 yfa/6KWTUOvMvkkldPiIV1Uz+LFiUUGcHcnx67Fi19lj15W1JzTiHXqs15Ka0n8E
 X2n247d5rD55cS7owqfwBtso5zMlFITuPvr0lMv6hTP0bs0OuTiWCh+od3UeGKyy
 Rxa7piq3/8yB16cqEgbnQmcP0rkUD0WNGQamc2/02buD+GLV6DbgSSSDixUKyVJi
 0/vi7nqaKVbIsg==
 =XkA+
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.7-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - A fix for unrecoverable SLB faults in the interrupt exit path,
   introduced by the recent rewrite of interrupt exit in C.

 - Four fixes for our KUAP (Kernel Userspace Access Prevention) support
   on 64-bit. These are all fairly minor with the exception of the
   change to evaluate the get/put_user() arguments before we enable user
   access, which reduces the amount of code we run with user access
   enabled.

 - A fix for our secure boot IMA rules, if enforcement of module
   signatures is enabled at runtime rather than build time.

 - A fix to our 32-bit VDSO clock_getres() which wasn't falling back to
   the syscall for unknown clocks.

 - A build fix for CONFIG_PPC_KUAP_DEBUG on 32-bit BookS, and another
   for 40x.

Thanks to: Christophe Leroy, Hugh Dickins, Nicholas Piggin, Aurelien
Jarno, Mimi Zohar, Nayna Jain.

* tag 'powerpc-5.7-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/40x: Make more space for system call exception
  powerpc/vdso32: Fallback on getres syscall when clock is unknown
  powerpc/32s: Fix build failure with CONFIG_PPC_KUAP_DEBUG
  powerpc/ima: Fix secure boot rules in ima arch policy
  powerpc/64s/kuap: Restore AMR in fast_interrupt_return
  powerpc/64s/kuap: Restore AMR in system reset exception
  powerpc/64/kuap: Move kuap checks out of MSR[RI]=0 regions of exit code
  powerpc/64s: Fix unrecoverable SLB crashes due to preemption check
  powerpc/uaccess: Evaluate macro arguments once, before user access is allowed
2020-05-16 13:34:45 -07:00
Linus Torvalds
26b089a7fc csky updates for 5.7-rc6
10 Fixups for this updates :P
 
  - 1 fixup for copy_from/to_user (a hard-to-find bug, thx Viro)
  - 1 fixup for calltrace panic without FRAME_POINT
  - 2 fixups for perf
  - 2 fixups for compile error
  - 4 fixups for non-fatal bugs:
    (msa, rm dis_irq, cleanup psr, gdbmacros.txt)
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEE2KAv+isbWR/viAKHAXH1GYaIxXsFAl6/sPwSHGd1b3JlbkBr
 ZXJuZWwub3JnAAoJEAFx9RmGiMV74vcP/1OJG6Kji2mzxet/ciLk90ntuXfij9sY
 qrEq2d+uikGqRn+T9+ebi+TMSd8DTi4SdGxiiH3o35+mF9N2cDQegjRbk5/P4h4e
 3G8+C5HuG654qjMqUO6FPoTZrEWHKN8qe2fCQGfpFvZb54C6M1CcIEjfE6KFcxZp
 2SDtGBk2Sx/F/oLNGSXXvm83WZY+vRZQUpKT9dJZzTd+nQaeE31YBpXeGPHnw7EP
 brqa3kb7tPYLpPLhb046bjWW0CqVbACqIRdDW0rRko5gPF1RiXzUdvaAvzfzifor
 E3FLoCfP7LnoK5mDVDyLwjkmRPM3RgYpCBy4wUTKVJF9rRd3Hvytd5OswiS+gxds
 jyrr+ctII4/ENTsvxbDEoFCDox07sIwOlMkPQ+AY+gDjI2pBBxvONE+TQH1nmTkg
 pzutY/e6C3MB/dAvwF9ICATuJb/Ho8Xj2xgkzKVg1JCp5XB1Apv+Ll4+kbDvrBB5
 WVwpIUI+uTdxbwN6nbFL+chpyvSKN8TbmG8h1OThFfLgYOqc2ZGBFaBtjX8D+yk0
 8l+arVCKa1Zx8XJiC1mP9JbL14Ck9HvU4qnOwWOFaYxWSP2I6W/QYGejE6pVpEOH
 hqYSkl2k0ZzaEQ29ZzM8pf9b6EPlg1t20PAg+pnXyavnLurfRCDH8wJVkGVWA+mP
 8ZL2MemlYNXu
 =9/IQ
 -----END PGP SIGNATURE-----

Merge tag 'csky-for-linus-5.7-rc6' of git://github.com/c-sky/csky-linux

Pull csky updates from Guo Ren:

 - fix for copy_from/to_user (a hard-to-find bug, thx Viro)

 - fix for calltrace panic without FRAME_POINT

 - two fixes for perf

 - two build fixes

 - four fixes for non-fatal bugs (msa, rm dis_irq, cleanup psr,
   gdbmacros.txt)

* tag 'csky-for-linus-5.7-rc6' of git://github.com/c-sky/csky-linux:
  csky: Fixup raw_copy_from_user()
  csky: Fixup gdbmacros.txt with name sp in thread_struct
  csky: Fixup remove unnecessary save/restore PSR code
  csky: Fixup remove duplicate irq_disable
  csky: Fixup calltrace panic
  csky: Fixup perf callchain unwind
  csky: Fixup msa highest 3 bits mask
  csky: Fixup perf probe -x hungup
  csky: Fixup compile error for abiv1 entry.S
  csky/ftrace: Fixup error when disable CONFIG_DYNAMIC_FTRACE
2020-05-16 13:31:40 -07:00
Linus Torvalds
5c33696f2b ARM: SoC/dt fixes for v5.7
This round of fixes is almost exclusively device tree changes,
 with trivial defconfig fixes and one compiler warning fix
 added in.
 
 A number of patches are to fix dtc warnings, in particular on
 Amlogic, i.MX and Rockchips.
 
 Other notable changes include:
 
 Renesas:
  - Fix a wrong clock configuration on R-Mobile A1,
  - Fix IOMMU support on R-Car V3H
 
 Allwinner
  - Multiple audio fixes
 
 Qualcomm
  - Use a safe CPU voltage on MSM8996
  - Fixes to match a late audio driver change
 
 Rockchip:
  - Some fixes for the newly added Pinebook Pro
 
 NXP i.MX:
  - Fix I2C1 pinctrl configuration for i.MX27 phytec-phycard board.
  - Fix imx6dl-yapp4-ursa board Ethernet connection.
 
 OMAP:
  - A regression fix for non-existing can device on am534x-idk
  - Fix flakey wlan on droid4 where some devices would not connect
    at all because of internal pull being used with an external pull
  - Fix occasional missed wake-up events on droid4 modem uart
 
 Signed-off-by: Arnd Bergmann <arnd@arndb.de>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAl6/Dx8ACgkQmmx57+YA
 GNmRJw/+MjrVUuD+KHjkfOX68zJ4puLluj1PGn//t0DnQPkrmPnDMP0gs/62cMNc
 4U3SIlYCivSOiz6N6gyvEU+j/sMqzCn7J3jBsUqCEvcOBm0X3b1T6OAwK8kp1TVq
 zAbEs/YaHBYKPIUOrstZwLkTCDLUXQlqPs7aEubyZ+awqa+EW9joGDyrus53jCIP
 pbaLV+52TGpOENfKRv05k6eJrtLfNM6Yt5qoCPRE4DYvbSumwXarPT/WOC1V5C/9
 KUwh3bVi6bRUEuhIHomnKqLK+GA8DMVw+HU9vNHDHexbdSQGsIMuFrCyWktwEizq
 7FTAKpEiuFJkOD6eyNCe3x5f2W5isDRvV8ehULaCRa8CbiBJUGY9wOdllMAYLuna
 haFGjJ8aKIPwgmoxjFL534hXprAMAGAAm8ZUcULIS+X8/8R2c+fzIadJ6ZS+VmNh
 2Nq4qNAPh8pTJJXzM9oNtI7HQLwOkAp49/r5FOrSReUFQVt84ofHy/QxAUdDMTkq
 7250L8G+SrpDQiNxck2pOYIKNqQcRUeJU60fTlkQlljBIjKOTn9zUrEhYLpujD3g
 vKLBmpek67fHfXB/sBpFOALCYw7prRacIyZrDHUKgEhP85WnkXXoZ+NQL5edgZGC
 qZ+h5xeRc6eBFYSP30HYQYL4fhTYokvWSdu5AbWBdJlAoAHk1mQ=
 =fXzR
 -----END PGP SIGNATURE-----

Merge tag 'arm-soc-fixes-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC/dt fixes from Arnd Bergmann:
 "This round of fixes is almost exclusively device tree changes, with
  trivial defconfig fixes and one compiler warning fix added in.

  A number of patches are to fix dtc warnings, in particular on Amlogic,
  i.MX and Rockchips.

  Other notable changes include:

  Renesas:
   - Fix a wrong clock configuration on R-Mobile A1
   - Fix IOMMU support on R-Car V3H

  Allwinner
   - Multiple audio fixes

  Qualcomm
   - Use a safe CPU voltage on MSM8996
   - Fixes to match a late audio driver change

  Rockchip:
   - Some fixes for the newly added Pinebook Pro

  NXP i.MX:
   - Fix I2C1 pinctrl configuration for i.MX27 phytec-phycard board
   - Fix imx6dl-yapp4-ursa board Ethernet connection

  OMAP:
   - A regression fix for non-existing can device on am534x-idk
   - Fix flakey wlan on droid4 where some devices would not connect at
     all because of internal pull being used with an external pull
   - Fix occasional missed wake-up events on droid4 modem uart"

* tag 'arm-soc-fixes-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (51 commits)
  ARM: dts: iwg20d-q7-dbcm-ca: Remove unneeded properties in hdmi@39
  ARM: dts: renesas: Make hdmi encoder nodes compliant with DT bindings
  arm64: dts: renesas: Make hdmi encoder nodes compliant with DT bindings
  arm64: defconfig: add MEDIA_PLATFORM_SUPPORT
  arm64: defconfig: ARCH_R8A7795: follow changed config symbol name
  arm64: defconfig: add DRM_DISPLAY_CONNECTOR
  arm64: defconfig: DRM_DUMB_VGA_DAC: follow changed config symbol name
  ARM: oxnas: make ox820_boot_secondary static
  ARM: dts: r8a7740: Add missing extal2 to CPG node
  ARM: dts: omap4-droid4: Fix occasional lost wakeirq for uart1
  ARM: dts: omap4-droid4: Fix flakey wlan by disabling internal pull for gpio
  arm64: dts: allwinner: a64: Remove unused SPDIF sound card
  arm64: dts: allwinner: a64: pinetab: Fix cpvdd supply name
  arm64: dts: meson-g12: remove spurious blank line
  arm64: dts: meson-g12b-khadas-vim3: add missing frddr_a status property
  arm64: dts: meson-g12-common: fix dwc2 clock names
  arm64: dts: meson-g12b-ugoos-am6: fix usb vbus-supply
  arm64: dts: freescale: imx8mp: update input_val for AUDIOMIX_BIT_STREAM
  ARM: dts: r7s9210: Remove bogus clock-names from OSTM nodes
  ARM: dts: rockchip: fix pinctrl sub nodename for spi in rk322x.dtsi
  ...
2020-05-16 13:27:58 -07:00