With commit:
  2d4056fafa ("sched/numa: Remove numa_has_capacity()")
the local variables 'smt', 'cpus' and 'capacity' and their results are no
longer used in numa_has_capacity().
Remove this unused code.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1535548752-4434-2-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
LLVM has a warning that tags expressions like:
if (foo && non-bool-const)
This pattern triggers for CONFIG_SCHED_DEBUG=n where sched_feat() ends
up being whatever bit we select. Avoid the warning with an explicit
cast to bool.
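For reference, the change boils down to an explicit cast in the
!CONFIG_SCHED_DEBUG definition, along these lines (illustrative sketch,
not the exact diff):
    /* Sketch: make sched_feat() yield a real bool for the static case. */
    #define sched_feat(x) ((bool)(sysctl_sched_features & (1UL << __SCHED_FEAT_##x)))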
Reported-by: Philipp Klocke <philipp97kl@gmail.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The 'prefer sibling' sched_domain flag is intended to encourage
spreading tasks to sibling sched_domains to take advantage of more
caches and cores on SMT systems. It has recently been changed to be set
on all non-NUMA topology levels. However, spreading across domains with
CPU capacity asymmetry isn't desirable: spreading from high capacity to
low capacity CPUs, even if the high capacity CPUs aren't overutilized,
might give access to more cache, but the CPUs will be slower, possibly
leading to worse overall throughput.
To prevent this, we need to remove SD_PREFER_SIBLING on the sched_domain
level immediately below SD_ASYM_CPUCAPACITY.
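A sketch of the kind of change described, assuming the flag is stripped
while the topology is built (illustrative, not the actual patch):
    /* Don't let the level below the asymmetric one spread to siblings. */
    if (sd->flags & SD_ASYM_CPUCAPACITY && sd->child)
    	sd->child->flags &= ~SD_PREFER_SIBLING;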
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: gaku.inami.xh@renesas.com
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Link: http://lkml.kernel.org/r/1530699470-29808-13-git-send-email-morten.rasmussen@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When lower capacity CPUs are load balancing and considering pulling
something from a higher capacity group, we should not pull tasks from a
CPU with only one task running, as this is guaranteed to impede progress
for that task. If there is more than one task running, load balance in
the higher capacity group would have already made any possible moves to
resolve imbalance, and we should make better use of system compute
capacity by moving a task if we still have more than one running.
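Illustratively, the resulting check when scanning for the busiest queue
could look like this (sketch; helper names such as capacity_of() as used
elsewhere in kernel/sched/fair.c):
    /* Don't pull the only running task from a higher capacity CPU; it
     * would only be slowed down on the destination. */
    if (env->sd->flags & SD_ASYM_CPUCAPACITY &&
        capacity_of(env->dst_cpu) < capacity_of(cpu) &&
        rq->nr_running == 1)
    	continue;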
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: gaku.inami.xh@renesas.com
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Link: http://lkml.kernel.org/r/1530699470-29808-11-git-send-email-morten.rasmussen@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Idle balance is a great opportunity to pull a misfit task. However,
there are scenarios where misfit tasks are present but idle balance is
prevented by the overload flag.
A good example of this is a workload of n identical tasks. Let's suppose
we have a 2+2 Arm big.LITTLE system. We then spawn 4 fairly
CPU-intensive tasks - for the sake of simplicity let's say they are just
CPU hogs, even when running on big CPUs.
They are identical tasks, so on an SMP system they should all end at
(roughly) the same time. However, in our case the LITTLE CPUs are slower
than the big CPUs, so tasks running on the LITTLEs will have a longer
completion time.
This means that the big CPUs will complete their work earlier, at which
point they should pull the tasks from the LITTLEs. What we want to
happen is summarized as follows:
  a,b,c,d are our CPU-hogging tasks
  _ signifies idling
  LITTLE_0 | a a a a _ _
  LITTLE_1 | b b b b _ _
  ---------|-------------
     big_0 | c c c c a a
     big_1 | d d d d b b
                     ^
                     ^
     Tasks end on the big CPUs, idle balance happens
     and the misfit tasks are pulled straight away
This however won't happen, because currently the overload flag is only
set when there is any CPU that has more than one runnable task - which
may very well not be the case here if our CPU-hogging workload is all
there is to run.
As such, this commit sets the overload flag in update_sg_lb_stats when
a group is flagged as having a misfit task.
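A sketch of what the update_sg_lb_stats() change amounts to
(illustrative; field names assumed from the surrounding series):
    /* A group with a misfit task counts as overloaded so that idle
     * balance is allowed to pull the task later. */
    if (env->sd->flags & SD_ASYM_CPUCAPACITY &&
        sgs->group_misfit_task_load < rq->misfit_task_load) {
    	sgs->group_misfit_task_load = rq->misfit_task_load;
    	*overload = true;
    }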
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: gaku.inami.xh@renesas.com
Cc: vincent.guittot@linaro.org
Link: http://lkml.kernel.org/r/1530699470-29808-10-git-send-email-morten.rasmussen@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This variable can be read and set locklessly within update_sd_lb_stats().
As such, READ/WRITE_ONCE() are added to make sure nothing terribly wrong
can happen because of the compiler.
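For illustration, the lockless accesses end up looking roughly like this
(sketch, not the exact hunk):
    /* Written and read without the rq lock; annotate the accesses. */
    if (READ_ONCE(env->dst_rq->rd->overload) != overload)
    	WRITE_ONCE(env->dst_rq->rd->overload, overload);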
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: gaku.inami.xh@renesas.com
Cc: vincent.guittot@linaro.org
Link: http://lkml.kernel.org/r/1530699470-29808-9-git-send-email-morten.rasmussen@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
sizeof(_Bool) is implementation defined, so let's just go with 'int' as
is done for other structures e.g. sched_domain_shared->has_idle_cores.
The local 'overload' variable used in update_sd_lb_stats can remain
bool, as it won't impact any struct layout and can be assigned to the
root_domain field.
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: gaku.inami.xh@renesas.com
Cc: vincent.guittot@linaro.org
Link: http://lkml.kernel.org/r/1530699470-29808-8-git-send-email-morten.rasmussen@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This variable is entirely local to update_sd_lb_stats, so we can
safely change its type and slightly clean up its initialisation.
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: gaku.inami.xh@renesas.com
Cc: vincent.guittot@linaro.org
Link: http://lkml.kernel.org/r/1530699470-29808-7-git-send-email-morten.rasmussen@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There already are a few conditions in nohz_kick_needed() to ensure
a nohz kick is triggered, but they are not enough for some misfit
task scenarios. Excluding asym packing, those are:
- rq->nr_running >= 2: Not relevant here because we are running a
misfit task; it needs to be migrated regardless, potentially through
active balance.
- sds->nr_busy_cpus > 1: If the misfit task is the only task running
on a group of low capacity CPUs, this evaluates to false.
- rq->cfs.h_nr_running >= 1 && check_cpu_capacity(): Not relevant here,
the misfit task needs to be migrated regardless of rt/IRQ pressure.
As such, this commit adds an rq->misfit_task_load condition to trigger a
nohz kick.
The idea to kick a nohz balance for misfit tasks originally came from
Leo Yan <leo.yan@linaro.org>, and a similar patch was submitted for
the Android Common Kernel - see:
https://lists.linaro.org/pipermail/eas-dev/2016-September/000551.html
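The added condition amounts to something like this in the nohz kick path
(sketch; not the exact diff):
    /* A misfit task on this rq is reason enough to kick an idle CPU. */
    if (rq->misfit_task_load) {
    	flags = NOHZ_KICK_MASK;
    	goto out;
    }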
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: gaku.inami.xh@renesas.com
Cc: vincent.guittot@linaro.org
Link: http://lkml.kernel.org/r/1530699470-29808-6-git-send-email-morten.rasmussen@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On asymmetric CPU capacity systems load intensive tasks can end up on
CPUs that don't suit their compute demand. In these scenarios, 'misfit'
tasks should be migrated to CPUs with higher compute capacity to ensure
better throughput. group_misfit_task indicates this scenario, but tweaks
to the load-balance code are needed to make the migrations happen.
Misfit balancing only makes sense between a source group of lower
per-CPU capacity and a destination group of higher compute capacity.
Otherwise, misfit balancing is ignored. group_misfit_task has the lowest
priority, so any imbalance due to overload is dealt with first.
The modifications are:
1. Only pick a group containing misfit tasks as the busiest group if the
destination group has higher capacity and has spare capacity.
2. When the busiest group is a 'misfit' group, skip the usual average
load and group capacity checks.
3. Set the imbalance for 'misfit' balancing sufficiently high for a task
to be pulled ignoring average load.
4. Pick the CPU with the highest misfit load as the source CPU.
5. If the misfit task is alone on the source CPU, go for active
balancing.
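As an illustration of modification 3, the imbalance computation could end
up with something like this (sketch; not the exact patch):
    if (busiest->group_type == group_misfit_task) {
    	/* Make the imbalance large enough for the misfit task to be
    	 * pulled regardless of the average load. */
    	env->imbalance = max_t(long, env->imbalance,
    			       busiest->group_misfit_task_load);
    }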
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: gaku.inami.xh@renesas.com
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Link: http://lkml.kernel.org/r/1530699470-29808-5-git-send-email-morten.rasmussen@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The current sg->min_capacity tracks the lowest per-CPU compute capacity
available in the sched_group when rt/irq pressure is taken into account.
Minimum capacity isn't the ideal metric for tracking whether a
sched_group needs offloading to another sched_group in some scenarios,
e.g. a sched_group with multiple CPUs where only one is under heavy
pressure. Tracking maximum capacity isn't perfect either, but it is a
better choice for some situations, as it indicates that the sched_group
is definitely compute capacity constrained, either due to rt/irq
pressure on all CPUs or asymmetric CPU capacities (e.g. big.LITTLE).
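A sketch of the additional tracking while a group's capacity is
recomputed (illustrative; helper names as used elsewhere in fair.c):
    /* Track both extremes of per-CPU capacity seen in the group. */
    for_each_cpu(cpu, sched_group_span(group)) {
    	unsigned long cpu_cap = capacity_of(cpu);
    	capacity += cpu_cap;
    	min_capacity = min(cpu_cap, min_capacity);
    	max_capacity = max(cpu_cap, max_capacity);
    }
    sdg->sgc->min_capacity = min_capacity;
    sdg->sgc->max_capacity = max_capacity;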
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: gaku.inami.xh@renesas.com
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Link: http://lkml.kernel.org/r/1530699470-29808-4-git-send-email-morten.rasmussen@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
To maximize throughput in systems with asymmetric CPU capacities (e.g.
ARM big.LITTLE), load balancing has to consider task and CPU utilization
as well as per-CPU compute capacity, in addition to the current average
load based load-balancing policy. Tasks with high utilization that are
scheduled on a lower capacity CPU need to be identified and migrated to
a higher capacity CPU if possible, to maximize throughput.
To implement this policy, an additional group_type (load-balance
scenario) is added: 'group_misfit_task'. This represents
scenarios where a sched_group has one or more tasks that are not
suitable for its per-CPU capacity. 'group_misfit_task' is only considered
if the system is not overloaded or imbalanced ('group_imbalanced' or
'group_overloaded').
Identifying misfit tasks requires the rq lock to be held. To avoid
taking remote rq locks to examine source sched_groups for misfit tasks,
each CPU is responsible for tracking its own misfit tasks and updating
the rq->misfit_task flag. This means checking task utilization when
tasks are scheduled and on each sched_tick.
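The per-CPU tracking could look roughly as follows, called from the
wakeup and tick paths (sketch; helper and field names illustrative):
    static inline void update_misfit_status(struct task_struct *p, struct rq *rq)
    {
    	if (!static_branch_unlikely(&sched_asym_cpucapacity))
    		return;
    	/* No task, or the task fits this CPU: nothing to flag. */
    	if (!p || task_fits_capacity(p, capacity_of(cpu_of(rq)))) {
    		rq->misfit_task_load = 0;
    		return;
    	}
    	rq->misfit_task_load = task_h_load(p);
    }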
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: gaku.inami.xh@renesas.com
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Link: http://lkml.kernel.org/r/1530699470-29808-3-git-send-email-morten.rasmussen@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The existing asymmetric CPU capacity code should cause minimal overhead
for others. Putting it behind a static_key, as has been done for SMT
optimizations, would make it easier to extend and improve without
causing harm to others moving forward.
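A sketch of the gating described (names illustrative):
    /* Defined once, enabled only when asymmetric capacities are detected: */
    DEFINE_STATIC_KEY_FALSE(sched_asym_cpucapacity);
    /* Hot paths then only pay for a patched-out branch on symmetric systems: */
    if (!static_branch_unlikely(&sched_asym_cpucapacity))
    	return;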
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: gaku.inami.xh@renesas.com
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Link: http://lkml.kernel.org/r/1530699470-29808-2-git-send-email-morten.rasmussen@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Asymmetric CPU capacity can not necessarily be determined accurately at
the time the initial sched_domain hierarchy is built during boot. It is
therefore necessary to be able to force a full rebuild of the hierarchy
later, triggered by the arch_topology driver. A full rebuild requires the
arch code to implement arch_update_cpu_topology(), which isn't yet
implemented for arm. This patch points the arm implementation to the
arch_topology driver to ensure that a full hierarchy rebuild happens when
needed.
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Link: http://lkml.kernel.org/r/1532093554-30504-5-git-send-email-morten.rasmussen@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Asymmetric CPU capacity can not necessarily be determined accurately at
the time the initial sched_domain hierarchy is built during boot. It is
therefore necessary to be able to force a full rebuild of the hierarchy
later, triggered by the arch_topology driver. A full rebuild requires the
arch code to implement arch_update_cpu_topology(), which isn't yet
implemented for arm64. This patch points the arm64 implementation to the
arch_topology driver to ensure that a full hierarchy rebuild happens when
needed.
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: dietmar.eggemann@arm.com
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Link: http://lkml.kernel.org/r/1532093554-30504-4-git-send-email-morten.rasmussen@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The setting of SD_ASYM_CPUCAPACITY depends on the per-CPU capacities.
These might not have their final values when the hierarchy is initially
built, as the values depend on cpufreq being initialized or on values
being set through sysfs. To ensure that the flags are set correctly we
need to rebuild the sched_domain hierarchy whenever the reported per-CPU
capacity (arch_scale_cpu_capacity()) changes.
This patch ensures that a full sched_domain rebuild happens when CPU
capacity changes occur.
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Link: http://lkml.kernel.org/r/1532093554-30504-3-git-send-email-morten.rasmussen@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The SD_ASYM_CPUCAPACITY sched_domain flag is supposed to mark the
sched_domain in the hierarchy where all CPU capacities are visible from
any CPU's point of view on asymmetric CPU capacity systems. The
scheduler can then take capacity asymmetry into account when balancing
at this level. It also serves as an indicator for how wide task
placement heuristics have to search to consider all available CPU
capacities, as asymmetric systems might often appear symmetric at the
smallest level(s) of the sched_domain hierarchy.
The flag has been around for a while but has so far only been set by
out-of-tree code in Android kernels. One solution is to let each
architecture provide the flag through a custom sched_domain topology
array and associated mask and flag functions. However,
SD_ASYM_CPUCAPACITY is special in the sense that it depends on the
capacity and presence of all CPUs in the system, i.e. when hotplugging
all CPUs out except those with one particular CPU capacity the flag
should disappear even if the sched_domains don't collapse. Similarly,
the flag is affected by cpusets where load-balancing is turned off.
Detecting when the flag should be set therefore depends not only on
topology information but also on the cpuset configuration and hotplug
state. The arch code doesn't have easy access to the cpuset
configuration.
Instead, this patch implements the flag detection in generic code, where
cpusets and hotplug state are already taken care of. All the arch code is
responsible for is implementing arch_scale_cpu_capacity() and forcing a
full rebuild of the sched_domain hierarchy if capacities are updated,
e.g. later in the boot process when cpufreq has initialized.
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Link: http://lkml.kernel.org/r/1532093554-30504-2-git-send-email-morten.rasmussen@arm.com
[ Fixed 'CPU' capitalization. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fix kernel-doc warning for missing 'flags' parameter description:
../kernel/sched/fair.c:3371: warning: Function parameter or member 'flags' not described in 'attach_entity_load_avg'
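The fix is a one-line addition to the function's kernel-doc block, along
these lines (wording illustrative):
    /**
     * attach_entity_load_avg - attach this entity to its cfs_rq load avg
     * @cfs_rq: cfs_rq to attach to
     * @se: sched_entity to attach
     * @flags: migration hint flags passed through to the cpufreq update
     */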
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: ea14b57e8a ("sched/cpufreq: Provide migration hint")
Link: http://lkml.kernel.org/r/cdda0d42-880d-4229-a9f7-5899c977a063@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It can happen that load_balance() finds a busiest group and then a
busiest rq, but the calculated imbalance is in fact 0.
In such a situation, detach_tasks() returns immediately and leaves the
LBF_ALL_PINNED flag set. The busiest CPU is then wrongly assumed to
have pinned tasks and is removed from the load balance mask. Then we
redo the load balance without the busiest CPU. This creates a wrong load
balance situation and generates wrong task migrations.
If the calculated imbalance is 0, it's useless to try to find a
busiest rq as no task will be migrated and we can return immediately.
This situation can happen on heterogeneous or SMP systems when
RT tasks are decreasing the capacity of some CPUs.
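One way to implement the described bail-out is at the tail of
find_busiest_group() (sketch; not necessarily the exact hunk):
    force_balance:
    	/* Looks like there is an imbalance. Compute it: */
    	calculate_imbalance(env, &sds);
    	return env->imbalance ? sds.busiest : NULL;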
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: jhugo@codeaurora.org
Link: http://lkml.kernel.org/r/1536306664-29827-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since commit:
523e979d31 ("sched/core: Use PELT for scale_rt_capacity()")
scale_rt_capacity() returns the remaining capacity and not a scale factor
to apply on cpu_capacity_orig. arch_scale_cpu() is directly called by
scale_rt_capacity() so we must take the sched_domain argument.
Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 523e979d31 ("sched/core: Use PELT for scale_rt_capacity()")
Link: http://lkml.kernel.org/r/20180904093626.GA23936@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When a task which previously ran on a given CPU is remotely queued to
wake up on that same CPU, there is a period where the task's state is
TASK_WAKING and its vruntime is not normalized. This is not accounted
for in vruntime_normalized() which will cause an error in the task's
vruntime if it is switched from the fair class during this time.
For example if it is boosted to RT priority via rt_mutex_setprio(),
rq->min_vruntime will not be subtracted from the task's vruntime but
it will be added again when the task returns to the fair class. The
task's vruntime will have been erroneously doubled and the effective
priority of the task will be reduced.
Note this will also lead to inflation of all vruntimes since the doubled
vruntime value will become the rq's min_vruntime when other tasks leave
the rq. This leads to repeated doubling of the vruntime and priority
penalty.
Fix this by recognizing a WAKING task's vruntime as normalized only if
sched_remote_wakeup is true. This indicates a migration, in which case
the vruntime would have been normalized in migrate_task_rq_fair().
Based on a similar patch from John Dias <joaodias@google.com>.
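A sketch of the resulting check in vruntime_normalized() (illustrative):
    /* A remote wakeup went through migrate_task_rq_fair() and is already
     * normalized; a wakeup queued back onto the same CPU is not. */
    if (p->state == TASK_WAKING && p->sched_remote_wakeup)
    	return true;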
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Steve Muckle <smuckle@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Chris Redpath <Chris.Redpath@arm.com>
Cc: John Dias <joaodias@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Miguel de Dios <migueldedios@google.com>
Cc: Morten Rasmussen <Morten.Rasmussen@arm.com>
Cc: Patrick Bellasi <Patrick.Bellasi@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: kernel-team@android.com
Fixes: b5179ac70d ("sched/fair: Prepare to fix fairness problems on migration")
Link: http://lkml.kernel.org/r/20180831224217.169476-1-smuckle@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
update_blocked_averages() is called to periodically decay the stalled load
of idle CPUs and to sync all loads before running load balance.
When the cfs rq is idle, it triggers a load balance during
pick_next_task_fair() in order to potentially pull tasks and use this
newly idle CPU. This load balance happens while the prev task from another
class has not been put yet and its utilization not updated. This may lead
to wrongly accounting running time as idle time for the RT or DL classes.
Test that no RT or DL task is running when updating their utilization in
update_blocked_averages().
We still update RT and DL utilization instead of simply skipping them to
make sure that all metrics are synced when used during load balance.
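The check could look roughly like this in update_blocked_averages()
(sketch):
    /* Only report 'running' for the class whose task is actually current,
     * so its running time is not decayed as if the CPU were idle. */
    const struct sched_class *curr_class = rq->curr->sched_class;
    update_rt_rq_load_avg(rq_clock_task(rq), rq, curr_class == &rt_sched_class);
    update_dl_rq_load_avg(rq_clock_task(rq), rq, curr_class == &dl_sched_class);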
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 371bf42732 ("sched/rt: Add rt_rq utilization tracking")
Fixes: 3727e0e163 ("sched/dl: Add dl_rq utilization tracking")
Link: http://lkml.kernel.org/r/1535728975-22799-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
With the following commit:
051f3ca02e ("sched/topology: Introduce NUMA identity node sched domain")
the scheduler introduced a new NUMA level. However, this causes the NUMA
topology on 2-node systems to no longer be marked as NUMA_DIRECT.
After this commit, it gets reported as NUMA_BACKPLANE, because
sched_domains_numa_level is now 2 on 2-node systems.
Fix this by allowing systems with up to 2 NUMA levels to be set as
NUMA_DIRECT.
While at it, remove code that assumes that the level can be 0.
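Illustratively, the fix amounts to something like this in
init_numa_topology_type() (sketch; not the exact hunk):
    /* With the identity node level, a 2-node system has two NUMA levels
     * but is still directly connected. */
    if (sched_domains_numa_levels <= 2) {
    	sched_numa_topology_type = NUMA_DIRECT;
    	return;
    }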
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andre Wild <wild@linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Fixes: 051f3ca02e "Introduce NUMA identity node sched domain"
Link: http://lkml.kernel.org/r/1533920419-17410-1-git-send-email-srikar@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 fixes from Thomas Gleixner:
"A set of fixes for x86:
- Prevent multiplication result truncation on 32bit. Introduced with
the early timestamp rework.
- Ensure microcode revision storage is consistent under all
circumstances
- Prevent write tearing of PTEs
- Prevent confusion of user and kernel registers when dumping fatal
signals verbosely
- Make an error return value in a failure path of the vector
allocation negative. Returning EINVAL might make the caller assume
success and cause further wreckage.
- A trivial kernel doc warning fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Use WRITE_ONCE() when setting PTEs
x86/apic/vector: Make error return value negative
x86/process: Don't mix user/kernel regs in 64bit __show_regs()
x86/tsc: Prevent result truncation on 32bit
x86: Fix kernel-doc atomic.h warnings
x86/microcode: Update the new microcode revision unconditionally
x86/microcode: Make sure boot_cpu_data.microcode is up-to-date
Pull timekeeping fixes from Thomas Gleixner:
"Two fixes for timekeeping:
- Revert to the previous kthread based update, which is unfortunately
required due to lock ordering issues. The removal caused boot
failures on old Core2 machines. Add a proper comment why the thread
needs to stay to prevent accidental removal in the future.
- Fix a silly typo in a function declaration"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource: Revert "Remove kthread"
timekeeping: Fix declaration of read_persistent_wall_and_boot_offset()
Pull irqchip fix from Thomas Gleixner:
"A single fix to prevent allocating excessive memory in the GIC/ITS
driver.
While the subject of the patch might suggest otherwise this is a real
fix as some SoCs exceed the memory allocation limits and fail to boot"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3-its: Cap lpi_id_bits to reduce memory footprint
Pull cpu hotplug fixes from Thomas Gleixner:
"Two fixes for the hotplug state machine code:
- Move the misplaced smb() in the hotplug thread function to the
proper place, otherwise a half-updated control struct could be
observed
- Prevent state corruption on error rollback, which causes the state
to advance by one and as a consequence skip it in the bringup
sequence"
* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu/hotplug: Prevent state corruption on error rollback
cpu/hotplug: Adjust misplaced smb() in cpuhp_thread_fun()
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random
Pull random driver fix from Ted Ts'o:
"Fix things so the choice of whether or not to trust RDRAND to
initialize the CRNG is configurable via the boot option
random.trust_cpu={on,off}"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
random: make CPU trust a boot parameter
Merge tag 'kbuild-fixes-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- make setlocalversion more robust about -dirty check
- loosen the pkg-config requirement for Kconfig
- change missing depmod to a warning from an error
- warn modules_install when System.map is missing
* tag 'kbuild-fixes-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: modules_install: warn when missing System.map file
kbuild: make missing $DEPMOD a Warning instead of an Error
kconfig: do not require pkg-config on make {menu,n}config
kconfig: remove a spurious self-assignment
scripts/setlocalversion: git: Make -dirty check more robust
If there is no System.map file for "make modules_install",
scripts/depmod.sh will silently exit with success, having done
nothing. Since this is an unexpected situation, change it to
report a Warning for the missing file. The behavior is not
changed except for the Warning message.
The (previous) silent success and new Warning can be reproduced
by:
$ make mrproper; make defconfig
$ make modules; make modules_install
and since System.map is produced by "make vmlinux", the steps
above omit producing the System.map file.
Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"ARM:
- Fix a VFP corruption in 32-bit guest
- Add missing cache invalidation for CoW pages
- Two small cleanups
s390:
- Fallout from the hugetlbfs support: pfmf interpretion and locking
- VSIE: fix keywrapping for nested guests
PPC:
- Fix a bug where pages might not get marked dirty, causing guest
memory corruption on migration
- Fix a bug causing reads from guest memory to use the wrong guest
real address for very large HPT guests (>256G of memory), leading
to failures in instruction emulation.
x86:
- Fix out of bound access from malicious pv ipi hypercalls
(introduced in rc1)
- Fix delivery of pending interrupts when entering a nested guest,
preventing arbitrarily late injection
- Sanitize kvm_stat output after destroying a guest
- Fix infinite loop when emulating a nested guest page fault and
improve the surrounding emulation code
- Two minor cleanups"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
KVM: LAPIC: Fix pv ipis out-of-bounds access
KVM: nVMX: Fix loss of pending IRQ/NMI before entering L2
arm64: KVM: Remove pgd_lock
KVM: Remove obsolete kvm_unmap_hva notifier backend
arm64: KVM: Only force FPEXC32_EL2.EN if trapping FPSIMD
KVM: arm/arm64: Clean dcache to PoC when changing PTE due to CoW
KVM: s390: Properly lock mm context allow_gmap_hpage_1m setting
KVM: s390: vsie: copy wrapping keys to right place
KVM: s390: Fix pfmf and conditional skey emulation
tools/kvm_stat: re-animate display of dead guests
tools/kvm_stat: indicate dead guests as such
tools/kvm_stat: handle guest removals more gracefully
tools/kvm_stat: don't reset stats when setting PID filter for debugfs
tools/kvm_stat: fix updates for dead guests
tools/kvm_stat: fix handling of invalid paths in debugfs provider
tools/kvm_stat: fix python3 issues
KVM: x86: Unexport x86_emulate_instruction()
KVM: x86: Rename emulate_instruction() to kvm_emulate_instruction()
KVM: x86: Do not re-{try,execute} after failed emulation in L2
KVM: x86: Default to not allowing emulation retry in kvm_mmu_page_fault
...
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"A few more fixes who have trickled in:
- MMC bus width fixup for some Allwinner platforms
- Fix for NULL deref in ti-aemif when no platform data is passed in
- Fix div by 0 in SCMI code
- Add a missing module alias in a new RPi driver"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
memory: ti-aemif: fix a potential NULL-pointer dereference
firmware: arm_scmi: fix divide by zero when sustained_perf_level is zero
hwmon: rpi: add module alias to raspberrypi-hwmon
arm64: allwinner: dts: h6: fix Pine H64 MMC bus width
Merge tag 'sunxi-fixes-for-4.19' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into fixes
Allwinner fixes for 4.19
Just one fix for H6 mmc on the Pine H64: the mmc bus width was missing
from the device tree. This was added in 4.19-rc1.
* tag 'sunxi-fixes-for-4.19' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
arm64: allwinner: dts: h6: fix Pine H64 MMC bus width
Signed-off-by: Olof Johansson <olof@lixom.net>
When page-table entries are set, the compiler might optimize their
assignment by using multiple instructions to set the PTE. This might
turn into a security hazard if the user somehow manages to use the
interim PTE. L1TF does not make our lives easier, making even an interim
non-present PTE a security hazard.
Using WRITE_ONCE() to set PTEs and friends should prevent this potential
security hazard.
I skimmed the differences in the binary with and without this patch. The
differences are (obviously) greater when CONFIG_PARAVIRT=n as more
code optimizations are possible. For better or worse, the impact on the
binary with this patch is pretty small. Skimming the code did not cause
anything to jump out as a security hazard, but it seems that at least
move_soft_dirty_pte() caused set_pte_at() to use multiple writes.
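For illustration, the setters end up as a single, tearing-free store
along these lines (sketch of one helper):
    static inline void native_set_pte(pte_t *ptep, pte_t pte)
    {
    	WRITE_ONCE(*ptep, pte);		/* was: *ptep = pte; */
    }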
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180902181451.80520-1-namit@vmware.com
activate_managed() returns EINVAL instead of -EINVAL in case of
error. While this is unlikely to happen, the positive return value would
cause further malfunction at the call site.
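The fix is simply to return the negative errno in that failure path,
roughly (sketch; the surrounding condition is illustrative):
    if (WARN_ON_ONCE(cpumask_empty(vector_searchmask)))
    	return -EINVAL;		/* was: return EINVAL; */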
Fixes: 2db1f959d9 ("x86/vector: Handle managed interrupts proper")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Pull i2c fixes from Wolfram Sang:
- bugfixes for uniphier, i801, and xiic drivers
- ID removal (never produced) for imx
- one MAINTAINER addition
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: xiic: Record xilinx i2c with Zynq fragment
i2c: xiic: Make the start and the byte count write atomic
i2c: i801: fix DNV's SMBCTRL register offset
i2c: imx-lpi2c: Remove mx8dv compatible entry
dt-bindings: imx-lpi2c: Remove mx8dv compatible entry
i2c: uniphier-f: issue STOP only for last message or I2C_M_STOP
i2c: uniphier: issue STOP only for last message or I2C_M_STOP
Fix the cell specification mechanism to allow cells to be pre-created
without having to specify at least one address (the addresses will be
upcalled for).
This allows the cell information preload service to avoid the need to issue
loads of DNS lookups during boot to get the addresses for each cell (500+
lookups for the 'standard' cell list[*]). The lookups can be done later as
each cell is accessed through the filesystem.
Also remove the print statement that prints a line every time a new cell is
added.
[*] There are 144 cells in the list. Each cell is first looked up for an
SRV record, and if that fails, for an AFSDB record. These get a list
of server names, each of which then has to be looked up to get the
addresses for that server. E.g.:
dig srv _afs3-vlserver._udp.grand.central.org
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull MD fixes from Shaohua Li:
- Fix a locking issue for md-cluster (Guoqing)
- Fix a sync crash for raid10 (Ni)
- Fix a reshape bug with raid5 cache enabled (me)
* tag 'md/4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
md-cluster: release RESYNC lock after the last resync message
RAID10 BUG_ON in raise_barrier when force is true and conf->barrier is 0
md/raid5-cache: disable reshape completely
Merge tag 'ceph-for-4.19-rc3' of https://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"Two rbd patches to complete support for images within namespaces that
went into -rc1 and a use-after-free fix.
The rbd changes have been sitting in a branch for quite a while but
couldn't be included into the -rc1 pull request because of a pending
wire protocol backwards compatibility fixup that only got committed
early this week"
* tag 'ceph-for-4.19-rc3' of https://github.com/ceph/ceph-client:
rbd: support cloning across namespaces
rbd: factor out get_parent_info()
ceph: avoid a use-after-free in ceph_destroy_options()
Merge tag 'for_v4.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify fix from Jan Kara:
"A small fsnotify fix from Amir"
* tag 'for_v4.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fsnotify: fix ignore mask logic in fsnotify()
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fix from Will Deacon:
"Just one small fix here, preventing a VM_WARN_ON when a !present
PMD/PUD is "freed" as part of a huge ioremap() operation.
The correct behaviour is to skip the free silently in this case, which
is a little weird (the function is a bit of a misnomer), but it
follows the x86 implementation"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: fix erroneous warnings in page freeing functions
Merge tag 'acpi-4.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix a regression from the 4.18 cycle in the ACPI driver for
Intel SoCs (LPSS) and prevent dmi_check_system() from being called on
non-x86 systems in the ACPI core.
Specifics:
- Fix a power management regression in the ACPI driver for Intel SoCs
(LPSS) introduced by a system-wide suspend/resume fix during the
4.18 cycle (Zhang Rui).
- Prevent dmi_check_system() from being called on non-x86 systems in
the ACPI core (Jean Delvare)"
* tag 'acpi-4.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / LPSS: Force LPSS quirks on boot
ACPI / bus: Only call dmi_check_system() on X86
Merge tag 'sound-4.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Just a few small fixes:
- a fix for the recursive work cancellation in a specific HD-audio
operation mode
- a fix for potentially uninitialized memory access via rawmidi
- the register bit access fixes for ASoC HD-audio"
* tag 'sound-4.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda: Fix several mismatch for register mask and value
ALSA: rawmidi: Initialize allocated buffers
ALSA: hda - Fix cancel_work_sync() stall from jackpoll work
Dan Carpenter reported that the untrusted data returned from kvm_register_read()
results in the following static checker warning:
arch/x86/kvm/lapic.c:576 kvm_pv_send_ipi()
error: buffer underflow 'map->phys_map' 's32min-s32max'
A KVM guest can easily trigger this by executing the following assembly
sequence in Ring 0:
mov $10, %rax
mov $0xFFFFFFFF, %rbx
mov $0xFFFFFFFF, %rdx
mov $0, %rsi
vmcall
As this will cause KVM to execute the following code-path:
vmx_handle_exit() -> handle_vmcall() -> kvm_emulate_hypercall() -> kvm_pv_send_ipi()
which will reach out-of-bounds access.
This patch fixes it by adding a check in kvm_pv_send_ipi() against
map->max_apic_id, ignoring destinations that are not present and
delivering the rest. We also check whether map->phys_map[min + i] is
NULL: since max_apic_id is set to the highest APIC ID, some phys_map
entries may be NULL when APIC IDs are sparse, especially as KVM
unconditionally sets max_apic_id to 255 to reserve enough space for any
xAPIC ID.
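A sketch of the bounds checking described above (illustrative; loosely
following the wording of the fix):
    if (min > map->max_apic_id)
    	goto out;
    /* Deliver only to destinations that actually exist in the APIC map. */
    for_each_set_bit(i, &ipi_bitmap_low,
    		 min((u32)BITS_PER_LONG, (map->max_apic_id - min + 1))) {
    	if (map->phys_map[min + i]) {
    		vcpu = map->phys_map[min + i]->vcpu;
    		count += kvm_apic_set_irq(vcpu, &irq, NULL);
    	}
    }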
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
[Add second "if (min > map->max_apic_id)" to complete the fix. -Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Consider the case where L1 had an IRQ/NMI event pending until it executed
VMLAUNCH/VMRESUME, which wasn't delivered because it was disallowed
(e.g. interrupts disabled). When L1 executes VMLAUNCH/VMRESUME,
L0 needs to evaluate if this pending event should cause an exit from
L2 to L1 or be delivered directly to L2 (e.g. in case L1 doesn't intercept
EXTERNAL_INTERRUPT).
Usually this would be handled by L0 requesting an IRQ/NMI window
by setting the VMCS accordingly. However, this setting was done on
VMCS01 and now VMCS02 is active instead. Thus, when L1 executes
VMLAUNCH/VMRESUME, we force L0 to perform pending event evaluation by
requesting a KVM_REQ_EVENT.
Note that the above scenario exists when L1 KVM is about to enter L2 but
requests an "immediate-exit", as in this case L1 will
disable interrupts and then send a self-IPI before entering L2.
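The forced re-evaluation boils down to something like this after
switching to VMCS02 (sketch; 'evaluate_pending_interrupts' is an
illustrative local computed from the VMCS controls):
    /* Re-evaluate pending IRQ/NMI now that vmcs02 is loaded, so the event
     * either causes an exit to L1 or is injected into L2. */
    if (unlikely(evaluate_pending_interrupts))
    	kvm_make_request(KVM_REQ_EVENT, vcpu);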
Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Merge tag 'kvm-arm-fixes-for-v4.19-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm
Fixes for KVM/ARM for Linux v4.19 v2:
- Fix a VFP corruption in 32-bit guest
- Add missing cache invalidation for CoW pages
- Two small cleanups