Commit Graph

590068 Commits

Author SHA1 Message Date
Howard Cochran
74d3694433 writeback: Fix performance regression in wb_over_bg_thresh()
Commit 947e9762a8 ("writeback: update wb_over_bg_thresh() to use
wb_domain aware operations") unintentionally changed this function's
meaning from "are there more dirty pages than the background writeback
threshold" to "are there more dirty pages than the writeback threshold".
The background writeback threshold is typically half of the writeback
threshold, so this had the effect of raising the number of dirty pages
required to cause a writeback worker to perform background writeout.

This can cause a very severe performance regression when a BDI uses
BDI_CAP_STRICTLIMIT because balance_dirty_pages() and the writeback worker
can now disagree on whether writeback should be initiated.

For example, in a system having 1GB of RAM, a single spinning disk, and a
"pass-through" FUSE filesystem mounted over the disk, application code
mmapped a 128MB file on the disk and was randomly dirtying pages in that
mapping.

Because FUSE uses strictlimit and has a default max_ratio of only 1%, in
balance_dirty_pages, thresh is ~200, bg_thresh is ~100, and the
dirty_freerun_ceiling is the average of those, ~150. So, it pauses the
dirtying processes when we have 151 dirty pages and wakes up a background
writeback worker. But the worker tests the wrong threshold (200 instead of
100), so it does not initiate writeback and just returns.

Thus, balance_dirty_pages keeps looping, sleeping and then waking up the
worker who will do nothing. It remains stuck in this state until the few
dirty pages that we have finally expire and we write them back for that
reason. Then the whole process repeats, resulting in near-zero throughput
through the FUSE BDI.

The fix is to call the parameterized variant of wb_calc_thresh, so that the
worker will do writeback if the bg_thresh is exceeded which was the
behavior before the referenced commit.

Fixes: 947e9762a8 ("writeback: update wb_over_bg_thresh() to use wb_domain aware operations")
Signed-off-by: Howard Cochran <hcochran@kernelspring.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org> # v4.2+
Tested-by Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-05 15:44:55 -06:00
Vladimir Murzin
ec953b70f3 ARM: 8573/1: domain: move {set,get}_domain under config guard
Recursive undefined instrcution falut is seen with R-class taking an
exception. The reson for that is __show_regs() tries to get domain
information, but domains is not available on !MMU cores, like R/M
class.

Fix it by puting {set,get}_domain functions under CONFIG_CPU_CP15_MMU
guard and providing stubs for the case where domains is not supported.

Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2016-05-05 19:03:02 +01:00
Jean-Philippe Brucker
5b526bd925 ARM: 8572/1: nommu: change memory reserve for the vectors
Commit 19accfd3 (ARM: move vector stubs) moved the vector stubs in an
additional page above the base vector one. This change wasn't taken into
account by the nommu memreserve.
This patch ensures that the kernel won't overwrite any vector stub on
nommu.

[changed the MPU side too]

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2016-05-05 19:03:02 +01:00
Jean-Philippe Brucker
695665b0c5 ARM: 8571/1: nommu: fix PMSAv7 setup
Commit 1c2f87c (ARM: 8025/1: Get rid of meminfo) broke the support for
MPU on ARMv7-R. This patch adapts the code inside CONFIG_ARM_MPU to use
memblocks appropriately.

MPU initialisation only uses the first memory region, and removes all
subsequent ones. Because looping over all regions that need removal is
inefficient, and memblock_remove already handles memory ranges, we can
flatten the 'for_each_memblock' part.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2016-05-05 19:03:01 +01:00
Christoph Hellwig
9c674815d3 IB/iser: Fix max_sectors calculation
iSER currently has a couple places that set max_sectors in either the host
template or SCSI host, and all of them get it wrong.

This patch instead uses a single assignment that (hopefully) gets it right:
the max_sectors value must be derived from the number of segments in the
FR or FMR structure, but actually be one lower than the page size multiplied
by the number of sectors, as it has to handle the case of non-aligned I/O.

Without this I get trivial to reproduce hangs when running xfstests
(on XFS) over iSER to Linux targets.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-05 12:41:24 -04:00
Linus Torvalds
c5e0666c5a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull userns fix from Eric Biederman:
 "This contains just a single fix for a nasty oops"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  propogate_mnt: Handle the first propogated copy being a slave
2016-05-05 08:41:57 -07:00
Linus Torvalds
3cedbec301 virtio/qemu: fixes for 4.6
A couple of fixes for virtio and for the new QEMU fw cfg driver.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXJfuWAAoJECgfDbjSjVRpRdwH/RQjAQnqTuLAI/mn/kbinK24
 EoDubbWKq+KVDDwoc3tkWRNBoyXDUIYLHrrJNhLtrXNULTGtIejqJdBnD1THuiZ2
 GGWzDiB86qtU1uXSEaXSBFpLi0YYP26bcQ8hjnDhSUWqESE2AC63NO0E0plyu2NI
 oPIiWLN3gJovr12vVURbYcrXxZ1zykd/8E80jj/Gus81XrbKXsn8Wmq1cSrtD/DC
 S07pWbF43YRl3pWyezQW8D4RowF1f9x8CSzrcvqt3qw5d0+jqtcKz0iR0vsi3wIy
 poAadEn+91Xi2kVZUm9bCB30+7A+iqGnFIG7x29xrdZl+2rbvquwRiNlvX4RM4g=
 =GoGi
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio/qemu fixes from Michael Tsirkin:
 "A couple of fixes for virtio and for the new QEMU fw cfg driver"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio: Silence uninitialized variable warning
  firmware: qemu_fw_cfg.c: potential unintialized variable
2016-05-05 08:26:54 -07:00
Eric W. Biederman
5ec0811d30 propogate_mnt: Handle the first propogated copy being a slave
When the first propgated copy was a slave the following oops would result:
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> IP: [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0
> PGD bacd4067 PUD bac66067 PMD 0
> Oops: 0000 [#1] SMP
> Modules linked in:
> CPU: 1 PID: 824 Comm: mount Not tainted 4.6.0-rc5userns+ #1523
> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
> task: ffff8800bb0a8000 ti: ffff8800bac3c000 task.ti: ffff8800bac3c000
> RIP: 0010:[<ffffffff811fba4e>]  [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0
> RSP: 0018:ffff8800bac3fd38  EFLAGS: 00010283
> RAX: 0000000000000000 RBX: ffff8800bb77ec00 RCX: 0000000000000010
> RDX: 0000000000000000 RSI: ffff8800bb58c000 RDI: ffff8800bb58c480
> RBP: ffff8800bac3fd48 R08: 0000000000000001 R09: 0000000000000000
> R10: 0000000000001ca1 R11: 0000000000001c9d R12: 0000000000000000
> R13: ffff8800ba713800 R14: ffff8800bac3fda0 R15: ffff8800bb77ec00
> FS:  00007f3c0cd9b7e0(0000) GS:ffff8800bfb00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000010 CR3: 00000000bb79d000 CR4: 00000000000006e0
> Stack:
>  ffff8800bb77ec00 0000000000000000 ffff8800bac3fd88 ffffffff811fbf85
>  ffff8800bac3fd98 ffff8800bb77f080 ffff8800ba713800 ffff8800bb262b40
>  0000000000000000 0000000000000000 ffff8800bac3fdd8 ffffffff811f1da0
> Call Trace:
>  [<ffffffff811fbf85>] propagate_mnt+0x105/0x140
>  [<ffffffff811f1da0>] attach_recursive_mnt+0x120/0x1e0
>  [<ffffffff811f1ec3>] graft_tree+0x63/0x70
>  [<ffffffff811f1f6b>] do_add_mount+0x9b/0x100
>  [<ffffffff811f2c1a>] do_mount+0x2aa/0xdf0
>  [<ffffffff8117efbe>] ? strndup_user+0x4e/0x70
>  [<ffffffff811f3a45>] SyS_mount+0x75/0xc0
>  [<ffffffff8100242b>] do_syscall_64+0x4b/0xa0
>  [<ffffffff81988f3c>] entry_SYSCALL64_slow_path+0x25/0x25
> Code: 00 00 75 ec 48 89 0d 02 22 22 01 8b 89 10 01 00 00 48 89 05 fd 21 22 01 39 8e 10 01 00 00 0f 84 e0 00 00 00 48 8b 80 d8 00 00 00 <48> 8b 50 10 48 89 05 df 21 22 01 48 89 15 d0 21 22 01 8b 53 30
> RIP  [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0
>  RSP <ffff8800bac3fd38>
> CR2: 0000000000000010
> ---[ end trace 2725ecd95164f217 ]---

This oops happens with the namespace_sem held and can be triggered by
non-root users.  An all around not pleasant experience.

To avoid this scenario when finding the appropriate source mount to
copy stop the walk up the mnt_master chain when the first source mount
is encountered.

Further rewrite the walk up the last_source mnt_master chain so that
it is clear what is going on.

The reason why the first source mount is special is that it it's
mnt_parent is not a mount in the dest_mnt propagation tree, and as
such termination conditions based up on the dest_mnt mount propgation
tree do not make sense.

To avoid other kinds of confusion last_dest is not changed when
computing last_source.  last_dest is only used once in propagate_one
and that is above the point of the code being modified, so changing
the global variable is meaningless and confusing.

Cc: stable@vger.kernel.org
fixes: f2ebb3a921 ("smarter propagate_mnt()")
Reported-by: Tycho Andersen <tycho.andersen@canonical.com>
Reviewed-by: Seth Forshee <seth.forshee@canonical.com>
Tested-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2016-05-05 09:54:45 -05:00
Wang YanQing
c10fcb14c7 x86/sysfb_efi: Fix valid BAR address range check
The code for checking whether a BAR address range is valid will break
out of the loop when a start address of 0x0 is encountered.

This behaviour is wrong since by breaking out of the loop we may miss
the BAR that describes the EFI frame buffer in a later iteration.

Because of this bug I can't use video=efifb: boot parameter to get
efifb on my new ThinkPad E550 for my old linux system hard disk with
3.10 kernel. In 3.10, efifb is the only choice due to DRM/I915 not
supporting the GPU.

This patch also add a trivial optimization to break out after we find
the frame buffer address range without testing later BARs.

Signed-off-by: Wang YanQing <udknight@gmail.com>
[ Rewrote changelog. ]
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Reviewed-by: Peter Jones <pjones@redhat.com>
Cc: <stable@vger.kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: David Herrmann <dh.herrmann@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1462454061-21561-2-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05 16:01:00 +02:00
Vineet Gupta
26f9d5fd82 ARC: support HIGHMEM even without PAE40
Initial HIGHMEM support on ARC was introduced for PAE40 where the low
memory (0x8000_0000 based) and high memory (0x1_0000_0000) were
physically contiguous. So CONFIG_FLATMEM sufficed (despite a peipheral
hole in the middle, which wasted a bit of struct page memory, but things
worked).

However w/o PAE, highmem was not possible and we could only reach
~1.75GB of DDR. Now there is a use case to access ~4GB of DDR w/o PAE40
The idea is to have low memory at canonical 0x8000_0000 and highmem
at 0 so enire 4GB address space is available for physical addressing
This needs additional platform/interconnect mapping to convert
the non contiguous physical addresses into linear bus adresses.

From Linux point of view, non contiguous divide means FLATMEM no
longer works and DISCONTIGMEM is needed to track the pfns in the 2
regions.

This scheme would also work for PAE40, only better in that we don't
waste struct page memory for the peripheral hole.

The DT description will be something like

    memory {
        ...
        reg = <0x80000000 0x200000000   /* 512MB: lowmem */
               0x00000000 0x10000000>;  /* 256MB: highmem */
   }

Signed-off-by: Noam Camus <noamc@ezchip.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-05-05 16:35:46 +05:30
Vineet Gupta
2519d75367 ARC: Fix PAE40 boot failures due to PTE truncation
So a benign looking cleanup which macro'ized PAGE_SHIFT shifts turned
out to be bad (since it was done non-sensically across the board).

It caused boot failures with PAE40 as forced cast to (unsigned long)
from newly introduced virt_to_pfn() was causing truncatiion of the
(long long) pte/paddr values.

It is OK to use this in accessors dealing with kernel virtual address,
pointers etc, but not for PTE values themelves.

Fixes: cJ2ff5cf2735c ("ARC: mm: Use virt_to_pfn() for addr >> PAGE_SHIFT pattern)
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-05-05 16:35:45 +05:30
Vineet Gupta
e5bc0478ab ARC: Add missing io barriers to io{read,write}{16,32}be()
While reviewing a different change to asm-generic/io.h Arnd spotted that
ARC ioread32 and ioread32be both of which come from asm-generic versions
are not symmetrical in terms of calling the io barriers.

generic ioread32   -> ARC readl()                  [ has barriers]
generic ioread32be -> __be32_to_cpu(__raw_readl()) [ lacks barriers]

While generic ioread32be is being remediated to call readl(), that involves
a swab32(), causing double swaps on ioread32be() on Big Endian systems.

So provide our versions of big endian IO accessors to ensure io barrier
calls while also keeping them optimal

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: stable@vger.kernel.org  [4.2+]
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-05-05 16:35:28 +05:30
Mauro Carvalho Chehab
b34ecd5aa3 [media] media-device: fix builds when USB or PCI is compiled as module
Just checking ifdef CONFIG_USB is not enough, if the USB is compiled
as module. The same applies to PCI.

Tested with the following .config alternatives:

CONFIG_USB=m
CONFIG_MEDIA_CONTROLLER=y
CONFIG_MEDIA_SUPPORT=m
CONFIG_VIDEO_AU0828=m

CONFIG_USB=m
CONFIG_MEDIA_CONTROLLER=y
CONFIG_MEDIA_SUPPORT=y
CONFIG_VIDEO_AU0828=m

CONFIG_USB=y
CONFIG_MEDIA_CONTROLLER=y
CONFIG_MEDIA_SUPPORT=y
CONFIG_VIDEO_AU0828=m

CONFIG_USB=y
CONFIG_MEDIA_CONTROLLER=y
CONFIG_MEDIA_SUPPORT=y
CONFIG_VIDEO_AU0828=y

Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2016-05-05 08:01:34 -03:00
Peter Zijlstra
8482716b9d perf/x86/amd/iommu: Do not register a task ctx for uncore like PMUs
The new sanity check introduced by:

  2665784850 ("perf/core: Verify we have a single perf_hw_context PMU")

... triggered on the AMD IOMMU driver.

IOMMUs are not per logical CPU, they cannot have per-task counters. Fix it.

Reported-by: Borislav Petkov <bp@alien8.de>
Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: jroedel@suse.de
Cc: suravee.suthikulpanit@amd.com
Link: http://lkml.kernel.org/r/20160423224255.GB3430@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05 10:11:28 +02:00
Alex Thorlton
08914f436b x86/platform/UV: Bring back the call to map_low_mmrs in uv_system_init
A while back the following commit:

  d394f2d9d8 ("x86/platform/UV: Remove EFI memmap quirk for UV2+")

changed uv_system_init() to only call map_low_mmrs() on older UV1 hardware,
which requires EFI_OLD_MEMMAP to be set in order to boot.

The recent changes to the EFI memory mapping code in:

  d2f7cbe7b2 ("x86/efi: Runtime services virtual mapping")

exposed some issues with the fact that we were relying on the EFI memory
mapping mechanisms to map in our MMRs for us, after commit d394f2d9d8.

Rather than revert the entire commit and go back to forcing
EFI_OLD_MEMMAP on all UVs, we're going to add the call to map_low_mmrs()
back into uv_system_init(), and then fix up our EFI runtime calls to use
the appropriate page table.

For now, UV2+ will still need efi=old_map to boot, but there will be
other changes soon that should eliminate the need for this.

Signed-off-by: Alex Thorlton <athorlton@sgi.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Russ Anderson <rja@sgi.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Link: http://lkml.kernel.org/r/1462401592-120735-1-git-send-email-athorlton@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05 09:55:02 +02:00
Dietmar Eggemann
885e542ce8 sched/fair: Fix comment in calculate_imbalance()
The comment in calculate_imbalance() was introduced in commit:

 2dd73a4f09 ("[PATCH] sched: implement smpnice")

which described the logic as it was then, but a later commit:

  b18855500f ("sched/balancing: Fix 'local->avg_load > sds->avg_load' case in calculate_imbalance()")

.. complicated this logic some more so that the comment does not match anymore.

Update the comment to match the code.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1461958364-675-3-git-send-email-dietmar.eggemann@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05 09:41:10 +02:00
Dietmar Eggemann
0a9b23ce46 sched/fair: Remove stale power aware scheduling comments
Commit 8e7fbcbc22 ("sched: Remove stale power aware scheduling remnants
and dysfunctional knobs") deleted the power aware scheduling support.

This patch gets rid of the remaining power aware scheduling related
comments in the code as well.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1461958364-675-2-git-send-email-dietmar.eggemann@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05 09:41:09 +02:00
Matt Fleming
b52fad2db5 sched/fair: Update rq clock before updating nohz CPU load
If we're accessing rq_clock() (e.g. in sched_avg_update()) we should
update the rq clock before calling cpu_load_update(), otherwise any
time calculations will be stale.

All other paths currently call update_rq_clock().

Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1462304814-11715-1-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05 09:41:09 +02:00
Wanpeng Li
db6ea2fb09 sched/debug: Print out idle balance values even on !CONFIG_SCHEDSTATS kernels
The max_idle_balance_cost and avg_idle values which are tracked and ar used to
capture short idle incidents, are not associated with schedstats, however the
information of these two values isn't printed out on !CONFIG_SCHEDSTATS kernels.

Fix this by moving the value printout out of the CONFIG_SCHEDSTATS section.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1462250305-4523-1-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05 09:41:09 +02:00
Yuyang Du
7b20b916e9 sched/fair: Optimize sum computation with a lookup table
__compute_runnable_contrib() uses a loop to compute sum, whereas a
table lookup can do it faster in a constant amount of time.

The program to generate the constants is located at:

  Documentation/scheduler/sched-avg.txt

Signed-off-by: Yuyang Du <yuyang.du@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Morten Rasmussen <morten.rasmussen@arm.com>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bsegall@google.com
Cc: dietmar.eggemann@arm.com
Cc: juri.lelli@arm.com
Cc: pjt@google.com
Link: http://lkml.kernel.org/r/1462226078-31904-2-git-send-email-yuyang.du@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05 09:41:08 +02:00
Yuyang Du
7b5953345e sched/fair: Add detailed description to the sched load avg metrics
These sched metrics have become complex enough, so describe them
in detail at their definition.

Signed-off-by: Yuyang Du <yuyang.du@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[ Fixed the text to improve its spelling and typography. ]
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bsegall@google.com
Cc: dietmar.eggemann@arm.com
Cc: lizefan@huawei.com
Cc: morten.rasmussen@arm.com
Cc: pjt@google.com
Cc: umgwanakikbuti@gmail.com
Cc: vincent.guittot@linaro.org
Link: http://lkml.kernel.org/r/1459829551-21625-4-git-send-email-yuyang.du@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05 09:41:08 +02:00
Yuyang Du
172895e6b5 sched/fair: Rename SCHED_LOAD_SHIFT to NICE_0_LOAD_SHIFT and remove SCHED_LOAD_SCALE
After cleaning up the sched metrics, there are two definitions that are
ambiguous and confusing: SCHED_LOAD_SHIFT and SCHED_LOAD_SHIFT.

Resolve this:

 - Rename SCHED_LOAD_SHIFT to NICE_0_LOAD_SHIFT, which better reflects what
   it is.

 - Replace SCHED_LOAD_SCALE use with SCHED_CAPACITY_SCALE and remove SCHED_LOAD_SCALE.

Suggested-by: Ben Segall <bsegall@google.com>
Signed-off-by: Yuyang Du <yuyang.du@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: lizefan@huawei.com
Cc: morten.rasmussen@arm.com
Cc: pjt@google.com
Cc: umgwanakikbuti@gmail.com
Cc: vincent.guittot@linaro.org
Link: http://lkml.kernel.org/r/1459829551-21625-3-git-send-email-yuyang.du@intel.com
[ Rewrote the changelog and fixed the build on 32-bit kernels. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05 09:35:21 +02:00
Andi Kleen
cba1b3798e perf/x86: Add model numbers for Kabylake CPUs
Everything the same as Skylake, just new model numbers.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1461977748-17616-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05 09:29:00 +02:00
Yuyang Du
6ecdd74962 sched/fair: Generalize the load/util averages resolution definition
Integer metric needs fixed point arithmetic. In sched/fair, a few
metrics, e.g., weight, load, load_avg, util_avg, freq, and capacity,
may have different fixed point ranges, which makes their update and
usage error-prone.

In order to avoid the errors relating to the fixed point range, we
definie a basic fixed point range, and then formalize all metrics to
base on the basic range.

The basic range is 1024 or (1 << 10). Further, one can recursively
apply the basic range to have larger range.

Pointed out by Ben Segall, weight (visible to user, e.g., NICE-0 has
1024) and load (e.g., NICE_0_LOAD) have independent ranges, but they
must be well calibrated.

Signed-off-by: Yuyang Du <yuyang.du@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bsegall@google.com
Cc: dietmar.eggemann@arm.com
Cc: lizefan@huawei.com
Cc: morten.rasmussen@arm.com
Cc: pjt@google.com
Cc: umgwanakikbuti@gmail.com
Cc: vincent.guittot@linaro.org
Link: http://lkml.kernel.org/r/1459829551-21625-2-git-send-email-yuyang.du@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05 09:24:00 +02:00
Peter Zijlstra
2159197d66 sched/core: Enable increased load resolution on 64-bit kernels
Mike ran into the low load resolution limitation on his big machine.

So reenable these bits; nobody could ever reproduce/analyze the
reported power usage claim and Google has been running with this for
years as well.

Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05 09:24:00 +02:00
Peter Zijlstra
e7904a28f5 locking/lockdep, sched/core: Implement a better lock pinning scheme
The problem with the existing lock pinning is that each pin is of
value 1; this mean you can simply unpin if you know its pinned,
without having any extra information.

This scheme generates a random (16 bit) cookie for each pin and
requires this same cookie to unpin. This means you have to keep the
cookie in context.

No objsize difference for !LOCKDEP kernels.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05 09:23:59 +02:00
Peter Zijlstra
eb58075149 sched/core: Introduce 'struct rq_flags'
In order to be able to pass around more than just the IRQ flags in the
future, add a rq_flags structure.

No difference in code generation for the x86_64-defconfig build I
tested.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05 09:23:59 +02:00
Peter Zijlstra
3e71a462dd sched/core: Move task_rq_lock() out of line
Its a rather large function, inline doesn't seems to make much sense:

 $ size defconfig-build/kernel/sched/core.o{.orig,}
    text    data     bss     dec     hex filename
   56533   21037    2320   79890   13812 defconfig-build/kernel/sched/core.o.orig
   55733   21037    2320   79090   134f2 defconfig-build/kernel/sched/core.o

The 'perf bench sched messaging' micro-benchmark shows a visible improvement
of 4-5%:

  $ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance > $i ; done
  $ perf stat --null --repeat 25 -- perf bench sched messaging -g 40 -l 5000

  pre:
       4.582798193 seconds time elapsed          ( +-  1.41% )
       4.733374877 seconds time elapsed          ( +-  2.10% )
       4.560955136 seconds time elapsed          ( +-  1.43% )
       4.631062303 seconds time elapsed          ( +-  1.40% )

  post:
       4.364765213 seconds time elapsed          ( +-  0.91% )
       4.454442734 seconds time elapsed          ( +-  1.18% )
       4.448893817 seconds time elapsed          ( +-  1.41% )
       4.424346872 seconds time elapsed          ( +-  0.97% )

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05 09:23:58 +02:00
Ingo Molnar
64b7aad579 Merge branch 'sched/urgent' into sched/core, to pick up fixes before applying new changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05 09:01:49 +02:00
Tadeusz Struk
58446fef57 crypto: rsa - select crypto mgr dependency
The pkcs1pad template needs CRYPTO_MANAGER so it needs
to be explicitly selected by CRYPTO_RSA.

Reported-by: Jamie Heilman <jamie@audible.transient.net>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-05-05 14:27:05 +08:00
Herbert Xu
13f4bb78cf crypto: hash - Fix page length clamping in hash walk
The crypto hash walk code is broken when supplied with an offset
greater than or equal to PAGE_SIZE.  This patch fixes it by adjusting
walk->pg and walk->offset when this happens.

Cc: <stable@vger.kernel.org>
Reported-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-05-05 14:27:04 +08:00
Dave Airlie
fca097169f Merge tag 'drm-intel-fixes-2016-05-02' of git://anongit.freedesktop.org/drm-intel into drm-fixes
i915 fixes for 4.6. A bit more than I'd like at this stage, but
OTOH they're all stable material.

* tag 'drm-intel-fixes-2016-05-02' of git://anongit.freedesktop.org/drm-intel:
  drm/i915: Make RPS EI/thresholds multiple of 25 on SNB-BDW
  drm/i915: Fake HDMI live status
  drm/i915: Fix eDP low vswing for Broadwell
  drm/i915/ddi: Fix eDP VDD handling during booting and suspend/resume
  drm/i915: Fix system resume if PCI device remained enabled
  drm/i915: Avoid stalling on pending flips for legacy cursor updates
2016-05-05 12:12:09 +10:00
Dave Airlie
80623de03b Merge branch 'drm-fixes-4.6' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
two fixes for hw lockups and one for a double free

* 'drm-fixes-4.6' of git://people.freedesktop.org/~agd5f/linux:
  drm/amdgpu: make sure vertical front porch is at least 1
  drm/radeon: make sure vertical front porch is at least 1
  drm/amdgpu: set metadata pointer to NULL after freeing.
2016-05-05 10:37:25 +10:00
Philipp Zabel
503fe87bd0 gpu: ipu-v3: Fix imx-ipuv3-crtc module autoloading
If of_node is set before calling platform_device_add, the driver core
will try to use of: modalias matching, which fails because the device
tree nodes don't have a compatible property set. This patch fixes
imx-ipuv3-crtc module autoloading by setting the of_node property only
after the platform modalias is set.

Fixes: 304e6be652 ("gpu: ipu-v3: Assign of_node of child platform devices to corresponding ports")
Reported-by: Dennis Gilmore <dennis@ausil.us>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Tested-By: Dennis Gilmore <dennis@ausil.us>
Cc: stable@vger.kernel.org # 4.4+
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-05 10:34:52 +10:00
Viresh Kumar
21f8a99ce6 PM / OPP: Remove useless check
Regulators are optional for devices using OPPs and the OPP core
shouldn't be printing any errors for such missing regulators.

It was fine before the commit 0c717d0f9c, but that failed to update
this part of the code to remove an 'always true' check and an extra
unwanted print message.

Fix that now.

Fixes: 0c717d0f9c (PM / OPP: Initialize regulator pointer to an error value)
Reported-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-05 01:42:19 +02:00
Linus Torvalds
21a9703de3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: atmel_mxt_ts - use mxt_acquire_irq in mxt_soft_reset
  Input: zforce_ts - fix dual touch recognition
  Input: twl6040-vibra - fix atomic schedule panic
2016-05-04 16:07:50 -07:00
Yury Norov
1f93e9f231 asm-generic: use compat version for preadv2 and pwritev2
Compat architectures that does not use generic unistd (mips, s390),
declare compat version in their syscall tables for preadv2 and
pwritev2. Generic unistd syscall table should do it as well.

[arnd: this initially slipped through the review and an
 incorrect patch got merged. arch/tile/ is the only architecture
 that could be affected for their 32-bit compat mode, every
 other architecture we support today is fine.]

Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2016-05-05 00:42:20 +02:00
David S. Miller
c2cf530d42 Merge branch 'bnxt_en-fixes'
Michael Chan says:

====================
bnxt_en: 2 bug fixes.

Fix crash on ppc64 due to missing memory barrier and restore multicast
after reset.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-04 17:11:39 -04:00
Michael Chan
7d2837dd7a bnxt_en: Setup multicast properly after resetting device.
The multicast/all-multicast internal flags are not properly restored
after device reset.  This could lead to unreliable multicast operations
after an ethtool configuration change for example.

Call bnxt_mc_list_updated() and setup the vnic->mask in bnxt_init_chip()
to fix the issue.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-04 17:11:38 -04:00
Michael Chan
67a95e2022 bnxt_en: Need memory barrier when processing the completion ring.
The code determines if the next ring entry is valid before proceeding
further to read the rest of the entry.  The CPU can re-order and read
the rest of the entry first, possibly reading a stale entry, if DMA
of a new entry happens right after reading it.  This issue can be
readily seen on a ppc64 system, causing it to crash.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-04 17:11:37 -04:00
Prarit Bhargava
93d68841a2 ACPICA: Dispatcher: Update thread ID for recursive method calls
ACPICA commit 7a3bd2d962f221809f25ddb826c9e551b916eb25

Set the mutex owner thread ID.
Original patch from: Prarit Bhargava <prarit@redhat.com>

Link: https://bugzilla.kernel.org/show_bug.cgi?id=115121
Link: https://github.com/acpica/acpica/commit/7a3bd2d9
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Tested-by: Andy Lutomirski <luto@kernel.org> # On a Dell XPS 13 9350
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-04 22:41:43 +02:00
David S. Miller
32b583a0cb Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
pull request (net): ipsec 2016-05-04

1) The flowcache can hit an OOM condition if too
   many entries are in the gc_list. Fix this by
   counting the entries in the gc_list and refuse
   new allocations if the value is too high.

2) The inner headers are invalid after a xfrm transformation,
   so reset the skb encapsulation field to ensure nobody tries
   access the inner headers. Otherwise tunnel devices stacked
   on top of xfrm may build the outer headers based on wrong
   informations.

3) Add pmtu handling to vti, we need it to report
   pmtu informations for local generated packets.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-04 16:35:31 -04:00
Kangjie Lu
5f8e44741f net: fix infoleak in rtnetlink
The stack object “map” has a total size of 32 bytes. Its last 4
bytes are padding generated by compiler. These padding bytes are
not initialized and sent out via “nla_put”.

Signed-off-by: Kangjie Lu <kjlu@gatech.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-04 16:19:42 -04:00
Kangjie Lu
b8670c09f3 net: fix infoleak in llc
The stack object “info” has a total size of 12 bytes. Its last byte
is padding which is not initialized and leaked via “put_cmsg”.

Signed-off-by: Kangjie Lu <kjlu@gatech.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-04 16:18:48 -04:00
Linus Torvalds
4810d96829 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull IMA fix from James Morris.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  ima: fix the string representation of the LSM/IMA hook enumeration ordering
2016-05-04 11:14:00 -07:00
Uwe Kleine-König
1c021bb717 net: fec: only clear a queue's work bit if the queue was emptied
In the receive path a queue's work bit was cleared unconditionally even
if fec_enet_rx_queue only read out a part of the available packets from
the hardware. This resulted in not reading any packets in the next napi
turn and so packets were delayed or lost.

The obvious fix is to only clear a queue's bit when the queue was
emptied.

Fixes: 4d494cdc92 ("net: fec: change data structure to support multiqueue")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Fugang Duan <fugang.duan@nxp.com>
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-04 14:08:38 -04:00
Matthias Brugger
20decb7e48 drivers: net: xgene: Fix error handling
When probe bails out with an error, we try to unregister the
netdev before we have even registered it. Fix the goto statements
for that.

Signed-off-by: Matthias Brugger <mbrugger@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-04 14:00:44 -04:00
Linus Torvalds
41143b774a xen: regression fixes for 4.6-rc6
- Fix two regressions causing crashes in 32-bit PV guests.
 - Fix a regression in the evtchn driver.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXKhf2AAoJEFxbo/MsZsTRYeEH/jkKi9CsCpcHBdJudqJ/RCmr
 ZLWu9JT5+iYUOD2o+NVumiMbNE7Ary/5rDVYzskzgtD2OL1MQJWQia3FC9Mhvj7y
 6lFEl5m6SPNMi1VOW+BPU8k6tduSgnPISYyJbsQ5/YoAiHNX+ieaWX5UhFFlfDkj
 8kpjVNclc3efgh6RqV1GzrmqhkYwVFATwG3SQFujzGSIC7KZNJHy5RQEH3UBvTU2
 ymcL+eO35VSZrH9BXVvOberI3ME3UOkFxFIlAVqlBEgO8MoOlV0tFs/ciqjsHZvH
 ieKAb5jxLlHTwIu6ZxL2vCMp/ili9jeDNsiADkk9utJUUuI5WlHDjrLbqnUf2aY=
 =ckNq
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen regression fixes from David Vrabel:

 - Fix two regressions causing crashes in 32-bit PV guests

 - Fix a regression in the evtchn driver

* tag 'for-linus-4.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/evtchn: fix ring resize when binding new events
  xen/balloon: Fix crash when ballooning on x86 32 bit PAE
  xen: Fix page <-> pfn conversion on 32 bit systems
2016-05-04 11:00:05 -07:00
Jan Beulich
27e0e63853 xen/evtchn: fix ring resize when binding new events
The copying of ring data was wrong for two cases: For a full ring
nothing got copied at all (as in that case the canonicalized producer
and consumer indexes are identical). And in case one or both of the
canonicalized (after the resize) indexes would point into the second
half of the buffer, the copied data ended up in the wrong (free) part
of the new buffer. In both cases uninitialized data would get passed
back to the caller.

Fix this by simply copying the old ring contents twice: Once to the
low half of the new buffer, and a second time to the high half.

This addresses the inability to boot a HVM guest with 64 or more
vCPUs.  This regression was caused by 8620015499 (xen/evtchn:
dynamically grow pending event channel ring).

Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: <stable@vger.kernel.org> # 4.4+
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-05-04 16:37:01 +01:00
Rafael J. Wysocki
6d45b719cb intel_pstate: Fix intel_pstate_get()
After commit 8fa520af50 "intel_pstate: Remove freq calculation from
intel_pstate_calc_busy()" intel_pstate_get() calls get_avg_frequency()
to compute the average frequency, which is problematic for two reasons.

First, intel_pstate_get() may be invoked before the driver reads the
CPU feedback registers for the first time and if that happens,
get_avg_frequency() will attempt to divide by zero.

Second, the get_avg_frequency() call in intel_pstate_get() is racy
with respect to intel_pstate_sample() and it may end up returning
completely meaningless values for this reason.

Moreover, after commit 7349ec0470 "intel_pstate: Move
intel_pstate_calc_busy() into get_target_pstate_use_performance()"
sample.core_pct_busy is never computed on Atom, but it is used in
intel_pstate_adjust_busy_pstate() in that case too.

To address those problems notice that if sample.core_pct_busy
was used in the average frequency computation carried out by
get_avg_frequency(), both the divide by zero problem and the
race with respect to intel_pstate_sample() would be avoided.

Accordingly, move the invocation of intel_pstate_calc_busy() from
get_target_pstate_use_performance() to intel_pstate_update_util(),
which also will take care of the uninitialized sample.core_pct_busy
on Atom, and modify get_avg_frequency() to use sample.core_pct_busy
as per the above.

Reported-by: kernel test robot <ying.huang@linux.intel.com>
Link: http://marc.info/?l=linux-kernel&m=146226437623173&w=4
Fixes: 8fa520af50 "intel_pstate: Remove freq calculation from intel_pstate_calc_busy()"
Fixes: 7349ec0470 "intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()"
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-04 14:09:16 +02:00