linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-17 06:08:52 +07:00

Author	SHA1	Message	Date
Howard Cochran	74d3694433	writeback: Fix performance regression in wb_over_bg_thresh() Commit `947e9762a8` ("writeback: update wb_over_bg_thresh() to use wb_domain aware operations") unintentionally changed this function's meaning from "are there more dirty pages than the background writeback threshold" to "are there more dirty pages than the writeback threshold". The background writeback threshold is typically half of the writeback threshold, so this had the effect of raising the number of dirty pages required to cause a writeback worker to perform background writeout. This can cause a very severe performance regression when a BDI uses BDI_CAP_STRICTLIMIT because balance_dirty_pages() and the writeback worker can now disagree on whether writeback should be initiated. For example, in a system having 1GB of RAM, a single spinning disk, and a "pass-through" FUSE filesystem mounted over the disk, application code mmapped a 128MB file on the disk and was randomly dirtying pages in that mapping. Because FUSE uses strictlimit and has a default max_ratio of only 1%, in balance_dirty_pages, thresh is ~200, bg_thresh is ~100, and the dirty_freerun_ceiling is the average of those, ~150. So, it pauses the dirtying processes when we have 151 dirty pages and wakes up a background writeback worker. But the worker tests the wrong threshold (200 instead of 100), so it does not initiate writeback and just returns. Thus, balance_dirty_pages keeps looping, sleeping and then waking up the worker who will do nothing. It remains stuck in this state until the few dirty pages that we have finally expire and we write them back for that reason. Then the whole process repeats, resulting in near-zero throughput through the FUSE BDI. The fix is to call the parameterized variant of wb_calc_thresh, so that the worker will do writeback if the bg_thresh is exceeded which was the behavior before the referenced commit. Fixes: `947e9762a8` ("writeback: update wb_over_bg_thresh() to use wb_domain aware operations") Signed-off-by: Howard Cochran <hcochran@kernelspring.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org> # v4.2+ Tested-by Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-05-05 15:44:55 -06:00
Vladimir Murzin	ec953b70f3	ARM: 8573/1: domain: move {set,get}_domain under config guard Recursive undefined instrcution falut is seen with R-class taking an exception. The reson for that is __show_regs() tries to get domain information, but domains is not available on !MMU cores, like R/M class. Fix it by puting {set,get}_domain functions under CONFIG_CPU_CP15_MMU guard and providing stubs for the case where domains is not supported. Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2016-05-05 19:03:02 +01:00
Jean-Philippe Brucker	5b526bd925	ARM: 8572/1: nommu: change memory reserve for the vectors Commit `19accfd3` (ARM: move vector stubs) moved the vector stubs in an additional page above the base vector one. This change wasn't taken into account by the nommu memreserve. This patch ensures that the kernel won't overwrite any vector stub on nommu. [changed the MPU side too] Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2016-05-05 19:03:02 +01:00
Jean-Philippe Brucker	695665b0c5	ARM: 8571/1: nommu: fix PMSAv7 setup Commit `1c2f87c` (ARM: 8025/1: Get rid of meminfo) broke the support for MPU on ARMv7-R. This patch adapts the code inside CONFIG_ARM_MPU to use memblocks appropriately. MPU initialisation only uses the first memory region, and removes all subsequent ones. Because looping over all regions that need removal is inefficient, and memblock_remove already handles memory ranges, we can flatten the 'for_each_memblock' part. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2016-05-05 19:03:01 +01:00
Christoph Hellwig	9c674815d3	IB/iser: Fix max_sectors calculation iSER currently has a couple places that set max_sectors in either the host template or SCSI host, and all of them get it wrong. This patch instead uses a single assignment that (hopefully) gets it right: the max_sectors value must be derived from the number of segments in the FR or FMR structure, but actually be one lower than the page size multiplied by the number of sectors, as it has to handle the case of non-aligned I/O. Without this I get trivial to reproduce hangs when running xfstests (on XFS) over iSER to Linux targets. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-05 12:41:24 -04:00
Linus Torvalds	c5e0666c5a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull userns fix from Eric Biederman: "This contains just a single fix for a nasty oops" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: propogate_mnt: Handle the first propogated copy being a slave	2016-05-05 08:41:57 -07:00
Linus Torvalds	3cedbec301	virtio/qemu: fixes for 4.6 A couple of fixes for virtio and for the new QEMU fw cfg driver. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJXJfuWAAoJECgfDbjSjVRpRdwH/RQjAQnqTuLAI/mn/kbinK24 EoDubbWKq+KVDDwoc3tkWRNBoyXDUIYLHrrJNhLtrXNULTGtIejqJdBnD1THuiZ2 GGWzDiB86qtU1uXSEaXSBFpLi0YYP26bcQ8hjnDhSUWqESE2AC63NO0E0plyu2NI oPIiWLN3gJovr12vVURbYcrXxZ1zykd/8E80jj/Gus81XrbKXsn8Wmq1cSrtD/DC S07pWbF43YRl3pWyezQW8D4RowF1f9x8CSzrcvqt3qw5d0+jqtcKz0iR0vsi3wIy poAadEn+91Xi2kVZUm9bCB30+7A+iqGnFIG7x29xrdZl+2rbvquwRiNlvX4RM4g= =GoGi -----END PGP SIGNATURE----- Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio/qemu fixes from Michael Tsirkin: "A couple of fixes for virtio and for the new QEMU fw cfg driver" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio: Silence uninitialized variable warning firmware: qemu_fw_cfg.c: potential unintialized variable	2016-05-05 08:26:54 -07:00
Eric W. Biederman	5ec0811d30	propogate_mnt: Handle the first propogated copy being a slave When the first propgated copy was a slave the following oops would result: > BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 > IP: [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0 > PGD bacd4067 PUD bac66067 PMD 0 > Oops: 0000 [#1] SMP > Modules linked in: > CPU: 1 PID: 824 Comm: mount Not tainted 4.6.0-rc5userns+ #1523 > Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 > task: ffff8800bb0a8000 ti: ffff8800bac3c000 task.ti: ffff8800bac3c000 > RIP: 0010:[<ffffffff811fba4e>] [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0 > RSP: 0018:ffff8800bac3fd38 EFLAGS: 00010283 > RAX: 0000000000000000 RBX: ffff8800bb77ec00 RCX: 0000000000000010 > RDX: 0000000000000000 RSI: ffff8800bb58c000 RDI: ffff8800bb58c480 > RBP: ffff8800bac3fd48 R08: 0000000000000001 R09: 0000000000000000 > R10: 0000000000001ca1 R11: 0000000000001c9d R12: 0000000000000000 > R13: ffff8800ba713800 R14: ffff8800bac3fda0 R15: ffff8800bb77ec00 > FS: 00007f3c0cd9b7e0(0000) GS:ffff8800bfb00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000000000010 CR3: 00000000bb79d000 CR4: 00000000000006e0 > Stack: > ffff8800bb77ec00 0000000000000000 ffff8800bac3fd88 ffffffff811fbf85 > ffff8800bac3fd98 ffff8800bb77f080 ffff8800ba713800 ffff8800bb262b40 > 0000000000000000 0000000000000000 ffff8800bac3fdd8 ffffffff811f1da0 > Call Trace: > [<ffffffff811fbf85>] propagate_mnt+0x105/0x140 > [<ffffffff811f1da0>] attach_recursive_mnt+0x120/0x1e0 > [<ffffffff811f1ec3>] graft_tree+0x63/0x70 > [<ffffffff811f1f6b>] do_add_mount+0x9b/0x100 > [<ffffffff811f2c1a>] do_mount+0x2aa/0xdf0 > [<ffffffff8117efbe>] ? strndup_user+0x4e/0x70 > [<ffffffff811f3a45>] SyS_mount+0x75/0xc0 > [<ffffffff8100242b>] do_syscall_64+0x4b/0xa0 > [<ffffffff81988f3c>] entry_SYSCALL64_slow_path+0x25/0x25 > Code: 00 00 75 ec 48 89 0d 02 22 22 01 8b 89 10 01 00 00 48 89 05 fd 21 22 01 39 8e 10 01 00 00 0f 84 e0 00 00 00 48 8b 80 d8 00 00 00 <48> 8b 50 10 48 89 05 df 21 22 01 48 89 15 d0 21 22 01 8b 53 30 > RIP [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0 > RSP <ffff8800bac3fd38> > CR2: 0000000000000010 > ---[ end trace 2725ecd95164f217 ]--- This oops happens with the namespace_sem held and can be triggered by non-root users. An all around not pleasant experience. To avoid this scenario when finding the appropriate source mount to copy stop the walk up the mnt_master chain when the first source mount is encountered. Further rewrite the walk up the last_source mnt_master chain so that it is clear what is going on. The reason why the first source mount is special is that it it's mnt_parent is not a mount in the dest_mnt propagation tree, and as such termination conditions based up on the dest_mnt mount propgation tree do not make sense. To avoid other kinds of confusion last_dest is not changed when computing last_source. last_dest is only used once in propagate_one and that is above the point of the code being modified, so changing the global variable is meaningless and confusing. Cc: stable@vger.kernel.org fixes: `f2ebb3a921` ("smarter propagate_mnt()") Reported-by: Tycho Andersen <tycho.andersen@canonical.com> Reviewed-by: Seth Forshee <seth.forshee@canonical.com> Tested-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2016-05-05 09:54:45 -05:00
Wang YanQing	c10fcb14c7	x86/sysfb_efi: Fix valid BAR address range check The code for checking whether a BAR address range is valid will break out of the loop when a start address of 0x0 is encountered. This behaviour is wrong since by breaking out of the loop we may miss the BAR that describes the EFI frame buffer in a later iteration. Because of this bug I can't use video=efifb: boot parameter to get efifb on my new ThinkPad E550 for my old linux system hard disk with 3.10 kernel. In 3.10, efifb is the only choice due to DRM/I915 not supporting the GPU. This patch also add a trivial optimization to break out after we find the frame buffer address range without testing later BARs. Signed-off-by: Wang YanQing <udknight@gmail.com> [ Rewrote changelog. ] Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Reviewed-by: Peter Jones <pjones@redhat.com> Cc: <stable@vger.kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: David Herrmann <dh.herrmann@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1462454061-21561-2-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-05-05 16:01:00 +02:00
Vineet Gupta	26f9d5fd82	ARC: support HIGHMEM even without PAE40 Initial HIGHMEM support on ARC was introduced for PAE40 where the low memory (0x8000_0000 based) and high memory (0x1_0000_0000) were physically contiguous. So CONFIG_FLATMEM sufficed (despite a peipheral hole in the middle, which wasted a bit of struct page memory, but things worked). However w/o PAE, highmem was not possible and we could only reach ~1.75GB of DDR. Now there is a use case to access ~4GB of DDR w/o PAE40 The idea is to have low memory at canonical 0x8000_0000 and highmem at 0 so enire 4GB address space is available for physical addressing This needs additional platform/interconnect mapping to convert the non contiguous physical addresses into linear bus adresses. From Linux point of view, non contiguous divide means FLATMEM no longer works and DISCONTIGMEM is needed to track the pfns in the 2 regions. This scheme would also work for PAE40, only better in that we don't waste struct page memory for the peripheral hole. The DT description will be something like memory { ... reg = <0x80000000 0x200000000 /* 512MB: lowmem / 0x00000000 0x10000000>; / 256MB: highmem */ } Signed-off-by: Noam Camus <noamc@ezchip.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>	2016-05-05 16:35:46 +05:30
Vineet Gupta	2519d75367	ARC: Fix PAE40 boot failures due to PTE truncation So a benign looking cleanup which macro'ized PAGE_SHIFT shifts turned out to be bad (since it was done non-sensically across the board). It caused boot failures with PAE40 as forced cast to (unsigned long) from newly introduced virt_to_pfn() was causing truncatiion of the (long long) pte/paddr values. It is OK to use this in accessors dealing with kernel virtual address, pointers etc, but not for PTE values themelves. Fixes: cJ2ff5cf2735c ("ARC: mm: Use virt_to_pfn() for addr >> PAGE_SHIFT pattern) Signed-off-by: Vineet Gupta <vgupta@synopsys.com>	2016-05-05 16:35:45 +05:30
Vineet Gupta	e5bc0478ab	ARC: Add missing io barriers to io{read,write}{16,32}be() While reviewing a different change to asm-generic/io.h Arnd spotted that ARC ioread32 and ioread32be both of which come from asm-generic versions are not symmetrical in terms of calling the io barriers. generic ioread32 -> ARC readl() [ has barriers] generic ioread32be -> __be32_to_cpu(__raw_readl()) [ lacks barriers] While generic ioread32be is being remediated to call readl(), that involves a swab32(), causing double swaps on ioread32be() on Big Endian systems. So provide our versions of big endian IO accessors to ensure io barrier calls while also keeping them optimal Suggested-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: stable@vger.kernel.org [4.2+] Signed-off-by: Vineet Gupta <vgupta@synopsys.com>	2016-05-05 16:35:28 +05:30
Mauro Carvalho Chehab	b34ecd5aa3	[media] media-device: fix builds when USB or PCI is compiled as module Just checking ifdef CONFIG_USB is not enough, if the USB is compiled as module. The same applies to PCI. Tested with the following .config alternatives: CONFIG_USB=m CONFIG_MEDIA_CONTROLLER=y CONFIG_MEDIA_SUPPORT=m CONFIG_VIDEO_AU0828=m CONFIG_USB=m CONFIG_MEDIA_CONTROLLER=y CONFIG_MEDIA_SUPPORT=y CONFIG_VIDEO_AU0828=m CONFIG_USB=y CONFIG_MEDIA_CONTROLLER=y CONFIG_MEDIA_SUPPORT=y CONFIG_VIDEO_AU0828=m CONFIG_USB=y CONFIG_MEDIA_CONTROLLER=y CONFIG_MEDIA_SUPPORT=y CONFIG_VIDEO_AU0828=y Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>	2016-05-05 08:01:34 -03:00
Peter Zijlstra	8482716b9d	perf/x86/amd/iommu: Do not register a task ctx for uncore like PMUs The new sanity check introduced by: `2665784850` ("perf/core: Verify we have a single perf_hw_context PMU") ... triggered on the AMD IOMMU driver. IOMMUs are not per logical CPU, they cannot have per-task counters. Fix it. Reported-by: Borislav Petkov <bp@alien8.de> Tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: jroedel@suse.de Cc: suravee.suthikulpanit@amd.com Link: http://lkml.kernel.org/r/20160423224255.GB3430@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-05-05 10:11:28 +02:00
Alex Thorlton	08914f436b	x86/platform/UV: Bring back the call to map_low_mmrs in uv_system_init A while back the following commit: `d394f2d9d8` ("x86/platform/UV: Remove EFI memmap quirk for UV2+") changed uv_system_init() to only call map_low_mmrs() on older UV1 hardware, which requires EFI_OLD_MEMMAP to be set in order to boot. The recent changes to the EFI memory mapping code in: `d2f7cbe7b2` ("x86/efi: Runtime services virtual mapping") exposed some issues with the fact that we were relying on the EFI memory mapping mechanisms to map in our MMRs for us, after commit `d394f2d9d8`. Rather than revert the entire commit and go back to forcing EFI_OLD_MEMMAP on all UVs, we're going to add the call to map_low_mmrs() back into uv_system_init(), and then fix up our EFI runtime calls to use the appropriate page table. For now, UV2+ will still need efi=old_map to boot, but there will be other changes soon that should eliminate the need for this. Signed-off-by: Alex Thorlton <athorlton@sgi.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Adam Buchbinder <adam.buchbinder@gmail.com> Cc: Len Brown <len.brown@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Russ Anderson <rja@sgi.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Link: http://lkml.kernel.org/r/1462401592-120735-1-git-send-email-athorlton@sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-05-05 09:55:02 +02:00
Dietmar Eggemann	885e542ce8	sched/fair: Fix comment in calculate_imbalance() The comment in calculate_imbalance() was introduced in commit: `2dd73a4f09` ("[PATCH] sched: implement smpnice") which described the logic as it was then, but a later commit: `b18855500f` ("sched/balancing: Fix 'local->avg_load > sds->avg_load' case in calculate_imbalance()") .. complicated this logic some more so that the comment does not match anymore. Update the comment to match the code. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1461958364-675-3-git-send-email-dietmar.eggemann@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-05-05 09:41:10 +02:00
Dietmar Eggemann	0a9b23ce46	sched/fair: Remove stale power aware scheduling comments Commit `8e7fbcbc22` ("sched: Remove stale power aware scheduling remnants and dysfunctional knobs") deleted the power aware scheduling support. This patch gets rid of the remaining power aware scheduling related comments in the code as well. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1461958364-675-2-git-send-email-dietmar.eggemann@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-05-05 09:41:09 +02:00
Matt Fleming	b52fad2db5	sched/fair: Update rq clock before updating nohz CPU load If we're accessing rq_clock() (e.g. in sched_avg_update()) we should update the rq clock before calling cpu_load_update(), otherwise any time calculations will be stale. All other paths currently call update_rq_clock(). Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Galbraith <efault@gmx.de> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1462304814-11715-1-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-05-05 09:41:09 +02:00
Wanpeng Li	db6ea2fb09	sched/debug: Print out idle balance values even on !CONFIG_SCHEDSTATS kernels The max_idle_balance_cost and avg_idle values which are tracked and ar used to capture short idle incidents, are not associated with schedstats, however the information of these two values isn't printed out on !CONFIG_SCHEDSTATS kernels. Fix this by moving the value printout out of the CONFIG_SCHEDSTATS section. Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1462250305-4523-1-git-send-email-wanpeng.li@hotmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-05-05 09:41:09 +02:00
Yuyang Du	7b20b916e9	sched/fair: Optimize sum computation with a lookup table __compute_runnable_contrib() uses a loop to compute sum, whereas a table lookup can do it faster in a constant amount of time. The program to generate the constants is located at: Documentation/scheduler/sched-avg.txt Signed-off-by: Yuyang Du <yuyang.du@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Morten Rasmussen <morten.rasmussen@arm.com> Acked-by: Vincent Guittot <vincent.guittot@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: dietmar.eggemann@arm.com Cc: juri.lelli@arm.com Cc: pjt@google.com Link: http://lkml.kernel.org/r/1462226078-31904-2-git-send-email-yuyang.du@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-05-05 09:41:08 +02:00
Yuyang Du	7b5953345e	sched/fair: Add detailed description to the sched load avg metrics These sched metrics have become complex enough, so describe them in detail at their definition. Signed-off-by: Yuyang Du <yuyang.du@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> [ Fixed the text to improve its spelling and typography. ] Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: dietmar.eggemann@arm.com Cc: lizefan@huawei.com Cc: morten.rasmussen@arm.com Cc: pjt@google.com Cc: umgwanakikbuti@gmail.com Cc: vincent.guittot@linaro.org Link: http://lkml.kernel.org/r/1459829551-21625-4-git-send-email-yuyang.du@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-05-05 09:41:08 +02:00
Yuyang Du	172895e6b5	sched/fair: Rename SCHED_LOAD_SHIFT to NICE_0_LOAD_SHIFT and remove SCHED_LOAD_SCALE After cleaning up the sched metrics, there are two definitions that are ambiguous and confusing: SCHED_LOAD_SHIFT and SCHED_LOAD_SHIFT. Resolve this: - Rename SCHED_LOAD_SHIFT to NICE_0_LOAD_SHIFT, which better reflects what it is. - Replace SCHED_LOAD_SCALE use with SCHED_CAPACITY_SCALE and remove SCHED_LOAD_SCALE. Suggested-by: Ben Segall <bsegall@google.com> Signed-off-by: Yuyang Du <yuyang.du@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dietmar.eggemann@arm.com Cc: lizefan@huawei.com Cc: morten.rasmussen@arm.com Cc: pjt@google.com Cc: umgwanakikbuti@gmail.com Cc: vincent.guittot@linaro.org Link: http://lkml.kernel.org/r/1459829551-21625-3-git-send-email-yuyang.du@intel.com [ Rewrote the changelog and fixed the build on 32-bit kernels. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-05-05 09:35:21 +02:00
Andi Kleen	cba1b3798e	perf/x86: Add model numbers for Kabylake CPUs Everything the same as Skylake, just new model numbers. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1461977748-17616-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-05-05 09:29:00 +02:00
Yuyang Du	6ecdd74962	sched/fair: Generalize the load/util averages resolution definition Integer metric needs fixed point arithmetic. In sched/fair, a few metrics, e.g., weight, load, load_avg, util_avg, freq, and capacity, may have different fixed point ranges, which makes their update and usage error-prone. In order to avoid the errors relating to the fixed point range, we definie a basic fixed point range, and then formalize all metrics to base on the basic range. The basic range is 1024 or (1 << 10). Further, one can recursively apply the basic range to have larger range. Pointed out by Ben Segall, weight (visible to user, e.g., NICE-0 has 1024) and load (e.g., NICE_0_LOAD) have independent ranges, but they must be well calibrated. Signed-off-by: Yuyang Du <yuyang.du@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: dietmar.eggemann@arm.com Cc: lizefan@huawei.com Cc: morten.rasmussen@arm.com Cc: pjt@google.com Cc: umgwanakikbuti@gmail.com Cc: vincent.guittot@linaro.org Link: http://lkml.kernel.org/r/1459829551-21625-2-git-send-email-yuyang.du@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-05-05 09:24:00 +02:00
Peter Zijlstra	2159197d66	sched/core: Enable increased load resolution on 64-bit kernels Mike ran into the low load resolution limitation on his big machine. So reenable these bits; nobody could ever reproduce/analyze the reported power usage claim and Google has been running with this for years as well. Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com> Tested-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-05-05 09:24:00 +02:00
Peter Zijlstra	e7904a28f5	locking/lockdep, sched/core: Implement a better lock pinning scheme The problem with the existing lock pinning is that each pin is of value 1; this mean you can simply unpin if you know its pinned, without having any extra information. This scheme generates a random (16 bit) cookie for each pin and requires this same cookie to unpin. This means you have to keep the cookie in context. No objsize difference for !LOCKDEP kernels. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-05-05 09:23:59 +02:00
Peter Zijlstra	eb58075149	sched/core: Introduce 'struct rq_flags' In order to be able to pass around more than just the IRQ flags in the future, add a rq_flags structure. No difference in code generation for the x86_64-defconfig build I tested. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-05-05 09:23:59 +02:00
Peter Zijlstra	3e71a462dd	sched/core: Move task_rq_lock() out of line Its a rather large function, inline doesn't seems to make much sense: $ size defconfig-build/kernel/sched/core.o{.orig,} text data bss dec hex filename 56533 21037 2320 79890 13812 defconfig-build/kernel/sched/core.o.orig 55733 21037 2320 79090 134f2 defconfig-build/kernel/sched/core.o The 'perf bench sched messaging' micro-benchmark shows a visible improvement of 4-5%: $ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance > $i ; done $ perf stat --null --repeat 25 -- perf bench sched messaging -g 40 -l 5000 pre: 4.582798193 seconds time elapsed ( +- 1.41% ) 4.733374877 seconds time elapsed ( +- 2.10% ) 4.560955136 seconds time elapsed ( +- 1.43% ) 4.631062303 seconds time elapsed ( +- 1.40% ) post: 4.364765213 seconds time elapsed ( +- 0.91% ) 4.454442734 seconds time elapsed ( +- 1.18% ) 4.448893817 seconds time elapsed ( +- 1.41% ) 4.424346872 seconds time elapsed ( +- 0.97% ) Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-05-05 09:23:58 +02:00
Ingo Molnar	64b7aad579	Merge branch 'sched/urgent' into sched/core, to pick up fixes before applying new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-05-05 09:01:49 +02:00
Tadeusz Struk	58446fef57	crypto: rsa - select crypto mgr dependency The pkcs1pad template needs CRYPTO_MANAGER so it needs to be explicitly selected by CRYPTO_RSA. Reported-by: Jamie Heilman <jamie@audible.transient.net> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2016-05-05 14:27:05 +08:00
Herbert Xu	13f4bb78cf	crypto: hash - Fix page length clamping in hash walk The crypto hash walk code is broken when supplied with an offset greater than or equal to PAGE_SIZE. This patch fixes it by adjusting walk->pg and walk->offset when this happens. Cc: <stable@vger.kernel.org> Reported-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2016-05-05 14:27:04 +08:00
Dave Airlie	fca097169f	Merge tag 'drm-intel-fixes-2016-05-02' of git://anongit.freedesktop.org/drm-intel into drm-fixes i915 fixes for 4.6. A bit more than I'd like at this stage, but OTOH they're all stable material. * tag 'drm-intel-fixes-2016-05-02' of git://anongit.freedesktop.org/drm-intel: drm/i915: Make RPS EI/thresholds multiple of 25 on SNB-BDW drm/i915: Fake HDMI live status drm/i915: Fix eDP low vswing for Broadwell drm/i915/ddi: Fix eDP VDD handling during booting and suspend/resume drm/i915: Fix system resume if PCI device remained enabled drm/i915: Avoid stalling on pending flips for legacy cursor updates	2016-05-05 12:12:09 +10:00
Dave Airlie	80623de03b	Merge branch 'drm-fixes-4.6' of git://people.freedesktop.org/~agd5f/linux into drm-fixes two fixes for hw lockups and one for a double free * 'drm-fixes-4.6' of git://people.freedesktop.org/~agd5f/linux: drm/amdgpu: make sure vertical front porch is at least 1 drm/radeon: make sure vertical front porch is at least 1 drm/amdgpu: set metadata pointer to NULL after freeing.	2016-05-05 10:37:25 +10:00
Philipp Zabel	503fe87bd0	gpu: ipu-v3: Fix imx-ipuv3-crtc module autoloading If of_node is set before calling platform_device_add, the driver core will try to use of: modalias matching, which fails because the device tree nodes don't have a compatible property set. This patch fixes imx-ipuv3-crtc module autoloading by setting the of_node property only after the platform modalias is set. Fixes: `304e6be652` ("gpu: ipu-v3: Assign of_node of child platform devices to corresponding ports") Reported-by: Dennis Gilmore <dennis@ausil.us> Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Tested-By: Dennis Gilmore <dennis@ausil.us> Cc: stable@vger.kernel.org # 4.4+ Signed-off-by: Dave Airlie <airlied@redhat.com>	2016-05-05 10:34:52 +10:00
Viresh Kumar	21f8a99ce6	PM / OPP: Remove useless check Regulators are optional for devices using OPPs and the OPP core shouldn't be printing any errors for such missing regulators. It was fine before the commit `0c717d0f9c`, but that failed to update this part of the code to remove an 'always true' check and an extra unwanted print message. Fix that now. Fixes: `0c717d0f9c` (PM / OPP: Initialize regulator pointer to an error value) Reported-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-05 01:42:19 +02:00
Linus Torvalds	21a9703de3	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: atmel_mxt_ts - use mxt_acquire_irq in mxt_soft_reset Input: zforce_ts - fix dual touch recognition Input: twl6040-vibra - fix atomic schedule panic	2016-05-04 16:07:50 -07:00
Yury Norov	1f93e9f231	asm-generic: use compat version for preadv2 and pwritev2 Compat architectures that does not use generic unistd (mips, s390), declare compat version in their syscall tables for preadv2 and pwritev2. Generic unistd syscall table should do it as well. [arnd: this initially slipped through the review and an incorrect patch got merged. arch/tile/ is the only architecture that could be affected for their 32-bit compat mode, every other architecture we support today is fine.] Signed-off-by: Yury Norov <ynorov@caviumnetworks.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2016-05-05 00:42:20 +02:00
David S. Miller	c2cf530d42	Merge branch 'bnxt_en-fixes' Michael Chan says: ==================== bnxt_en: 2 bug fixes. Fix crash on ppc64 due to missing memory barrier and restore multicast after reset. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 17:11:39 -04:00
Michael Chan	7d2837dd7a	bnxt_en: Setup multicast properly after resetting device. The multicast/all-multicast internal flags are not properly restored after device reset. This could lead to unreliable multicast operations after an ethtool configuration change for example. Call bnxt_mc_list_updated() and setup the vnic->mask in bnxt_init_chip() to fix the issue. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 17:11:38 -04:00
Michael Chan	67a95e2022	bnxt_en: Need memory barrier when processing the completion ring. The code determines if the next ring entry is valid before proceeding further to read the rest of the entry. The CPU can re-order and read the rest of the entry first, possibly reading a stale entry, if DMA of a new entry happens right after reading it. This issue can be readily seen on a ppc64 system, causing it to crash. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 17:11:37 -04:00
Prarit Bhargava	93d68841a2	ACPICA: Dispatcher: Update thread ID for recursive method calls ACPICA commit 7a3bd2d962f221809f25ddb826c9e551b916eb25 Set the mutex owner thread ID. Original patch from: Prarit Bhargava <prarit@redhat.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=115121 Link: https://github.com/acpica/acpica/commit/7a3bd2d9 Signed-off-by: Prarit Bhargava <prarit@redhat.com> Tested-by: Andy Lutomirski <luto@kernel.org> # On a Dell XPS 13 9350 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-04 22:41:43 +02:00
David S. Miller	32b583a0cb	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2016-05-04 1) The flowcache can hit an OOM condition if too many entries are in the gc_list. Fix this by counting the entries in the gc_list and refuse new allocations if the value is too high. 2) The inner headers are invalid after a xfrm transformation, so reset the skb encapsulation field to ensure nobody tries access the inner headers. Otherwise tunnel devices stacked on top of xfrm may build the outer headers based on wrong informations. 3) Add pmtu handling to vti, we need it to report pmtu informations for local generated packets. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 16:35:31 -04:00
Kangjie Lu	5f8e44741f	net: fix infoleak in rtnetlink The stack object “map” has a total size of 32 bytes. Its last 4 bytes are padding generated by compiler. These padding bytes are not initialized and sent out via “nla_put”. Signed-off-by: Kangjie Lu <kjlu@gatech.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 16:19:42 -04:00
Kangjie Lu	b8670c09f3	net: fix infoleak in llc The stack object “info” has a total size of 12 bytes. Its last byte is padding which is not initialized and leaked via “put_cmsg”. Signed-off-by: Kangjie Lu <kjlu@gatech.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 16:18:48 -04:00
Linus Torvalds	4810d96829	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull IMA fix from James Morris. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: ima: fix the string representation of the LSM/IMA hook enumeration ordering	2016-05-04 11:14:00 -07:00
Uwe Kleine-König	1c021bb717	net: fec: only clear a queue's work bit if the queue was emptied In the receive path a queue's work bit was cleared unconditionally even if fec_enet_rx_queue only read out a part of the available packets from the hardware. This resulted in not reading any packets in the next napi turn and so packets were delayed or lost. The obvious fix is to only clear a queue's bit when the queue was emptied. Fixes: `4d494cdc92` ("net: fec: change data structure to support multiqueue") Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Tested-by: Fugang Duan <fugang.duan@nxp.com> Acked-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 14:08:38 -04:00
Matthias Brugger	20decb7e48	drivers: net: xgene: Fix error handling When probe bails out with an error, we try to unregister the netdev before we have even registered it. Fix the goto statements for that. Signed-off-by: Matthias Brugger <mbrugger@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 14:00:44 -04:00
Linus Torvalds	41143b774a	xen: regression fixes for 4.6-rc6 - Fix two regressions causing crashes in 32-bit PV guests. - Fix a regression in the evtchn driver. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJXKhf2AAoJEFxbo/MsZsTRYeEH/jkKi9CsCpcHBdJudqJ/RCmr ZLWu9JT5+iYUOD2o+NVumiMbNE7Ary/5rDVYzskzgtD2OL1MQJWQia3FC9Mhvj7y 6lFEl5m6SPNMi1VOW+BPU8k6tduSgnPISYyJbsQ5/YoAiHNX+ieaWX5UhFFlfDkj 8kpjVNclc3efgh6RqV1GzrmqhkYwVFATwG3SQFujzGSIC7KZNJHy5RQEH3UBvTU2 ymcL+eO35VSZrH9BXVvOberI3ME3UOkFxFIlAVqlBEgO8MoOlV0tFs/ciqjsHZvH ieKAb5jxLlHTwIu6ZxL2vCMp/ili9jeDNsiADkk9utJUUuI5WlHDjrLbqnUf2aY= =ckNq -----END PGP SIGNATURE----- Merge tag 'for-linus-4.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen regression fixes from David Vrabel: - Fix two regressions causing crashes in 32-bit PV guests - Fix a regression in the evtchn driver * tag 'for-linus-4.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/evtchn: fix ring resize when binding new events xen/balloon: Fix crash when ballooning on x86 32 bit PAE xen: Fix page <-> pfn conversion on 32 bit systems	2016-05-04 11:00:05 -07:00
Jan Beulich	27e0e63853	xen/evtchn: fix ring resize when binding new events The copying of ring data was wrong for two cases: For a full ring nothing got copied at all (as in that case the canonicalized producer and consumer indexes are identical). And in case one or both of the canonicalized (after the resize) indexes would point into the second half of the buffer, the copied data ended up in the wrong (free) part of the new buffer. In both cases uninitialized data would get passed back to the caller. Fix this by simply copying the old ring contents twice: Once to the low half of the new buffer, and a second time to the high half. This addresses the inability to boot a HVM guest with 64 or more vCPUs. This regression was caused by `8620015499` (xen/evtchn: dynamically grow pending event channel ring). Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: <stable@vger.kernel.org> # 4.4+ Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2016-05-04 16:37:01 +01:00
Rafael J. Wysocki	6d45b719cb	intel_pstate: Fix intel_pstate_get() After commit `8fa520af50` "intel_pstate: Remove freq calculation from intel_pstate_calc_busy()" intel_pstate_get() calls get_avg_frequency() to compute the average frequency, which is problematic for two reasons. First, intel_pstate_get() may be invoked before the driver reads the CPU feedback registers for the first time and if that happens, get_avg_frequency() will attempt to divide by zero. Second, the get_avg_frequency() call in intel_pstate_get() is racy with respect to intel_pstate_sample() and it may end up returning completely meaningless values for this reason. Moreover, after commit `7349ec0470` "intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()" sample.core_pct_busy is never computed on Atom, but it is used in intel_pstate_adjust_busy_pstate() in that case too. To address those problems notice that if sample.core_pct_busy was used in the average frequency computation carried out by get_avg_frequency(), both the divide by zero problem and the race with respect to intel_pstate_sample() would be avoided. Accordingly, move the invocation of intel_pstate_calc_busy() from get_target_pstate_use_performance() to intel_pstate_update_util(), which also will take care of the uninitialized sample.core_pct_busy on Atom, and modify get_avg_frequency() to use sample.core_pct_busy as per the above. Reported-by: kernel test robot <ying.huang@linux.intel.com> Link: http://marc.info/?l=linux-kernel&m=146226437623173&w=4 Fixes: `8fa520af50` "intel_pstate: Remove freq calculation from intel_pstate_calc_busy()" Fixes: `7349ec0470` "intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()" Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-04 14:09:16 +02:00

1 2 3 4 5 ...

590068 Commits