Commit Graph

602833 Commits

Author SHA1 Message Date
Kirill A. Shutemov
c17b1f4259 hugetlb: fix nr_pmds accounting with shared page tables
We account HugeTLB's shared page table to all processes who share it.
The accounting happens during huge_pmd_share().

If somebody populates pud entry under us, we should decrease pagetable's
refcount and decrease nr_pmds of the process.

By mistake, I increase nr_pmds again in this case.  :-/ It will lead to
"BUG: non-zero nr_pmds on freeing mm: 2" on process' exit.

Let's fix this by increasing nr_pmds only when we're sure that the page
table will be used.

Link: http://lkml.kernel.org/r/20160617122506.GC6534@node.shutemov.name
Fixes: dc6c9a35b6 ("mm: account pmd page tables to the process")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: zhongjiang <zhongjiang@huawei.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Kirill A. Shutemov
06d8fbc7cf Revert "mm: disable fault around on emulated access bit architecture"
This reverts commit d0834a6c2c.

After revert of 5c0a85fad9 ("mm: make faultaround produce old ptes")
faultaround doesn't have dependencies on hardware accessed bit, so let's
revert this one too.

Link: http://lkml.kernel.org/r/1465893750-44080-3-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Kirill A. Shutemov
315d09bf30 Revert "mm: make faultaround produce old ptes"
This reverts commit 5c0a85fad9.

The commit causes ~6% regression in unixbench.

Let's revert it for now and consider other solution for reclaim problem
later.

Link: http://lkml.kernel.org/r/1465893750-44080-2-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Antoine Tenart
1f08fe2665 mailmap: add Boris Brezillon's email
There are different versions of Boris' name and email in the log, and
one typo.  Add his emails in mailmap to have all of his contributions
under the same name/email tuple.

Link: http://lkml.kernel.org/r/20160609130323.27706-2-antoine.tenart@free-electrons.com
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Antoine Tenart
a8a47ff534 mailmap: add Antoine Tenart's email
I used "Antoine Ténart" at first but then moved to a name without accent
as this cause some issues from time to time...  Add my email in the
mailmap file to have a consistent shortlog output.

Link: http://lkml.kernel.org/r/20160609130323.27706-1-antoine.tenart@free-electrons.com
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Cc: Antoine Tenart <antoine.tenart@free-electrons.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Mel Gorman
e838a45f93 mm, sl[au]b: add __GFP_ATOMIC to the GFP reclaim mask
Commit d0164adc89 ("mm, page_alloc: distinguish between being unable
to sleep, unwilling to sleep and avoiding waking kswapd") modified
__GFP_WAIT to explicitly identify the difference between atomic callers
and those that were unwilling to sleep.  Later the definition was
removed entirely.

The GFP_RECLAIM_MASK is the set of flags that affect watermark checking
and reclaim behaviour but __GFP_ATOMIC was never added.  Without it,
atomic users of the slab allocator strip the __GFP_ATOMIC flag and
cannot access the page allocator atomic reserves.  This patch addresses
the problem.

The user-visible impact depends on the workload but potentially atomic
allocations unnecessarily fail without this path.

Link: http://lkml.kernel.org/r/20160610093832.GK2527@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Marcin Wojtas <mw@semihalf.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>	[4.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Andrey Ryabinin
9b75a867cc mm: mempool: kasan: don't poot mempool objects in quarantine
Currently we may put reserved by mempool elements into quarantine via
kasan_kfree().  This is totally wrong since quarantine may really free
these objects.  So when mempool will try to use such element,
use-after-free will happen.  Or mempool may decide that it no longer
need that element and double-free it.

So don't put object into quarantine in kasan_kfree(), just poison it.
Rename kasan_kfree() to kasan_poison_kfree() to respect that.

Also, we shouldn't use kasan_slab_alloc()/kasan_krealloc() in
kasan_unpoison_element() because those functions may update allocation
stacktrace.  This would be wrong for the most of the remove_element call
sites.

(The only call site where we may want to update alloc stacktrace is
 in mempool_alloc(). Kmemleak solves this by calling
 kmemleak_update_trace(), so we could make something like that too.
 But this is out of scope of this patch).

Fixes: 55834c5909 ("mm: kasan: initial memory quarantine implementation")
Link: http://lkml.kernel.org/r/575977C3.1010905@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Kuthonuzo Luruo <kuthonuzo.luruo@hpe.com>
Acked-by: Alexander Potapenko <glider@google.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Jon Mason
a6921c2974 MAINTAINERS: update Calgary IOMMU
Update the contact info for Muli, clean-up my name, and update the
mailing list to the IOMMU mailing list.

Link: http://lkml.kernel.org/r/1465493059-11840-2-git-send-email-jdmason@kudzu.us
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Cc: Muli Ben-Yehuda <mulix@mulix.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Michal Hocko
f2db19719a jbd2: get rid of superfluous __GFP_REPEAT
jbd2_alloc is explicit about its allocation preferences wrt.  the
allocation size.  Sub page allocations go to the slab allocator and
larger are using either the page allocator or vmalloc.  This is all good
but the logic is unnecessarily complex.

1) as per Ted, the vmalloc fallback is a left-over:

 : jbd2_alloc is only passed in the bh->b_size, which can't be PAGE_SIZE, so
 : the code path that calls vmalloc() should never get called.  When we
 : conveted jbd2_alloc() to suppor sub-page size allocations in commit
 : d2eecb0393, there was an assumption that it could be called with a size
 : greater than PAGE_SIZE, but that's certaily not true today.

Moreover vmalloc allocation might even lead to a deadlock because the
callers expect GFP_NOFS context while vmalloc is GFP_KERNEL.

2) __GFP_REPEAT for requests <= PAGE_ALLOC_COSTLY_ORDER is ignored
   since the flag was introduced.

Let's simplify the code flow and use the slab allocator for sub-page
requests and the page allocator for others.  Even though order > 0 is
not currently used as per above leave that option open.

Link: http://lkml.kernel.org/r/1464599699-30131-18-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Michal Hocko
a830627b01 unicore32: get rid of superfluous __GFP_REPEAT
__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

PGALLOC_GFP uses __GFP_REPEAT but it is only used in pte_alloc_one,
pte_alloc_one_kernel which does order-0 request.  This means that this
flag has never been actually useful here because it has always been used
only for PAGE_ALLOC_COSTLY requests.

Link: http://lkml.kernel.org/r/1464599699-30131-17-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Michal Hocko
f45eebc25e tile: get rid of superfluous __GFP_REPEAT
__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

pgtable_alloc_one uses __GFP_REPEAT flag for L2_USER_PGTABLE_ORDER but
the order is either 0 or 3 if L2_KERNEL_PGTABLE_SHIFT for HPAGE_SHIFT.
This means that this flag has never been actually useful here because it
has always been used only for PAGE_ALLOC_COSTLY requests.

Link: http://lkml.kernel.org/r/1464599699-30131-16-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Chris Metcalf <cmetcalf@mellanox.com> [for tile]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Michal Hocko
884ed4cb8a sh: get rid of superfluous __GFP_REPEAT
__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

PGALLOC_GFP uses __GFP_REPEAT but {pgd,pmd}_alloc allocate from
{pgd,pmd}_cache but both caches are allocating up to PAGE_SIZE objects.
This means that this flag has never been actually useful here because it
has always been used only for PAGE_ALLOC_COSTLY requests.

Link: http://lkml.kernel.org/r/1464599699-30131-15-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Michal Hocko
10d58bf297 s390: get rid of superfluous __GFP_REPEAT
__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

page_table_alloc then uses the flag for a single page allocation.  This
means that this flag has never been actually useful here because it has
always been used only for PAGE_ALLOC_COSTLY requests.

Link: http://lkml.kernel.org/r/1464599699-30131-14-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Michal Hocko
45eeff260d sparc: get rid of superfluous __GFP_REPEAT
__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

{pud,pmd}_alloc_one is using __GFP_REPEAT but it always allocates from
pgtable_cache which is initialzed to PAGE_SIZE objects.  This means that
this flag has never been actually useful here because it has always been
used only for PAGE_ALLOC_COSTLY requests.

Link: http://lkml.kernel.org/r/1464599699-30131-13-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Michal Hocko
2379a23e34 powerpc: get rid of superfluous __GFP_REPEAT
__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

{pud,pmd}_alloc_one are allocating from {PGT,PUD}_CACHE initialized in
pgtable_cache_init which doesn't have larger than sizeof(void *) << 12
size and that fits into !costly allocation request size.

PGALLOC_GFP is used only in radix__pgd_alloc which uses either order-0
or order-4 requests.  The first one doesn't need the flag while the
second does.  Drop __GFP_REPEAT from PGALLOC_GFP and add it for the
order-4 one.

This means that this flag has never been actually useful here because it
has always been used only for PAGE_ALLOC_COSTLY requests.

Link: http://lkml.kernel.org/r/1464599699-30131-12-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Michal Hocko
a4135b9389 score: get rid of superfluous __GFP_REPEAT
__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

pte_alloc_one{_kernel} allocate PTE_ORDER which is 0.  This means that
this flag has never been actually useful here because it has always been
used only for PAGE_ALLOC_COSTLY requests.

Link: http://lkml.kernel.org/r/1464599699-30131-11-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Michal Hocko
aade311a50 parisc: get rid of superfluous __GFP_REPEAT
__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

pmd_alloc_one allocate PMD_ORDER which is 1.  This means that this flag
has never been actually useful here because it has always been used only
for PAGE_ALLOC_COSTLY requests.

Link: http://lkml.kernel.org/r/1464599699-30131-10-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Michal Hocko
565299d033 nios2: get rid of superfluous __GFP_REPEAT
__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

pte_alloc_one{_kernel} allocate PTE_ORDER which is 0.  This means that
this flag has never been actually useful here because it has always been
used only for PAGE_ALLOC_COSTLY requests.

Link: http://lkml.kernel.org/r/1464599699-30131-9-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Ley Foon Tan <lftan@altera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Michal Hocko
65f84656ff mips: get rid of superfluous __GFP_REPEAT
__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

pte_alloc_one{_kernel}, pmd_alloc_one allocate PTE_ORDER resp.
PMD_ORDER but both are not larger than 1.  This means that this flag has
never been actually useful here because it has always been used only for
PAGE_ALLOC_COSTLY requests.

Link: http://lkml.kernel.org/r/1464599699-30131-8-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: John Crispin <blogic@openwrt.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Michal Hocko
54d87d600a arc: get rid of superfluous __GFP_REPEAT
__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

pte_alloc_one_kernel uses __get_order_pte but this is obviously always
zero because BITS_FOR_PTE is not larger than 9 yet the page size is
always larger than 4K.  This means that this flag has never been
actually useful here because it has always been used only for
PAGE_ALLOC_COSTLY requests.

Link: http://lkml.kernel.org/r/1464599699-30131-7-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Michal Hocko
f3610a6aff arm64: get rid of superfluous __GFP_REPEAT
__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

{pte,pmd,pud}_alloc_one{_kernel}, late_pgtable_alloc use PGALLOC_GFP for
__get_free_page (aka order-0).

pgd_alloc is slightly more complex because it allocates from pgd_cache
if PGD_SIZE != PAGE_SIZE and PGD_SIZE depends on the configuration
(CONFIG_ARM64_VA_BITS, PAGE_SHIFT and CONFIG_PGTABLE_LEVELS).

As per
config PGTABLE_LEVELS
	int
	default 2 if ARM64_16K_PAGES && ARM64_VA_BITS_36
	default 2 if ARM64_64K_PAGES && ARM64_VA_BITS_42
	default 3 if ARM64_64K_PAGES && ARM64_VA_BITS_48
	default 3 if ARM64_4K_PAGES && ARM64_VA_BITS_39
	default 3 if ARM64_16K_PAGES && ARM64_VA_BITS_47
	default 4 if !ARM64_64K_PAGES && ARM64_VA_BITS_48

we should have the following options

  CONFIG_ARM64_VA_BITS:48 CONFIG_PGTABLE_LEVELS:4 PAGE_SIZE:4k size:4096 pages:1
  CONFIG_ARM64_VA_BITS:48 CONFIG_PGTABLE_LEVELS:4 PAGE_SIZE:16k size:16 pages:1
  CONFIG_ARM64_VA_BITS:48 CONFIG_PGTABLE_LEVELS:3 PAGE_SIZE:64k size:512 pages:1
  CONFIG_ARM64_VA_BITS:47 CONFIG_PGTABLE_LEVELS:3 PAGE_SIZE:16k size:16384 pages:1
  CONFIG_ARM64_VA_BITS:42 CONFIG_PGTABLE_LEVELS:2 PAGE_SIZE:64k size:65536 pages:1
  CONFIG_ARM64_VA_BITS:39 CONFIG_PGTABLE_LEVELS:3 PAGE_SIZE:4k size:4096 pages:1
  CONFIG_ARM64_VA_BITS:36 CONFIG_PGTABLE_LEVELS:2 PAGE_SIZE:16k size:16384 pages:1

All of them fit into a single page (aka order-0).  This means that this
flag has never been actually useful here because it has always been used
only for PAGE_ALLOC_COSTLY requests.

Link: http://lkml.kernel.org/r/1464599699-30131-6-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Michal Hocko
f58f230a83 x86/efi: get rid of superfluous __GFP_REPEAT
__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

efi_alloc_page_tables uses __GFP_REPEAT but it allocates an order-0
page.  This means that this flag has never been actually useful here
because it has always been used only for PAGE_ALLOC_COSTLY requests.

Link: http://lkml.kernel.org/r/1464599699-30131-4-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Michal Hocko
a3a9a59d20 x86: get rid of superfluous __GFP_REPEAT
__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

PGALLOC_GFP uses __GFP_REPEAT but none of the allocation which uses this
flag is for more than order-0.  This means that this flag has never been
actually useful here because it has always been used only for
PAGE_ALLOC_COSTLY requests.

Link: http://lkml.kernel.org/r/1464599699-30131-3-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Michal Hocko
32d6bd9059 tree wide: get rid of __GFP_REPEAT for order-0 allocations part I
This is the third version of the patchset previously sent [1].  I have
basically only rebased it on top of 4.7-rc1 tree and dropped "dm: get
rid of superfluous gfp flags" which went through dm tree.  I am sending
it now because it is tree wide and chances for conflicts are reduced
considerably when we want to target rc2.  I plan to send the next step
and rename the flag and move to a better semantic later during this
release cycle so we will have a new semantic ready for 4.8 merge window
hopefully.

Motivation:

While working on something unrelated I've checked the current usage of
__GFP_REPEAT in the tree.  It seems that a majority of the usage is and
always has been bogus because __GFP_REPEAT has always been about costly
high order allocations while we are using it for order-0 or very small
orders very often.  It seems that a big pile of them is just a
copy&paste when a code has been adopted from one arch to another.

I think it makes some sense to get rid of them because they are just
making the semantic more unclear.  Please note that GFP_REPEAT is
documented as

* __GFP_REPEAT: Try hard to allocate the memory, but the allocation attempt

* _might_ fail.  This depends upon the particular VM implementation.
  while !costly requests have basically nofail semantic.  So one could
  reasonably expect that order-0 request with __GFP_REPEAT will not loop
  for ever.  This is not implemented right now though.

I would like to move on with __GFP_REPEAT and define a better semantic
for it.

  $ git grep __GFP_REPEAT origin/master | wc -l
  111
  $ git grep __GFP_REPEAT | wc -l
  36

So we are down to the third after this patch series.  The remaining
places really seem to be relying on __GFP_REPEAT due to large allocation
requests.  This still needs some double checking which I will do later
after all the simple ones are sorted out.

I am touching a lot of arch specific code here and I hope I got it right
but as a matter of fact I even didn't compile test for some archs as I
do not have cross compiler for them.  Patches should be quite trivial to
review for stupid compile mistakes though.  The tricky parts are usually
hidden by macro definitions and thats where I would appreciate help from
arch maintainers.

[1] http://lkml.kernel.org/r/1461849846-27209-1-git-send-email-mhocko@kernel.org

This patch (of 19):

__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.  Yet we
have the full kernel tree with its usage for apparently order-0
allocations.  This is really confusing because __GFP_REPEAT is
explicitly documented to allow allocation failures which is a weaker
semantic than the current order-0 has (basically nofail).

Let's simply drop __GFP_REPEAT from those places.  This would allow to
identify place which really need allocator to retry harder and formulate
a more specific semantic for what the flag is supposed to do actually.

Link: http://lkml.kernel.org/r/1464599699-30131-2-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com> [for tile]
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: John Crispin <blogic@openwrt.org>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Anthony Romano
b9b4bb26af tmpfs: don't undo fallocate past its last page
When fallocate is interrupted it will undo a range that extends one byte
past its range of allocated pages.  This can corrupt an in-use page by
zeroing out its first byte.  Instead, undo using the inclusive byte
range.

Fixes: 1635f6a741 ("tmpfs: undo fallocation on failure")
Link: http://lkml.kernel.org/r/1462713387-16724-1-git-send-email-anthony.romano@coreos.com
Signed-off-by: Anthony Romano <anthony.romano@coreos.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Brandon Philips <brandon@ifup.co>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Mike Kravetz
a7b50abc90 selftests/vm/compaction_test: fix write to restore nr_hugepages
The write at the end of the test to restore nr_hugepages to its previous
value is failing.  This is because it is trying to write the number of
bytes in the char array as opposed to the number of bytes in the string.

Link: http://lkml.kernel.org/r/1465331205-3284-1-git-send-email-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Sri Jayaramappa <sjayaram@akamai.com>
Cc: Eric B Munson <emunson@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Tetsuo Handa
9df10fb7b8 oom_reaper: avoid pointless atomic_inc_not_zero usage.
Since commit 36324a990c ("oom: clear TIF_MEMDIE after oom_reaper
managed to unmap the address space") changed to use find_lock_task_mm()
for finding a mm_struct to reap, it is guaranteed that mm->mm_users > 0
because find_lock_task_mm() returns a task_struct with ->mm != NULL.
Therefore, we can safely use atomic_inc().

Link: http://lkml.kernel.org/r/1465024759-8074-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Tetsuo Handa
491a1c65ae mm,oom_reaper: don't call mmput_async() without atomic_inc_not_zero()
Commit e2fe14564d ("oom_reaper: close race with exiting task") reduced
frequency of needlessly selecting next OOM victim, but was calling
mmput_async() when atomic_inc_not_zero() failed.

Link: http://lkml.kernel.org/r/1464423365-5555-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Linus Torvalds
9c46a6df3b Fix missing server-side permission checks on setting NFS ACLs.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXba7TAAoJECebzXlCjuG+j+gP/18y6ot02Y5R2pI/O8nqoY3I
 WeBNOo1yD77wQ1SopiIbPL/ChxOh/OVlUzo9ikNtwm5l6Op8mLMxPYaDjaIpA5Nt
 FC/pAHibdTJA4ZjzenRhnEEFYbOQh0GssF/qMG30ySGPhx0eoonXi5/qYvjFyTBF
 BuDrpC4YHSNvqCZ/r0aD2bw79Skw8cBPdj+SUfK2r37WyuQ4Kade9NCmDYwSNxSx
 6cru5ztRQSE8Ni0le3U2wTlYhq8xrpP0bRdIzc/9EipdKVdsvfukonjnT+dwtDks
 72fwDoALAZq0iiIur7LKaUjkaZcKzHwe6LVsZEoiJ5aeI2a2FodLwoyXl4SntAR7
 027YEqe7Pc+KHGUYACVuNuCcJkEK5B3zRBBSNoskhkPaK/lJ7BMSXNNhIt248YE3
 HAl1vuf4PakCgh7qIsiUHB1EVs6FCcG8aKH1TmumvPD2udwabiYcKqd8soNu5ZWu
 ALi1vtD/8B1LEI8TacP5NIt8Pdr1AQ0kVDFWlZSiK3oE11DrHLiUgfvl2y7cokMa
 xzcNnoyEppaWNFJzYzQes8XO7Ti/DLJoCB8JnxMaWT1BfVhpEAs1LNl4AIHij5fO
 /PKNs4OusntvOmEvgKtxZpvqXaElgvXz7LMgzM2bmMGMVY+mq0+lpDbzAK91ijk0
 di8+ivIMayA60P5xV4dJ
 =TZ/R
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.7-2' of git://linux-nfs.org/~bfields/linux

Pull nfsd bugfixes from Bruce Fields:
 "Fix missing server-side permission checks on setting NFS ACLs"

* tag 'nfsd-4.7-2' of git://linux-nfs.org/~bfields/linux:
  nfsd: check permissions when setting ACLs
  posix_acl: Add set_posix_acl
2016-06-24 17:22:27 -07:00
Linus Torvalds
7f1a00b6fc fix up initial thread stack pointer vs thread_info confusion
The INIT_TASK() initializer was similarly confused about the stack vs
thread_info allocation that the allocators had, and that were fixed in
commit b235beea9e ("Clarify naming of thread info/stack allocators").

The task ->stack pointer only incidentally ends up having the same value
as the thread_info, and in fact that will change.

So fix the initial task struct initializer to point to 'init_stack'
instead of 'init_thread_info', and make sure the ia64 definition for
that exists.

This actually makes the ia64 tsk->stack pointer be sensible for the
initial task, but not for any other task.  As mentioned in commit
b235beea9e, that whole pointer isn't actually used on ia64, since
task_stack_page() there just points to the (single) allocation.

All the other architectures seem to have copied the 'init_stack'
definition, even if it tended to be generally unusued.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:07:33 -07:00
Linus Torvalds
aca9c293d0 x86: fix up a few misc stack pointer vs thread_info confusions
As the actual pointer value is the same for the thread stack allocation
and the thread_info, code that confused the two worked fine, but will
break when the thread info is moved away from the stack allocation.  It
also looks very confusing.

For example, the kprobe code wanted to know the current top of stack.
To do that, it used this:

	(unsigned long)current_thread_info() + THREAD_SIZE

which did indeed give the correct value.  But it's not only a fairly
nonsensical expression, it's also rather complex, especially since we
actually have this:

	static inline unsigned long current_top_of_stack(void)

which not only gives us the value we are interested in, but happens to
be how "current_thread_info()" is currently defined as:

	(struct thread_info *)(current_top_of_stack() - THREAD_SIZE);

so using current_thread_info() to figure out the top of the stack really
is a very round-about thing to do.

The other cases are just simpler confusion about task_thread_info() vs
task_stack_page(), which currently return the same pointer - but if you
want the stack page, you really should be using the latter one.

And there was one entirely unused assignment of the current stack to a
thread_info pointer.

All cleaned up to make more sense today, and make it easier to move the
thread_info away from the stack in the future.

No semantic changes.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 16:55:53 -07:00
Linus Torvalds
b235beea9e Clarify naming of thread info/stack allocators
We've had the thread info allocated together with the thread stack for
most architectures for a long time (since the thread_info was split off
from the task struct), but that is about to change.

But the patches that move the thread info to be off-stack (and a part of
the task struct instead) made it clear how confused the allocator and
freeing functions are.

Because the common case was that we share an allocation with the thread
stack and the thread_info, the two pointers were identical.  That
identity then meant that we would have things like

	ti = alloc_thread_info_node(tsk, node);
	...
	tsk->stack = ti;

which certainly _worked_ (since stack and thread_info have the same
value), but is rather confusing: why are we assigning a thread_info to
the stack? And if we move the thread_info away, the "confusing" code
just gets to be entirely bogus.

So remove all this confusion, and make it clear that we are doing the
stack allocation by renaming and clarifying the function names to be
about the stack.  The fact that the thread_info then shares the
allocation is an implementation detail, and not really about the
allocation itself.

This is a pure renaming and type fix: we pass in the same pointer, it's
just that we clarify what the pointer means.

The ia64 code that actually only has one single allocation (for all of
task_struct, thread_info and kernel thread stack) now looks a bit odd,
but since "tsk->stack" is actually not even used there, that oddity
doesn't matter.  It would be a separate thing to clean that up, I
intentionally left the ia64 changes as a pure brute-force renaming and
type change.

Acked-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 15:09:37 -07:00
Rafael J. Wysocki
e753f30509 Merge branches 'pm-devfreq-fixes' and 'pm-cpufreq-fixes'
* pm-devfreq-fixes:
  PM / devfreq: Send the DEVFREQ_POSTCHANGE notification when target() is failed
  PM / devfreq: fix initialization of current frequency in last status
  PM / devfreq: exynos-nocp: Remove incorrect IS_ERR() check
  PM / devfreq: remove double put_device
  PM / devfreq: fix double call put_device
  PM / devfreq: fix duplicated kfree on devfreq pointer
  PM / devfreq: devm_kzalloc to have dev pointer more precisely

* pm-cpufreq-fixes:
  cpufreq: pcc-cpufreq: Fix doorbell.access_width
2016-06-24 23:37:23 +02:00
Rafael J. Wysocki
2605b98109 Merge branch 'acpica-fixes'
* acpica-fixes:
  ACPICA: Namespace: Fix deadlock triggered by MLC support in dynamic table loading
2016-06-24 23:36:20 +02:00
Ben Hutchings
999653786d nfsd: check permissions when setting ACLs
Use set_posix_acl, which includes proper permission checks, instead of
calling ->set_acl directly.  Without this anyone may be able to grant
themselves permissions to a file by setting the ACL.

Lock the inode to make the new checks atomic with respect to set_acl.
(Also, nfsd was the only caller of set_acl not locking the inode, so I
suspect this may fix other races.)

This also simplifies the code, and ensures our ACLs are checked by
posix_acl_valid.

The permission checks and the inode locking were lost with commit
4ac7249e, which changed nfsd to use the set_acl inode operation directly
instead of going through xattr handlers.

Reported-by: David Sinquin <david@sinquin.eu>
[agreunba@redhat.com: use set_posix_acl]
Fixes: 4ac7249e
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-06-24 12:11:52 -04:00
Andreas Gruenbacher
485e71e8fb posix_acl: Add set_posix_acl
Factor out part of posix_acl_xattr_set into a common function that takes
a posix_acl, which nfsd can also call.

The prototype already exists in include/linux/posix_acl.h.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: stable@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-06-24 12:11:34 -04:00
Takashi Iwai
d5dbbe6569 ALSA: dummy: Fix a use-after-free at closing
syzkaller fuzzer spotted a potential use-after-free case in snd-dummy
driver when hrtimer is used as backend:
> ==================================================================
> BUG: KASAN: use-after-free in rb_erase+0x1b17/0x2010 at addr ffff88005e5b6f68
>  Read of size 8 by task syz-executor/8984
> =============================================================================
> BUG kmalloc-192 (Not tainted): kasan: bad access detected
> -----------------------------------------------------------------------------
>
> Disabling lock debugging due to kernel taint
> INFO: Allocated in 0xbbbbbbbbbbbbbbbb age=18446705582212484632
> ....
> [<      none      >] dummy_hrtimer_create+0x49/0x1a0 sound/drivers/dummy.c:464
> ....
> INFO: Freed in 0xfffd8e09 age=18446705496313138713 cpu=2164287125 pid=-1
> [<      none      >] dummy_hrtimer_free+0x68/0x80 sound/drivers/dummy.c:481
> ....
> Call Trace:
>  [<ffffffff8179e59e>] __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:333
>  [<     inline     >] rb_set_parent include/linux/rbtree_augmented.h:111
>  [<     inline     >] __rb_erase_augmented include/linux/rbtree_augmented.h:218
>  [<ffffffff82ca5787>] rb_erase+0x1b17/0x2010 lib/rbtree.c:427
>  [<ffffffff82cb02e8>] timerqueue_del+0x78/0x170 lib/timerqueue.c:86
>  [<ffffffff814d0c80>] __remove_hrtimer+0x90/0x220 kernel/time/hrtimer.c:903
>  [<     inline     >] remove_hrtimer kernel/time/hrtimer.c:945
>  [<ffffffff814d23da>] hrtimer_try_to_cancel+0x22a/0x570 kernel/time/hrtimer.c:1046
>  [<ffffffff814d2742>] hrtimer_cancel+0x22/0x40 kernel/time/hrtimer.c:1066
>  [<ffffffff85420531>] dummy_hrtimer_stop+0x91/0xb0 sound/drivers/dummy.c:417
>  [<ffffffff854228bf>] dummy_pcm_trigger+0x17f/0x1e0 sound/drivers/dummy.c:507
>  [<ffffffff85392170>] snd_pcm_do_stop+0x160/0x1b0 sound/core/pcm_native.c:1106
>  [<ffffffff85391b26>] snd_pcm_action_single+0x76/0x120 sound/core/pcm_native.c:956
>  [<ffffffff85391e01>] snd_pcm_action+0x231/0x290 sound/core/pcm_native.c:974
>  [<     inline     >] snd_pcm_stop sound/core/pcm_native.c:1139
>  [<ffffffff8539754d>] snd_pcm_drop+0x12d/0x1d0 sound/core/pcm_native.c:1784
>  [<ffffffff8539d3be>] snd_pcm_common_ioctl1+0xfae/0x2150 sound/core/pcm_native.c:2805
>  [<ffffffff8539ee91>] snd_pcm_capture_ioctl1+0x2a1/0x5e0 sound/core/pcm_native.c:2976
>  [<ffffffff8539f2ec>] snd_pcm_kernel_ioctl+0x11c/0x160 sound/core/pcm_native.c:3020
>  [<ffffffff853d9a44>] snd_pcm_oss_sync+0x3a4/0xa30 sound/core/oss/pcm_oss.c:1693
>  [<ffffffff853da27d>] snd_pcm_oss_release+0x1ad/0x280 sound/core/oss/pcm_oss.c:2483
>  .....

A workaround is to call hrtimer_cancel() in dummy_hrtimer_sync() which
is called certainly before other blocking ops.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2016-06-24 15:18:32 +02:00
Jaroslav Kysela
0f087ee3f3 ALSA: hda / realtek - add two more Thinkpad IDs (5050,5053) for tpt460 fixup
See: https://bugzilla.redhat.com/show_bug.cgi?id=1349539
  See: https://bugzilla.kernel.org/show_bug.cgi?id=120961

Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2016-06-24 15:16:50 +02:00
Jan Beulich
d2bd05d88d xen-pciback: return proper values during BAR sizing
Reads following writes with all address bits set to 1 should return all
changeable address bits as one, not the BAR size (nor, as was the case
for the upper half of 64-bit BARs, the high half of the region's end
address). Presumably this didn't cause any problems so far because
consumers use the value to calculate the size (usually via val & -val),
and do nothing else with it.

But also consider the exception here: Unimplemented BARs should always
return all zeroes.

And finally, the check for whether to return the sizing address on read
for the ROM BAR should ignore all non-address bits, not just the ROM
Enable one.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-06-24 10:53:03 +01:00
Woodrow Shen
f83c32925d ALSA: hda - Fix the headset mic jack detection on Dell machine
The new Dell laptop with codec 3246 can't detect headset mic when
headset was inserted on the machine. So adding pin configurations
into quirk table makes headset mic work correctly.

Codec: Realtek ALC3246
Vendor Id: 0x10ec0256
Subsystem Id: 0x10280781

Signed-off-by: Woodrow Shen <woodrow.shen@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2016-06-24 10:29:55 +02:00
Scott Bauer
93a2001bdf HID: hiddev: validate num_values for HIDIOCGUSAGES, HIDIOCSUSAGES commands
This patch validates the num_values parameter from userland during the
HIDIOCGUSAGES and HIDIOCSUSAGES commands. Previously, if the report id was set
to HID_REPORT_ID_UNKNOWN, we would fail to validate the num_values parameter
leading to a heap overflow.

Cc: stable@vger.kernel.org
Signed-off-by: Scott Bauer <sbauer@plzdonthack.me>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-06-24 10:21:39 +02:00
Tejun Heo
feb245e304 sched/core: Allow kthreads to fall back to online && !active cpus
During CPU hotplug, CPU_ONLINE callbacks are run while the CPU is
online but not active.  A CPU_ONLINE callback may create or bind a
kthread so that its cpus_allowed mask only allows the CPU which is
being brought online.  The kthread may start executing before the CPU
is made active and can end up in select_fallback_rq().

In such cases, the expected behavior is selecting the CPU which is
coming online; however, because select_fallback_rq() only chooses from
active CPUs, it determines that the task doesn't have any viable CPU
in its allowed mask and ends up overriding it to cpu_possible_mask.

CPU_ONLINE callbacks should be able to put kthreads on the CPU which
is coming online.  Update select_fallback_rq() so that it follows
cpu_online() rather than cpu_active() for kthreads.

Reported-by: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Tested-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-team@fb.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20160616193504.GB3262@mtj.duckdns.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-24 08:26:53 +02:00
Konstantin Khlebnikov
754bd598be sched/fair: Do not announce throttled next buddy in dequeue_task_fair()
Hierarchy could be already throttled at this point. Throttled next
buddy could trigger a NULL pointer dereference in pick_next_task_fair().

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/146608183552.21905.15924473394414832071.stgit@buzz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-24 08:26:45 +02:00
Konstantin Khlebnikov
094f469172 sched/fair: Initialize throttle_count for new task-groups lazily
Cgroup created inside throttled group must inherit current throttle_count.
Broken throttle_count allows to nominate throttled entries as a next buddy,
later this leads to null pointer dereference in pick_next_task_fair().

This patch initialize cfs_rq->throttle_count at first enqueue: laziness
allows to skip locking all rq at group creation. Lazy approach also allows
to skip full sub-tree scan at throttling hierarchy (not in this patch).

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bsegall@google.com
Link: http://lkml.kernel.org/r/146608182119.21870.8439834428248129633.stgit@buzz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-24 08:26:44 +02:00
Paolo Bonzini
4c5ea0a9cd locking/static_key: Fix concurrent static_key_slow_inc()
The following scenario is possible:

    CPU 1                                   CPU 2
    static_key_slow_inc()
     atomic_inc_not_zero()
      -> key.enabled == 0, no increment
     jump_label_lock()
     atomic_inc_return()
      -> key.enabled == 1 now
                                            static_key_slow_inc()
                                             atomic_inc_not_zero()
                                              -> key.enabled == 1, inc to 2
                                             return
                                            ** static key is wrong!
     jump_label_update()
     jump_label_unlock()

Testing the static key at the point marked by (**) will follow the
wrong path for jumps that have not been patched yet.  This can
actually happen when creating many KVM virtual machines with userspace
LAPIC emulation; just run several copies of the following program:

    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    int main(void)
    {
        for (;;) {
            int kvmfd = open("/dev/kvm", O_RDONLY);
            int vmfd = ioctl(kvmfd, KVM_CREATE_VM, 0);
            close(ioctl(vmfd, KVM_CREATE_VCPU, 1));
            close(vmfd);
            close(kvmfd);
        }
        return 0;
    }

Every KVM_CREATE_VCPU ioctl will attempt a static_key_slow_inc() call.
The static key's purpose is to skip NULL pointer checks and indeed one
of the processes eventually dereferences NULL.

As explained in the commit that introduced the bug:

  706249c222 ("locking/static_keys: Rework update logic")

jump_label_update() needs key.enabled to be true.  The solution adopted
here is to temporarily make key.enabled == -1, and use go down the
slow path when key.enabled <= 0.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org> # v4.3+
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 706249c222 ("locking/static_keys: Rework update logic")
Link: http://lkml.kernel.org/r/1466527937-69798-1-git-send-email-pbonzini@redhat.com
[ Small stylistic edits to the changelog and the code. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-24 08:23:16 +02:00
Linus Torvalds
63c04ee7d3 This pull requests contains fixes for two critical bugs in UBI and UBIFS:
1. Fixes the possibility of losing data upon a power cut when UBI tries
    to recover from a write error.
 2. Fixes page migration on UBIFS. It turned out that the default page
    migration function is not suitable for UBIFS.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJXa5tEAAoJEEtJtSqsAOnW090P/RcQjIfVf2g3r8VRp38OQPbb
 MTd4sD/rnyt5Eq0QYUPWG5xcYK2BWI1PwpdB81JvW5hxnXPgG8DpVIxjzt/7Xgnp
 QheYe9tMfgYjDntz1rzGVa/uHSAldP9V4czgczrBW/0lwnRsZ6mLY1RA9Oz0hRdG
 cp53I8CSD0DPyqU0XkgzLkzVUstmySwQ5i46C0kQEnlRcytReOLgcjSrXXn+/Zih
 yZxhtDQSCKmQAfVmERggPXVHo8jFtVfej52ja7RFcMA2uXvXqljOBNCyLUYPdYka
 XdQEKsXRLl69ktFUXwZwPAYAW23I8+PMpsoljHDVc0hF25p8omp3D+7HE18SsMSv
 6RNnUwz+PDbiFApyoTu0SBgHN/OO9o6rjNNoRIInoKpk0NvWmrMQOo6BIFsX4yq1
 0dOVJiKXVoFuo75Yw9mOKdrV/Z5P1TvgdTBj6g03aUM9vcX1Gz6+1xKkvcXGgh02
 8qFDZdZ5L87TlpMkvtWO87Ir0ssrfjxpvxR8pPsxxqvxbfUuVmss4ILuh9AVSVk+
 d1zrz30+JZzTbIrky/7R31i6Bx2+reYdTKiPIkST9sF5WblUPSeyUoKq1OlNRYxj
 n+0Q8S5Tm/6AHXUOQFxurbXU+D7G7TaL/CsBeepvV/AqJb07+vBxUuGFH1rDbmLB
 r5dTfOXn3iNEmmNyrhgN
 =EDeX
 -----END PGP SIGNATURE-----

Merge tag 'upstream-4.7-rc5' of git://git.infradead.org/linux-ubifs

Pull UBI/UBIFS fixes from Richard Weinberger:
 "This contains fixes for two critical bugs in UBI and UBIFS:

   - fix the possibility of losing data upon a power cut when UBI tries
     to recover from a write error

   - fix page migration on UBIFS.  It turned out that the default page
     migration function is not suitable for UBIFS"

* tag 'upstream-4.7-rc5' of git://git.infradead.org/linux-ubifs:
  UBIFS: Implement ->migratepage()
  mm: Export migrate_page_move_mapping and migrate_page_copy
  ubi: Make recover_peb power cut aware
  gpio: make library immune to error pointers
  gpio: make sure gpiod_to_irq() returns negative on NULL desc
  gpio: 104-idi-48: Fix missing spin_lock_init for ack_lock
2016-06-23 22:48:48 -07:00
Linus Torvalds
0bf0ea431f Merge tag 'drm-fixes-for-v4.7-rc5' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
 "This is the drm fixes tree for 4.7-rc5.

  It's a bit larger than normal, due to fixes for production AMD Polaris
  GPUs.  We only merged support for these in 4.7-rc1 so it would be good
  if we got all the fixes into final.  The changes don't hit any other
  hardware.

  Other than the amdgpu Polaris changes:

   - A single fix for atomic modesetting WARN
   - Nouveau fix for when fbdev is disabled
   - i915 fixes for FBC on Haswell and displayport regression
   - Exynos fix for a display panel regression and some other minor changes
   - Atmel fixes for scaling and OF graph interaction
   - Allwiinner build, warning and probing fixes
   - AMD GPU non-polaris fix for num_rbs and some minor fixes

  Also I've just moved house, and my new place is Internet challenged
  due to incompetent incumbent ISPs, hopefully sorted out in a couple of
  weeks, so I might not be too responsive over the next while.  It also
  helps Daniel is on holidays for those couple of weeks as well"

* tag 'drm-fixes-for-v4.7-rc5' of git://people.freedesktop.org/~airlied/linux: (38 commits)
  drm/atomic: Make drm_atomic_legacy_backoff reset crtc->acquire_ctx
  drm/nouveau: fix for disabled fbdev emulation
  drm/i915/fbc: Disable on HSW by default for now
  drm/i915: Revert DisplayPort fast link training feature
  drm/amd/powerplay: enable clock stretch feature for polaris
  drm/amdgpu/gfx8: update golden setting for polaris10
  drm/amd/powerplay: enable avfs feature for polaris
  drm/amdgpu/atombios: add avfs struct for Polaris10/11
  drm/amd/powerplay: add avfs related define for polaris
  drm/amd/powrplay: enable stutter_mode for polaris.
  drm/amd/powerplay: disable UVD SMU handshake for MCLK.
  drm/amd/powerplay: initialize variables which were missed.
  drm/amd/powerplay: enable PowerContainment feature for polaris10/11.
  drm/amd/powerplay: need to notify system bios pcie device ready
  drm/amd/powerplay: fix bug that function parameter was incorect.
  drm/amd/powerplay: fix logic error.
  drm: atmel-hlcdc: Fix OF graph parsing
  drm: atmel-hlcdc: actually disable scaling when no scaling is required
  drm/amdgpu: initialize amdgpu_cgs_acpi_eval_object result value
  drm/amdgpu: precedence bug in amdgpu_device_init()
  ...
2016-06-23 21:35:12 -07:00
Linus Torvalds
75befb31ec PCI updates for v4.7:
Miscellaneous
     Fix unaligned accesses in VC code (David Miller)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXbJqPAAoJEFmIoMA60/r88SwQALfU1xaaVmPsxHBIwmSFoQ/U
 lEYhk9OU17re949Y1XWQI8jfqv7YMQZd3XLZI4IkQE79s6zXwmpz7uti87kADvHC
 hB1J1BXiWtmLfLBT/8xhmPMqLS2gB6kdALU/kz/wBKEUN+9/hiJSm5nTZUKKkj6X
 bKVY4+DVRdWUyNs+aCF+Fp+ri1ZIcjkFz7+EIh89mYqAztlfFIsY5PD6imV/8kO+
 x365GS+lJPCiIjke1Fe7Vf3DbX6ZomlGzE+GyKRwWnK+tRFp8vcgtMPiOzPbX26D
 bGFrTUsdS6PIuq2x3l4UntuK7vVREr+jd9F1ZtQwEehuPn8BbATRNMBV4+YVj2SO
 NT7+UMwg/Mlz2ncV2sUCCqIkFMDqOueKJ94+1WNaYdI/5jW6Bl8Y8a30sKYyfrWS
 yXH8+RJK+QtRJgfGL4N1TxDLQuWqbbk2j8KstUlOap78QlmBJQOnzlSuJzUPxAo+
 +CMnHmD0wsVP7dJlLrcvHiE8UJY7kQdtSS1b2VymA0eFXZqcGjL4/83BVI9KiDpu
 ZJitrP88/DhGHpmI1KO6LjV6C/jUzarg93+DlP08JXyqLpTDHUNVI131j9US9Zzp
 9ba7jei+/ZcwcDOC8PmYwV9ZinC01L+Hzq4McJiD8KVoQW5CiMbcEwaRXdqrrEOd
 +pbY/VCXmqokaNw1vsB6
 =Ymn1
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.7-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fix from Bjorn Helgaas:
 "Here's a small fix for v4.7.  This problem was actually introduced in
  v4.6 when we unified Kconfig, making PCIe support available everywhere
  including sparc, where config reads into unaligned buffers cause
  warnings.  This fix is from Dave Miller.

  As a reminder, any future PCI fixes for v4.7 will probably come from
  Alex Williamson, since I'll be on vacation for most of the rest of
  this cycle.  I should be back about the time the merge window opens"

* tag 'pci-v4.7-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: Fix unaligned accesses in VC code
2016-06-23 20:59:14 -07:00
Maarten Lankhorst
81e257e964 drm/atomic: Make drm_atomic_legacy_backoff reset crtc->acquire_ctx
Atomic updates may acquire more state than initially locked through
drm_modeset_lock_crtc, running with heavy stress can cause a
WARN_ON(crtc->acquire_ctx) in drm_modeset_lock_crtc:

[  601.491296] ------------[ cut here ]------------
[  601.491366] WARNING: CPU: 0 PID: 2411 at
drivers/gpu/drm/drm_modeset_lock.c:191 drm_modeset_lock_crtc+0xeb/0xf0 [drm]
[  601.491369] Modules linked in: drm i915 drm_kms_helper
[  601.491414] CPU: 0 PID: 2411 Comm: kms_cursor_lega Tainted: G     U 4.7.0-rc4-patser+ #4798
[  601.491417] Hardware name: Intel Corporation Skylake Client
[  601.491420]  0000000000000000 ffff88044d153c98 ffffffff812ead28 0000000000000000
[  601.491425]  0000000000000000 ffff88044d153cd8 ffffffff810868e6 000000bf58058030
[  601.491431]  ffff880088b415e8 ffff880458058030 ffff88008a271548 ffff88008a271568
[  601.491436] Call Trace:
[  601.491443]  [<ffffffff812ead28>] dump_stack+0x4d/0x65
[  601.491447]  [<ffffffff810868e6>] __warn+0xc6/0xe0
[  601.491452]  [<ffffffff81086968>] warn_slowpath_null+0x18/0x20
[  601.491472]  [<ffffffffc00d4ffb>] drm_modeset_lock_crtc+0xeb/0xf0 [drm]
[  601.491491]  [<ffffffffc00c5526>] drm_mode_cursor_common+0x66/0x180 [drm]
[  601.491509]  [<ffffffffc00c91cc>] drm_mode_cursor_ioctl+0x3c/0x40 [drm]
[  601.491524]  [<ffffffffc00bc94d>] drm_ioctl+0x14d/0x530 [drm]
[  601.491540]  [<ffffffffc00c9190>] ? drm_mode_setcrtc+0x520/0x520 [drm]
[  601.491545]  [<ffffffff81176aeb>] ? handle_mm_fault+0x106b/0x1430
[  601.491550]  [<ffffffff81108441>] ? stop_one_cpu+0x61/0x70
[  601.491556]  [<ffffffff811bb71d>] do_vfs_ioctl+0x8d/0x570
[  601.491560]  [<ffffffff81290d7e>] ? security_file_ioctl+0x3e/0x60
[  601.491565]  [<ffffffff811bbc74>] SyS_ioctl+0x74/0x80
[  601.491571]  [<ffffffff810e321c>] ? posix_get_monotonic_raw+0xc/0x10
[  601.491576]  [<ffffffff8175b11b>] entry_SYSCALL_64_fastpath+0x13/0x8f
[  601.491581] ---[ end trace 56f3d3d85f000d00 ]---

For good measure, test mode_config.acquire_ctx too, although this should
never happen.

Testcase: kms_cursor_legacy
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-24 11:10:36 +10:00
Dave Airlie
f939a5f432 Merge branch 'drm-fixes-4.7' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
A bit bigger than I would normally like, but most of the large changes are
for polaris support and since polaris went upstream in 4.7, I'd like
to get the fixes in so it's in good shape when the hw becomes available.
The major changes only touch the polaris code so there is little chance
for regressions on other asics.  The rest are just the usual collection
of bug fixes.

* 'drm-fixes-4.7' of git://people.freedesktop.org/~agd5f/linux:
  drm/amd/powerplay: enable clock stretch feature for polaris
  drm/amdgpu/gfx8: update golden setting for polaris10
  drm/amd/powerplay: enable avfs feature for polaris
  drm/amdgpu/atombios: add avfs struct for Polaris10/11
  drm/amd/powerplay: add avfs related define for polaris
  drm/amd/powrplay: enable stutter_mode for polaris.
  drm/amd/powerplay: disable UVD SMU handshake for MCLK.
  drm/amd/powerplay: initialize variables which were missed.
  drm/amd/powerplay: enable PowerContainment feature for polaris10/11.
  drm/amd/powerplay: need to notify system bios pcie device ready
  drm/amd/powerplay: fix bug that function parameter was incorect.
  drm/amd/powerplay: fix logic error.
  drm/amdgpu: initialize amdgpu_cgs_acpi_eval_object result value
  drm/amdgpu: precedence bug in amdgpu_device_init()
  drm/amdgpu: fix num_rbs exposed to userspace (v2)
  drm/amdgpu: missing bounds check in amdgpu_set_pp_force_state()
2016-06-24 10:51:12 +10:00