The syscall-level code is passed a protection key and need to
return an appropriate error code if the protection key is bogus.
We will be using this in subsequent patches.
Note that this also begins a series of arch-specific calls that
we need to expose in otherwise arch-independent code. We create
a linux/pkeys.h header where we will put *all* the stubs for
these functions.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210232.774EEAAB@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This plumbs a protection key through calc_vm_flag_bits(). We
could have done this in calc_vm_prot_bits(), but I did not feel
super strongly which way to go. It was pretty arbitrary which
one to use.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chen Gang <gang.chen.5i5j@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: David Airlie <airlied@linux.ie>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Geliang Tang <geliangtang@163.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Leon Romanovsky <leon@leon.nu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Riley Andrews <riandrews@android.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: devel@driverdev.osuosl.org
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20160212210231.E6F1F0D6@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This sets the bit in 'cr4' to actually enable the protection
keys feature. We also include a boot-time disable for the
feature "nopku".
Seting X86_CR4_PKE will cause the X86_FEATURE_OSPKE cpuid
bit to appear set. At this point in boot, identify_cpu()
has already run the actual CPUID instructions and populated
the "cpu features" structures. We need to go back and
re-run identify_cpu() to make sure it gets updated values.
We *could* simply re-populate the 11th word of the cpuid
data, but this is probably quick enough.
Also note that with the cpu_has() check and X86_FEATURE_PKU
present in disabled-features.h, we do not need an #ifdef
for setup_pku().
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210229.6708027C@viggo.jf.intel.com
[ Small readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
I don't have a strong opinion on whether we need this or not.
Protection Keys has relatively little code associated with it,
and it is not a heavyweight feature to keep enabled. However,
I can imagine that folks would still appreciate being able to
disable it.
Here's the option if folks want it.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210228.7E79386C@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The protection key can now be just as important as read/write
permissions on a VMA. We need some debug mechanism to help
figure out if it is in play. smaps seems like a logical
place to expose it.
arch/x86/kernel/setup.c is a bit of a weirdo place to put
this code, but it already had seq_file.h and there was not
a much better existing place to put it.
We also use no #ifdef. If protection keys is .config'd out we
will effectively get the same function as if we used the weak
generic function.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Mark Williamson <mwilliamson@undo-software.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210227.4F8EB3F8@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Protection Keys never affect kernel mappings. But, they can
affect whether the kernel will fault when it touches a user
mapping. The kernel doesn't touch user mappings without some
careful choreography and these accesses don't generally result in
oopses. But, if one does, we definitely want to have PKRU
available so we can figure out if protection keys played a role.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210225.BF0D4482@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As discussed earlier, we attempt to enforce protection keys in
software.
However, the code checks all faults to ensure that they are not
violating protection key permissions. It was assumed that all
faults are either write faults where we check PKRU[key].WD (write
disable) or read faults where we check the AD (access disable)
bit.
But, there is a third category of faults for protection keys:
instruction faults. Instruction faults never run afoul of
protection keys because they do not affect instruction fetches.
So, plumb the PF_INSTR bit down in to the
arch_vma_access_permitted() function where we do the protection
key checks.
We also add a new FAULT_FLAG_INSTRUCTION. This is because
handle_mm_fault() is not passed the architecture-specific
error_code where we keep PF_INSTR, so we need to encode the
instruction fetch information in to the arch-generic fault
flags.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210224.96928009@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We might not strictly have to make modifictions to
access_error() to check the VMA here.
If we do not, we will do this:
1. app sets VMA pkey to K
2. app touches a !present page
3. do_page_fault(), allocates and maps page, sets pte.pkey=K
4. return to userspace
5. touch instruction reexecutes, but triggers PF_PK
6. do PKEY signal
What happens with this patch applied:
1. app sets VMA pkey to K
2. app touches a !present page
3. do_page_fault() notices that K is inaccessible
4. do PKEY signal
We basically skip the fault that does an allocation.
So what this lets us do is protect areas from even being
*populated* unless it is accessible according to protection
keys. That seems handy to me and makes protection keys work
more like an mprotect()'d mapping.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210222.EBB63D8C@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We try to enforce protection keys in software the same way that we
do in hardware. (See long example below).
But, we only want to do this when accessing our *own* process's
memory. If GDB set PKRU[6].AD=1 (disable access to PKEY 6), then
tried to PTRACE_POKE a target process which just happened to have
some mprotect_pkey(pkey=6) memory, we do *not* want to deny the
debugger access to that memory. PKRU is fundamentally a
thread-local structure and we do not want to enforce it on access
to _another_ thread's data.
This gets especially tricky when we have workqueues or other
delayed-work mechanisms that might run in a random process's context.
We can check that we only enforce pkeys when operating on our *own* mm,
but delayed work gets performed when a random user context is active.
We might end up with a situation where a delayed-work gup fails when
running randomly under its "own" task but succeeds when running under
another process. We want to avoid that.
To avoid that, we use the new GUP flag: FOLL_REMOTE and add a
fault flag: FAULT_FLAG_REMOTE. They indicate that we are
walking an mm which is not guranteed to be the same as
current->mm and should not be subject to protection key
enforcement.
Thanks to Jerome Glisse for pointing out this scenario.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Dominik Vogt <vogt@linux.vnet.ibm.com>
Cc: Eric B Munson <emunson@akamai.com>
Cc: Geliang Tang <geliangtang@163.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Low <jason.low2@hp.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Shachar Raindel <raindel@mellanox.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Cc: iommu@lists.linux-foundation.org
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Today, for normal faults and page table walks, we check the VMA
and/or PTE to ensure that it is compatible with the action. For
instance, if we get a write fault on a non-writeable VMA, we
SIGSEGV.
We try to do the same thing for protection keys. Basically, we
try to make sure that if a user does this:
mprotect(ptr, size, PROT_NONE);
*ptr = foo;
they see the same effects with protection keys when they do this:
mprotect(ptr, size, PROT_READ|PROT_WRITE);
set_pkey(ptr, size, 4);
wrpkru(0xffffff3f); // access disable pkey 4
*ptr = foo;
The state to do that checking is in the VMA, but we also
sometimes have to do it on the page tables only, like when doing
a get_user_pages_fast() where we have no VMA.
We add two functions and expose them to generic code:
arch_pte_access_permitted(pte_flags, write)
arch_vma_access_permitted(vma, write)
These are, of course, backed up in x86 arch code with checks
against the PTE or VMA's protection key.
But, there are also cases where we do not want to respect
protection keys. When we ptrace(), for instance, we do not want
to apply the tracer's PKRU permissions to the PTEs from the
process being traced.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Dominik Vogt <vogt@linux.vnet.ibm.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Shachar Raindel <raindel@mellanox.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20160212210219.14D5D715@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The current get_user_pages() code is a wee bit more complicated
than it needs to be for pte bit checking. Currently, it establishes
a mask of required pte _PAGE_* bits and ensures that the pte it
goes after has all those bits.
This consolidates the three identical copies of this code.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210218.3A2D4045@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This code matches a fault condition up with the VMA and ensures
that the VMA allows the fault to be handled instead of just
erroring out.
We will be extending this in a moment to comprehend protection
keys.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Eric B Munson <emunson@akamai.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210216.C3824032@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This adds the raw instruction to access PKRU as well as some
accessor functions that correctly handle when the CPU does not
support the instruction. We don't use it here, but we will use
read_pkru() in the next patch.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210215.15238D34@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This fills in the new siginfo field: si_pkey to indicate to
userspace which protection key was set on the PTE that we faulted
on.
Note though that *ALL* protection key faults have to be generated
by a valid, present PTE at some point. But this code does no PTE
lookups which seeds odd. The reason is that we take advantage of
the way we generate PTEs from VMAs. All PTEs under a VMA share
some attributes. For instance, they are _all_ either PROT_READ
*OR* PROT_NONE. They also always share a protection key, so we
never have to walk the page tables; we just use the VMA.
Note that _pkey is a 64-bit value. The current hardware only
supports 4-bit protection keys. We do this because there is
_plenty_ of space in _sigfault and it is possible that future
processors would support more than 4 bits of protection keys.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210213.ABC488FA@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A protection key fault is very similar to any other access error.
There must be a VMA, etc... We even want to take the same action
(SIGSEGV) that we do with a normal access fault.
However, we do need to let userspace know that something is
different. We do this the same way what we did with SEGV_BNDERR
with Memory Protection eXtensions (MPX): define a new SEGV code:
SEGV_PKUERR.
We add a siginfo field: si_pkey that reveals to userspace which
protection key was set on the PTE that we faulted on. There is
no other easy way for userspace to figure this out. They could
parse smaps but that would be a bit cruel.
We share space with in siginfo with _addr_bnd. #BR faults from
MPX are completely separate from page faults (#PF) that trigger
from protection key violations, so we never need both at the same
time.
Note that _pkey is a 64-bit value. The current hardware only
supports 4-bit protection keys. We do this because there is
_plenty_ of space in _sigfault and it is possible that future
processors would support more than 4 bits of protection keys.
The x86 code to actually fill in the siginfo is in the next
patch.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Amanieu d'Antras <amanieu@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210212.3A9B83AC@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
ia64 and mips have separate definitions for siginfo from the
generic one. Patch them to have the pkey fields.
Note that this is exactly what we did for MPX as well.
[ This fixes a compile error that Ingo was hitting with MIPS when the
x86 pkeys patch set is applied. ]
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Malat <oss@malat.biz>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-ia64@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160217181703.E99B6656@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
During a page fault, we look up the VMA to ensure that the fault
is in a region with a valid mapping. But, in the top-level page
fault code we don't need the VMA for much else. Once we have
decided that an access is bad, we are going to send a signal no
matter what and do not need the VMA any more. So we do not pass
it down in to the signal generation code.
But, for protection keys, we need the VMA. It tells us *which*
protection key we violated if we get a PF_PK. So, we need to
pass the VMA down and fill in siginfo->si_pkey.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210211.AD3B36A3@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Lots of things seem to do:
vma->vm_page_prot = vm_get_page_prot(flags);
and the ptes get created right from things we pull out
of ->vm_page_prot. So it is very convenient if we can
store the protection key in flags and vm_page_prot, just
like the existing permission bits (_PAGE_RW/PRESENT). It
greatly reduces the amount of plumbing and arch-specific
hacking we have to do in generic code.
This also takes the new PROT_PKEY{0,1,2,3} flags and
turns *those* in to VM_ flags for vma->vm_flags.
The protection key values are stored in 4 places:
1. "prot" argument to system calls
2. vma->vm_flags, filled from the mmap "prot"
3. vma->vm_page prot, filled from vma->vm_flags
4. the PTE itself.
The pseudocode for these for steps are as follows:
mmap(PROT_PKEY*)
vma->vm_flags = ... | arch_calc_vm_prot_bits(mmap_prot);
vma->vm_page_prot = ... | arch_vm_get_page_prot(vma->vm_flags);
pte = pfn | vma->vm_page_prot
Note that this provides a new definitions for x86:
arch_vm_get_page_prot()
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210210.FE483A42@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
vma->vm_flags is an 'unsigned long', so has space for 32 flags
on 32-bit architectures. The high 32 bits are unused on 64-bit
platforms. We've steered away from using the unused high VMA
bits for things because we would have difficulty supporting it
on 32-bit.
Protection Keys are not available in 32-bit mode, so there is
no concern about supporting this feature in 32-bit mode or on
32-bit CPUs.
This patch carves out 4 bits from the high half of
vma->vm_flags and allows architectures to set config option
to make them available.
Sparse complains about these constants unless we explicitly
call them "UL".
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Valentin Rothberg <valentinrothberg@gmail.com>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210208.81AF00D5@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Note: "PK" is how the Intel SDM refers to this bit, so we also
use that nomenclature.
This only defines the bit, it does not plumb it anywhere to be
handled.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210207.DA7B43E6@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Previous documentation has referred to these 4 bits as "ignored".
That means that software could have made use of them. But, as
far as I know, the kernel never used them.
They are still ignored when protection keys is not enabled, so
they could theoretically still get used for software purposes.
We also implement "empty" versions so that code that references
to them can be optimized away by the compiler when the config
option is not enabled.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210205.81E33ED6@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The protection keys register (PKRU) is saved and restored using
xsave. Define the data structure that we will use to access it
inside the xsave buffer.
Note that we also have to widen the printk of the xsave feature
masks since this is feature 0x200 and we only did two characters
before.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210204.56DF8F7B@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There is a new bit in CR4 for enabling protection keys. We
will actually enable it later in the series.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210202.3CFC3DB2@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There are two CPUID bits for protection keys. One is for whether
the CPU contains the feature, and the other will appear set once
the OS enables protection keys. Specifically:
Bit 04: OSPKE. If 1, OS has set CR4.PKE to enable
Protection keys (and the RDPKRU/WRPKRU instructions)
This is because userspace can not see CR4 contents, but it can
see CPUID contents.
X86_FEATURE_PKU is referred to as "PKU" in the hardware documentation:
CPUID.(EAX=07H,ECX=0H):ECX.PKU [bit 3]
X86_FEATURE_OSPKE is "OSPKU":
CPUID.(EAX=07H,ECX=0H):ECX.OSPKE [bit 4]
These are the first CPU features which need to look at the
ECX word in CPUID leaf 0x7, so this patch also includes
fetching that word in to the cpuinfo->x86_capability[] array.
Add it to the disabled-features mask when its config option is
off. Even though we are not using it here, we also extend the
REQUIRED_MASK_BIT_SET() macro to keep it mirroring the
DISABLED_MASK_BIT_SET() version.
This means that in almost all code, you should use:
cpu_has(c, X86_FEATURE_PKU)
and *not* the CONFIG option.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210201.7714C250@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
I don't have a strong opinion on whether we need a Kconfig prompt
or not. Protection Keys has relatively little code associated
with it, and it is not a heavyweight feature to keep enabled.
However, I can imagine that folks would still appreciate being
able to disable it.
Note that, with disabled-features.h, the checks in the code
for protection keys are always the same:
cpu_has(c, X86_FEATURE_PKU)
With the config option disabled, this essentially turns into an
We will hide the prompt for now.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210200.DB7055E8@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There is an XSAVE state component for Intel Processor Trace (PT).
But, we do not currently use it.
We add a placeholder in the code for it so it is not a mystery and
also so we do not need an explicit enum initialization for Protection
Keys in a moment.
Why don't we use it?
We might end up using this at _some_ point in the future. But,
this is a "system" state which requires using the currently
unsupported XSAVES feature. Unlike all the other XSAVE states,
PT state is also not directly tied to a thread. You might
context-switch between threads, but not want to change any of the
PT state. Or, you might switch between threads, and *do* want to
change PT state, all depending on what is being traced.
We currently just manually set some MSRs to do this PT context
switching, and it is unclear whether replacing our direct MSR use
with XSAVE will be a net win or loss, both in code complexity and
performance.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: fenghua.yu@intel.com
Cc: linux-mm@kvack.org
Cc: yu-cheng.yu@intel.com
Link: http://lkml.kernel.org/r/20160212210158.5E4BCAE2@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We will soon modify the vanilla get_user_pages() so it can no
longer be used on mm/tasks other than 'current/current->mm',
which is by far the most common way it is called. For now,
we allow the old-style calls, but warn when they are used.
(implemented in previous patch)
This patch switches all callers of:
get_user_pages()
get_user_pages_unlocked()
get_user_pages_locked()
to stop passing tsk/mm so they will no longer see the warnings.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: jack@suse.cz
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210156.113E9407@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The concept here was a suggestion from Ingo. The implementation
horrors are all mine.
This allows get_user_pages(), get_user_pages_unlocked(), and
get_user_pages_locked() to be called with or without the
leading tsk/mm arguments. We will give a compile-time warning
about the old style being __deprecated and we will also
WARN_ON() if the non-remote version is used for a remote-style
access.
Doing this, folks will get nice warnings and will not break the
build. This should be nice for -next and will hopefully let
developers fix up their own code instead of maintainers needing
to do it at merge time.
The way we do this is hideous. It uses the __VA_ARGS__ macro
functionality to call different functions based on the number
of arguments passed to the macro.
There's an additional hack to ensure that our EXPORT_SYMBOL()
of the deprecated symbols doesn't trigger a warning.
We should be able to remove this mess as soon as -rc1 hits in
the release after this is merged.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Geliang Tang <geliangtang@163.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Leon Romanovsky <leon@leon.nu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210155.73222EE1@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
For protection keys, we need to understand whether protections
should be enforced in software or not. In general, we enforce
protections when working on our own task, but not when on others.
We call these "current" and "remote" operations.
This patch introduces a new get_user_pages() variant:
get_user_pages_remote()
Which is a replacement for when get_user_pages() is called on
non-current tsk/mm.
We also introduce a new gup flag: FOLL_REMOTE which can be used
for the "__" gup variants to get this new behavior.
The uprobes is_trap_at_addr() location holds mmap_sem and
calls get_user_pages(current->mm) on an instruction address. This
makes it a pretty unique gup caller. Being an instruction access
and also really originating from the kernel (vs. the app), I opted
to consider this a 'remote' access where protection keys will not
be enforced.
Without protection keys, this patch should not change any behavior.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: jack@suse.cz
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210154.3F0E51EA@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When GCC cannot do constant folding for this macro, it falls back to
cpu_has(). But static_cpu_has() is optimal and it works at all times
now. So use it and speedup the fallback case.
Before we had this:
mov 0x99d674(%rip),%rdx # ffffffff81b0d9f4 <boot_cpu_data+0x34>
shr $0x2e,%rdx
and $0x1,%edx
jne ffffffff811704e9 <do_munmap+0x3f9>
After alternatives patching, it turns into:
jmp 0xffffffff81170390
nopl (%rax)
...
callq ffffffff81056e00 <mpx_notify_unmap>
ffffffff81170390: mov 0x170(%r12),%rdi
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1455578358-28347-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Update the mailing list used for development of support for ARM64
Renesas SoCs.
This is a follow-up for a similar change for other Renesas SoCs and
drivers uses by Renesas SoCs. The ARM64 SoC entry was not updated in
that patch as it was not yet present in mainline.
The motivation for the mailing list update is that Renesas SoCs are now
much wider than the SH architecture and there is some desire from some
for the linux-sh list to refocus on discussion of the work on the SH
architecture.
Acked-by: Magnus Damm <damm@opensource.se>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull i915 drm fixes from Dave Airlie:
"Jani sent a bunch of i915 display fixes as my weekend started, but
hopefully you can fit them in"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/i915: fix error path in intel_setup_gmbus()
drm/i915/skl: Fix typo in DPLL_CFGCR1 definition
drm/i915/skl: Don't skip mst encoders in skl_ddi_pll_select()
drm/i915: Pretend cursor is always on for ILK-style WM calculations (v2)
drm/i915/dp: reduce missing TPS3 support errors to debug logging
drm/i915/dp: abstract training pattern selection
drm/i915/dsi: skip gpio element execution when not supported
drm/i915/dsi: don't pass arbitrary data to sideband
drm/i915/dsi: defend gpio table against out of bounds access
drm/i915/bxt: Don't save/restore eDP panel power during suspend (v3)
drm/i915: Allow i915_gem_object_get_page() on userptr as well
i915 display fixes mostly.
* tag 'drm-intel-fixes-2016-02-12' of git://anongit.freedesktop.org/drm-intel:
drm/i915: fix error path in intel_setup_gmbus()
drm/i915/skl: Fix typo in DPLL_CFGCR1 definition
drm/i915/skl: Don't skip mst encoders in skl_ddi_pll_select()
drm/i915: Pretend cursor is always on for ILK-style WM calculations (v2)
drm/i915/dp: reduce missing TPS3 support errors to debug logging
drm/i915/dp: abstract training pattern selection
drm/i915/dsi: skip gpio element execution when not supported
drm/i915/dsi: don't pass arbitrary data to sideband
drm/i915/dsi: defend gpio table against out of bounds access
drm/i915/bxt: Don't save/restore eDP panel power during suspend (v3)
drm/i915: Allow i915_gem_object_get_page() on userptr as well
Here are 3 fixes for some reported issues. Two nvmem driver fixes, and
one mei fix. All have been in linux-next just fine.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlbAzXYACgkQMUfUDdst+yloOQCgo2NfGm5sy05mGSmfBanqF8XP
lnQAoJte8LiHXdhIXb6b7XnqLDdKTExq
=O7sZ
-----END PGP SIGNATURE-----
Merge tag 'char-misc-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are 3 fixes for some reported issues. Two nvmem driver fixes,
and one mei fix. All have been in linux-next just fine"
* tag 'char-misc-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
nvmem: qfprom: Specify LE device endianness
nvmem: core: return error for non word aligned access
mei: validate request value in client notify request ioctl
Here is one driver core, well klist, fix for 4.5-rc4. It fixes a
problem found in the scsi device list traversal that probably also could
be triggered by other subsystems.
The fix has been in linux-next for a while with no reported problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlbAzd8ACgkQMUfUDdst+ymrBQCeKF7+mPHBApEJsedFJhGNNwSb
PnIAoKMa1hCZ4MR86tfdXCylh2Gw/xPt
=OlH/
-----END PGP SIGNATURE-----
Merge tag 'driver-core-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fix from Greg KH:
"Here is one driver core, well klist, fix for 4.5-rc4.
It fixes a problem found in the scsi device list traversal that
probably also could be triggered by other subsystems.
The fix has been in linux-next for a while with no reported problems"
* tag 'driver-core-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
klist: fix starting point removed bug in klist iterators
Here are a number of small tty and serial driver fixes for 4.5-rc4 that
resolve some reported issues.
One of them got reverted as it wasn't correct based on testing, and all
have been in linux-next for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlbAzkEACgkQMUfUDdst+ylE4QCfXW10ziXSblRUIJubEm45Qhn2
WJAAoLFMd/eER2TFkBl4E2Y3I7HUaL5d
=V2Vb
-----END PGP SIGNATURE-----
Merge tag 'tty-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are a number of small tty and serial driver fixes for 4.5-rc4
that resolve some reported issues.
One of them got reverted as it wasn't correct based on testing, and
all have been in linux-next for a while"
* tag 'tty-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
Revert "8250: uniphier: allow modular build with 8250 console"
pty: make sure super_block is still valid in final /dev/tty close
pty: fix possible use after free of tty->driver_data
tty: Add support for PCIe WCH382 2S multi-IO card
serial/omap: mark wait_for_xmitr as __maybe_unused
serial: omap: Prevent DoS using unprivileged ioctl(TIOCSRS485)
8250: uniphier: allow modular build with 8250 console
tty: Drop krefs for interrupted tty lock
Here are a number of USB and PHY driver fixes for 4.5-rc4.
They are the usual gadget and xhci drivers that had reported problems,
as well as a few small phy issues as well. All have been in linux-next
with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlbAzr8ACgkQMUfUDdst+ymO+QCfRyaxt7H2EI4jfiTruIeJoFOi
Tx4An1pJEq7KnS0J5ohlxo5c8a2rnOPY
=j7DO
-----END PGP SIGNATURE-----
Merge tag 'usb-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull PHY fixes from Greg KH:
"Here are a couple of PHY driver fixes for 4.5-rc4.
A few small phy issues. All have been in linux-next with no reported
issues"
* tag 'usb-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
phy: twl4030-usb: Fix unbalanced pm_runtime_enable on module reload
phy: twl4030-usb: Relase usb phy on unload
phy: core: fix wrong err handle for phy_power_on
phy: Restrict phy-hi6220-usb to HiSilicon arm64
Pull perf tooling fixes from Thomas Gleixner:
"Another round of fixes for the perf tooling side:
- Prevent a NULL pointer dereference in tracepoint error handling
- Fix a thread handling bug in the intel_pt error handling code
- Search both .eh_frame and .debug_frame sections as toolchains seem
to have random choices of storing the CFI information
- Fix the perf state interval output values, which got broken when
fixing the overall output"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf stat: Fix interval output values
perf probe: Search both .eh_frame and .debug_frame sections for probe location
perf tools: Fix thread lifetime related segfaut in intel_pt
perf tools: tracepoint_error() can receive e=NULL, robustify it
Pull lockdep fix from Thomas Gleixner:
"A single fix for the stack trace caching logic in lockdep, where the
duplicate avoidance managed to store no back trace at all"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/lockdep: Fix stack trace caching logic
Pull timer fix from Thomas Gleixner:
"A single fix preventing a 32bit overflow in timespec/val to cputime
conversions on 32bit machines"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cputime: Prevent 32bit overflow in time[val|spec]_to_cputime()
Pull irqchip fixes from Thomas Gleixner:
"Another set of ARM SoC related irqchip fixes:
- Plug a memory leak in gicv3-its
- Limit features to the root gic interrupt controller
- Add a missing barrier in the gic-v3 IAR access
- Another compile test fix for sun4i"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3: Make sure read from ICC_IAR1_EL1 is visible on redestributor
irqchip/gic: Only set the EOImodeNS bit for the root controller
irqchip/gic: Only populate set_affinity for the root controller
irqchip/gicv3-its: Fix memory leak in its_free_tables()
irqchip/sun4i: Fix compilation outside of arch/arm
Pull x86 fixes from Thomas Gleixner:
"Two small fixlets for x86:
- Prevent a KASAN false positive in thread_saved_pc()
- Fix a 32-bit truncation problem in the x86 numa code"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm/numa: Fix 32-bit memblock range truncation bug on 32-bit NUMA kernels
x86: Fix KASAN false positives in thread_saved_pc()
Pull MIPS fixes from Ralf Baechle:
"Here's the first round of MIPS fixes after the merge window:
- Detect Octeon III's PCI correctly.
- Fix return value of the MT7620 probing function.
- Wire up the copy_file_range syscall.
- Fix 64k page support on 32 bit kernels.
- Fix the early Coherency Manager probe.
- Allow only hardware-supported page sizes to be selected for R6000.
- Fix corner cases for the RDHWR nstruction emulation on old hardware.
- Fix FPU handling corner cases.
- Remove stale entry for BCM33xx from the MAINTAINERS file.
- 32 and 64 bit ELF headers are different, handle them correctly"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
mips: Differentiate between 32 and 64 bit ELF header
MIPS: Octeon: Update OCTEON_FEATURE_PCIE for Octeon III
MIPS: pci-mt7620: Fix return value check in mt7620_pci_probe()
MIPS: Fix early CM probing
MIPS: Wire up copy_file_range syscall.
MIPS: Fix 64k page support for 32 bit kernels.
MIPS: R6000: Don't allow 64k pages for R6000.
MIPS: traps.c: Correct microMIPS RDHWR emulation
MIPS: traps.c: Don't emulate RDHWR in the CpU #0 exception handler
MAINTAINERS: Remove stale entry for BCM33xx chips
MIPS: Fix FPU disable with preemption
MIPS: Properly disable FPU in start_thread()
MIPS: Fix buffer overflow in syscall_get_arguments()
Pull ARM fixes from Russell King:
"A couple of ARM fixes from Linus for the ICST clock generator code"
[ "Linus" here is Linus Walleij. Name-stealer.
Linus "there can be only one" Torvalds ]
* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: 8519/1: ICST: try other dividends than 1
ARM: 8517/1: ICST: avoid arithmetic overflow in icst_hz()
Pull component helper fixes from Russell King:
"A few fixes for problems people have encountered with the recent
update to the component helpers"
* 'component' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
component: remove device from master match list on failed add
component: Detach components when deleting master struct
component: fix crash on x86_64 with hda audio drivers
So we want to specify the dependency on both @pcid and @addr so that the
compiler doesn't reorder accesses to them *before* the TLB flush. But
for that to work, we need to express this properly in the inline asm and
deref the whole desc array, not the pointer to it. See clwb() for an
example.
This fixes the build error on 32-bit:
arch/x86/include/asm/tlbflush.h: In function ‘__invpcid’:
arch/x86/include/asm/tlbflush.h:26:18: error: memory input 0 is not directly addressable
which gcc4.7 caught but 5.x didn't. Which is strange. :-\
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Michael Matz <matz@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
- One fix to ipoib multicast joins
- One fix to mlx4 error handling
- One fix to mlx5 size computation
- One fix to a thinko in core code
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWvjoAAAoJELgmozMOVy/dxTcP/3ELxMBGStqhdIJhxf1UfnYh
IwdMWpBrvRZo92H+/YHZstnK1kHFZyQLejvdWU24loymzcs94vzswHPMpa0baKd1
PfBTtTKCts5CA4kTdxECiBjHlLYkrcsAv8il1x09RNF3ZzlMxrvupPa9fjulQalP
ULbNsV+DyZr7hGXly8dX6imYQCF0433uxrYSWOY9o98IbTRYmRQhp1R31TDj/KMP
jXdLrMYzs7glmtfIQdfA4Vup7PhPAZZ2tKNejNI3ECCnB/wCQsh86k7hl3ghlmDz
rvrsFheIk114jgHALb+l/FMgNNjTHxSmy98s8Fcuow753oxf940YryUORUIrI/B/
50CnjjtgCWwG6FzRPM41yYxJ2nvfGMTURKrV6LfR0hre5AxMAxb+Eu0sGl3NAe/6
rbsdQjAUsctIpSjbo2xjmazFk3Itda7xTzMYS72ovqRQ/VvdMtH63fdUXVXStxUv
b121ZS3tGYWh6k/4FasRyvvExqgJy3TRHsIEbwC8P+GFAnulhN11pLAc56Eug5Bw
+U057Uzj3lm2iKZUzC/OvCBARufNZXH2P/oAPXqIDJPBnMAia+U76m8sq1N6tO83
ct0kn3aYwZ6q1O69SzOpashnTqJCfkuwMtf+sbsDs0zHavjMzKMWhp9UVnQ73vf9
7mRZc/Uz3+Jrl8c/K1eX
=lzqm
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull more rdma fixes from Doug Ledford:
"I think we are getting pretty close to done now. There are four
one-off fixes in this update:
- fix ipoib multicast joins
- fix mlx4 error handling
- fix mlx5 size computation
- fix a thinko in core code"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
IB/mlx5: Fix RC transport send queue overhead computation
IB/ipoib: fix for rare multicast join race condition
IB/core: Fix reading capability mask of the port info class
net/mlx4: fix some error handling in mlx4_multi_func_init()