There was a reported suspicion about a race between exit_pi_state_list()
and put_pi_state(). The same report mentioned the comment with
put_pi_state() said it should be called with hb->lock held, and it no
longer is in all places.
As it turns out, the pi_state->owner serialization is indeed broken. As per
the new rules:
734009e96d ("futex: Change locking rules")
pi_state->owner should be serialized by pi_state->pi_mutex.wait_lock.
For the sites setting pi_state->owner we already hold wait_lock (where
required) but exit_pi_state_list() and put_pi_state() were not and
raced on clearing it.
Fixes: 734009e96d ("futex: Change locking rules")
Reported-by: Gratian Crisan <gratian.crisan@ni.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: dvhart@infradead.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20170922154806.jd3ffltfk24m4o4y@hirez.programming.kicks-ass.net
kmemdup() is preferred to kmalloc() followed by memcpy().
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
When checking for permission to view keys whilst reading from
/proc/keys, we should use the credentials with which the /proc/keys file
was opened. This is because, in a classic type of exploit, it can be
possible to bypass checks for the *current* credentials by passing the
file descriptor to a suid program.
Following commit 34dbbcdbf6 ("Make file credentials available to the
seqfile interfaces") we can finally fix it. So let's do it.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
In key_user_lookup(), if there is no key_user for the given uid, we drop
key_user_lock, allocate a new key_user, and search the tree again. But
we failed to set 'parent' to NULL at the beginning of the second search.
If the tree were to be empty for the second search, the insertion would
be done with an invalid 'parent', scribbling over freed memory.
Fortunately this can't actually happen currently because the tree always
contains at least the root_key_user. But it still should be fixed to
make the code more robust.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
It was possible for an unprivileged user to create the user and user
session keyrings for another user. For example:
sudo -u '#3000' sh -c 'keyctl add keyring _uid.4000 "" @u
keyctl add keyring _uid_ses.4000 "" @u
sleep 15' &
sleep 1
sudo -u '#4000' keyctl describe @u
sudo -u '#4000' keyctl describe @us
This is problematic because these "fake" keyrings won't have the right
permissions. In particular, the user who created them first will own
them and will have full access to them via the possessor permissions,
which can be used to compromise the security of a user's keys:
-4: alswrv-----v------------ 3000 0 keyring: _uid.4000
-5: alswrv-----v------------ 3000 0 keyring: _uid_ses.4000
Fix it by marking user and user session keyrings with a flag
KEY_FLAG_UID_KEYRING. Then, when searching for a user or user session
keyring by name, skip all keyrings that don't have the flag set.
Fixes: 69664cf16a ("keys: don't generate user and user session keyrings unless they're accessed")
Cc: <stable@vger.kernel.org> [v2.6.26+]
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Userspace can call keyctl_read() on a keyring to get the list of IDs of
keys in the keyring. But if the user-supplied buffer is too small, the
kernel would write the full list anyway --- which will corrupt whatever
userspace memory happened to be past the end of the buffer. Fix it by
only filling the space that is available.
Fixes: b2a4df200d ("KEYS: Expand the capacity of a keyring")
Cc: <stable@vger.kernel.org> [v3.13+]
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
In keyctl_read_key(), if key_permission() were to return an error code
other than EACCES, we would leak a the reference to the key. This can't
actually happen currently because key_permission() can only return an
error code other than EACCES if security_key_permission() does, only
SELinux and Smack implement that hook, and neither can return an error
code other than EACCES. But it should still be fixed, as it is a bug
waiting to happen.
Fixes: 29db919063 ("[PATCH] Keys: Add LSM hooks for key management [try #3]")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
In keyctl_assume_authority(), if keyctl_change_reqkey_auth() were to
fail, we would leak the reference to the 'authkey'. Currently this can
only happen if prepare_creds() fails to allocate memory. But it still
should be fixed, as it is a more severe bug waiting to happen.
This patch also moves the read of 'authkey->serial' to before the
reference to the authkey is dropped. Doing the read after dropping the
reference is very fragile because it assumes we still hold another
reference to the key. (Which we do, in current->cred->request_key_auth,
but there's no reason not to write it in the "obviously correct" way.)
Fixes: d84f4f992c ("CRED: Inaugurate COW credentials")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
If key_instantiate_and_link() were to fail (which fortunately isn't
possible currently), the call to key_revoke(authkey) would crash with a
NULL pointer dereference in request_key_auth_revoke() because the key
has not yet been instantiated.
Fix this by removing the call to key_revoke(). key_put() is sufficient,
as it's not possible for an uninstantiated authkey to have been used for
anything yet.
Fixes: b5f545c880 ("[PATCH] keys: Permit running process to instantiate keys")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
In request_key_auth_new(), if key_alloc() or key_instantiate_and_link()
were to fail, we would leak a reference to the 'struct cred'. Currently
this can only happen if key_alloc() fails to allocate memory. But it
still should be fixed, as it is a more severe bug waiting to happen.
Fix it by cleaning things up to use a helper function which frees a
'struct request_key_auth' correctly.
Fixes: d84f4f992c ("CRED: Inaugurate COW credentials")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Yet another fix for probing the max attr.precise_ip setting: it is not
enough settting attr.exclude_kernel for !root users, as they _can_
profile the kernel if the kernel.perf_event_paranoid sysctl is set to
-1, so check that as well.
Testing it:
As non root:
$ sysctl kernel.perf_event_paranoid
kernel.perf_event_paranoid = 2
$ perf record sleep 1
$ perf evlist -v
cycles:uppp: ..., exclude_kernel: 1, ... precise_ip: 3, ...
Now as non-root, but with kernel.perf_event_paranoid set set to the
most permissive value, -1:
$ sysctl kernel.perf_event_paranoid
kernel.perf_event_paranoid = -1
$ perf record sleep 1
$ perf evlist -v
cycles:ppp: ..., exclude_kernel: 0, ... precise_ip: 3, ...
$
I.e. non-root, default kernel.perf_event_paranoid: :uppp modifier = not allowed to sample the kernel,
non-root, most permissible kernel.perf_event_paranoid: :ppp = allowed to sample the kernel.
In both cases, use the highest available precision: attr.precise_ip = 3.
Reported-and-Tested-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: d37a369790 ("perf evsel: Fix attr.exclude_kernel setting for default cycles:p")
Link: http://lkml.kernel.org/n/tip-nj2qkf75xsd6pw6hhjzfqqdx@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Time for a sync with ABI/uapi headers with the upcoming v4.14 kernel.
None of the ABI changes require any source code level changes to our
existing in-kernel tooling code:
- tools/arch/s390/include/uapi/asm/kvm.h:
New KVM_S390_VM_TOD_EXT ABI, not used by in-kernel tooling.
- tools/arch/x86/include/asm/cpufeatures.h:
tools/arch/x86/include/asm/disabled-features.h:
New PCID, SME and VGIF x86 CPU feature bits defined.
- tools/include/asm-generic/hugetlb_encode.h:
tools/include/uapi/asm-generic/mman-common.h:
tools/include/uapi/linux/mman.h:
Two new madvise() flags, plus a hugetlb system call mmap flags
restructuring/extension changes.
- tools/include/uapi/drm/drm.h:
tools/include/uapi/drm/i915_drm.h:
New drm_syncobj_create flags definitions, new drm_syncobj_wait
and drm_syncobj_array ABIs. DRM_I915_PERF_* calls and a new
I915_PARAM_HAS_EXEC_FENCE_ARRAY ABI for the Intel driver.
- tools/include/uapi/linux/bpf.h:
New bpf_sock fields (::mark and ::priority), new XDP_REDIRECT
action, new kvm_ppc_smmu_info fields (::data_keys, instr_keys)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Taeung Song <treeze.taeung@gmail.com>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20170913073823.lxmi4c7ejqlfabjx@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Now that I'm switching the container builds from using a local volume
pointing to the kernel repository with the perf sources, instead getting
a detached tarball to be able to use a container cluster, some places
broke because I forgot to put some of the required files in
tools/perf/MANIFEST, namely some bitsperlong.h files.
So, to fix it do the same as for tools/build/ and pack the whole
tools/arch/ directory.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-wmenpjfjsobwdnfde30qqncj@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fix the default for microblaze. Michal Simek mentioned default for
microblaze should be CPU_LITTLE_ENDIAN.
Fixes : commit 206d3642d8 ("arch/microblaze: add choice for endianness
and update Makefile")
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Cc: Michal Simek <monstr@monstr.eu>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Use vma_pages function on vma object instead of explicit computation.
Found by coccinelle spatch "api/vma_pages.cocci"
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Note this includes fixes from recent merge window. As such the tree
is based on top of a prior staging/staging-next tree.
* iio core
- return and error for a failed read_reg debugfs call rather than
eating the error.
* ad7192
- Use the dedicated reset function in the ad_sigma_delta library
instead of an spi transfer with the data on the stack which
could cause problems with DMA.
* ad7793
- Implement a dedicate reset function in the ad_sigma_delta library
and use it to correctly reset this part.
* bme280
- ctrl_reg write must occur after any register writes
for updates to take effect.
* mcp320x
- negative voltage readout was broken.
- Fix an oops on module unload due to spi_set_drvdata not being called
in probe.
* st_magn
- Fix the data ready line configuration for the lis3mdl. It is not
configurable so the st_magn core was assuming it didn't exist
and so wasn't consuming interrupts resulting in an unhandled
interrupt.
* stm32-adc
- off by one error on max channels checking.
* stm32-timer
- preset should not be buffered - reorganising register writes avoids
this.
- fix a corner case in which write preset goes wrong when a timer is
used first as a trigger then as a counter with preset. Odd case but
you never know.
* ti-ads1015
- Fix setting of comparator polarity by fixing bitfield definition.
* twl4030
- Error path handling fix to cleanup in event of regulator
registration failure.
- Disable the vusb3v1 regulator correctly in error handling
- Don't paper over a regulator enable failure.
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEEbilms4eEBlKRJoGxVIU0mcT0FogFAlnH2PkRHGppYzIzQGtl
cm5lbC5vcmcACgkQVIU0mcT0FoiNFw//XKT/CRq7R8Dg5vy4FT7NRw/SpSVH9u8M
JsIISyAoh0jfOFJFkVv2Ll+Zs+SkbXJOPMUreaFJMnftSCkBdteT9D5kz5cYiUkO
sFn58YEEqKCqQVJBEi0yM00HMxbpNKiNHXzo9WC8ZiUVRJqSigq1coznSqaZngRd
yf4a9Z33jiH/NXalqy5Vb4CSn76UWpvvYJ11tuNxXqOldkyU4sm8bvsPWnBg1qG3
YIuVZs83Rv1HW0t6ktYf3WwC2rBhKEM8st2fUCLkU+4+foDuqQBiFdWJzTcxe6Ru
8za+q3ngOo7rRkLS8cSTsh8hbfKwIV0e3rQvsN3zqodqjPilw9uH3eybMYzLK/0I
j9o12jaD1Ow7zQfiQnNnyJqDLddEt+f6GcrI/iYQPlTIQIhqDii4YInDyCrF4xbr
i/8saOr5n4PsdHRovTUl2rrgiX+Jix6OmNLEtUdp1HWrc0XwbZ5L/1g6pXFL98QZ
Kq8MwjNu+6x+66Hga/1fll19hZ/VMYGqxFtoP+6S28TS1bO9dXm6XNl5/U5Nh5mN
J255UFCdFkBmJO9nP7tmbDscMCAbhc6AOHKxjUebuoNnyhyV+20W9S/gpzTIilnO
Gv9D0z29zITc8Hpfhl2YKC6F00BuRAdPFJo05rSE/KoWX3bkPCiXFTfQ+aPwB72h
bYT4oJeUoQg=
=CcjQ
-----END PGP SIGNATURE-----
Merge tag 'iio-fixes-for-4.14a' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus
Jonathan writes:
First round of IIO fixes for the 4.14 cycle
Note this includes fixes from recent merge window. As such the tree
is based on top of a prior staging/staging-next tree.
* iio core
- return and error for a failed read_reg debugfs call rather than
eating the error.
* ad7192
- Use the dedicated reset function in the ad_sigma_delta library
instead of an spi transfer with the data on the stack which
could cause problems with DMA.
* ad7793
- Implement a dedicate reset function in the ad_sigma_delta library
and use it to correctly reset this part.
* bme280
- ctrl_reg write must occur after any register writes
for updates to take effect.
* mcp320x
- negative voltage readout was broken.
- Fix an oops on module unload due to spi_set_drvdata not being called
in probe.
* st_magn
- Fix the data ready line configuration for the lis3mdl. It is not
configurable so the st_magn core was assuming it didn't exist
and so wasn't consuming interrupts resulting in an unhandled
interrupt.
* stm32-adc
- off by one error on max channels checking.
* stm32-timer
- preset should not be buffered - reorganising register writes avoids
this.
- fix a corner case in which write preset goes wrong when a timer is
used first as a trigger then as a counter with preset. Odd case but
you never know.
* ti-ads1015
- Fix setting of comparator polarity by fixing bitfield definition.
* twl4030
- Error path handling fix to cleanup in event of regulator
registration failure.
- Disable the vusb3v1 regulator correctly in error handling
- Don't paper over a regulator enable failure.
The driver will forward errors to userspace after turning most of them
into -EIO. But all status codes are not equal. The -EPIPE (stall) in
particular can be seen more as a result of normal USB signaling than
an actual error. The state is automatically cleared by the USB core
without intervention from either driver or userspace.
And most devices and firmwares will never trigger a stall as a result
of GetEncapsulatedResponse. This is in fact a requirement for CDC WDM
devices. Quoting from section 7.1 of the CDC WMC spec revision 1.1:
The function shall not return STALL in response to
GetEncapsulatedResponse.
But this driver is also handling GetEncapsulatedResponse on behalf of
the qmi_wwan and cdc_mbim drivers. Unfortunately the relevant specs
are not as clear wrt stall. So some QMI and MBIM devices *will*
occasionally stall, causing the GetEncapsulatedResponse to return an
-EPIPE status. Translating this into -EIO for userspace has proven to
be harmful. Treating it as an empty read is safer, making the driver
behave as if the device was conforming to the CDC WDM spec.
There have been numerous reports of issues related to -EPIPE errors
from some newer CDC MBIM devices in particular, like for example the
Fibocom L831-EAU. Testing on this device has shown that the issues
go away if we simply ignore the -EPIPE status. Similar handling of
-EPIPE is already known from e.g. usb_get_string()
The -EPIPE log message is still kept to let us track devices with this
unexpected behaviour, hoping that it attracts attention from firmware
developers.
Cc: <stable@vger.kernel.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100938
Reported-and-tested-by: Christian Ehrig <christian.ehrig@mediamarktsaturn-bt.com>
Reported-and-tested-by: Patrick Chilton <chpatrick@gmail.com>
Reported-and-tested-by: Andreas Böhler <news@aboehler.at>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The user buffer has "uurb->buffer_length" bytes. If the kernel has more
information than that, we should truncate it instead of writing past
the end of the user's buffer. I added a WARN_ONCE() to help the user
debug the issue.
Reported-by: Alan Stern <stern@rowland.harvard.edu>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There used to be an integer overflow check in proc_do_submiturb() but
we removed it. It turns out that it's still required. The
uurb->buffer_length variable is a signed integer and it's controlled by
the user. It can lead to an integer overflow when we do:
num_sgs = DIV_ROUND_UP(uurb->buffer_length, USB_SG_SIZE);
If we strip away the macro then that line looks like this:
num_sgs = (uurb->buffer_length + USB_SG_SIZE - 1) / USB_SG_SIZE;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
It's the first addition which can overflow.
Fixes: 1129d270cb ("USB: Increase usbfs transfer limit")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
An off-by-one error in loop terminantion conditions in
create_setup_data_nodes() will lead to memory leak when
create_setup_data_node() failed.
Signed-off-by: Sean Fu <fxinrong@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1505090001-1157-1-git-send-email-fxinrong@gmail.com
commit 7b2d0dbac4 ("x86/mm/pkeys: Pass VMA down in to fault signal
generation code") passes down a vma pointer to the error path, but that is
done once the mmap_sem is released when calling mm_fault_error() from
__do_page_fault().
This is dangerous as the vma structure is no more safe to be used once the
mmap_sem has been released. As only the protection key value is required in
the error processing, we could just pass down this value.
Fix it by passing a pointer to a protection key value down to the fault
signal generation code. The use of a pointer allows to keep the check
generating a warning message in fill_sig_info_pkey() when the vma was not
known. If the pointer is valid, the protection value can be accessed by
deferencing the pointer.
[ tglx: Made *pkey u32 as that's the type which is passed in siginfo ]
Fixes: 7b2d0dbac4 ("x86/mm/pkeys: Pass VMA down in to fault signal generation code")
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1504513935-12742-1-git-send-email-ldufour@linux.vnet.ibm.com
Userspace can change the FPU state of a task using the ptrace() or
rt_sigreturn() system calls. Because reserved bits in the FPU state can
cause the XRSTOR instruction to fail, the kernel has to carefully
validate that no reserved bits or other invalid values are being set.
Unfortunately, there have been bugs in this validation code. For
example, we were not checking that the 'xcomp_bv' field in the
xstate_header was 0. As-is, such bugs are exploitable to read the FPU
registers of other processes on the system. To do so, an attacker can
create a task, assign to it an invalid FPU state, then spin in a loop
and monitor the values of the FPU registers. Because the task's FPU
registers are not being restored, sometimes the FPU registers will have
the values from another process.
This is likely to continue to be a problem in the future because the
validation done by the CPU instructions like XRSTOR is not immediately
visible to kernel developers. Nor will invalid FPU states ever be
encountered during ordinary use --- they will only be seen during
fuzzing or exploits. There can even be reserved bits outside the
xstate_header which are easy to forget about. For example, the MXCSR
register contains reserved bits, which were not validated by the
KVM_SET_XSAVE ioctl until commit a575813bfe ("KVM: x86: Fix load
damaged SSEx MXCSR register").
Therefore, mitigate this class of vulnerability by restoring the FPU
registers from init_fpstate if restoring from the task's state fails.
We actually used to do this, but it was (perhaps unwisely) removed by
commit 9ccc27a5d2 ("x86/fpu: Remove error return values from
copy_kernel_to_*regs() functions"). This new patch is also a bit
different. First, it only clears the registers, not also the bad
in-memory state; this is simpler and makes it easier to make the
mitigation cover all callers of __copy_kernel_to_fpregs(). Second, it
does the register clearing in an exception handler so that no extra
instructions are added to context switches. In fact, we *remove*
instructions, since previously we were always zeroing the register
containing 'err' even if CONFIG_X86_DEBUG_FPU was disabled.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Kevin Hao <haokexin@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Halcrow <mhalcrow@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: kernel-hardening@lists.openwall.com
Link: http://lkml.kernel.org/r/20170922174156.16780-4-ebiggers3@gmail.com
Link: http://lkml.kernel.org/r/20170923130016.21448-27-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On x86, userspace can use the ptrace() or rt_sigreturn() system calls to
set a task's extended state (xstate) or "FPU" registers. ptrace() can
set them for another task using the PTRACE_SETREGSET request with
NT_X86_XSTATE, while rt_sigreturn() can set them for the current task.
In either case, registers can be set to any value, but the kernel
assumes that the XSAVE area itself remains valid in the sense that the
CPU can restore it.
However, in the case where the kernel is using the uncompacted xstate
format (which it does whenever the XSAVES instruction is unavailable),
it was possible for userspace to set the xcomp_bv field in the
xstate_header to an arbitrary value. However, all bits in that field
are reserved in the uncompacted case, so when switching to a task with
nonzero xcomp_bv, the XRSTOR instruction failed with a #GP fault. This
caused the WARN_ON_FPU(err) in copy_kernel_to_xregs() to be hit. In
addition, since the error is otherwise ignored, the FPU registers from
the task previously executing on the CPU were leaked.
Fix the bug by checking that the user-supplied value of xcomp_bv is 0 in
the uncompacted case, and returning an error otherwise.
The reason for validating xcomp_bv rather than simply overwriting it
with 0 is that we want userspace to see an error if it (incorrectly)
provides an XSAVE area in compacted format rather than in uncompacted
format.
Note that as before, in case of error we clear the task's FPU state.
This is perhaps non-ideal, especially for PTRACE_SETREGSET; it might be
better to return an error before changing anything. But it seems the
"clear on error" behavior is fine for now, and it's a little tricky to
do otherwise because it would mean we couldn't simply copy the full
userspace state into kernel memory in one __copy_from_user().
This bug was found by syzkaller, which hit the above-mentioned
WARN_ON_FPU():
WARNING: CPU: 1 PID: 0 at ./arch/x86/include/asm/fpu/internal.h:373 __switch_to+0x5b5/0x5d0
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.13.0 #453
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff9ba2bc8e42c0 task.stack: ffffa78cc036c000
RIP: 0010:__switch_to+0x5b5/0x5d0
RSP: 0000:ffffa78cc08bbb88 EFLAGS: 00010082
RAX: 00000000fffffffe RBX: ffff9ba2b8bf2180 RCX: 00000000c0000100
RDX: 00000000ffffffff RSI: 000000005cb10700 RDI: ffff9ba2b8bf36c0
RBP: ffffa78cc08bbbd0 R08: 00000000929fdf46 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff9ba2bc8e42c0
R13: 0000000000000000 R14: ffff9ba2b8bf3680 R15: ffff9ba2bf5d7b40
FS: 00007f7e5cb10700(0000) GS:ffff9ba2bf400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004005cc CR3: 0000000079fd5000 CR4: 00000000001406e0
Call Trace:
Code: 84 00 00 00 00 00 e9 11 fd ff ff 0f ff 66 0f 1f 84 00 00 00 00 00 e9 e7 fa ff ff 0f ff 66 0f 1f 84 00 00 00 00 00 e9 c2 fa ff ff <0f> ff 66 0f 1f 84 00 00 00 00 00 e9 d4 fc ff ff 66 66 2e 0f 1f
Here is a C reproducer. The expected behavior is that the program spin
forever with no output. However, on a buggy kernel running on a
processor with the "xsave" feature but without the "xsaves" feature
(e.g. Sandy Bridge through Broadwell for Intel), within a second or two
the program reports that the xmm registers were corrupted, i.e. were not
restored correctly. With CONFIG_X86_DEBUG_FPU=y it also hits the above
kernel warning.
#define _GNU_SOURCE
#include <stdbool.h>
#include <inttypes.h>
#include <linux/elf.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>
int main(void)
{
int pid = fork();
uint64_t xstate[512];
struct iovec iov = { .iov_base = xstate, .iov_len = sizeof(xstate) };
if (pid == 0) {
bool tracee = true;
for (int i = 0; i < sysconf(_SC_NPROCESSORS_ONLN) && tracee; i++)
tracee = (fork() != 0);
uint32_t xmm0[4] = { [0 ... 3] = tracee ? 0x00000000 : 0xDEADBEEF };
asm volatile(" movdqu %0, %%xmm0\n"
" mov %0, %%rbx\n"
"1: movdqu %%xmm0, %0\n"
" mov %0, %%rax\n"
" cmp %%rax, %%rbx\n"
" je 1b\n"
: "+m" (xmm0) : : "rax", "rbx", "xmm0");
printf("BUG: xmm registers corrupted! tracee=%d, xmm0=%08X%08X%08X%08X\n",
tracee, xmm0[0], xmm0[1], xmm0[2], xmm0[3]);
} else {
usleep(100000);
ptrace(PTRACE_ATTACH, pid, 0, 0);
wait(NULL);
ptrace(PTRACE_GETREGSET, pid, NT_X86_XSTATE, &iov);
xstate[65] = -1;
ptrace(PTRACE_SETREGSET, pid, NT_X86_XSTATE, &iov);
ptrace(PTRACE_CONT, pid, 0, 0);
wait(NULL);
}
return 1;
}
Note: the program only tests for the bug using the ptrace() system call.
The bug can also be reproduced using the rt_sigreturn() system call, but
only when called from a 32-bit program, since for 64-bit programs the
kernel restores the FPU state from the signal frame by doing XRSTOR
directly from userspace memory (with proper error checking).
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org> [v3.17+]
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Kevin Hao <haokexin@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Halcrow <mhalcrow@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: kernel-hardening@lists.openwall.com
Fixes: 0b29643a58 ("x86/xsaves: Change compacted format xsave area header")
Link: http://lkml.kernel.org/r/20170922174156.16780-2-ebiggers3@gmail.com
Link: http://lkml.kernel.org/r/20170923130016.21448-25-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 fixes from Ingo Molnar:
"Another round of CR3/PCID related fixes (I think this addresses all
but one of the known problems with PCID support), an objtool fix plus
a Clang fix that (finally) solves all Clang quirks to build a bootable
x86 kernel as-is"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm: Fix inline asm call constraints for Clang
objtool: Handle another GCC stack pointer adjustment bug
x86/mm/32: Load a sane CR3 before cpu_init() on secondary CPUs
x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlier
x86/mm/64: Stop using CR3.PCID == 0 in ASID-aware code
x86/mm: Factor out CR3-building code
Pull irq fixes from Ingo Molnar:
"Three irqchip driver fixes, and an affinity mask helper function bug
fix affecting x86"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "genirq: Restrict effective affinity to interrupts actually using it"
irqchip.mips-gic: Fix shared interrupt mask writes
irqchip/gic-v4: Fix building with ancient gcc
irqchip/gic-v3: Iterate over possible CPUs by for_each_possible_cpu()
Pull address-limit checking fixes from Ingo Molnar:
"This fixes a number of bugs in the address-limit (USER_DS) checks that
got introduced in the merge window, (mostly) affecting the ARM and
ARM64 platforms"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
arm64/syscalls: Move address limit check in loop
arm/syscalls: Optimize address limit check
Revert "arm/syscalls: Check address limit on user-mode return"
syscalls: Use CHECK_DATA_CORRUPTION for addr_limit_user_check
Pull misc security layer update from James Morris:
"This is the remaining 'general' change in the security tree for v4.14,
following the direct merging of SELinux (+ TOMOYO), AppArmor, and
seccomp.
That's everything now for the security tree except IMA, which will
follow shortly (I've been traveling for the past week with patchy
internet)"
* 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
security: fix description of values returned by cap_inode_need_killpriv
Pull TPM updates from James Morris:
"Here are the TPM updates from Jarkko for v4.14, which I've placed in
their own branch (next-tpm). I ended up cherry-picking them as other
changes had been made in Jarkko's branch after he sent me his original
pull request.
I plan on maintaining a separate branch for TPM (and other security
subsystems) from now on.
From Jarkko: 'Not much this time except a few fixes'"
* 'next-tpm' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
tpm: ibmvtpm: simplify crq initialization and document crq format
tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers
Documentation: tpm: add powered-while-suspended binding documentation
tpm: tpm_crb: constify acpi_device_id.
tpm: vtpm: constify vio_device_id
Depends on: 691c4b95d1 ("iio: ad_sigma_delta: Implement a dedicated reset function")
SPI host drivers can use DMA to transfer data, so the buffer should be properly allocated.
Keeping it on the stack could cause an undefined behavior.
The dedicated reset function solves this issue.
Signed-off-by: Stefan Popa <stefan.popa@analog.com>
Acked-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Michael Hennerich <michael.hennerich@analog.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
If an IIO device returns an error code for a read access via debugfs, it
is currently ignored by the IIO core (other than emitting an error
message). Instead, return this error code to user space, so upper layers
can detect it correctly.
Signed-off-by: Matt Fornero <matt.fornero@mathworks.com>
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
The serial interface can be reset by writing 32 consecutive 1s to the device.
'ret' was initialized correctly but its value was overwritten when
ad7793_check_platform_data() was called. Since a dedicated reset function
is present now, it should be used instead.
Fixes: 2edb769d24 ("iio:ad7793: Add support for the ad7798 and ad7799")
Signed-off-by: Dragos Bogdan <dragos.bogdan@analog.com>
Acked-by: Lars-Peter Clausen <lars@metafoo.de>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Since most of the SD ADCs have the option of reseting the serial
interface by sending a number of SCLKs with CS = 0 and DIN = 1,
a dedicated function that can do this is usefull.
Needed for the patch: iio: ad7793: Fix the serial interface reset
Signed-off-by: Dragos Bogdan <dragos.bogdan@analog.com>
Acked-by: Lars-Peter Clausen <lars@metafoo.de>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
The ctrl_reg register needs to be written after any write to
the humidity registers. The value written to the ctrl_reg register
does not necessarily need to change, but a write operation must
occur.
The regmap_update_bits functions will not write to a register
if the register value matches the value to be written. This saves
unnecessary bus operations. The change in this patch forces a bus
write during the chip_config operation by switching to
regmap_write_bits.
This will fix issues where the Humidity Sensor Oversampling bits
are not updated after initialization.
Signed-off-by: Colin Parker <colin.parker@aclima.io>
Acked-by: Andreas Klinger <ak@it-klinger.de>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Commit f686a36b4b ("iio: adc: mcp320x: Add support for mcp3301")
returns a signed voltage from mcp320x_adc_conversion() but neglects that
the caller interprets a negative return value as failure. Only mcp3301
(and the upcoming mcp3550/1/3) is affected as the other chips are
incapable of measuring negative voltages.
Fix and while at it, add mcp3301 to the list of supported chips at the
top of the file.
Fixes: f686a36b4b ("iio: adc: mcp320x: Add support for mcp3301")
Cc: Andrea Galbusera <gizero@gmail.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
The driver calls spi_get_drvdata() in its ->remove hook even though it
has never called spi_set_drvdata(). Stack trace for posterity:
Unable to handle kernel NULL pointer dereference at virtual address 00000220
Internal error: Oops: 5 [#1] SMP ARM
[<8072f564>] (mutex_lock) from [<7f1400d0>] (iio_device_unregister+0x24/0x7c [industrialio])
[<7f1400d0>] (iio_device_unregister [industrialio]) from [<7f15e020>] (mcp320x_remove+0x20/0x30 [mcp320x])
[<7f15e020>] (mcp320x_remove [mcp320x]) from [<8055a8cc>] (spi_drv_remove+0x2c/0x44)
[<8055a8cc>] (spi_drv_remove) from [<805087bc>] (__device_release_driver+0x98/0x134)
[<805087bc>] (__device_release_driver) from [<80509180>] (driver_detach+0xdc/0xe0)
[<80509180>] (driver_detach) from [<8050823c>] (bus_remove_driver+0x5c/0xb0)
[<8050823c>] (bus_remove_driver) from [<80509ab0>] (driver_unregister+0x38/0x58)
[<80509ab0>] (driver_unregister) from [<7f15e69c>] (mcp320x_driver_exit+0x14/0x1c [mcp320x])
[<7f15e69c>] (mcp320x_driver_exit [mcp320x]) from [<801a78d0>] (SyS_delete_module+0x184/0x1d0)
[<801a78d0>] (SyS_delete_module) from [<80108100>] (ret_fast_syscall+0x0/0x1c)
Fixes: f5ce4a7a92 ("iio: adc: add driver for MCP3204/08 12-bit ADC")
Cc: Oskar Andero <oskar.andero@gmail.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Fix a bad error check when counting 'st,adc-channels' array elements.
This is seen when all channels are in use simultaneously.
Fixes: 64ad7f643 ("iio: adc: stm32: introduce compatible data cfg")
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Balance timer start routine that sets ARPE: clear it in stop routine.
This fixes a corner case, when timer is used successively as trigger
(with sampling_frequency start/stop routines), then as a counter
(with preset).
Fixes: 93fbe91b55 ("iio: Add STM32 timer trigger driver")
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Currently, setting preset value (ARR) will update directly 'Auto reload
value' only on 1st write access. But then, ARPE is set. This makes
ARR a shadow register. Preset value should be updated upon each
write request: ensure ARPE is 0. This fixes successive writes to
preset attribute.
Fixes: 4adec7da05 ("iio: stm32 trigger: Add quadrature encoder device")
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>