Commit f00cdc6df7 ("shmem: fix faulting into a hole while it's
punched") was buggy: Sasha sent a lockdep report to remind us that
grabbing i_mutex in the fault path is a no-no (write syscall may already
hold i_mutex while faulting user buffer).
We tried a completely different approach (see following patch) but that
proved inadequate: good enough for a rational workload, but not good
enough against trinity - which forks off so many mappings of the object
that contention on i_mmap_mutex while hole-puncher holds i_mutex builds
into serious starvation when concurrent faults force the puncher to fall
back to single-page unmap_mapping_range() searches of the i_mmap tree.
So return to the original umbrella approach, but keep away from i_mutex
this time. We really don't want to bloat every shmem inode with a new
mutex or completion, just to protect this unlikely case from trinity.
So extend the original with wait_queue_head on stack at the hole-punch
end, and wait_queue item on the stack at the fault end.
This involves further use of i_lock to guard against the races: lockdep
has been happy so far, and I see fs/inode.c:unlock_new_inode() holds
i_lock around wake_up_bit(), which is comparable to what we do here.
i_lock is more convenient, but we could switch to shmem's info->lock.
This issue has been tagged with CVE-2014-4171, which will require commit
f00cdc6df7 and this and the following patch to be backported: we
suggest to 3.1+, though in fact the trinity forkbomb effect might go
back as far as 2.6.16, when madvise(,,MADV_REMOVE) came in - or might
not, since much has changed, with i_mmap_mutex a spinlock before 3.0.
Anyone running trinity on 3.0 and earlier? I don't think we need care.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: <stable@vger.kernel.org> [3.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Korb reported that "repeated mapping of the same file on tmpfs
using remap_file_pages sometimes triggers a BUG at mm/filemap.c:202 when
the process exits".
He bisected the bug to d7c1755179 ("mm: implement ->map_pages for
shmem/tmpfs"), although the bug was actually added by commit
8c6e50b029 ("mm: introduce vm_ops->map_pages()").
The problem is caused by calling do_fault_around for a _non-linear_
fault. In this case pgoff is shifted and might become negative during
calculation.
Faulting around non-linear page-fault makes no sense and breaks the
logic in do_fault_around because pgoff is shifted.
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Reported-by: Ingo Korb <ingo.korb@tu-dortmund.de>
Tested-by: Ingo Korb <ingo.korb@tu-dortmund.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Ning Qu <quning@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org> [3.15.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When compiling a SH2A kernel (e.g. se7206_defconfig or rsk7203_defconfig)
using sh4-linux-gcc, linking fails with:
net/built-in.o: In function `__sk_run_filter':
net/core/filter.c:566: undefined reference to `__fpscr_values'
net/core/filter.c:269: undefined reference to `__fpscr_values'
...
net/built-in.o:net/core/filter.c:580: more undefined references to `__fpscr_values' follow
This happens because sh4-linux-gcc doesn't support the "-m2a-nofpu",
which is thus filtered out by "$(call cc-option, ...)".
As compiling using sh4-linux-gcc is useful for compile coverage, also
try passing "-m4-nofpu" (which is presumably filtered out when using a
real sh2a-linux toolchain) to disable the generation of FPU instructions
and references to __fpscr_values[].
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Tony Breeds <tony@bakeyournoodle.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Magnus Damm <magnus.damm@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sasha reported lockdep warning [1] introduced by [2].
It could be fixed by doing disk revalidation out of the init_lock. It's
okay because disk capacity change is protected by init_lock so that
revalidate_disk always sees up-to-date value so there is no race.
[1] https://lkml.org/lkml/2014/7/3/735
[2] zram: revalidate disk after capacity change
Fixes 2e32baea46 ("zram: revalidate disk after capacity change").
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: "Alexander E. Patrakov" <patrakov@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I triggered VM_BUG_ON() in vma_address() when I tried to migrate an
anonymous hugepage with mbind() in the kernel v3.16-rc3. This is
because pgoff's calculation in rmap_walk_anon() fails to consider
compound_order() only to have an incorrect value.
This patch introduces page_to_pgoff(), which gets the page's offset in
PAGE_CACHE_SIZE.
Kirill pointed out that page cache tree should natively handle
hugepages, and in order to make hugetlbfs fit it, page->index of
hugetlbfs page should be in PAGE_CACHE_SIZE. This is beyond this patch,
but page_to_pgoff() contains the point to be fixed in a single function.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 079148b919 ("coredump: factor out the setting of PF_DUMPCORE")
cleaned up the setting of PF_DUMPCORE by removing it from all the
linux_binfmt->core_dump() and moving it to zap_threads().But this ended
up clearing all the previously set flags. This causes issues during
core generation when tsk->flags is checked again (eg. for PF_USED_MATH
to dump floating point registers). Fix this.
Signed-off-by: Silesh C V <svellattu@mvista.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Mandeep Singh Baines <msb@chromium.org>
Cc: <stable@vger.kernel.org> [3.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The ACPI device enumeration code in Linux assumes that buttons always
are wakeup devices, so it calls acpi_setup_gpe_for_wake() for them
which leads to undesirable side effects. Namely, that function sets
up implicit device wake notification mechanism for a given GPE if
there is no handler method in the ACPI namespace, which from the
ACPICA's perspective means that there always is a way to handle
that GPE if enabled. However, we don't handle wake notify events
for buttons, so if there are no handler methods for their GPEs in
the namespace, enabling a button GPE at run time leads to a GPE
storm in some cases (the GPE triggers, ACPICA carries out the
implicit wake notification for it which isn't handled, so the
GPE triggers again and so on).
To prevent that from happening use acpi_mark_gpe_for_wake()
instead of acpi_setup_gpe_for_wake() for buttons which will cause
ACPICA to only enable button GPEs if there are handler methods for
the in the namespace.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
ACPICA commit c49dbfed2bc069d0038ea7e1294409bfde7c2c8c
Some potential callers of acpi_setup_gpe_for_wake may know in advance that
there won't be any notify handlers installed for device wake notifications
from the given GPE (one example is a button GPE in Linux). For these cases,
acpi_mark_gpe_for_wake should be used instead of acpi_setup_gpe_for_wake.
This will set the ACPI_GPE_CAN_WAKE flag for the GPE without trying to
setup implicit wake notification for it (since there's no handler method).
Rafael Wysocki.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
ACPICA commit 23b5a8542283af28c3a3a4e3f81096d6e2569faa
There is no point in enabling a GPE that has no handler or
GPE method. At worst, it can cause GPE floods.
Rafael Wysocki.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Revert half of commit d151f9854f: If isochronous I/O is attempted with
packets larget than 1 kByte, VIA VT6315 rev 01 immediately stops to generate
any interrupts if MSI are used. Fix this by going back to legacy interrupts.
[Thread "Isochronous streaming with VT6315 OHCI",
http://marc.info/?t=139049641500003]
With smaller packets, the loss of IRQs happens too but only very rarely ---
rarely eneough that it was not yet possible for me to determine whether
QUIRK_NO_MSI is an actual fix for this rare variation of this chip bug.
I am keeping QUIRK_CYCLE_TIMER off of VT6315 rev >= 1 because this has been
verified by myself with certainty. On the other hand, I am also keeping
QUIRK_CYCLE_TIMER on for VT6315 rev 0 because I don't know at this time
whether this revision accesses Cycle Timer non-atomically like most of the
other VIA OHCIs are known to do.
Reported-by: Rémy Bruno <remy-fw@remy.trinnov.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
We must mask out the overflow bit as well, otherwise
the wptr will never match the rptr again and the interrupt
handler will loop forever.
Signed-off-by: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
P4 systems with cpuid level < 4 can have SMT, but the cache topology
description available (cpuid2) does not include SMP information.
Now we know that SMT shares all cache levels, and therefore we can
mark all available cache levels as shared.
We do this by setting cpu_llc_id to ->phys_proc_id, since that's
the same for each SMT thread. We can do this unconditional since if
there's no SMT its still true, the one CPU shares cache with only
itself.
This fixes a problem where such CPUs report an incorrect LLC CPU mask.
This in turn fixes a crash in the scheduler where the topology was
build wrong, it assumes the LLC mask to include at least the SMT CPUs.
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Bruno Wolff III <bruno@wolff.to>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140722133514.GM12054@laptop.lan
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Commit 8c7424cff6 "nfsd4: don't try to encode conflicting owner if low
on space" forgot to free conf->data in nfsd4_encode_lockt and before
sign conf->data to NULL in nfsd4_encode_lock_denied, causing a leak.
Worse, kfree() can be called on an uninitialized pointer in the case of
a succesful lock (or one that fails for a reason other than a conflict).
(Note that lock->lk_denied.ld_owner.data appears it should be zero here,
until you notice that it's one arm of a union the other arm of which is
written to in the succesful case by the
memcpy(&lock->lk_resp_stateid, &lock_stp->st_stid.sc_stateid,
sizeof(stateid_t));
in nfsd4_lock(). In the 32-bit case this overwrites ld_owner.data.)
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Fixes: 8c7424cff6 ""nfsd4: don't try to encode conflicting owner if low on space"
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
An object can only have an active gtt mapping if it is currently bound
into the global gtt. Therefore we can simply walk the list of all bound
objects and check the flag upon those for an active gtt mapping.
From commit 48018a57a8
Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
Date: Fri Dec 13 15:22:31 2013 -0200
drm/i915: release the GTT mmaps when going into D3
Also note that the WARN is inappropriate for this function as GPU
activity is orthogonal to GTT mmap status. Rather it is the caller that
relies upon this condition and so it should assert that the GPU is idle
itself.
References: https://bugs.freedesktop.org/show_bug.cgi?id=80081
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
[danvet: cherry-pick from -next to -fixes.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
ZONE_DMA is created to allow 32-bit only devices to access memory in the
absence of an IOMMU. On systems where the memory starts above 4GB, it is
expected that some devices have a DMA offset hardwired to be able to
access the bottom of the memory. Linux currently supports DT bindings
for the DMA offsets but they are not (easily) available early during
boot.
This patch tries to guess a DMA offset and assumes that ZONE_DMA
corresponds to the 32-bit mask above the start of DRAM.
Fixes: 2d5a5612bc (arm64: Limit the CMA buffer to 32-bit if ZONE_DMA)
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Mark Salter <msalter@redhat.com>
Tested-by: Mark Salter <msalter@redhat.com>
Tested-by: Anup Patel <anup.patel@linaro.org>
As there is only CONFIG_ACPI=n processing in the <linux/acpi.h>, it is not
safe to include <acpi/acpi.h> directly for source out of Linux ACPI
subsystems.
This patch adds error messaging to warn developers of such wrong
inclusions.
In order not to be bisected and reverted as a wrong commit, warning
messages are carefully split into a seperate patch other than the wrong
inclusion cleanups.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch removes <acpi/acpi.h> inclusions from <linux/sfi_acpi.h> as
<linux/acpi.h> has already included it for CONFIG_ACPI=n builds.
Cc: Len Brown <lenb@kernel.org>
Cc: sfi-devel@simplefirmware.org
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch moves <acpi/acpi.h> out of CONFIG_ACPI condition so that all
ACPICA prototypes can be seen by the CONFIG_ACPI=n Linux kernel builds.
Note that we can do this because ACPICA has implemented stubs for all
ACPICA prototypes that are currently referenced by the Linux kernel.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The forthcoming patch will make <acpi/acpi.h> to be visible to all kernel
source code. Thus for the architectures that do not support ACPI and
haven't implemented <asm/acenv.h>, we need to make it excluded.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch adds default 64-bit mathematics in aclinux.h using do_div(). As
do_div() can be used for all Linux architectures, this can also be used as
stub macros for ACPICA 64-bit mathematics.
These macros are required by drivers/acpi/utmath.c when ACPI_USE_NATIVE_DIVIDE
is not defined. It is used by ACPICA, so currently this is only meaningful to
CONFIG_ACPI builds. So the kernel will not use these macros unless CONFIG_ACPI
is defined and ACPI_USE_DIVIDE is not defined.
For 64-bit kernels:
In include/acpi/actypes.h, for ACPI_MACHINE_WIDTH=64,
ACPI_USE_NATIVE_DIVIDE will be defined, thus these macros are not used.
In include/acpi/platform/aclinux.h, for __KERNEL__ surrounded code,
ACPI_MACHINE_WIDTH is defined to be BITS_PER_LONG.
So all 64-bit kernels do not use these macros.
For 32-bit kernels:
As mentioned above, these macros will be used when BITS_PER_LONG is 32.
Thus currently the i328 kernels are the only users for these macros.
But they won't use this default implementation provided by this patch,
because in arch/x86/include/asm/acenv.h, there are already overrides
implemented. So these default macros are not used by 32-bit x86 (i386)
kernels.
These macros will only be used by future non x86 32-bit architectures
that try to support ACPI in Linux kernel.
During the period they do not have arch specific implementations of such
macros, we can avoid build errors for them.
And since they can see ACPICA functioning without implementing any arch
specific environment tunings, we can also avoid function errors for
them.
As this implementation is not performance friendly, those architectures
still need to implement real support in the end.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The ACPI_HANDLE() macro evaluates ACPI_COMPANION() internally to
return the handle of the device's ACPI companion, so it is much
more straightforward and efficient to use ACPI_COMPANION()
directly to obtain the device's ACPI companion object instead of
using ACPI_HANDLE() and acpi_bus_get_device() on the returned
handle for the same thing.
Do that in several places in the ACPI PNP core code.
Also use acpi_device_set_power() and acpi_device_power_manageable()
instead of acpi_bus_set_power() and acpi_bus_power_manageable(),
respectively, because the former two are more efficient if the
ACPI device object is already available.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The ACPI_HANDLE() macro evaluates ACPI_COMPANION() internally to
return the handle of the device's ACPI companion, so it is much
more straightforward and efficient to use ACPI_COMPANION()
directly to obtain the device's ACPI companion object instead of
using ACPI_HANDLE() and acpi_bus_get_device() on the returned
handle for the same thing.
Do that in three places in the ACPI device PM code.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Wakeup GPEs are currently only enabled when setting up devices for
remote wakeup at run time. During system-wide transitions they are
enabled by ACPICA at the very last stage of suspend (before asking
the BIOS to take over). Of course, that only works for system
sleep states supported by ACPI, so in particular it doesn't work
for the "freeze" sleep state.
For this reason, modify the ACPI core device PM code to enable wakeup
GPEs for devices when setting them up for wakeup regardless of whether
that is remote wakeup at runtime or system wakeup. That allows the
same device wakeup setup routine to be used for both runtime PM and
system-wide PM and makes it possible to reduce code size quite a bit.
This make ACPI-based PCI Wake-on-LAN work with the "freeze" sleep
state on my venerable Toshiba Portege R500 and should help other
systems too.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since ACPI wakeup GPEs are going to be enabled during system suspend
as well as for runtime wakeup by a subsequent patch and the same
notify handlers will be used in both cases, rework the ACPI device
wakeup notification framework so that the part specific to physical
devices is always run asynchronously from the PM workqueue. This
prevents runtime resume callbacks for those devices from being
run during system suspend and resume which may not be appropriate,
among other things.
Also make ACPI device wakeup notification handling a bit more robust
agaist subsequent removal of ACPI device objects, whould that ever
happen, and create a wakeup source object for each ACPI device
configured for wakeup so that wakeup notifications for those
devices can wake up the system from the "freeze" sleep state.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The PM workqueue is going to be used by ACPI PM notify handlers
regardless of whether or not runtime PM is configured, so move
it out of #ifdef CONFIG_PM_RUNTIME.
Do that in three places in the ACPI device PM code.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
During system suspend mark ACPI buttons (other than the lid) as
"suspended" and if in that state, report wakeup events on button
events, but do not propagate those events up the stack.
This prevents systems from being turned off after a button-triggered
wakeup from the "freeze" sleep state.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=77611
Tested-on: Acer Aspire S5, Toshiba Portege R500
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This commit is a supplement to my previous patch.
http://mailman.alsa-project.org/pipermail/alsa-devel/2014-July/079190.html
The special_clk_ctl_put() still returns 0 in error handling case. It should
return -EINVAL.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Here some additional changes to set a capability flag so that clients can
detect when it's appropriate to return -ENOSYS from open.
This amends the following commit introduced in 3.14:
7678ac5061 fuse: support clients that don't implement 'open'
However we can only add the flag to 3.15 and later since there was no
protocol version update in 3.14.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: <stable@vger.kernel.org> # v3.15+
This commit is for correction of my misunderstanding about return value of
.put callback in ALSA Control interface.
According to 'Writing ALSA Driver' (*1), return value of the callback has
three patterns; 1: changed, 0: not changed, an negative value: fatal error.
But I misunderstood that it's boolean; zero or nonzero.
*1: Writing an ALSA Driver (2005, Takashi Iwai)
http://www.alsa-project.org/main/index.php/ALSA_Driver_Documentation
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
This commit uses different labels for control elements of digital input/output
interfaces to correct my misunderstanding about M-Audio Firewire 1814 and
ProjectMix I/O.
According to user manuals for these two models, they have two modes for
digital input; one is S/PDIF in both of optical and coaxial interfaces,
another is ADAT in optical interface only.
But in current implementation, a control element for it reduced labels which
a control element for digital output uses because of my misunderstanding
that optical interface is not available for digital input with S/PDIF mode.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
In error handling case, special_clk_ctl_put() returns without unlock_mutex(),
therefore the mutex is still locked. This commit moves mutex_lock() after
the error handling case.
This commit is my solution for this post.
[PATCH -next] ALSA: bebob: Fix missing unlock on error in special_clk_ctl_put()
https://lkml.org/lkml/2014/7/20/12
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Commit 554086d ("x86_32, entry: Do syscall exit work on badsys
(CVE-2014-4508)") introduced a regression in the x86_32 syscall entry
code, resulting in syscall() not returning proper errors for undefined
syscalls on CPUs supporting the sysenter feature.
The following code:
> int result = syscall(666);
> printf("result=%d errno=%d error=%s\n", result, errno, strerror(errno));
results in:
> result=666 errno=0 error=Success
Obviously, the syscall return value is the called syscall number, but it
should have been an ENOSYS error. When run under ptrace it behaves
correctly, which makes it hard to debug in the wild:
> result=-1 errno=38 error=Function not implemented
The %eax register is the return value register. For debugging via ptrace
the syscall entry code stores the complete register context on the
stack. The badsys handlers only store the ENOSYS error code in the
ptrace register set and do not set %eax like a regular syscall handler
would. The old resume_userspace call chain contains code that clobbers
%eax and it restores %eax from the ptrace registers afterwards. The same
goes for the ptrace-enabled call chain. When ptrace is not used, the
syscall return value is the passed-in syscall number from the untouched
%eax register.
Use %eax as the return value register in syscall_badsys and
sysenter_badsys, like a real syscall handler does, and have the caller
push the value onto the stack for ptrace access.
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Link: http://lkml.kernel.org/r/alpine.LNX.2.11.1407221022380.31021@titan.int.lan.stealer.net
Reviewed-and-tested-by: Andy Lutomirski <luto@amacapital.net>
Cc: <stable@vger.kernel.org> # If 554086d is backported
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
x86_64 boots and displays fine, but booting x86_32 with CONFIG_HIGHMEM
has frozen with a blank screen throughout 3.16-rc on this ThinkPad T420s,
with i915 generation 6 graphics.
Fix 9d0a6fa6c5 ("drm/i915: add render state initialization"): kunmap()
takes struct page * argument, not virtual address. Which the compiler
kindly points out, if you use the appropriate u32 *batch, instead of
silencing it with a void *.
Why did bisection lead decisively to nearby 229b0489aa ("drm/i915:
add null render states for gen6, gen7 and gen8")? Because the u32
deposited at that virtual address by the previous stub failed the
PageHighMem test, and so did no harm.
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
memmove may be called from module code copy_pages(btrfs), and it may
call memcpy, which may call back to C code, so it needs to use
_GLOBAL_TOC to set up r2 correctly.
This fixes following error when I tried to boot an le guest:
Vector: 300 (Data Access) at [c000000073f97210]
pc: c000000000015004: enable_kernel_altivec+0x24/0x80
lr: c000000000058fbc: enter_vmx_copy+0x3c/0x60
sp: c000000073f97490
msr: 8000000002009033
dar: d000000001d50170
dsisr: 40000000
current = 0xc0000000734c0000
paca = 0xc00000000fff0000 softe: 0 irq_happened: 0x01
pid = 815, comm = mktemp
enter ? for help
[c000000073f974f0] c000000000058fbc enter_vmx_copy+0x3c/0x60
[c000000073f97510] c000000000057d34 memcpy_power7+0x274/0x840
[c000000073f97610] d000000001c3179c copy_pages+0xfc/0x110 [btrfs]
[c000000073f97660] d000000001c3c248 memcpy_extent_buffer+0xe8/0x160 [btrfs]
[c000000073f97700] d000000001be4be8 setup_items_for_insert+0x208/0x4a0 [btrfs]
[c000000073f97820] d000000001be50b4 btrfs_insert_empty_items+0xf4/0x140 [btrfs]
[c000000073f97890] d000000001bfed30 insert_with_overflow+0x70/0x180 [btrfs]
[c000000073f97900] d000000001bff174 btrfs_insert_dir_item+0x114/0x2f0 [btrfs]
[c000000073f979a0] d000000001c1f92c btrfs_add_link+0x10c/0x370 [btrfs]
[c000000073f97a40] d000000001c20e94 btrfs_create+0x204/0x270 [btrfs]
[c000000073f97b00] c00000000026d438 vfs_create+0x178/0x210
[c000000073f97b50] c000000000270a70 do_last+0x9f0/0xe90
[c000000073f97c20] c000000000271010 path_openat+0x100/0x810
[c000000073f97ce0] c000000000272ea8 do_filp_open+0x58/0xd0
[c000000073f97dc0] c00000000025ade8 do_sys_open+0x1b8/0x300
[c000000073f97e30] c00000000000a008 syscall_exit+0x0/0x7c
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit 75b57ecf9 refactored device tree nodes to use kobjects such that they
can be exposed via /sysfs. A secondary commit 0829f6d1f furthered this rework
by moving the kobect initialization logic out of of_node_add into its own
of_node_init function. The inital commit removed the existing kref_init calls
in the pseries dlpar code with the assumption kobject initialization would
occur in of_node_add. The second commit had the side effect of triggering a
BUG_ON during DLPAR, migration and suspend/resume operations as a result of
dynamically added nodes being uninitialized.
This patch fixes this by adding of_node_init calls in place of the previously
removed kref_init calls.
Fixes: 0829f6d1f6 ("of: device_node kobject lifecycle fixes")
Cc: stable@vger.kernel.org
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Acked-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Acked-by: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We now support TASK_SIZE of 16TB, hence the array should be 8.
Fixes the below crash:
Unable to handle kernel paging request for data at address 0x000100bd
Faulting instruction address: 0xc00000000004f914
cpu 0x13: Vector: 300 (Data Access) at [c000000fea75fa90]
pc: c00000000004f914: .sys_subpage_prot+0x2d4/0x5c0
lr: c00000000004fb5c: .sys_subpage_prot+0x51c/0x5c0
sp: c000000fea75fd10
msr: 9000000000009032
dar: 100bd
dsisr: 40000000
current = 0xc000000fea6ae490
paca = 0xc00000000fb8ab00 softe: 0 irq_happened: 0x00
pid = 8237, comm = a.out
enter ? for help
[c000000fea75fe30] c00000000000a164 syscall_exit+0x0/0x98
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This fixes some bugs in emulate_step(). First, the setting of the carry
bit for the arithmetic right-shift instructions was not correct on 64-bit
machines because we were masking with a mask of type int rather than
unsigned long. Secondly, the sld (shift left doubleword) instruction was
using the wrong instruction field for the register containing the shift
count.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
These processors do not currently support doorbell IPIs, so remove them
from the feature list if we are at DD 1.xx for the 0x004d part.
This fixes a regression caused by d4e58e5928 (powerpc/powernv: Enable
POWER8 doorbell IPIs). With that patch the kernel would hang at boot
when calling smp_call_function_many, as the doorbell would not be
received by the target CPUs:
.smp_call_function_many+0x2bc/0x3c0 (unreliable)
.on_each_cpu_mask+0x30/0x100
.cpuidle_register_driver+0x158/0x1a0
.cpuidle_register+0x2c/0x110
.powernv_processor_idle_init+0x23c/0x2c0
.do_one_initcall+0xd4/0x260
.kernel_init_freeable+0x25c/0x33c
.kernel_init+0x1c/0x120
.ret_from_kernel_thread+0x58/0x7c
Fixes: d4e58e5928 (powerpc/powernv: Enable POWER8 doorbell IPIs)
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Pull networking fixes from David Miller:
1) Null termination fix in dns_resolver got the pointer dereferncing
wrong, fix from Ben Hutchings.
2) ip_options_compile() has a benign but real buffer overflow when
parsing options. From Eric Dumazet.
3) Table updates can crash in netfilter's nftables if none of the state
flags indicate an actual change, from Pablo Neira Ayuso.
4) Fix race in nf_tables dumping, also from Pablo.
5) GRE-GRO support broke the forwarding path because the segmentation
state was not fully initialized in these paths, from Jerry Chu.
6) sunvnet driver leaks objects and potentially crashes on module
unload, from Sowmini Varadhan.
7) We can accidently generate the same handle for several u32
classifier filters, fix from Cong Wang.
8) Several edge case bug fixes in fragment handling in xen-netback,
from Zoltan Kiss.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (21 commits)
ipv4: fix buffer overflow in ip_options_compile()
batman-adv: fix TT VLAN inconsistency on VLAN re-add
batman-adv: drop QinQ claim frames in bridge loop avoidance
dns_resolver: Null-terminate the right string
xen-netback: Fix pointer incrementation to avoid incorrect logging
xen-netback: Fix releasing header slot on error path
xen-netback: Fix releasing frag_list skbs in error path
xen-netback: Fix handling frag_list on grant op error path
net_sched: avoid generating same handle for u32 filters
net: huawei_cdc_ncm: add "subclass 3" devices
net: qmi_wwan: add two Sierra Wireless/Netgear devices
wan/x25_asy: integer overflow in x25_asy_change_mtu()
net: ppp: fix creating PPP pass and active filters
net/mlx4_en: cq->irq_desc wasn't set in legacy EQ's
sunvnet: clean up objects created in vnet_new() on vnet_exit()
r8169: Enable RX_MULTI_EN for RTL_GIGA_MAC_VER_40
net-gre-gro: Fix a bug that breaks the forwarding path
netfilter: nf_tables: 64bit stats need some extra synchronization
netfilter: nf_tables: set NLM_F_DUMP_INTR if netlink dumping is stale
netfilter: nf_tables: safe RCU iteration on list when dumping
...
Pull sparc fix from David Miller:
"Need to hook up the new renameat2 system call"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: Hook up renameat2 syscall.
Pull IDE fixes from David Miller:
- fix interrupt registry for some Atari IDE chipsets.
- adjust Kconfig dependencies for x86_32 specific chips.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide:
ide: Fix SC1200 dependencies
ide: Fix CS5520 and CS5530 dependencies
m68k/atari - ide: do not register interrupt if host->get_lock is set
as a counter was converted to nanoseconds (silly), and after 1 hour
11 minutes and 34 seconds, this monotonic clock would wrap, causing
havoc with the tracing system and making the clock useless.
He converted that clock to use jiffies_64 and made it into a counter
instead of nanosecond conversions, and displayed the clock with the
straight jiffy count, which works much better than it did in the past.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJTzcD+AAoJEKQekfcNnQGuy4UH/2IaAlJov97GTlYcpA6CCOfQ
S6KYN93V/wbJpoRhcVIyk21ugolPCPCA+W8QEHU/yIzTDuy4VzkiYDfEZSNWL9bF
36dmYyNQdeFkfhVK0sHW7/OvF/YcbMsd70N1+NuwFu0m/sDFKlPWiGe8F0GDyRQb
mKDBiAVAAhDtMWycff3iUgA7eJffejf6Hs6Ve9UdxQ4FxvDaS9ISCRzzWkEktlnw
RPHvZZRCd+TvtugjGdfusHXhnKSdZSkt5c0R0DyyTebW1Wgrq9dJGXxuth+hdoww
9iQ6o2YhhoxIo49BwxkTJMsLsS4jC+2KmMEepQ3H7BjTUYg5tWMd9kWAAtxKuDI=
=3sF+
-----END PGP SIGNATURE-----
Merge tag 'trace-fixes-v3.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull trace fix from Steven Rostedt:
"Tony Luck found that using the "uptime" trace clock that uses jiffies
as a counter was converted to nanoseconds (silly), and after 1 hour 11
minutes and 34 seconds, this monotonic clock would wrap, causing havoc
with the tracing system and making the clock useless.
He converted that clock to use jiffies_64 and made it into a counter
instead of nanosecond conversions, and displayed the clock with the
straight jiffy count, which works much better than it did in the past"
* tag 'trace-fixes-v3.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix wraparound problems in "uptime" trace clock
- recognise and drop Bridge Loop Avoidance packets even if
they are encapsulated in the 802.1q header multiple times.
Forwarding them into the mesh creates issues on other
nodes.
- properly handle VLAN private objects in order to avoid race
conditions upon fast VLAN deletion-addition. Such conditions
create an unrecoverable inconsistency in the TT database of
the nodes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJTzMYfAAoJEJgn97Bh2u9eKLIP+wWwqvRe5hFleA7Xd7vHS769
20TrhDPZrQAcaK8dg8/VpqUZ4oGAi0WHhbhAdur1Vj3Ie5DDsqqu45lK9a/o+PAe
avWafxcPcK5LLoLbDKNxX98n6BN3aNFIp7rUy4CDO7Beix/PfQUYGbZ01IEueNlX
tvKz1oO7r3SvWFELltSU7bndU+0NoZRon5qXSaxnlYHMXcsJEJAKRPE9eLdwXUaF
9h0oIKkPVQt8YFn0w1zZRePSPWGQSAb20exgRGwPxI23xs7ui1i+s5Od9aSt8FcR
e6eNuMDsuHVeAmW+nsxF3WAyYGIGyaTb9sSkwrToXZge7BRFRfphKN1WHD1bp6A5
a0Lu3wkzCJbrS3LZkjt99jh+0XAaaoWkAt4Lu4+VUcMYtfITHHHN4kfmzoPE7Z8y
Qq64KL/ry6v2lqGk2+9G5/oHXMAYAyed+TPk/HSn5O0CS+zXxXFvrvbYyQyFg99X
BcuOD6dGLbfaPQh9XuCE9jJ2D5QHnkAXj2FlK5oFd7y6ASdLltratTYNKJ4T7cVR
+cyBkZ6cI3Ehzq1jrR8/9qqAal+a/jdzne6J7DPnWksDWxnTylANuWecVkETkpcL
mUp6Zv9SYISqQSPtrbE7xu1XW/ICoajc+6H0eEOFhKU+JEqKjxwSE2QoKvzxeC8Y
OHIbq99fItGwH7Vuldkg
=RdJM
-----END PGP SIGNATURE-----
Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge
Antonio Quartulli says:
====================
pull request [net]: batman-adv 20140721
here you have two fixes that we have been testing for quite some time
(this is why they arrived a bit late in the rc cycle).
Patch 1) ensures that BLA packets get dropped and not forwarded to the
mesh even if they reach batman-adv within QinQ frames. Forwarding them
into the mesh means messing up with the TT database of other nodes which
can generate all kind of unexpected behaviours during route computation.
Patch 2) avoids a couple of race conditions triggered upon fast VLAN
deletion-addition. Such race conditions are pretty dangerous because
they not only create inconsistencies in the TT database of the nodes
in the network, but such scenario is also unrecoverable (unless
nodes are rebooted).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>