Pull x86 fixes from Peter Anvin:
"Another set of fixes, the biggest bit of this is yet another tweak to
the UEFI anti-bricking code; apparently we finally got some feedback
from Samsung as to what makes at least their systems fail. This set
should actually fix the boot regressions that some other systems (e.g.
SGI) have exhibited.
Other than that, there is a patch to avoid a panic with particularly
unhappy memory layouts and two minor protocol fixes which may or may
not be manifest bugs"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Fix typo in kexec register clearing
x86, relocs: Move __vvar_page from S_ABS to S_REL
Modify UEFI anti-bricking code
x86: Fix adjust_range_size_mask calling position
few users were reporting boot regressions in v3.9. This has now been
fixed with a more accurate "minimum storage requirement to avoid
bricking" value from Samsung (5K instead of 50%) and code to trigger
garbage collection when we near our limit - Matthew Garrett.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQIcBAABAgAGBQJRtkY2AAoJEC84WcCNIz1VJOsP/00xwiY4VKh2RfqNkYKSl/w5
gEshIHFEAXHX5X8C4ReocZVywvdjTgbJoKBbBy3FePYRzLddrmavvjen17hk7BzS
/cO8/eXForkNWCGR1kLagA6HLpgKP5DPayKizoMb4Mg6muzfT1SCcN6Pzh8cDMWe
btcq/l9JZejXdJ4Wfoq1My+WdXs19OT/BNeD3y65K4x29vNUjop6oaIdDJWLlH/S
aeLHh8d4xbSHNWzK1fBP7CnFTYU27xxs1BFNAReU6McxeQCYZAIaRovYnjTZEvfJ
twd2tLrOn9HBVTbWa8T4XGNSr+QcT4XGMadLvdwuqltmKDfH6Onm8aWQM3IqA7gy
Qimbcv2B7HrITgXWTzp3DPkXF1LA8/8QHSBXVMUU9Rl6QOLy18vIdKiQy3M1Ng9Z
0q+Ow93JtnL11zf9wLDMdKaKcA9HOxbG/wRTK6XO4vGaWj9brFv3n5Ib7OreHH6D
GP58zDEnThFuj97K/NKREBZZFcFOMZpKk5MAipVkzltihUQmNeTF/dAtBJ3Ncu/A
PqQE6uuKVXjASJR8Gy0bI3WHtSTZK4L/sg9c2MF3bdJa9BswN+m8IEbls+S+iFOx
+sYPQx7Zw6SFENxDw8cDYNzC14yfr60qyOxTWfkHH7l/FnvhOgwHzqPsLcXx0ouR
C6k1yPYSTgiqFdWC2sjn
=TZuM
-----END PGP SIGNATURE-----
Merge tag 'efi-urgent' into x86/urgent
* More tweaking to the EFI variable anti-bricking algorithm. Quite a
few users were reporting boot regressions in v3.9. This has now been
fixed with a more accurate "minimum storage requirement to avoid
bricking" value from Samsung (5K instead of 50%) and code to trigger
garbage collection when we near our limit - Matthew Garrett.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Fixes a typo in register clearing code. Thanks to PaX Team for fixing
this originally, and James Troup for pointing it out.
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20130605184718.GA8396@www.outflux.net
Cc: <stable@vger.kernel.org> v2.6.30+
Cc: PaX Team <pageexec@freemail.hu>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The __vvar_page relocation should actually be listed in S_REL instead
of S_ABS. Oddly, this didn't always cause things to break, presumably
because there are no users for relocation information on 64 bits yet.
[ hpa: Not for stable - new code in 3.10 ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20130611185652.GA23674@www.outflux.net
Reported-by: Michael Davidson <md@google.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
This patch reworks the UEFI anti-bricking code, including an effective
reversion of cc5a080c and 31ff2f20. It turns out that calling
QueryVariableInfo() from boot services results in some firmware
implementations jumping to physical addresses even after entering virtual
mode, so until we have 1:1 mappings for UEFI runtime space this isn't
going to work so well.
Reverting these gets us back to the situation where we'd refuse to create
variables on some systems because they classify deleted variables as "used"
until the firmware triggers a garbage collection run, which they won't do
until they reach a lower threshold. This results in it being impossible to
install a bootloader, which is unhelpful.
Feedback from Samsung indicates that the firmware doesn't need more than
5KB of storage space for its own purposes, so that seems like a reasonable
threshold. However, there's still no guarantee that a platform will attempt
garbage collection merely because it drops below this threshold. It seems
that this is often only triggered if an attempt to write generates a
genuine EFI_OUT_OF_RESOURCES error. We can force that by attempting to
create a variable larger than the remaining space. This should fail, but if
it somehow succeeds we can then immediately delete it.
I've tested this on the UEFI machines I have available, but I don't have
a Samsung and so can't verify that it avoids the bricking problem.
Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: Lee, Chun-Y <jlee@suse.com> [ dummy variable cleanup ]
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
- xen/tmem stopped working after a certain combination of modprobe/swapon was used
- cpu online/offlining would trigger WARN_ON.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQEcBAABAgAGBQJRtgQxAAoJEFjIrFwIi8fJ9QYH/ibEGBlKLfGzN4Apx6evBA68
l9CuLIpGCkPOiJLgVs10zY77Sg3f95bHFkJrwrEIDeTQ42joNKybsIp+qZCIS5CW
sAnzSd6Mb8jqQYJpgLl03+z3GdFzILTJ9e0zYTETbW2vfLpAETm87XWH+gjxDK0e
9I0kZX+Q3+2VWi5xv+UwWkYIOLbggLyKYajTHDwWNuC7vQgkJulCAbmjgN/NBv7A
xacvXdbEClIfZ5tvJJN0RggdEWo6WyTxyExfLPmpXFbHEvWDUX5LPEf8MhFY1ORK
U1C0BSV6YuLo350G5lY4I0R75ZEWLeUyVkZFnJIeGnUF1rP6OXEuVWTyeivPDBA=
=Fy36
-----END PGP SIGNATURE-----
Merge tag 'stable/for-linus-3.10-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
Pull xen fixes from Konrad Rzeszutek Wilk:
"Two bug-fixes for regressions:
- xen/tmem stopped working after a certain combination of
modprobe/swapon was used
- cpu online/offlining would trigger WARN_ON."
* tag 'stable/for-linus-3.10-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/tmem: Don't over-write tmem_frontswap_poolid after tmem_frontswap_init set it.
xen/smp: Fixup NOHZ per cpu data when onlining an offline CPU.
PCI ROM from EFI
x86/PCI: Map PCI setup data with ioremap() so it can be in highmem
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJRsMcBAAoJEFmIoMA60/r8EGMP/RiVo0oHoM1DlNfFonw8BjTl
iClotD3rfQRgjDx8/563/KfO464mLJLbWdjanbpmWB215Xu4LJwv3wfWWEB1f3C+
hbsz4aFLmoQFnCzg1BIsuNNu88cXymf4A+osfKl+pCxPY+zChkqNZUPBk0m7aW4/
Y8JZz0NkxDUUb6ewtDhRDci9d5dHAzL/bmfcdUrP5X56znV72eVH555vY33hzNQ+
opoB9bFOdjxq8EDMlSaGxHmClUH/1VJd8Gr9Q4oj8OUOMq7At/WbHbFYJPbn/09+
xdh+78oGKZ6vI1rfIwlGSbJaizZJoE8l3EVEeY1rSgxF2zKNRAyN9IZOPaTigsjd
5zHTwUp+itxstw3fxjeG0R+Gl8guf13XTWzGcR6sC1TTn/NXSxqI3inORs5PWmpP
ldfPkOW5Y3pcP6UfmJcAn1z9W8toyCouovFPnSoit6ZzS/+aFY2vYfHBQZ8LdJHv
Gq4sYmcxWTzfgBxybc6OczztXgTQ+grhs3nnP29/QrGDyJ6RA6E6ENKiqARkALCF
1qFeOdWcoG4HaAlQhMRwuJTsUw7ofJWmYi6AdbHgI5pBQM4lLR1Kp+hGJnmeTBHm
Svx98dv525HZ2tVUZwsd+ExsEfQ4IlPECVmOpxFkUf4sRILhvCTP5C6KWHBev7sG
+F4rwomkmlS1hu82xMwl
=dMNk
-----END PGP SIGNATURE-----
Merge tag 'pci-v3.10-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
"This fixes a crash when booting a 32-bit kernel via the EFI boot stub.
PCI ROM from EFI
x86/PCI: Map PCI setup data with ioremap() so it can be in highmem"
* tag 'pci-v3.10-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
x86/PCI: Map PCI setup data with ioremap() so it can be in highmem
f9a37be0f0 ("x86: Use PCI setup data") added support for using PCI ROM
images from setup_data. This used phys_to_virt(), which is not valid for
highmem addresses, and can cause a crash when booting a 32-bit kernel via
the EFI boot stub.
pcibios_add_device() assumes that the physical addresses stored in
setup_data are accessible via the direct kernel mapping, and that calling
phys_to_virt() is valid. This isn't guaranteed to be true on x86 where the
direct mapping range is much smaller than on x86-64.
Calling phys_to_virt() on a highmem address results in the following:
BUG: unable to handle kernel paging request at 39a3c198
IP: [<c262be0f>] pcibios_add_device+0x2f/0x90
...
Call Trace:
[<c2370c73>] pci_device_add+0xe3/0x130
[<c274640b>] pci_scan_single_device+0x8b/0xb0
[<c2370d08>] pci_scan_slot+0x48/0x100
[<c2371904>] pci_scan_child_bus+0x24/0xc0
[<c262a7b0>] pci_acpi_scan_root+0x2c0/0x490
[<c23b7203>] acpi_pci_root_add+0x312/0x42f
...
The solution is to use ioremap() instead of phys_to_virt() to map the
setup data into the kernel address space.
[bhelgaas: changelog]
Tested-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Seth Forshee <seth.forshee@canonical.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: stable@vger.kernel.org # v3.8+
Pull kvm bugfixes from Gleb Natapov:
"The bulk of the fixes is in MIPS KVM kernel<->userspace ABI. MIPS KVM
is new for 3.10 and some problems were found with current ABI. It is
better to fix them now and do not have a kernel with broken one"
* 'fixes' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: Fix race in apic->pending_events processing
KVM: fix sil/dil/bpl/spl in the mod/rm fields
KVM: Emulate multibyte NOP
ARM: KVM: be more thorough when invalidating TLBs
ARM: KVM: prevent NULL pointer dereferences with KVM VCPU ioctl
mips/kvm: Use ENOIOCTLCMD to indicate unimplemented ioctls.
mips/kvm: Fix ABI by moving manipulation of CP0 registers to KVM_{G,S}ET_ONE_REG
mips/kvm: Use ARRAY_SIZE() instead of hardcoded constants in kvm_arch_vcpu_ioctl_{s,g}et_regs
mips/kvm: Fix name of gpr field in struct kvm_regs.
mips/kvm: Fix ABI for use of 64-bit registers.
mips/kvm: Fix ABI for use of FPU.
The xen_play_dead is an undead function. When the vCPU is told to
offline it ends up calling xen_play_dead wherin it calls the
VCPUOP_down hypercall which offlines the vCPU. However, when the
vCPU is onlined back, it resumes execution right after
VCPUOP_down hypercall.
That was OK (albeit the API for play_dead assumes that the CPU
stays dead and never returns) but with commit 4b0c0f294
(tick: Cleanup NOHZ per cpu data on cpu down) that is no longer safe
as said commit resets the ts->inidle which at the start of the
cpu_idle loop was set.
The net effect is that we get this warn:
Broke affinity for irq 16
installing Xen timer for CPU 1
cpu 1 spinlock event irq 48
------------[ cut here ]------------
WARNING: at /home/konrad/linux-linus/kernel/time/tick-sched.c:935 tick_nohz_idle_exit+0x195/0x1b0()
Modules linked in: dm_multipath dm_mod xen_evtchn iscsi_boot_sysfs
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.0-rc3upstream-00068-gdcdbe33 #1
Hardware name: BIOSTAR Group N61PB-M2S/N61PB-M2S, BIOS 6.00 PG 09/03/2009
ffffffff8193b448 ffff880039da5e60 ffffffff816707c8 ffff880039da5ea0
ffffffff8108ce8b ffff880039da4010 ffff88003fa8e500 ffff880039da4010
0000000000000001 ffff880039da4000 ffff880039da4010 ffff880039da5eb0
Call Trace:
[<ffffffff816707c8>] dump_stack+0x19/0x1b
[<ffffffff8108ce8b>] warn_slowpath_common+0x6b/0xa0
[<ffffffff8108ced5>] warn_slowpath_null+0x15/0x20
[<ffffffff810e4745>] tick_nohz_idle_exit+0x195/0x1b0
[<ffffffff810da755>] cpu_startup_entry+0x205/0x250
[<ffffffff81661070>] cpu_bringup_and_idle+0x13/0x15
---[ end trace 915c8c486004dda1 ]---
b/c ts_inidle is set to zero. Thomas suggested that we just add a workaround
to call tick_nohz_idle_enter before returning from xen_play_dead() - and
that is what this patch does and fixes the issue.
We also add the stable part b/c git commit 4b0c0f294 is on the stable
tree.
CC: stable@vger.kernel.org
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
apic->pending_events processing has a race that may cause INIT and
SIPI
processing to be reordered:
vpu0: vcpu1:
set INIT
test_and_clear_bit(KVM_APIC_INIT)
process INIT
set INIT
set SIPI
test_and_clear_bit(KVM_APIC_SIPI)
process SIPI
At the end INIT is left pending in pending_events. The following patch
fixes this by latching pending event before processing them.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
The x86-64 extended low-byte registers were fetched correctly from reg,
but not from mod/rm.
This fixes another bug in the boot of RHEL5.9 64-bit, but it is still
not enough.
Cc: <stable@vger.kernel.org> # 3.9
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
This is encountered when booting RHEL5.9 64-bit. There is another bug
after this one that is not a simple emulation failure, but this one lets
the boot proceed a bit.
Cc: <stable@vger.kernel.org> # 3.9
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Commit
8d57470d x86, mm: setup page table in top-down
causes a kernel panic while setting mem=2G.
[mem 0x00000000-0x000fffff] page 4k
[mem 0x7fe00000-0x7fffffff] page 1G
[mem 0x7c000000-0x7fdfffff] page 1G
[mem 0x00100000-0x001fffff] page 4k
[mem 0x00200000-0x7bffffff] page 2M
for last entry is not what we want, we should have
[mem 0x00200000-0x3fffffff] page 2M
[mem 0x40000000-0x7bffffff] page 1G
Actually we merge the continuous ranges with same page size too early.
in this case, before merging we have
[mem 0x00200000-0x3fffffff] page 2M
[mem 0x40000000-0x7bffffff] page 2M
after merging them, will get
[mem 0x00200000-0x7bffffff] page 2M
even we can use 1G page to map
[mem 0x40000000-0x7bffffff]
that will cause problem, because we already map
[mem 0x7fe00000-0x7fffffff] page 1G
[mem 0x7c000000-0x7fdfffff] page 1G
with 1G page, aka [0x40000000-0x7fffffff] is mapped with 1G page already.
During phys_pud_init() for [0x40000000-0x7bffffff], it will not
reuse existing that pud page, and allocate new one then try to use
2M page to map it instead, as page_size_mask does not include
PG_LEVEL_1G. At end will have [7c000000-0x7fffffff] not mapped, loop
in phys_pmd_init stop mapping at 0x7bffffff.
That is right behavoir, it maps exact range with exact page size that
we ask, and we should explicitly call it to map [7c000000-0x7fffffff]
before or after mapping 0x40000000-0x7bffffff.
Anyway we need to make sure ranges' page_size_mask correct and consistent
after split_mem_range for each range.
Fix that by calling adjust_range_size_mask before merging range
with same page size.
-v2: update change log.
-v3: add more explanation why [7c000000-0x7fffffff] is not mapped, and
it causes panic.
Bisected-by: "Xie, ChanglongX" <changlongx.xie@intel.com>
Bisected-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reported-and-tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1370015587-20835-1-git-send-email-yinghai@kernel.org
Cc: <stable@vger.kernel.org> v3.9
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Pull x86 fixes from Peter Anvin:
- Three EFI-related fixes
- Two early memory initialization fixes
- build fix for older binutils
- fix for an eager FPU performance regression -- currently we don't
allow the use of the FPU at interrupt time *at all* in eager mode,
which is clearly wrong.
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Allow FPU to be used at interrupt time even with eagerfpu
x86, crc32-pclmul: Fix build with older binutils
x86-64, init: Fix a possible wraparound bug in switchover in head_64.S
x86, range: fix missing merge during add range
x86, efi: initial the local variable of DataSize to zero
efivar: fix oops in efivar_update_sysfs_entries() caused by memory reuse
efivarfs: Never return ENOENT from firmware again
With the addition of eagerfpu the irq_fpu_usable() now returns false
negatives especially in the case of ksoftirqd and interrupted idle task,
two common cases for FPU use for example in networking/crypto. With
eagerfpu=off FPU use is possible in those contexts. This is because of
the eagerfpu check in interrupted_kernel_fpu_idle():
...
* For now, with eagerfpu we will return interrupted kernel FPU
* state as not-idle. TBD: Ideally we can change the return value
* to something like __thread_has_fpu(current). But we need to
* be careful of doing __thread_clear_has_fpu() before saving
* the FPU etc for supporting nested uses etc. For now, take
* the simple route!
...
if (use_eager_fpu())
return 0;
As eagerfpu is automatically "on" on those CPUs that also have the
features like AES-NI this patch changes the eagerfpu check to return 1 in
case the kernel_fpu_begin() has not been said yet. Once it has been the
__thread_has_fpu() will start returning 0.
Notice that with eagerfpu the __thread_has_fpu is always true initially.
FPU use is thus always possible no matter what task is under us, unless
the state has already been saved with kernel_fpu_begin().
[ hpa: this is a performance regression, not a correctness regression,
but since it can be quite serious on CPUs which need encryption at
interrupt time I am marking this for urgent/stable. ]
Signed-off-by: Pekka Riikonen <priikone@iki.fi>
Link: http://lkml.kernel.org/r/alpine.GSO.2.00.1305131356320.18@git.silcnet.org
Cc: <stable@vger.kernel.org> v3.7+
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
binutils prior to 2.18 (e.g. the ones found on SLE10) don't support
assembling PEXTRD, so a macro based approach like the one for PCLMULQDQ
in the same file should be used.
This requires making the helper macros capable of recognizing 32-bit
general purpose register operands.
[ hpa: tagging for stable as it is a low risk build fix ]
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/51A6142A02000078000D99D8@nat28.tlf.novell.com
Cc: Alexander Boyko <alexander_boyko@xyratex.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Huang Ying <ying.huang@intel.com>
Cc: <stable@vger.kernel.org> v3.9
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
- Use proper error paths
- Clean up APIC IPI usage (incorrect arguments)
- Delay XenBus frontend resume is backend (xenstored) is not running
- Fix build error with various combinations of CONFIG_
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQEcBAABAgAGBQJRp59pAAoJEFjIrFwIi8fJRIYIAKvRh2Dp/AB44ZN97MW/QhEN
NUvrSTYr2HlqcUW7bv0ScrMLb0LlFeo+9s/bo0KI2+2F+zK822WPC+2KEZmzQIVs
q261dNsA3/HoyBDOLwWjatjsSus+njBOEgDIwARPwhkoon4fRXBnRJVMy+0bZC3I
fpd1nlUy0J7jW0QLO5ueKqd5ZN0Mkwn2H4+D8TOPVYHCnk3mT2W+qLCEJmkMxOuZ
iFYy95K1ky5r0leUUwCTUIGLmgftoh0Qo/RweXSmzuLiZrY+5ilike3gxQSiAjsM
lIjq+gKXNJJGz4M6wbOTfDzb/WQnKD+2PqlsbulrTD7E6RD6wIsqG/zvc1RqHqw=
=9gi8
-----END PGP SIGNATURE-----
Merge tag 'stable/for-linus-3.10-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
Pull Xen fixes from Konrad Rzeszutek Wilk:
- Use proper error paths
- Clean up APIC IPI usage (incorrect arguments)
- Delay XenBus frontend resume is backend (xenstored) is not running
- Fix build error with various combinations of CONFIG_
* tag 'stable/for-linus-3.10-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xenbus_client.c: correct exit path for xenbus_map_ring_valloc_hvm
xen-pciback: more uses of cached MSI-X capability offset
xen: Clean up apic ipi interface
xenbus: save xenstore local status for later use
xenbus: delay xenbus frontend resume if xenstored is not running
xmem/tmem: fix 'undefined variable' build error.
Commit f447d56d36 introduced the
implementation of the PV apic ipi interface. But there were some
odd things (it seems none of which cause really any issue but
maybe they should be cleaned up anyway):
- xen_send_IPI_mask_allbutself (and by that xen_send_IPI_allbutself)
ignore the passed in vector and only use the CALL_FUNCTION_SINGLE
vector. While xen_send_IPI_all and xen_send_IPI_mask use the vector.
- physflat_send_IPI_allbutself is declared unnecessarily. It is never
used.
This patch tries to clean up those things.
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
In head_64.S, a switchover has been used to handle kernel crossing
1G, 512G boundaries.
And commit 8170e6bed4
x86, 64bit: Use a #PF handler to materialize early mappings on demand
said:
During the switchover in head_64.S, before #PF handler is available,
we use three pages to handle kernel crossing 1G, 512G boundaries with
sharing page by playing games with page aliasing: the same page is
mapped twice in the higher-level tables with appropriate wraparound.
But from the switchover code, when we set up the PUD table:
114 addq $4096, %rdx
115 movq %rdi, %rax
116 shrq $PUD_SHIFT, %rax
117 andl $(PTRS_PER_PUD-1), %eax
118 movq %rdx, (4096+0)(%rbx,%rax,8)
119 movq %rdx, (4096+8)(%rbx,%rax,8)
It seems line 119 has a potential bug there. For example,
if the kernel is loaded at physical address 511G+1008M, that is
000000000 111111111 111111000 000000000000000000000
and the kernel _end is 512G+2M, that is
000000001 000000000 000000001 000000000000000000000
So in this example, when using the 2nd page to setup PUD (line 114~119),
rax is 511.
In line 118, we put rdx which is the address of the PMD page (the 3rd page)
into entry 511 of the PUD table. But in line 119, the entry we calculate from
(4096+8)(%rbx,%rax,8) has exceeded the PUD page. IMO, the entry in line
119 should be wraparound into entry 0 of the PUD table.
The patch fixes the bug.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Link: http://lkml.kernel.org/r/5191DE5A.3020302@cn.fujitsu.com
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: <stable@vger.kernel.org> v3.9
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Pull crypto fixes from Herbert Xu:
"This push fixes a crash in the new sha256_ssse3 driver as well as a
DMA setup/teardown bug in caam"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: sha256_ssse3 - fix stack corruption with SSSE3 and AVX implementations
crypto: caam - fix inconsistent assoc dma mapping direction
The _XFER stack element size was set too small, 8 bytes, when it needs to be
16 bytes. As _XFER is the last stack element used by these implementations,
the 16 byte stores with 'movdqa' corrupt the stack where the value of register
%r12 is temporarily stored. As these implementations align the stack pointer
to 16 bytes, this corruption did not happen every time.
Patch corrects this issue.
Reported-by: Julian Wollrath <jwollrath@web.de>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Tested-by: Julian Wollrath <jwollrath@web.de>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Moorestown
Revert "x86/pci/mrst: Use configuration mechanism 1 for 00:00.0, 00:02.0, 00:03.0"
Hotplug
PCI: acpiphp: Re-enumerate devices when host bridge receives Bus Check
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJRnnUuAAoJEFmIoMA60/r8+JwQALrHdQvA8rGl/TNF0xkJjsU+
EfdkBj23+qYYsYNla7Z/CfY7k5gXTtymqauhzpa1VRoY/bkpSWMDBxK551CxgBae
RW0JbZxfoswLlA9eVRUz6mlUl7hL0Ky0/E6d9/UBXSAC0wsN6BB11lqKgi/uxAr2
ACuZBZW1EZP9zoD615Id5gtRAZe8+9w47p8d6kQcyJntmCoZwo8Na/HRE/tR+wVY
o3AHMxMOr9Ac+G3/rFl5ffzoHkLwTws4UEVNhvKBek3ogsAau3tgeIgjjvFRToFh
HxsdpCmgOAfjCF/RC0AxRugtykHw4n3WlaND2WM6JWolLeIQTZr41GlwYy4F1CRG
gSRk0dMNGBYs9OgDC81KJGYqstHwL0MTig2ul4BvlrLanLnySgVu1kbhaerR3WzA
11k0GwlJbkFMCbQLZDaq51KHAdDYnCsRVG/tOGmsM8puUPOTXu3pLbPk3SbgWVGc
7vbUkjJlHwlNoUbeyOOOElUMCRnIiCl/IetwjgZb+Qibq2dO4v6WF5GR+2DPBwua
8GNbzQXyNxOPzvYBHE4YgK5lcb7yVeLvlOtPgbM0UMqC1SnFISzJRiNpfbj21rAk
ADT4lqL29vqkX37uIQ55KPeTTThA65EKUoI3+dNtGHLSPoKdt3QlJc9wwQNKxEzx
EEHxGRkebYNcPB1GqwEl
=M8/H
-----END PGP SIGNATURE-----
Merge tag 'pci-v3.10-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
"Here are some more fixes for v3.10. The Moorestown update broke Intel
Medfield devices, so I reverted it. The acpiphp change fixes a
regression: we broke hotplug notifications to host bridges when we
split acpiphp into the host-bridge related part and the
endpoint-related part.
Moorestown
Revert "x86/pci/mrst: Use configuration mechanism 1 for 00:00.0, 00:02.0, 00:03.0"
Hotplug
PCI: acpiphp: Re-enumerate devices when host bridge receives Bus Check"
* tag 'pci-v3.10-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
Revert "x86/pci/mrst: Use configuration mechanism 1 for 00:00.0, 00:02.0, 00:03.0"
PCI: acpiphp: Re-enumerate devices when host bridge receives Bus Check
efivarfs if an EFI variable gets deleted from under us and return EOF
when reading from a zero-length file - Lingzhu Xiang
* Fix an oops in efivar_update_sysfs_entries() caused by reusing (and
therefore corrupting) a kzalloc() allocation - Seiji Aguchi
* Initialise the DataSize argument to GetVariable() otherwise it will
not be updated with the actual size of the variable on return.
Discovered on a Acer Aspire V3 BIOS - Lee, Chun-Yi
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQIcBAABAgAGBQJRlgJ5AAoJEC84WcCNIz1VcRgQAJWnQpuI+Y1RjPYuLb8ISZ92
jKnVSAQPaWDccf/KKx7qtk2iK6G1mExomOggp6PpV4/uHWqkQ64qoGL0HmRbmdH7
//gNXm+ITnSVojoTBBis2LYEYDUQ1aDp3vTevFT2OOMKzKRsgwFD5KbqLnPJCxnl
kFHlxQSPa7ZYk6GXOVdBJ5ya7spU+Lk4ODiL4xM8EBkMBedUStedB5zNg5rEMLJq
5FZfs2Ryg7riOkhBnKWCvFjAj3PO/yHLVdrVrGZx+KpATHr/rbsbzDEp3+Pk8zEu
3aE1p/XLaB4Qk9WvwhWonWF7PSZP41SjrcEt77mTmYivk0cJHBORo/wJsom683BU
4elWR588Pi92nKEx4MSLREhkzP5RdVFIb4Yrg+1xKLEfI5Kfj1nEQrFVk53VM/rL
L1m1bwhKLaNUVFJmxJYHaZ6VTjcB6lupz9/AMQr1N1UAavlmr4xEveNTfto+Ys7u
/J3Z04wouhOGQonVDrZxxOzWFewq1kR/hRfwBAf/V0qemAeS61ZHGpfmmq00y5L2
jmXncQ5EspFIJfk7+KV2sPQfqC8U8b2jyfO36Ed9u3d/qQ6eiQ1rTZMLKGg1iSdW
Z+94w7+huEdsNZShSUE8+HXNvzPCS6704PBa9YT292ZibWM9onk6AoBQy/snrEaO
K+PKxkZyPBML77Vs4TSI
=8+a3
-----END PGP SIGNATURE-----
Merge tag 'efi-urgent' into x86/urgent
* Avoid confusing the user by returning -EIO instead of -ENOENT in
efivarfs if an EFI variable gets deleted from under us and return EOF
when reading from a zero-length file - Lingzhu Xiang
* Fix an oops in efivar_update_sysfs_entries() caused by reusing (and
therefore corrupting) a kzalloc() allocation - Seiji Aguchi
* Initialise the DataSize argument to GetVariable() otherwise it will
not be updated with the actual size of the variable on return.
Discovered on a Acer Aspire V3 BIOS - Lee, Chun-Yi
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
In commit 78d77df715 ("x86-64, init: Do not set NX bits on non-NX
capable hardware") we added the early_pmd_flags that gets the NX bit set
when a CPU supports NX. However, the new variable was marked __initdata,
because the main _use_ of this is in an __init routine.
However, the bit setting happens from secondary_startup_64(), which is
called not only at bootup, but on every secondary CPU start. Including
resuming from STR and at CPU hotplug time. So the value cannot be
__initdata.
Reported-bisected-and-tested-by: Michal Hocko <mhocko@suse.cz>
Cc: stable@vger.kernel.org # v3.9
Acked-by: Peter Anvin <hpa@linux.intel.com>
Cc: Fernando Luis Vázquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull x86 fixes from Thomas Gleixner:
- Fix for a CPU hot-add deadlock in microcode update code
- Fix for idle consolidation fallout
- Documentation update for initial kernel direct mapping
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Add missing comments for initial kernel direct mapping
x86/microcode: Add local mutex to fix physical CPU hot-add deadlock
x86: Fix idle consolidation fallout
Pull timer fixes from Thomas Gleixner:
- Cure for not using zalloc in the first place, which leads to random
crashes with CPUMASK_OFF_STACK.
- Revert a user space visible change which broke udev
- Add a missing cpu_online early return introduced by the new full
dyntick conversions
- Plug a long standing race in the timer wheel cpu hotplug code.
Sigh...
- Cleanup NOHZ per cpu data on cpu down to prevent stale data on cpu
up.
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
time: Revert ALWAYS_USE_PERSISTENT_CLOCK compile time optimizaitons
timer: Don't reinitialize the cpu base lock during CPU_UP_PREPARE
tick: Don't invoke tick_nohz_stop_sched_tick() if the cpu is offline
tick: Cleanup NOHZ per cpu data on cpu down
tick: Use zalloc_cpumask_var for allocating offstack cpumasks
Kay Sievers noted that the ALWAYS_USE_PERSISTENT_CLOCK config,
which enables some minor compile time optimization to avoid
uncessary code in mostly the suspend/resume path could cause
problems for userland.
In particular, the dependency for RTC_HCTOSYS on
!ALWAYS_USE_PERSISTENT_CLOCK, which avoids setting the time
twice and simplifies suspend/resume, has the side effect
of causing the /sys/class/rtc/rtcN/hctosys flag to always be
zero, and this flag is commonly used by udev to setup the
/dev/rtc symlink to /dev/rtcN, which can cause pain for
older applications.
While the udev rules could use some work to be less fragile,
breaking userland should strongly be avoided. Additionally
the compile time optimizations are fairly minor, and the code
being optimized is likely to be reworked in the future, so
lets revert this change.
Reported-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: stable <stable@vger.kernel.org> #3.9
Cc: Feng Tang <feng.tang@intel.com>
Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Link: http://lkml.kernel.org/r/1366828376-18124-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
That will be better initial the value of DataSize to zero for the input of
GetVariable(), otherwise we will feed a random value. The debug log of input
DataSize like this:
...
[ 195.915612] EFI Variables Facility v0.08 2004-May-17
[ 195.915819] efi: size: 18446744071581821342
[ 195.915969] efi: size': 18446744071581821342
[ 195.916324] efi: size: 18446612150714306560
[ 195.916632] efi: size': 18446612150714306560
[ 195.917159] efi: size: 18446612150714306560
[ 195.917453] efi: size': 18446612150714306560
...
The size' is value that was returned by BIOS.
After applied this patch:
[ 82.442042] EFI Variables Facility v0.08 2004-May-17
[ 82.442202] efi: size: 0
[ 82.442360] efi: size': 1039
[ 82.443828] efi: size: 0
[ 82.444127] efi: size': 2616
[ 82.447057] efi: size: 0
[ 82.447356] efi: size': 5832
...
Found on Acer Aspire V3 BIOS, it will not return the size of data if we input a
non-zero DataSize.
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Lee, Chun-Yi <jlee@suse.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
- More fixes in the vCPU PVHVM hotplug path.
- Add more documentation.
- Fix various ARM related issues in the Xen generic drivers.
- Updates in the xen-pciback driver per Bjorn's updates.
- Mask the x2APIC feature for PV guests.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQEcBAABAgAGBQJRjPL5AAoJEFjIrFwIi8fJdlIIANXawH+B+aFbqsFSKOOh76XN
smgICU78SVzKpW9WAPYK7YFqSdNN4AleGC2Mn2lSkiaqgciRyDb9Yt+OSMMts2Xn
ZVbFkGhEKR+DtZfTKo9YgsGatul/McTiVEkuuli+aN5dql3WXDLAaA+/b9bO3ohh
TCWtWNuSCGmlfDoJET2je+J6CgKvCErH3fvzKNxgYxytcGhAvxoVK/lC4d3pnq/m
wQUAIcF8XYENqC2m1WDR0OGveAB0Me0j9g+UkQS+TzqA8GPmxC4aptjkroFYhOz6
8nZp+LanimmTI6olVioWEXkCr5+dxb058jQKwncQfonFpl58RS0qUrz5zoe3etU=
=h9SX
-----END PGP SIGNATURE-----
Merge tag 'stable/for-linus-3.10-rc0-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
Pull Xen bug-fixes from Konrad Rzeszutek Wilk:
- More fixes in the vCPU PVHVM hotplug path.
- Add more documentation.
- Fix various ARM related issues in the Xen generic drivers.
- Updates in the xen-pciback driver per Bjorn's updates.
- Mask the x2APIC feature for PV guests.
* tag 'stable/for-linus-3.10-rc0-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/pci: Used cached MSI-X capability offset
xen/pci: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK
xen: clear IRQ_NOAUTOEN and IRQ_NOREQUEST
xen: mask x2APIC feature in PV
xen: SWIOTLB is only used on x86
xen/spinlock: Fix check from greater than to be also be greater or equal to.
xen/smp/pvhvm: Don't point per_cpu(xen_vpcu, 33 and larger) to shared_info
xen/vcpu: Document the xen_vcpu_info and xen_vcpu
xen/vcpu/pvhvm: Fix vcpu hotplugging hanging.
Pull idle update from Len Brown:
"Add support for new Haswell-ULT CPU idle power states"
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
intel_idle: initial C8, C9, C10 support
tools/power turbostat: display C8, C9, C10 residency
Pull stray syscall bits from Al Viro:
"Several syscall-related commits that were missing from the original"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
switch compat_sys_sysctl to COMPAT_SYSCALL_DEFINE
unicore32: just use mmap_pgoff()...
unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE
x86, vm86: fix VM86 syscalls: use SYSCALL_DEFINEx(...)
Pull kvm fixes from Gleb Natapov:
"Most of the fixes are in the emulator since now we emulate more than
we did before for correctness sake we see more bugs there, but there
is also an OOPS fixed and corruption of xcr0 register."
* tag 'kvm-3.10-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: emulator: emulate SALC
KVM: emulator: emulate XLAT
KVM: emulator: emulate AAM
KVM: VMX: fix halt emulation while emulating invalid guest sate
KVM: Fix kvm_irqfd_init initialization
KVM: x86: fix maintenance of guest/host xcr0 state
We now cache the MSI-X capability offset in the struct pci_dev, so no
need to find the capability again.
Acked-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
PCI_MSIX_FLAGS_BIRMASK is mis-named because the BIR mask is in the
Table Offset register, not the flags ("Message Control" per spec)
register.
Acked-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Two sets of comments were lost during patch-series shuffling:
- comments for init_range_memory_mapping()
- comments in init_mem_mapping that is helpful for reminding people
that the pagetable is setup top-down
The comments were written by Yinghai in his patch in:
https://lkml.org/lkml/2012/11/28/620
This patch reintroduces them.
Originally-From: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/518BC776.7010506@gmail.com
[ Tidied it all up a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
MSI
PCI: Set ->mask_pos correctly
Hotplug
PCI: Delay final fixups until resources are assigned
Moorestown
x86/pci/mrst: Use configuration mechanism 1 for 00:00.0, 00:02.0, 00:03.0
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJRi9CqAAoJEFmIoMA60/r82WIQAMJE7kwl88TpIw4QdTum4a+z
E+eBMnGFSw/y7JNP81rOuLG8xZiZNQxGUvTEVV+WY6n7xhRSwcNZIiVqZ1xSBSjM
PpmLOLrXI6I9JIouBLKIRiPOzbC7iB6O7RhKZCO68guQ8Y7epYoJjORaKELGZmN+
09fVapvHHhcOcYYYAiUWvobjk6vCx4+6dSj6tRldI8JNl+WQtqMaK2P3QjsncZek
20OXcrfv4X3ApoSwXcn1NUUDSlrAgM+VcwL6RJK9boURDnPOU8IzaD78DQVOUCNx
BfLULFYkBq62ExCpTkg4Xo8aRowLAEcThZ3Z8/XtMmFlWDSdNm5BaTkYnPCf7nGW
+8Oaxjm0pNVa5QnQqoK1HWpWtU1JTA0hO1tmJ9WyU+84GPuqTN3qiJTigfC9NHOi
mj8O98sgbzIENKszBRpaNctjKVxKNFrBzQ3kOdFGB7NBVXN0pC2jnGBoxuxUrU3B
h/yMB0Ku/GvnHRSngEhDzlkeuQpWTxvlhdjvlL63F0gEDQO0k3UPaiqD4zMpruga
bHZ10v73Kdqp0FVatijmHztXO/yYB3m7tH3ZUDD3yfhKdaeOuTceDgbyo/ZwP2Wq
gzOwuOVYLuJK68Qj4vkHJIKy86jHfNP4HGawhh8dECZDmsciarymQ+vAzobalYIu
GSurydL4zqJJ+MIH84dW
=B2U8
-----END PGP SIGNATURE-----
Merge tag 'pci-v3.10-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
"MSI:
PCI: Set ->mask_pos correctly
Hotplug:
PCI: Delay final fixups until resources are assigned
Moorestown:
x86/pci/mrst: Use configuration mechanism 1 for 00:00.0, 00:02.0, 00:03.0"
* tag 'pci-v3.10-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: Delay final fixups until resources are assigned
x86/pci/mrst: Use configuration mechanism 1 for 00:00.0, 00:02.0, 00:03.0
PCI: Set ->mask_pos correctly
GENERIC_GPIO now synonymous with GPIOLIB. There are no longer any valid
cases for enableing GENERIC_GPIO without GPIOLIB, even though it is
possible to do so which has been causing confusion and breakage. This
branch does the work to completely eliminate GENERIC_GPIO.
However, it is not trivial to just create a branch to remove it. Over
the course of the v3.9 cycle more code referencing GENERIC_GPIO has been
added to linux-next that conflicts with this branch. The following must
be done to resolve the conflicts when merging this branch into mainline:
* "git grep CONFIG_GENERIC_GPIO" should return 0 hits. Matches should be
replaced with CONFIG_GPIOLIB
* "git grep '\bGENERIC_GPIO\b'" should return 1 hit in the Chinese
documentation.
* Selectors of GENERIC_GPIO should be turned into selectors of GPIOLIB
* definitions of the option in architecture Kconfig code should be deleted.
Stephen has 3 merge fixup patches[1] that do the above. They are currently
applicable on mainline as of May 2nd.
[1] http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg428056.html
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJRifUnAAoJEEFnBt12D9kBs2YP/0U6+ia+xYvkVaJc28PDVIzn
OReZNcJOYU8D5voxz0voaRD0EdcPwjbMu9Kp9aXMHlk4VxevF+8jCc/us0bIjtO1
VcB5VmSCIhMhxdnBlum11Mk7Vr5MCweyl9NBsypnPt8cl4obMBZHf2yzoodFktNb
wtyYlOb6FALtc6iDbOO6dG3w9F7FAOLvskUFzdv89m8mupTsBu9jw9NqFDbJHOex
rxq0Sdd+kWF/nkJVcV5Y6jIdletRlhpipefMJ9diexreHvwqh+c4kJEYZaXgB5+m
ha95cPbReK1d+RqzM3A8d4irzSVSmq4k7ijI6QkFOr48+AH7XsgKv5so885LKzMN
IIXg2Phm9i0H8+ecEvhcc4oIYBHJiEKK54Y0qUD9dqbFoDGPTCSqMHdSSMbpAY+J
bIIXlVzj1En3PPNUJLPt8q8Qz6WxCT9mDST3QSGYnD4o90HT+1R9j92RxGL6McOq
rUOyJDwmzFvpBvKK4raGdOU435M+ps2NPKKNIRaIGQPPY9rM1kN4YqvhXukEsC9L
3a3+3cQLh7iKxBHncxeQsJfethP1CPkJnzvF9r+ZZLf2rcPH4pbQIE2uO0XnX/nd
5/DKi0nGgAJ//GMMzdo3RiOA5zGFjIZ/KMvfhQldpP6qFJRhqdGi6FPlAcwr1z1n
YnCByPwwlvfC4LTXFOGL
=xodc
-----END PGP SIGNATURE-----
Merge tag 'gpio-for-linus' of git://git.secretlab.ca/git/linux
Pull removal of GENERIC_GPIO from Grant Likely:
"GENERIC_GPIO now synonymous with GPIOLIB. There are no longer any
valid cases for enableing GENERIC_GPIO without GPIOLIB, even though it
is possible to do so which has been causing confusion and breakage.
This branch does the work to completely eliminate GENERIC_GPIO."
* tag 'gpio-for-linus' of git://git.secretlab.ca/git/linux:
gpio: update gpio Chinese documentation
Remove GENERIC_GPIO config option
Convert selectors of GENERIC_GPIO to GPIOLIB
blackfin: force use of gpiolib
m68k: coldfire: use gpiolib
mips: pnx833x: remove requirement for GENERIC_GPIO
openrisc: default GENERIC_GPIO to false
avr32: default GENERIC_GPIO to false
xtensa: remove explicit selection of GENERIC_GPIO
sh: replace CONFIG_GENERIC_GPIO by CONFIG_GPIOLIB
powerpc: remove redundant GENERIC_GPIO selection
unicore32: default GENERIC_GPIO to false
unicore32: remove unneeded select GENERIC_GPIO
arm: plat-orion: use GPIO driver on CONFIG_GPIOLIB
arm: remove redundant GENERIC_GPIO selection
mips: alchemy: require gpiolib
mips: txx9: change GENERIC_GPIO to GPIOLIB
mips: loongson: use GPIO driver on CONFIG_GPIOLIB
mips: remove redundant GENERIC_GPIO select
This is an almost-undocumented instruction available in 32-bit mode.
I say "almost" undocumented because AMD documents it in their opcode
maps just to say that it is unavailable in 64-bit mode (sections
"A.2.1 One-Byte Opcodes" and "B.3 Invalid and Reassigned Instructions
in 64-Bit Mode").
It is roughly equivalent to "sbb %al, %al" except it does not
set the flags. Use fastop to emulate it, but do not use the opcode
directly because it would fail if the host is 64-bit!
Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: stable@vger.kernel.org # 3.9
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
This is used by SGABIOS, KVM breaks with emulate_invalid_guest_state=1.
It is just a MOV in disguise, with a funny source address.
Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: stable@vger.kernel.org # 3.9
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
This is used by SGABIOS, KVM breaks with emulate_invalid_guest_state=1.
AAM needs the source operand to be unsigned; do the same in AAD as well
for consistency, even though it does not affect the result.
Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: stable@vger.kernel.org # 3.9
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
This can easily be triggered if a new CPU is added (via
ACPI hotplug mechanism) and from user-space you do:
echo 1 > /sys/devices/system/cpu/cpu3/online
(or wait for UDEV to do it) on a newly appeared physical CPU.
The deadlock is that the "store_online" in drivers/base/cpu.c
takes the cpu_hotplug_driver_lock() lock, then calls "cpu_up".
"cpu_up" eventually ends up calling "save_mc_for_early"
which also takes the cpu_hotplug_driver_lock() lock.
And here is that lockdep thinks of it:
smpboot: Stack at about ffff880075c39f44
smpboot: CPU3: has booted.
microcode: CPU3 sig=0x206a7, pf=0x2, revision=0x25
=============================================
[ INFO: possible recursive locking detected ]
3.9.0upstream-10129-g167af0e #1 Not tainted
---------------------------------------------
sh/2487 is trying to acquire lock:
(x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20
but task is already holding lock:
(x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(x86_cpu_hotplug_driver_mutex);
lock(x86_cpu_hotplug_driver_mutex);
*** DEADLOCK ***
May be due to missing lock nesting notation
6 locks held by sh/2487:
#0: (sb_writers#5){.+.+.+}, at: [<ffffffff811ca48d>] vfs_write+0x17d/0x190
#1: (&buffer->mutex){+.+.+.}, at: [<ffffffff812464ef>] sysfs_write_file+0x3f/0x160
#2: (s_active#20){.+.+.+}, at: [<ffffffff81246578>] sysfs_write_file+0xc8/0x160
#3: (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20
#4: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff810961c2>] cpu_maps_update_begin+0x12/0x20
#5: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff810962a7>] cpu_hotplug_begin+0x27/0x60
Suggested-and-Acked-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: fenghua.yu@intel.com
Cc: xen-devel@lists.xensource.com
Cc: stable@vger.kernel.org # for v3.9
Link: http://lkml.kernel.org/r/1368029583-23337-1-git-send-email-konrad.wilk@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The invalid guest state emulation loop does not check halt_request
which causes 100% cpu loop while guest is in halt and in invalid
state, but more serious issue is that this leaves halt_request set, so
random instruction emulated by vm86 #GP exit can be interpreted
as halt which causes guest hang. Fix both problems by handling
halt_request in emulation loop.
Reported-by: Tomas Papan <tomas.papan@gmail.com>
Tested-by: Tomas Papan <tomas.papan@gmail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
CC: stable@vger.kernel.org
Signed-off-by: Gleb Natapov <gleb@redhat.com>
During review of git commit cb9c6f15f3
("xen/spinlock: Check against default value of -1 for IRQ line.")
Stefano pointed out a bug in the patch. Unfortunatly due to vacation
timing the fix was not applied and this patch fixes it up.
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
As it will point to some data, but not event channel data (the
shared_info has an array limited to 32).
This means that for PVHVM guests with more than 32 VCPUs without
the usage of VCPUOP_register_info any interrupts to VCPUs
larger than 32 would have gone unnoticed during early bootup.
That is OK, as during early bootup, in smp_init we end up calling
the hotplug mechanism (xen_hvm_cpu_notify) which makes the
VCPUOP_register_vcpu_info call for all VCPUs and we can receive
interrupts on VCPUs 33 and further.
This is just a cleanup.
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Emulation of xcr0 writes zero guest_xcr0_loaded variable so that
subsequent VM-entry reloads CPU's xcr0 with guests xcr0 value.
However, this is incorrect because guest_xcr0_loaded variable is
read to decide whether to reload hosts xcr0.
In case the vcpu thread is scheduled out after the guest_xcr0_loaded = 0
assignment, and scheduler decides to preload FPU:
switch_to
{
__switch_to
__math_state_restore
restore_fpu_checking
fpu_restore_checking
if (use_xsave())
fpu_xrstor_checking
xrstor64 with CPU's xcr0 == guests xcr0
Fix by properly restoring hosts xcr0 during emulation of xcr0 writes.
Analyzed-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Merge rwsem optimizations from Michel Lespinasse:
"These patches extend Alex Shi's work (which added write lock stealing
on the rwsem slow path) in order to provide rwsem write lock stealing
on the fast path (that is, without taking the rwsem's wait_lock).
I have unfortunately been unable to push this through -next before due
to Ingo Molnar / David Howells / Peter Zijlstra being busy with other
things. However, this has gotten some attention from Rik van Riel and
Davidlohr Bueso who both commented that they felt this was ready for
v3.10, and Ingo Molnar has said that he was OK with me pushing
directly to you. So, here goes :)
Davidlohr got the following test results from pgbench running on a
quad-core laptop:
| db_size | clients | tps-vanilla | tps-rwsem |
+---------+----------+----------------+--------------+
| 160 MB | 1 | 5803 | 6906 | + 19.0%
| 160 MB | 2 | 13092 | 15931 |
| 160 MB | 4 | 29412 | 33021 |
| 160 MB | 8 | 32448 | 34626 |
| 160 MB | 16 | 32758 | 33098 |
| 160 MB | 20 | 26940 | 31343 | + 16.3%
| 160 MB | 30 | 25147 | 28961 |
| 160 MB | 40 | 25484 | 26902 |
| 160 MB | 50 | 24528 | 25760 |
------------------------------------------------------
| 1.6 GB | 1 | 5733 | 7729 | + 34.8%
| 1.6 GB | 2 | 9411 | 19009 | + 101.9%
| 1.6 GB | 4 | 31818 | 33185 |
| 1.6 GB | 8 | 33700 | 34550 |
| 1.6 GB | 16 | 32751 | 33079 |
| 1.6 GB | 20 | 30919 | 31494 |
| 1.6 GB | 30 | 28540 | 28535 |
| 1.6 GB | 40 | 26380 | 27054 |
| 1.6 GB | 50 | 25241 | 25591 |
------------------------------------------------------
| 7.6 GB | 1 | 5779 | 6224 |
| 7.6 GB | 2 | 10897 | 13611 | + 24.9%
| 7.6 GB | 4 | 32683 | 33108 |
| 7.6 GB | 8 | 33968 | 34712 |
| 7.6 GB | 16 | 32287 | 32895 |
| 7.6 GB | 20 | 27770 | 31689 | + 14.1%
| 7.6 GB | 30 | 26739 | 29003 |
| 7.6 GB | 40 | 24901 | 26683 |
| 7.6 GB | 50 | 17115 | 25925 | + 51.5%
------------------------------------------------------
(Davidlohr also has one additional patch which further improves
throughput, though I will ask him to send it directly to you as I have
suggested some minor changes)."
* emailed patches from Michel Lespinasse <walken@google.com>:
rwsem: no need for explicit signed longs
x86 rwsem: avoid taking slow path when stealing write lock
rwsem: do not block readers at head of queue if other readers are active
rwsem: implement support for write lock stealing on the fastpath
rwsem: simplify __rwsem_do_wake
rwsem: skip initial trylock in rwsem_down_write_failed
rwsem: avoid taking wait_lock in rwsem_down_write_failed
rwsem: use cmpxchg for trying to steal write lock
rwsem: more agressive lock stealing in rwsem_down_write_failed
rwsem: simplify rwsem_down_write_failed
rwsem: simplify rwsem_down_read_failed
rwsem: move rwsem_down_failed_common code into rwsem_down_{read,write}_failed
rwsem: shorter spinlocked section in rwsem_down_failed_common()
rwsem: make the waiter type an enumeration rather than a bitmask