Commit Graph

25107 Commits

Author SHA1 Message Date
Linus Torvalds
00bcf5cdd6 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The main changes in this cycle were:

   - rwsem micro-optimizations (Davidlohr Bueso)

   - Improve the implementation and optimize the performance of
     percpu-rwsems. (Peter Zijlstra.)

   - Convert all lglock users to better facilities such as percpu-rwsems
     or percpu-spinlocks and remove lglocks. (Peter Zijlstra)

   - Remove the ticket (spin)lock implementation. (Peter Zijlstra)

   - Korean translation of memory-barriers.txt and related fixes to the
     English document. (SeongJae Park)

   - misc fixes and cleanups"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  x86/cmpxchg, locking/atomics: Remove superfluous definitions
  x86, locking/spinlocks: Remove ticket (spin)lock implementation
  locking/lglock: Remove lglock implementation
  stop_machine: Remove stop_cpus_lock and lg_double_lock/unlock()
  fs/locks: Use percpu_down_read_preempt_disable()
  locking/percpu-rwsem: Add down_read_preempt_disable()
  fs/locks: Replace lg_local with a per-cpu spinlock
  fs/locks: Replace lg_global with a percpu-rwsem
  locking/percpu-rwsem: Add DEFINE_STATIC_PERCPU_RWSEMand percpu_rwsem_assert_held()
  locking/pv-qspinlock: Use cmpxchg_release() in __pv_queued_spin_unlock()
  locking/rwsem, x86: Drop a bogus cc clobber
  futex: Add some more function commentry
  locking/hung_task: Show all locks
  locking/rwsem: Scan the wait_list for readers only once
  locking/rwsem: Remove a few useless comments
  locking/rwsem: Return void in __rwsem_mark_wake()
  locking, rcu, cgroup: Avoid synchronize_sched() in __cgroup_procs_write()
  locking/Documentation: Add Korean translation
  locking/Documentation: Fix a typo of example result
  locking/Documentation: Fix wrong section reference
  ...
2016-10-03 12:15:00 -07:00
Linus Torvalds
de956b8f45 Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
 "Main changes in this cycle were:

   - Refactor the EFI memory map code into architecture neutral files
     and allow drivers to permanently reserve EFI boot services regions
     on x86, as well as ARM/arm64. (Matt Fleming)

   - Add ARM support for the EFI ESRT driver. (Ard Biesheuvel)

   - Make the EFI runtime services and efivar API interruptible by
     swapping spinlocks for semaphores. (Sylvain Chouleur)

   - Provide the EFI identity mapping for kexec which allows kexec to
     work on SGI/UV platforms with requiring the "noefi" kernel command
     line parameter. (Alex Thorlton)

   - Add debugfs node to dump EFI page tables on arm64. (Ard Biesheuvel)

   - Merge the EFI test driver being carried out of tree until now in
     the FWTS project. (Ivan Hu)

   - Expand the list of flags for classifying EFI regions as "RAM" on
     arm64 so we align with the UEFI spec. (Ard Biesheuvel)

   - Optimise out the EFI mixed mode if it's unsupported (CONFIG_X86_32)
     or disabled (CONFIG_EFI_MIXED=n) and switch the early EFI boot
     services function table for direct calls, alleviating us from
     having to maintain the custom function table. (Lukas Wunner)

   - Miscellaneous cleanups and fixes"

* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
  x86/efi: Round EFI memmap reservations to EFI_PAGE_SIZE
  x86/efi: Allow invocation of arbitrary boot services
  x86/efi: Optimize away setup_gop32/64 if unused
  x86/efi: Use kmalloc_array() in efi_call_phys_prolog()
  efi/arm64: Treat regions with WT/WC set but WB cleared as memory
  efi: Add efi_test driver for exporting UEFI runtime service interfaces
  x86/efi: Defer efi_esrt_init until after memblock_x86_fill
  efi/arm64: Add debugfs node to dump UEFI runtime page tables
  x86/efi: Remove unused find_bits() function
  fs/efivarfs: Fix double kfree() in error path
  x86/efi: Map in physical addresses in efi_map_region_fixed
  lib/ucs2_string: Speed up ucs2_utf8size()
  firmware-gsmi: Delete an unnecessary check before the function call "dma_pool_destroy"
  x86/efi: Initialize status to ensure garbage is not returned on small size
  efi: Replace runtime services spinlock with semaphore
  efi: Don't use spinlocks for efi vars
  efi: Use a file local lock for efivars
  efi/arm*: esrt: Add missing call to efi_esrt_init()
  efi/esrt: Use memremap not ioremap to access ESRT table in memory
  x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data
  ...
2016-10-03 11:33:18 -07:00
Linus Torvalds
d7a0dab82f Merge branch 'core-smp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core SMP updates from Ingo Molnar:
 "Two main change is generic vCPU pinning and physical CPU SMP-call
  support, for Xen to be able to perform certain calls on specific
  physical CPUs - by Juergen Gross"

* 'core-smp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  smp: Allocate smp_call_on_cpu() workqueue on stack too
  hwmon: Use smp_call_on_cpu() for dell-smm i8k
  dcdbas: Make use of smp_call_on_cpu()
  xen: Add xen_pin_vcpu() to support calling functions on a dedicated pCPU
  smp: Add function to execute a function synchronously on a CPU
  virt, sched: Add generic vCPU pinning support
  xen: Sync xen header
2016-10-03 11:02:39 -07:00
Linus Torvalds
72d39926f0 ACPI material for v4.9-rc1
- Update of the ACPICA code in the kernel to upstream revision 20160831 with
    the following major changes:
    * New mechanism for GPE masking.
    * Fixes for issues related to the LoadTable operator and table loading.
    * Fixes for issues related to so-called module-level code (MLC), that is
      AML that doesn't belong to any methods.
    * Change of the return value of the _OSI method to reflect the Windows
      behavior.
    * GAS (Generic Address Structure) support fix related to 32-bit FADT
      addresses.
    * Elimination of unnecessary FADT version 2 support.
    * ACPI tools fixes and cleanups.
    From Bob Moore, Lv Zheng, and Jung-uk Kim.
 
  - ACPI sysfs interface updates to fix GPE handling (on top of the new GPE
    masking mechanism in ACPICA) and issues related to table loading (Lv Zheng).
 
  - New watchdog driver based on the ACPI WDAT (ACPI Watchdog Action Table),
    needed on some platforms to replace the iTCO watchdog that doesn't work there
    and related updates of the intel_pmc_ipc, i2c/i801 and MFD/lcp_ich drivers
    (Mika Westerberg).
 
  - Driver core fix to prevent it from leaking secondary fwnode objects during
    device removal (Lukas Wunner).
 
  - New definitions of built-in properties for UART in ACPI-based x86 SoC drivers
    and a 8250_dw driver quirk for the APM X-Gene SoC (Heikki Krogerus).
 
  - New device ID for the Vulcan SPI controller and constification of local
    strucures in the AMD SoC (APD) ACPI driver (Kamlakant Patel, Julia Lawall).
 
  - Fix for a bug causing the allocation of PCI resorces to fail if
    ACPI-enumerated child platform devices are registered below the PCI
    devices in question (Mika Westerberg).
 
  - Change of the default polarity for PCI legacy IRQs to high on systems
    booting wth ACPI on platforms with a GIC interrupt controller model
    fixing the discrepancy between the specification and HW behavior (Lorenzo
    Pieralisi).
 
  - Fixes for the handling of system suspend/resume in the ACPI EC driver and
    update of that driver to make it cope with the cases when the EC device
    defined in the ECDT has to be used throughout the entire system life cycle
    (Lv Zheng).
 
  - Update of the ACPI CPPC library to allow it to batch requests sent over the
    PCC channel (to reduce overhead), to support the fixed functional hardware
    (FFH) CPPC registers access type, to notify the mailbox framework about TX
    completions when the interrupt flag is set for the PCC mailbox, and to
    support HW-Reduced Communication Subspace type 2 (Ashwin Chaugule, Prashanth
    Prakash, Srinivas Pandruvada, Hoan Tran).
 
  - ACPI button driver fix and documentation update related to the handling of
    laptop lids (Lv Zheng).
 
  - ACPI battery driver initialization fix (Carlos Garnacho).
 
  - ACPI GPIO enumeration documentation update (Mika Westerberg).
 
  - Assorted updates of the core ACPI bus type code (Lukas Wunner, Lv Zheng).
 
  - Assorted cleanups of the ACPI table parsing code and the x86-specific ACPI
    code (Al Stone).
 
  - Fixes for assorted ACPI-related issues found in linux-next (Wei Yongjun).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJX8Y5+AAoJEILEb/54YlRx73oP/RiAi86NKjOj+GfYceVe37jn
 6lSqoMugjgTQHRYvYiQCjJ/BR0GzQZqUkz9TAu1Op14+rhTH3OhSfPizzJWCpVfA
 G9l9ZRQNnsKNs14bbYmWtmWduh46dFLVFJqo+M/0H3ZMFZu6Adcb+1SBtXHUoQ6L
 z69ngFxTu3yRvqS4cmm5h7SOx5W2uZZl8zViJW8jgyGhUBStG87gzR6wsYBldGCk
 XFxcaGWBXRccWGAQLSwfs0psQccEooCqbpsDqaUdrK/mI0rsQr88f25ZxEE7Zw7H
 bv3py1cgJBZRq36L7eBGQXjIE7YQey6qG2lug2zsUJWe+vzy2vHjHVJHuBXKKgv3
 txOA6QZx63UgEyN3zFT7K5ek6uOnkKdeE+s+Laj+K/x4V2R6gbtgO011EVcXy+bI
 NvqsO76tfPHpwrn5s1VVc5lcEBEPHKHb+WulHrqhSSU4ivk0gtJDeSI+c8xta6YT
 XwSry5tozDLkG1uEZqkyY1XTlOUAHO8E6YcrlOv2z1+mG7L8OH/vCp1apzgexsZA
 1683AH5cwKc3KaP+4QdKGdxY2BDxb7OTVh3cGy4kAYb6tqQ/vj7vlRiJvtaMBtFw
 xJn3buuagwJzKtgebpA565opvyFAfUX/RNFlTP63aXAefSAgq6KLq70vKFxkIZto
 H1LpUbmiEbuBml8CBGb1
 =xDOQ
 -----END PGP SIGNATURE-----

Merge tag 'acpi-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI updates from Rafael Wysocki:
 "First off, the ACPICA code in the kernel is updated to upstream
  revision 20160831 that brings in a few bug fixes and cleanups. In
  particular, it is possible to mask GPEs now (and the sysfs interface
  for GPE control is fixed on top of that), problems related to the
  table loading mechanism are fixed and all code related to FADT version
  2 (which has never been part of the ACPI specification) is dropped.

  On the new features front, there is a new watchdog driver based on the
  ACPI WDAT (ACPI Watchdog Action Table), needed on some platforms to
  replace the iTCO watchdog that doesn't work there, and some UART
  devices get new definitions of built-in properties (to be accessed via
  the generic device properties API).

  Also, included is a fix for an ACPI-related PCI resorces allocation
  issue and a few problems in the EC driver and in the button and
  battery drivers are fixed.

  In addition to that, the ACPI CPPC library is updated to make batching
  of requests sent over the PCC channel possible (which reduces the PCC
  usage overhead substantially in some cases) and to support functional
  fixed hardware (FFH) type of CPPC registers access (which will allow
  CPPC to be used on x86 too in the future).

  As usual, there are some assorted fixes and cleanups too.

  Specifics:

   - Update of the ACPICA code in the kernel to upstream revision
     20160831 with the following major changes:

      * New mechanism for GPE masking.
      * Fixes for issues related to the LoadTable operator and table
        loading.
      * Fixes for issues related to so-called module-level code (MLC),
        that is AML that doesn't belong to any methods.
      * Change of the return value of the _OSI method to reflect the
        Windows behavior.
      * GAS (Generic Address Structure) support fix related to 32-bit
        FADT addresses.
      * Elimination of unnecessary FADT version 2 support.
      * ACPI tools fixes and cleanups.

     From Bob Moore, Lv Zheng, and Jung-uk Kim.

   - ACPI sysfs interface updates to fix GPE handling (on top of the new
     GPE masking mechanism in ACPICA) and issues related to table
     loading (Lv Zheng).

   - New watchdog driver based on the ACPI WDAT (ACPI Watchdog Action
     Table), needed on some platforms to replace the iTCO watchdog that
     doesn't work there and related updates of the intel_pmc_ipc,
     i2c/i801 and MFD/lcp_ich drivers (Mika Westerberg).

   - Driver core fix to prevent it from leaking secondary fwnode objects
     during device removal (Lukas Wunner).

   - New definitions of built-in properties for UART in ACPI-based x86
     SoC drivers and a 8250_dw driver quirk for the APM X-Gene SoC
     (Heikki Krogerus).

   - New device ID for the Vulcan SPI controller and constification of
     local strucures in the AMD SoC (APD) ACPI driver (Kamlakant Patel,
     Julia Lawall).

   - Fix for a bug causing the allocation of PCI resorces to fail if
     ACPI-enumerated child platform devices are registered below the PCI
     devices in question (Mika Westerberg).

   - Change of the default polarity for PCI legacy IRQs to high on
     systems booting wth ACPI on platforms with a GIC interrupt
     controller model fixing the discrepancy between the specification
     and HW behavior (Lorenzo Pieralisi).

   - Fixes for the handling of system suspend/resume in the ACPI EC
     driver and update of that driver to make it cope with the cases
     when the EC device defined in the ECDT has to be used throughout
     the entire system life cycle (Lv Zheng).

   - Update of the ACPI CPPC library to allow it to batch requests sent
     over the PCC channel (to reduce overhead), to support the fixed
     functional hardware (FFH) CPPC registers access type, to notify the
     mailbox framework about TX completions when the interrupt flag is
     set for the PCC mailbox, and to support HW-Reduced Communication
     Subspace type 2 (Ashwin Chaugule, Prashanth Prakash, Srinivas
     Pandruvada, Hoan Tran).

   - ACPI button driver fix and documentation update related to the
     handling of laptop lids (Lv Zheng).

   - ACPI battery driver initialization fix (Carlos Garnacho).

   - ACPI GPIO enumeration documentation update (Mika Westerberg).

   - Assorted updates of the core ACPI bus type code (Lukas Wunner, Lv
     Zheng).

   - Assorted cleanups of the ACPI table parsing code and the
     x86-specific ACPI code (Al Stone).

   - Fixes for assorted ACPI-related issues found in linux-next (Wei
     Yongjun)"

* tag 'acpi-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (98 commits)
  ACPI / documentation: Use recommended name in GPIO property names
  watchdog: wdat_wdt: Fix warning for using 0 as NULL
  watchdog: wdat_wdt: fix return value check in wdat_wdt_probe()
  platform/x86: intel_pmc_ipc: Do not create iTCO watchdog when WDAT table exists
  i2c: i801: Do not create iTCO watchdog when WDAT table exists
  mfd: lpc_ich: Do not create iTCO watchdog when WDAT table exists
  ACPI / bus: Adjust ACPI subsystem initialization for new table loading mode
  ACPICA: Parser: Fix a regression in LoadTable support
  ACPICA: Tables: Fix "UNLOAD" code path lock issues
  ACPI / watchdog: Add support for WDAT hardware watchdog
  ACPI / platform: Pay attention to parent device's resources
  PCI: Add pci_find_resource()
  ACPI / CPPC: Support PCC with interrupt flag
  ACPI / sysfs: Update sysfs signature handling code
  ACPI / sysfs: Fix an issue for LoadTable opcode
  ACPICA: Tables: Fix a regression in acpi_tb_find_table()
  ACPI / tables: Remove duplicated include from tables.c
  ACPI / APD: constify local structures
  x86: ACPI: make variable names clearer in acpi_parse_madt_lapic_entries()
  x86: ACPI: remove extraneous white space after semicolon
  ...
2016-10-03 10:11:58 -07:00
Rafael J. Wysocki
0d573c6a01 Merge branches 'acpi-x86', 'acpi-cppc' and 'acpi-soc'
* acpi-x86:
  x86: ACPI: make variable names clearer in acpi_parse_madt_lapic_entries()
  x86: ACPI: remove extraneous white space after semicolon

* acpi-cppc:
  ACPI / CPPC: Support PCC with interrupt flag
  ACPI / CPPC: Add prefix cppc to cpudata structure name
  ACPI / CPPC: Add support for functional fixed hardware address
  ACPI / CPPC: Don't return on CPPC probe failure
  ACPI / CPPC: Allow build with ACPI_CPU_FREQ_PSS config
  ACPI / CPPC: check for error bit in PCC status field
  ACPI / CPPC: move all PCC related information into pcc_data
  ACPI / CPPC: add sysfs support to compute delivered performance
  ACPI / CPPC: set a non-zero value for transition_latency
  ACPI / CPPC: support for batching CPPC requests
  ACPI / CPPC: acquire pcc_lock only while accessing PCC subspace
  ACPI / CPPC: restructure read/writes for efficient sys mapped reg ops
  mailbox: pcc: Support HW-Reduced Communication Subspace type 2

* acpi-soc:
  ACPI / APD: constify local structures
  ACPI / APD: Add device HID for Vulcan SPI controller
2016-10-02 01:39:09 +02:00
Wanpeng Li
2fa5f04f85 x86/entry/64: Fix context tracking state warning when load_gs_index fails
This warning:

 WARNING: CPU: 0 PID: 3331 at arch/x86/entry/common.c:45 enter_from_user_mode+0x32/0x50
 CPU: 0 PID: 3331 Comm: ldt_gdt_64 Not tainted 4.8.0-rc7+ #13
 Call Trace:
  dump_stack+0x99/0xd0
  __warn+0xd1/0xf0
  warn_slowpath_null+0x1d/0x20
  enter_from_user_mode+0x32/0x50
  error_entry+0x6d/0xc0
  ? general_protection+0x12/0x30
  ? native_load_gs_index+0xd/0x20
  ? do_set_thread_area+0x19c/0x1f0
  SyS_set_thread_area+0x24/0x30
  do_int80_syscall_32+0x7c/0x220
  entry_INT80_compat+0x38/0x50

... can be reproduced by running the GS testcase of the ldt_gdt test unit in
the x86 selftests.

do_int80_syscall_32() will call enter_form_user_mode() to convert context
tracking state from user state to kernel state. The load_gs_index() call
can fail with user gsbase, gsbase will be fixed up and proceed if this
happen.

However, enter_from_user_mode() will be called again in the fixed up path
though it is context tracking kernel state currently.

This patch fixes it by just fixing up gsbase and telling lockdep that IRQs
are off once load_gs_index() failed with user gsbase.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1475197266-3440-1-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30 13:53:12 +02:00
Andy Lutomirski
05fb3c199b x86/boot: Initialize FPU and X86_FEATURE_ALWAYS even if we don't have CPUID
Otherwise arch_task_struct_size == 0 and we die.  While we're at it,
set X86_FEATURE_ALWAYS, too.

Reported-by: David Saggiorato <david@saggiorato.net>
Tested-by: David Saggiorato <david@saggiorato.net>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: aaeb5c01c5b ("x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86")
Link: http://lkml.kernel.org/r/8de723afbf0811071185039f9088733188b606c9.1475103911.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30 13:53:04 +02:00
Segher Boessenkool
e4aad64597 x86/vdso: Fix building on big endian host
We need to call GET_LE to read hdr->e_type.

Fixes: 57f90c3dfc ("x86/vdso: Error out if the vDSO isn't a valid DSO")
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: linux-next@vger.kernel.org
Link: http://lkml.kernel.org/r/20160929193442.GA16617@gate.crashing.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-30 12:37:40 +02:00
Andy Lutomirski
192d1dccbf x86/boot: Fix another __read_cr4() case on 486
The condition for reading CR4 was wrong: there are some CPUs with
CPUID but not CR4.  Rather than trying to make the condition exact,
use __read_cr4_safe().

Fixes: 18bc7bd523 ("x86/boot: Synchronize trampoline_cr4_features and mmu_cr4_features directly")
Reported-by: david@saggiorato.net
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Link: http://lkml.kernel.org/r/8c453a61c4f44ab6ff43c29780ba04835234d2e5.1475178369.git.luto@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-30 12:37:40 +02:00
Nikolay Borisov
08645077b7 x86/cmpxchg, locking/atomics: Remove superfluous definitions
cmpxchg contained definitions for unused (x)add_* operations, dating back
to the original ticket spinlock implementation. Nowadays these are
unused so remove them.

Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1474913478-17757-1-git-send-email-n.borisov.lkml@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30 10:56:01 +02:00
Peter Zijlstra
cfd8983f03 x86, locking/spinlocks: Remove ticket (spin)lock implementation
We've unconditionally used the queued spinlock for many releases now.

Its time to remove the old ticket lock code.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <waiman.long@hpe.com>
Cc: Waiman.Long@hpe.com
Cc: david.vrabel@citrix.com
Cc: dhowells@redhat.com
Cc: pbonzini@redhat.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20160518184302.GO3193@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30 10:56:00 +02:00
Ingo Molnar
0b429e18c2 Merge branch 'linus' into locking/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30 10:54:46 +02:00
Andy Lutomirski
e1bfc11c5a x86/init: Fix cr4_init_shadow() on CR4-less machines
cr4_init_shadow() will panic on 486-like machines without CR4.  Fix
it using __read_cr4_safe().

Reported-by: david@saggiorato.net
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: 1e02ce4ccc ("x86: Store a per-cpu shadow copy of CR4")
Link: http://lkml.kernel.org/r/43a20f81fb504013bf613913dc25574b45336a61.1475091074.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-29 19:08:30 +02:00
Linus Torvalds
9c0e28a7be Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "Three fixlets for perf:

   - add a missing NULL pointer check in the intel BTS driver

   - make BTS an exclusive PMU because BTS can only handle one event at
     a time

   - ensure that exclusive events are limited to one PMU so that several
     exclusive events can be scheduled on different PMU instances"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Limit matching exclusive events to one PMU
  perf/x86/intel/bts: Make it an exclusive PMU
  perf/x86/intel/bts: Make sure debug store is valid
2016-09-24 12:44:28 -07:00
Ingo Molnar
7cf0f1426a Merge branch 'locking/urgent' into locking/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22 15:21:48 +02:00
Alexander Shishkin
08b90f0655 perf/x86/intel/bts: Make it an exclusive PMU
Just like intel_pt, intel_bts can only handle one event at a time,
which is the reason we introduced PERF_PMU_CAP_EXCLUSIVE in the first
place. However, at the moment one can have as many intel_bts events
within the same context at the same time as one pleases. Only one of
them, however, will get scheduled and receive the actual trace data.

Fix this by making intel_bts an "exclusive" PMU.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160920154811.3255-2-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22 14:56:08 +02:00
Ingo Molnar
2ab78a724b * Fix a boot crash reported by Mike Galbraith and Mike Krinkin. The
new EFI memory map reservation code didn't align reservations to
    EFI_PAGE_SIZE boundaries causing bogus regions to be inserted into
    the global EFI memory map - Matt Fleming
 -----BEGIN PGP SIGNATURE-----
 
 iQI2BAABCAAgBQJX4UshGRxtYXR0QGNvZGVibHVlcHJpbnQuY28udWsACgkQLzhZ
 wI0jPVX2RA/9HiT48K98R1fkiig/V3Wh8H8y7lQbwO+qA1WZ7q/D+DYau62qI3zM
 sLgTSoV2zzjBr+4JWuGIdbtpYBjhHci0HwEcOGp5EFSZmgJGOpc7VY2VdFaYyZ2X
 xhowGLXv2jgfKoAPt9MfgZX2Qh5Jt4CRlC+UcilUJp0wy1+uD4W7XmrLM1Rg7Ug2
 8XTL/BA5QxQa3wx8+/z6xV/ADlhzf3DQ7BwI3qkx5RHSXSuy1/E2XGTnyuDt/7Oo
 KmMwQGGrrv92uSjyq5vJ53DuNxXDJ7nGtlXTT2u0WtcZbzJXsXIaCZItbCA5rqW/
 wWGgsz5XSc406m1WNv8Zs8xx9NAEP/+KXQfftwuQHFqoylG4QqYuO060V5XOxQlh
 cdRuyVUgMONZAvTzEAUC+5SaN3c+WDCL8oEnrs/tdMxSzBL9Rj7yaHMW+ozVe5Nf
 GFxMnO0l5SG9+o8hSAkkn5RDfIDzSTC1W5imqWJQXhC2BXHEdmh4wVWIG4QgPM/K
 gqaTpX72Nu04lBjCXXpfKSK5Xqv67hZOzxUDkkjxtq1PtDbL1dPpzAF3s3GFdreI
 FfQgetg4jM4EPnWDQWqWVJxQUv+uCfpDV9g5y9kMinVb2OpyGjcZpNckGHKnqIXr
 hyntPYqyy793W6dPGEjTqfcVP751qisMjp6iIIqk3h+qr7HyJSF5nqs=
 =S9gJ
 -----END PGP SIGNATURE-----

Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into efi/core

Pull EFI fix from Matt Fleming:

 * Fix a boot crash reported by Mike Galbraith and Mike Krinkin. The
   new EFI memory map reservation code didn't align reservations to
   EFI_PAGE_SIZE boundaries causing bogus regions to be inserted into
   the global EFI memory map (Matt Fleming)

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-20 16:59:15 +02:00
Ingo Molnar
41a66072c3 Merge branch 'efi/urgent' into efi/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-20 16:58:59 +02:00
Matt Fleming
92dc33501b x86/efi: Round EFI memmap reservations to EFI_PAGE_SIZE
Mike Galbraith reported that his machine started rebooting during boot
after,

  commit 8e80632fb2 ("efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()")

The ESRT table on his machine is 56 bytes and at no point in the
efi_arch_mem_reserve() call path is that size rounded up to
EFI_PAGE_SIZE, nor is the start address on an EFI_PAGE_SIZE boundary.

Since the EFI memory map only deals with whole pages, inserting an EFI
memory region with 56 bytes results in a new entry covering zero
pages, and completely screws up the calculations for the old regions
that were trimmed.

Round all sizes upwards, and start addresses downwards, to the nearest
EFI_PAGE_SIZE boundary.

Additionally, efi_memmap_insert() expects the mem::range::end value to
be one less than the end address for the region.

Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Reported-by: Mike Krinkin <krinkin.m.u@gmail.com>
Tested-by: Mike Krinkin <krinkin.m.u@gmail.com>
Cc: Peter Jones <pjones@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-20 15:43:31 +01:00
Sebastian Andrzej Siewior
f1e1c9e5e3 perf/x86/intel/bts: Make sure debug store is valid
Since commit 4d4c474124 ("perf/x86/intel/bts: Fix BTS PMI detection")
my box goes boom on boot:

| .... node  #0, CPUs:      #1 #2 #3 #4 #5 #6 #7
| BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
| IP: [<ffffffff8100c463>] intel_bts_interrupt+0x43/0x130
| Call Trace:
|  <NMI> d [<ffffffff8100b341>] intel_pmu_handle_irq+0x51/0x4b0
|  [<ffffffff81004d47>] perf_event_nmi_handler+0x27/0x40

This happens because the code introduced in this commit dereferences the
debug store pointer unconditionally. The debug store is not guaranteed to
be available, so a NULL pointer check as on other places is required.

Fixes: 4d4c474124 ("perf/x86/intel/bts: Fix BTS PMI detection")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: vince@deater.net
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/20160920131220.xg5pbdjtznszuyzb@breakpoint.cc
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-20 16:06:09 +02:00
Matt Fleming
1297667083 x86/efi: Only map RAM into EFI page tables if in mixed-mode
Waiman reported that booting with CONFIG_EFI_MIXED enabled on his
multi-terabyte HP machine results in boot crashes, because the EFI
region mapping functions loop forever while trying to map those
regions describing RAM.

While this patch doesn't fix the underlying hang, there's really no
reason to map EFI_CONVENTIONAL_MEMORY regions into the EFI page tables
when mixed-mode is not in use at runtime.

Reported-by: Waiman Long <waiman.long@hpe.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
CC: Theodore Ts'o <tytso@mit.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Scott J Norton <scott.norton@hpe.com>
Cc: Douglas Hatch <doug.hatch@hpe.com>
Cc: <stable@vger.kernel.org> # v4.6+
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-20 14:53:04 +01:00
Matt Fleming
e535ec0899 x86/mm/pat: Prevent hang during boot when mapping pages
There's a mixture of signed 32-bit and unsigned 32-bit and 64-bit data
types used for keeping track of how many pages have been mapped.

This leads to hangs during boot when mapping large numbers of pages
(multiple terabytes, as reported by Waiman) because those values are
interpreted as being negative.

commit 742563777e ("x86/mm/pat: Avoid truncation when converting
cpa->numpages to address") fixed one of those bugs, but there is
another lurking in __change_page_attr_set_clr().

Additionally, the return value type for the populate_*() functions can
return negative values when a large number of pages have been mapped,
triggering the error paths even though no error occurred.

Consistently use 64-bit types on 64-bit platforms when counting pages.
Even in the signed case this gives us room for regions 8PiB
(pebibytes) in size whilst still allowing the usual negative value
error checking idiom.

Reported-by: Waiman Long <waiman.long@hpe.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
CC: Theodore Ts'o <tytso@mit.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Scott J Norton <scott.norton@hpe.com>
Cc: Douglas Hatch <doug.hatch@hpe.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-20 14:53:00 +01:00
Jan Beulich
c907420fda locking/rwsem, x86: Drop a bogus cc clobber
With the addition of uses of GCC's condition code outputs in commit:

  35ccfb7114 ("x86, asm: Use CC_SET()/CC_OUT() in <asm/rwsem.h>")

... there's now an overlap of outputs and clobbers in __down_write_trylock().

Such overlaps are generally getting tagged with an error (occasionally
even with an ICE). I can't really tell why plain GCC 6.2 doesn't detect
this (judging by the code it is meant to), while the slightly modified
one I use does. Since condition code clobbers are never necessary on x86
(other than perhaps for documentation purposes, which doesn't really
get done consistently), remove it altogether rather than inventing
something like CC_CLOBBER (to accompany CC_SET/CC_OUT).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/57E003CC0200007800110102@prv-mh.provo.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-20 08:26:41 +02:00
Linus Torvalds
3286be9480 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "A couple of small fixes to x86 perf drivers:

   - Measure L2 for HW_CACHE* events on AMD

   - Fix the address filter handling in the intel/pt driver

   - Handle the BTS disabling at the proper place"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/amd: Make HW_CACHE_REFERENCES and HW_CACHE_MISSES measure L2
  perf/x86/intel/pt: Do validate the size of a kernel address filter
  perf/x86/intel/pt: Fix kernel address filter's offset validation
  perf/x86/intel/pt: Fix an off-by-one in address filter configuration
  perf/x86/intel: Don't disable "intel_bts" around "intel" event batching
2016-09-18 11:50:48 -07:00
Matt Fleming
080fe0b790 perf/x86/amd: Make HW_CACHE_REFERENCES and HW_CACHE_MISSES measure L2
While the Intel PMU monitors the LLC when perf enables the
HW_CACHE_REFERENCES and HW_CACHE_MISSES events, these events monitor
L1 instruction cache fetches (0x0080) and instruction cache misses
(0x0081) on the AMD PMU.

This is extremely confusing when monitoring the same workload across
Intel and AMD machines, since parameters like,

  $ perf stat -e cache-references,cache-misses

measure completely different things.

Instead, make the AMD PMU measure instruction/data cache and TLB fill
requests to the L2 and instruction/data cache and TLB misses in the L2
when HW_CACHE_REFERENCES and HW_CACHE_MISSES are enabled,
respectively. That way the events measure unified caches on both
platforms.

Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1472044328-21302-1-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-16 16:19:49 +02:00
Alexander Shishkin
1155bafcb7 perf/x86/intel/pt: Do validate the size of a kernel address filter
Right now, the kernel address filters in PT are prone to integer overflow
that may happen in adding filter's size to its offset to obtain the end
of the range. Such an overflow would also throw a #GP in the PT event
configuration path.

Fix this by explicitly validating the result of this calculation.

Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: stable@vger.kernel.org # v4.7
Cc: stable@vger.kernel.org#v4.7
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160915151352.21306-4-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-16 11:14:16 +02:00
Alexander Shishkin
ddfdad991e perf/x86/intel/pt: Fix kernel address filter's offset validation
The kernel_ip() filter is used mostly by the DS/LBR code to look at the
branch addresses, but Intel PT also uses it to validate the address
filter offsets for kernel addresses, for which it is not sufficient:
supplying something in bits 64:48 that's not a sign extension of the lower
address bits (like 0xf00d000000000000) throws a #GP.

This patch adds address validation for the user supplied kernel filters.

Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: stable@vger.kernel.org # v4.7
Cc: stable@vger.kernel.org#v4.7
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160915151352.21306-3-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-16 11:14:16 +02:00
Alexander Shishkin
95f60084ac perf/x86/intel/pt: Fix an off-by-one in address filter configuration
PT address filter configuration requires that a range is specified by
its first and last address, but at the moment we're obtaining the end
of the range by adding user specified size to its start, which is off
by one from what it actually needs to be.

Fix this and make sure that zero-sized filters don't pass the filter
validation.

Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: stable@vger.kernel.org # v4.7
Cc: stable@vger.kernel.org#v4.7
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160915151352.21306-2-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-16 11:14:16 +02:00
Linus Torvalds
024c7e3756 One fix for an x86 regression in VM migration, mostly visible with
Windows because it uses RTC periodic interrupts.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJX2xpHAAoJEL/70l94x66D9NsIAIw+9oRA86qjehVnguV3fRKA
 ITZ4OGFDiXPWuxqDaw8mHHXr0RYx8KcMTzfFNbV+YL5U0cq9xYzdaNhchKPpyF+3
 7H5wL8Ku9wkYZ930kdCf5Q+LNCfg8d/wKlibPEbX0MDx4jL99kkcxLzEkmIRqFlq
 bpXaQe/KR1xCWR6gI/a6aRJWLfGuFMV82YSnk/dCSjwotbAwjJUSt+IPhLwhx28o
 7ddcxW3CxQqelJorcu2lvRiGnCvEzDhIdOvHJqibCjo3uzqbcI4PA2gs3rozbs9s
 VMEzqZpNgK0XsyKyccSw75npViIHYPkjMzxoyHMDhgiP3eIwp/tJquxAfLjK4WE=
 =h4P4
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fix from Paolo Bonzini:
 "One fix for an x86 regression in VM migration, mostly visible with
  Windows because it uses RTC periodic interrupts"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: x86: correctly reset dest_map->vector when restoring LAPIC state
2016-09-15 15:15:41 -07:00
Al Viro
1c109fabbd fix minor infoleak in get_user_ex()
get_user_ex(x, ptr) should zero x on failure.  It's not a lot of a leak
(at most we are leaking uninitialized 64bit value off the kernel stack,
and in a fairly constrained situation, at that), but the fix is trivial,
so...

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[ This sat in different branch from the uaccess fixes since mid-August ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-15 12:54:04 -07:00
Paolo Bonzini
b0eaf4506f kvm: x86: correctly reset dest_map->vector when restoring LAPIC state
When userspace sends KVM_SET_LAPIC, KVM schedules a check between
the vCPU's IRR and ISR and the IOAPIC redirection table, in order
to re-establish the IOAPIC's dest_map (the list of CPUs servicing
the real-time clock interrupt with the corresponding vectors).

However, __rtc_irq_eoi_tracking_restore_one was forgetting to
set dest_map->vectors.  Because of this, the IOAPIC did not process
the real-time clock interrupt EOI, ioapic->rtc_status.pending_eoi
got stuck at a non-zero value, and further RTC interrupts were
reported to userspace as coalesced.

Fixes: 9e4aabe2bb
Fixes: 4d99ba898d
Cc: stable@vger.kernel.org
Cc: Joerg Roedel <jroedel@suse.de>
Cc: David Gilbert <dgilbert@redhat.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-15 18:00:32 +02:00
Alexander Shishkin
cecf62352a perf/x86/intel: Don't disable "intel_bts" around "intel" event batching
At the moment, intel_bts events get disabled from intel PMU's disable
callback, which includes event scheduling transactions of said PMU,
which have nothing to do with intel_bts events.

We do want to keep intel_bts events off inside the PMI handler to
avoid filling up their buffer too soon.

This patch moves intel_bts enabling/disabling directly to the PMI
handler.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160915082233.11065-1-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-15 11:25:26 +02:00
Linus Torvalds
4cea877657 PCI updates for v4.8:
Enumeration
     Mark Haswell Power Control Unit as having non-compliant BARs (Bjorn Helgaas)
 
   Power management
     Fix bridge_d3 update on device removal (Lukas Wunner)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJX2az0AAoJEFmIoMA60/r8/EoP/28mGRiKi8mlqNAR3MYN3F0n
 VSIm7WyxWNawH1gRJXKQBzNqgMJnj4qRGXSIvP3AIYyBDcJs/X7j91/eKOARNfQr
 55A+gfSz4jUKlw+0WgPY8/U2/xQ4yoom1zhbsAYcIVeljZo/3JUg+wHpPjhIMkH0
 2slTerHRDExrS43jxQi225toiEaO6lcY8EVmHCDo+jYlQz3sCEwIXg9hn1rwTbvG
 sJI0zyUwHF+oWowgJqlwYxsbPPnelPAN5YAx7KrHuVmBdL0Bgo3oIRtbb3JZZ9Up
 L9bQ6NpRjSARvijaZ2TAhueqIIDv2HGgvwNB01l4Yggw7Sm1dFCuUS6vj/e5tpZA
 xntE3F6s2Z+I4I1D7pAX3jMYCdYx/QltiTCeGRp8pJv+f4ewW3jcel3FAksY3BEg
 0NCjDrGFqGYai4hGRROpt/aXlW/Pn53eQLlu4Xg2qgkj0NMh0ODMrTjMnABB39ae
 eGqIXab7WeVBxt10eU19J1u1RTqpUO2LJW+cMnvYdCfKAYby/gj8SD8vIsn3oDjZ
 hQS/4fSHurc7LZwDmwOfaiHlGnvcQWV9EKwgScS0v8AxPQnC8pNUgYpzZcXY8Q6I
 YXtyK7suFriSZPS0Qs4FrdfJrmTBaBQ55aZu9aftb5v4YqacN5qZo+HaXiONBy3v
 RzsFK6xIbnIgb35g8vKy
 =PQml
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.8-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:
 "Here are two changes for v4.8.  The first fixes a "[Firmware Bug]: reg
  0x10: invalid BAR (can't size)" warning on Haswell, and the second
  fixes a problem in some new runtime suspend functionality we merged
  for v4.8.  Summary:

  Enumeration:
    Mark Haswell Power Control Unit as having non-compliant BARs (Bjorn Helgaas)

  Power management:
    Fix bridge_d3 update on device removal (Lukas Wunner)"

* tag 'pci-v4.8-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: Fix bridge_d3 update on device removal
  PCI: Mark Haswell Power Control Unit as having non-compliant BARs
2016-09-14 14:06:30 -07:00
Linus Torvalds
5924bbecd0 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Three fixes:

   - AMD microcode loading fix with randomization

   - an lguest tooling fix

   - and an APIC enumeration boundary condition fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic: Fix num_processors value in case of failure
  tools/lguest: Don't bork the terminal in case of wrong args
  x86/microcode/AMD: Fix load of builtin microcode with randomized memory
2016-09-13 12:52:45 -07:00
Linus Torvalds
ee319d5834 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "This contains:

   - a set of fixes found by directed-random perf fuzzing efforts by
     Vince Weaver, Alexander Shishkin and Peter Zijlstra

   - a cqm driver crash fix

   - an AMD uncore driver use after free fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Fix PEBSv3 record drain
  perf/x86/intel/bts: Kill a silly warning
  perf/x86/intel/bts: Fix BTS PMI detection
  perf/x86/intel/bts: Fix confused ordering of PMU callbacks
  perf/core: Fix aux_mmap_count vs aux_refcount order
  perf/core: Fix a race between mmap_close() and set_output() of AUX events
  perf/x86/amd/uncore: Prevent use after free
  perf/x86/intel/cqm: Check cqm/mbm enabled state in event init
  perf/core: Remove WARN from perf_event_read()
2016-09-13 12:47:29 -07:00
Linus Torvalds
7c2c114416 Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
 "This contains a Xen fix, an arm64 fix and a race condition /
  robustization set of fixes related to ExitBootServices() usage and
  boundary conditions"

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/efi: Use efi_exit_boot_services()
  efi/libstub: Use efi_exit_boot_services() in FDT
  efi/libstub: Introduce ExitBootServices helper
  efi/libstub: Allocate headspace in efi_get_memory_map()
  efi: Fix handling error value in fdt_find_uefi_params
  efi: Make for_each_efi_memory_desc_in_map() cope with running on Xen
2016-09-13 12:02:00 -07:00
Ingo Molnar
5465fe0fc3 * Refactor the EFI memory map code into architecture neutral files
and allow drivers to permanently reserve EFI boot services regions
    on x86, as well as ARM/arm64 - Matt Fleming
 
  * Add ARM support for the EFI esrt driver - Ard Biesheuvel
 
  * Make the EFI runtime services and efivar API interruptible by
    swapping spinlocks for semaphores - Sylvain Chouleur
 
  * Provide the EFI identity mapping for kexec which allows kexec to
    work on SGI/UV platforms with requiring the "noefi" kernel command
    line parameter - Alex Thorlton
 
  * Add debugfs node to dump EFI page tables on arm64 - Ard Biesheuvel
 
  * Merge the EFI test driver being carried out of tree until now in
    the FWTS project - Ivan Hu
 
  * Expand the list of flags for classifying EFI regions as "RAM" on
    arm64 so we align with the UEFI spec - Ard Biesheuvel
 
  * Optimise out the EFI mixed mode if it's unsupported (CONFIG_X86_32)
    or disabled (CONFIG_EFI_MIXED=n) and switch the early EFI boot
    services function table for direct calls, alleviating us from
    having to maintain the custom function table - Lukas Wunner
 
  * Miscellaneous cleanups and fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQI2BAABCAAgBQJX0tCTGRxtYXR0QGNvZGVibHVlcHJpbnQuY28udWsACgkQLzhZ
 wI0jPVWLVBAAn/iM91Vmhggdk3t0wCMJzrNGonw61VJ9TZJVbCUJyiH0qdDUThhj
 R4rO+6Vf6yOuyswu+mGmae61tfsjwJHH+IPpB8nRLIGQRwzoxk+aGC7FzmQ0ISVO
 wIdv5shsmeWhFAyNB1D4hzlp1NxOZaqcU/0cfUVGe4HmK0Js3tUpWWx8VgJ7yvW+
 X1PBbfyChArGqiwV6FJz/mJxRAgByUfhvYMcX9HhQkou6F4U5Y8l3vlhUMbuAZAi
 ZfG2LWGYCQ+F4XKxMq2QDAtdUgBzlYWw6W60o55x9WO4cEVSzNVRgedto5o1Zea9
 2QGEr94gim+e5cJ/HeDIEmbWZhAqIdcNDqXSSBd1CDVQytp4PNAn6rxk+2S9kxoe
 T9Mk523HEabo+AZvDAPPJlzcsnIe83JYy69M1xFvcP25ebk7y2BwQtd1jwWPrPDQ
 Q/llzF93aezUFR/guvIw0oHckhQl0ZkNedL9Tq4+UKL0ibp2X4gSX636/x4PkBSP
 5+pyfmO1SAqTiiMQGQMnp4+ngPQeQrxkmVnh1P7cKlTNXg1IoS03t46Xn2Pj10cd
 3KneVDeN9DKIAOn7wPKuPnjTho+9FH36xbwTaIgbt0cWuFFfu090rmqOQfjAJEDN
 foHzsMZ7S6CmeOJnj97NNR8sMQDcc+p9bh1KXpJIHaZAgrKmvqPZpMk=
 =G7L6
 -----END PGP SIGNATURE-----

Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into efi/core

Pull EFI updates from Matt Fleming:

"* Refactor the EFI memory map code into architecture neutral files
   and allow drivers to permanently reserve EFI boot services regions
   on x86, as well as ARM/arm64 - Matt Fleming

 * Add ARM support for the EFI esrt driver - Ard Biesheuvel

 * Make the EFI runtime services and efivar API interruptible by
   swapping spinlocks for semaphores - Sylvain Chouleur

 * Provide the EFI identity mapping for kexec which allows kexec to
   work on SGI/UV platforms with requiring the "noefi" kernel command
   line parameter - Alex Thorlton

 * Add debugfs node to dump EFI page tables on arm64 - Ard Biesheuvel

 * Merge the EFI test driver being carried out of tree until now in
   the FWTS project - Ivan Hu

 * Expand the list of flags for classifying EFI regions as "RAM" on
   arm64 so we align with the UEFI spec - Ard Biesheuvel

 * Optimise out the EFI mixed mode if it's unsupported (CONFIG_X86_32)
   or disabled (CONFIG_EFI_MIXED=n) and switch the early EFI boot
   services function table for direct calls, alleviating us from
   having to maintain the custom function table - Lukas Wunner

 * Miscellaneous cleanups and fixes"

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-13 20:21:55 +02:00
Al Stone
c12f29a5d4 x86: ACPI: make variable names clearer in acpi_parse_madt_lapic_entries()
This patch has no functional change; it is purely cosmetic, though
it does make it a wee bit easier to understand the code.  Before, the
count of LAPICs was being stored in the variable 'x2count' and the
count of X2APICs was being stored in the variable 'count'.  This
patch swaps that so that the routine acpi_parse_madt_lapic_entries()
will now consistently use x2count to refer to X2APIC info, and count
to refer to LAPIC info.

Signed-off-by: Al Stone <ahs3@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-09-13 02:19:59 +02:00
Al Stone
0f61aaa413 x86: ACPI: remove extraneous white space after semicolon
Signed-off-by: Al Stone <ahs3@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-09-13 02:19:59 +02:00
Linus Torvalds
ac059c4fa7 * s390: nested virt fixes (new 4.8 feature)
* x86: fixes for 4.8 regressions
 * ARM: two small bugfixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJX1xaBAAoJEL/70l94x66Db/8H/AuKWs6QbvUNnA+tWITdmFbi
 ah7R2BoRVak4h6UGYC4OsY9BTuWVeaCx2+8yLIqSdfWoYUJzwiGFPwkuUK/hVSkK
 /9gZO8SsREAm/1gVck2X2vzATYbfCxKcRsSq0/ocmIQEpRYWKGoJWS6zeiwLZ9kq
 H/KsAvsGc83VdlPXrhLVQa7js/Tl/M5JlBEbm6i8nIsvhoqZkES4rgwHKBtkxTyU
 8oepfNQnYTb2hZhJE0aW+9z3V0O+NKbGW75bKOzrbJBoEUGeHuPpLnP0mZIyEGPU
 grQkfuWMmfEaWrGahNA9ARpsNcy4Zp0BF5mgJ03OAmgfY+KmJroVk95IvcP9jF4=
 =+uuA
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 - s390: nested virt fixes (new 4.8 feature)
 - x86: fixes for 4.8 regressions
 - ARM: two small bugfixes

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm-arm: Unmap shadow pagetables properly
  x86, clock: Fix kvm guest tsc initialization
  arm: KVM: Fix idmap overlap detection when the kernel is idmap'ed
  KVM: lapic: adjust preemption timer correctly when goes TSC backward
  KVM: s390: vsie: fix riccbd
  KVM: s390: don't use current->thread.fpu.* when accessing registers
2016-09-12 14:30:14 -07:00
Linus Torvalds
98ac9a608d Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
 "nvdimm fixes for v4.8, two of them are tagged for -stable:

   - Fix devm_memremap_pages() to use track_pfn_insert().  Otherwise,
     DAX pmd mappings end up with an uncached pgprot, and unusable
     performance for the device-dax interface.  The device-dax interface
     appeared in 4.7 so this is tagged for -stable.

   - Fix a couple VM_BUG_ON() checks in the show_smaps() path to
     understand DAX pmd entries.  This fix is tagged for -stable.

   - Fix a mis-merge of the nfit machine-check handler to flip the
     polarity of an if() to match the final version of the patch that
     Vishal sent for 4.8-rc1.  Without this the nfit machine check
     handler never detects / inserts new 'badblocks' entries which
     applications use to identify lost portions of files.

   - For test purposes, fix the nvdimm_clear_poison() path to operate on
     legacy / simulated nvdimm memory ranges.  Without this fix a test
     can set badblocks, but never clear them on these ranges.

   - Fix the range checking done by dax_dev_pmd_fault().  This is not
     tagged for -stable since this problem is mitigated by specifying
     aligned resources at device-dax setup time.

  These patches have appeared in a next release over the past week.  The
  recent rebase you can see in the timestamps was to drop an invalid fix
  as identified by the updated device-dax unit tests [1].  The -mm
  touches have an ack from Andrew"

[1]: "[ndctl PATCH 0/3] device-dax test for recent kernel bugs"
   https://lists.01.org/pipermail/linux-nvdimm/2016-September/006855.html

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  libnvdimm: allow legacy (e820) pmem region to clear bad blocks
  nfit, mce: Fix SPA matching logic in MCE handler
  mm: fix cache mode of dax pmd mappings
  mm: fix show_smap() for zone_device-pmd ranges
  dax: fix mapping size check
2016-09-10 09:58:52 -07:00
Peter Zijlstra
8ef9b8455a perf/x86/intel: Fix PEBSv3 record drain
Alexander hit the WARN_ON_ONCE(!event) on his Skylake while running
the perf fuzzer.

This means the PEBSv3 record included a status bit for an inactive
event, something that _should_ not happen.

Move the code that filters the status bits against our known PEBS
events up a spot to guarantee we only deal with events we know about.

Further add "continue" statements to the WARN_ON_ONCE()s such that
we'll not die nor generate silly events in case we ever do hit them
again.

Reported-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vince@deater.net>
Cc: stable@vger.kernel.org
Fixes: a3d86542de ("perf/x86/intel/pebs: Add PEBSv3 decoding")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-10 11:15:39 +02:00
Alexander Shishkin
ef9ef3befa perf/x86/intel/bts: Kill a silly warning
At the moment, intel_bts will WARN() out if there is more than one
event writing to the same ring buffer, via SET_OUTPUT, and will only
send data from one event to a buffer.

There is no reason to have this warning in, so kill it.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160906132353.19887-6-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-10 11:15:38 +02:00
Alexander Shishkin
4d4c474124 perf/x86/intel/bts: Fix BTS PMI detection
Since BTS doesn't have a dedicated PMI status bit, the driver needs to
take extra care to check for the condition that triggers it to avoid
spurious NMI warnings.

Regardless of the local BTS context state, the only way of knowing that
the NMI is ours is to compare the write pointer against the interrupt
threshold.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160906132353.19887-5-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-10 11:15:38 +02:00
Alexander Shishkin
a9a94401c2 perf/x86/intel/bts: Fix confused ordering of PMU callbacks
The intel_bts driver is using a CPU-local 'started' variable to order
callbacks and PMIs and make sure that AUX transactions don't get messed
up. However, the ordering rules in regard to this variable is a complete
mess, which recently resulted in perf_fuzzer-triggered warnings and
panics.

The general ordering rule that is patch is enforcing is that this
cpu-local variable be set only when the cpu-local AUX transaction is
active; consequently, this variable is to be checked before the AUX
related bits can be touched.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160906132353.19887-4-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-10 11:15:37 +02:00
Dan Williams
9049771f7d mm: fix cache mode of dax pmd mappings
track_pfn_insert() in vmf_insert_pfn_pmd() is marking dax mappings as
uncacheable rendering them impractical for application usage.  DAX-pte
mappings are cached and the goal of establishing DAX-pmd mappings is to
attain more performance, not dramatically less (3 orders of magnitude).

track_pfn_insert() relies on a previous call to reserve_memtype() to
establish the expected page_cache_mode for the range.  While memremap()
arranges for reserve_memtype() to be called, devm_memremap_pages() does
not.  So, teach track_pfn_insert() and untrack_pfn() how to handle
tracking without a vma, and arrange for devm_memremap_pages() to
establish the write-back-cache reservation in the memtype tree.

Cc: <stable@vger.kernel.org>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Nilesh Choudhury <nilesh.choudhury@oracle.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Toshi Kani <toshi.kani@hpe.com>
Reported-by: Kai Zhang <kai.ka.zhang@oracle.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-09-09 17:34:46 -07:00
Sebastian Andrzej Siewior
7d762e49c2 perf/x86/amd/uncore: Prevent use after free
The resent conversion of the cpu hotplug support in the uncore driver
introduced a regression due to the way the callbacks are invoked at
initialization time.

The old code called the prepare/starting/online function on each online cpu
as a block. The new code registers the hotplug callbacks in the core for
each state. The core invokes the callbacks at each registration on all
online cpus.

The code implicitely relied on the prepare/starting/online callbacks being
called as combo on a particular cpu, which was not obvious and completely
undocumented.

The resulting subtle wreckage happens due to the way how the uncore code
manages shared data structures for cpus which share an uncore resource in
hardware. The sharing is determined in the cpu starting callback, but the
prepare callback allocates per cpu data for the upcoming cpu because
potential sharing is unknown at this point. If the starting callback finds
a online cpu which shares the hardware resource it takes a refcount on the
percpu data of that cpu and puts the own data structure into a
'free_at_online' pointer of that shared data structure. The online callback
frees that.

With the old model this worked because in a starting callback only one non
unused structure (the one of the starting cpu) was available. The new code
allocates the data structures for all cpus when the prepare callback is
registered.

Now the starting function iterates through all online cpus and looks for a
data structure (skipping its own) which has a matching hardware id. The id
member of the data structure is initialized to 0, but the hardware id can
be 0 as well. The resulting wreckage is:

  CPU0 finds a matching id on CPU1, takes a refcount on CPU1 data and puts
  its own data structure into CPU1s data structure to be freed.

  CPU1 skips CPU0 because the data structure is its allegedly unsued own.
  It finds a matching id on CPU2, takes a refcount on CPU1 data and puts
  its own data structure into CPU2s data structure to be freed.

  ....

Now the online callbacks are invoked.

  CPU0 has a pointer to CPU1s data and frees the original CPU0 data. So
  far so good.

  CPU1 has a pointer to CPU2s data and frees the original CPU1 data, which
  is still referenced by CPU0 ---> Booom

So there are two issues to be solved here:

1) The id field must be initialized at allocation time to a value which
   cannot be a valid hardware id, i.e. -1

   This prevents the above scenario, but now CPU1 and CPU2 both stick their
   own data structure into the free_at_online pointer of CPU0. So we leak
   CPU1s data structure.

2) Fix the memory leak described in #1

   Instead of having a single pointer, use a hlist to enqueue the
   superflous data structures which are then freed by the first cpu
   invoking the online callback.

Ideally we should know the sharing _before_ invoking the prepare callback,
but that's way beyond the scope of this bug fix.

[ tglx: Rewrote changelog ]

Fixes: 96b2bd3866 ("perf/x86/amd/uncore: Convert to hotplug state machine")
Reported-and-tested-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20160909160822.lowgmkdwms2dheyv@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-10 00:00:06 +02:00
Lukas Wunner
0a637ee612 x86/efi: Allow invocation of arbitrary boot services
We currently allow invocation of 8 boot services with efi_call_early().
Not included are LocateHandleBuffer and LocateProtocol in particular.
For graphics output or to retrieve PCI ROMs and Apple device properties,
we're thus forced to use the LocateHandle + AllocatePool + LocateHandle
combo, which is cumbersome and needs more code.

The ARM folks allow invocation of the full set of boot services but are
restricted to our 8 boot services in functions shared across arches.

Thus, rather than adding just LocateHandleBuffer and LocateProtocol to
struct efi_config, let's rework efi_call_early() to allow invocation of
arbitrary boot services by selecting the 64 bit vs 32 bit code path in
the macro itself.

When compiling for 32 bit or for 64 bit without mixed mode, the unused
code path is optimized away and the binary code is the same as before.
But on 64 bit with mixed mode enabled, this commit adds one compare
instruction to each invocation of a boot service and, depending on the
code path selected, two jump instructions. (Most of the time gcc
arranges the jumps in the 32 bit code path.) The result is a minuscule
performance penalty and the binary code becomes slightly larger and more
difficult to read when disassembled. This isn't a hot path, so these
drawbacks are arguably outweighed by the attainable simplification of
the C code. We have some overhead anyway for thunking or conversion
between calling conventions.

The 8 boot services can consequently be removed from struct efi_config.

No functional change intended (for now).

Example -- invocation of free_pool before (64 bit code path):
0x2d4      movq  %ds:efi_early, %rdx          ; efi_early
0x2db      movq  %ss:arg_0-0x20(%rsp), %rsi
0x2e0      xorl  %eax, %eax
0x2e2      movq  %ds:0x28(%rdx), %rdi         ; efi_early->free_pool
0x2e6      callq *%ds:0x58(%rdx)              ; efi_early->call()

Example -- invocation of free_pool after (64 / 32 bit mixed code path):
0x0dc      movq  %ds:efi_early, %rax          ; efi_early
0x0e3      cmpb  $0, %ds:0x28(%rax)           ; !efi_early->is64 ?
0x0e7      movq  %ds:0x20(%rax), %rdx         ; efi_early->call()
0x0eb      movq  %ds:0x10(%rax), %rax         ; efi_early->boot_services
0x0ef      je    $0x150
0x0f1      movq  %ds:0x48(%rax), %rdi         ; free_pool (64 bit)
0x0f5      xorl  %eax, %eax
0x0f7      callq *%rdx
...
0x150      movl  %ds:0x30(%rax), %edi         ; free_pool (32 bit)
0x153      jmp   $0x0f5

Size of eboot.o text section:
CONFIG_X86_32:                         6464 before, 6318 after
CONFIG_X86_64 && !CONFIG_EFI_MIXED:    7670 before, 7573 after
CONFIG_X86_64 &&  CONFIG_EFI_MIXED:    7670 before, 8319 after

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-09 16:08:57 +01:00
Lukas Wunner
2757161638 x86/efi: Optimize away setup_gop32/64 if unused
Commit 2c23b73c2d ("x86/efi: Prepare GOP handling code for reuse
as generic code") introduced an efi_is_64bit() macro to x86 which
previously only existed for arm arches.  The macro is used to
choose between the 64 bit or 32 bit code path in gop.c at runtime.

However the code path that's going to be taken is known at compile
time when compiling for x86_32 or for x86_64 with mixed mode disabled.
Amend the macro to eliminate the unused code path in those cases.

Size of gop.o text section:
CONFIG_X86_32:                         1758 before, 1299 after
CONFIG_X86_64 && !CONFIG_EFI_MIXED:    2201 before, 1406 after
CONFIG_X86_64 &&  CONFIG_EFI_MIXED:    2201 before and after

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-09 16:08:56 +01:00
Markus Elfring
20ebc15e6c x86/efi: Use kmalloc_array() in efi_call_phys_prolog()
* A multiplication for the size determination of a memory allocation
  indicated that an array data structure should be processed.
  Thus reuse the corresponding function "kmalloc_array".

  This issue was detected by using the Coccinelle software.

* Replace the specification of a data type by a pointer dereference
  to make the corresponding size determination a bit safer according to
  the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-09 16:08:55 +01:00