Commit Graph

6261 Commits

Author SHA1 Message Date
Linus Torvalds
e663107fa1 ACPI material for v4.8-rc1
- Support for ACPI SSDT overlays allowing Secondary System
    Description Tables (SSDTs) to be loaded at any time from EFI
    variables or via configfs (Octavian Purdila, Mika Westerberg).
 
  - Support for the ACPI LPI (Low-Power Idle) feature introduced in
    ACPI 6.0 and allowing processor idle states to be represented in
    ACPI tables in a hierarchical way (with the help of Processor
    Container objects) and support for ACPI idle states management
    on ARM64, based on LPI (Sudeep Holla).
 
  - General improvements of ACPI support for NUMA and ARM64 support
    for ACPI-based NUMA (Hanjun Guo, David Daney, Robert Richter).
 
  - General improvements of the ACPI table upgrade mechanism and
    ARM64 support for that feature (Aleksey Makarov, Jon Masters).
 
  - Support for the Boot Error Record Table (BERT) in APEI and
    improvements of kernel messages printed by the error injection
    code (Huang Ying, Borislav Petkov).
 
  - New driver for the Intel Broxton WhiskeyCove PMIC operation
    region and support for the REGS operation region on Broxton,
    PMIC code cleanups (Bin Gao, Felipe Balbi, Paul Gortmaker).
 
  - New driver for the power participant device which is part of the
    Dynamic Power and Thermal Framework (DPTF) and DPTF-related code
    reorganization (Srinivas Pandruvada).
 
  - Support for the platform-initiated graceful shutdown feature
    introduced in ACPI 6.1 (Prashanth Prakash).
 
  - ACPI button driver update related to lid input events generated
    automatically on initialization and system resume that have been
    problematic for some time (Lv Zheng).
 
  - ACPI EC driver cleanups (Lv Zheng).
 
  - Documentation of the ACPICA release automation process and the
    in-kernel ACPI AML debugger (Lv Zheng).
 
  - New blacklist entry and two fixes for the ACPI backlight driver
    (Alex Hung, Arvind Yadav, Ralf Gerbig).
 
  - Cleanups of the ACPI pci_slot driver (Joe Perches, Paul Gortmaker).
 
  - ACPI CPPC code changes to make it more robust against possible
    defects in ACPI tables and new symbol definitions for PCC (Hoan
    Tran).
 
  - System reboot code modification to execute the ACPI _PTS (Prepare
    To Sleep) method in addition to _TTS (Ocean He).
 
  - ACPICA-related change to carry out lock ordering checks in ACPICA
    if ACPICA debug is enabled in the kernel (Lv Zheng).
 
  - Assorted minor fixes and cleanups (Andy Shevchenko, Baoquan He,
    Bhaktipriya Shridhar, Paul Gortmaker, Rafael Wysocki).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJXl8A7AAoJEILEb/54YlRxF0kQAI6mH0yan60Osu4598+VNvgv
 wxOWl1TEbKd+LaJkofRZ+FPzZkQf5c/h/8Oo8Q3LEpFhjkARhhX7ThDjS5v2Nx6v
 I/icQ64ynPUPrw6hGNVrmec9ofZjiAs3j3Rt2bEiae+YN6guvfhWE+kBCHo2G/nN
 o4BSaxYjkphUTDSi4/5BfaocV2sl3apvwjtAj8zgGn4RD81bFFLnblynHkqJVcoN
 HAfm7QTVjT01Zkv565OSZgK8CFcD8Ky2KKKBQvcIW8zQmD6IXaoTHSYSwL0SH+oK
 bxUZUmWVfFWw4kDTAY9mw0QwtWz9ODTWh/WMhs3itWRRN5qHfogs99rCVYFtFufQ
 ODVy4wpt4wmpzZVhyUDTTigAhznPAtCam6EpL1YeNbtyrRN4evvZVFHBZJnmhosX
 zI9iLF4eqdnJZKvh+L1VFU+py8aAZpz1ZEOatNMI+xdhArbGm7v89cldzaRkJhuW
 LZr+JqYQGaOZS5qSnymwJL1KfF66+2QGpzdvzJN5FNIDACoqanATbZ/Iie2ENcM+
 WwCEWrGJFDmM30raBNNcvx0yHFtVkcNbOymla4paVg7i29nu88Ynw4Z6seIIP11C
 DryzLFhw+3jdTg2zK/te/wkhciJ0F+iZjo6VXywSMnwatf36bpdp4r4JLUVfEo2t
 8DOGKyFMLYY1zOPMK9Th
 =YwbM
 -----END PGP SIGNATURE-----

Merge tag 'acpi-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI updates from Rafael Wysocki:
 "The new feaures here are the support for ACPI overlays (allowing ACPI
  tables to be loaded at any time from EFI variables or via configfs)
  and the LPI (Low-Power Idle) support.  Also notable is the ACPI-based
  NUMA support for ARM64.

  Apart from that we have two new drivers, for the DPTF (Dynamic Power
  and Thermal Framework) power participant device and for the Intel
  Broxton WhiskeyCove PMIC, some more PMIC-related changes, support for
  the Boot Error Record Table (BERT) in APEI and support for
  platform-initiated graceful shutdown.

  Plus two new pieces of documentation and usual assorted fixes and
  cleanups in quite a few places.

  Specifics:

   - Support for ACPI SSDT overlays allowing Secondary System
     Description Tables (SSDTs) to be loaded at any time from EFI
     variables or via configfs (Octavian Purdila, Mika Westerberg).

   - Support for the ACPI LPI (Low-Power Idle) feature introduced in
     ACPI 6.0 and allowing processor idle states to be represented in
     ACPI tables in a hierarchical way (with the help of Processor
     Container objects) and support for ACPI idle states management on
     ARM64, based on LPI (Sudeep Holla).

   - General improvements of ACPI support for NUMA and ARM64 support for
     ACPI-based NUMA (Hanjun Guo, David Daney, Robert Richter).

   - General improvements of the ACPI table upgrade mechanism and ARM64
     support for that feature (Aleksey Makarov, Jon Masters).

   - Support for the Boot Error Record Table (BERT) in APEI and
     improvements of kernel messages printed by the error injection code
     (Huang Ying, Borislav Petkov).

   - New driver for the Intel Broxton WhiskeyCove PMIC operation region
     and support for the REGS operation region on Broxton, PMIC code
     cleanups (Bin Gao, Felipe Balbi, Paul Gortmaker).

   - New driver for the power participant device which is part of the
     Dynamic Power and Thermal Framework (DPTF) and DPTF-related code
     reorganization (Srinivas Pandruvada).

   - Support for the platform-initiated graceful shutdown feature
     introduced in ACPI 6.1 (Prashanth Prakash).

   - ACPI button driver update related to lid input events generated
     automatically on initialization and system resume that have been
     problematic for some time (Lv Zheng).

   - ACPI EC driver cleanups (Lv Zheng).

   - Documentation of the ACPICA release automation process and the
     in-kernel ACPI AML debugger (Lv Zheng).

   - New blacklist entry and two fixes for the ACPI backlight driver
     (Alex Hung, Arvind Yadav, Ralf Gerbig).

   - Cleanups of the ACPI pci_slot driver (Joe Perches, Paul Gortmaker).

   - ACPI CPPC code changes to make it more robust against possible
     defects in ACPI tables and new symbol definitions for PCC (Hoan
     Tran).

   - System reboot code modification to execute the ACPI _PTS (Prepare
     To Sleep) method in addition to _TTS (Ocean He).

   - ACPICA-related change to carry out lock ordering checks in ACPICA
     if ACPICA debug is enabled in the kernel (Lv Zheng).

   - Assorted minor fixes and cleanups (Andy Shevchenko, Baoquan He,
     Bhaktipriya Shridhar, Paul Gortmaker, Rafael Wysocki)"

* tag 'acpi-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (71 commits)
  ACPI: enable ACPI_PROCESSOR_IDLE on ARM64
  arm64: add support for ACPI Low Power Idle(LPI)
  drivers: firmware: psci: initialise idle states using ACPI LPI
  cpuidle: introduce CPU_PM_CPU_IDLE_ENTER macro for ARM{32, 64}
  arm64: cpuidle: drop __init section marker to arm_cpuidle_init
  ACPI / processor_idle: Add support for Low Power Idle(LPI) states
  ACPI / processor_idle: introduce ACPI_PROCESSOR_CSTATE
  ACPI / DPTF: move int340x_thermal.c to the DPTF folder
  ACPI / DPTF: Add DPTF power participant driver
  ACPI / lpat: make it explicitly non-modular
  ACPI / dock: make dock explicitly non-modular
  ACPI / PCI: make pci_slot explicitly non-modular
  ACPI / PMIC: remove modular references from non-modular code
  ACPICA: Linux: Enable ACPI_MUTEX_DEBUG for Linux kernel
  ACPI: Rename configfs.c to acpi_configfs.c to prevent link error
  ACPI / debugger: Add AML debugger documentation
  ACPI: Add documentation describing ACPICA release automation
  ACPI: add support for loading SSDTs via configfs
  ACPI: add support for configfs
  efi / ACPI: load SSTDs from EFI variables
  ...
2016-07-26 17:56:45 -07:00
Linus Torvalds
6453dbdda3 Power management material for v4.8-rc1
- Rework the cpufreq governor interface to make it more straightforward
    and modify the conservative governor to avoid using transition
    notifications (Rafael Wysocki).
 
  - Rework the handling of frequency tables by the cpufreq core to make
    it more efficient (Viresh Kumar).
 
  - Modify the schedutil governor to reduce the number of wakeups it
    causes to occur in cases when the CPU frequency doesn't need to be
    changed (Steve Muckle, Viresh Kumar).
 
  - Fix some minor issues and clean up code in the cpufreq core and
    governors (Rafael Wysocki, Viresh Kumar).
 
  - Add Intel Broxton support to the intel_pstate driver (Srinivas
    Pandruvada).
 
  - Fix problems related to the config TDP feature and to the validity
    of the MSR_HWP_INTERRUPT register in intel_pstate (Jan Kiszka,
    Srinivas Pandruvada).
 
  - Make intel_pstate update the cpu_frequency tracepoint even if
    the frequency doesn't change to avoid confusing powertop (Rafael
    Wysocki).
 
  - Clean up the usage of __init/__initdata in intel_pstate, mark some
    of its internal variables as __read_mostly and drop an unused
    structure element from it (Jisheng Zhang, Carsten Emde).
 
  - Clean up the usage of some duplicate MSR symbols in intel_pstate
    and turbostat (Srinivas Pandruvada).
 
  - Update/fix the powernv, s3c24xx and mvebu cpufreq drivers (Akshay
    Adiga, Viresh Kumar, Ben Dooks).
 
  - Fix a regression (introduced during the 4.5 cycle) in the
    pcc-cpufreq driver by reverting the problematic commit (Andreas
    Herrmann).
 
  - Add support for Intel Denverton to intel_idle, clean up Broxton
    support in it and make it explicitly non-modular (Jacob Pan,
    Jan Beulich, Paul Gortmaker).
 
  - Add support for Denverton and Ivy Bridge server to the Intel RAPL
    power capping driver and make it more careful about the handing
    of MSRs that may not be present (Jacob Pan, Xiaolong Wang).
 
  - Fix resume from hibernation on x86-64 by making the CPU offline
    during resume avoid using MONITOR/MWAIT in the "play dead" loop
    which may lead to an inadvertent "revival" of a "dead" CPU and
    a page fault leading to a kernel crash from it (Rafael Wysocki).
 
  - Make memory management during resume from hibernation more
    straightforward (Rafael Wysocki).
 
  - Add debug features that should help to detect problems related
    to hibernation and resume from it (Rafael Wysocki, Chen Yu).
 
  - Clean up hibernation core somewhat (Rafael Wysocki).
 
  - Prevent KASAN from instrumenting the hibernation core which leads
    to large numbers of false-positives from it (James Morse).
 
  - Prevent PM (hibernate and suspend) notifiers from being called
    during the cleanup phase if they have not been called during the
    corresponding preparation phase which is possible if one of the
    other notifiers returns an error at that time (Lianwei Wang).
 
  - Improve suspend-related debug printout in the tasks freezer and
    clean up suspend-related console handling (Roger Lu, Borislav
    Petkov).
 
  - Update the AnalyzeSuspend script in the kernel sources to
    version 4.2 (Todd Brandt).
 
  - Modify the generic power domains framework to make it handle
    system suspend/resume better (Ulf Hansson).
 
  - Make the runtime PM framework avoid resuming devices synchronously
    when user space changes the runtime PM settings for them and
    improve its error reporting (Rafael Wysocki, Linus Walleij).
 
  - Fix error paths in devfreq drivers (exynos, exynos-ppmu, exynos-bus)
    and in the core, make some devfreq code explicitly non-modular and
    change some of it into tristate (Bartlomiej Zolnierkiewicz,
    Peter Chen, Paul Gortmaker).
 
  - Add DT support to the generic PM clocks management code and make
    it export some more symbols (Jon Hunter, Paul Gortmaker).
 
  - Make the PCI PM core code slightly more robust against possible
    driver errors (Andy Shevchenko).
 
  - Make it possible to change DESTDIR and PREFIX in turbostat
    (Andy Shevchenko).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJXl7/dAAoJEILEb/54YlRx+VgQAIQJOWvxKew3Yl02c/sdj9OT
 5VNnFrzGzdcAPofvvG9qGq8B0Es1vYehJpwwOB21ri8EvYv0riIiU1yrqslObojQ
 oaZOkSBpbIoKjGR4CpYA/A+feE+8EqIBdPGd+lx5a6oRdUi7tRVHBG9lyLO3FB/i
 jan1q8dMpZsmu+Y+rVVHGnCVuIlIEqr2ZnZfCwDAulO2Arp/QFAh4kH08ELATvrl
 bkPa25vq7/VMP/vCDzrfZKD5mUuKogIRu/J5wx4py1nE+FB35cKKyqBOgklLwAeY
 UI8vjDhr/myNUs54AZlktOkq47TCYvjvhX9kmOxBjuWqFbRusU012IRek1fYPRIV
 ZqbkqNX7UEVQwunAEg9AyFwyzEtOht93dQDT5RLEd4QzKuM76gmHpLeTGGMzE+nu
 FnmF9JGl4DVwqpZl9yU2+hR2Mt3bP8OF8qYmNiGUB3KO4emPslhSd+6y8liA5Bx2
 SJf0Gb//vaHCh3/uMnwAonYPqRkZvBLOMwuL1VUjNQfRMnQtDdgHMYB1aT/EglPA
 8ww6j4J8rVRLAxvYQ3UEmNA/vBNclKXblRR18+JddEZP9/oX0ATfwnCCUpr839uk
 xxyQhrm4/AI60+PHWCX4GG80YrKdOGTkF7LXCQZanVWjjuyF17rufegZ2YWLT07v
 JU1Cmumfdy2jJluT8xsR
 =uVGz
 -----END PGP SIGNATURE-----

Merge tag 'pm-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael  Wysocki:
 "Again, the majority of changes go into the cpufreq subsystem, but
  there are no big features this time.  The cpufreq changes that stand
  out somewhat are the governor interface rework and improvements
  related to the handling of frequency tables.  Apart from those, there
  are fixes and new device/CPU IDs in drivers, cleanups and an
  improvement of the new schedutil governor.

  Next, there are some changes in the hibernation core, including a fix
  for a nasty problem related to the MONITOR/MWAIT usage by CPU offline
  during resume from hibernation, a few core improvements related to
  memory management during resume, a couple of additional debug features
  and cleanups.

  Finally, we have some fixes and cleanups in the devfreq subsystem,
  generic power domains framework improvements related to system
  suspend/resume, support for some new chips in intel_idle and in the
  power capping RAPL driver, a new version of the AnalyzeSuspend utility
  and some assorted fixes and cleanups.

  Specifics:

   - Rework the cpufreq governor interface to make it more
     straightforward and modify the conservative governor to avoid using
     transition notifications (Rafael Wysocki).

   - Rework the handling of frequency tables by the cpufreq core to make
     it more efficient (Viresh Kumar).

   - Modify the schedutil governor to reduce the number of wakeups it
     causes to occur in cases when the CPU frequency doesn't need to be
     changed (Steve Muckle, Viresh Kumar).

   - Fix some minor issues and clean up code in the cpufreq core and
     governors (Rafael Wysocki, Viresh Kumar).

   - Add Intel Broxton support to the intel_pstate driver (Srinivas
     Pandruvada).

   - Fix problems related to the config TDP feature and to the validity
     of the MSR_HWP_INTERRUPT register in intel_pstate (Jan Kiszka,
     Srinivas Pandruvada).

   - Make intel_pstate update the cpu_frequency tracepoint even if the
     frequency doesn't change to avoid confusing powertop (Rafael
     Wysocki).

   - Clean up the usage of __init/__initdata in intel_pstate, mark some
     of its internal variables as __read_mostly and drop an unused
     structure element from it (Jisheng Zhang, Carsten Emde).

   - Clean up the usage of some duplicate MSR symbols in intel_pstate
     and turbostat (Srinivas Pandruvada).

   - Update/fix the powernv, s3c24xx and mvebu cpufreq drivers (Akshay
     Adiga, Viresh Kumar, Ben Dooks).

   - Fix a regression (introduced during the 4.5 cycle) in the
     pcc-cpufreq driver by reverting the problematic commit (Andreas
     Herrmann).

   - Add support for Intel Denverton to intel_idle, clean up Broxton
     support in it and make it explicitly non-modular (Jacob Pan, Jan
     Beulich, Paul Gortmaker).

   - Add support for Denverton and Ivy Bridge server to the Intel RAPL
     power capping driver and make it more careful about the handing of
     MSRs that may not be present (Jacob Pan, Xiaolong Wang).

   - Fix resume from hibernation on x86-64 by making the CPU offline
     during resume avoid using MONITOR/MWAIT in the "play dead" loop
     which may lead to an inadvertent "revival" of a "dead" CPU and a
     page fault leading to a kernel crash from it (Rafael Wysocki).

   - Make memory management during resume from hibernation more
     straightforward (Rafael Wysocki).

   - Add debug features that should help to detect problems related to
     hibernation and resume from it (Rafael Wysocki, Chen Yu).

   - Clean up hibernation core somewhat (Rafael Wysocki).

   - Prevent KASAN from instrumenting the hibernation core which leads
     to large numbers of false-positives from it (James Morse).

   - Prevent PM (hibernate and suspend) notifiers from being called
     during the cleanup phase if they have not been called during the
     corresponding preparation phase which is possible if one of the
     other notifiers returns an error at that time (Lianwei Wang).

   - Improve suspend-related debug printout in the tasks freezer and
     clean up suspend-related console handling (Roger Lu, Borislav
     Petkov).

   - Update the AnalyzeSuspend script in the kernel sources to version
     4.2 (Todd Brandt).

   - Modify the generic power domains framework to make it handle system
     suspend/resume better (Ulf Hansson).

   - Make the runtime PM framework avoid resuming devices synchronously
     when user space changes the runtime PM settings for them and
     improve its error reporting (Rafael Wysocki, Linus Walleij).

   - Fix error paths in devfreq drivers (exynos, exynos-ppmu,
     exynos-bus) and in the core, make some devfreq code explicitly
     non-modular and change some of it into tristate (Bartlomiej
     Zolnierkiewicz, Peter Chen, Paul Gortmaker).

   - Add DT support to the generic PM clocks management code and make it
     export some more symbols (Jon Hunter, Paul Gortmaker).

   - Make the PCI PM core code slightly more robust against possible
     driver errors (Andy Shevchenko).

   - Make it possible to change DESTDIR and PREFIX in turbostat (Andy
     Shevchenko)"

* tag 'pm-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (89 commits)
  Revert "cpufreq: pcc-cpufreq: update default value of cpuinfo_transition_latency"
  PM / hibernate: Introduce test_resume mode for hibernation
  cpufreq: export cpufreq_driver_resolve_freq()
  cpufreq: Disallow ->resolve_freq() for drivers providing ->target_index()
  PCI / PM: check all fields in pci_set_platform_pm()
  cpufreq: acpi-cpufreq: use cached frequency mapping when possible
  cpufreq: schedutil: map raw required frequency to driver frequency
  cpufreq: add cpufreq_driver_resolve_freq()
  cpufreq: intel_pstate: Check cpuid for MSR_HWP_INTERRUPT
  intel_pstate: Update cpu_frequency tracepoint every time
  cpufreq: intel_pstate: clean remnant struct element
  PM / tools: scripts: AnalyzeSuspend v4.2
  x86 / hibernate: Use hlt_play_dead() when resuming from hibernation
  cpufreq: powernv: Replacing pstate_id with frequency table index
  intel_pstate: Fix MSR_CONFIG_TDP_x addressing in core_get_max_pstate()
  PM / hibernate: Image data protection during restoration
  PM / hibernate: Add missing braces in __register_nosave_region()
  PM / hibernate: Clean up comments in snapshot.c
  PM / hibernate: Clean up function headers in snapshot.c
  PM / hibernate: Add missing braces in hibernate_setup()
  ...
2016-07-26 17:29:07 -07:00
Linus Torvalds
37e13a1ebe Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "This tree contains tooling fixes plus some additions:

   - fixes to the vdso2c build environment that Stephen Rothwell is
     using for the linux-next build (Arnaldo Carvalho de Melo)

   - AVX-512 instruction mappings (Adrian Hunter)

   - misc fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "perf tools: event.h needs asm/perf_regs.h"
  x86: Make the vdso2c compiler use the host architecture headers
  tools build: Fix objtool build with ARCH=x86_64
  objtool: Always use host headers
  objtool: Use tools/scripts/Makefile.arch to get ARCH and HOSTARCH
  tools build: Add HOSTARCH Makefile variable
  perf tests kmod-path: Fix build on ubuntu:16.04-x-armhf
  perf tools: Add AVX-512 instructions to the new instructions test
  perf tools: Add AVX-512 support to the instruction decoder used by Intel PT
  x86/insn: Add AVX-512 support to the instruction decoder
  x86/insn: perf tools: Fix vcvtph2ps instruction decoding
2016-07-26 10:26:29 -07:00
Linus Torvalds
5f22004ba9 Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 timer updates from Ingo Molnar:
 "The main change in this tree is the reworking, fixing and extension of
  the TSC frequency enumeration code (by Len Brown)"

* 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tsc: Remove the unused check_tsc_disabled()
  x86/tsc: Enumerate BXT tsc_khz via CPUID
  x86/tsc: Enumerate SKL cpu_khz and tsc_khz via CPUID
  x86/tsc_msr: Remove irqoff around MSR-based TSC enumeration
  x86/tsc_msr: Add Airmont reference clock values
  x86/tsc_msr: Correct Silvermont reference clock values
  x86/tsc_msr: Update comments, expand definitions
  x86/tsc_msr: Remove debugging messages
  x86/tsc_msr: Identify Intel-specific code
  Revert "x86/tsc: Add missing Cherrytrail frequency to the table"
2016-07-25 19:41:35 -07:00
Linus Torvalds
8e466955d6 Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Intel-SoC enhancements (Andy Shevchenko)

   - Intel CPU symbolic model definition rework (Dave Hansen)

   - ... other misc changes"

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  x86/sfi: Enable enumeration of SD devices
  x86/pci: Use MRFLD abbreviation for Merrifield
  x86/platform/intel-mid: Make vertical indentation consistent
  x86/platform/intel-mid: Mark regulators explicitly defined
  x86/platform/intel-mid: Rename mrfl.c to mrfld.c
  x86/platform/intel-mid: Enable spidev on Intel Edison boards
  x86/platform/intel-mid: Extend PWRMU to support Penwell
  x86/pci, x86/platform/intel_mid_pci: Remove duplicate power off code
  x86/platform/intel-mid: Add pinctrl for Intel Merrifield
  x86/platform/intel-mid: Enable GPIO expanders on Edison
  x86/platform/intel-mid: Add Power Management Unit driver
  x86/platform/atom/punit: Enable support for Merrifield
  x86/platform/intel_mid_pci: Rework IRQ0 workaround
  x86, thermal: Clean up and fix CPU model detection for intel_soc_dts_thermal
  x86, mmc: Use Intel family name macros for mmc driver
  x86/intel_telemetry: Use Intel family name macros for telemetry driver
  x86/acpi/lss: Use Intel family name macros for the acpi_lpss driver
  x86/cpufreq: Use Intel family name macros for the intel_pstate cpufreq driver
  x86/platform: Use new Intel model number macros
  x86/intel_idle: Use Intel family macros for intel_idle
  ...
2016-07-25 19:15:35 -07:00
Linus Torvalds
2d724ffddd Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu updates from Ingo Molnar:
 "The main x86 FPU changes in this cycle were:

   - a large series of cleanups, fixes and enhancements to re-enable the
     XSAVES instruction on Intel CPUs - which is the most advanced
     instruction to do FPU context switches (Yu-cheng Yu, Fenghua Yu)

   - Add FPU tracepoints for the FPU state machine (Dave Hansen)"

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu: Do not BUG_ON() in early FPU code
  x86/fpu/xstate: Re-enable XSAVES
  x86/fpu/xstate: Fix fpstate_init() for XRSTORS
  x86/fpu/xstate: Return NULL for disabled xstate component address
  x86/fpu/xstate: Fix __fpu_restore_sig() for XSAVES
  x86/fpu/xstate: Fix xstate_offsets, xstate_sizes for non-extended xstates
  x86/fpu/xstate: Fix XSTATE component offset print out
  x86/fpu/xstate: Fix PTRACE frames for XSAVES
  x86/fpu/xstate: Fix supervisor xstate component offset
  x86/fpu/xstate: Align xstate components according to CPUID
  x86/fpu/xstate: Copy xstate registers directly to the signal frame when compacted format is in use
  x86/fpu/xstate: Keep init_fpstate.xsave.header.xfeatures as zero for init optimization
  x86/fpu/xstate: Rename 'xstate_size' to 'fpu_kernel_xstate_size', to distinguish it from 'fpu_user_xstate_size'
  x86/fpu/xstate: Define and use 'fpu_user_xstate_size'
  x86/fpu: Add tracepoints to dump FPU state at key points
2016-07-25 18:48:27 -07:00
Linus Torvalds
36e635cb21 Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 stackdump update from Ingo Molnar:
 "A number of stackdump enhancements"

* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/dumpstack: Add show_stack_regs() and use it
  printk: Make the printk*once() variants return a value
  x86/dumpstack: Honor supplied @regs arg
2016-07-25 18:18:04 -07:00
Linus Torvalds
80f09cf5c1 Merge branch 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build updates from Ingo Molnar:
 "A build system fix and a cleanup"

* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kbuild: Remove stale asm-generic wrappers
  kbuild, x86: Track generated headers with generated-y
2016-07-25 18:00:18 -07:00
Linus Torvalds
77cd3d0c43 Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:
 "The main changes:

   - add initial commits to randomize kernel memory section virtual
     addresses, enabled via a new kernel option: RANDOMIZE_MEMORY
     (Thomas Garnier, Kees Cook, Baoquan He, Yinghai Lu)

   - enhance KASLR (RANDOMIZE_BASE) physical memory randomization (Kees
     Cook)

   - EBDA/BIOS region boot quirk cleanups (Andy Lutomirski, Ingo Molnar)

   - misc cleanups/fixes"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Simplify EBDA-vs-BIOS reservation logic
  x86/boot: Clarify what x86_legacy_features.reserve_bios_regions does
  x86/boot: Reorganize and clean up the BIOS area reservation code
  x86/mm: Do not reference phys addr beyond kernel
  x86/mm: Add memory hotplug support for KASLR memory randomization
  x86/mm: Enable KASLR for vmalloc memory regions
  x86/mm: Enable KASLR for physical mapping memory regions
  x86/mm: Implement ASLR for kernel memory regions
  x86/mm: Separate variable for trampoline PGD
  x86/mm: Add PUD VA support for physical mapping
  x86/mm: Update physical mapping variable names
  x86/mm: Refactor KASLR entropy functions
  x86/KASLR: Fix boot crash with certain memory configurations
  x86/boot/64: Add forgotten end of function marker
  x86/KASLR: Allow randomization below the load address
  x86/KASLR: Extend kernel image physical address randomization to addresses larger than 4G
  x86/KASLR: Randomize virtual address separately
  x86/KASLR: Clarify identity map interface
  x86/boot: Refuse to build with data relocations
  x86/KASLR, x86/power: Remove x86 hibernation restrictions
2016-07-25 17:32:28 -07:00
Linus Torvalds
0f657262d5 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
 "Various x86 low level modifications:

   - preparatory work to support virtually mapped kernel stacks (Andy
     Lutomirski)

   - support for 64-bit __get_user() on 32-bit kernels (Benjamin
     LaHaise)

   - (involved) workaround for Knights Landing CPU erratum (Dave Hansen)

   - MPX enhancements (Dave Hansen)

   - mremap() extension to allow remapping of the special VDSO vma, for
     purposes of user level context save/restore (Dmitry Safonov)

   - hweight and entry code cleanups (Borislav Petkov)

   - bitops code generation optimizations and cleanups with modern GCC
     (H. Peter Anvin)

   - syscall entry code optimizations (Paolo Bonzini)"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
  x86/mm/cpa: Add missing comment in populate_pdg()
  x86/mm/cpa: Fix populate_pgd(): Stop trying to deallocate failed PUDs
  x86/syscalls: Add compat_sys_preadv64v2/compat_sys_pwritev64v2
  x86/smp: Remove unnecessary initialization of thread_info::cpu
  x86/smp: Remove stack_smp_processor_id()
  x86/uaccess: Move thread_info::addr_limit to thread_struct
  x86/dumpstack: Rename thread_struct::sig_on_uaccess_error to sig_on_uaccess_err
  x86/uaccess: Move thread_info::uaccess_err and thread_info::sig_on_uaccess_err to thread_struct
  x86/dumpstack: When OOPSing, rewind the stack before do_exit()
  x86/mm/64: In vmalloc_fault(), use CR3 instead of current->active_mm
  x86/dumpstack/64: Handle faults when printing the "Stack: " part of an OOPS
  x86/dumpstack: Try harder to get a call trace on stack overflow
  x86/mm: Remove kernel_unmap_pages_in_pgd() and efi_cleanup_page_tables()
  x86/mm/cpa: In populate_pgd(), don't set the PGD entry until it's populated
  x86/mm/hotplug: Don't remove PGD entries in remove_pagetable()
  x86/mm: Use pte_none() to test for empty PTE
  x86/mm: Disallow running with 32-bit PTEs to work around erratum
  x86/mm: Ignore A/D bits in pte/pmd/pud_none()
  x86/mm: Move swap offset/type up in PTE to work around erratum
  x86/entry: Inline enter_from_user_mode()
  ...
2016-07-25 15:34:18 -07:00
Linus Torvalds
425dbc6db3 Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/apic updates from Ingo Molnar:
 "Misc cleanups and a small fix"

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic: Remove the unused struct apic::apic_id_mask field
  x86/apic: Fix misspelled APIC
  x86/ioapic: Simplify ioapic_setup_resources()
2016-07-25 15:09:08 -07:00
Linus Torvalds
7e4dc77b28 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "With over 300 commits it's been a busy cycle - with most of the work
  concentrated on the tooling side (as it should).

  The main kernel side enhancements were:

   - Add per event callchain limit: Recently we introduced a sysctl to
     tune the max-stack for all events for which callchains were
     requested:

       $ sysctl kernel.perf_event_max_stack
       kernel.perf_event_max_stack = 127

     Now this patch introduces a way to configure this per event, i.e.
     this becomes possible:

       $ perf record -e sched:*/max-stack=2/ -e block:*/max-stack=10/ -a

     allowing finer tuning of how much buffer space callchains use.

     This uses an u16 from the reserved space at the end, leaving
     another u16 for future use.

     There has been interest in even finer tuning, namely to control the
     max stack for kernel and userspace callchains separately.  Further
     discussion is needed, we may for instance use the remaining u16 for
     that and when it is present, assume that the sample_max_stack
     introduced in this patch applies for the kernel, and the u16 left
     is used for limiting the userspace callchain (Arnaldo Carvalho de
     Melo)

   - Optimize AUX event (hardware assisted side-band event) delivery
     (Kan Liang)

   - Rework Intel family name macro usage (this is partially x86 arch
     work) (Dave Hansen)

   - Refine and fix Intel LBR support (David Carrillo-Cisneros)

   - Add support for Intel 'TopDown' events (Andi Kleen)

   - Intel uncore PMU driver fixes and enhancements (Kan Liang)

   - ... other misc changes.

  Here's an incomplete list of the tooling enhancements (but there's
  much more, see the shortlog and the git log for details):

   - Support cross unwinding, i.e.  collecting '--call-graph dwarf'
     perf.data files in one machine and then doing analysis in another
     machine of a different hardware architecture.  This enables, for
     instance, to do:

       $ perf record -a --call-graph dwarf

     on a x86-32 or aarch64 system and then do 'perf report' on it on a
     x86_64 workstation (He Kuang)

   - Allow reading from a backward ring buffer (one setup via
     sys_perf_event_open() with perf_event_attr.write_backward = 1)
     (Wang Nan)

   - Finish merging initial SDT (Statically Defined Traces) support, see
     cset comments for details about how it all works (Masami Hiramatsu)

   - Support attaching eBPF programs to tracepoints (Wang Nan)

   - Add demangling of symbols in programs written in the Rust language
     (David Tolnay)

   - Add support for tracepoints in the python binding, including an
     example, that sets up and parses sched:sched_switch events,
     tools/perf/python/tracepoint.py (Jiri Olsa)

   - Introduce --stdio-color to set up the color output mode selection
     in 'annotate' and 'report', allowing emit color escape sequences
     when redirecting the output of these tools (Arnaldo Carvalho de
     Melo)

   - Add 'callindent' option to 'perf script -F', to indent the Intel PT
     call stack, making this output more ftrace-like (Adrian Hunter,
     Andi Kleen)

   - Allow dumping the object files generated by llvm when processing
     eBPF scriptlet events (Wang Nan)

   - Add stackcollapse.py script to help generating flame graphs (Paolo
     Bonzini)

   - Add --ldlat option to 'perf mem' to specify load latency for loads
     event (e.g. cpu/mem-loads/ ) (Jiri Olsa)

   - Tooling support for Intel TopDown counters, recently added to the
     kernel (Andi Kleen)"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (303 commits)
  perf tests: Add is_printable_array test
  perf tools: Make is_printable_array global
  perf script python: Fix string vs byte array resolving
  perf probe: Warn unmatched function filter correctly
  perf cpu_map: Add more helpers
  perf stat: Balance opening and reading events
  tools: Copy linux/{hash,poison}.h and check for drift
  perf tools: Remove include/linux/list.h from perf's MANIFEST
  tools: Copy the bitops files accessed from the kernel and check for drift
  Remove: kernel unistd*h files from perf's MANIFEST, not used
  perf tools: Remove tools/perf/util/include/linux/const.h
  perf tools: Remove tools/perf/util/include/asm/byteorder.h
  perf tools: Add missing linux/compiler.h include to perf-sys.h
  perf jit: Remove some no-op error handling
  perf jit: Add missing curly braces
  objtool: Initialize variable to silence old compiler
  objtool: Add -I$(srctree)/tools/arch/$(ARCH)/include/uapi
  perf record: Add --tail-synthesize option
  perf session: Don't warn about out of order event if write_backward is used
  perf tools: Enable overwrite settings
  ...
2016-07-25 13:20:41 -07:00
Linus Torvalds
c86ad14d30 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The locking tree was busier in this cycle than the usual pattern - a
  couple of major projects happened to coincide.

  The main changes are:

   - implement the atomic_fetch_{add,sub,and,or,xor}() API natively
     across all SMP architectures (Peter Zijlstra)

   - add atomic_fetch_{inc/dec}() as well, using the generic primitives
     (Davidlohr Bueso)

   - optimize various aspects of rwsems (Jason Low, Davidlohr Bueso,
     Waiman Long)

   - optimize smp_cond_load_acquire() on arm64 and implement LSE based
     atomic{,64}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
     on arm64 (Will Deacon)

   - introduce smp_acquire__after_ctrl_dep() and fix various barrier
     mis-uses and bugs (Peter Zijlstra)

   - after discovering ancient spin_unlock_wait() barrier bugs in its
     implementation and usage, strengthen its semantics and update/fix
     usage sites (Peter Zijlstra)

   - optimize mutex_trylock() fastpath (Peter Zijlstra)

   - ... misc fixes and cleanups"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
  locking/atomic: Introduce inc/dec variants for the atomic_fetch_$op() API
  locking/barriers, arch/arm64: Implement LDXR+WFE based smp_cond_load_acquire()
  locking/static_keys: Fix non static symbol Sparse warning
  locking/qspinlock: Use __this_cpu_dec() instead of full-blown this_cpu_dec()
  locking/atomic, arch/tile: Fix tilepro build
  locking/atomic, arch/m68k: Remove comment
  locking/atomic, arch/arc: Fix build
  locking/Documentation: Clarify limited control-dependency scope
  locking/atomic, arch/rwsem: Employ atomic_long_fetch_add()
  locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
  locking/atomic, arch/mips: Convert to _relaxed atomics
  locking/atomic, arch/alpha: Convert to _relaxed atomics
  locking/atomic: Remove the deprecated atomic_{set,clear}_mask() functions
  locking/atomic: Remove linux/atomic.h:atomic_fetch_or()
  locking/atomic: Implement atomic{,64,_long}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
  locking/atomic: Fix atomic64_relaxed() bits
  locking/atomic, arch/xtensa: Implement atomic_fetch_{add,sub,and,or,xor}()
  locking/atomic, arch/x86: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
  locking/atomic, arch/tile: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
  locking/atomic, arch/sparc: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
  ...
2016-07-25 12:41:29 -07:00
Linus Torvalds
a2303849a6 Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
 "The biggest change in this cycle were SGI/UV related changes that
  clean up and fix UV boot quirks and problems.

  There's also various smaller cleanups and refinements"

* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi: Reorganize the GUID table to make it easier to read
  x86/efi: Remove the unused efi_get_time() function
  x86/efi: Update efi_thunk() to use the the arch_efi_call_virt*() macros
  x86/uv: Update uv_bios_call() to use efi_call_virt_pointer()
  efi: Convert efi_call_virt() to efi_call_virt_pointer()
  x86/efi: Remove unused variable 'efi'
  efi: Document #define FOO_PROTOCOL_GUID layout
  efibc: Report more information in the error messages
2016-07-25 12:30:01 -07:00
Rafael J. Wysocki
bc841e260c Merge branch 'pm-cpu'
* pm-cpu:
  x86: remove duplicate turbo ratio limit MSRs
  tools/power turbostat: Replace MSR_NHM_TURBO_RATIO_LIMIT
  cpufreq: intel_pstate: Replace MSR_NHM_TURBO_RATIO_LIMIT
2016-07-25 13:46:30 +02:00
Rafael J. Wysocki
9fedbb3b6b Merge branch 'x86/cpu' from tip 2016-07-25 13:45:39 +02:00
Rafael J. Wysocki
7f234a4d8a Merge branches 'pm-sleep' and 'pm-tools'
* pm-sleep:
  PM / hibernate: Introduce test_resume mode for hibernation
  x86 / hibernate: Use hlt_play_dead() when resuming from hibernation
  PM / hibernate: Image data protection during restoration
  PM / hibernate: Add missing braces in __register_nosave_region()
  PM / hibernate: Clean up comments in snapshot.c
  PM / hibernate: Clean up function headers in snapshot.c
  PM / hibernate: Add missing braces in hibernate_setup()
  PM / hibernate: Recycle safe pages after image restoration
  PM / hibernate: Simplify mark_unsafe_pages()
  PM / hibernate: Do not free preallocated safe pages during image restore
  PM / suspend: show workqueue state in suspend flow
  PM / sleep: make PM notifiers called symmetrically
  PM / sleep: Make pm_prepare_console() return void
  PM / Hibernate: Don't let kasan instrument snapshot.c

* pm-tools:
  PM / tools: scripts: AnalyzeSuspend v4.2
  tools/turbostat: allow user to alter DESTDIR and PREFIX
2016-07-25 13:44:32 +02:00
Rafael J. Wysocki
d5f017b796 Merge branch 'acpi-tables'
* acpi-tables:
  ACPI: Rename configfs.c to acpi_configfs.c to prevent link error
  ACPI: add support for loading SSDTs via configfs
  ACPI: add support for configfs
  efi / ACPI: load SSTDs from EFI variables
  spi / ACPI: add support for ACPI reconfigure notifications
  i2c / ACPI: add support for ACPI reconfigure notifications
  ACPI: add support for ACPI reconfiguration notifiers
  ACPI / scan: fix enumeration (visited) flags for bus rescans
  ACPI / documentation: add SSDT overlays documentation
  ACPI: ARM64: support for ACPI_TABLE_UPGRADE
  ACPI / tables: introduce ARCH_HAS_ACPI_TABLE_UPGRADE
  ACPI / tables: move arch-specific symbol to asm/acpi.h
  ACPI / tables: table upgrade: refactor function definitions
  ACPI / tables: table upgrade: use cacheable map for tables

Conflicts:
	arch/arm64/include/asm/acpi.h
2016-07-25 13:41:01 +02:00
Rafael J. Wysocki
d85f4eb699 Merge branch 'acpi-numa'
* acpi-numa:
  ACPI / NUMA: Enable ACPI based NUMA on ARM64
  arm64, ACPI, NUMA: NUMA support based on SRAT and SLIT
  ACPI / processor: Add acpi_map_madt_entry()
  ACPI / NUMA: Improve SRAT error detection and add messages
  ACPI / NUMA: Move acpi_numa_memory_affinity_init() to drivers/acpi/numa.c
  ACPI / NUMA: remove unneeded acpi_numa=1
  ACPI / NUMA: move bad_srat() and srat_disabled() to drivers/acpi/numa.c
  x86 / ACPI / NUMA: cleanup acpi_numa_processor_affinity_init()
  arm64, NUMA: Cleanup NUMA disabled messages
  arm64, NUMA: rework numa_add_memblk()
  ACPI / NUMA: move acpi_numa_slit_init() to drivers/acpi/numa.c
  ACPI / NUMA: Move acpi_numa_arch_fixup() to ia64 only
  ACPI / NUMA: remove duplicate NULL check
  ACPI / NUMA: Replace ACPI_DEBUG_PRINT() with pr_debug()
  ACPI / NUMA: Use pr_fmt() instead of printk
2016-07-25 13:40:39 +02:00
Andy Lutomirski
30f027398b x86/boot: Clarify what x86_legacy_features.reserve_bios_regions does
It doesn't just control probing for the EBDA -- it controls whether we
detect and reserve the <1MB BIOS regions in general.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Mario Limonciello <mario_limonciello@dell.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/55bd591115498440d461857a7b64f349a5d911f3.1469135598.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-22 11:46:01 +02:00
Adrian Hunter
25af37f4e1 x86/insn: Add AVX-512 support to the instruction decoder
Add support for Intel's AVX-512 instructions to the instruction decoder.

AVX-512 instructions are documented in Intel Architecture Instruction
Set Extensions Programming Reference (February 2016).

AVX-512 instructions are identified by a EVEX prefix which, for the
purpose of instruction decoding, can be treated as though it were a
4-byte VEX prefix.

Existing instructions which can now accept an EVEX prefix need not be
further annotated in the op code map (x86-opcode-map.txt). In the case
of new instructions, the op code map is updated accordingly.

Also add associated Mask Instructions that are used to manipulate mask
registers used in AVX-512 instructions.

The 'perf tools' instruction decoder is updated in a subsequent patch.
And a representative set of instructions is added to the perf tools new
instructions test in a subsequent patch.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: X86 ML <x86@kernel.org>
Link: http://lkml.kernel.org/r/1469003437-32706-3-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-21 09:37:11 -03:00
Ingo Molnar
edce21216a x86/boot: Reorganize and clean up the BIOS area reservation code
So the reserve_ebda_region() code has accumulated a number of
problems over the years that make it really difficult to read
and understand:

- The calculation of 'lowmem' and 'ebda_addr' is an unnecessarily
  interleaved mess of first lowmem, then ebda_addr, then lowmem tweaks...

- 'lowmem' here means 'super low mem' - i.e. 16-bit addressable memory. In other
  parts of the x86 code 'lowmem' means 32-bit addressable memory... This makes it
  super confusing to read.

- It does not help at all that we have various memory range markers, half of which
  are 'start of range', half of which are 'end of range' - but this crucial
  property is not obvious in the naming at all ... gave me a headache trying to
  understand all this.

- Also, the 'ebda_addr' name sucks: it highlights that it's an address (which is
  obvious, all values here are addresses!), while it does not highlight that it's
  the _start_ of the EBDA region ...

- 'BIOS_LOWMEM_KILOBYTES' says a lot of things, except that this is the only value
  that is a pointer to a value, not a memory range address!

- The function name itself is a misnomer: it says 'reserve_ebda_region()' while
  its main purpose is to reserve all the firmware ROM typically between 640K and
  1MB, while the 'EBDA' part is only a small part of that ...

- Likewise, the paravirt quirk flag name 'ebda_search' is misleading as well: this
  too should be about whether to reserve firmware areas in the paravirt case.

- In fact thinking about this as 'end of RAM' is confusing: what this function
  *really* wants to reserve is firmware data and code areas! Once the thinking is
  inverted from a mixed 'ram' and 'reserved firmware area' notion to a pure
  'reserved area' notion everything becomes a lot clearer.

To improve all this rewrite the whole code (without changing the logic):

- Firstly invert the naming from 'lowmem end' to 'BIOS reserved area start'
  and propagate this concept through all the variable names and constants.

	BIOS_RAM_SIZE_KB_PTR		// was: BIOS_LOWMEM_KILOBYTES

	BIOS_START_MIN			// was: INSANE_CUTOFF

	ebda_start			// was: ebda_addr
	bios_start			// was: lowmem

	BIOS_START_MAX			// was: LOWMEM_CAP

- Then clean up the name of the function itself by renaming it
  to reserve_bios_regions() and renaming the ::ebda_search paravirt
  flag to ::reserve_bios_regions.

- Fix up all the comments (fix typos), harmonize and simplify their
  formulation and remove comments that become unnecessary due to
  the much better naming all around.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-21 10:11:57 +02:00
Rafael J. Wysocki
406f992e4a x86 / hibernate: Use hlt_play_dead() when resuming from hibernation
On Intel hardware, native_play_dead() uses mwait_play_dead() by
default and only falls back to the other methods if that fails.
That also happens during resume from hibernation, when the restore
(boot) kernel runs disable_nonboot_cpus() to take all of the CPUs
except for the boot one offline.

However, that is problematic, because the address passed to
__monitor() in mwait_play_dead() is likely to be written to in the
last phase of hibernate image restoration and that causes the "dead"
CPU to start executing instructions again.  Unfortunately, the page
containing the address in that CPU's instruction pointer may not be
valid any more at that point.

First, that page may have been overwritten with image kernel memory
contents already, so the instructions the CPU attempts to execute may
simply be invalid.  Second, the page tables previously used by that
CPU may have been overwritten by image kernel memory contents, so the
address in its instruction pointer is impossible to resolve then.

A report from Varun Koyyalagunta and investigation carried out by
Chen Yu show that the latter sometimes happens in practice.

To prevent it from happening, temporarily change the smp_ops.play_dead
pointer during resume from hibernation so that it points to a special
"play dead" routine which uses hlt_play_dead() and avoids the
inadvertent "revivals" of "dead" CPUs this way.

A slightly unpleasant consequence of this change is that if the
system is hibernated with one or more CPUs offline, it will generally
draw more power after resume than it did before hibernation, because
the physical state entered by CPUs via hlt_play_dead() is higher-power
than the mwait_play_dead() one in the majority of cases.  It is
possible to work around this, but it is unclear how much of a problem
that's going to be in practice, so the workaround will be implemented
later if it turns out to be necessary.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=106371
Reported-by: Varun Koyyalagunta <cpudebug@centtech.com>
Original-by: Chen Yu <yu.c.chen@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 22:42:48 +02:00
Wei Jiangang
102bb9fef6 x86/apic: Remove the unused struct apic::apic_id_mask field
The only user verify_local_APIC() had been removed by commit:

  4399c03c67 ("x86/apic: Remove verify_local_APIC()")

... so there is no need to keep it.

Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boris.ostrovsky@oracle.com
Cc: bsd@redhat.com
Cc: david.vrabel@citrix.com
Cc: jgross@suse.com
Cc: konrad.wilk@oracle.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1468463046-20849-1-git-send-email-weijg.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:39:05 +02:00
Wei Jiangang
c48ec42d6e x86/tsc: Remove the unused check_tsc_disabled()
check_tsc_disabled() was introduced by commit:

  c73deb6aec ("perf/x86: Add ability to calculate TSC from perf sample timestamps")

The only caller was arch_perf_update_userpage(), which had been refactored
by commit:

  d8b11a0cbd ("perf/x86: Clean up cap_user_time* setting")

... so no need keep and export it any more.

Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: a.p.zijlstra@chello.nl
Cc: adrian.hunter@intel.com
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/1468570330-25810-1-git-send-email-weijg.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:35:08 +02:00
H.J. Lu
3ebfd81f7f x86/syscalls: Add compat_sys_preadv64v2/compat_sys_pwritev64v2
Don't use the same syscall numbers for 2 different syscalls:

 534	x32	preadv			compat_sys_preadv64
 535	x32	pwritev			compat_sys_pwritev64
 534	x32	preadv2			compat_sys_preadv2
 535	x32	pwritev2		compat_sys_pwritev2

Add compat_sys_preadv64v2() and compat_sys_pwritev64v2() so that 64-bit offset
is passed in one 64-bit register on x32, similar to compat_sys_preadv64()
and compat_sys_pwritev64().

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/CAMe9rOovCMf-RQfx_n1U_Tu_DX1BYkjtFr%3DQ4-_PFVSj9BCzUA@mail.gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:30:26 +02:00
Andy Lutomirski
fb59831b49 x86/smp: Remove stack_smp_processor_id()
It serves no purpose -- raw_smp_processor_id() works fine.  This
change will be needed to move thread_info off the stack.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a2bf4f07fbc30fb32f9f7f3f8f94ad3580823847.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:26:30 +02:00
Andy Lutomirski
13d4ea097d x86/uaccess: Move thread_info::addr_limit to thread_struct
struct thread_info is a legacy mess.  To prepare for its partial removal,
move thread_info::addr_limit out.

As an added benefit, this way is simpler.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/15bee834d09402b47ac86f2feccdf6529f9bc5b0.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:26:30 +02:00
Ingo Molnar
2a53ccbc0d x86/dumpstack: Rename thread_struct::sig_on_uaccess_error to sig_on_uaccess_err
Rename it to match the thread_struct::uaccess_err pattern and also
because it was too long.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:26:29 +02:00
Andy Lutomirski
dfa9a942fd x86/uaccess: Move thread_info::uaccess_err and thread_info::sig_on_uaccess_err to thread_struct
struct thread_info is a legacy mess.  To prepare for its partial removal,
move the uaccess control fields out -- they're straightforward.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/d0ac4d01c8e4d4d756264604e47445d5acc7900e.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:26:28 +02:00
Andy Lutomirski
d92fc69cca x86/mm: Remove kernel_unmap_pages_in_pgd() and efi_cleanup_page_tables()
kernel_unmap_pages_in_pgd() is dangerous: if a PGD entry in
init_mm.pgd were to be cleared, callers would need to ensure that
the pgd entry hadn't been propagated to any other pgd.

Its only caller was efi_cleanup_page_tables(), and that, in turn,
was unused, so just delete both functions.  This leaves a couple of
other helpers unused, so delete them, too.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/77ff20fdde3b75cd393be5559ad8218870520248.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:26:26 +02:00
Ingo Molnar
38452af242 Merge branch 'x86/asm' into x86/mm, to resolve conflicts
Conflicts:
	tools/testing/selftests/x86/Makefile

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:26:04 +02:00
Dave Hansen
97e3c602cc x86/mm: Ignore A/D bits in pte/pmd/pud_none()
The erratum we are fixing here can lead to stray setting of the
A and D bits.  That means that a pte that we cleared might
suddenly have A/D set.  So, stop considering those bits when
determining if a pte is pte_none().  The same goes for the
other pmd_none() and pud_none().  pgd_none() can be skipped
because it is not affected; we do not use PGD entries for
anything other than pagetables on affected configurations.

This adds a tiny amount of overhead to all pte_none() checks.
I doubt we'll be able to measure it anywhere.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: dave.hansen@intel.com
Cc: linux-mm@kvack.org
Cc: mhocko@suse.com
Link: http://lkml.kernel.org/r/20160708001912.5216F89C@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-13 09:43:25 +02:00
Dave Hansen
00839ee3b2 x86/mm: Move swap offset/type up in PTE to work around erratum
This erratum can result in Accessed/Dirty getting set by the hardware
when we do not expect them to be (on !Present PTEs).

Instead of trying to fix them up after this happens, we just
allow the bits to get set and try to ignore them.  We do this by
shifting the layout of the bits we use for swap offset/type in
our 64-bit PTEs.

It looks like this:

 bitnrs: |     ...            | 11| 10|  9|8|7|6|5| 4| 3|2|1|0|
 names:  |     ...            |SW3|SW2|SW1|G|L|D|A|CD|WT|U|W|P|
 before: |         OFFSET (9-63)          |0|X|X| TYPE(1-5) |0|
  after: | OFFSET (14-63)  |  TYPE (9-13) |0|X|X|X| X| X|X|X|0|

Note that D was already a don't care (X) even before.  We just
move TYPE up and turn its old spot (which could be hit by the
A bit) into all don't cares.

We take 5 bits away from the offset, but that still leaves us
with 50 bits which lets us index into a 62-bit swapfile (4 EiB).
I think that's probably fine for the moment.  We could
theoretically reclaim 5 of the bits (1, 2, 3, 4, 7) but it
doesn't gain us anything.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: dave.hansen@intel.com
Cc: linux-mm@kvack.org
Cc: mhocko@suse.com
Link: http://lkml.kernel.org/r/20160708001911.9A3FD2B6@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-13 09:43:25 +02:00
Andy Shevchenko
05f310e26f x86/sfi: Enable enumeration of SD devices
SFI specification v0.8.2 defines type of devices which are connected to
SD bus. In particularly WiFi dongle is a such.

Add a callback to enumerate the devices connected to SD bus.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1468322192-62080-1-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-13 09:24:51 +02:00
Len Brown
aa297292d7 x86/tsc: Enumerate SKL cpu_khz and tsc_khz via CPUID
Skylake CPU base-frequency and TSC frequency may differ
by up to 2%.

Enumerate CPU and TSC frequencies separately, allowing
cpu_khz and tsc_khz to differ.

The existing CPU frequency calibration mechanism is unchanged.
However, CPUID extensions are preferred, when available.

CPUID.0x16 is preferred over MSR and timer calibration
for CPU frequency discovery.

CPUID.0x15 takes precedence over CPU-frequency
for TSC frequency discovery.

Signed-off-by: Len Brown <len.brown@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/b27ec289fd005833b27d694d9c2dbb716c5cdff7.1466138954.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-11 21:30:13 +02:00
Len Brown
02c0cd2dcf x86/tsc_msr: Remove irqoff around MSR-based TSC enumeration
Remove the irqoff/irqon around MSR-based TSC enumeration,
as it is not necessary.

Also rename: try_msr_calibrate_tsc() to cpu_khz_from_msr(),
as that better describes what the routine does.

Signed-off-by: Len Brown <len.brown@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/a6b5c3ecd3b068175d2309599ab28163fc34215e.1466138954.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-11 21:30:12 +02:00
Yu-cheng Yu
35ac2d7ba7 x86/fpu/xstate: Fix fpstate_init() for XRSTORS
In XSAVES mode if fpstate_init() is used to initialize a
task's extended state area, xsave.header.xcomp_bv[63] must
be set. Otherwise, when the task is scheduled, a warning is
triggered from copy_kernel_to_xregs().

One such test case is: setting an invalid extended state
through PTRACE. When xstateregs_set() rejects the syscall
and re-initializes the task's extended state area. This triggers
the warning mentioned above.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: H. Peter Anvin <h.peter.anvin@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V Shankar <ravi.v.shankar@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1468253937-40008-4-git-send-email-fenghua.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-11 16:44:00 +02:00
Andy Shevchenko
06a3fcc44d x86/platform/intel-mid: Make vertical indentation consistent
The vertical indentation is kinda chaotic in intel-mid.h. Let's be
consistent with it.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1465992260-29897-1-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10 17:22:31 +02:00
Yu-cheng Yu
91c3dba7db x86/fpu/xstate: Fix PTRACE frames for XSAVES
XSAVES uses compacted format and is a kernel instruction. The kernel
should use standard-format, non-supervisor state data for PTRACE.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
[ Edited away artificial linebreaks. ]
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/de3d80949001305fe389799973b675cab055c457.1466179491.git.yu-cheng.yu@intel.com
[ Made various readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10 17:12:10 +02:00
Yu-cheng Yu
1499ce2dd4 x86/fpu/xstate: Fix supervisor xstate component offset
CPUID function 0x0d, sub function (i, i > 1) returns in ebx the offset of
xstate component i. Zero is returned for a supervisor state. A supervisor
state can only be saved by XSAVES and XSAVES uses a compacted format.
There is no fixed offset for a supervisor state. This patch checks and
makes sure a supervisor state offset is not recorded or mis-used. This has
no effect in practice as we currently use no supervisor states, but it
would be good to fix.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/81b29e40d35d4cec9f2511a856fe769f34935a3f.1466179491.git.yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10 17:12:10 +02:00
Ingo Molnar
08fb98f5bf Merge branch 'linus' into x86/fpu, to pick up fixes before applying new changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10 17:11:17 +02:00
Dave Hansen
8709ed4d4b x86/cpu: Fix duplicated X86_BUG(9) macro
cpufeatures.h currently defines X86_BUG(9) twice on 32-bit:

	#define X86_BUG_NULL_SEG        X86_BUG(9) /* Nulling a selector preserves the base */
	...
	#ifdef CONFIG_X86_32
	#define X86_BUG_ESPFIX          X86_BUG(9) /* "" IRET to 16-bit SS corrupts ESP/RSP high bits */
	#endif

I think what happened was that this added the X86_BUG_ESPFIX, but
in an #ifdef below most of the bugs:

	58a5aac533 x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabled

Then this came along and added X86_BUG_NULL_SEG, but collided
with the earlier one that did the bug below the main block
defining all the X86_BUG()s.

	7a5d670487 x86/cpu: Probe the behavior of nulling out a segment at boot time

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20160618001503.CEE1B141@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-09 14:06:06 +02:00
Ingo Molnar
52e31f89cc Merge branch 'linus' into x86/asm, to pick up fixes before merging new changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-09 10:43:49 +02:00
Thomas Garnier
a95ae27c2e x86/mm: Enable KASLR for vmalloc memory regions
Add vmalloc to the list of randomized memory regions.

The vmalloc memory region contains the allocation made through the vmalloc()
API. The allocations are done sequentially to prevent fragmentation and
each allocation address can easily be deduced especially from boot.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Alexander Popov <alpopov@ptsecurity.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lv Zheng <lv.zheng@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/1466556426-32664-8-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 17:35:21 +02:00
Thomas Garnier
021182e52f x86/mm: Enable KASLR for physical mapping memory regions
Add the physical mapping in the list of randomized memory regions.

The physical memory mapping holds most allocations from boot and heap
allocators. Knowing the base address and physical memory size, an attacker
can deduce the PDE virtual address for the vDSO memory page. This attack
was demonstrated at CanSecWest 2016, in the following presentation:

  "Getting Physical: Extreme Abuse of Intel Based Paged Systems":
  https://github.com/n3k/CansecWest2016_Getting_Physical_Extreme_Abuse_of_Intel_Based_Paging_Systems/blob/master/Presentation/CanSec2016_Presentation.pdf

(See second part of the presentation).

The exploits used against Linux worked successfully against 4.6+ but
fail with KASLR memory enabled:

  https://github.com/n3k/CansecWest2016_Getting_Physical_Extreme_Abuse_of_Intel_Based_Paging_Systems/tree/master/Demos/Linux/exploits

Similar research was done at Google leading to this patch proposal.

Variants exists to overwrite /proc or /sys objects ACLs leading to
elevation of privileges. These variants were tested against 4.6+.

The page offset used by the compressed kernel retains the static value
since it is not yet randomized during this boot stage.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Alexander Popov <alpopov@ptsecurity.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lv Zheng <lv.zheng@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/1466556426-32664-7-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 17:35:15 +02:00
Thomas Garnier
0483e1fa6e x86/mm: Implement ASLR for kernel memory regions
Randomizes the virtual address space of kernel memory regions for
x86_64. This first patch adds the infrastructure and does not randomize
any region. The following patches will randomize the physical memory
mapping, vmalloc and vmemmap regions.

This security feature mitigates exploits relying on predictable kernel
addresses. These addresses can be used to disclose the kernel modules
base addresses or corrupt specific structures to elevate privileges
bypassing the current implementation of KASLR. This feature can be
enabled with the CONFIG_RANDOMIZE_MEMORY option.

The order of each memory region is not changed. The feature looks at the
available space for the regions based on different configuration options
and randomizes the base and space between each. The size of the physical
memory mapping is the available physical memory. No performance impact
was detected while testing the feature.

Entropy is generated using the KASLR early boot functions now shared in
the lib directory (originally written by Kees Cook). Randomization is
done on PGD & PUD page table levels to increase possible addresses. The
physical memory mapping code was adapted to support PUD level virtual
addresses. This implementation on the best configuration provides 30,000
possible virtual addresses in average for each memory region.  An
additional low memory page is used to ensure each CPU can start with a
PGD aligned virtual address (for realmode).

x86/dump_pagetable was updated to correctly display each region.

Updated documentation on x86_64 memory layout accordingly.

Performance data, after all patches in the series:

Kernbench shows almost no difference (-+ less than 1%):

Before:

Average Optimal load -j 12 Run (std deviation): Elapsed Time 102.63 (1.2695)
User Time 1034.89 (1.18115) System Time 87.056 (0.456416) Percent CPU 1092.9
(13.892) Context Switches 199805 (3455.33) Sleeps 97907.8 (900.636)

After:

Average Optimal load -j 12 Run (std deviation): Elapsed Time 102.489 (1.10636)
User Time 1034.86 (1.36053) System Time 87.764 (0.49345) Percent CPU 1095
(12.7715) Context Switches 199036 (4298.1) Sleeps 97681.6 (1031.11)

Hackbench shows 0% difference on average (hackbench 90 repeated 10 times):

attemp,before,after 1,0.076,0.069 2,0.072,0.069 3,0.066,0.066 4,0.066,0.068
5,0.066,0.067 6,0.066,0.069 7,0.067,0.066 8,0.063,0.067 9,0.067,0.065
10,0.068,0.071 average,0.0677,0.0677

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Alexander Popov <alpopov@ptsecurity.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lv Zheng <lv.zheng@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/1466556426-32664-6-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 17:33:46 +02:00
Thomas Garnier
b234e8a090 x86/mm: Separate variable for trampoline PGD
Use a separate global variable to define the trampoline PGD used to
start other processors. This change will allow KALSR memory
randomization to change the trampoline PGD to be correctly aligned with
physical memory.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Alexander Popov <alpopov@ptsecurity.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lv Zheng <lv.zheng@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/1466556426-32664-5-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 17:33:46 +02:00
Thomas Garnier
d899a7d146 x86/mm: Refactor KASLR entropy functions
Move the KASLR entropy functions into arch/x86/lib to be used in early
kernel boot for KASLR memory randomization.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Alexander Popov <alpopov@ptsecurity.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lv Zheng <lv.zheng@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/1466556426-32664-2-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 17:33:45 +02:00
Ingo Molnar
946e0f6ffc Linux 4.7-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXefulAAoJEHm+PkMAQRiG6nMH/2O1vcZeOtqmx2yCMUeXyKAT
 wG88XflXzf3rM7C7TiObEYVf/bbLleJ7saDLEeic7ButD5gyYacIuzylVnrcqfBc
 vinz4cOw5kvu9DrRkCKdOfiTAgwYtqQW+syJ8ZK4lPQuSxnPAs+F/FKSOpyUF5FN
 Dngr520KjYKBEtn27W9UDPChFRwQoWAlaOC534eusaArCJtHGHHiuq5TEDn2EIo8
 pUw2vwx5JiquSHOY34WLU7r+QoilovCQlUSsBQdLlPjfMB1QFtclPYa+5yEMjkT4
 wusOUOfS/zK0rV6KnEdc/SkpiVX5C9WpFiWUOdEeJ5mZ+KijVkaOa9K1EDx8jSM=
 =7Hwh
 -----END PGP SIGNATURE-----

Merge tag 'v4.7-rc6' into x86/mm, to merge fixes before applying new changes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 11:51:28 +02:00