Commit Graph

47391 Commits

Author SHA1 Message Date
Mel Gorman
4011337083 mm: page_alloc: remove GFP_IOFS
GFP_IOFS was intended to be shorthand for clearing two flags, not a set of
allocation flags.  There is only one user of this flag combination now and
there appears to be no reason why Lustre had to be protected from reclaim
stalls.  As none of the sites appear to be atomic, this patch simply
deletes GFP_IOFS and converts Lustre to using GFP_KERNEL, GFP_NOFS or
GFP_NOIO as appropriate.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Mel Gorman
d0164adc89 mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd
__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts.  They are expected to be high priority and
have access one of two watermarks lower than "min" which can be referred
to as the "atomic reserve".  __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".

Over time, callers had a requirement to not block when fallback options
were available.  Some have abused __GFP_WAIT leading to a situation where
an optimisitic allocation with a fallback option can access atomic
reserves.

This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
cannot sleep and have no alternative.  High priority users continue to use
__GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM to identify
callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.

This patch then converts a number of sites

o __GFP_ATOMIC is used by callers that are high priority and have memory
  pools for those requests. GFP_ATOMIC uses this flag.

o Callers that have a limited mempool to guarantee forward progress clear
  __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
  into this category where kswapd will still be woken but atomic reserves
  are not used as there is a one-entry mempool to guarantee progress.

o Callers that are checking if they are non-blocking should use the
  helper gfpflags_allow_blocking() where possible. This is because
  checking for __GFP_WAIT as was done historically now can trigger false
  positives. Some exceptions like dm-crypt.c exist where the code intent
  is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
  flag manipulations.

o Callers that built their own GFP flags instead of starting with GFP_KERNEL
  and friends now also need to specify __GFP_KSWAPD_RECLAIM.

The first key hazard to watch out for is callers that removed __GFP_WAIT
and was depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.

The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL.  They may
now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Mel Gorman
016c13daa5 mm, page_alloc: use masks and shifts when converting GFP flags to migrate types
This patch redefines which GFP bits are used for specifying mobility and
the order of the migrate types.  Once redefined it's possible to convert
GFP flags to a migrate type with a simple mask and shift.  The only
downside is that readers of OOM kill messages and allocation failures may
have been used to the existing values but scripts/gfp-translate will help.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Mel Gorman
46e700abc4 mm, page_alloc: remove unnecessary taking of a seqlock when cpusets are disabled
There is a seqcounter that protects against spurious allocation failures
when a task is changing the allowed nodes in a cpuset.  There is no need
to check the seqcounter until a cpuset exists.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Mel Gorman
e2b19197ff mm, page_alloc: remove unnecessary parameter from zone_watermark_ok_safe
Overall, the intent of this series is to remove the zonelist cache which
was introduced to avoid high overhead in the page allocator.  Once this is
done, it is necessary to reduce the cost of watermark checks.

The series starts with minor micro-optimisations.

Next it notes that GFP flags that affect watermark checks are abused.
__GFP_WAIT historically identified callers that could not sleep and could
access reserves.  This was later abused to identify callers that simply
prefer to avoid sleeping and have other options.  A patch distinguishes
between atomic callers, high-priority callers and those that simply wish
to avoid sleep.

The zonelist cache has been around for a long time but it is of dubious
merit with a lot of complexity and some issues that are explained.  The
most important issue is that a failed THP allocation can cause a zone to
be treated as "full".  This potentially causes unnecessary stalls, reclaim
activity or remote fallbacks.  The issues could be fixed but it's not
worth it.  The series places a small number of other micro-optimisations
on top before examining GFP flags watermarks.

High-order watermarks enforcement can cause high-order allocations to fail
even though pages are free.  The watermark checks both protect high-order
atomic allocations and make kswapd aware of high-order pages but there is
a much better way that can be handled using migrate types.  This series
uses page grouping by mobility to reserve pageblocks for high-order
allocations with the size of the reservation depending on demand.  kswapd
awareness is maintained by examining the free lists.  By patch 12 in this
series, there are no high-order watermark checks while preserving the
properties that motivated the introduction of the watermark checks.

This patch (of 10):

No user of zone_watermark_ok_safe() specifies alloc_flags.  This patch
removes the unnecessary parameter.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Linus Torvalds
bc914532a0 - New Device Support
- Add support for 88pm860; 88pm80x
    - Add support for 24c08 EEPROM; at24
    - Add support for Broxton Whiskey Cove; intel*
    - Add support for RTS522A; rts5227
    - Add support for I2C devices; intel_quark_i2c_gpio
  - New Functionality
    - Add microphone support; arizona
    - Add general purpose switch support; arizona
    - Add fuel-gauge support; da9150-core
    - Add shutdown support; sec-core
    - Add charger support; tps65217
    - Add flexible serial communication unit support; atmel-flexcom
    - Add power button support; axp20x
    - Add led-flash support; rt5033
  - Core Frameworks
    - Supply a generic macro for defining Regmap IRQs
    - Rework ACPI child device matching
  - Fix-ups
    - Use Regmap to access registers; tps6105x
    - Use DEFINE_RES_IRQ_NAMED() macro; da9150
    - Re-arrange device registration order; intel_quark_i2c_gpio
    - Allow OF matching; cros_ec_i2c, atmel-hlcdc, hi6421-pmic, max8997, sm501
    - Handle deferred probe; twl6040
    - Improve accuracy of headphone detect; arizona
    - Unnecessary MODULE_ALIAS() removal; bcm590xx, rt5033
    - Remove unused code; htc-i2cpld, arizona, pcf50633-irq, sec-core
    - Simplify code; kempld, rts5209, da903x, lm3533, da9052, arizona
    - Remove #iffery; arizona
    - DT binding adaptions; many
  - Bug Fixes
    - Fix possible NULL pointer dereference; wm831x, tps6105x
    - Fix 64bit bug; intel_soc_pmic_bxtwc
    - Fix signedness issue; arizona
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWOzEbAAoJEFGvii+H/HdhbUEQAIBHvNaunWp362L2D2lOOQMa
 cBlGiR2yXCpsFGAcvpr50zX5soFuNITGea1CidJmp/ZIHIG7hVUz5E8wQZ8HdiyG
 03UKsa6xkVx9i+PhV5agUjxi6MDgUvxsPwdbVw6LTw1wQejIrfxq7ObdcPo30OTI
 4r0tjLonQkGaFFdGqC4Jr7Grs7pvzmKzHu08AaknaoTyhUl/E9Wtq12HJFr8kUCw
 WNgEyiPzNk0h9EByCUiG7iFwyg1EJLsF1U973Th8Utvl7PFe8REQVZ46v9iYAM9J
 1OeSrsKhtyhQqAP/FJ5odZCXwTy9q6ifGfELyyS26/okCYWDzxDcF+hAp0VNbzb4
 zYvEnKue101Y+WEsF9bNaya3avf45obJuRaRukoGMZ/brW6+d1ub0kwUKMD8MqOn
 UFL3oZwfPQ8mWB5ddELl97dzJDwHbEIhWe7yKQVpTaJ2pCm6JSS5Vb0TU+A1Zywp
 Hie1Yr0y36npnCbml4W74O9H/rblpNpgFlaupfB6mHRWZyTsyQNeKJ6FWTirncgQ
 xy8Ti+ZsXlMTvWCzzS3GcjP9rAK0e5dKurOE8t+krZmTul96+OYsnvxRC0nyo7XC
 1PKomkWO+3if2DsnurFX/QKfE55PCKermJRO5mDZbsWp6aJAhKW5UwxAgkHpUHsJ
 hHaT+npBczXbq8jvEMTt
 =99/D
 -----END PGP SIGNATURE-----

Merge tag 'mfd-for-linus-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd

Pull MFD updates from Lee Jones:
 "New Device Support:
   - Add support for 88pm860; 88pm80x
   - Add support for 24c08 EEPROM; at24
   - Add support for Broxton Whiskey Cove; intel*
   - Add support for RTS522A; rts5227
   - Add support for I2C devices; intel_quark_i2c_gpio

  New Functionality:
   - Add microphone support; arizona
   - Add general purpose switch support; arizona
   - Add fuel-gauge support; da9150-core
   - Add shutdown support; sec-core
   - Add charger support; tps65217
   - Add flexible serial communication unit support; atmel-flexcom
   - Add power button support; axp20x
   - Add led-flash support; rt5033

  Core Frameworks:
   - Supply a generic macro for defining Regmap IRQs
   - Rework ACPI child device matching

  Fix-ups:
   - Use Regmap to access registers; tps6105x
   - Use DEFINE_RES_IRQ_NAMED() macro; da9150
   - Re-arrange device registration order; intel_quark_i2c_gpio
   - Allow OF matching; cros_ec_i2c, atmel-hlcdc, hi6421-pmic, max8997, sm501
   - Handle deferred probe; twl6040
   - Improve accuracy of headphone detect; arizona
   - Unnecessary MODULE_ALIAS() removal; bcm590xx, rt5033
   - Remove unused code; htc-i2cpld, arizona, pcf50633-irq, sec-core
   - Simplify code; kempld, rts5209, da903x, lm3533, da9052, arizona
   - Remove #iffery; arizona
   - DT binding adaptions; many

  Bug Fixes:
   - Fix possible NULL pointer dereference; wm831x, tps6105x
   - Fix 64bit bug; intel_soc_pmic_bxtwc
   - Fix signedness issue; arizona"

* tag 'mfd-for-linus-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (73 commits)
  bindings: mfd: s2mps11: Add documentation for s2mps15 PMIC
  mfd: sec-core: Remove unused s2mpu02-rtc and s2mpu02-clk children
  extcon: arizona: Add extcon specific device tree binding document
  MAINTAINERS: Add binding docs for Cirrus Logic/Wolfson Arizona devices
  mfd: arizona: Remove bindings covered in new subsystem specific docs
  mfd: rt5033: Add RT5033 Flash led sub device
  mfd: lpss: Add Intel Broxton PCI IDs
  mfd: lpss: Add Broxton ACPI IDs
  mfd: arizona: Signedness bug in arizona_runtime_suspend()
  mfd: axp20x: Add a cell for the power button part of the, axp288 PMICs
  mfd: dt-bindings: Document pulled down WRSTBI pin on S2MPS1X
  mfd: sec-core: Disable buck voltage reset on watchdog falling edge
  mfd: sec-core: Dump PMIC revision to find out the HW
  mfd: arizona: Use correct type ID for device tree config
  mfd: arizona: Remove use of codec build config #ifdefs
  mfd: arizona: Simplify adding subdevices
  mfd: arizona: Downgrade type mismatch messages to dev_warn
  mfd: arizona: Factor out checking of jack detection state
  mfd: arizona: Factor out DCVDD isolation control
  mfd: Make TPS6105X select REGMAP_I2C
  ...
2015-11-06 10:23:50 -08:00
Linus Torvalds
2f4bf528ec powerpc updates for 4.4
- Kconfig: remove BE-only platforms from LE kernel build from Boqun Feng
  - Refresh ps3_defconfig from Geoff Levand
  - Emit GNU & SysV hashes for the vdso from Michael Ellerman
  - Define an enum for the bolted SLB indexes from Anshuman Khandual
  - Use a local to avoid multiple calls to get_slb_shadow() from Michael Ellerman
  - Add gettimeofday() benchmark from Michael Neuling
  - Avoid link stack corruption in __get_datapage() from Michael Neuling
  - Add virt_to_pfn and use this instead of opencoding from Aneesh Kumar K.V
  - Add ppc64le_defconfig from Michael Ellerman
  - pseries: extract of_helpers module from Andy Shevchenko
  - Correct string length in pseries_of_derive_parent() from Nathan Fontenot
  - Free the MSI bitmap if it was slab allocated from Denis Kirjanov
  - Shorten irq_chip name for the SIU from Christophe Leroy
  - Wait 1s for secondaries to enter OPAL during kexec from Samuel Mendoza-Jonas
  - Fix _ALIGN_* errors due to type difference. from Aneesh Kumar K.V
  - powerpc/pseries/hvcserver: don't memset pi_buff if it is null from Colin Ian King
  - Disable hugepd for 64K page size. from Aneesh Kumar K.V
  - Differentiate between hugetlb and THP during page walk from Aneesh Kumar K.V
  - Make PCI non-optional for pseries from Michael Ellerman
  - Individual System V IPC system calls from Sam bobroff
  - Add selftest of unmuxed IPC calls from Michael Ellerman
  - discard .exit.data at runtime from Stephen Rothwell
  - Delete old orphaned PrPMC 280/2800 DTS and boot file. from Paul Gortmaker
  - Use of_get_next_parent to simplify code from Christophe Jaillet
  - Paginate some xmon output from Sam bobroff
  - Add some more elements to the xmon PACA dump from Michael Ellerman
  - Allow the tm-syscall selftest to build with old headers from Michael Ellerman
  - Run EBB selftests only on POWER8 from Denis Kirjanov
  - Drop CONFIG_TUNE_CELL in favour of CONFIG_CELL_CPU from Michael Ellerman
  - Avoid reference to potentially freed memory in prom.c from Christophe Jaillet
  - Quieten boot wrapper output with run_cmd from Geoff Levand
  - EEH fixes and cleanups from Gavin Shan
  - Fix recursive fenced PHB on Broadcom shiner adapter from Gavin Shan
  - Use of_get_next_parent() in of_get_ibm_chip_id() from Michael Ellerman
  - Fix section mismatch warning in msi_bitmap_alloc() from Denis Kirjanov
  - Fix ps3-lpm white space from Rudhresh Kumar J
  - Fix ps3-vuart null dereference from Colin King
  - nvram: Add missing kfree in error path from Christophe Jaillet
  - nvram: Fix function name in some errors messages. from Christophe Jaillet
  - drivers/macintosh: adb: fix misleading Kconfig help text from Aaro Koskinen
  - agp/uninorth: fix a memleak in create_gatt_table from Denis Kirjanov
  - cxl: Free virtual PHB when removing from Andrew Donnellan
  - scripts/kconfig/Makefile: Allow KBUILD_DEFCONFIG to be a target from Michael Ellerman
  - scripts/kconfig/Makefile: Fix KBUILD_DEFCONFIG check when building with O= from Michael Ellerman
 
  - Freescale updates from Scott: Highlights include 64-bit book3e kexec/kdump
    support, a rework of the qoriq clock driver, device tree changes including
    qoriq fman nodes, support for a new 85xx board, and some fixes.
 
  - MPC5xxx updates from Anatolij: Highlights include a driver for MPC512x
    LocalPlus Bus FIFO with its device tree binding documentation, mpc512x
    device tree updates and some minor fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWPEZgAAoJEFHr6jzI4aWANjYQAKX2Q/95hqKfCuF5FBcUmtMC
 Pu/Nff027MVzxZ2ApDcvvLGps5Nz2bn3nIhc9zjkXc5E8DuL6X3Yl8ce7qyNcc3g
 cJJ8RvtUo6J1OMWetXFehtPYniAAwKMhZYKnj0+WnLr2SyH/Vhl3ehDkFbGyPtuH
 r+2E7krFjfVgU+bzciIFnOaDekFuFN/pXWMb6e6zQyBJe9N8ZIp96uouGCebKVd0
 VDLItzdaKErT8JFfbymMPvZm3V0rMVx4WWu3kAbQX8LrD5a18NF1zrjAOHRXc61n
 kkk8/DPuNOon1PbXXyiS5BcFyZRe+KE3VBnoW5sOMqMIRg5WdO1oU3e2pEfXMO8+
 leXYwFLXiKzUZuOgQG2QiUhrzD2yC1o6/TJWATv0dSl9AwrecgPX+Vj6X357slAf
 A9E3eMy5tgnpndBWZmvZS3W7YDKH+NkeZ+Q40+NErAlqr++ErrTcKVndk5vWlYTT
 7mMZeTXagX66al/k5ATKqwB7iUSpnYHSAa9fcUYPSM2FnXsDxPyeJGkBbcoOmkGj
 QrpgNYOvJaUJd076goZCV39v0c1xpfV9/9kyVch8HUadf6JcjpVZwYnbGw2qlJjh
 ZanuBG2VOeSwaKQqXiRBSBetnpAg8CVpFjDmX9wOBfSek2wxEJqDX/vQExdbIDQQ
 pUs7vnUxLzhmW/x+ygOI
 =YwcM
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Kconfig: remove BE-only platforms from LE kernel build from Boqun
   Feng
 - Refresh ps3_defconfig from Geoff Levand
 - Emit GNU & SysV hashes for the vdso from Michael Ellerman
 - Define an enum for the bolted SLB indexes from Anshuman Khandual
 - Use a local to avoid multiple calls to get_slb_shadow() from Michael
   Ellerman
 - Add gettimeofday() benchmark from Michael Neuling
 - Avoid link stack corruption in __get_datapage() from Michael Neuling
 - Add virt_to_pfn and use this instead of opencoding from Aneesh Kumar
   K.V
 - Add ppc64le_defconfig from Michael Ellerman
 - pseries: extract of_helpers module from Andy Shevchenko
 - Correct string length in pseries_of_derive_parent() from Nathan
   Fontenot
 - Free the MSI bitmap if it was slab allocated from Denis Kirjanov
 - Shorten irq_chip name for the SIU from Christophe Leroy
 - Wait 1s for secondaries to enter OPAL during kexec from Samuel
   Mendoza-Jonas
 - Fix _ALIGN_* errors due to type difference, from Aneesh Kumar K.V
 - powerpc/pseries/hvcserver: don't memset pi_buff if it is null from
   Colin Ian King
 - Disable hugepd for 64K page size, from Aneesh Kumar K.V
 - Differentiate between hugetlb and THP during page walk from Aneesh
   Kumar K.V
 - Make PCI non-optional for pseries from Michael Ellerman
 - Individual System V IPC system calls from Sam bobroff
 - Add selftest of unmuxed IPC calls from Michael Ellerman
 - discard .exit.data at runtime from Stephen Rothwell
 - Delete old orphaned PrPMC 280/2800 DTS and boot file, from Paul
   Gortmaker
 - Use of_get_next_parent to simplify code from Christophe Jaillet
 - Paginate some xmon output from Sam bobroff
 - Add some more elements to the xmon PACA dump from Michael Ellerman
 - Allow the tm-syscall selftest to build with old headers from Michael
   Ellerman
 - Run EBB selftests only on POWER8 from Denis Kirjanov
 - Drop CONFIG_TUNE_CELL in favour of CONFIG_CELL_CPU from Michael
   Ellerman
 - Avoid reference to potentially freed memory in prom.c from Christophe
   Jaillet
 - Quieten boot wrapper output with run_cmd from Geoff Levand
 - EEH fixes and cleanups from Gavin Shan
 - Fix recursive fenced PHB on Broadcom shiner adapter from Gavin Shan
 - Use of_get_next_parent() in of_get_ibm_chip_id() from Michael
   Ellerman
 - Fix section mismatch warning in msi_bitmap_alloc() from Denis
   Kirjanov
 - Fix ps3-lpm white space from Rudhresh Kumar J
 - Fix ps3-vuart null dereference from Colin King
 - nvram: Add missing kfree in error path from Christophe Jaillet
 - nvram: Fix function name in some errors messages, from Christophe
   Jaillet
 - drivers/macintosh: adb: fix misleading Kconfig help text from Aaro
   Koskinen
 - agp/uninorth: fix a memleak in create_gatt_table from Denis Kirjanov
 - cxl: Free virtual PHB when removing from Andrew Donnellan
 - scripts/kconfig/Makefile: Allow KBUILD_DEFCONFIG to be a target from
   Michael Ellerman
 - scripts/kconfig/Makefile: Fix KBUILD_DEFCONFIG check when building
   with O= from Michael Ellerman
 - Freescale updates from Scott: Highlights include 64-bit book3e
   kexec/kdump support, a rework of the qoriq clock driver, device tree
   changes including qoriq fman nodes, support for a new 85xx board, and
   some fixes.
 - MPC5xxx updates from Anatolij: Highlights include a driver for
   MPC512x LocalPlus Bus FIFO with its device tree binding
   documentation, mpc512x device tree updates and some minor fixes.

* tag 'powerpc-4.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (106 commits)
  powerpc/msi: Fix section mismatch warning in msi_bitmap_alloc()
  powerpc/prom: Use of_get_next_parent() in of_get_ibm_chip_id()
  powerpc/pseries: Correct string length in pseries_of_derive_parent()
  powerpc/e6500: hw tablewalk: make sure we invalidate and write to the same tlb entry
  powerpc/mpc85xx: Add FSL QorIQ DPAA FMan support to the SoC device tree(s)
  powerpc/mpc85xx: Create dts components for the FSL QorIQ DPAA FMan
  powerpc/fsl: Add #clock-cells and clockgen label to clockgen nodes
  powerpc: handle error case in cpm_muram_alloc()
  powerpc: mpic: use IRQCHIP_SKIP_SET_WAKE instead of redundant mpic_irq_set_wake
  powerpc/book3e-64: Enable kexec
  powerpc/book3e-64/kexec: Set "r4 = 0" when entering spinloop
  powerpc/booke: Only use VIRT_PHYS_OFFSET on booke32
  powerpc/book3e-64/kexec: Enable SMP release
  powerpc/book3e-64/kexec: create an identity TLB mapping
  powerpc/book3e-64: Don't limit paca to 256 MiB
  powerpc/book3e/kdump: Enable crash_kexec_wait_realmode
  powerpc/book3e: support CONFIG_RELOCATABLE
  powerpc/booke64: Fix args to copy_and_flush
  powerpc/book3e-64: rename interrupt_end_book3e with __end_interrupts
  powerpc/e6500: kexec: Handle hardware threads
  ...
2015-11-05 23:38:43 -08:00
Linus Torvalds
2e3078af2c Merge branch 'akpm' (patches from Andrew)
Merge patch-bomb from Andrew Morton:

 - inotify tweaks

 - some ocfs2 updates (many more are awaiting review)

 - various misc bits

 - kernel/watchdog.c updates

 - Some of mm.  I have a huge number of MM patches this time and quite a
   lot of it is quite difficult and much will be held over to next time.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (162 commits)
  selftests: vm: add tests for lock on fault
  mm: mlock: add mlock flags to enable VM_LOCKONFAULT usage
  mm: introduce VM_LOCKONFAULT
  mm: mlock: add new mlock system call
  mm: mlock: refactor mlock, munlock, and munlockall code
  kasan: always taint kernel on report
  mm, slub, kasan: enable user tracking by default with KASAN=y
  kasan: use IS_ALIGNED in memory_is_poisoned_8()
  kasan: Fix a type conversion error
  lib: test_kasan: add some testcases
  kasan: update reference to kasan prototype repo
  kasan: move KASAN_SANITIZE in arch/x86/boot/Makefile
  kasan: various fixes in documentation
  kasan: update log messages
  kasan: accurately determine the type of the bad access
  kasan: update reported bug types for kernel memory accesses
  kasan: update reported bug types for not user nor kernel memory accesses
  mm/kasan: prevent deadlock in kasan reporting
  mm/kasan: don't use kasan shadow pointer in generic functions
  mm/kasan: MODULE_VADDR is not available on all archs
  ...
2015-11-05 23:10:54 -08:00
Eric B Munson
de60f5f10c mm: introduce VM_LOCKONFAULT
The cost of faulting in all memory to be locked can be very high when
working with large mappings.  If only portions of the mapping will be used
this can incur a high penalty for locking.

For the example of a large file, this is the usage pattern for a large
statical language model (probably applies to other statical or graphical
models as well).  For the security example, any application transacting in
data that cannot be swapped out (credit card data, medical records, etc).

This patch introduces the ability to request that pages are not
pre-faulted, but are placed on the unevictable LRU when they are finally
faulted in.  The VM_LOCKONFAULT flag will be used together with VM_LOCKED
and has no effect when set without VM_LOCKED.  Setting the VM_LOCKONFAULT
flag for a VMA will cause pages faulted into that VMA to be added to the
unevictable LRU when they are faulted or if they are already present, but
will not cause any missing pages to be faulted in.

Exposing this new lock state means that we cannot overload the meaning of
the FOLL_POPULATE flag any longer.  Prior to this patch it was used to
mean that the VMA for a fault was locked.  This means we need the new
FOLL_MLOCK flag to communicate the locked state of a VMA.  FOLL_POPULATE
will now only control if the VMA should be populated and in the case of
VM_LOCKONFAULT, it will not be set.

Signed-off-by: Eric B Munson <emunson@akamai.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Eric B Munson
a8ca5d0ecb mm: mlock: add new mlock system call
With the refactored mlock code, introduce a new system call for mlock.
The new call will allow the user to specify what lock states are being
added.  mlock2 is trivial at the moment, but a follow on patch will add a
new mlock state making it useful.

Signed-off-by: Eric B Munson <emunson@akamai.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Tetsuo Handa
5ba97bf9d8 mm: remove refresh_cpu_vm_stats() definition for !SMP kernel
refresh_cpu_vm_stats(int cpu) is no longer referenced by !SMP kernel
since Linux 3.12.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Johannes Weiner
6071ca5201 mm: page_counter: let page_counter_try_charge() return bool
page_counter_try_charge() currently returns 0 on success and -ENOMEM on
failure, which is surprising behavior given the function name.

Make it follow the expected pattern of try_stuff() functions that return a
boolean true to indicate success, or false for failure.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Hugh Dickins
45637bab30 mm: rename mem_cgroup_migrate to mem_cgroup_replace_page
After v4.3's commit 0610c25daa ("memcg: fix dirty page migration")
mem_cgroup_migrate() doesn't have much to offer in page migration: convert
migrate_misplaced_transhuge_page() to set_page_memcg() instead.

Then rename mem_cgroup_migrate() to mem_cgroup_replace_page(), since its
remaining callers are replace_page_cache_page() and shmem_replace_page():
both of whom passed lrucare true, so just eliminate that argument.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Vladimir Davydov
706874e909 mm: do not inc NR_PAGETABLE if ptlock_init failed
If ALLOC_SPLIT_PTLOCKS is defined, ptlock_init may fail, in which case we
shouldn't increment NR_PAGETABLE.

Since small allocations, such as ptlock, normally do not fail (currently
they can fail if kmemcg is used though), this patch does not really fix
anything and should be considered as a code cleanup.

Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Vladimir Davydov
df4065516b memcg: simplify and inline __mem_cgroup_from_kmem
Before the previous patch ("memcg: unify slab and other kmem pages
charging"), __mem_cgroup_from_kmem had to handle two types of kmem - slab
pages and pages allocated with alloc_kmem_pages - memcg in the page
struct.  Now we can unify it.  Since after it, this function becomes tiny
we can fold it into mem_cgroup_from_kmem.

[hughd@google.com: move mem_cgroup_from_kmem into list_lru.c]
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Vladimir Davydov
f3ccb2c422 memcg: unify slab and other kmem pages charging
We have memcg_kmem_charge and memcg_kmem_uncharge methods for charging and
uncharging kmem pages to memcg, but currently they are not used for
charging slab pages (i.e.  they are only used for charging pages allocated
with alloc_kmem_pages).  The only reason why the slab subsystem uses
special helpers, memcg_charge_slab and memcg_uncharge_slab, is that it
needs to charge to the memcg of kmem cache while memcg_charge_kmem charges
to the memcg that the current task belongs to.

To remove this diversity, this patch adds an extra argument to
__memcg_kmem_charge that can be a pointer to a memcg or NULL.  If it is
not NULL, the function tries to charge to the memcg it points to,
otherwise it charge to the current context.  Next, it makes the slab
subsystem use this function to charge slab pages.

Since memcg_charge_kmem and memcg_uncharge_kmem helpers are now used only
in __memcg_kmem_charge and __memcg_kmem_uncharge, they are inlined.  Since
__memcg_kmem_charge stores a pointer to the memcg in the page struct, we
don't need memcg_uncharge_slab anymore and can use free_kmem_pages.
Besides, one can now detect which memcg a slab page belongs to by reading
/proc/kpagecgroup.

Note, this patch switches slab to charge-after-alloc design.  Since this
design is already used for all other memcg charges, it should not make any
difference.

[hannes@cmpxchg.org: better to have an outer function than a magic parameter for the memcg lookup]
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Vladimir Davydov
d05e83a6f8 memcg: simplify charging kmem pages
Charging kmem pages proceeds in two steps.  First, we try to charge the
allocation size to the memcg the current task belongs to, then we allocate
a page and "commit" the charge storing the pointer to the memcg in the
page struct.

Such a design looks overcomplicated, because there is not much sense in
trying charging the allocation before actually allocating a page: we won't
be able to consume much memory over the limit even if we charge after
doing the actual allocation, besides we already charge user pages post
factum, so being pedantic with kmem pages just looks pointless.

So this patch simplifies the design by merging the "charge" and the
"commit" steps into the same function, which takes the allocated page.

Also, rename the charge and uncharge methods to memcg_kmem_charge and
memcg_kmem_uncharge and make the charge method return error code instead
of bool to conform to mem_cgroup_try_charge.

Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
yalin wang
f7ae3a95ea include/linux/vm_event_item.h: change HIGHMEM_ZONE macro definition
Change HIGHMEM_ZONE to be the same as the DMA_ZONE macro.

Signed-off-by: yalin wang <yalin.wang2010@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Andrew Morton
c2d42c16ad mm/vmstat.c: uninline node_page_state()
With x86_64 (config http://ozlabs.org/~akpm/config-akpm2.txt) and old gcc
(4.4.4), drivers/base/node.c:node_read_meminfo() is using 2344 bytes of
stack.  Uninlining node_page_state() reduces this to 440 bytes.

The stack consumption issue is fixed by newer gcc (4.8.4) however with
that compiler this patch reduces the node.o text size from 7314 bytes to
4578.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Vineet Gupta
3ca65c19dd mm: optimize PageHighMem() check
This came up when implementing HIHGMEM/PAE40 for ARC.  The kmap() /
kmap_atomic() generated code seemed needlessly bloated due to the way
PageHighMem() macro is implemented.  It derives the exact zone for page
and then does pointer subtraction with first zone to infer the zone_type.
The pointer arithmatic in turn generates the code bloat.

PageHighMem(page)
  is_highmem(page_zone(page))
     zone_off = (char *)zone - (char *)zone->zone_pgdat->node_zones

Instead use is_highmem_idx() to work on zone_type available in page flags

   ----- Before -----
80756348:	mov_s      r13,r0
8075634a:	ld_s       r2,[r13,0]
8075634c:	lsr_s      r2,r2,30
8075634e:	mpy        r2,r2,0x2a4
80756352:	add_s      r2,r2,0x80aef880
80756358:	ld_s       r3,[r2,28]
8075635a:	sub_s      r2,r2,r3
8075635c:	breq       r2,0x2a4,80756378 <kmap+0x48>
80756364:	breq       r2,0x548,80756378 <kmap+0x48>

   ----- After  -----
80756330:	mov_s      r13,r0
80756332:	ld_s       r2,[r13,0]
80756334:	lsr_s      r2,r2,30
80756336:	sub_s      r2,r2,1
80756338:	brlo       r2,2,80756348 <kmap+0x30>

For x86 defconfig build (32 bit only) it saves around 900 bytes.
For ARC defconfig with HIGHMEM, it saved around 2K bytes.

   ---->8-------
./scripts/bloat-o-meter x86/vmlinux-defconfig-pre x86/vmlinux-defconfig-post
add/remove: 0/0 grow/shrink: 0/36 up/down: 0/-934 (-934)
function                                     old     new   delta
saveable_page                                162     154      -8
saveable_highmem_page                        154     146      -8
skb_gro_reset_offset                         147     131     -16
...
...
__change_page_attr_set_clr                  1715    1678     -37
setup_data_read                              434     394     -40
mon_bin_event                               1967    1927     -40
swsusp_save                                 1148    1105     -43
_set_pages_array                             549     493     -56
   ---->8-------

e.g. For ARC kmap()

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Jennifer Herbert <jennifer.herbert@citrix.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
David Rientjes
da39da3a54 mm, oom: remove task_lock protecting comm printing
The oom killer takes task_lock() in a couple of places solely to protect
printing the task's comm.

A process's comm, including current's comm, may change due to
/proc/pid/comm or PR_SET_NAME.

The comm will always be NULL-terminated, so the worst race scenario would
only be during update.  We can tolerate a comm being printed that is in
the middle of an update to avoid taking the lock.

Other locations in the kernel have already dropped task_lock() when
printing comm, so this is consistent.

Signed-off-by: David Rientjes <rientjes@google.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Vlastimil Babka
2d1e10412c mm, compaction: distinguish contended status in tracepoints
Compaction returns prematurely with COMPACT_PARTIAL when contended or has
fatal signal pending.  This is ok for the callers, but might be misleading
in the traces, as the usual reason to return COMPACT_PARTIAL is that we
think the allocation should succeed.  After this patch we distinguish the
premature ending condition in the mm_compaction_finished and
mm_compaction_end tracepoints.

The contended status covers the following reasons:
- lock contention or need_resched() detected in async compaction
- fatal signal pending
- too many pages isolated in the zone (only for async compaction)
Further distinguishing the exact reason seems unnecessary for now.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Vlastimil Babka
fa6c7b46aa mm, compaction: export tracepoints status strings to userspace
Some compaction tracepoints convert the integer return values to strings
using the compaction_status_string array.  This works for in-kernel
printing, but not userspace trace printing of raw captured trace such as
via trace-cmd report.

This patch converts the private array to appropriate tracepoint macros
that result in proper userspace support.

trace-cmd output before:
transhuge-stres-4235  [000]   453.149280: mm_compaction_finished: node=0
  zone=ffffffff81815d7a order=9 ret=

after:
transhuge-stres-4235  [000]   453.149280: mm_compaction_finished: node=0
  zone=ffffffff81815d7a order=9 ret=partial

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Yaowei Bai
13308ca9ef mm/memcontrol: make mem_cgroup_inactive_anon_is_low() return bool
Make mem_cgroup_inactive_anon_is_low return bool due to this particular
function only using either one or zero as its return value.

No functional change.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Junichi Nomura
aa750fd71c mm/filemap.c: make global sync not clear error status of individual inodes
filemap_fdatawait() is a function to wait for on-going writeback to
complete but also consume and clear error status of the mapping set during
writeback.

The latter functionality is critical for applications to detect writeback
error with system calls like fsync(2)/fdatasync(2).

However filemap_fdatawait() is also used by sync(2) or FIFREEZE ioctl,
which don't check error status of individual mappings.

As a result, fsync() may not be able to detect writeback error if events
happen in the following order:

   Application                    System admin
   ----------------------------------------------------------
   write data on page cache
                                  Run sync command
                                  writeback completes with error
                                  filemap_fdatawait() clears error
   fsync returns success
   (but the data is not on disk)

This patch adds filemap_fdatawait_keep_errors() for call sites where
writeback error is not handled so that they don't clear error status.

Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tejun Heo <tj@kernel.org>
Cc: Fengguang Wu <fengguang.wu@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Naoya Horiguchi
5d317b2b65 mm: hugetlb: proc: add HugetlbPages field to /proc/PID/status
Currently there's no easy way to get per-process usage of hugetlb pages,
which is inconvenient because userspace applications which use hugetlb
typically want to control their processes on the basis of how much memory
(including hugetlb) they use.  So this patch simply provides easy access
to the info via /proc/PID/status.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Joern Engel <joern@logfs.org>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Roman Gushchin
600e19afc5 mm: use only per-device readahead limit
Maximal readahead size is limited now by two values:
 1) by global 2Mb constant (MAX_READAHEAD in max_sane_readahead())
 2) by configurable per-device value* (bdi->ra_pages)

There are devices, which require custom readahead limit.
For instance, for RAIDs it's calculated as number of devices
multiplied by chunk size times 2.

Readahead size can never be larger than bdi->ra_pages * 2 value
(POSIX_FADV_SEQUNTIAL doubles readahead size).

If so, why do we need two limits?
I suggest to completely remove this max_sane_readahead() stuff and
use per-device readahead limit everywhere.

Also, using right readahead size for RAID disks can significantly
increase i/o performance:

before:
  dd if=/dev/md2 of=/dev/null bs=100M count=100
  100+0 records in
  100+0 records out
  10485760000 bytes (10 GB) copied, 12.9741 s, 808 MB/s

after:
  $ dd if=/dev/md2 of=/dev/null bs=100M count=100
  100+0 records in
  100+0 records out
  10485760000 bytes (10 GB) copied, 8.91317 s, 1.2 GB/s

(It's an 8-disks RAID5 storage).

This patch doesn't change sys_readahead and madvise(MADV_WILLNEED)
behavior introduced by 6d2be915e5 ("mm/readahead.c: fix readahead
failure for memoryless NUMA nodes and limit readahead pages").

Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: onstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Yaowei Bai
b171e40930 mm/page_alloc: remove unused parameter in init_currently_empty_zone()
Commit a2f3aa0257 ("[PATCH] Fix sparsemem on Cell") fixed an oops
experienced on the Cell architecture when init-time functions,
early_*(), are called at runtime by introducing an 'enum memmap_context'
parameter to memmap_init_zone() and init_currently_empty_zone().  This
parameter is intended to be used to tell whether the call of these two
functions is being made on behalf of a hotplug event, or happening at
boot-time.  However, init_currently_empty_zone() does not use this
parameter at all, so remove it.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Alexander Kuleshov
35bd16a227 mm/memblock: make memblock_remove_range() static
memblock_remove_range() is only used in the mm/memblock.c, so we can make
it static.

Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Tejun Heo
7f822c24c2 memcg: drop unnecessary cold-path tests from __memcg_kmem_bypass()
__memcg_kmem_bypass() decides whether a kmem allocation should be bypassed
to the root memcg.  Some conditions that it tests are valid criteria
regarding who should be held accountable; however, there are a couple
unnecessary tests for cold paths - __GFP_FAIL and fatal_signal_pending().

The previous patch updated try_charge() to handle both __GFP_FAIL and
dying tasks correctly and the only thing these two tests are doing is
making accounting less accurate and sprinkling tests for cold path
conditions in the hot paths.  There's nothing meaningful gained by these
extra tests.

This patch removes the two unnecessary tests from __memcg_kmem_bypass().

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Tejun Heo
cbfb479809 memcg: collect kmem bypass conditions into __memcg_kmem_bypass()
memcg_kmem_newpage_charge() and memcg_kmem_get_cache() are testing the
same series of conditions to decide whether to bypass kmem accounting.
Collect the tests into __memcg_kmem_bypass().

This is pure refactoring.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Tejun Heo
b23afb93d3 memcg: punt high overage reclaim to return-to-userland path
Currently, try_charge() tries to reclaim memory synchronously when the
high limit is breached; however, if the allocation doesn't have
__GFP_WAIT, synchronous reclaim is skipped.  If a process performs only
speculative allocations, it can blow way past the high limit.  This is
actually easily reproducible by simply doing "find /".  slab/slub
allocator tries speculative allocations first, so as long as there's
memory which can be consumed without blocking, it can keep allocating
memory regardless of the high limit.

This patch makes try_charge() always punt the over-high reclaim to the
return-to-userland path.  If try_charge() detects that high limit is
breached, it adds the overage to current->memcg_nr_pages_over_high and
schedules execution of mem_cgroup_handle_over_high() which performs
synchronous reclaim from the return-to-userland path.

As long as kernel doesn't have a run-away allocation spree, this should
provide enough protection while making kmemcg behave more consistently.
It also has the following benefits.

- All over-high reclaims can use GFP_KERNEL regardless of the specific
  gfp mask in use, e.g. GFP_NOFS, when the limit was breached.

- It copes with prio inversion.  Previously, a low-prio task with
  small memory.high might perform over-high reclaim with a bunch of
  locks held.  If a higher prio task needed any of these locks, it
  would have to wait until the low prio task finished reclaim and
  released the locks.  By handing over-high reclaim to the task exit
  path this issue can be avoided.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Michal Hocko <mhocko@kernel.org>
Reviewed-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Tejun Heo
626ebc4100 memcg: flatten task_struct->memcg_oom
task_struct->memcg_oom is a sub-struct containing fields which are used
for async memcg oom handling.  Most task_struct fields aren't packaged
this way and it can lead to unnecessary alignment paddings.  This patch
flattens it.

* task.memcg_oom.memcg          -> task.memcg_in_oom
* task.memcg_oom.gfp_mask	-> task.memcg_oom_gfp_mask
* task.memcg_oom.order          -> task.memcg_oom_order
* task.memcg_oom.may_oom        -> task.memcg_may_oom

In addition, task.memcg_may_oom is relocated to where other bitfields are
which reduces the size of task_struct.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Andrew Morton
0ab32b6f1b uaccess: reimplement probe_kernel_address() using probe_kernel_read()
probe_kernel_address() is basically the same as the (later added)
probe_kernel_read().

The return value on EFAULT is a bit different: probe_kernel_address()
returns number-of-bytes-not-copied whereas probe_kernel_read() returns
-EFAULT.  All callers have been checked, none cared.

probe_kernel_read() can be overridden by the architecture whereas
probe_kernel_address() cannot.  parisc, blackfin and um do this, to insert
additional checking.  Hence this patch possibly fixes obscure bugs,
although there are only two probe_kernel_address() callsites outside
arch/.

My first attempt involved removing probe_kernel_address() entirely and
converting all callsites to use probe_kernel_read() directly, but that got
tiresome.

This patch shrinks mm/slab_common.o by 218 bytes.  For a single
probe_kernel_address() callsite.

Cc: Steven Miao <realmz6@gmail.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Rasmus Villemoes
8748dd5c98 include/linux/compiler-gcc.h: hide assume_aligned attribute from sparse
The patch "slab.h: sprinkle __assume_aligned attributes" causes *tons* of
whinges if you do 'make C=2' with sparse 0.5.0:

  CHECK   drivers/media/usb/pwc/pwc-if.c
include/linux/slab.h:307:43: error: attribute '__assume_aligned__': unknown attribute
include/linux/slab.h:308:58: error: attribute '__assume_aligned__': unknown attribute
include/linux/slab.h:337:73: error: attribute '__assume_aligned__': unknown attribute
include/linux/slab.h:375:74: error: attribute '__assume_aligned__': unknown attribute
include/linux/slab.h:378:80: error: attribute '__assume_aligned__': unknown attribute

sparse apparently pretends to be gcc >= 4.9, yet isn't prepared to handle
all the function attributes supported by those gccs and complains loudly.
So hide the definition of __assume_aligned from it (so that the generic
one in compiler.h gets used).

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Tested-By: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Cc: Christopher Li <sparse@chrisli.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Rasmus Villemoes
a744fd17b5 compiler.h: add support for function attribute assume_aligned
gcc 4.9 added the function attribute assume_aligned, indicating to the
caller that the returned pointer may be assumed to have a certain minimal
alignment.  This is useful if, for example, the return value is passed to
memset().  Add a shorthand macro for that.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Denis Kirjanov
fda901241f slab: convert slab_is_available() to boolean
A good candidate to return a boolean result.

Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Cc: Christoph Lameter <cl@linux.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Don Zickus
ac1f591249 kernel/watchdog.c: add sysctl knob hardlockup_panic
The only way to enable a hardlockup to panic the machine is to set
'nmi_watchdog=panic' on the kernel command line.

This makes it awkward for end users and folks who want to run automate
tests (like myself).

Mimic the softlockup_panic knob and create a /proc/sys/kernel/hardlockup_panic
knob.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Jiri Kosina
55537871ef kernel/watchdog.c: perform all-CPU backtrace in case of hard lockup
In many cases of hardlockup reports, it's actually not possible to know
why it triggered, because the CPU that got stuck is usually waiting on a
resource (with IRQs disabled) in posession of some other CPU is holding.

IOW, we are often looking at the stacktrace of the victim and not the
actual offender.

Introduce sysctl / cmdline parameter that makes it possible to have
hardlockup detector perform all-CPU backtrace.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Kirill A. Shutemov
720abae3d6 rcu: force alignment on struct callback_head/rcu_head
Make struct callback_head aligned to size of pointer.  On most
architectures it happens naturally due ABI requirements, but some
architectures (like CRIS) have weird ABI and we need to ask it explicitly.

The alignment is required to guarantee that bits 0 and 1 of @next will be
clear under normal conditions -- as long as we use call_rcu(),
call_rcu_bh(), call_rcu_sched(), or call_srcu() to queue callback.

This guarantee is important for few reasons:
 - future call_rcu_lazy() will make use of lower bits in the pointer;
 - the structure shares storage spacer in struct page with @compound_head,
   which encode PageTail() in bit 0. The guarantee is needed to avoid
   false-positive PageTail().

False postive PageTail() caused crash on crisv32[1].  It happend due
misaligned task_struct->rcu, which was byte-aligned.

[1] http://lkml.kernel.org/r/55FAEA67.9000102@roeck-us.net

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Linus Torvalds
2c302e7e41 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Pull sparc updates from David Miller:
 "Just a couple of fixes/cleanups:

   - Correct NUMA latency calculations on sparc64, from Nitin Gupta.

   - ASI_ST_BLKINIT_MRU_S value was wrong, from Rob Gardner.

   - Fix non-faulting load handling of non-quad values, also from Rob
     Gardner.

   - Cleanup VISsave assembler, from Sam Ravnborg.

   - Fix iommu-common code so it doesn't emit rediculous warnings on
     some architectures, particularly ARM"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: Fix numa distance values
  sparc64: Don't restrict fp regs for no-fault loads
  iommu-common: Fix error code used in iommu_tbl_range_{alloc,free}().
  sparc64: use ENTRY/ENDPROC in VISsave
  sparc64: Fix incorrect ASI_ST_BLKINIT_MRU_S value
2015-11-05 16:34:48 -08:00
Linus Torvalds
933425fb00 s390: A bunch of fixes and optimizations for interrupt and time
handling.
 
 PPC: Mostly bug fixes.
 
 ARM: No big features, but many small fixes and prerequisites including:
 - a number of fixes for the arch-timer
 - introducing proper level-triggered semantics for the arch-timers
 - a series of patches to synchronously halt a guest (prerequisite for
   IRQ forwarding)
 - some tracepoint improvements
 - a tweak for the EL2 panic handlers
 - some more VGIC cleanups getting rid of redundant state
 
 x86: quite a few changes:
 
 - support for VT-d posted interrupts (i.e. PCI devices can inject
 interrupts directly into vCPUs).  This introduces a new component (in
 virt/lib/) that connects VFIO and KVM together.  The same infrastructure
 will be used for ARM interrupt forwarding as well.
 
 - more Hyper-V features, though the main one Hyper-V synthetic interrupt
 controller will have to wait for 4.5.  These will let KVM expose Hyper-V
 devices.
 
 - nested virtualization now supports VPID (same as PCID but for vCPUs)
 which makes it quite a bit faster
 
 - for future hardware that supports NVDIMM, there is support for clflushopt,
 clwb, pcommit
 
 - support for "split irqchip", i.e. LAPIC in kernel + IOAPIC/PIC/PIT in
 userspace, which reduces the attack surface of the hypervisor
 
 - obligatory smattering of SMM fixes
 
 - on the guest side, stable scheduler clock support was rewritten to not
 require help from the hypervisor.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJWO2IQAAoJEL/70l94x66D/K0H/3AovAgYmJQToZlimsktMk6a
 f2xhdIqfU5lIQQh5uNBCfL3o9o8H9Py1ym7aEw3fmztPHHJYc91oTatt2UEKhmEw
 VtZHp/dFHt3hwaIdXmjRPEXiYctraKCyrhaUYdWmUYkoKi7lW5OL5h+S7frG2U6u
 p/hFKnHRZfXHr6NSgIqvYkKqtnc+C0FWY696IZMzgCksOO8jB1xrxoSN3tANW3oJ
 PDV+4og0fN/Fr1capJUFEc/fejREHneANvlKrLaa8ht0qJQutoczNADUiSFLcMPG
 iHljXeDsv5eyjMtUuIL8+MPzcrIt/y4rY41ZPiKggxULrXc6H+JJL/e/zThZpXc=
 =iv2z
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "First batch of KVM changes for 4.4.

  s390:
     A bunch of fixes and optimizations for interrupt and time handling.

  PPC:
     Mostly bug fixes.

  ARM:
     No big features, but many small fixes and prerequisites including:

      - a number of fixes for the arch-timer

      - introducing proper level-triggered semantics for the arch-timers

      - a series of patches to synchronously halt a guest (prerequisite
        for IRQ forwarding)

      - some tracepoint improvements

      - a tweak for the EL2 panic handlers

      - some more VGIC cleanups getting rid of redundant state

  x86:
     Quite a few changes:

      - support for VT-d posted interrupts (i.e. PCI devices can inject
        interrupts directly into vCPUs).  This introduces a new
        component (in virt/lib/) that connects VFIO and KVM together.
        The same infrastructure will be used for ARM interrupt
        forwarding as well.

      - more Hyper-V features, though the main one Hyper-V synthetic
        interrupt controller will have to wait for 4.5.  These will let
        KVM expose Hyper-V devices.

      - nested virtualization now supports VPID (same as PCID but for
        vCPUs) which makes it quite a bit faster

      - for future hardware that supports NVDIMM, there is support for
        clflushopt, clwb, pcommit

      - support for "split irqchip", i.e.  LAPIC in kernel +
        IOAPIC/PIC/PIT in userspace, which reduces the attack surface of
        the hypervisor

      - obligatory smattering of SMM fixes

      - on the guest side, stable scheduler clock support was rewritten
        to not require help from the hypervisor"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (123 commits)
  KVM: VMX: Fix commit which broke PML
  KVM: x86: obey KVM_X86_QUIRK_CD_NW_CLEARED in kvm_set_cr0()
  KVM: x86: allow RSM from 64-bit mode
  KVM: VMX: fix SMEP and SMAP without EPT
  KVM: x86: move kvm_set_irq_inatomic to legacy device assignment
  KVM: device assignment: remove pointless #ifdefs
  KVM: x86: merge kvm_arch_set_irq with kvm_set_msi_inatomic
  KVM: x86: zero apic_arb_prio on reset
  drivers/hv: share Hyper-V SynIC constants with userspace
  KVM: x86: handle SMBASE as physical address in RSM
  KVM: x86: add read_phys to x86_emulate_ops
  KVM: x86: removing unused variable
  KVM: don't pointlessly leave KVM_COMPAT=y in non-KVM configs
  KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr()
  KVM: arm/arm64: Clean up vgic_retire_lr() and surroundings
  KVM: arm/arm64: Optimize away redundant LR tracking
  KVM: s390: use simple switch statement as multiplexer
  KVM: s390: drop useless newline in debugging data
  KVM: s390: SCA must not cross page boundaries
  KVM: arm: Do not indent the arguments of DECLARE_BITMAP
  ...
2015-11-05 16:26:26 -08:00
Linus Torvalds
39cf7c3981 IOMMU Updates for Linux v4.4
This time including:
 
 	* A new IOMMU driver for s390 pci devices
 
 	* Common dma-ops support based on iommu-api for ARM64. The plan is to
 	  use this as a basis for ARM32 and hopefully other architectures as
 	  well in the future.
 
 	* MSI support for ARM-SMMUv3
 
 	* Cleanups and dead code removal in the AMD IOMMU driver
 
 	* Better RMRR handling for the Intel VT-d driver
 
 	* Various other cleanups and small fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJWOz7hAAoJECvwRC2XARrjbvYQALwtITTA5iTm0y/ApwNMxI7n
 pZpjZVPoBPNsGBc4t/MT8pVhUSdmpBOljbV4Y4CayL1mSSB6Bl2gooZjd66m7Z81
 qMJYEVWhFQqVsIKkCSNOgaO7W5y+xt3rTgqN6vCu86/CCDfKrTPP/+CRl1T/z9bo
 1J8ioM3KnZG9KzG8JuXYFg5wwbKToaBh6swSmj+O4U9hru7zV/ILP7ikcc9pyMji
 12WbzCqchRORsJZD65xMRYAqRaPNN/3IlDejs00TOFhY3qpWgEgFUucyeRJBJ/+q
 K4U8T5vZsnr1a04l7/BeYbLmP7y/9Qv0N0xMGtTyoy/w/BieGqRWu4hHhqf/44NO
 EhCSXcEThMNCGTjP2VWC4dnQ/s7Y8OmSW9nCreUcFVxHoE5LfDoh8RngA2fpeNuS
 ixb3OwP+YXHN9Ck+1BQqQCeBznsPTLuDxlhRjCJsWntIfMSkXebOkz83YxyZ9b0Q
 gFvptfuknU7cotUwWa3dg8RiUB8kNlKJyEEByaVpWEbEOabnONKEMkstvuBx6Ots
 kA63wbe7QcPgbUYuq7g0nijDw6E2aEtf0nx2Xx4ZDL932qjg/xUkiBpmbDXHw4Gu
 nimNXVQtbCzF74SyTvxEtupiijOTm5eHtoKtg0mYnqPZ+V9eOwEvW8IHaFFf8XHD
 SecikoTtH1Q4RVtqOcAQ
 =jLlB
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:
 "This time including:

   - A new IOMMU driver for s390 pci devices

   - Common dma-ops support based on iommu-api for ARM64.  The plan is
     to use this as a basis for ARM32 and hopefully other architectures
     as well in the future.

   - MSI support for ARM-SMMUv3

   - Cleanups and dead code removal in the AMD IOMMU driver

   - Better RMRR handling for the Intel VT-d driver

   - Various other cleanups and small fixes"

* tag 'iommu-updates-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (41 commits)
  iommu/vt-d: Fix return value check of parse_ioapics_under_ir()
  iommu/vt-d: Propagate error-value from ir_parse_ioapic_hpet_scope()
  iommu/vt-d: Adjust the return value of the parse_ioapics_under_ir
  iommu: Move default domain allocation to iommu_group_get_for_dev()
  iommu: Remove is_pci_dev() fall-back from iommu_group_get_for_dev
  iommu/arm-smmu: Switch to device_group call-back
  iommu/fsl: Convert to device_group call-back
  iommu: Add device_group call-back to x86 iommu drivers
  iommu: Add generic_device_group() function
  iommu: Export and rename iommu_group_get_for_pci_dev()
  iommu: Revive device_group iommu-ops call-back
  iommu/amd: Remove find_last_devid_on_pci()
  iommu/amd: Remove first/last_device handling
  iommu/amd: Initialize amd_iommu_last_bdf for DEV_ALL
  iommu/amd: Cleanup buffer allocation
  iommu/amd: Remove cmd_buf_size and evt_buf_size from struct amd_iommu
  iommu/amd: Align DTE flag definitions
  iommu/amd: Remove old alias handling code
  iommu/amd: Set alias DTE in do_attach/do_detach
  iommu/amd: WARN when __[attach|detach]_device are called with irqs enabled
  ...
2015-11-05 16:12:10 -08:00
Linus Torvalds
ab1228e42e Merge git://git.infradead.org/intel-iommu
Pull intel iommu updates from David Woodhouse:
 "This adds "Shared Virtual Memory" (aka PASID support) for the Intel
  IOMMU.  This allows devices to do DMA using process address space,
  translated through the normal CPU page tables for the relevant mm.

  With corresponding support added to the i915 driver, this has been
  tested with the graphics device on Skylake.  We don't have the
  required TLP support in our PCIe root ports for supporting discrete
  devices yet, so it's only integrated devices that can do it so far"

* git://git.infradead.org/intel-iommu: (23 commits)
  iommu/vt-d: Fix rwxp flags in SVM device fault callback
  iommu/vt-d: Expose struct svm_dev_ops without CONFIG_INTEL_IOMMU_SVM
  iommu/vt-d: Clean up pasid_enabled() and ecs_enabled() dependencies
  iommu/vt-d: Handle Caching Mode implementations of SVM
  iommu/vt-d: Fix SVM IOTLB flush handling
  iommu/vt-d: Use dev_err(..) in intel_svm_device_to_iommu(..)
  iommu/vt-d: fix a loop in prq_event_thread()
  iommu/vt-d: Fix IOTLB flushing for global pages
  iommu/vt-d: Fix address shifting in page request handler
  iommu/vt-d: shift wrapping bug in prq_event_thread()
  iommu/vt-d: Fix NULL pointer dereference in page request error case
  iommu/vt-d: Implement SVM_FLAG_SUPERVISOR_MODE for kernel access
  iommu/vt-d: Implement SVM_FLAG_PRIVATE_PASID to allocate unique PASIDs
  iommu/vt-d: Add callback to device driver on page faults
  iommu/vt-d: Implement page request handling
  iommu/vt-d: Generalise DMAR MSI setup to allow for page request events
  iommu/vt-d: Implement deferred invalidate for SVM
  iommu/vt-d: Add basic SVM PASID support
  iommu/vt-d: Always enable PASID/PRI PCI capabilities before ATS
  iommu/vt-d: Add initial support for PASID tables
  ...
2015-11-05 16:06:52 -08:00
Linus Torvalds
1873499e13 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem update from James Morris:
 "This is mostly maintenance updates across the subsystem, with a
  notable update for TPM 2.0, and addition of Jarkko Sakkinen as a
  maintainer of that"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (40 commits)
  apparmor: clarify CRYPTO dependency
  selinux: Use a kmem_cache for allocation struct file_security_struct
  selinux: ioctl_has_perm should be static
  selinux: use sprintf return value
  selinux: use kstrdup() in security_get_bools()
  selinux: use kmemdup in security_sid_to_context_core()
  selinux: remove pointless cast in selinux_inode_setsecurity()
  selinux: introduce security_context_str_to_sid
  selinux: do not check open perm on ftruncate call
  selinux: change CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE default
  KEYS: Merge the type-specific data with the payload data
  KEYS: Provide a script to extract a module signature
  KEYS: Provide a script to extract the sys cert list from a vmlinux file
  keys: Be more consistent in selection of union members used
  certs: add .gitignore to stop git nagging about x509_certificate_list
  KEYS: use kvfree() in add_key
  Smack: limited capability for changing process label
  TPM: remove unnecessary little endian conversion
  vTPM: support little endian guests
  char: Drop owner assignment from i2c_driver
  ...
2015-11-05 15:32:38 -08:00
Linus Torvalds
3460b01b12 Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/audit
Pull audit updates from Paul Moore:
 "Seven audit patches for 4.4, but really only one of any significant
  value, the remainder are trivial cleanups that are described well
  enough in the patch descriptions.

  The one significant patch is an attempt to make communication between
  the kernel's audit subsystem and the userspace audit daemon a bit more
  robust by retrying on certain transient error conditions.  All in all,
  it's a pretty small set of patches this time around with just fixes
  and cleanups"

* 'upstream' of git://git.infradead.org/users/pcmoore/audit:
  audit: make audit_log_common_recv_msg() a void function
  audit: removing unused variable
  audit: fix comment block whitespace
  audit: audit_tree_match can be boolean
  audit: audit_string_contains_control can be boolean
  audit: audit_dummy_context can be boolean
  audit: try harder to send to auditd upon netlink failure
2015-11-05 15:26:43 -08:00
Linus Torvalds
69234acee5 Merge branch 'for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
 "The cgroup core saw several significant updates this cycle:

   - percpu_rwsem for threadgroup locking is reinstated.  This was
     temporarily dropped due to down_write latency issues.  Oleg's
     rework of percpu_rwsem which is scheduled to be merged in this
     merge window resolves the issue.

   - On the v2 hierarchy, when controllers are enabled and disabled, all
     operations are atomic and can fail and revert cleanly.  This allows
     ->can_attach() failure which is necessary for cpu RT slices.

   - Tasks now stay associated with the original cgroups after exit
     until released.  This allows tracking resources held by zombies
     (e.g.  pids) and makes it easy to find out where zombies came from
     on the v2 hierarchy.  The pids controller was broken before these
     changes as zombies escaped the limits; unfortunately, updating this
     behavior required too many invasive changes and I don't think it's
     a good idea to backport them, so the pids controller on 4.3, the
     first version which included the pids controller, will stay broken
     at least until I'm sure about the cgroup core changes.

   - Optimization of a couple common tests using static_key"

* 'for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (38 commits)
  cgroup: fix race condition around termination check in css_task_iter_next()
  blkcg: don't create "io.stat" on the root cgroup
  cgroup: drop cgroup__DEVEL__legacy_files_on_dfl
  cgroup: replace error handling in cgroup_init() with WARN_ON()s
  cgroup: add cgroup_subsys->free() method and use it to fix pids controller
  cgroup: keep zombies associated with their original cgroups
  cgroup: make css_set_rwsem a spinlock and rename it to css_set_lock
  cgroup: don't hold css_set_rwsem across css task iteration
  cgroup: reorganize css_task_iter functions
  cgroup: factor out css_set_move_task()
  cgroup: keep css_set and task lists in chronological order
  cgroup: make cgroup_destroy_locked() test cgroup_is_populated()
  cgroup: make css_sets pin the associated cgroups
  cgroup: relocate cgroup_[try]get/put()
  cgroup: move check_for_release() invocation
  cgroup: replace cgroup_has_tasks() with cgroup_is_populated()
  cgroup: make cgroup->nr_populated count the number of populated css_sets
  cgroup: remove an unused parameter from cgroup_task_migrate()
  cgroup: fix too early usage of static_branch_disable()
  cgroup: make cgroup_update_dfl_csses() migrate all target processes atomically
  ...
2015-11-05 14:51:32 -08:00
Linus Torvalds
11eaaadb3e Merge branch 'for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata updates from Tejun Heo:
 "Most are ahci and other device specific additions.  Dan cleaned up
  ahci IRQ handling to prepare for future MSIX changes.  On the libata
  core side, Vinayak updated SG handling so that NCQ commands can be
  issued through SG_IO and Christoph cleaned up code a bit.  There's one
  merge from for-4.3-fixes to include a pata_macio commit that didn't
  get pushed out"

* 'for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  ahci: add new Intel device IDs
  ahci: Add Marvell 88se91a2 device id
  ahci: cleanup ahci_host_activate_multi_irqs
  ahci: ahci_host_activate: kill IRQF_SHARED
  devicetree: bindings: Fixed a few typos
  ahci: qoriq: Disable NCQ on ls2080a SoC
  ahci: qoriq: Rename LS2085A SoC support code to LS2080A
  libata: enable LBA flag in taskfile for ata_scsi_pass_thru()
  libata: add support for NCQ commands for SG interface
  ahci: qoriq: Fix a compiling warning
  pata_it821x: use "const char *" for string literals
  libata: only call ->done once all per-tag ressources are released
  libata: cleanup ata_scsi_qc_complete
  ata: ahci: find eSATA ports and flag them as removable
  libata: samsung_cf: fix handling platform_get_irq result
  ata: pata_macio: Fix module autoload for OF platform driver
  ata: pata_pxa: dmaengine conversion
  ahci: added a new driver for supporting Freescale AHCI sata
  devicetree:bindings: add devicetree bindings for Freescale AHCI
  Revert "ahci: added support for Freescale AHCI sata"
2015-11-05 14:32:54 -08:00
Linus Torvalds
75f5db39ff spi: Updates for v4.4
Quite a lot of activity in SPI this cycle, almost all of it in drivers
 with a few minor improvements and tweaks in the core.
 
  - Updates to pxa2xx to support Intel Broxton and multiple chip selects.
  - Support for big endian in the bcm63xx driver.
  - Multiple slave support for the mt8173
  - New driver for the auxiliary SPI controller in bcm2835 SoCs.
  - Support for Layerscale SoCs in the Freescale DSPI driver.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWOehzAAoJECTWi3JdVIfQTPYH+wYMDG8gAIw2s0AJ4DvVe4qZ
 sOAm1UgUJZxssrEA6BNqbfM0dfRo+oQJKmRd0Dc5n7LEMsYHdI/5yKHk8PCS6ZzD
 iQyQCzbd0thDAqwuPaMP62cyPDHwyJX22VGTsgVnj6AZqAQ+9+g4SPKhFnm1Mlm4
 hmDi6fdSrsqo8k8gkpVN8RFOfVsjAV1dLtAauQRWDHrqMxXURSrKG76eqAqUa5bn
 BLPXBoj5PA0DMLPO2j+ADZwWN723LrI2mSSlc+ThjEX/OIt2OhAoiOTV5RPqaafy
 TIsCkh68q/gYAsL5HtvvmgZByl41FLYiO0Z+rXmWUyMMbnvhZTLws9S2BNpBLuk=
 =DgXG
 -----END PGP SIGNATURE-----

Merge tag 'spi-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi updates from Mark Brown:
 "Quite a lot of activity in SPI this cycle, almost all of it in drivers
  with a few minor improvements and tweaks in the core.

   - Updates to pxa2xx to support Intel Broxton and multiple chip selects.
   - Support for big endian in the bcm63xx driver.
   - Multiple slave support for the mt8173
   - New driver for the auxiliary SPI controller in bcm2835 SoCs.
   - Support for Layerscale SoCs in the Freescale DSPI driver"

* tag 'spi-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (87 commits)
  spi: pxa2xx: Rework self-initiated platform data creation for non-ACPI
  spi: pxa2xx: Add support for Intel Broxton
  spi: pxa2xx: Detect number of enabled Intel LPSS SPI chip select signals
  spi: pxa2xx: Add output control for multiple Intel LPSS chip selects
  spi: pxa2xx: Use LPSS prefix for defines that are Intel LPSS specific
  spi: Add DSPI support for layerscape family
  spi: ti-qspi: improve ->remove() callback
  spi/spi-xilinx: Fix race condition on last word read
  spi: Drop owner assignment from spi_drivers
  spi: Add THIS_MODULE to spi_driver in SPI core
  spi: Setup the master controller driver before setting the chipselect
  spi: dw: replace magic constant by DW_SPI_DR
  spi: mediatek: mt8173 spi multiple devices support
  spi: mediatek: handle controller_data in mtk_spi_setup
  spi: mediatek: remove mtk_spi_config
  spi: mediatek: Update document devicetree bindings to support multiple devices
  spi: fix kernel-doc warnings about missing return desc in spi.c
  spi: fix kernel-doc warnings about missing return desc in spi.h
  spi: pxa2xx: Align a few defines
  spi: pxa2xx: Save other reg_cs_ctrl bits when configuring chip select
  ...
2015-11-05 13:15:12 -08:00
Linus Torvalds
52787e91bf regulator: Updates for v4.4
This is quite a quiet release in terms of volume of patches but it
 includes a couple of really nice core changes - the work Sascha has done
 in particular is something I've wanted to get done for a long time but
 just never got round to myself.  Highlights include:
 
  - Support from Sascha Hauer for setting the voltage of parent supplies
    based on requests from their children.  This is used both to allow
    set_voltage() to work through a dumb switch and to improve the
    efficiency of systems where DCDCs are used to supply LDOs by minimising
    the voltage drop over the LDOs.
  - Removal of regulator_list by Tomeu Vizoso, meaning we're not
    duplicating the device list maintained by the driver core.
  - Support for Wolfson/Cirrus WM8998 and WM1818.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWOez1AAoJECTWi3JdVIfQZSoH/jk2EQCG7hSIY6qVviHh+zQI
 TWULXDnldyfT4eCYNAhwe/oQjFs7ngPXz9UpMAQwtWfWNoLBifF+Y5+X413nHY82
 pKoRd+DQ1sIzGZb2YB0kS22Zg49Jilgv3y2Cw0aZuL7aqq7CEUGN3itEeWxqW9j8
 a1ORD4d2zgki4gK9fUBTiur/dIhMn+8Gi4CmBvujsSCQR4Z5UNOjiodTU0NONiBX
 znejoAoLJXzAyAoWN+p6sEvi9Gpx4QTV6tBOo56WPlzBgTX2qOFhhmJiQQ1rIJOv
 MucgNn+cocarWfMyQwvqa0CL4ufjJm2DcYoIhfc2jZip/mAeqcbxpj5bxWzaLec=
 =CX2S
 -----END PGP SIGNATURE-----

Merge tag 'regulator-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator updates from Mark Brown:
 "This is quite a quiet release in terms of volume of patches but it
  includes a couple of really nice core changes - the work Sascha has
  done in particular is something I've wanted to get done for a long
  time but just never got round to myself.

  Highlights include:

   - Support from Sascha Hauer for setting the voltage of parent
     supplies based on requests from their children.  This is used both
     to allow set_voltage() to work through a dumb switch and to improve
     the efficiency of systems where DCDCs are used to supply LDOs by
     minimising the voltage drop over the LDOs.

   - Removal of regulator_list by Tomeu Vizoso, meaning we're not
     duplicating the device list maintained by the driver core.

   - Support for Wolfson/Cirrus WM8998 and WM1818"

* tag 'regulator-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (29 commits)
  regulator: Use regulator_lock_supply() for get_voltage() too
  regulator: arizona: Add regulator specific device tree binding document
  regulator: stw481x: compile on COMPILE_TEST
  regulator: qcom-smd: Correct set_load() unit
  regulator: core: Propagate voltage changes to supply regulators
  regulator: core: Factor out regulator_map_voltage
  regulator: i.MX anatop: Allow supply regulator
  regulator: introduce min_dropout_uV
  regulator: core: create unlocked version of regulator_set_voltage
  regulator: arizona-ldo1: Fix handling of GPIO 0
  regulator: da9053: Update regulator for DA9053 BC silicon support
  regulator: max77802: Separate sections for nodes and properties
  regulator: max77802: Add input supply properties to DT binding doc
  regulator: axp20x: set supply names for AXP22X DC1SW/DC5LDO internally
  regulator: axp20x: Drop AXP221 DC1SW and DC5LDO regulator supplies from bindings
  mfd: tps6105x: Use i2c regmap to access registers
  regulator: act8865: add DT binding for property "active-semi,vsel-high"
  regulator: act8865: support output voltage by VSET2[] bits
  regulator: arizona: add support for WM8998 and WM1814
  regulator: core: create unlocked version of regulator_list_voltage
  ...
2015-11-05 13:06:22 -08:00