Commit Graph

575181 Commits

Author SHA1 Message Date
Stefan Wahren
b17b4ab8ce mmc: sdhci-iproc: define MMC caps in platform data
This patch moves the definition of the MMC capabilities
from the probe function into iproc platform data. After
that we are able to add support for another platform more
easily.

Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Suggested-by: Stephen Warren <swarren@wwwdotorg.org>
Acked-by: Scott Branden <sbranden@broadcom.com>
Acked-by: Stephen Warren <swarren@wwwdotorg.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-02-29 11:02:54 +01:00
Wolfram Sang
1ac109ddc9 mmc: mmc_test: mention that '0' runs all tests
I had to use the source to determine what I need to write to 'test' so
that all tests are run. Let's mention this explicitly in 'testlist'.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-02-29 11:02:54 +01:00
Wolfram Sang
963b14ffbc mmc: mmcif: don't depend on MMC_BLOCK
I don't see a reason why a host driver should depend on the card driver.
It also prevents that we can use the mmc_test driver. So, remove it.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-02-29 11:02:53 +01:00
Wolfram Sang
4ec96b4cbd mmc: make MAN_BKOPS_EN message a debug
IMO this info is only useful for developers. Most users won't need this
information, since there is not much they can do about it.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-02-29 11:02:53 +01:00
Arnd Bergmann
358399f8ca mmc: omap_hsmmc: don't print uninitialized variables
When DT based probing is used but the DMA request fails, the
driver will print uninitialized stack data from the rx_req
and tx_req variables, as indicated by this warning:

drivers/mmc/host/omap_hsmmc.c: In function 'omap_hsmmc_probe':
drivers/mmc/host/omap_hsmmc.c:2162:3: warning: 'rx_req' may be used uninitialized in this function [-Wmaybe-uninitialized]
   dev_err(mmc_dev(host->mmc), "unable to obtain RX DMA engine channel %u\n", rx_req);

This removes the DMA request line number from the warning, which
is the easiest solution and won't hurt us any more as we are
planning to remove the legacy code path anyway.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Tony Lindgren <tony@atomide.com>
Reviewed-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-02-29 11:02:52 +01:00
Fu, Zhonghui
4e6a2ef941 mmc: sdhci-acpi: enable sdhci-acpi device to suspend/resume asynchronously
This patch enables sdhci-acpi devices to suspend/resume asynchronously.
This will improve system suspend/resume speed. After enabling the
sdhci-acpi devices and all their child devices to suspend/resume
asynchronously on ASUS T100TA, the system suspend-to-idle time is
reduced from 1645ms to 1089ms, and the system resume time is reduced
from 940ms to 908ms.

Signed-off-by: Zhonghui Fu <zhonghui.fu@linux.intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-02-29 11:02:52 +01:00
Fu, Zhonghui
ccf7bfdc36 mmc: core: enable mmc host device to suspend/resume asynchronously
This patch enables mmc hosts to suspend/resume asynchronously.
This will improve system suspend/resume speed. After applying
this patch and enabling all mmc hosts' child devices to
suspend/resume asynchronously on ASUS T100TA, the system
suspend-to-idle time is reduced from 1645ms to 1107ms, and the
system resume time is reduced from 940ms to 914ms.

Signed-off-by: Zhonghui Fu <zhonghui.fu@linux.intel.com>
Acked-by: Venu Byravarasu <vbyravarasu@nvidia.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-02-29 11:02:51 +01:00
Chen-Yu Tsai
f771f6e832 mmc: sunxi: Support vqmmc regulator
eMMC chips require 2 power supplies, vmmc for internal logic, and vqmmc
for driving output buffers. vqmmc also controls signaling voltage. Most
boards we've seen use the same regulator for both, nevertheless the 2
have different usages, and should be set separately.

This patch adds support for vqmmc regulator supply, including voltage
switching. The MMC core can use this to try different signaling voltages
for eMMC.

Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-02-29 11:02:51 +01:00
Chen-Yu Tsai
4159215ac9 mmc: sunxi: Return error on mmc_regulator_set_ocr() fail in .set_ios op
Let .set_ios() fail if mmc_regulator_set_ocr() fails to enable and set a
proper voltage for vmmc.

Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-02-29 11:02:50 +01:00
Chen-Yu Tsai
0314cbd438 mmc: sunxi: Document host init sequence
sunxi_mmc_init_host() originated from Allwinner kernel sources. The
magic numbers written to various registers was never documented.

Add comments for values found in Allwinner user manuals.

Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-02-29 11:02:50 +01:00
Stefan Wahren
9f24b0f254 mmc: sdhci-iproc: Actually enable the clock
The RPi firmware-based clocks driver can actually disable
unused clocks, so when switching to use it we ended up losing
our MMC clock once all devices were probed.

This patch adopts the changes from 1e5a0a9a58 ("mmc: sdhci-bcm2835:
Actually enable the clock") to sdhci-iproc.

Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Acked-by: Scott Branden <sbranden@broadcom.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-02-29 11:02:49 +01:00
Chuanxiao Dong
e5905ff128 mmc: debugfs: Add a restriction to mmc debugfs clock setting
Clock frequency values written to an mmc host should not be less than
the minimum clock frequency which the mmc host supports.

Signed-off-by: Yuan Juntao <juntaox.yuan@intel.com>
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-02-29 11:02:49 +01:00
Stefan Wahren
1d6ad05777 mmc: sdhci-iproc: Clean up platform allocations if shdci init fails
This patch adopts the changes from 475c9e43bf ("mmc: sdhci-bcm2835:
Clean up platform allocations if sdhci init fails") to sdhci-iproc.

Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Acked-by: Scott Branden <sbranden@broadcom.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-02-29 11:02:48 +01:00
Rameshwar Prasad Sahu
e99369dc95 mmc: sdhci-of-arasan: Remove no-hispd and no-cmd23 quirks for sdhci-arasan4.9a
The Arason SD host controller supports set block count command (cmd23)
and high speed mode. This patch re-enable both of these features that
was disabled. For device that doesn't support high speed, it should
configure its capability register accordingly instead disables it
explicitly.

Signed-off-by: Rameshwar Prasad Sahu <rsahu@apm.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-02-29 11:02:48 +01:00
Kishon Vijay Abraham I
9143757b57 mmc: host: omap_hsmmc: add a verbose print to enable CONFIG_REGULATOR_PBIAS
Since v4.3+, CONFIG_REGULATOR_PBIAS should be enabled (for platforms that
have PBIAS regulator) in order for MMC1 to work.

Add a more verbose print to help enable CONFIG_REGULATOR_PBIAS for users
using a olddefconfig or a custom .config.

Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Acked-by: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-02-29 11:02:48 +01:00
Masahiro Yamada
0899e74193 mmc: remove unnecessary assignment statements before return
Variable assignment just before return is redundant.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-02-29 11:02:47 +01:00
Peter Chen
62c03ca3ff mmc: core: pwrseq_simple: remove unused header file
Signed-off-by: Peter Chen <peter.chen@freescale.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-02-29 11:02:47 +01:00
Geliang Tang
238fc95e83 mmc: usdhi6rol0: use to_delayed_work
Use to_delayed_work() instead of open-coding it.

Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-02-29 11:02:46 +01:00
Geliang Tang
1046a81151 mmc: sh_mmcif: use to_delayed_work
Use to_delayed_work() instead of open-coding it.

Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-02-29 11:02:46 +01:00
Markus Elfring
1856de3d3e mmc: sdricoh_cs: Less checks in sdricoh_init_mmc() after, error detection
This issue was detected by using the Coccinelle software.

Two pointer checks could be repeated by the sdricoh_init_mmc() function
during error handling even if the relevant properties can be determined
for the involved variables before by source code analysis.

* This implementation detail could be improved by adjustments
  for jump targets according to the Linux coding style convention.

* Drop an unnecessary initialisation for the variable "mmc" then.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-02-29 11:02:45 +01:00
Markus Elfring
c48c5d580e mmc: sdricoh_cs: Delete unnecessary variable initialisations
These variables will eventually be set to an appropriate value a bit later.
* host
* iobase
* result

Thus let us omit the explicit initialisation at the beginning.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-02-29 11:02:45 +01:00
Linus Torvalds
fc77dbd34c Linux 4.5-rc6 2016-02-28 08:41:20 -08:00
Linus Torvalds
1b9540ce03 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "A rather largish series of 12 patches addressing a maze of race
  conditions in the perf core code from Peter Zijlstra"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Robustify task_function_call()
  perf: Fix scaling vs. perf_install_in_context()
  perf: Fix scaling vs. perf_event_enable()
  perf: Fix scaling vs. perf_event_enable_on_exec()
  perf: Fix ctx time tracking by introducing EVENT_TIME
  perf: Cure event->pending_disable race
  perf: Fix race between event install and jump_labels
  perf: Fix cloning
  perf: Only update context time when active
  perf: Allow perf_release() with !event->ctx
  perf: Do not double free
  perf: Close install vs. exit race
2016-02-28 07:52:00 -08:00
Linus Torvalds
4b696dcb1a Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "This update contains:

   - Hopefully the last ASM CLAC fixups

   - A fix for the Quark family related to the IMR lock which makes
     kexec work again

   - A off-by-one fix in the MPX code.  Ironic, isn't it?

   - A fix for X86_PAE which addresses once more an unsigned long vs
     phys_addr_t hickup"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mpx: Fix off-by-one comparison with nr_registers
  x86/mm: Fix slow_virt_to_phys() for X86_PAE again
  x86/entry/compat: Add missing CLAC to entry_INT80_32
  x86/entry/32: Add an ASM_CLAC to entry_SYSENTER_32
  x86/platform/intel/quark: Change the kernel's IMR lock bit to false
2016-02-28 07:49:23 -08:00
Linus Torvalds
76c03f0f5d Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixlet from Thomas Gleixner:
 "A trivial printk typo fix"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/deadline: Fix trivial typo in printk() message
2016-02-28 07:48:01 -08:00
Linus Torvalds
f055ae04ae Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
 "Four small fixes for irqchip drivers:

   - Add missing low level irq handler initialization on mxs, so
     interrupts can acutally be delivered

   - Add a missing barrier to the GIC driver

   - Two fixes for the GIC-V3-ITS driver, addressing a double EOI write
     and a cache flush beyond the actual region"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gic-v3: Add missing barrier to 32bit version of gic_read_iar()
  irqchip/mxs: Add missing set_handle_irq()
  irqchip/gicv3-its: Avoid cache flush beyond ITS_BASERn memory size
  irqchip/gic-v3-its: Fix double ICC_EOIR write for LPI in EOImode==1
2016-02-28 07:45:58 -08:00
Linus Torvalds
8da51430ff Staging (well android) fix for 4.5-rc6
Here is one patch, for the android binder driver, to resolve a reported
 problem.  Turns out it has been around for a while (since 3.15), so it
 is good to finally get it resolved.
 
 It has been in linux-next for a while with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlbSgioACgkQMUfUDdst+ynUggCfdkzcX15M2vIiZsA3BLDcPY6L
 pPcAnj1uCrqUUNRF3XgglbGXtp15sOe0
 =R3bx
 -----END PGP SIGNATURE-----

Merge tag 'staging-4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging/android fix from Greg KH:
 "Here is one patch, for the android binder driver, to resolve a
  reported problem.  Turns out it has been around for a while (since
  3.15), so it is good to finally get it resolved.

  It has been in linux-next for a while with no reported issues"

* tag 'staging-4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  drivers: android: correct the size of struct binder_uintptr_t for BC_DEAD_BINDER_DONE
2016-02-28 07:39:15 -08:00
Linus Torvalds
62718e304a USB fixes for 4.5-rc6
Here are a few USB fixes for 4.5-rc6
 
 They fix a reported bug for some USB 3 devices by reverting the recent
 patch, a MAINTAINERS change for some drivers, some new device ids, and
 of course, the usual bunch of USB gadget driver fixes.
 
 All have been in linux-next for a while with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlbSgVoACgkQMUfUDdst+ymghQCgiVUpZOu/hYeC/8CDdTZPLlpQ
 oR4AoLftFf9OAKRgBCbiPOY99lG9f33y
 =qZKR
 -----END PGP SIGNATURE-----

Merge tag 'usb-4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are a few USB fixes for 4.5-rc6

  They fix a reported bug for some USB 3 devices by reverting the recent
  patch, a MAINTAINERS change for some drivers, some new device ids, and
  of course, the usual bunch of USB gadget driver fixes.

  All have been in linux-next for a while with no reported issues"

* tag 'usb-4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  MAINTAINERS: drop OMAP USB and MUSB maintainership
  usb: musb: fix DMA for host mode
  usb: phy: msm: Trigger USB state detection work in DRD mode
  usb: gadget: net2280: fix endpoint max packet for super speed connections
  usb: gadget: gadgetfs: unregister gadget only if it got successfully registered
  usb: gadget: remove driver from pending list on probe error
  Revert "usb: hub: do not clear BOS field during reset device"
  usb: chipidea: fix return value check in ci_hdrc_pci_probe()
  usb: chipidea: error on overflow for port_test_write
  USB: option: add "4G LTE usb-modem U901"
  USB: cp210x: add IDs for GE B650V3 and B850V3 boards
  USB: option: add support for SIM7100E
  usb: musb: Fix DMA desired mode for Mentor DMA engine
  usb: gadget: fsl_qe_udc: fix IS_ERR_VALUE usage
  usb: dwc2: USB_DWC2 should depend on HAS_DMA
  usb: dwc2: host: fix the data toggle error in full speed descriptor dma
  usb: dwc2: host: fix logical omissions in dwc2_process_non_isoc_desc
  usb: dwc3: Fix assignment of EP transfer resources
  usb: dwc2: Add extra delay when forcing dr_mode
2016-02-28 07:37:30 -08:00
Linus Torvalds
12b9fa6a97 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  do_last(): ELOOP failure exit should be done after leaving RCU mode
  should_follow_link(): validate ->d_seq after having decided to follow
  namei: ->d_inode of a pinned dentry is stable only for positives
  do_last(): don't let a bogus return value from ->open() et.al. to confuse us
  fs: return -EOPNOTSUPP if clone is not supported
  hpfs: don't truncate the file when delete fails
2016-02-27 17:10:32 -08:00
Linus Torvalds
340b3a5b35 ARM: SoC fixes
We didn't have a batch last week, so this one is slightly larger.
 
 None of them are scary though, a handful of fixes for small DT pieces,
 replacing properties with newer conventions.
 
 Highlights:
 
  - N900 fix for setting system revision
  - onenand init fix to avoid filesystem corruption
  - Clock fix for audio on Beaglebone-x15
  - Fixes on shmobile to deal with CONFIG_DEBUG_RODATA (default y in 4.6)
 
  + misc smaller stuff.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW0jMpAAoJEIwa5zzehBx3nGgP/3wlhTrIyFWTu2Oa3s+0dwFJ
 nXNcHc/7egzRlcPZ/dWfyrQfVC4/Zko7tI+76vJ8vSZ5oZ+la6CC1ZymlVpxUo9y
 mF8wyFnRU5sc5yeSSNH91RzJg2fSJWvcUJ/5zeUBkjKLc1AEAfyMXEjxDHptDI/L
 s+/JRqhrF8xsnfBymSW2mW6u34Sxn76dVsofWNrSCge/+kVAM4km/PDneWKz/14Q
 oLY9eFl6b0O5DJ/+5OSME0pnnRnJC/eD5+HYQSBIu3+RKgP5CH+xQDNeqf0GIdlI
 7Y0cKbjFxT5fXfvE4KOKQuLKgAzCSRe1PwuJ8MTDE73kWsUAWN8McWkCYtCSufxU
 KSPlgjfO1xWoSkVneK3NzcRWJoi6Ev0lZ0s6HuMvZJAoce9XrcIbZRQ7CP3Iu3Oj
 iC8GxIgHyIJV95XABpliH5IVTRERTbXIOgR82dKQPxLU6cbCRbFs/GU2v7JQEjOS
 exJDM5R08SSBC8MRxvWp09pwcfO44XIkQu4pdRJfpaFVwJYejTYOUDVYCcCg3s9O
 ApXzQj6/A0QMnp1SAvPHbc3LqLq5mTzvt1j59TNA8Q0O4U4r20CBF+D7lb9KMlu/
 GyJ2wSsxCwnBDVWDPtXGdE3z/K81H7nPRBzuL0dM80cF5gQNglOdAN47UoD/bBP6
 1pR5h9K92LbV5NiToyPY
 =xeuW
 -----END PGP SIGNATURE-----

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
 "We didn't have a batch last week, so this one is slightly larger.

  None of them are scary though, a handful of fixes for small DT pieces,
  replacing properties with newer conventions.

  Highlights:
   - N900 fix for setting system revision
   - onenand init fix to avoid filesystem corruption
   - Clock fix for audio on Beaglebone-x15
   - Fixes on shmobile to deal with CONFIG_DEBUG_RODATA (default y in 4.6)

  + misc smaller stuff"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  MAINTAINERS: Extend info, add wiki and ml for meson arch
  MAINTAINERS: alpine: add a new maintainer and update the entry
  ARM: at91/dt: fix typo in sama5d2 pinmux descriptions
  ARM: OMAP2+: Fix onenand initialization to avoid filesystem corruption
  Revert "regulator: tps65217: remove tps65217.dtsi file"
  ARM: shmobile: Remove shmobile_boot_arg
  ARM: shmobile: Move shmobile_smp_{mpidr, fn, arg}[] from .text to .bss
  ARM: shmobile: r8a7779: Remove remainings of removed SCU boot setup code
  ARM: shmobile: Move shmobile_scu_base from .text to .bss
  ARM: OMAP2+: Fix omap_device for module reload on PM runtime forbid
  ARM: OMAP2+: Improve omap_device error for driver writers
  ARM: DTS: am57xx-beagle-x15: Select SYS_CLK2 for audio clocks
  ARM: dts: am335x/am57xx: replace gpio-key,wakeup with wakeup-source property
  ARM: OMAP2+: Set system_rev from ATAGS for n900
  ARM: dts: orion5x: fix the missing mtd flash on linkstation lswtgl
  ARM: dts: kirkwood: use unique machine name for ds112
  ARM: dts: imx6: remove bogus interrupt-parent from CAAM node
2016-02-27 16:58:32 -08:00
Al Viro
5129fa482b do_last(): ELOOP failure exit should be done after leaving RCU mode
... or we risk seeing a bogus value of d_is_symlink() there.

Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27 19:37:37 -05:00
Al Viro
a7f775428b should_follow_link(): validate ->d_seq after having decided to follow
... otherwise d_is_symlink() above might have nothing to do with
the inode value we've got.

Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27 19:31:01 -05:00
Al Viro
d4565649b6 namei: ->d_inode of a pinned dentry is stable only for positives
both do_last() and walk_component() risk picking a NULL inode out
of dentry about to become positive, *then* checking its flags and
seeing that it's not negative anymore and using (already stale by
then) value they'd fetched earlier.  Usually ends up oopsing soon
after that...

Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27 19:23:16 -05:00
Al Viro
c80567c82a do_last(): don't let a bogus return value from ->open() et.al. to confuse us
... into returning a positive to path_openat(), which would interpret that
as "symlink had been encountered" and proceed to corrupt memory, etc.
It can only happen due to a bug in some ->open() instance or in some LSM
hook, etc., so we report any such event *and* make sure it doesn't trick
us into further unpleasantness.

Cc: stable@vger.kernel.org # v3.6+, at least
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27 19:17:33 -05:00
Christoph Hellwig
0fcbf996d8 fs: return -EOPNOTSUPP if clone is not supported
-EBADF is a rather confusing error if an operations is not supported,
and nfsd gets rather upset about it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27 19:15:51 -05:00
Mikulas Patocka
b6853f78e7 hpfs: don't truncate the file when delete fails
The delete opration can allocate additional space on the HPFS filesystem
due to btree split. The HPFS driver checks in advance if there is
available space, so that it won't corrupt the btree if we run out of space
during splitting.

If there is not enough available space, the HPFS driver attempted to
truncate the file, but this results in a deadlock since the commit
7dd29d8d86 ("HPFS: Introduce a global mutex
and lock it on every callback from VFS").

This patch removes the code that tries to truncate the file and -ENOSPC is
returned instead. If the user hits -ENOSPC on delete, he should try to
delete other files (that are stored in a leaf btree node), so that the
delete operation will make some space for deleting the file stored in
non-leaf btree node.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Cc: stable@vger.kernel.org	# 2.6.39+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27 19:15:51 -05:00
Linus Torvalds
691429e13d Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
 "10 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  dax: move writeback calls into the filesystems
  dax: give DAX clearing code correct bdev
  ext4: online defrag not supported with DAX
  ext2, ext4: only set S_DAX for regular inodes
  block: disable block device DAX by default
  ocfs2: unlock inode if deleting inode from orphan fails
  mm: ASLR: use get_random_long()
  drivers: char: random: add get_random_long()
  mm: numa: quickly fail allocations for NUMA balancing on full nodes
  mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED
2016-02-27 12:46:16 -08:00
Linus Torvalds
1c271479b5 This fixes a file system corruption bug with DAX
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJW0fNWAAoJEPL5WVaVDYGjbroIAJKZBmlMh5DADLWIUNqeo6y+
 U8O0hpPUyoYb/j0wTVBe5z4cyfWjl1idrA4ZIb2VgMB28F8pPxuLifTVMx0kLeO9
 B1rcqn7CTzwmU9nj6yjcBkYp/spR8lBzaHq2REm3lE9Jwf6NdD4uwhzPiNmL3+xR
 dcg7lFzS6PSmLYD3mhb/lD5/3D3sYDlZ4nmX7uEl5WgxYaB1j5zsBVzYDU2Q0jZZ
 s+r/kj1eL8i9EnZ4zgZ4Bvtjm0jy5iVhO2YvLNQUZDEgmvJpNbVSBv/wAWoe9N3U
 rnm65s8F5hRbc2c8w4M43074uuEA4p0zZwR2z1E6RZvFZsl4Z5kk0/YEmxF7N6g=
 =IALP
 -----END PGP SIGNATURE-----

Merge tag 'tags/ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext2/4 DAX fix from Ted Ts'o:
 "This fixes a file system corruption bug with DAX"

* tag 'tags/ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext2, ext4: fix issue with missing journal entry in ext4_dax_mkwrite()
2016-02-27 12:40:49 -08:00
Linus Torvalds
a9f8094aae PCI updates for v4.5:
Enumeration
     Revert x86 pcibios_alloc_irq() to fix regression (Bjorn Helgaas)
 
   Marvell MVEBU host bridge driver
     Restrict build to 32-bit ARM (Thierry Reding)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW0gP1AAoJEFmIoMA60/r84qkQAJXOFW20cie2yepQXIk7f5aN
 M2/+iFte8YHf4ZFgZWA/oS+mZAp1OqctSTjWg1KTPZsPHAiB6DkL7WOV6fK+uXr9
 fX8D7Ec2eLgeIFl78iSQaAht4kfmfz8f5LlU6Oi9kvQOt+35gp4lP834HClx7Jep
 XT2qZy/zUQy8GylTzRqueMBpXBCnBQR8iyaD8j4rmklQB3yLXaEMTs7HzwJKBmhM
 ZDnH1xrV5cWYb7niSCBkq4IomCmezJZCvxcDjh/Z8gjDKbVl7TLYOdU8Jh4wNO++
 ng0J8WDSKQJ9Hfv6H+5dgPzoqgrIrWb/Oz5GXd8i6cqv00szG5S/w8nHcO8LPSJv
 dJxxfTlz4KRxdv/sqOVW4cDFUmScODMkDMh+hAeEVYKl9ty5fQ4O2iNwNehzrdNj
 FRrgN1980amYN2n09NZNF863dvVN+DMJ4Ll2VT01rOIUH3bwt4cO6rVWrEUlEKCn
 DiSvJlXHm5nLLCQpkkGKAeq5hYl25DFtYVwLopIbUSHFXCASHPtQewDvgzfn9zYi
 M7J8bDa/uTscSqJsGsb4/gHLEblCfju7Pj2gEHoiK4XtbCuuamFA3nsA7lzcAG9j
 W5pVDQTqctdgHq/UMLKIeoBJ592fhYzKipY8vELOKwkieDR9F3g3u8nWt4ZAUIXE
 /oS5F1eWMkDMvdjyZO4C
 =yQ41
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.5-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:
 "Enumeration:
    Revert x86 pcibios_alloc_irq() to fix regression (Bjorn Helgaas)

  Marvell MVEBU host bridge driver:
    Restrict build to 32-bit ARM (Thierry Reding)"

* tag 'pci-v4.5-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: mvebu: Restrict build to 32-bit ARM
  Revert "PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()"
  Revert "PCI: Add helpers to manage pci_dev->irq and pci_dev->irq_managed"
  Revert "x86/PCI: Don't alloc pcibios-irq when MSI is enabled"
2016-02-27 12:33:42 -08:00
Ross Zwisler
1e9d180ba3 ext2, ext4: fix issue with missing journal entry in ext4_dax_mkwrite()
As it is currently written ext4_dax_mkwrite() assumes that the call into
__dax_mkwrite() will not have to do a block allocation so it doesn't create
a journal entry.  For a read that creates a zero page to cover a hole
followed by a write that actually allocates storage this is incorrect.  The
ext4_dax_mkwrite() -> __dax_mkwrite() -> __dax_fault() path calls
get_blocks() to allocate storage.

Fix this by having the ->page_mkwrite fault handler call ext4_dax_fault()
as this function already has all the logic needed to allocate a journal
entry and call __dax_fault().

Also update the ext2 fault handlers in this same way to remove duplicate
code and keep the logic between ext2 and ext4 the same.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-02-27 14:01:16 -05:00
Linus Torvalds
b9ea44bf2c One small fix to keep OMAP platforms working across a suspend/resume
cycle.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJW0OaVAAoJENidgRMleOc9+ioP/1SrApFcS6GcxUQDZGxV9wDo
 R58NKT7EqYeLzcw3jasNwqbmgiBtpTwOO2LjqN0UrZtsh5pO7Vz9sZt2UC5vH+8I
 Yd+BpqQ5pTs+Y7JgFsxNHX77azb15nqGzE2JEZerzferYTIOPWso8nNtgY6uSdFd
 f9ScmoZZy8yw9kWKPKZolqEeMKVT3Q4Q1994hDxhzNwNt1LEtoYlvEQzwZggiEt9
 568hTQ4G1nk8YdwkXb0eQgYTeOmC4VWK6Evh4QCVtIv+t81drVcV81AMKEDMEHyF
 2kr+Sk3XoWfCf0WhsOMp12ASlKWeleLo1HjKS+Y920m3b16BqYC9mhIRL3zurX0c
 O1ADytvFwDQfyo1JQB2BPPnoCvVP5/C6veJC2TWKmA/YTtDjA29MA4ORhN9kUmts
 8TmRqz/QXUjKK+r0E1BBSzNJapIaX/S/bm2dytjWAGaeAi4x6OOn+F6oG2ot+HEW
 euvMSIN5wxTg+UejvkwuUXpSY45LZp4VsKja8jicXwX6lxfd+yL3Iukw9fZU/X00
 QRWR5FdZomZbavsuGKcb5FC4p+nyo8FZX5NC+RTjnMJ9tDYfyewrm8zGQ7O74zEG
 Hxim1HSLUIf6ziJJ1S7GUzb40eEB+9byUmZrRqd1KOp+fmr+J3xKtYQ+IFi8r29M
 AWWvm9Iq2baHvPZTr6Xq
 =1q7R
 -----END PGP SIGNATURE-----

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fix from Stephen Boyd:
 "One small fix to keep OMAP platforms working across a suspend/resume
  cycle"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: ti: omap3+: dpll: use non-locking version of clk_get_rate
2016-02-27 10:30:14 -08:00
Ross Zwisler
7f6d5b529b dax: move writeback calls into the filesystems
Previously calls to dax_writeback_mapping_range() for all DAX filesystems
(ext2, ext4 & xfs) were centralized in filemap_write_and_wait_range().

dax_writeback_mapping_range() needs a struct block_device, and it used
to get that from inode->i_sb->s_bdev.  This is correct for normal inodes
mounted on ext2, ext4 and XFS filesystems, but is incorrect for DAX raw
block devices and for XFS real-time files.

Instead, call dax_writeback_mapping_range() directly from the filesystem
->writepages function so that it can supply us with a valid block
device.  This also fixes DAX code to properly flush caches in response
to sync(2).

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27 10:28:52 -08:00
Ross Zwisler
20a90f5899 dax: give DAX clearing code correct bdev
dax_clear_blocks() needs a valid struct block_device and previously it
was using inode->i_sb->s_bdev in all cases.  This is correct for normal
inodes on mounted ext2, ext4 and XFS filesystems, but is incorrect for
DAX raw block devices and for XFS real-time devices.

Instead, rename dax_clear_blocks() to dax_clear_sectors(), and change
its arguments to take a bdev and a sector instead of an inode and a
block.  This better reflects what the function does, and it allows the
filesystem and raw block device code to pass in an appropriate struct
block_device.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27 10:28:52 -08:00
Ross Zwisler
73f34a5e2c ext4: online defrag not supported with DAX
Online defrag operations for ext4 are hard coded to use the page cache.
See ext4_ioctl() -> ext4_move_extents() -> move_extent_per_page()

When combined with DAX I/O, which circumvents the page cache, this can
result in data corruption.  This was observed with xfstests ext4/307 and
ext4/308.

Fix this by only allowing online defrag for non-DAX files.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27 10:28:52 -08:00
Ross Zwisler
0a6cf9137d ext2, ext4: only set S_DAX for regular inodes
When S_DAX is set on an inode we assume that if there are pages attached
to the mapping (mapping->nrpages != 0), those pages are clean zero pages
that were used to service reads from holes.  Any dirty data associated
with the inode should be in the form of DAX exceptional entries
(mapping->nrexceptional) that is written back via
dax_writeback_mapping_range().

With the current code, though, this isn't always true.  For example,
ext2 and ext4 directory inodes can have S_DAX set, but have their dirty
data stored as dirty page cache entries.  For these types of inodes,
having S_DAX set doesn't really make sense since their I/O doesn't
actually happen through the DAX code path.

Instead, only allow S_DAX to be set for regular inodes for ext2 and
ext4.  This allows us to have strict DAX vs non-DAX paths in the
writeback code.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27 10:28:52 -08:00
Dan Williams
03cdadb040 block: disable block device DAX by default
The recent *sync enabling discovered that we are inserting into the
block_device pagecache counter to the expectations of the dirty data
tracking for dax mappings.  This can lead to data corruption.

We want to support DAX for block devices eventually, but it requires
wider changes to properly manage the pagecache.

   dump_stack+0x85/0xc2
   dax_writeback_mapping_range+0x60/0xe0
   blkdev_writepages+0x3f/0x50
   do_writepages+0x21/0x30
   __filemap_fdatawrite_range+0xc6/0x100
   filemap_write_and_wait+0x4a/0xa0
   set_blocksize+0x70/0xd0
   sb_set_blocksize+0x1d/0x50
   ext4_fill_super+0x75b/0x3360
   mount_bdev+0x180/0x1b0
   ext4_mount+0x15/0x20
   mount_fs+0x38/0x170

Mark the support broken so its disabled by default, but otherwise still
available for testing.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@fb.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27 10:28:52 -08:00
Guozhonghua
a4a8481ff6 ocfs2: unlock inode if deleting inode from orphan fails
When doing append direct io cleanup, if deleting inode fails, it goes
out without unlocking inode, which will cause the inode deadlock.

This issue was introduced by commit cf1776a9e8 ("ocfs2: fix a tiny
race when truncate dio orohaned entry").

Signed-off-by: Guozhonghua <guozhonghua@h3c.com>
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: <stable@vger.kernel.org>	[4.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27 10:28:52 -08:00
Daniel Cashman
5ef11c35ce mm: ASLR: use get_random_long()
Replace calls to get_random_int() followed by a cast to (unsigned long)
with calls to get_random_long().  Also address shifting bug which, in
case of x86 removed entropy mask for mmap_rnd_bits values > 31 bits.

Signed-off-by: Daniel Cashman <dcashman@android.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: David S. Miller <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nick Kralevich <nnk@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27 10:28:52 -08:00
Daniel Cashman
ec9ee4acd9 drivers: char: random: add get_random_long()
Commit d07e22597d ("mm: mmap: add new /proc tunable for mmap_base
ASLR") added the ability to choose from a range of values to use for
entropy count in generating the random offset to the mmap_base address.

The maximum value on this range was set to 32 bits for 64-bit x86
systems, but this value could be increased further, requiring more than
the 32 bits of randomness provided by get_random_int(), as is already
possible for arm64.  Add a new function: get_random_long() which more
naturally fits with the mmap usage of get_random_int() but operates
exactly the same as get_random_int().

Also, fix the shifting constant in mmap_rnd() to be an unsigned long so
that values greater than 31 bits generate an appropriate mask without
overflow.  This is especially important on x86, as its shift instruction
uses a 5-bit mask for the shift operand, which meant that any value for
mmap_rnd_bits over 31 acts as a no-op and effectively disables mmap_base
randomization.

Finally, replace calls to get_random_int() with get_random_long() where
appropriate.

This patch (of 2):

Add get_random_long().

Signed-off-by: Daniel Cashman <dcashman@android.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: David S. Miller <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nick Kralevich <nnk@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27 10:28:52 -08:00
Mel Gorman
8479eba778 mm: numa: quickly fail allocations for NUMA balancing on full nodes
Commit 4167e9b2cf ("mm: remove GFP_THISNODE") removed the GFP_THISNODE
flag combination due to confusing semantics.  It noted that
alloc_misplaced_dst_page() was one such user after changes made by
commit e97ca8e5b8 ("mm: fix GFP_THISNODE callers and clarify").

Unfortunately when GFP_THISNODE was removed, users of
alloc_misplaced_dst_page() started waking kswapd and entering direct
reclaim because the wrong GFP flags are cleared.  The consequence is
that workloads that used to fit into memory now get reclaimed which is
addressed by this patch.

The problem can be demonstrated with "mutilate" that exercises memcached
which is software dedicated to memory object caching.  The configuration
uses 80% of memory and is run 3 times for varying numbers of clients.
The results on a 4-socket NUMA box are

mutilate
                            4.4.0                 4.4.0
                          vanilla           numaswap-v1
Hmean    1      8394.71 (  0.00%)     8395.32 (  0.01%)
Hmean    4     30024.62 (  0.00%)    34513.54 ( 14.95%)
Hmean    7     32821.08 (  0.00%)    70542.96 (114.93%)
Hmean    12    55229.67 (  0.00%)    93866.34 ( 69.96%)
Hmean    21    39438.96 (  0.00%)    85749.21 (117.42%)
Hmean    30    37796.10 (  0.00%)    50231.49 ( 32.90%)
Hmean    47    18070.91 (  0.00%)    38530.13 (113.22%)

The metric is queries/second with the more the better.  The results are
way outside of the noise and the reason for the improvement is obvious
from some of the vmstats

                                 4.4.0       4.4.0
                               vanillanumaswap-v1r1
Minor Faults                1929399272  2146148218
Major Faults                  19746529        3567
Swap Ins                      57307366        9913
Swap Outs                     50623229       17094
Allocation stalls                35909         443
DMA allocs                           0           0
DMA32 allocs                  72976349   170567396
Normal allocs               5306640898  5310651252
Movable allocs                       0           0
Direct pages scanned         404130893      799577
Kswapd pages scanned         160230174           0
Kswapd pages reclaimed        55928786           0
Direct pages reclaimed         1843936       41921
Page writes file                  2391           0
Page writes anon              50623229       17094

The vanilla kernel is swapping like crazy with large amounts of direct
reclaim and kswapd activity.  The figures are aggregate but it's known
that the bad activity is throughout the entire test.

Note that simple streaming anon/file memory consumers also see this
problem but it's not as obvious.  In those cases, kswapd is awake when
it should not be.

As there are at least two reclaim-related bugs out there, it's worth
spelling out the user-visible impact.  This patch only addresses bugs
related to excessive reclaim on NUMA hardware when the working set is
larger than a NUMA node.  There is a bug related to high kswapd CPU
usage but the reports are against laptops and other UMA hardware and is
not addressed by this patch.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>	[4.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27 10:28:52 -08:00