Commit Graph

678372 Commits

Author SHA1 Message Date
Ankur Arora
c9b5d98b25 xen/vcpu: Handle xen_vcpu_setup() failure in hotplug
The hypercall VCPUOP_register_vcpu_info can fail. This failure is
handled by making per_cpu(xen_vcpu, cpu) point to its shared_info
slot and those without one (cpu >= MAX_VIRT_CPUS) be NULL.

For PVH/PVHVM, this is not enough, because we also need to pull
these VCPUs out of circulation.

Fix for PVH/PVHVM: on registration failure in the cpuhp prepare
callback (xen_cpu_up_prepare_hvm()), return an error to the cpuhp
state-machine so it can fail the CPU init.

Fix for PV: the registration happens before smp_init(), so, in the
failure case we clamp setup_max_cpus and limit the number of VCPUs
that smp_init() will bring-up to MAX_VIRT_CPUS.
This is functionally correct but it makes the code a bit simpler
if we get rid of this explicit clamping: for VCPUs that don't have
valid xen_vcpu, fail the CPU init in the cpuhp prepare callback
(xen_cpu_up_prepare_pv()).

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-06-13 16:10:55 +02:00
Ankur Arora
0e4d583723 xen/pv: Fix OOPS on restore for a PV, !SMP domain
If CONFIG_SMP is disabled, xen_setup_vcpu_info_placement() is called from
xen_setup_shared_info(). This is fine as far as boot goes, but it means
that we also call it in the restore path. This results in an OOPS
because we assign to pv_mmu_ops.read_cr2 which is __ro_after_init.

Also, though less problematically, this means we call xen_vcpu_setup()
twice at restore -- once from the vcpu info placement call and the
second time from xen_vcpu_restore().

Fix by calling xen_setup_vcpu_info_placement() at boot only.

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-06-13 16:10:51 +02:00
Ankur Arora
0b64ffb8db xen/pvh*: Support > 32 VCPUs at domain restore
When Xen restores a PVHVM or PVH guest, its shared_info only holds
up to 32 CPUs. The hypercall VCPUOP_register_vcpu_info allows
us to setup per-page areas for VCPUs. This means we can boot
PVH* guests with more than 32 VCPUs. During restore the per-cpu
structure is allocated freshly by the hypervisor (vcpu_info_mfn is
set to INVALID_MFN) so that the newly restored guest can make a
VCPUOP_register_vcpu_info hypercall.

However, we end up triggering this condition in Xen:
/* Run this command on yourself or on other offline VCPUS. */
 if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )

which means we are unable to setup the per-cpu VCPU structures
for running VCPUS. The Linux PV code paths makes this work by
iterating over cpu_possible in xen_vcpu_restore() with:

 1) is target CPU up (VCPUOP_is_up hypercall?)
 2) if yes, then VCPUOP_down to pause it
 3) VCPUOP_register_vcpu_info
 4) if it was down, then VCPUOP_up to bring it back up

With Xen commit 192df6f9122d ("xen/x86: allow HVM guests to use
hypercalls to bring up vCPUs") this is available for non-PV guests.
As such first check if VCPUOP_is_up is actually possible before
trying this dance.

As most of this dance code is done already in xen_vcpu_restore()
let's make it callable on PV, PVH and PVHVM.

Based-on-patch-by: Konrad Wilk <konrad.wilk@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-06-13 16:05:17 +02:00
Ankur Arora
ad73fd595c xen/vcpu: Simplify xen_vcpu related code
Largely mechanical changes to aid unification of xen_vcpu_restore()
logic for PV, PVH and PVHVM.

xen_vcpu_setup(): the only change in logic is that clamp_max_cpus()
is now handled inside the "if (!xen_have_vcpu_info_placement)" block.

xen_vcpu_restore(): code movement from enlighten_pv.c to enlighten.c.

xen_vcpu_info_reset(): pulls together all the code where xen_vcpu
is set to default.

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-06-13 16:05:14 +02:00
Anoob Soman
c48f64ab47 xen-evtchn: Bind dyn evtchn:qemu-dm interrupt to next online VCPU
A HVM domian booting generates around 200K (evtchn:qemu-dm xen-dyn)
interrupts,in a short period of time. All these evtchn:qemu-dm are bound
to VCPU 0, until irqbalance sees these IRQ and moves it to a different VCPU.
In one configuration, irqbalance runs every 10 seconds, which means
irqbalance doesn't get to see these burst of interrupts and doesn't
re-balance interrupts most of the time, making all evtchn:qemu-dm to be
processed by VCPU0. This cause VCPU0 to spend most of time processing
hardirq and very little time on softirq. Moreover, if dom0 kernel PREEMPTION
is disabled, VCPU0 never runs watchdog (process context), triggering a
softlockup detection code to panic.

Binding evtchn:qemu-dm to next online VCPU, will spread hardirq
processing evenly across different CPU. Later, irqbalance will try to balance
evtchn:qemu-dm, if required.

Signed-off-by: Anoob Soman <anoob.soman@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-06-13 15:30:27 +02:00
Arnd Bergmann
9cc91f2121 xen: avoid type warning in xchg_xen_ulong
The improved type-checking version of container_of() triggers a warning for
xchg_xen_ulong, pointing out that 'xen_ulong_t' is unsigned, but atomic64_t
contains a signed value:

drivers/xen/events/events_2l.c: In function 'evtchn_2l_handle_events':
drivers/xen/events/events_2l.c:187:1020: error: call to '__compiletime_assert_187' declared with attribute error: pointer type mismatch in container_of()

This adds a cast to work around the warning.

Cc: Ian Abbott <abbotti@mev.co.uk>
Fixes: 85323a991d ("xen: arm: mandate EABI and use generic atomic operations.")
Fixes: daa2ac80834d ("kernel.h: handle pointers to arrays better in container_of()")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Ian Abbott <abbotti@mev.co.uk>
2017-06-08 15:07:38 -07:00
Sergey Dyasli
a2237ae761 xen: fix HYPERVISOR_dm_op() prototype
Change the third parameter to be the required struct xen_dm_op_buf *
instead of a generic void * (which blindly accepts any pointer).

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-06-08 19:40:14 +02:00
Juergen Gross
4e93b6481c xen: don't print error message in case of missing Xenstore entry
When registering for the Xenstore watch of the node control/sysrq the
handler will be called at once. Don't issue an error message if the
Xenstore node isn't there, as it will be created only when an event
is being triggered.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-06-07 09:48:02 +02:00
Markus Elfring
84a0a967a4 arm/xen: Adjust one function call together with a variable assignment
The script "checkpatch.pl" pointed information out like the following.

ERROR: do not use assignment in if condition

Thus fix the affected source code place.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
2017-06-05 10:38:24 -07:00
Markus Elfring
d6bb4ec32c arm/xen: Delete an error message for a failed memory allocation in __set_phys_to_machine_multi()
Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Link: http://events.linuxfoundation.org/sites/events/files/slides/LCJ16-Refactor_Strings-WSang_0.pdf
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
2017-06-05 10:38:07 -07:00
Markus Elfring
031229b862 arm/xen: Improve a size determination in __set_phys_to_machine_multi()
Replace the specification of a data structure by a pointer dereference
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
2017-06-05 10:36:36 -07:00
Linus Torvalds
3c2993b8c6 Linux 4.12-rc4 2017-06-04 16:47:43 -07:00
Richard Narron
239e250e4a fs/ufs: Set UFS default maximum bytes per file
This fixes a problem with reading files larger than 2GB from a UFS-2
file system:

    https://bugzilla.kernel.org/show_bug.cgi?id=195721

The incorrect UFS s_maxsize limit became a problem as of commit
c2a9737f45 ("vfs,mm: fix a dead loop in truncate_inode_pages_range()")
which started using s_maxbytes to avoid a page index overflow in
do_generic_file_read().

That caused files to be truncated on UFS-2 file systems because the
default maximum file size is 2GB (MAX_NON_LFS) and UFS didn't update it.

Here I simply increase the default to a common value used by other file
systems.

Signed-off-by: Richard Narron <comet.berkeley@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Will B <will.brokenbourgh2877@gmail.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: <stable@vger.kernel.org> # v4.9 and backports of c2a9737f45
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-04 16:33:54 -07:00
Linus Torvalds
125f42b0e2 NFS client bugfixes for Linux 4.12
Bugfixes include:
 
 - Fix a typo in commit e092693443 that breaks copy offload
 - Fix the connect error propagation in xs_tcp_setup_socket()
 - Fix a lock leak in nfs40_walk_client_list
 - Verify that pNFS requests lie within the offset range of the layout segment.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZM0YBAAoJEGcL54qWCgDyLUwQALaPEVp00UMdDR0in7MIFKsO
 2mgi7pOyn6po3EjxKbGtjAbL4nSlVxdaFpCIGg47YXrl9/95Zjjmyke+iwRdnMsa
 ZPyXwfhVRa80fxbOogAverNCnCptoHoG7EzdWuCTcOOxMxR3Ixs7wVJXrs+7ig+r
 IdvIAyTsiDYuP5yVp5KkmJCtLGc0Ze20rb7VgdQJfdiLibWvfYCLZ9CgfAQkdAMU
 RIlbT0/BG13XDqwh/C2V1vLge0VfpT5p8qbIb/kFyQ0ZJUUiicGGGjp3u/yj0aG9
 ljldI34WmQpsy+nCNN4dEgsF461ECvWLwRZnnpN9nv7VurUBpJNUqHLnubvDbzhh
 w8QX54ceEWuQAjg96keNuYOhoG53Omle2/Cm+nmiJOmShJbJ0yh4OcB9DYe0gdYa
 5YXbKRjPvf/HfdE7PPpvbPG2E211zfvkLdHnFxswggWyGrh23kqlWrpcHpZomGNW
 GbJLfIfhyEfBjCPdNJT3Tzvewo2LkcTNLb+3mJhkxOegkdops8vGYA9G2mba3Daj
 1HWl1yFAdzlEf2H1Cb8Y2ZrJKHAmaYBKBkKZYUeAcr6EtoxNqnNMP+PEDcVIzPKg
 6Jq7DiYwYksK+XDWK9G4QBguKKGLvYtv0MIA3QDX+bBGLo+eFYxc2iaaJefYNdkK
 +vdLHclg/YpepLg+Ui21
 =P2bm
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.12-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Bugfixes include:

   - Fix a typo in commit e092693443 ("NFS append COMMIT after
     synchronous COPY") that breaks copy offload

   - Fix the connect error propagation in xs_tcp_setup_socket()

   - Fix a lock leak in nfs40_walk_client_list

   - Verify that pNFS requests lie within the offset range of the layout
     segment"

* tag 'nfs-for-4.12-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  nfs: Mark unnecessarily extern functions as static
  SUNRPC: ensure correct error is reported by xs_tcp_setup_socket()
  NFSv4.0: Fix a lock leak in nfs40_walk_client_list
  pnfs: Fix the check for requests in range of layout segment
  xprtrdma: Delete an error message for a failed memory allocation in xprt_rdma_bc_setup()
  pNFS/flexfiles: missing error code in ff_layout_alloc_lseg()
  NFS fix COMMIT after COPY
2017-06-04 11:56:53 -07:00
Linus Torvalds
3c06e6cbdb TTY fix for 4.12-rc4
Here is a single tty core fix for 4.12-rc4.  It reverts a patch that a
 lot of people reported as causing lockdep and other warnings.  Right
 after I reverted this in my tree, it seems like another "correct" fix
 might have shown up, but it's too late in the release cycle to be
 messing with tty core locking, so let's just revert this for now to go
 back how things always have been and try it again for 4.13.
 
 This has not been in linux-next as I only reverted it a few hours ago.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWTQplQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymn+ACgi83/vya8tSJ+J7rWBf8n2Nma3VkAoI/5lT8H
 gQJub5tZhfB3o7zZrMp4
 =bpSk
 -----END PGP SIGNATURE-----

Merge tag 'tty-4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty fix from Greg KH:
 "Here is a single tty core fix for 4.12-rc4. It reverts a patch that a
  lot of people reported as causing lockdep and other warnings.

  Right after I reverted this in my tree, it seems like another
  "correct" fix might have shown up, but it's too late in the release
  cycle to be messing with tty core locking, so let's just revert this
  for now to go back how things always have been and try it again for
  4.13.

  This has not been in linux-next as I only reverted it a few hours ago"

* tag 'tty-4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  Revert "tty: fix port buffer locking"
2017-06-04 11:41:41 -07:00
Linus Torvalds
e00811b4ca Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input subsystem fixes from Dmitry Torokhov:

 - a couple of regression fixes in synaptics and axp20x-pek drivers

 - try to ease transition from PS/2 to RMI for Synaptics touchpad users
   by ensuring we do not try to activate RMI mode when RMI SMBus support
   is not enabled, and nag users a bit to enable it

 - plus a couple of other changes that seemed worthwhile for this
   release

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: axp20x-pek - switch to acpi_dev_present and check for ACPI0011 too
  Input: axp20x-pek - only check for "INTCFD9" ACPI device on Cherry Trail
  Input: tm2-touchkey - use LEN_ON as boolean value instead of LED_FULL
  Input: synaptics - tell users to report when they should be using rmi-smbus
  Input: synaptics - warn the users when there is a better mode
  Input: synaptics - keep PS/2 around when RMI4_SMB is not enabled
  Input: synaptics - clear device info before filling in
  Input: silead - disable interrupt during suspend
2017-06-04 11:37:42 -07:00
Linus Torvalds
9f03b2c7c5 RTC for 4.12 #2
Change the mailing list address
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEXx9Viay1+e7J/aM4AyWl4gNJNJIFAlkzCZwACgkQAyWl4gNJ
 NJJerhAAjfDlj7aGmKY4/wOJiTs4isnHDW+TLkABtDbjUK+vJ137JGFqxGyeHh6S
 scoSyzGVQlPebVf/RrpeK8u58oa3NbdKRonAfksHRniiAyXefjc2miVqTzMQeK25
 GA8uNvICEnAyelgp2vi6dHzUBD8bjs8A3vDgxqqAjCI/ubjQq6nBODh8EpRuH9Pw
 PRzrYiiTCVztAvkcgHq3slkOdVYnrByuK0EyFZ/A1+UhO8FDbSfOA2eKjnP81ENs
 1otW+8vpm5KoCgoXrVLMogEyqc/G2zDKWbGNGzJMgSsqWW3Pali5crPJ5AOIEEkz
 g4+bg1SCdUr2HDcKNGQkShAQJxmwwo57j+MpE9ISfXBzRUc93qAzHJtMOpDuZtow
 a8xUWqTga6p/tXCLIWSIhS7aZVOZDrycj1GTfN1IjYGVt7ioUFsLDJqmi6nqp/aD
 oO1mzxE3jNo0MDr3KWl5xjPc8FpIgs0xiJZKWRtosynoR69MvEHncpfISFDrNnKM
 bIJdARDp20lAwhhXR+GNyIMzdxfG8VSNEoFy4/eKtqyvhT4tdOibVcsj4jI3I6g1
 H5wol+IJd/N0Z8oDUMSEvId3/Ntp5OEUCiOGJ7Cgw3cUOFMlRzkkUaFC8tp96HT1
 f1BgO2Tit3IsY6vry/zGyoimBXScmegc20/tmoG5pVLHKcKxLog=
 =oMqg
 -----END PGP SIGNATURE-----

Merge tag 'rtc-4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux

Pull RTC fixlet from Alexandre Belloni:
 "A single patch, not really a fix but I don't think there is any reason
  to delay it.

  Change the mailing list address"

* tag 'rtc-4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
  MAINTAINERS: update RTC mailing list
2017-06-04 11:29:32 -07:00
Linus Torvalds
1f915b7fed SCSI fixes on 20170603
This is nine fixes, seven of which are for the qedi driver (new in
 4.10) the other two are a use after free in the cxgbi drivers and a
 potential NULL dereference in the rdac device handler.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJZMyq/AAoJEAVr7HOZEZN4AtUP/1oTEhpXp2epNkEBC0fMywAu
 T6uUUmZJhcC/k5HkaI7AJUZxYSkVcu1A1QzolJ5GoBIEesfY2W/IOwKO77hhZ1Hp
 l+e22ioTRXSE2AWyreyVAGSPweG0PyK66kn2vuZXht7xVmcgTelnxp8GPK2MT6vM
 2Hxhf+velQxDLwLYOkLY0SAw6JAJpxkFkNcaBsuF2av3fSJtFCzTfgIqLgX1LpcA
 ckN8yLW0izIeOYKs0Opc65mCZu6AxKOTcvVFglFal7EB99RIU89brifBOqJIue0d
 KvdoeED2ReO76LZcvDObzEsfNNObHXgNZ100Dk8rF1ovY5JX1L+0fSuYZnJ2Imdg
 lHlgKZFVCmtLWnWUQYGizcdXHuLHt90ye2sCFK+SZx1EBBVO1maC2ME52nGVTNoU
 Jrk39dbg//n3ll1FNtlBIFtlFb+uD8DsrWq1KWhg98JcP2XgagCjc2VPF4Eo6Z0l
 v0JrxG7wATPgRNGhHXRrh1pGxvpuIXvDNVth3cVRa48RxkEJPSW+8Lg63olpCfV3
 sAlbThDU/ZUwZybRLa1nbjN7awiR4GaQL+HE9kj0BdR8JpIQHe7162C9KN5NKK7u
 TReIFSd+oo3AyQzvSR4m0lWeHRoRarDTHkB0/NqbwjMkRUP5125onGf45IaLDI0n
 SaAsYtnsGBEFK/xjAai1
 =MYEc
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "This is nine fixes, seven of which are for the qedi driver (new as of
  4.10) the other two are a use after free in the cxgbi drivers and a
  potential NULL dereference in the rdac device handler"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: libcxgbi: fix skb use after free
  scsi: qedi: Fix endpoint NULL panic during recovery.
  scsi: qedi: set max_fin_rt default value
  scsi: qedi: Set firmware tcp msl timer value.
  scsi: qedi: Fix endpoint NULL panic in qedi_set_path.
  scsi: qedi: Set dma_boundary to 0xfff.
  scsi: qedi: Correctly set firmware max supported BDs.
  scsi: qedi: Fix bad pte call trace when iscsiuio is stopped.
  scsi: scsi_dh_rdac: Use ctlr directly in rdac_failover_get()
2017-06-04 11:15:43 -07:00
Linus Torvalds
55cbdaf639 Fixes for 4.12-rc
- Multiple i40iw, nes, iw_cxgb4, hfi1, qib, mlx4, mlx5 fixes
 - A few upper layer protocol fixes (IPoIB, iSER, SRP)
 - A modest number of core fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZMIXpAAoJELgmozMOVy/dEHIP/RbIL0lGjH6qOOGUjTYVpFBn
 odS0nVWFl/gkw4mnvRDNm3h/BMk0SdaiyUtnRUMEcRhPrqwq40TpT5Sg59LrUgKe
 JGKB4oir7OYKGh8f6ublDlfFkdZiyGTXW50qp7+cxIu0FXSIREYWRIXIxdDdNhGK
 5+EMCJgT0eUNciRE+RlNV1slrKiMGdKm4N5U2nvgy2u/jk1JIpfhkOXrIPar5Ciq
 4Sk2DQoLu2RiMr8Htd49yrXaxxguGX0KpJwdOUv3xNlO4WejkT7KFEYB82NNdu0P
 NpnQGZXea7manripqRRrMBnaqkQD7lTtDHJBepmr4cCgY6XVTq3CQFWnsMywP60A
 10rHNeGixMH76DdE+kzTKQ2PKlVW4jjW6fk18cZ2GWbH4T9r/OzUnGR3uMJdhgHs
 g+zixnIokXa5/8S1p7Pkaq1datAQC4lb2O20c9bjnLM4jQsXMrEnbNevkvouADqj
 LWT5i1ZTQ5kquom5LmwTG9CcwPH/1E6xLXw4E41seqoZcqYZJakzACU43mJ450cO
 t3Afqz2AWgBa28DAEGy5+YR7Fr5/xof997GTB/eHnzY0E/cSJ+ntrlBLwMMrY9/t
 xzE1lZ/7750F9yjkFsLhrU4Xf3snoCg8NW1z7/aaFrx/y+CIKKjP3iRG+JQiKFnx
 D9tRKJY8iXPuCG+8OzSx
 =X6Rg
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma fixes from Doug Ledford:
 "For the most part this is just a minor -rc cycle for the rdma
  subsystem. Even given that this is all of the -rc patches since the
  merge window closed, it's still only about 25 patches:

   - Multiple i40iw, nes, iw_cxgb4, hfi1, qib, mlx4, mlx5 fixes

   - A few upper layer protocol fixes (IPoIB, iSER, SRP)

   - A modest number of core fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (26 commits)
  RDMA/SA: Fix kernel panic in CMA request handler flow
  RDMA/umem: Fix missing mmap_sem in get umem ODP call
  RDMA/core: not to set page dirty bit if it's already set.
  RDMA/uverbs: Declare local function static and add brackets to sizeof
  RDMA/netlink: Reduce exposure of RDMA netlink functions
  RDMA/srp: Fix NULL deref at srp_destroy_qp()
  RDMA/IPoIB: Limit the ipoib_dev_uninit_default scope
  RDMA/IPoIB: Replace netdev_priv with ipoib_priv for ipoib_get_link_ksettings
  RDMA/qedr: add null check before pointer dereference
  RDMA/mlx5: set UMR wqe fence according to HCA cap
  net/mlx5: Define interface bits for fencing UMR wqe
  RDMA/mlx4: Fix MAD tunneling when SRIOV is enabled
  RDMA/qib,hfi1: Fix MR reference count leak on write with immediate
  RDMA/hfi1: Defer setting VL15 credits to link-up interrupt
  RDMA/hfi1: change PCI bar addr assignments to Linux API functions
  RDMA/hfi1: fix array termination by appending NULL to attr array
  RDMA/iw_cxgb4: fix the calculation of ipv6 header size
  RDMA/iw_cxgb4: calculate t4_eq_status_entries properly
  RDMA/iw_cxgb4: Avoid touch after free error in ARP failure handlers
  RDMA/nes: ACK MPA Reply frame
  ...
2017-06-04 10:41:32 -07:00
Greg Kroah-Hartman
fc098af16b Revert "tty: fix port buffer locking"
This reverts commit 925bb1ce47.

It causes lots of warnings and problems so for now, let's just revert
it.

Reported-by: <valdis.kletnieks@vt.edu>
Reported-by: Russell King <linux@armlinux.org.uk>
Reported-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reported-by: Jiri Slaby <jslaby@suse.cz>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-04 10:23:25 +02:00
Jan Kara
4f253e1eb6 nfs: Mark unnecessarily extern functions as static
nfs_initialise_sb() and nfs_clone_super() are declared as extern even
though they are used only in fs/nfs/super.c. Mark them as static.

Also remove explicit 'inline' directive from nfs_initialise_sb() and
leave it upto compiler to decide whether inlining is worth it.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-06-03 16:06:38 -04:00
Linus Torvalds
ea094f3c83 Couple of patches for the aspeed pwm fan driver
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZMpcxAAoJEMsfJm/On5mBIO8P/R+bF7zbMm9Y02YqnPUp0FRX
 rH1QqXrPT8lAMXwCqzHwOEypogNnp7UISaG59UAywgEhADbSG3vuqr4R3vUreg3B
 MITjjscebndGy2/DRRgUfPLbbIJiWQB2KpAvNIyX6PRsmkSQ0YoWAQblfsfrBc5B
 zV+PRPnHgiKPLxuU6eTWoPdy9QbnbwTTG/JBNECblFD86dPU7Rz0u1iw9H5mQFdV
 meW5nx5MvBREXyysqFkv3jZQeSNo83Lp8J+yES9ghuExDt7eUWhd9R/aa5cwENRo
 Odfif+UXbAuebgT92v099xtsc0ibYBxyBrfSZgjMBhr4CkJmqruVUVAdoKPWMfVZ
 mesasawpo154eepDmtztg06ssczWKnpr5x3Db/yr6kRB8spHtouezzH7+3PM8q5O
 bX+7s0sYJBmalRFMCfZD7SdnpfNRxOLJtEkK76VuOvuw373kGS5ycBgPOWR9jBRD
 Cg/+TS85QpwogwQkmIkPJKqBXmUOqtP+XTUPteiL4oYHA67skTqwD6rCMQgfPPnv
 GpNi7FITXMPMItBj4UajYUtFNI61bdXHEnCLsOKsEBp/8UG5vKxDAEJAxcCJEH/D
 o/pjrYk4gFCa70RPdKFWOet/d6OSqhG8bhjWCUavm6BH/EtopMAnAhmlI9vKgHyg
 76jBv/wbc7G+yUZR+Q8u
 =GtgN
 -----END PGP SIGNATURE-----

Merge tag 'hwmon-for-linus-v4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:
 "A couple of patches for the aspeed pwm fan driver"

* tag 'hwmon-for-linus-v4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (aspeed-pwm-tacho) make fan/pwm names start with index 1
  hwmon: (aspeed-pwm-tacho) Call of_node_put() on a node not claimed
  hwmon: (aspeed-pwm-tacho) On read failure return -ETIMEDOUT
  hwmon: (aspeed-pwm-tacho) Select REGMAP
2017-06-03 08:45:03 -07:00
Linus Torvalds
cc54874055 MTD fixes for 4.12:
NAND updates from Boris:
 """
 tango fixes:
  * Add missing MODULE_DEVICE_TABLE() in tango_nand.c
  * Update the number of corrected bitflips
 
 core fixes:
  * Fix a long standing memory leak in nand_scan_tail()
  * Fix several bugs introduced by the per-vendor init/detection
    infrastructure (introduced in 4.12)
  * Add a static specifier to nand_ooblayout_lp_hamming_ops definition
 """
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZMhESAAoJEFySrpd9RFgtZVMP/37yR+PzpS1MPLyQNPalVgFs
 qHH5pQHLnDRikgeQAMQacQK9/Du/7ei9RsFOfPyvaGQhRJDej9M6Vp8VKb63LgKz
 LfQXt/KrUiGyJa929Ew7QGXvxWHKsapPZMx9UiGxMMT/qUwyv4Gfcys/GfJEK2aO
 Sb5stx8WckRZr4j8dL2EX0Q6yB+BtAKavNWakEbaJYPwQSapsCgTejse7x+hjZDT
 cpcE3EXK/yLaQM21K3GD/g2gqeMKX6k/phxnYnt3kBBO8XVgH8MpimglRuo1/cZq
 H9y56GMamiBLX03XflsooLV6vHx7MWNOtUPRAHey1hilj7VuKfRj66zx1oVv/4iw
 PenKe7mS1Y3Eg1eXM+/0m30/X3tA0xUCWpJsoPdXn3lWcxhnfN8wXwQb0JJNulqE
 9Msw1w7OswmtMnbX/YTamYMO0SLgY2Kv7BBGBp8YeXujIgpDBlVGaf3cjIo6U6sp
 3IkllIMHEHFPxKdpxYMMzZ9O/kg6YdiX76IhnmNFt1Y2RZKn0f/9j+dNoMqBjC2A
 w++PJWJYrmibDYLP1dTiFR1u58j9HHJBkIlUstj/HkTNjsf/E1MLqF9P01EqEnnm
 gfzmwt7q/8ahE7mJQ84iA6H+GvFCpzay6T26ZnXEd1LequslVpNtQgUuDZhDo4FV
 rtDnrca0h3ZP1MsniP0d
 =y6Ps
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20170602' of git://git.infradead.org/linux-mtd

Pull MTD fixes from Brian Norris:
 "NAND updates from Boris:

  tango fixes:
   - Add missing MODULE_DEVICE_TABLE() in tango_nand.c
   - Update the number of corrected bitflips

  core fixes:
   - Fix a long standing memory leak in nand_scan_tail()
   - Fix several bugs introduced by the per-vendor init/detection
     infrastructure (introduced in 4.12)
   - Add a static specifier to nand_ooblayout_lp_hamming_ops definition"

* tag 'for-linus-20170602' of git://git.infradead.org/linux-mtd:
  mtd: nand: make nand_ooblayout_lp_hamming_ops static
  mtd: nand: tango: Update ecc_stats.corrected
  mtd: nand: tango: Export OF device ID table as module aliases
  mtd: nand: samsung: warn about un-parseable ECC info
  mtd: nand: free vendor-specific resources in init failure paths
  mtd: nand: drop unneeded module.h include
  mtd: nand: don't leak buffers when ->scan_bbt() fails
2017-06-03 08:42:30 -07:00
Stefan Schaeckeler
5f348fa35a hwmon: (aspeed-pwm-tacho) make fan/pwm names start with index 1
Make fan and pwm names in sysfs start with index 1 in accordance to
Documentation/hwmon/sysfs-interface conventions.

Current implementation starts with index 0, making tools such as
sensors(1) skip the first fan.

Signed-off-by: Stefan Schaeckeler <sschaeck@cisco.com>
Fixes: 2d7a548a3e ("drivers: hwmon: Support for ASPEED PWM/Fan tach")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2017-06-03 03:55:43 -07:00
Stefan Schaeckeler
4d58e7329f hwmon: (aspeed-pwm-tacho) Call of_node_put() on a node not claimed
Call of_node_put() on a node claimed with of_node_get() or by any other
means such as for_each_child_of_node().

Signed-off-by: Stefan Schaeckeler <sschaeck@cisco.com>
Fixes: 2d7a548a3e ("drivers: hwmon: Support for ASPEED PWM/Fan tach")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2017-06-03 03:54:00 -07:00
Hans de Goede
0fd5f22109 Input: axp20x-pek - switch to acpi_dev_present and check for ACPI0011 too
acpi_dev_found checks that there is a matching ACPI node, but it
may be disabled (_STA method returns 0) in which case the
soc_button_array driver will not bind to it and axp20x-pek should
handle the power-button.

This commit switches from acpi_dev_found to acpi_dev_present to
avoid not registering an input-dev for the powerbutton when there
is a disabled PNP0C40 device.

The ACPI-6.0 standard defines a standard gpio button device using
the ACPI0011 HID replacing the custom PNP0C40 gpio device, many
newer devices define both PNP0C40 and ACPI0011 devices enabling one
or the other depending on whether the BIOS thinks it is going to boot
Android or Windows.

This commit adds a check for the ACPI0011 device, so that if
either device is present *and* enabled we don't register an input-dev
for the powerbutton.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2017-06-02 17:53:20 -07:00
Hans de Goede
8d4b313769 Input: axp20x-pek - only check for "INTCFD9" ACPI device on Cherry Trail
Commit 9b13a4ca8d ("Input: axp20x-pek - do not register input device
on some systems") added a check for the INTCFD9 ACPI device which also
handles the powerbutton as on some systems the powerbutton is connected
to both the PMIC, handled by axp20x-pek, and to a gpio on the SoC, handled
by soc_button_array which attaches itself to the INTCFD9 ACPI device.

Testing + comparing DSDTs has shown that this only happens on Cherry
Trail devices with an AXP288 PMIC, the AXP288 PMIC is also used on
Bay Trail devices but there the power button is only connected to
the PMIC and not handled by soc_button_array.

This means that the INTCFD9 check has caused a regression on Bay Trail
devices, causing power-button presses to no longer be seen.

This commit fixes this by limiting the check to devices where the ACPI
node for the AXP288 contains a _HRV (hardware revision) attribute with
a value of 3 which indicates we are dealing with a Cherry Trail platform.

Fixes: 9b13a4ca8d ("Input: axp20x-pek - do not register input ...")
Reported-by: Сергей Трусов <t.rus76@ya.ru>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2017-06-02 17:53:19 -07:00
Dmitry Torokhov
eadcbfa58a Linux 4.12-rc3
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJZK2lrAAoJEHm+PkMAQRiGm3AH/13F1DlIk05aSXHoDr/idIpR
 GMHmk3YF+EuFjsL463Sh6s/SSWmz0Lda8euaoB4wCWvQFX2ZjTE+aOd79XlRiZJQ
 OTtLkV9I41eXIJUpEOHia7xZiCsbw+usqcHrm1aBoSh5KKV2iQmEOrnJdibqJVOF
 eXUMphNK/zFtAd2bKtQSxkaBnOOqsQUgVQSkr2K9rSg25l0KokFC6c5K5IjLn4x9
 QgDY4wmMvHrDz0CtpoqlNM4XqbsDJVrFeZGfg6hlMqSRDeXeg4h3Ol0VfIT496RP
 QBdrDb6hWO+HKt9B0M+7Q+8a/Fsw+5dtpqv1W/Wlr0i4CS6euU8NChAmrpkrqGo=
 =m5ba
 -----END PGP SIGNATURE-----

Merge tag 'v4.12-rc3' into for-linus

Merge with mainline to get acpi_dev_present() needed by patches to
axp20x-pek driver.
2017-06-02 17:49:10 -07:00
Linus Torvalds
104c08ba8e ACPI fixes for v4.12-rc4
- Revert one more commit related to the ACPI-based handling of
    laptop lids that changed the default behavior on laptops that
    booted with closed lids and introduced a regression there
    (Benjamin Tissoires).
 
  - Add a missing acpi_put_table() to the code implementing the
    /sys/firmware/acpi/tables interface to prevent a counter in
    the ACPICA core from overflowing (Dan Williams).
 
  - Drop error messages printed by ACPICA on acpi_get_table()
    reference counting mismatches as they need not indicate real
    errors at this point (Lv Zheng).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJZMeW9AAoJEILEb/54YlRx4B0P/R4z8QpaYtHRCJmjl4wULRJU
 GQ3YYar4GSS5dROIsGj9KF2hD9JikeQxmf/E3qJSZrwVVlDOJzFAsi6hWSBBRDv4
 u0FhgNRUV6x84TMOTk1KauH8Ucb9oJReDpjZcbrIXgGX3CfE0K9YrXX9gfXn4ITp
 gIF9wW3OjXNy5E4YqHB9MoPJ8TSD4qBJUJUYS23igZOsT1fxV8KkcSLmrUk2znMB
 J7uOVA9rzmih+nilnH1PyEVm/p3C9nflqhjZPc2oKAdFWj+3WLHdAoCRgmIkOFk4
 RVZm4iRsv0a9ONwFvi7ALsUJWS8oAfcIq1W4qC4D1xgSIIU0vXVKtKpDeYPWrL/W
 gXuduT3yAvLAePqw24PFABkKbhSgxyd7Y90uTlOHAvSW3deFlOYLPXsCo7uRp9X1
 MrZzy3gzZRIZeYSYM31OpawnPeTkyPX5vG//+rAlWBa7kW07emSob7DYUKb0At/5
 9uFVQVmRjTtA6cVPVz5oaZknTGJkF3k6OmyG38zy7FMBMhqdiRFD4myQXuarnuVs
 /3gl7LfNVHXuNiIg2oAoKujJ9hABWvtIYT1dCoW1KY6/LQyz6BmJVOLOQ483YUAB
 SWuNfKVnRn8zYp4isYaJinLqPF7OORwB1zBPKpoPgZV+n/30blxLpGj5kYx492mt
 Oi8JrydZeAvK3FAz34oO
 =5CiR
 -----END PGP SIGNATURE-----

Merge tag 'acpi-4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "These revert one more problematic commit related to the ACPI-based
  handling of laptop lids and make some unuseful error messages coming
  from ACPICA go away.

  Specifics:

   - Revert one more commit related to the ACPI-based handling of laptop
     lids that changed the default behavior on laptops that booted with
     closed lids and introduced a regression there (Benjamin Tissoires).

   - Add a missing acpi_put_table() to the code implementing the
     /sys/firmware/acpi/tables interface to prevent a counter in the
     ACPICA core from overflowing (Dan Williams).

   - Drop error messages printed by ACPICA on acpi_get_table() reference
     counting mismatches as they need not indicate real errors at this
     point (Lv Zheng)"

* tag 'acpi-4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPICA: Tables: Fix regression introduced by a too early mechanism enabling
  Revert "ACPI / button: Change default behavior to lid_init_state=open"
  ACPI / sysfs: fix acpi_get_table() leak / acpi-sysfs denial of service
2017-06-02 16:36:23 -07:00
Linus Torvalds
89af529a64 Power management fixes for v4.12-rc4
- Make cpufreq_register_driver() return an error if the ->init()
    calls fail for all CPUs to prevent non-functional drivers from
    hanging around for no reason (David Arcari).
 
  - Make kirkwood-cpufreq check the return value of clk_prepare_enable()
    (which may fail) as appropriate (Arvind Yadav).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJZMeVLAAoJEILEb/54YlRxm50QALGQv5EqJExNL9CwRSkcHabU
 VJMxx0Q9hhTuGb9hrQaFCsmE5svcbUpk8w/80nAGvtdRT5w2vB1ssTpJMwE7y407
 8k8DbqEG+Hzk6cjIsMUN2w6n8mWfhWFkk1bzk7xH4xVOcAsJBnfkxlW1cU9viU9G
 FnsEwVmFN5gpZZGwAIqi1Y4GoCpr5xr8mhvTa8D1ofHWfQ/DvroD6uyxd6p3T1cy
 ffXhLxCZAXWkg/RjkVzy+fksLF1A/H/7ylTMjLJzYbzdlf1R73tQ+vLFOJu9RYvT
 xHLU76zozx85LPMxe0MOvuP8yu4Kfy2DOZIBiH3/upBB+I7UzVzuQL0jk1byIGWv
 k5kwvThjJcnF07H0e56tHxq989tDE0RaGg3Mt3JqJ9+ddOKApPa2rDTTv9LtOZOX
 4mjDBdDzQg+N+uec8mGpuH+/Nz5LTJtopPyK7LG2r+KwS1cGMyhoEpolwUpVB9wp
 t/WcKfewm89ROfCrUjRe0+eRiF8DJz3xMsMbYVYHOxVSfWE/M/llQsMxhmrxW1Bw
 jMYzemHziOnYKcSKQtFzgo4/JtDnvx+/2ktlApgcRy9F89I/vjvZfcc3UGCfcd6o
 p46Y26b0cTHxfnBGDZOyHQzrlWFiv6j5wSScFncAJGNzZbNfweGRI+lyoS+IPsEf
 ueBWL3SzN/Tja4FeQ9BO
 =dmsR
 -----END PGP SIGNATURE-----

Merge tag 'pm-4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These fix two bugs in error code paths in the cpufreq core and in the
  kirkwood-cpufreq driver.

  Specifics:

   - Make cpufreq_register_driver() return an error if the ->init()
     calls fail for all CPUs to prevent non-functional drivers from
     hanging around for no reason (David Arcari).

   - Make kirkwood-cpufreq check the return value of
     clk_prepare_enable() (which may fail) as appropriate (Arvind
     Yadav)"

* tag 'pm-4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: kirkwood-cpufreq:- Handle return value of clk_prepare_enable()
  cpufreq: cpufreq_register_driver() should return -ENODEV if init fails
2017-06-02 16:33:33 -07:00
Linus Torvalds
5a4829b564 Fix a race on architectures with prioritized interrupts (such as m68k)
which can causes crashes in drivers/char/random.c:get_reg().
 -----BEGIN PGP SIGNATURE-----
 
 iQEyBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlkxzw8ACgkQ8vlZVpUN
 gaMFowf2KbhElx9OO7Nc62adSF8Ni9vJgk9kigOgrHFXCk4AV5L2QtyLBfk9gIOb
 QxG62fzs2mnnL26SgnvN+5qbuZch9G0GjZDOQTpAcxvT9EL67oZ5EhM3AJAHckOW
 LxSYCJKYBZqvd4dTHZI36gGNfnpYY7O58pRydE1BPPYSftzaQSS0UtMdcBkjCOoc
 Kix35I8PpIhXBubgTE2I38FlGHneS442FAGctnMa/8QlqbkCjSqd1V2rdqxvlNvi
 lQ4AG+6xLu2r/yCXqwHf18sKjt6XuluqrJTQGdUzarFw20ikURjSxwlu5Rh2PwJk
 hyFHMzBlQ7o68XBDk4LzNkNPgBSb
 =4W3q
 -----END PGP SIGNATURE-----

Merge tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random

Pull /dev/random bug fix from Ted Ts'o:
 "Fix a race on architectures with prioritized interrupts (such as m68k)
  which can causes crashes in drivers/char/random.c:get_reg()"

* tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
  fix race in drivers/char/random.c:get_reg()
2017-06-02 16:19:47 -07:00
Linus Torvalds
f219764920 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "15 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  scripts/gdb: make lx-dmesg command work (reliably)
  mm: consider memblock reservations for deferred memory initialization sizing
  mm/hugetlb: report -EHWPOISON not -EFAULT when FOLL_HWPOISON is specified
  mlock: fix mlock count can not decrease in race condition
  mm/migrate: fix refcount handling when !hugepage_migration_supported()
  dax: fix race between colliding PMD & PTE entries
  mm: avoid spurious 'bad pmd' warning messages
  mm/page_alloc.c: make sure OOM victim can try allocations with no watermarks once
  pcmcia: remove left-over %Z format
  slub/memcg: cure the brainless abuse of sysfs attributes
  initramfs: fix disabling of initramfs (and its compression)
  mm: clarify why we want kmalloc before falling backto vmallock
  frv: declare jiffies to be located in the .data section
  include/linux/gfp.h: fix ___GFP_NOLOCKDEP value
  ksm: prevent crash after write_protect_page fails
2017-06-02 15:49:46 -07:00
André Draszik
d6c9708737 scripts/gdb: make lx-dmesg command work (reliably)
lx-dmesg needs access to the log_buf symbol from printk.c.
Unfortunately, the symbol log_buf also exists in BPF's verifier.c and
hence gdb can pick one or the other.  If it happens to pick BPF's
log_buf, lx-dmesg doesn't work:

  (gdb) lx-dmesg
  Python Exception <class 'gdb.MemoryError'> Cannot access memory at address 0x0:
  Error occurred in Python command: Cannot access memory at address 0x0
  (gdb) p log_buf
  $15 = 0x0

Luckily, GDB has a way to deal with this, see
  https://sourceware.org/gdb/onlinedocs/gdb/Symbols.html

  (gdb) info variables ^log_buf$
  All variables matching regular expression "^log_buf$":

  File <linux.git>/kernel/bpf/verifier.c:
  static char *log_buf;

  File <linux.git>/kernel/printk/printk.c:
  static char *log_buf;
  (gdb) p 'verifier.c'::log_buf
  $1 = 0x0
  (gdb) p 'printk.c'::log_buf
  $2 = 0x811a6aa0 <__log_buf> ""
  (gdb) p &log_buf
  $3 = (char **) 0x8120fe40 <log_buf>
  (gdb) p &'verifier.c'::log_buf
  $4 = (char **) 0x8120fe40 <log_buf>
  (gdb) p &'printk.c'::log_buf
  $5 = (char **) 0x8048b7d0 <log_buf>

By being explicit about the location of the symbol, we can make lx-dmesg
work again.  While at it, do the same for the other symbols we need from
printk.c

Link: http://lkml.kernel.org/r/20170526112222.3414-1-git@andred.net
Signed-off-by: André Draszik <git@andred.net>
Tested-by: Kieran Bingham <kieran@bingham.xyz>
Acked-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02 15:07:38 -07:00
Michal Hocko
864b9a393d mm: consider memblock reservations for deferred memory initialization sizing
We have seen an early OOM killer invocation on ppc64 systems with
crashkernel=4096M:

	kthreadd invoked oom-killer: gfp_mask=0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), nodemask=7, order=0, oom_score_adj=0
	kthreadd cpuset=/ mems_allowed=7
	CPU: 0 PID: 2 Comm: kthreadd Not tainted 4.4.68-1.gd7fe927-default #1
	Call Trace:
	  dump_stack+0xb0/0xf0 (unreliable)
	  dump_header+0xb0/0x258
	  out_of_memory+0x5f0/0x640
	  __alloc_pages_nodemask+0xa8c/0xc80
	  kmem_getpages+0x84/0x1a0
	  fallback_alloc+0x2a4/0x320
	  kmem_cache_alloc_node+0xc0/0x2e0
	  copy_process.isra.25+0x260/0x1b30
	  _do_fork+0x94/0x470
	  kernel_thread+0x48/0x60
	  kthreadd+0x264/0x330
	  ret_from_kernel_thread+0x5c/0xa4

	Mem-Info:
	active_anon:0 inactive_anon:0 isolated_anon:0
	 active_file:0 inactive_file:0 isolated_file:0
	 unevictable:0 dirty:0 writeback:0 unstable:0
	 slab_reclaimable:5 slab_unreclaimable:73
	 mapped:0 shmem:0 pagetables:0 bounce:0
	 free:0 free_pcp:0 free_cma:0
	Node 7 DMA free:0kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:52428800kB managed:110016kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:320kB slab_unreclaimable:4672kB kernel_stack:1152kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
	lowmem_reserve[]: 0 0 0 0
	Node 7 DMA: 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 0kB
	0 total pagecache pages
	0 pages in swap cache
	Swap cache stats: add 0, delete 0, find 0/0
	Free swap  = 0kB
	Total swap = 0kB
	819200 pages RAM
	0 pages HighMem/MovableOnly
	817481 pages reserved
	0 pages cma reserved
	0 pages hwpoisoned

the reason is that the managed memory is too low (only 110MB) while the
rest of the the 50GB is still waiting for the deferred intialization to
be done.  update_defer_init estimates the initial memoty to initialize
to 2GB at least but it doesn't consider any memory allocated in that
range.  In this particular case we've had

	Reserving 4096MB of memory at 128MB for crashkernel (System RAM: 51200MB)

so the low 2GB is mostly depleted.

Fix this by considering memblock allocations in the initial static
initialization estimation.  Move the max_initialise to
reset_deferred_meminit and implement a simple memblock_reserved_memory
helper which iterates all reserved blocks and sums the size of all that
start below the given address.  The cumulative size is than added on top
of the initial estimation.  This is still not ideal because
reset_deferred_meminit doesn't consider holes and so reservation might
be above the initial estimation whihch we ignore but let's make the
logic simpler until we really need to handle more complicated cases.

Fixes: 3a80a7fa79 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
Link: http://lkml.kernel.org/r/20170531104010.GI27783@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>	[4.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02 15:07:38 -07:00
James Morse
9a291a7c94 mm/hugetlb: report -EHWPOISON not -EFAULT when FOLL_HWPOISON is specified
KVM uses get_user_pages() to resolve its stage2 faults.  KVM sets the
FOLL_HWPOISON flag causing faultin_page() to return -EHWPOISON when it
finds a VM_FAULT_HWPOISON.  KVM handles these hwpoison pages as a
special case.  (check_user_page_hwpoison())

When huge pages are involved, this doesn't work so well.
get_user_pages() calls follow_hugetlb_page(), which stops early if it
receives VM_FAULT_HWPOISON from hugetlb_fault(), eventually returning
-EFAULT to the caller.  The step to map this to -EHWPOISON based on the
FOLL_ flags is missing.  The hwpoison special case is skipped, and
-EFAULT is returned to user-space, causing Qemu or kvmtool to exit.

Instead, move this VM_FAULT_ to errno mapping code into a header file
and use it from faultin_page() and follow_hugetlb_page().

With this, KVM works as expected.

This isn't a problem for arm64 today as we haven't enabled
MEMORY_FAILURE, but I can't see any reason this doesn't happen on x86
too, so I think this should be a fix.  This doesn't apply earlier than
stable's v4.11.1 due to all sorts of cleanup.

[james.morse@arm.com: add vm_fault_to_errno() call to faultin_page()]
suggested.
  Link: http://lkml.kernel.org/r/20170525171035.16359-1-james.morse@arm.com
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20170524160900.28786-1-james.morse@arm.com
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>	[4.11.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02 15:07:38 -07:00
Yisheng Xie
70feee0e1e mlock: fix mlock count can not decrease in race condition
Kefeng reported that when running the follow test, the mlock count in
meminfo will increase permanently:

 [1] testcase
 linux:~ # cat test_mlockal
 grep Mlocked /proc/meminfo
  for j in `seq 0 10`
  do
 	for i in `seq 4 15`
 	do
 		./p_mlockall >> log &
 	done
 	sleep 0.2
 done
 # wait some time to let mlock counter decrease and 5s may not enough
 sleep 5
 grep Mlocked /proc/meminfo

 linux:~ # cat p_mlockall.c
 #include <sys/mman.h>
 #include <stdlib.h>
 #include <stdio.h>

 #define SPACE_LEN	4096

 int main(int argc, char ** argv)
 {
	 	int ret;
	 	void *adr = malloc(SPACE_LEN);
	 	if (!adr)
	 		return -1;

	 	ret = mlockall(MCL_CURRENT | MCL_FUTURE);
	 	printf("mlcokall ret = %d\n", ret);

	 	ret = munlockall();
	 	printf("munlcokall ret = %d\n", ret);

	 	free(adr);
	 	return 0;
	 }

In __munlock_pagevec() we should decrement NR_MLOCK for each page where
we clear the PageMlocked flag.  Commit 1ebb7cc6a5 ("mm: munlock: batch
NR_MLOCK zone state updates") has introduced a bug where we don't
decrement NR_MLOCK for pages where we clear the flag, but fail to
isolate them from the lru list (e.g.  when the pages are on some other
cpu's percpu pagevec).  Since PageMlocked stays cleared, the NR_MLOCK
accounting gets permanently disrupted by this.

Fix it by counting the number of page whose PageMlock flag is cleared.

Fixes: 1ebb7cc6a5 (" mm: munlock: batch NR_MLOCK zone state updates")
Link: http://lkml.kernel.org/r/1495678405-54569-1-git-send-email-xieyisheng1@huawei.com
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Joern Engel <joern@logfs.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michel Lespinasse <walken@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: zhongjiang <zhongjiang@huawei.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02 15:07:38 -07:00
Punit Agrawal
30809f559a mm/migrate: fix refcount handling when !hugepage_migration_supported()
On failing to migrate a page, soft_offline_huge_page() performs the
necessary update to the hugepage ref-count.

But when !hugepage_migration_supported() , unmap_and_move_hugepage()
also decrements the page ref-count for the hugepage.  The combined
behaviour leaves the ref-count in an inconsistent state.

This leads to soft lockups when running the overcommitted hugepage test
from mce-tests suite.

  Soft offlining pfn 0x83ed600 at process virtual address 0x400000000000
  soft offline: 0x83ed600: migration failed 1, type 1fffc00000008008 (uptodate|head)
  INFO: rcu_preempt detected stalls on CPUs/tasks:
   Tasks blocked on level-0 rcu_node (CPUs 0-7): P2715
    (detected by 7, t=5254 jiffies, g=963, c=962, q=321)
    thugetlb_overco R  running task        0  2715   2685 0x00000008
    Call trace:
      dump_backtrace+0x0/0x268
      show_stack+0x24/0x30
      sched_show_task+0x134/0x180
      rcu_print_detail_task_stall_rnp+0x54/0x7c
      rcu_check_callbacks+0xa74/0xb08
      update_process_times+0x34/0x60
      tick_sched_handle.isra.7+0x38/0x70
      tick_sched_timer+0x4c/0x98
      __hrtimer_run_queues+0xc0/0x300
      hrtimer_interrupt+0xac/0x228
      arch_timer_handler_phys+0x3c/0x50
      handle_percpu_devid_irq+0x8c/0x290
      generic_handle_irq+0x34/0x50
      __handle_domain_irq+0x68/0xc0
      gic_handle_irq+0x5c/0xb0

Address this by changing the putback_active_hugepage() in
soft_offline_huge_page() to putback_movable_pages().

This only triggers on systems that enable memory failure handling
(ARCH_SUPPORTS_MEMORY_FAILURE) but not hugepage migration
(!ARCH_ENABLE_HUGEPAGE_MIGRATION).

I imagine this wasn't triggered as there aren't many systems running
this configuration.

[akpm@linux-foundation.org: remove dead comment, per Naoya]
Link: http://lkml.kernel.org/r/20170525135146.32011-1-punit.agrawal@arm.com
Reported-by: Manoj Iyer <manoj.iyer@canonical.com>
Tested-by: Manoj Iyer <manoj.iyer@canonical.com>
Suggested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>	[3.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02 15:07:38 -07:00
Ross Zwisler
e2093926a0 dax: fix race between colliding PMD & PTE entries
We currently have two related PMD vs PTE races in the DAX code.  These
can both be easily triggered by having two threads reading and writing
simultaneously to the same private mapping, with the key being that
private mapping reads can be handled with PMDs but private mapping
writes are always handled with PTEs so that we can COW.

Here is the first race:

  CPU 0					CPU 1

  (private mapping write)
  __handle_mm_fault()
    create_huge_pmd() - FALLBACK
    handle_pte_fault()
      passes check for pmd_devmap()

					(private mapping read)
					__handle_mm_fault()
					  create_huge_pmd()
					    dax_iomap_pmd_fault() inserts PMD

      dax_iomap_pte_fault() does a PTE fault, but we already have a DAX PMD
      			  installed in our page tables at this spot.

Here's the second race:

  CPU 0					CPU 1

  (private mapping read)
  __handle_mm_fault()
    passes check for pmd_none()
    create_huge_pmd()
      dax_iomap_pmd_fault() inserts PMD

  (private mapping write)
  __handle_mm_fault()
    create_huge_pmd() - FALLBACK
					(private mapping read)
					__handle_mm_fault()
					  passes check for pmd_none()
					  create_huge_pmd()

    handle_pte_fault()
      dax_iomap_pte_fault() inserts PTE
					    dax_iomap_pmd_fault() inserts PMD,
					       but we already have a PTE at
					       this spot.

The core of the issue is that while there is isolation between faults to
the same range in the DAX fault handlers via our DAX entry locking,
there is no isolation between faults in the code in mm/memory.c.  This
means for instance that this code in __handle_mm_fault() can run:

	if (pmd_none(*vmf.pmd) && transparent_hugepage_enabled(vma)) {
		ret = create_huge_pmd(&vmf);

But by the time we actually get to run the fault handler called by
create_huge_pmd(), the PMD is no longer pmd_none() because a racing PTE
fault has installed a normal PMD here as a parent.  This is the cause of
the 2nd race.  The first race is similar - there is the following check
in handle_pte_fault():

	} else {
		/* See comment in pte_alloc_one_map() */
		if (pmd_devmap(*vmf->pmd) || pmd_trans_unstable(vmf->pmd))
			return 0;

So if a pmd_devmap() PMD (a DAX PMD) has been installed at vmf->pmd, we
will bail and retry the fault.  This is correct, but there is nothing
preventing the PMD from being installed after this check but before we
actually get to the DAX PTE fault handlers.

In my testing these races result in the following types of errors:

  BUG: Bad rss-counter state mm:ffff8800a817d280 idx:1 val:1
  BUG: non-zero nr_ptes on freeing mm: 15

Fix this issue by having the DAX fault handlers verify that it is safe
to continue their fault after they have taken an entry lock to block
other racing faults.

[ross.zwisler@linux.intel.com: improve fix for colliding PMD & PTE entries]
  Link: http://lkml.kernel.org/r/20170526195932.32178-1-ross.zwisler@linux.intel.com
Link: http://lkml.kernel.org/r/20170522215749.23516-2-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: Pawel Lebioda <pawel.lebioda@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Pawel Lebioda <pawel.lebioda@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Xiong Zhou <xzhou@redhat.com>
Cc: Eryu Guan <eguan@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02 15:07:37 -07:00
Ross Zwisler
d0f0931de9 mm: avoid spurious 'bad pmd' warning messages
When the pmd_devmap() checks were added by 5c7fb56e5e ("mm, dax:
dax-pmd vs thp-pmd vs hugetlbfs-pmd") to add better support for DAX huge
pages, they were all added to the end of if() statements after existing
pmd_trans_huge() checks.  So, things like:

  -       if (pmd_trans_huge(*pmd))
  +       if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd))

When further checks were added after pmd_trans_unstable() checks by
commit 7267ec008b ("mm: postpone page table allocation until we have
page to map") they were also added at the end of the conditional:

  +       if (pmd_trans_unstable(fe->pmd) || pmd_devmap(*fe->pmd))

This ordering is fine for pmd_trans_huge(), but doesn't work for
pmd_trans_unstable().  This is because DAX huge pages trip the bad_pmd()
check inside of pmd_none_or_trans_huge_or_clear_bad() (called by
pmd_trans_unstable()), which prints out a warning and returns 1.  So, we
do end up doing the right thing, but only after spamming dmesg with
suspicious looking messages:

  mm/pgtable-generic.c:39: bad pmd ffff8808daa49b88(84000001006000a5)

Reorder these checks in a helper so that pmd_devmap() is checked first,
avoiding the error messages, and add a comment explaining why the
ordering is important.

Fixes: commit 7267ec008b ("mm: postpone page table allocation until we have page to map")
Link: http://lkml.kernel.org/r/20170522215749.23516-1-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Pawel Lebioda <pawel.lebioda@intel.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Xiong Zhou <xzhou@redhat.com>
Cc: Eryu Guan <eguan@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02 15:07:37 -07:00
Tetsuo Handa
c288983ddd mm/page_alloc.c: make sure OOM victim can try allocations with no watermarks once
Roman Gushchin has reported that the OOM killer can trivially selects
next OOM victim when a thread doing memory allocation from page fault
path was selected as first OOM victim.

    allocate invoked oom-killer: gfp_mask=0x14280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null),  order=0, oom_score_adj=0
    allocate cpuset=/ mems_allowed=0
    CPU: 1 PID: 492 Comm: allocate Not tainted 4.12.0-rc1-mm1+ #181
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
    Call Trace:
     oom_kill_process+0x219/0x3e0
     out_of_memory+0x11d/0x480
     __alloc_pages_slowpath+0xc84/0xd40
     __alloc_pages_nodemask+0x245/0x260
     alloc_pages_vma+0xa2/0x270
     __handle_mm_fault+0xca9/0x10c0
     handle_mm_fault+0xf3/0x210
     __do_page_fault+0x240/0x4e0
     trace_do_page_fault+0x37/0xe0
     do_async_page_fault+0x19/0x70
     async_page_fault+0x28/0x30
    ...
    Out of memory: Kill process 492 (allocate) score 899 or sacrifice child
    Killed process 492 (allocate) total-vm:2052368kB, anon-rss:1894576kB, file-rss:4kB, shmem-rss:0kB
    allocate: page allocation failure: order:0, mode:0x14280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null)
    allocate cpuset=/ mems_allowed=0
    CPU: 1 PID: 492 Comm: allocate Not tainted 4.12.0-rc1-mm1+ #181
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
    Call Trace:
     __alloc_pages_slowpath+0xd32/0xd40
     __alloc_pages_nodemask+0x245/0x260
     alloc_pages_vma+0xa2/0x270
     __handle_mm_fault+0xca9/0x10c0
     handle_mm_fault+0xf3/0x210
     __do_page_fault+0x240/0x4e0
     trace_do_page_fault+0x37/0xe0
     do_async_page_fault+0x19/0x70
     async_page_fault+0x28/0x30
    ...
    oom_reaper: reaped process 492 (allocate), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
    ...
    allocate invoked oom-killer: gfp_mask=0x0(), nodemask=(null),  order=0, oom_score_adj=0
    allocate cpuset=/ mems_allowed=0
    CPU: 1 PID: 492 Comm: allocate Not tainted 4.12.0-rc1-mm1+ #181
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
    Call Trace:
     oom_kill_process+0x219/0x3e0
     out_of_memory+0x11d/0x480
     pagefault_out_of_memory+0x68/0x80
     mm_fault_error+0x8f/0x190
     ? handle_mm_fault+0xf3/0x210
     __do_page_fault+0x4b2/0x4e0
     trace_do_page_fault+0x37/0xe0
     do_async_page_fault+0x19/0x70
     async_page_fault+0x28/0x30
    ...
    Out of memory: Kill process 233 (firewalld) score 10 or sacrifice child
    Killed process 233 (firewalld) total-vm:246076kB, anon-rss:20956kB, file-rss:0kB, shmem-rss:0kB

There is a race window that the OOM reaper completes reclaiming the
first victim's memory while nothing but mutex_trylock() prevents the
first victim from calling out_of_memory() from pagefault_out_of_memory()
after memory allocation for page fault path failed due to being selected
as an OOM victim.

This is a side effect of commit 9a67f6488e ("mm: consolidate
GFP_NOFAIL checks in the allocator slowpath") because that commit
silently changed the behavior from

    /* Avoid allocations with no watermarks from looping endlessly */

to

    /*
     * Give up allocations without trying memory reserves if selected
     * as an OOM victim
     */

in __alloc_pages_slowpath() by moving the location to check TIF_MEMDIE
flag.  I have noticed this change but I didn't post a patch because I
thought it is an acceptable change other than noise by warn_alloc()
because !__GFP_NOFAIL allocations are allowed to fail.  But we
overlooked that failing memory allocation from page fault path makes
difference due to the race window explained above.

While it might be possible to add a check to pagefault_out_of_memory()
that prevents the first victim from calling out_of_memory() or remove
out_of_memory() from pagefault_out_of_memory(), changing
pagefault_out_of_memory() does not suppress noise by warn_alloc() when
allocating thread was selected as an OOM victim.  There is little point
with printing similar backtraces and memory information from both
out_of_memory() and warn_alloc().

Instead, if we guarantee that current thread can try allocations with no
watermarks once when current thread looping inside
__alloc_pages_slowpath() was selected as an OOM victim, we can follow "who
can use memory reserves" rules and suppress noise by warn_alloc() and
prevent memory allocations from page fault path from calling
pagefault_out_of_memory().

If we take the comment literally, this patch would do

  -    if (test_thread_flag(TIF_MEMDIE))
  -        goto nopage;
  +    if (alloc_flags == ALLOC_NO_WATERMARKS || (gfp_mask & __GFP_NOMEMALLOC))
  +        goto nopage;

because gfp_pfmemalloc_allowed() returns false if __GFP_NOMEMALLOC is
given.  But if I recall correctly (I couldn't find the message), the
condition is meant to apply to only OOM victims despite the comment.
Therefore, this patch preserves TIF_MEMDIE check.

Fixes: 9a67f6488e ("mm: consolidate GFP_NOFAIL checks in the allocator slowpath")
Link: http://lkml.kernel.org/r/201705192112.IAF69238.OQOHSJLFOFFMtV@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: Roman Gushchin <guro@fb.com>
Tested-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <stable@vger.kernel.org>	[4.11]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02 15:07:37 -07:00
Nicolas Iooss
ff5a20169b pcmcia: remove left-over %Z format
Commit 5b5e0928f7 ("lib/vsprintf.c: remove %Z support") removed some
usages of format %Z but forgot "%.2Zx".  This makes clang 4.0 reports a
-Wformat-extra-args warning because it does not know about %Z.

Replace %Z with %z.

Link: http://lkml.kernel.org/r/20170520090946.22562-1-nicolas.iooss_linux@m4x.org
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Harald Welte <laforge@gnumonks.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <stable@vger.kernel.org>	[4.11+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02 15:07:37 -07:00
Thomas Gleixner
478fe3037b slub/memcg: cure the brainless abuse of sysfs attributes
memcg_propagate_slab_attrs() abuses the sysfs attribute file functions
to propagate settings from the root kmem_cache to a newly created
kmem_cache.  It does that with:

     attr->show(root, buf);
     attr->store(new, buf, strlen(bug);

Aside of being a lazy and absurd hackery this is broken because it does
not check the return value of the show() function.

Some of the show() functions return 0 w/o touching the buffer.  That
means in such a case the store function is called with the stale content
of the previous show().  That causes nonsense like invoking
kmem_cache_shrink() on a newly created kmem_cache.  In the worst case it
would cause handing in an uninitialized buffer.

This should be rewritten proper by adding a propagate() callback to
those slub_attributes which must be propagated and avoid that insane
conversion to and from ASCII, but that's too large for a hot fix.

Check at least the return value of the show() function, so calling
store() with stale content is prevented.

Steven said:
 "It can cause a deadlock with get_online_cpus() that has been uncovered
  by recent cpu hotplug and lockdep changes that Thomas and Peter have
  been doing.

     Possible unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(cpu_hotplug.lock);
                                   lock(slab_mutex);
                                   lock(cpu_hotplug.lock);
      lock(slab_mutex);

     *** DEADLOCK ***"

Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1705201244540.2255@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02 15:07:37 -07:00
Florian Fainelli
57ddfdaa9a initramfs: fix disabling of initramfs (and its compression)
Commit db2aa7fd15 ("initramfs: allow again choice of the embedded
initram compression algorithm") introduced the possibility to select the
initramfs compression algorithm from Kconfig and while this is a nice
feature it broke the use case described below.

Here is what my build system does:

 - kernel is initially configured not to have an initramfs included

 - build the user space root file system

 - re-configure the kernel to have an initramfs included
   (CONFIG_INITRAMFS_SOURCE="/path/to/romfs") and set relevant
   CONFIG_INITRAMFS options, in my case, no compression option
   (CONFIG_INITRAMFS_COMPRESSION_NONE)

 - kernel is re-built with these options -> kernel+initramfs image is
   copied

 - kernel is re-built again without these options -> kernel image is
   copied

Building a kernel without an initramfs means setting this option:

  CONFIG_INITRAMFS_SOURCE="" (and this one only)

whereas building a kernel with an initramfs means setting these options:

  CONFIG_INITRAMFS_SOURCE="/home/fainelli/work/uclinux-rootfs/romfs /home/fainelli/work/uclinux-rootfs/misc/initramfs.dev"
  CONFIG_INITRAMFS_ROOT_UID=1000
  CONFIG_INITRAMFS_ROOT_GID=1000
  CONFIG_INITRAMFS_COMPRESSION_NONE=y
  CONFIG_INITRAMFS_COMPRESSION=""

Commit db2aa7fd15 ("initramfs: allow again choice of the embedded
initram compression algorithm") is problematic because
CONFIG_INITRAMFS_COMPRESSION which is used to determine the
initramfs_data.cpio extension/compression is a string, and due to how
Kconfig works it will evaluate in order, how to assign it.

Setting CONFIG_INITRAMFS_COMPRESSION_NONE with CONFIG_INITRAMFS_SOURCE=""
cannot possibly work (because of the depends on INITRAMFS_SOURCE!=""
imposed on CONFIG_INITRAMFS_COMPRESSION ) yet we still get
CONFIG_INITRAMFS_COMPRESSION assigned to ".gz" because CONFIG_RD_GZIP=y
is set in my kernel, even when there is no initramfs being built.

So we basically end-up generating two initramfs_data.cpio* files, one
without extension, and one with .gz.  This causes usr/Makefile to track
usr/initramfs_data.cpio.gz, and not usr/initramfs_data.cpio anymore,
that is also largely problematic after 9e3596b0c6 ("kbuild:
initramfs cleanup, set target from Kconfig") because we used to track
all possible initramfs_data files in the $(targets) variable before that
commit.

The end result is that the kernel with an initramfs clearly does not
contain what we expect it to, it has a stale initramfs_data.cpio file
built into it, and we keep re-generating an initramfs_data.cpio.gz file
which is not the one that we want to include in the kernel image proper.

The fix consists in hiding CONFIG_INITRAMFS_COMPRESSION when
CONFIG_INITRAMFS_SOURCE="".  This puts us back in a state to the
pre-4.10 behavior where we can properly disable and re-enable initramfs
within the same kernel .config file, and be in control of what
CONFIG_INITRAMFS_COMPRESSION is set to.

Fixes: db2aa7fd15 ("initramfs: allow again choice of the embedded initram compression algorithm")
Fixes: 9e3596b0c6 ("kbuild: initramfs cleanup, set target from Kconfig")
Link: http://lkml.kernel.org/r/20170521033337.6197-1-f.fainelli@gmail.com
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Cc: P J P <ppandit@redhat.com>
Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02 15:07:37 -07:00
Michal Hocko
4f4f2ba9c5 mm: clarify why we want kmalloc before falling backto vmallock
While converting drm_[cm]alloc* helpers to kvmalloc* variants Chris
Wilson has wondered why we want to try kmalloc before vmalloc fallback
even for larger allocations requests.  Let's clarify that one larger
physically contiguous block is less likely to fragment memory than many
scattered pages which can prevent more large blocks from being created.

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20170517080932.21423-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02 15:07:37 -07:00
Matthias Kaehlcke
60b0a8c3d2 frv: declare jiffies to be located in the .data section
Commit 7c30f352c8 ("jiffies.h: declare jiffies and jiffies_64 with
____cacheline_aligned_in_smp") removed a section specification from the
jiffies declaration that caused conflicts on some platforms.

Unfortunately this change broke the build for frv:

  kernel/built-in.o: In function `__do_softirq': (.text+0x6460): relocation truncated to fit: R_FRV_GPREL12 against symbol
      `jiffies' defined in *ABS* section in .tmp_vmlinux1
  kernel/built-in.o: In function `__do_softirq': (.text+0x6574): relocation truncated to fit: R_FRV_GPREL12 against symbol
      `jiffies' defined in *ABS* section in .tmp_vmlinux1
  kernel/built-in.o: In function `pwq_activate_delayed_work': workqueue.c:(.text+0x15b9c): relocation truncated to fit: R_FRV_GPREL12 against
      symbol `jiffies' defined in *ABS* section in .tmp_vmlinux1
  ...

Add __jiffy_arch_data to the declaration of jiffies and use it on frv to
include the section specification.  For all other platforms
__jiffy_arch_data (currently) has no effect.

Fixes: 7c30f352c8 ("jiffies.h: declare jiffies and jiffies_64 with ____cacheline_aligned_in_smp")
Link: http://lkml.kernel.org/r/20170516221333.177280-1-mka@chromium.org
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: David Howells <dhowells@redhat.com>
Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02 15:07:37 -07:00
Michal Hocko
1bde33e051 include/linux/gfp.h: fix ___GFP_NOLOCKDEP value
Igor Stoppa has noticed that __GFP_NOLOCKDEP can use a lower bit.  At
the time commit 7e7844226f ("lockdep: allow to disable reclaim lockup
detection") was written we still had __GFP_OTHER_NODE but I have removed
it in commit 41b6167e8f ("mm: get rid of __GFP_OTHER_NODE") and forgot
to lower the bit value.

The current value is outside of __GFP_BITS_SHIFT so it cannot be used
actually.

Fixes: 7e7844226f ("lockdep: allow to disable reclaim lockup detection")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Igor Stoppa <igor.stoppa@nokia.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02 15:07:37 -07:00
Andrea Arcangeli
a7306c3436 ksm: prevent crash after write_protect_page fails
"err" needs to be left set to -EFAULT if split_huge_page succeeds.
Otherwise if "err" gets clobbered with zero and write_protect_page
fails, try_to_merge_one_page() will succeed instead of returning -EFAULT
and then try_to_merge_with_ksm_page() will continue thinking kpage is a
PageKsm when in fact it's still an anonymous page.  Eventually it'll
crash in page_add_anon_rmap.

This has been reproduced on Fedora25 kernel but I can reproduce with
upstream too.

The bug was introduced in commit f765f54059 ("ksm: prepare to new THP
semantics") introduced in v4.5.

    page:fffff67546ce1cc0 count:4 mapcount:2 mapping:ffffa094551e36e1 index:0x7f0f46673
    flags: 0x2ffffc0004007c(referenced|uptodate|dirty|lru|active|swapbacked)
    page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
    page->mem_cgroup:ffffa09674bf0000
    ------------[ cut here ]------------
    kernel BUG at mm/rmap.c:1222!
    CPU: 1 PID: 76 Comm: ksmd Not tainted 4.9.3-200.fc25.x86_64 #1
    RIP: do_page_add_anon_rmap+0x1c4/0x240
    Call Trace:
      page_add_anon_rmap+0x18/0x20
      try_to_merge_with_ksm_page+0x50b/0x780
      ksm_scan_thread+0x1211/0x1410
      ? prepare_to_wait_event+0x100/0x100
      ? try_to_merge_with_ksm_page+0x780/0x780
      kthread+0xd9/0xf0
      ? kthread_park+0x60/0x60
      ret_from_fork+0x25/0x30

Fixes: f765f54059 ("ksm: prepare to new THP semantics")
Link: http://lkml.kernel.org/r/20170513131040.21732-1-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Federico Simoncelli <fsimonce@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02 15:07:37 -07:00
Rafael J. Wysocki
6031913025 Merge branches 'acpi-button', 'acpica' and 'acpi-sysfs'
* acpi-button:
  Revert "ACPI / button: Change default behavior to lid_init_state=open"

* acpica:
  ACPICA: Tables: Fix regression introduced by a too early mechanism enabling

* acpi-sysfs:
  ACPI / sysfs: fix acpi_get_table() leak / acpi-sysfs denial of service
2017-06-03 00:03:29 +02:00
Rafael J. Wysocki
bb5710e72c Merge branch 'pm-cpufreq'
* pm-cpufreq:
  cpufreq: kirkwood-cpufreq:- Handle return value of clk_prepare_enable()
  cpufreq: cpufreq_register_driver() should return -ENODEV if init fails
2017-06-03 00:01:45 +02:00
Linus Torvalds
e6e6d07436 Changes since last update:
- Fix an unmount hang due to a race in io buffer accounting.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCgAGBQJZMKVEAAoJEPh/dxk0SrTrBYcQAKSpzE8C9wDBw6cyxP3kwrTr
 FSQiSr7flnGBHwy2U0UC/SIFIwYxvW4BTnXJWADyqtnvLWP1+TC7UY1oNpkTsbkK
 KLsWgz3aOcT/8sb346PzFDAuxof2lkv3xFPRBFaoeSkybxWqLz6BWsbmaJNH/wqy
 W3k3H241mAftEiv1i9IUlAZMXE31qywIKzzUJvkOglXS8OdVFfMPQvUz6epU2LWA
 I2tBip936Sl45vLu6ubqoRpk8dWNuPPX+f4YXl8dVeqRKTYhviMwgYD4rlljb6Ti
 kIRG9HYg1GVZo5z/5unAjyEaKzYoRrXnO5Lg+i09NIhezlDhB2HJ+k71NljoeHoe
 YCwqumQIGgnxdFu+FP10tKh2EWvDp80SQxgzIvr+FCCKJdsdNYyftRh4CtsCPJSG
 xWHT1jgovygHsBEEmG2LS9mCXKkyWgMkHNMBu3Yy/F/4HGzrPjcU3F+x90OmOo7J
 S26kEwsAoo+Q5Is8QkmqrnD+CQ7jwXEv9Mw3UqRwQ7UagRdR2nI8CIGEC7W+42Mm
 Gd3TtAyJCbhZWXNq7pLeTnGu7JY3/dhR/8VSW+mIKtvFg7v9O1wZBYId8vTwZN1+
 8jgnW0h6myE10YKU5bc1TZeYYAkWA+JLRKxoexL3QD8jWeffyZgMNWPM2rb+4Jjp
 2wwCHMPvHE8X7a2urTW3
 =wRbJ
 -----END PGP SIGNATURE-----

Merge tag 'xfs-4.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull XFS fix from Darrick Wong:
 "I've one more bugfix for you for 4.12-rc4: Fix an unmount hang due to
  a race in io buffer accounting"

* tag 'xfs-4.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: use ->b_state to fix buffer I/O accounting release race
2017-06-02 12:29:03 -07:00