Commit Graph

754117 Commits

Author SHA1 Message Date
Masahiro Yamada
d6605b6bbe x86/build: Remove unnecessary preparation for purgatory
kexec-purgatory.c is properly generated when Kbuild descend into
the arch/x86/purgatory/.

Thus the 'archprepare' target is redundant.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/1529401422-28838-3-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 12:55:05 +02:00
Masahiro Yamada
bdab125c93 Revert "kexec/purgatory: Add clean-up for purgatory directory"
Reverts the following commit:

  b0108f9e93 ("kexec: purgatory: add clean-up for purgatory directory")

... which incorrectly stated that the kexec-purgatory.c and purgatory.ro files
were not removed after 'make mrproper'.

In fact, they are.  You can confirm it after reverting it.

  $ make mrproper
  $ touch arch/x86/purgatory/kexec-purgatory.c
  $ touch arch/x86/purgatory/purgatory.ro
  $ make mrproper
    CLEAN   arch/x86/purgatory
  $ ls arch/x86/purgatory/
  entry64.S  Makefile  purgatory.c  setup-x86_64.S  stack.S  string.c

This is obvious from the build system point of view.

arch/x86/Makefile adds 'arch/x86' to core-y.
Hence 'make clean' descends like this:

  arch/x86/Kbuild
    -> arch/x86/purgatory/Makefile

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/1529401422-28838-2-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 12:55:05 +02:00
Juergen Gross
74899d92e6 x86/xen: Add call of speculative_store_bypass_ht_init() to PV paths
Commit:

  1f50ddb4f4 ("x86/speculation: Handle HT correctly on AMD")

... added speculative_store_bypass_ht_init() to the per-CPU initialization sequence.

speculative_store_bypass_ht_init() needs to be called on each CPU for
PV guests, too.

Reported-by: Brian Woods <brian.woods@amd.com>
Tested-by: Brian Woods <brian.woods@amd.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: <stable@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boris.ostrovsky@oracle.com
Cc: xen-devel@lists.xenproject.org
Fixes: 1f50ddb4f4 ("x86/speculation: Handle HT correctly on AMD")
Link: https://lore.kernel.org/lkml/20180621084331.21228-1-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 10:55:52 +02:00
Siarhei Liakh
3ae6295ccb x86: Call fixup_exception() before notify_die() in math_error()
fpu__drop() has an explicit fwait which under some conditions can trigger a
fixable FPU exception while in kernel. Thus, we should attempt to fixup the
exception first, and only call notify_die() if the fixup failed just like
in do_general_protection(). The original call sequence incorrectly triggers
KDB entry on debug kernels under particular FPU-intensive workloads.

Andy noted, that this makes the whole conditional irq enable thing even
more inconsistent, but fixing that it outside the scope of this.

Signed-off-by: Siarhei Liakh <siarhei.liakh@concurrent-rt.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Borislav  Petkov" <bpetkov@suse.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/DM5PR11MB201156F1CAB2592B07C79A03B17D0@DM5PR11MB2011.namprd11.prod.outlook.com
2018-06-20 11:44:56 +02:00
Tony Luck
1d9f3e20a5 x86/intel_rdt: Enable CMT and MBM on new Skylake stepping
New stepping of Skylake has fixes for cache occupancy and memory
bandwidth monitoring.

Update the code to enable these by default on newer steppings.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: stable@vger.kernel.org # v4.14
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Link: https://lkml.kernel.org/r/20180608160732.9842-1-tony.luck@intel.com
2018-06-09 16:04:34 +02:00
Thomas Gleixner
a07771ac6a x86/apic/vector: Print APIC control bits in debugfs
Extend the debugability of the vector management by adding the state bits
to the debugfs output.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Song Liu <songliubraving@fb.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <liu.song.a23@gmail.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Travis <mike.travis@hpe.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tariq Toukan <tariqt@mellanox.com>
Link: https://lkml.kernel.org/r/20180604162224.908136099@linutronix.de
2018-06-06 15:18:22 +02:00
Thomas Gleixner
12f47073a4 genirq/affinity: Defer affinity setting if irq chip is busy
The case that interrupt affinity setting fails with -EBUSY can be handled
in the kernel completely by using the already available generic pending
infrastructure.

If a irq_chip::set_affinity() fails with -EBUSY, handle it like the
interrupts for which irq_chip::set_affinity() can only be invoked from
interrupt context. Copy the new affinity mask to irq_desc::pending_mask and
set the affinity pending bit. The next raised interrupt for the affected
irq will check the pending bit and try to set the new affinity from the
handler. This avoids that -EBUSY is returned when an affinity change is
requested from user space and the previous change has not been cleaned
up. The new affinity will take effect when the next interrupt is raised
from the device.

Fixes: dccfe3147b ("x86/vector: Simplify vector move cleanup")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Song Liu <songliubraving@fb.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <liu.song.a23@gmail.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: stable@vger.kernel.org
Cc: Mike Travis <mike.travis@hpe.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tariq Toukan <tariqt@mellanox.com>
Link: https://lkml.kernel.org/r/20180604162224.819273597@linutronix.de
2018-06-06 15:18:22 +02:00
Thomas Gleixner
839b0f1c4e x86/platform/uv: Use apic_ack_irq()
To address the EBUSY fail of interrupt affinity settings in case that the
previous setting has not been cleaned up yet, use the new apic_ack_irq()
function instead of the special uv_ack_apic() implementation which is
merily a wrapper around ack_APIC_irq().

Preparatory change for the real fix

Fixes: dccfe3147b ("x86/vector: Simplify vector move cleanup")
Reported-by: Song Liu <liu.song.a23@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Song Liu <songliubraving@fb.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: stable@vger.kernel.org
Cc: Mike Travis <mike.travis@hpe.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tariq Toukan <tariqt@mellanox.com>
Link: https://lkml.kernel.org/r/20180604162224.721691398@linutronix.de
2018-06-06 15:18:21 +02:00
Thomas Gleixner
2b04e46d8d x86/ioapic: Use apic_ack_irq()
To address the EBUSY fail of interrupt affinity settings in case that the
previous setting has not been cleaned up yet, use the new apic_ack_irq()
function instead of directly invoking ack_APIC_irq().

Preparatory change for the real fix

Fixes: dccfe3147b ("x86/vector: Simplify vector move cleanup")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Song Liu <songliubraving@fb.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <liu.song.a23@gmail.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: stable@vger.kernel.org
Cc: Mike Travis <mike.travis@hpe.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tariq Toukan <tariqt@mellanox.com>
Link: https://lkml.kernel.org/r/20180604162224.639011135@linutronix.de
2018-06-06 15:18:21 +02:00
Thomas Gleixner
8a2b7d142e irq_remapping: Use apic_ack_irq()
To address the EBUSY fail of interrupt affinity settings in case that the
previous setting has not been cleaned up yet, use the new apic_ack_irq()
function instead of the special ir_ack_apic_edge() implementation which is
merily a wrapper around ack_APIC_irq().

Preparatory change for the real fix

Fixes: dccfe3147b ("x86/vector: Simplify vector move cleanup")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Song Liu <songliubraving@fb.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <liu.song.a23@gmail.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: stable@vger.kernel.org
Cc: Mike Travis <mike.travis@hpe.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tariq Toukan <tariqt@mellanox.com>
Link: https://lkml.kernel.org/r/20180604162224.555716895@linutronix.de
2018-06-06 15:18:20 +02:00
Thomas Gleixner
c0255770cc x86/apic: Provide apic_ack_irq()
apic_ack_edge() is explicitely for handling interrupt affinity cleanup when
interrupt remapping is not available or disable.

Remapped interrupts and also some of the platform specific special
interrupts, e.g. UV, invoke ack_APIC_irq() directly.

To address the issue of failing an affinity update with -EBUSY the delayed
affinity mechanism can be reused, but ack_APIC_irq() does not handle
that. Adding this to ack_APIC_irq() is not possible, because that function
is also used for exceptions and directly handled interrupts like IPIs.

Create a new function, which just contains the conditional invocation of
irq_move_irq() and the final ack_APIC_irq().

Reuse the new function in apic_ack_edge().

Preparatory change for the real fix.

Fixes: dccfe3147b ("x86/vector: Simplify vector move cleanup")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Song Liu <songliubraving@fb.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <liu.song.a23@gmail.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: stable@vger.kernel.org
Cc: Mike Travis <mike.travis@hpe.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tariq Toukan <tariqt@mellanox.com>
Link: https://lkml.kernel.org/r/20180604162224.471925894@linutronix.de
2018-06-06 15:18:20 +02:00
Thomas Gleixner
d340ebd696 genirq/migration: Avoid out of line call if pending is not set
The upcoming fix for the -EBUSY return from affinity settings requires to
use the irq_move_irq() functionality even on irq remapped interrupts. To
avoid the out of line call, move the check for the pending bit into an
inline helper.

Preparatory change for the real fix. No functional change.

Fixes: dccfe3147b ("x86/vector: Simplify vector move cleanup")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <liu.song.a23@gmail.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: stable@vger.kernel.org
Cc: Mike Travis <mike.travis@hpe.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Link: https://lkml.kernel.org/r/20180604162224.471925894@linutronix.de
2018-06-06 15:18:20 +02:00
Thomas Gleixner
a33a5d2d16 genirq/generic_pending: Do not lose pending affinity update
The generic pending interrupt mechanism moves interrupts from the interrupt
handler on the original target CPU to the new destination CPU. This is
required for x86 and ia64 due to the way the interrupt delivery and
acknowledge works if the interrupts are not remapped.

However that update can fail for various reasons. Some of them are valid
reasons to discard the pending update, but the case, when the previous move
has not been fully cleaned up is not a legit reason to fail.

Check the return value of irq_do_set_affinity() for -EBUSY, which indicates
a pending cleanup, and rearm the pending move in the irq dexcriptor so it's
tried again when the next interrupt arrives.

Fixes: 996c591227d9 ("x86/irq: Plug vector cleanup race")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Song Liu <songliubraving@fb.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <liu.song.a23@gmail.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: stable@vger.kernel.org
Cc: Mike Travis <mike.travis@hpe.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tariq Toukan <tariqt@mellanox.com>
Link: https://lkml.kernel.org/r/20180604162224.386544292@linutronix.de
2018-06-06 15:18:19 +02:00
Thomas Gleixner
80ae7b1a91 x86/apic/vector: Prevent hlist corruption and leaks
Several people observed the WARN_ON() in irq_matrix_free() which triggers
when the caller tries to free an vector which is not in the allocation
range. Song provided the trace information which allowed to decode the root
cause.

The rework of the vector allocation mechanism failed to preserve a sanity
check, which prevents setting a new target vector/CPU when the previous
affinity change has not fully completed.

As a result a half finished affinity change can be overwritten, which can
cause the leak of a irq descriptor pointer on the previous target CPU and
double enqueue of the hlist head into the cleanup lists of two or more
CPUs. After one CPU cleaned up its vector the next CPU will invoke the
cleanup handler with vector 0, which triggers the out of range warning in
the matrix allocator.

Prevent this by checking the apic_data of the interrupt whether the
move_in_progress flag is false and the hlist node is not hashed. Return
-EBUSY if not.

This prevents the damage and restores the behaviour before the vector
allocation rework, but due to other changes in that area it also widens the
chance that user space can observe -EBUSY. In theory this should be fine,
but actually not all user space tools handle -EBUSY correctly. Addressing
that is not part of this fix, but will be addressed in follow up patches.

Fixes: 69cde0004a ("x86/vector: Use matrix allocator for vector assignment")
Reported-by: Dmitry Safonov <0x7f454c46@gmail.com>
Reported-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Song Liu <liu.song.a23@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Song Liu <songliubraving@fb.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Cc: Mike Travis <mike.travis@hpe.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20180604162224.303870257@linutronix.de
2018-06-06 15:18:19 +02:00
Dou Liyang
838d76d63e x86/vector: Fix the args of vector_alloc tracepoint
The vector_alloc tracepont reversed the reserved and ret aggs, that made
the trace print wrong. Exchange them.

Fixes: 8d1e3dca7d ("x86/vector: Add tracepoints for vector management")
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: hpa@zytor.com
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180601065031.21872-1-douly.fnst@cn.fujitsu.com
2018-06-06 13:38:02 +02:00
Dou Liyang
3366281288 x86/idt: Simplify the idt_setup_apic_and_irq_gates()
The idt_setup_apic_and_irq_gates() sets the gates from
FIRST_EXTERNAL_VECTOR up to FIRST_SYSTEM_VECTOR first. then secondly, from
FIRST_SYSTEM_VECTOR to NR_VECTORS, it takes both APIC=y and APIC=n into
account.

But for APIC=n, the FIRST_SYSTEM_VECTOR is equal to NR_VECTORS, all
vectors has been set at the first step.

Simplify the second step, make it just work for APIC=y.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20180523023555.2933-1-douly.fnst@cn.fujitsu.com
2018-06-06 13:38:01 +02:00
Varsha Rao
cce2946b9b x86/platform/uv: Remove extra parentheses
Remove extra parentheses to fix the extraneous parentheses clang
warning.

Suggested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Nicholas Mc Guire <der.herr@hofr.at>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20180520080012.8215-1-rvarsha016@gmail.com
2018-06-06 13:38:01 +02:00
Kirill A. Shutemov
94d49eb30e x86/mm: Decouple dynamic __PHYSICAL_MASK from AMD SME
AMD SME claims one bit from physical address to indicate whether the
page is encrypted or not. To achieve that we clear out the bit from
__PHYSICAL_MASK.

The capability to adjust __PHYSICAL_MASK is required beyond AMD SME.
For instance for upcoming Intel Multi-Key Total Memory Encryption.

Factor it out into a separate feature with own Kconfig handle.

It also helps with overhead of AMD SME. It saves more than 3k in .text
on defconfig + AMD_MEM_ENCRYPT:

	add/remove: 3/2 grow/shrink: 5/110 up/down: 189/-3753 (-3564)

We would need to return to this once we have infrastructure to patch
constants in code. That's good candidate for it.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: linux-mm@kvack.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20180518113028.79825-1-kirill.shutemov@linux.intel.com
2018-06-06 13:38:01 +02:00
Arnd Bergmann
046c0dbec0 x86: Mark native_set_p4d() as __always_inline
When CONFIG_OPTIMIZE_INLINING is enabled, the function native_set_p4d()
may not be fully inlined into the caller, resulting in a false-positive
warning about an access to the __pgtable_l5_enabled variable from a
non-__init function, despite the original caller being an __init function:

WARNING: vmlinux.o(.text.unlikely+0x1429): Section mismatch in reference from the function native_set_p4d() to the variable .init.data:__pgtable_l5_enabled
WARNING: vmlinux.o(.text.unlikely+0x1429): Section mismatch in reference from the function native_p4d_clear() to the variable .init.data:__pgtable_l5_enabled

The function native_set_p4d() references the variable __initdata
__pgtable_l5_enabled.  This is often because native_set_p4d lacks a
__initdata annotation or the annotation of __pgtable_l5_enabled is wrong.

Marking the native_set_p4d function and its caller native_p4d_clear()
avoids this problem.

I did not bisect the original cause, but I assume this is related to the
recent rework that turned pgtable_l5_enabled() into an inline function,
which in turn caused the compiler to make different inlining decisions.

Fixes: ad3fe525b9 ("x86/mm: Unify pgtable_l5_enabled usage in early boot code")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Link: https://lkml.kernel.org/r/20180605113715.1133726-1-arnd@arndb.de
2018-06-06 12:09:45 +02:00
Ingo Molnar
24dd064d5b Merge branches 'x86/dma', 'x86/microcode', 'x86/mm' and 'x86/vdso' into x86/urgent
Merge these small and simple 1-2 commit branches into the urgent branch.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-04 18:50:32 +02:00
Linus Torvalds
29dcea8877 Linux 4.17 2018-06-03 14:15:21 -07:00
Linus Torvalds
325e14f97e Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro.

 - fix io_destroy()/aio_complete() race

 - the vfs_open() change to get rid of open_check_o_direct() boilerplate
   was nice, but buggy. Al has a patch avoiding a revert, but that's
   definitely not a last-day fodder, so for now revert it is...

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  Revert "fs: fold open_check_o_direct into do_dentry_open"
  fix io_destroy()/aio_complete() race
2018-06-03 11:01:28 -07:00
Al Viro
af04fadcaa Revert "fs: fold open_check_o_direct into do_dentry_open"
This reverts commit cab64df194.

Having vfs_open() in some cases drop the reference to
struct file combined with

	error = vfs_open(path, f, cred);
	if (error) {
		put_filp(f);
		return ERR_PTR(error);
	}
	return f;

is flat-out wrong.  It used to be

		error = vfs_open(path, f, cred);
		if (!error) {
			/* from now on we need fput() to dispose of f */
			error = open_check_o_direct(f);
			if (error) {
				fput(f);
				f = ERR_PTR(error);
			}
		} else {
			put_filp(f);
			f = ERR_PTR(error);
		}

and sure, having that open_check_o_direct() boilerplate gotten rid of is
nice, but not that way...

Worse, another call chain (via finish_open()) is FUBAR now wrt
FILE_OPENED handling - in that case we get error returned, with file
already hit by fput() *AND* FILE_OPENED not set.  Guess what happens in
path_openat(), when it hits

	if (!(opened & FILE_OPENED)) {
		BUG_ON(!error);
		put_filp(file);
	}

The root cause of all that crap is that the callers of do_dentry_open()
have no way to tell which way did it fail; while that could be fixed up
(by passing something like int *opened to do_dentry_open() and have it
marked if we'd called ->open()), it's probably much too late in the
cycle to do so right now.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-03 10:58:23 -07:00
Linus Torvalds
874cd339ac Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:

 - two patches addressing the problem that the scheduler allows under
   certain conditions user space tasks to be scheduled on CPUs which are
   not yet fully booted which causes a few subtle and hard to debug
   issue

 - add a missing runqueue clock update in the deadline scheduler which
   triggers a warning under certain circumstances

 - fix a silly typo in the scheduler header file

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/headers: Fix typo
  sched/deadline: Fix missing clock update
  sched/core: Require cpu_active() in select_task_rq(), for user tasks
  sched/core: Fix rules for running on online && !active CPUs
2018-06-03 09:01:41 -07:00
Linus Torvalds
26bdace74c Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf tooling fixes from Thomas Gleixner:

 - fix 'perf test Session topology' segfault on s390 (Thomas Richter)

 - fix NULL return handling in bpf__prepare_load() (YueHaibing)

 - fix indexing on Coresight ETM packet queue decoder (Mathieu Poirier)

 - fix perf.data format description of NRCPUS header (Arnaldo Carvalho
   de Melo)

 - update perf.data documentation section on cpu topology

 - handle uncore event aliases in small groups properly (Kan Liang)

 - add missing perf_sample.addr into python sample dictionary (Leo Yan)

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf tools: Fix perf.data format description of NRCPUS header
  perf script python: Add addr into perf sample dict
  perf data: Update documentation section on cpu topology
  perf cs-etm: Fix indexing for decoder packet queue
  perf bpf: Fix NULL return handling in bpf__prepare_load()
  perf test: "Session topology" dumps core on s390
  perf parse-events: Handle uncore event aliases in small groups properly
2018-06-03 08:58:59 -07:00
Linus Torvalds
918fe1b315 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Infinite loop in _decode_session6(), from Eric Dumazet.

 2) Pass correct argument to nla_strlcpy() in netfilter, also from Eric
    Dumazet.

 3) Out of bounds memory access in ipv6 srh code, from Mathieu Xhonneux.

 4) NULL deref in XDP_REDIRECT handling of tun driver, from Toshiaki
    Makita.

 5) Incorrect idr release in cls_flower, from Paul Blakey.

 6) Probe error handling fix in davinci_emac, from Dan Carpenter.

 7) Memory leak in XPS configuration, from Alexander Duyck.

 8) Use after free with cloned sockets in kcm, from Kirill Tkhai.

 9) MTU handling fixes fo ip_tunnel and ip6_tunnel, from Nicolas
    Dichtel.

10) Fix UAPI hole in bpf data structure for 32-bit compat applications,
    from Daniel Borkmann.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits)
  bpf: fix uapi hole for 32 bit compat applications
  net: usb: cdc_mbim: add flag FLAG_SEND_ZLP
  ip6_tunnel: remove magic mtu value 0xFFF8
  ip_tunnel: restore binding to ifaces with a large mtu
  net: dsa: b53: Add BCM5389 support
  kcm: Fix use-after-free caused by clonned sockets
  net-sysfs: Fix memory leak in XPS configuration
  ixgbe: fix parsing of TC actions for HW offload
  net: ethernet: davinci_emac: fix error handling in probe()
  net/ncsi: Fix array size in dumpit handler
  cls_flower: Fix incorrect idr release when failing to modify rule
  net/sonic: Use dma_mapping_error()
  xfrm Fix potential error pointer dereference in xfrm_bundle_create.
  vhost_net: flush batched heads before trying to busy polling
  tun: Fix NULL pointer dereference in XDP redirect
  be2net: Fix error detection logic for BE3
  net: qmi_wwan: Add Netgear Aircard 779S
  mlxsw: spectrum: Forbid creation of VLAN 1 over port/LAG
  atm: zatm: fix memcmp casting
  iwlwifi: pcie: compare with number of IRQs requested for, not number of CPUs
  ...
2018-06-02 17:35:53 -07:00
Linus Torvalds
e0255aec66 SCSI fixes on 20180602
Eve of merge window fix: The original code was so bogus as to be
 casting the wrong generic device to an rport and proceeding to take
 actions based on the bogus values it found.  Fortunately it seems the
 location that is dereferenced always exists, so the code hasn't oopsed
 yet, but it certainly annoys the memory checkers.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCWxMJDSYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishWkTAP0QPfpP
 ywyhrODRRPNg73zZnF3qo3CSeswSxDdjyW/4JAD/aLgTfPOydD+EA/sr/hjcs+Z/
 DU3lt68c+CVp1kRtZ9A=
 =9sLD
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fix from James Bottomley:
 "Eve of merge window fix: The original code was so bogus as to be
  casting the wrong generic device to an rport and proceeding to take
  actions based on the bogus values it found.

  Fortunately it seems the location that is dereferenced always exists,
  so the code hasn't oopsed yet, but it certainly annoys the memory
  checkers"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: scsi_transport_srp: Fix shost to rport translation
2018-06-02 15:54:49 -07:00
Linus Torvalds
ada7339efe amdgpu, i915, omap, dw-hdmi fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJbEvsXAAoJEAx081l5xIa+LncP/355OijfiJxjbze+eJVlz+1r
 CIlZsZBMc9aqsGU2HyVCTx3dqglrx0UbXZWRyI5WgPuqgi8x/rgbrTk+RyWr3eKw
 QQD+oHMN6vXn07ss3VYyHgW+5tkEZ/bbkTaTVRagag/wCZ9eOFgtww1VllYEw1ZG
 QI+tOWKhBpcU1dnoL+Bxu1W4JckPXK5FPlcFF/zkt/bIOSedoj+3MZl5Db7c1NSz
 quw0mR8C/luf5H2Ump5WIgqKRD8j3xM/EDnY0gdIg9HMmm3k0xIhNLcU/rwNi5ns
 qvurOPGK9Fteu94QnmmIbKj1E/ms/KRDA+71UoqmW2YYKJJAYHYr6t8Q4HXlKFcC
 MGq1pDGzaZrTbOtqHPJv6iLcnA28GgjDGQ0nQuNWp7mhjmb+fbqLftFQjLeHNPUc
 lu3pmmE8FZWI4lSTiqj5ojM1ceZFgGFN2l52PS+17wVHAHln+WpIMbFpaqkxlFRz
 zwMr09d70w8qQiW/0b5Pf8n7hq7ud3SZEOhzaAjQ6ggPNXRQ1czj1s8QUsUNt+3b
 o+uYC9kSpS67PxUs4QTPFyicMueZaGB+WT5Y+Cr5d4OJIwenzkmRhlAuR05W755T
 P5vbe4SJxih+FqcxfAWFJBFoRIRC49YxB/UXYpCIK4iSXpWVVNYearwzjJxywvzA
 QppQU+Y9IfvVPNHQtzxX
 =MK3/
 -----END PGP SIGNATURE-----

Merge tag 'drm-fixes-for-v4.17-rc8' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
 "A few final fixes:

  i915:
   - fix for potential Spectre vector in the new query uAPI
   - fix NULL pointer deref (FDO #106559)
   - DMI fix to hide LVDS for Radiant P845 (FDO #105468)

  amdgpu:
   - suspend/resume DC regression fix
   - underscan flicker fix on fiji
   - gamma setting fix after dpms

  omap:
   - fix oops regression

  core:
   - fix PSR timing

  dw-hdmi:
   - fix oops regression"

* tag 'drm-fixes-for-v4.17-rc8' of git://people.freedesktop.org/~airlied/linux:
  drm/amd/display: Update color props when modeset is required
  drm/amd/display: Make atomic-check validate underscan changes
  drm/bridge/synopsys: dw-hdmi: fix dw_hdmi_setup_rx_sense
  drm/amd/display: Fix BUG_ON during CRTC atomic check update
  drm/i915/query: nospec expects no more than an unsigned long
  drm/i915/query: Protect tainted function pointer lookup
  drm/i915/lvds: Move acpi lid notification registration to registration phase
  drm/i915: Disable LVDS on Radiant P845
  drm/omap: fix NULL deref crash with SDI displays
  drm/psr: Fix missed entry in PSR setup time table.
2018-06-02 15:24:45 -07:00
Dave Airlie
012cfaced0 Merge branch 'drm-fixes-4.17' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
Two last minute DC fixes for 4.17.  A fix for underscan on fiji and
a fix for gamma settings getting after dpms.

* 'drm-fixes-4.17' of git://people.freedesktop.org/~agd5f/linux:
  drm/amd/display: Update color props when modeset is required
  drm/amd/display: Make atomic-check validate underscan changes
2018-06-03 06:13:57 +10:00
Linus Torvalds
4277e6b9fd Final MIPS fixes for 4.17
A final few MIPS fixes for 4.17:
 
  - Drop Lantiq gphy reboot/remove reset (4.14)
 
  - prctl(PR_SET_FP_MODE): Disallow PRE without FR (4.0)
 
  - ptrace(PTRACE_PEEKUSR): Fix 64-bit FGRs (3.15)
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQS7lRNBWUYtqfDOVL41zuSGKxAj8gUCWxHFOwAKCRA1zuSGKxAj
 8i39AQCX9phkffpRDnA1e/MiGGeRZ5+f9FBOzuS1x2nzdUagoQD9FzWxdcx57Syj
 ye0kUtmc/wm8U6kz3qC3OInSeVuIEQg=
 =3NGM
 -----END PGP SIGNATURE-----

Merge tag 'mips_fixes_4.17_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS fixes from James Hogan:
 "A final few MIPS fixes for 4.17:

   - drop Lantiq gphy reboot/remove reset (4.14)

   - prctl(PR_SET_FP_MODE): Disallow PRE without FR (4.0)

   - ptrace(PTRACE_PEEKUSR): Fix 64-bit FGRs (3.15)"

* tag 'mips_fixes_4.17_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: ptrace: Fix PTRACE_PEEKUSR requests for 64-bit FGRs
  MIPS: prctl: Disallow FRE without FR with PR_SET_FP_MODE requests
  MIPS: lantiq: gphy: Drop reboot/remove reset asserts
2018-06-02 10:12:23 -07:00
Linus Torvalds
7172a69c10 VFIO fix v4.17
- Revert a pfn page mapping optimization identified as introducing
    a bad page state regression (Alex Williamson)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJbEsWYAAoJECObm247sIsi1aUP/ROnej4AhAT8CI9/6YMAdejd
 M5aJVAGjsa3fcmGmm92IVkNsnYk9cm7YlmwvTVuOq8zoN0ZAuZvBkQwyfdqXoI8M
 ow1GQkv9kjZA+A1vH+A8HW3Liaf7wqPU0Db5nkypOYlaPPNqSAj2VUmbaiBcq+W+
 FEshsiIjshRr+YJxKvV1nCRQGlBPMPxMLqqiufVDJE9cSHwwul8lEBHRg1WRcfpN
 kKTO+RUEDT99B+HE77icq9l6IVQqnlGMXZ5/ODtpFwhIeQRo1eklspHdajJgUUnT
 ksQzrFXCgOnkFJt9pCoquPF/aXGhbTHZbEBudwobeNPZiI3Rqri1RlDcSaqdjZbT
 eJUvHt7xSPv0zbqClUOZmqr1QPEym901jA0NrFaoQLjjA1KpjEUIv7/FEKsaE2w8
 N2leTwCo04Atkp+gm9XKzZnsrBLaG5vo4pYJrhv6BTBLrJyHQMnLFjYQCzTaBsjM
 cnFyv9zlkztSqbLYB9cgSQRNQc8pxfckdPrLhVnrsissLF8Ce5XA3oKzjBu+0Z80
 PfvEeA//vzmReYTrm+M0i/gA3whvllEkUMWFMMqH93CtygmyxWrJuLDI+EVg1pbK
 ZnwaPnl9q12dPM3MtIefbqXiHSj5rmvHS966oppY/KSUcyM6iEU5XIOZYjfXkluh
 W5fYxvPyBE+AvtYwr0wa
 =Qb4S
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v4.17' of git://github.com/awilliam/linux-vfio

Pull VFIO fix from Alex Williamson:
 "Revert a pfn page mapping optimization identified as introducing a bad
  page state regression (Alex Williamson)"

* tag 'vfio-v4.17' of git://github.com/awilliam/linux-vfio:
  Revert "vfio/type1: Improve memory pinning process for raw PFN mapping"
2018-06-02 10:08:45 -07:00
Linus Torvalds
6ac9f42cda Char/Misc driver fixes for 4.17-rc8
Here are 4 small bugfixes for some char/misc drivers.  Well, really 3
 fixes and one fix for one of those fixes due to problems found by 0-day.
 
 This resolves some reported issues with the hwtracing drivers, and a
 reported regression for the thunderbolt subsystem.  All of these have
 been in linux-next for a while now with no reported problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWxKMCw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymK2wCdEsDr7v19XalCGEUwrUlTiVM8Du0An2MkgogQ
 EzZn7+QsxTgLMYG4N9gl
 =Z00b
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-4.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
 "Here are four small bugfixes for some char/misc drivers. Well, really
  three fixes and one fix for one of those fixes due to problems found
  by 0-day.

  This resolves some reported issues with the hwtracing drivers, and a
  reported regression for the thunderbolt subsystem. All of these have
  been in linux-next for a while now with no reported problems"

* tag 'char-misc-4.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  hwtracing: stm: fix build error on some arches
  intel_th: Use correct device when freeing buffers
  stm class: Use vmalloc for the master map
  thunderbolt: Handle NULL boot ACL entries properly
2018-06-02 10:05:45 -07:00
Linus Torvalds
34a8e640d1 IIO driver fixes for 4.17-rc8
Here are some old IIO driver fixes that were sitting in my tree for a
 few weeks.  Sorry about not getting them to you sooner.  They fix a
 number of small IIO driver issues that have been reported.
 
 All of these have been in linux-next for a while with no reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWxKN4g8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymkogCfbe2/iQJcT8kXD0s/73A/KqkKPksAn0NbVWyh
 HEroVoD7yDcdm2A+t36A
 =OWdB
 -----END PGP SIGNATURE-----

Merge tag 'staging-4.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull IIO driver fixes from Greg KH:
 "Here are some old IIO driver fixes that were sitting in my tree for a
  few weeks. Sorry about not getting them to you sooner. They fix a
  number of small IIO driver issues that have been reported.

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'staging-4.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  iio: adc: select buffer for at91-sama5d2_adc
  iio: hid-sensor-trigger: Fix sometimes not powering up the sensor after resume
  iio: adc: at91-sama5d2_adc: fix channel configuration for differential channels
  iio:kfifo_buf: check for uint overflow
  iio:buffer: make length types match kfifo types
  iio: adc: stm32-dfsdm: fix sample rate for div2 spi clock
  iio: adc: stm32-dfsdm: fix successive oversampling settings
  iio: ad7793: implement IIO_CHAN_INFO_SAMP_FREQ
2018-06-02 10:02:14 -07:00
Linus Torvalds
7fdf3e8616 Merge candidates for 4.17-rc
- bnxt netdev changes merged this cycle caused the bnxt RDMA driver to crash under
   certain situations
 - Arnd found (several, unfortunately) kconfig problems with the patches adding
   INFINIBAND_ADDR_TRANS. Reverting this last part, will fix it more fully
   outside -rc.
 - Subtle change in error code for a uapi function caused breakage in userspace.
   This was bug was subtly introduced cycle
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCgAGBQJbEXg1AAoJEDht9xV+IJsax2YQAJjXnfQ9+QsbtT8sqFzGFLo3
 r5aqqwF6RkGDFsovVDRH/9S62JjeqJRQA+4ykooajD+6mXU06Sf3p4tcoco0Dqhn
 66b/lkdPFzXSytlne7AnUnA3xkKG4u5jYGReSryIQjXu29iwt8scgiiqt8nX9Gzi
 eC9U2UQn5ZF65yRo4V/UGuHjdnUXiPYfg2Ff5YqLUxdL0XE42ftpuiR3Xuzgfj8c
 /rdqDwnvdViQwPeTSNTJoZzeV+49WKp9BP+lzsCeIXvzuzY1aOd94z0i7fq342hB
 jXpz6PtRTSBkZ4xBuvtopnoz0HQXMv7kQFMobkyjaP3qXcKz6Dx9d3QvWkLq8qdQ
 D4MPYjVCONJAJvXppxuNSzyz0lg5EaSICWeZhkr5P68Ja0fptiXAumVCTr/kEifV
 Yz8y0xsEVMIxBy3C9HOCe4khzWKqd9Uoo4/VDM+GdhKHIpUCnnKTte9vIZcYg1JL
 1doCithudNX7KF4K79JJqADwtFhXoPaHt3XF9YRhgouDN9gC+/hB5ZZvD4jjcqhF
 tLfRh1+LeK6QuVTE//ON/OS5xGgEzcZVLUJSUs/NdB+5YAXREz/2+IoK/0+rFiP0
 wlHlgUk9ZQx3j7La5iiPrRdK+h6u0LUYd02XuMevnnswGqv+DWrxDnFy/icovPCt
 AxHZ0KxdLJeo2euOd755
 =Ppe+
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "Just three small last minute regressions that were found in the last
  week. The Broadcom fix is a bit big for rc7, but since it is fixing
  driver crash regressions that were merged via netdev into rc1, I am
  sending it.

   - bnxt netdev changes merged this cycle caused the bnxt RDMA driver
     to crash under certain situations

   - Arnd found (several, unfortunately) kconfig problems with the
     patches adding INFINIBAND_ADDR_TRANS. Reverting this last part,
     will fix it more fully outside -rc.

   - Subtle change in error code for a uapi function caused breakage in
     userspace. This was bug was subtly introduced cycle"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  IB/core: Fix error code for invalid GID entry
  IB: Revert "remove redundant INFINIBAND kconfig dependencies"
  RDMA/bnxt_re: Fix broken RoCE driver due to recent L2 driver changes
2018-06-02 09:55:44 -07:00
Linus Torvalds
a36b796890 Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
 "A documentation bugfix and a MAINTAINERS addition"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: ocores: update HDL sources URL
  i2c: xlp9xx: Add MAINTAINERS entry
2018-06-02 09:52:22 -07:00
Linus Torvalds
0938a8f52d Merge branch 'akpm' (patches from Andrew)
Merge two fixes from Andrew Morton.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: fix the NULL mapping case in __isolate_lru_page()
  mm/huge_memory.c: __split_huge_page() use atomic ClearPageDirty()
2018-06-02 09:44:15 -07:00
Hugh Dickins
145e1a71e0 mm: fix the NULL mapping case in __isolate_lru_page()
George Boole would have noticed a slight error in 4.16 commit
69d763fc6d ("mm: pin address_space before dereferencing it while
isolating an LRU page").  Fix it, to match both the comment above it,
and the original behaviour.

Although anonymous pages are not marked PageDirty at first, we have an
old habit of calling SetPageDirty when a page is removed from swap
cache: so there's a category of ex-swap pages that are easily
migratable, but were inadvertently excluded from compaction's async
migration in 4.16.

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1805302014001.12558@eggly.anvils
Fixes: 69d763fc6d ("mm: pin address_space before dereferencing it while isolating an LRU page")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by:  Ivan Kalvachev <ikalvachev@gmail.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-02 09:33:47 -07:00
Hugh Dickins
2d077d4b59 mm/huge_memory.c: __split_huge_page() use atomic ClearPageDirty()
Swapping load on huge=always tmpfs (with khugepaged tuned up to be very
eager, but I'm not sure that is relevant) soon hung uninterruptibly,
waiting for page lock in shmem_getpage_gfp()'s find_lock_entry(), most
often when "cp -a" was trying to write to a smallish file.  Debug showed
that the page in question was not locked, and page->mapping NULL by now,
but page->index consistent with having been in a huge page before.

Reproduced in minutes on a 4.15 kernel, even with 4.17's 605ca5ede7
("mm/huge_memory.c: reorder operations in __split_huge_page_tail()") added
in; but took hours to reproduce on a 4.17 kernel (no idea why).

The culprit proved to be the __ClearPageDirty() on tails beyond i_size in
__split_huge_page(): the non-atomic __bitoperation may have been safe when
4.8's baa355fd33 ("thp: file pages support for split_huge_page()")
introduced it, but liable to erase PageWaiters after 4.10's 6290602709
("mm: add PageWaiters indicating tasks are waiting for a page bit").

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1805291841070.3197@eggly.anvils
Fixes: 6290602709 ("mm: add PageWaiters indicating tasks are waiting for a page bit")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-02 09:33:47 -07:00
Alex Williamson
89c29def6b Revert "vfio/type1: Improve memory pinning process for raw PFN mapping"
Bisection by Amadeusz Sławiński implicates this commit leading to bad
page state issues after VM shutdown, likely due to unbalanced page
references.  The original commit was intended only as a performance
improvement, therefore revert for offline rework.

Link: https://lkml.org/lkml/2018/6/2/97
Fixes: 356e88ebe4 ("vfio/type1: Improve memory pinning process for raw PFN mapping")
Cc: Jason Cai (Xiang Feng) <jason.cai@linux.alibaba.com>
Reported-by: Amadeusz Sławiński <amade@asmblr.net>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-06-02 08:41:44 -06:00
David S. Miller
cd075ce467 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2018-06-02

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) BPF uapi fix in struct bpf_prog_info and struct bpf_map_info in
   order to fix offsets on 32 bit archs.

This will have a minor merge conflict with net-next which has the
__u32 gpl_compatible:1 bitfield in struct bpf_prog_info at this
location. Resolution is to use the gpl_compatible member.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-02 08:07:52 -04:00
Daniel Borkmann
36f9814a49 bpf: fix uapi hole for 32 bit compat applications
In 64 bit, we have a 4 byte hole between ifindex and netns_dev in the
case of struct bpf_map_info but also struct bpf_prog_info. In net-next
commit b85fab0e67 ("bpf: Add gpl_compatible flag to struct bpf_prog_info")
added a bitfield into it to expose some flags related to programs. Thus,
add an unnamed __u32 bitfield for both so that alignment keeps the same
in both 32 and 64 bit cases, and can be naturally extended from there
as in b85fab0e67.

Before:

  # file test.o
  test.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
  # pahole test.o
  struct bpf_map_info {
	__u32                      type;                 /*     0     4 */
	__u32                      id;                   /*     4     4 */
	__u32                      key_size;             /*     8     4 */
	__u32                      value_size;           /*    12     4 */
	__u32                      max_entries;          /*    16     4 */
	__u32                      map_flags;            /*    20     4 */
	char                       name[16];             /*    24    16 */
	__u32                      ifindex;              /*    40     4 */
	__u64                      netns_dev;            /*    44     8 */
	__u64                      netns_ino;            /*    52     8 */

	/* size: 64, cachelines: 1, members: 10 */
	/* padding: 4 */
  };

After (same as on 64 bit):

  # file test.o
  test.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
  # pahole test.o
  struct bpf_map_info {
	__u32                      type;                 /*     0     4 */
	__u32                      id;                   /*     4     4 */
	__u32                      key_size;             /*     8     4 */
	__u32                      value_size;           /*    12     4 */
	__u32                      max_entries;          /*    16     4 */
	__u32                      map_flags;            /*    20     4 */
	char                       name[16];             /*    24    16 */
	__u32                      ifindex;              /*    40     4 */

	/* XXX 4 bytes hole, try to pack */

	__u64                      netns_dev;            /*    48     8 */
	__u64                      netns_ino;            /*    56     8 */
	/* --- cacheline 1 boundary (64 bytes) --- */

	/* size: 64, cachelines: 1, members: 10 */
	/* sum members: 60, holes: 1, sum holes: 4 */
  };

Reported-by: Dmitry V. Levin <ldv@altlinux.org>
Reported-by: Eugene Syromiatnikov <esyr@redhat.com>
Fixes: 52775b33bb ("bpf: offload: report device information about offloaded maps")
Fixes: 675fc275a3 ("bpf: offload: report device information for offloaded programs")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-06-01 20:41:35 -07:00
Daniele Palmas
9f7c728332 net: usb: cdc_mbim: add flag FLAG_SEND_ZLP
Testing Telit LM940 with ICMP packets > 14552 bytes revealed that
the modem needs FLAG_SEND_ZLP to properly work, otherwise the cdc
mbim data interface won't be anymore responsive.

Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01 14:01:42 -04:00
David S. Miller
8a11801581 Merge branch 'tunnel-mtus'
Nicolas Dichtel says:

====================
ip[6] tunnels: fix mtu calculations

The first patch restores the possibility to bind an ip4 tunnel to an
interface whith a large mtu.
The second patch was spotted after the first fix. I also target it to net
because it fixes the max mtu value that can be used for ipv6 tunnels.

v2: remove the 0xfff8 in ip_tunnel_newlink()
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01 13:56:31 -04:00
Nicolas Dichtel
f7ff1fde94 ip6_tunnel: remove magic mtu value 0xFFF8
I don't know where this value comes from (probably a copy and paste and
paste and paste ...).
Let's use standard values which are a bit greater.

Link: https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=e5afd356a411a
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01 13:56:30 -04:00
Nicolas Dichtel
82612de1c9 ip_tunnel: restore binding to ifaces with a large mtu
After commit f6cc9c054e, the following conf is broken (note that the
default loopback mtu is 65536, ie IP_MAX_MTU + 1):

$ ip tunnel add gre1 mode gre local 10.125.0.1 remote 10.125.0.2 dev lo
add tunnel "gre0" failed: Invalid argument
$ ip l a type dummy
$ ip l s dummy1 up
$ ip l s dummy1 mtu 65535
$ ip tunnel add gre1 mode gre local 10.125.0.1 remote 10.125.0.2 dev dummy1
add tunnel "gre0" failed: Invalid argument

dev_set_mtu() doesn't allow to set a mtu which is too large.
First, let's cap the mtu returned by ip_tunnel_bind_dev(). Second, remove
the magic value 0xFFF8 and use IP_MAX_MTU instead.
0xFFF8 seems to be there for ages, I don't know why this value was used.

With a recent kernel, it's also possible to set a mtu > IP_MAX_MTU:
$ ip l s dummy1 mtu 66000
After that patch, it's also possible to bind an ip tunnel on that kind of
interface.

CC: Petr Machata <petrm@mellanox.com>
CC: Ido Schimmel <idosch@mellanox.com>
Link: https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=e5afd356a411a
Fixes: f6cc9c054e ("ip_tunnel: Emit events for post-register MTU changes")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01 13:56:29 -04:00
David S. Miller
ccfde6e27d Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
pull request (net): ipsec 2018-05-31

1) Avoid possible overflow of the offset variable
   in  _decode_session6(), this fixes an infinite
   lookp there. From Eric Dumazet.

2) We may use an error pointer in the error path of
   xfrm_bundle_create(). Fix this by returning this
   pointer directly to the caller.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01 13:25:41 -04:00
Damien Thébault
a95691bc54 net: dsa: b53: Add BCM5389 support
This patch adds support for the BCM5389 switch connected through MDIO.

Signed-off-by: Damien Thébault <damien.thebault@vitec.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01 11:15:42 -04:00
Kirill Tkhai
eb7f54b90b kcm: Fix use-after-free caused by clonned sockets
(resend for properly queueing in patchwork)

kcm_clone() creates kernel socket, which does not take net counter.
Thus, the net may die before the socket is completely destructed,
i.e. kcm_exit_net() is executed before kcm_done().

Reported-by: syzbot+5f1a04e374a635efc426@syzkaller.appspotmail.com
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01 10:28:07 -04:00
Alexander Duyck
664088f8d6 net-sysfs: Fix memory leak in XPS configuration
This patch reorders the error cases in showing the XPS configuration so
that we hold off on memory allocation until after we have verified that we
can support XPS on a given ring.

Fixes: 184c449f91 ("net: Add support for XPS with QoS via traffic classes")
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-31 23:02:42 -04:00
Ondřej Hlavatý
16e6653c82 ixgbe: fix parsing of TC actions for HW offload
The previous code was optimistic, accepting the offload of whole action
chain when there was a single known action (drop/redirect). This results
in offloading a rule which should not be offloaded, because its behavior
cannot be reproduced in the hardware.

For example:

$ tc filter add dev eno1 parent ffff: protocol ip \
    u32 ht 800: order 1 match tcp src 42 FFFF \
    action mirred egress mirror dev enp1s16 pipe \
    drop

The controller is unable to mirror the packet to a VF, but still
offloads the rule by dropping the packet.

Change the approach of the function to a pessimistic one, rejecting the
chain when an unknown action is found. This is better suited for future
extensions.

Note that both recognized actions always return TC_ACT_SHOT, therefore
it is safe to ignore actions behind them.

Signed-off-by: Ondřej Hlavatý <ohlavaty@redhat.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-31 23:01:00 -04:00