linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-12 23:46:40 +07:00

Author	SHA1	Message	Date
Vincent Guittot	287cdaac57	sched/fair: Fix scale_rt_capacity() for SMT Since commit: `523e979d31` ("sched/core: Use PELT for scale_rt_capacity()") scale_rt_capacity() returns the remaining capacity and not a scale factor to apply on cpu_capacity_orig. arch_scale_cpu() is directly called by scale_rt_capacity() so we must take the sched_domain argument. Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `523e979d31` ("sched/core: Use PELT for scale_rt_capacity()") Link: http://lkml.kernel.org/r/20180904093626.GA23936@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-09-10 10:13:47 +02:00
Steve Muckle	d0cdb3ce88	sched/fair: Fix vruntime_normalized() for remote non-migration wakeup When a task which previously ran on a given CPU is remotely queued to wake up on that same CPU, there is a period where the task's state is TASK_WAKING and its vruntime is not normalized. This is not accounted for in vruntime_normalized() which will cause an error in the task's vruntime if it is switched from the fair class during this time. For example if it is boosted to RT priority via rt_mutex_setprio(), rq->min_vruntime will not be subtracted from the task's vruntime but it will be added again when the task returns to the fair class. The task's vruntime will have been erroneously doubled and the effective priority of the task will be reduced. Note this will also lead to inflation of all vruntimes since the doubled vruntime value will become the rq's min_vruntime when other tasks leave the rq. This leads to repeated doubling of the vruntime and priority penalty. Fix this by recognizing a WAKING task's vruntime as normalized only if sched_remote_wakeup is true. This indicates a migration, in which case the vruntime would have been normalized in migrate_task_rq_fair(). Based on a similar patch from John Dias <joaodias@google.com>. Suggested-by: Peter Zijlstra <peterz@infradead.org> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Steve Muckle <smuckle@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Chris Redpath <Chris.Redpath@arm.com> Cc: John Dias <joaodias@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Miguel de Dios <migueldedios@google.com> Cc: Morten Rasmussen <Morten.Rasmussen@arm.com> Cc: Patrick Bellasi <Patrick.Bellasi@arm.com> Cc: Paul Turner <pjt@google.com> Cc: Quentin Perret <quentin.perret@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Todd Kjos <tkjos@google.com> Cc: kernel-team@android.com Fixes: `b5179ac70d` ("sched/fair: Prepare to fix fairness problems on migration") Link: http://lkml.kernel.org/r/20180831224217.169476-1-smuckle@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-09-10 10:13:47 +02:00
Vincent Guittot	12b04875d6	sched/pelt: Fix update_blocked_averages() for RT and DL classes update_blocked_averages() is called to periodiccally decay the stalled load of idle CPUs and to sync all loads before running load balance. When cfs rq is idle, it trigs a load balance during pick_next_task_fair() in order to potentially pull tasks and to use this newly idle CPU. This load balance happens whereas prev task from another class has not been put and its utilization updated yet. This may lead to wrongly account running time as idle time for RT or DL classes. Test that no RT or DL task is running when updating their utilization in update_blocked_averages(). We still update RT and DL utilization instead of simply skipping them to make sure that all metrics are synced when used during load balance. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `371bf42732` ("sched/rt: Add rt_rq utilization tracking") Fixes: `3727e0e163` ("sched/dl: Add dl_rq utilization tracking") Link: http://lkml.kernel.org/r/1535728975-22799-1-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-09-10 10:13:46 +02:00
Srikar Dronamraju	e5e96fafd9	sched/topology: Set correct NUMA topology type With the following commit: `051f3ca02e` ("sched/topology: Introduce NUMA identity node sched domain") the scheduler introduced a new NUMA level. However this leads to the NUMA topology on 2 node systems to not be marked as NUMA_DIRECT anymore. After this commit, it gets reported as NUMA_BACKPLANE, because sched_domains_numa_level is now 2 on 2 node systems. Fix this by allowing setting systems that have up to 2 NUMA levels as NUMA_DIRECT. While here remove code that assumes that level can be 0. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andre Wild <wild@linux.vnet.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org> Fixes: `051f3ca02e` "Introduce NUMA identity node sched domain" Link: http://lkml.kernel.org/r/1533920419-17410-1-git-send-email-srikar@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-09-10 10:13:45 +02:00
Jiada Wang	e73e81975f	sched/debug: Fix potential deadlock when writing to sched_features The following lockdep report can be triggered by writing to /sys/kernel/debug/sched_features: ====================================================== WARNING: possible circular locking dependency detected 4.18.0-rc6-00152-gcd3f77d74ac3-dirty #18 Not tainted ------------------------------------------------------ sh/3358 is trying to acquire lock: 000000004ad3989d (cpu_hotplug_lock.rw_sem){++++}, at: static_key_enable+0x14/0x30 but task is already holding lock: 00000000c1b31a88 (&sb->s_type->i_mutex_key#3){+.+.}, at: sched_feat_write+0x160/0x428 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&sb->s_type->i_mutex_key#3){+.+.}: lock_acquire+0xb8/0x148 down_write+0xac/0x140 start_creating+0x5c/0x168 debugfs_create_dir+0x18/0x220 opp_debug_register+0x8c/0x120 _add_opp_dev+0x104/0x1f8 dev_pm_opp_get_opp_table+0x174/0x340 _of_add_opp_table_v2+0x110/0x760 dev_pm_opp_of_add_table+0x5c/0x240 dev_pm_opp_of_cpumask_add_table+0x5c/0x100 cpufreq_init+0x160/0x430 cpufreq_online+0x1cc/0xe30 cpufreq_add_dev+0x78/0x198 subsys_interface_register+0x168/0x270 cpufreq_register_driver+0x1c8/0x278 dt_cpufreq_probe+0xdc/0x1b8 platform_drv_probe+0xb4/0x168 driver_probe_device+0x318/0x4b0 __device_attach_driver+0xfc/0x1f0 bus_for_each_drv+0xf8/0x180 __device_attach+0x164/0x200 device_initial_probe+0x10/0x18 bus_probe_device+0x110/0x178 device_add+0x6d8/0x908 platform_device_add+0x138/0x3d8 platform_device_register_full+0x1cc/0x1f8 cpufreq_dt_platdev_init+0x174/0x1bc do_one_initcall+0xb8/0x310 kernel_init_freeable+0x4b8/0x56c kernel_init+0x10/0x138 ret_from_fork+0x10/0x18 -> #2 (opp_table_lock){+.+.}: lock_acquire+0xb8/0x148 __mutex_lock+0x104/0xf50 mutex_lock_nested+0x1c/0x28 _of_add_opp_table_v2+0xb4/0x760 dev_pm_opp_of_add_table+0x5c/0x240 dev_pm_opp_of_cpumask_add_table+0x5c/0x100 cpufreq_init+0x160/0x430 cpufreq_online+0x1cc/0xe30 cpufreq_add_dev+0x78/0x198 subsys_interface_register+0x168/0x270 cpufreq_register_driver+0x1c8/0x278 dt_cpufreq_probe+0xdc/0x1b8 platform_drv_probe+0xb4/0x168 driver_probe_device+0x318/0x4b0 __device_attach_driver+0xfc/0x1f0 bus_for_each_drv+0xf8/0x180 __device_attach+0x164/0x200 device_initial_probe+0x10/0x18 bus_probe_device+0x110/0x178 device_add+0x6d8/0x908 platform_device_add+0x138/0x3d8 platform_device_register_full+0x1cc/0x1f8 cpufreq_dt_platdev_init+0x174/0x1bc do_one_initcall+0xb8/0x310 kernel_init_freeable+0x4b8/0x56c kernel_init+0x10/0x138 ret_from_fork+0x10/0x18 -> #1 (subsys mutex#6){+.+.}: lock_acquire+0xb8/0x148 __mutex_lock+0x104/0xf50 mutex_lock_nested+0x1c/0x28 subsys_interface_register+0xd8/0x270 cpufreq_register_driver+0x1c8/0x278 dt_cpufreq_probe+0xdc/0x1b8 platform_drv_probe+0xb4/0x168 driver_probe_device+0x318/0x4b0 __device_attach_driver+0xfc/0x1f0 bus_for_each_drv+0xf8/0x180 __device_attach+0x164/0x200 device_initial_probe+0x10/0x18 bus_probe_device+0x110/0x178 device_add+0x6d8/0x908 platform_device_add+0x138/0x3d8 platform_device_register_full+0x1cc/0x1f8 cpufreq_dt_platdev_init+0x174/0x1bc do_one_initcall+0xb8/0x310 kernel_init_freeable+0x4b8/0x56c kernel_init+0x10/0x138 ret_from_fork+0x10/0x18 -> #0 (cpu_hotplug_lock.rw_sem){++++}: __lock_acquire+0x203c/0x21d0 lock_acquire+0xb8/0x148 cpus_read_lock+0x58/0x1c8 static_key_enable+0x14/0x30 sched_feat_write+0x314/0x428 full_proxy_write+0xa0/0x138 __vfs_write+0xd8/0x388 vfs_write+0xdc/0x318 ksys_write+0xb4/0x138 sys_write+0xc/0x18 __sys_trace_return+0x0/0x4 other info that might help us debug this: Chain exists of: cpu_hotplug_lock.rw_sem --> opp_table_lock --> &sb->s_type->i_mutex_key#3 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&sb->s_type->i_mutex_key#3); lock(opp_table_lock); lock(&sb->s_type->i_mutex_key#3); lock(cpu_hotplug_lock.rw_sem); * DEADLOCK * 2 locks held by sh/3358: #0: 00000000a8c4b363 (sb_writers#10){.+.+}, at: vfs_write+0x238/0x318 #1: 00000000c1b31a88 (&sb->s_type->i_mutex_key#3){+.+.}, at: sched_feat_write+0x160/0x428 stack backtrace: CPU: 5 PID: 3358 Comm: sh Not tainted 4.18.0-rc6-00152-gcd3f77d74ac3-dirty #18 Hardware name: Renesas H3ULCB Kingfisher board based on r8a7795 ES2.0+ (DT) Call trace: dump_backtrace+0x0/0x288 show_stack+0x14/0x20 dump_stack+0x13c/0x1ac print_circular_bug.isra.10+0x270/0x438 check_prev_add.constprop.16+0x4dc/0xb98 __lock_acquire+0x203c/0x21d0 lock_acquire+0xb8/0x148 cpus_read_lock+0x58/0x1c8 static_key_enable+0x14/0x30 sched_feat_write+0x314/0x428 full_proxy_write+0xa0/0x138 __vfs_write+0xd8/0x388 vfs_write+0xdc/0x318 ksys_write+0xb4/0x138 sys_write+0xc/0x18 __sys_trace_return+0x0/0x4 This is because when loading the cpufreq_dt module we first acquire cpu_hotplug_lock.rw_sem lock, then in cpufreq_init(), we are taking the &sb->s_type->i_mutex_key lock. But when writing to /sys/kernel/debug/sched_features, the cpu_hotplug_lock.rw_sem lock depends on the &sb->s_type->i_mutex_key lock. To fix this bug, reverse the lock acquisition order when writing to sched_features, this way cpu_hotplug_lock.rw_sem no longer depends on &sb->s_type->i_mutex_key. Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Jiada Wang <jiada_wang@mentor.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Eugeniu Rosca <erosca@de.adit-jv.com> Cc: George G. Davis <george_davis@mentor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180731121222.26195-1-jiada_wang@mentor.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-09-10 10:13:45 +02:00
Thomas Hellstrom	e13e2366d8	locking/mutex: Fix mutex debug call and ww_mutex documentation The following commit: `08295b3b5b` ("Implement an algorithm choice for Wound-Wait mutexes") introduced a reference in the documentation to a function that was removed in an earlier commit. It also forgot to remove a call to debug_mutex_add_waiter() which is now unconditionally called by __mutex_add_waiter(). Fix those bugs. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dri-devel@lists.freedesktop.org Fixes: `08295b3b5b` ("Implement an algorithm choice for Wound-Wait mutexes") Link: http://lkml.kernel.org/r/20180903140708.2401-1-thellstrom@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-09-10 10:05:10 +02:00
Borislav Petkov	34e12b864e	jump_label: Use static_key_linked() accessor ... instead of open-coding it, in static_key_mod(). No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jason Baron <jbaron@akamai.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180909114252.17575-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-09-10 08:23:37 +02:00
Ingo Molnar	fa94351b56	perf/urgent fixes: Kernel: - Modify breakpoint fixes (Jiri Olsa) perf annotate: - Fix parsing aarch64 branch instructions after objdump update (Kim Phillips) - Fix parsing indirect calls in 'perf annotate' (Martin Liška) perf probe: - Ignore SyS symbols irrespective of endianness on PowerPC (Sandipan Das) perf trace: - Fix include path for asm-generic/unistd.h on arm64 (Kim Phillips) Core libraries: - Fix potential null pointer dereference in perf_evsel__new_idx() (Hisao Tanabe) - Use fixed size string for comms instead of scanf("%m"), that is not present in the bionic libc and leads to a crash (Chris Phlipot) - Fix bad memory access in trace info on 32-bit systems, we were reading 8 bytes from a 4-byte long variable when saving the command line in the perf.data file. (Chris Phlipot) Build system: - Streamline bpf examples and headers installation, clarifying some install messages. (Arnaldo Carvalho de Melo) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCW41FnAAKCRCyPKLppCJ+ J8MCAP4/RC5GwNrO5KYJ+G1iYb7QiNq9X/wsM7jCBlqWnTH+zgD9GYPIT3WQWKBN Rv94N4PNsYP4cpP7hTzWG0ar7p70owo= =+ODq -----END PGP SIGNATURE----- Merge tag 'perf-urgent-for-mingo-4.19-20180903' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes from Arnaldo Carvalho de Melo: Kernel: - Modify breakpoint fixes (Jiri Olsa) perf annotate: - Fix parsing aarch64 branch instructions after objdump update (Kim Phillips) - Fix parsing indirect calls in 'perf annotate' (Martin Liška) perf probe: - Ignore SyS symbols irrespective of endianness on PowerPC (Sandipan Das) perf trace: - Fix include path for asm-generic/unistd.h on arm64 (Kim Phillips) Core libraries: - Fix potential null pointer dereference in perf_evsel__new_idx() (Hisao Tanabe) - Use fixed size string for comms instead of scanf("%m"), that is not present in the bionic libc and leads to a crash (Chris Phlipot) - Fix bad memory access in trace info on 32-bit systems, we were reading 8 bytes from a 4-byte long variable when saving the command line in the perf.data file. (Chris Phlipot) Build system: - Streamline bpf examples and headers installation, clarifying some install messages. (Arnaldo Carvalho de Melo) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-09-09 21:36:31 +02:00
Linus Torvalds	3567994a05	Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timekeeping fixes from Thomas Gleixner: "Two fixes for timekeeping: - Revert to the previous kthread based update, which is unfortunately required due to lock ordering issues. The removal caused boot failures on old Core2 machines. Add a proper comment why the thread needs to stay to prevent accidental removal in the future. - Fix a silly typo in a function declaration" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource: Revert "Remove kthread" timekeeping: Fix declaration of read_persistent_wall_and_boot_offset()	2018-09-09 06:55:27 -07:00
Linus Torvalds	e0a0d05848	Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull cpu hotplug fixes from Thomas Gleixner: "Two fixes for the hotplug state machine code: - Move the misplaces smb() in the hotplug thread function to the proper place, otherwise a half update control struct could be observed - Prevent state corruption on error rollback, which causes the state to advance by one and as a consequence skip it in the bringup sequence" * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu/hotplug: Prevent state corruption on error rollback cpu/hotplug: Adjust misplaced smb() in cpuhp_thread_fun()	2018-09-09 06:48:06 -07:00
Christoph Hellwig	dc3c05504d	dma-mapping: remove dma_deconfigure This goes through a lot of hooks just to call arch_teardown_dma_ops. Replace it with a direct call instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>	2018-09-08 11:19:28 +02:00
Christoph Hellwig	ccf640f4c9	dma-mapping: remove dma_configure There is no good reason for this indirection given that the method always exists. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>	2018-09-08 11:19:20 +02:00
Peter Zijlstra	e2c631ba75	clocksource: Revert "Remove kthread" I turns out that the silly spawn kthread from worker was actually needed. clocksource_watchdog_kthread() cannot be called directly from clocksource_watchdog_work(), because clocksource_select() calls timekeeping_notify() which uses stop_machine(). One cannot use stop_machine() from a workqueue() due lock inversions wrt CPU hotplug. Revert the patch but add a comment that explain why we jump through such apparently silly hoops. Fixes: `7197e77abc` ("clocksource: Remove kthread") Reported-by: Siegfried Metz <frame@mailbox.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Niklas Cassel <niklas.cassel@linaro.org> Tested-by: Kevin Shanahan <kevin@shanahan.id.au> Tested-by: viktor_jaegerskuepper@freenet.de Tested-by: Siegfried Metz <frame@mailbox.org> Cc: rafael.j.wysocki@intel.com Cc: len.brown@intel.com Cc: diego.viola@gmail.com Cc: rui.zhang@intel.com Cc: bjorn.andersson@linaro.org Link: https://lkml.kernel.org/r/20180905084158.GR24124@hirez.programming.kicks-ass.net	2018-09-06 23:38:35 +02:00
Igor Stoppa	0d42d73a37	seccomp: remove unnecessary unlikely() WARN_ON() already contains an unlikely(), so it's not necessary to wrap it into another. Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: linux-security-module@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: James Morris <james.morris@microsoft.com>	2018-09-06 13:29:59 -07:00
Linus Torvalds	be65e2595b	This fixes two bugs: - The first one is a side effect caused by using SRCU for rcuidle tracepoints. It seems that the perf was depending on the rcuidle tracepoints to make RCU watch when it wasn't. The real fix will bet to have perf use SRCU instead of depending on RCU watching, but that can't be done until SRCU is safe to use in NMI context (Paul's working on that). - The second bug fix is for a bug that's been periodically making my tests fail randomly for some time. I haven't had time to track it down, but finally have. It has to do with stressing NMIs (via perf) while enabling or disabling ftrace function handling with lockdep enabled. If an interrupt happens and just as it returns, it sets lockdep back to "interrupts enabled" but before it returns an NMI is triggered, and if this happens while printk_nmi_enter has a breakpoint attached to it (because ftrace is converting it to or from nop to call fentry), the breakpoint trap also calls into lockdep, and since returning from the NMI to a interrupt handler, interrupts were disabled when the NMI went off, lockdep keeps its state as interrupts disabled when it returns back from the interrupt handler where interrupts are enabled. This causes lockdep_assert_irqs_enabled() to trigger a false positive. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCW5FM2hQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qqdLAP4/M46VXnwt8hZ0g7K0Cc4M3MwkLnNT xWN3yNSydd/VTAEA13JXiJoKuGTCrYLet+xcvhQxoGsITUrgL+ADJMRy9ww= =h3hB -----END PGP SIGNATURE----- Merge tag 'trace-v4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "This fixes two annoying bugs: - The first one is a side effect caused by using SRCU for rcuidle tracepoints. It seems that the perf was depending on the rcuidle tracepoints to make RCU watch when it wasn't. The real fix will be to have perf use SRCU instead of depending on RCU watching, but that can't be done until SRCU is safe to use in NMI context (Paul's working on that). - The second bug fix is for a bug that's been periodically making my tests fail randomly for some time. I haven't had time to track it down, but finally have. It has to do with stressing NMIs (via perf) while enabling or disabling ftrace function handling with lockdep enabled. If an interrupt happens and just as it returns, it sets lockdep back to "interrupts enabled" but before it returns an NMI is triggered, and if this happens while printk_nmi_enter has a breakpoint attached to it (because ftrace is converting it to or from nop to call fentry), the breakpoint trap also calls into lockdep, and since returning from the NMI to a interrupt handler, interrupts were disabled when the NMI went off, lockdep keeps its state as interrupts disabled when it returns back from the interrupt handler where interrupts are enabled. This causes lockdep_assert_irqs_enabled() to trigger a false positive" * tag 'trace-v4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: printk/tracing: Do not trace printk_nmi_enter() tracing: Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepoints	2018-09-06 09:06:49 -07:00
Steven Rostedt (VMware)	d1c392c9e2	printk/tracing: Do not trace printk_nmi_enter() I hit the following splat in my tests: ------------[ cut here ]------------ IRQs not enabled as expected WARNING: CPU: 3 PID: 0 at kernel/time/tick-sched.c:982 tick_nohz_idle_enter+0x44/0x8c Modules linked in: ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables ipv6 CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.0-rc2-test+ #2 Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014 EIP: tick_nohz_idle_enter+0x44/0x8c Code: ec 05 00 00 00 75 26 83 b8 c0 05 00 00 00 75 1d 80 3d d0 36 3e c1 00 75 14 68 94 63 12 c1 c6 05 d0 36 3e c1 01 e8 04 ee f8 ff <0f> 0b 58 fa bb a0 e5 66 c1 e8 25 0f 04 00 64 03 1d 28 31 52 c1 8b EAX: 0000001c EBX: f26e7f8c ECX: 00000006 EDX: 00000007 ESI: f26dd1c0 EDI: 00000000 EBP: f26e7f40 ESP: f26e7f38 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010296 CR0: 80050033 CR2: 0813c6b0 CR3: 2f342000 CR4: 001406f0 Call Trace: do_idle+0x33/0x202 cpu_startup_entry+0x61/0x63 start_secondary+0x18e/0x1ed startup_32_smp+0x164/0x168 irq event stamp: 18773830 hardirqs last enabled at (18773829): [<c040150c>] trace_hardirqs_on_thunk+0xc/0x10 hardirqs last disabled at (18773830): [<c040151c>] trace_hardirqs_off_thunk+0xc/0x10 softirqs last enabled at (18773824): [<c0ddaa6f>] __do_softirq+0x25f/0x2bf softirqs last disabled at (18773767): [<c0416bbe>] call_on_stack+0x45/0x4b ---[ end trace b7c64aa79e17954a ]--- After a bit of debugging, I found what was happening. This would trigger when performing "perf" with a high NMI interrupt rate, while enabling and disabling function tracer. Ftrace uses breakpoints to convert the nops at the start of functions to calls to the function trampolines. The breakpoint traps disable interrupts and this makes calls into lockdep via the trace_hardirqs_off_thunk in the entry.S code. What happens is the following: do_idle { [interrupts enabled] <interrupt> [interrupts disabled] TRACE_IRQS_OFF [lockdep says irqs off] [...] TRACE_IRQS_IRET test if pt_regs say return to interrupts enabled [yes] TRACE_IRQS_ON [lockdep says irqs are on] <nmi> nmi_enter() { printk_nmi_enter() [traced by ftrace] [ hit ftrace breakpoint ] <breakpoint exception> TRACE_IRQS_OFF [lockdep says irqs off] [...] TRACE_IRQS_IRET [return from breakpoint] test if pt_regs say interrupts enabled [no] [iret back to interrupt] [iret back to code] tick_nohz_idle_enter() { lockdep_assert_irqs_enabled() [lockdep say no!] Although interrupts are indeed enabled, lockdep thinks it is not, and since we now do asserts via lockdep, it gives a false warning. The issue here is that printk_nmi_enter() is called before lockdep_off(), which disables lockdep (for this reason) in NMIs. By simply not allowing ftrace to see printk_nmi_enter() (via notrace annotation) we keep lockdep from getting confused. Cc: stable@vger.kernel.org Fixes: `42a0bb3f71` ("printk/nmi: generic solution for safe printk in NMI") Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-09-06 11:24:05 -04:00
Thomas Gleixner	69fa6eb7d6	cpu/hotplug: Prevent state corruption on error rollback When a teardown callback fails, the CPU hotplug code brings the CPU back to the previous state. The previous state becomes the new target state. The rollback happens in undo_cpu_down() which increments the state unconditionally even if the state is already the same as the target. As a consequence the next CPU hotplug operation will start at the wrong state. This is easily to observe when __cpu_disable() fails. Prevent the unconditional undo by checking the state vs. target before incrementing state and fix up the consequently wrong conditional in the unplug code which handles the failure of the final CPU take down on the control CPU side. Fixes: `4dddfb5faa` ("smp/hotplug: Rewrite AP state machine core") Reported-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: josh@joshtriplett.org Cc: peterz@infradead.org Cc: jiangshanlai@gmail.com Cc: dzickus@redhat.com Cc: brendan.jackman@arm.com Cc: malat@debian.org Cc: sramana@codeaurora.org Cc: linux-arm-msm@vger.kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1809051419580.1416@nanos.tec.linutronix.de ----	2018-09-06 15:21:38 +02:00
Neeraj Upadhyay	f8b7530aa0	cpu/hotplug: Adjust misplaced smb() in cpuhp_thread_fun() The smp_mb() in cpuhp_thread_fun() is misplaced. It needs to be after the load of st->should_run to prevent reordering of the later load/stores w.r.t. the load of st->should_run. Fixes: `4dddfb5faa` ("smp/hotplug: Rewrite AP state machine core") Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infraded.org> Cc: josh@joshtriplett.org Cc: peterz@infradead.org Cc: jiangshanlai@gmail.com Cc: dzickus@redhat.com Cc: brendan.jackman@arm.com Cc: malat@debian.org Cc: mojha@codeaurora.org Cc: sramana@codeaurora.org Cc: linux-arm-msm@vger.kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1536126727-11629-1-git-send-email-neeraju@codeaurora.org	2018-09-06 15:21:37 +02:00
Alexei Starovoitov	a9c676bc8f	bpf/verifier: fix verifier instability Edward Cree says: In check_mem_access(), for the PTR_TO_CTX case, after check_ctx_access() has supplied a reg_type, the other members of the register state are set appropriately. Previously reg.range was set to 0, but as it is in a union with reg.map_ptr, which is larger, upper bytes of the latter were left in place. This then caused the memcmp() in regsafe() to fail, preventing some branches from being pruned (and occasionally causing the same program to take a varying number of processed insns on repeated verifier runs). Fix the instability by clearing bpf_reg_state in __mark_reg_[un]known() Fixes: `f1174f77b5` ("bpf/verifier: rework value tracking") Debugged-by: Edward Cree <ecree@solarflare.com> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-09-05 22:21:00 -07:00
David S. Miller	36302685f5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2018-09-04 21:33:03 -07:00
Linus Torvalds	0e9b103950	Merge branch 'akpm' (patches from Andrew) Merge misc fixes from Andrew Morton: "17 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: nilfs2: convert to SPDX license tags drivers/dax/device.c: convert variable to vm_fault_t type lib/Kconfig.debug: fix three typos in help text checkpatch: add __ro_after_init to known $Attribute mm: fix BUG_ON() in vmf_insert_pfn_pud() from VM_MIXEDMAP removal uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct member name memory_hotplug: fix kernel_panic on offline page processing checkpatch: add optional static const to blank line declarations test ipc/shm: properly return EIDRM in shm_lock() mm/hugetlb: filter out hugetlb pages if HUGEPAGE migration is not supported. mm/util.c: improve kvfree() kerneldoc tools/vm/page-types.c: fix "defined but not used" warning tools/vm/slabinfo.c: fix sign-compare warning kmemleak: always register debugfs file mm: respect arch_dup_mmap() return value mm, oom: fix missing tlb_finish_mmu() in __oom_reap_task_mm(). mm: memcontrol: print proper OOM header when no eligible victim left	2018-09-04 17:01:11 -07:00
Nadav Amit	1ed0cc5a01	mm: respect arch_dup_mmap() return value Commit `d70f2a14b7` ("include/linux/sched/mm.h: uninline mmdrop_async(), etc") ignored the return value of arch_dup_mmap(). As a result, on x86, a failure to duplicate the LDT (e.g. due to memory allocation error) would leave the duplicated memory mapping in an inconsistent state. Fix by using the return value, as it was before the change. Link: http://lkml.kernel.org/r/20180823051229.211856-1-namit@vmware.com Fixes: `d70f2a14b7` ("include/linux/sched/mm.h: uninline mmdrop_async(), etc") Signed-off-by: Nadav Amit <namit@vmware.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-09-04 16:45:02 -07:00
Linus Torvalds	28619527b8	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Must perform TXQ teardown before unregistering interfaces in mac80211, from Toke Høiland-Jørgensen. 2) Don't allow creating mac80211_hwsim with less than one channel, from Johannes Berg. 3) Division by zero in cfg80211, fix from Johannes Berg. 4) Fix endian issue in tipc, from Haiqing Bai. 5) BPF sockmap use-after-free fixes from Daniel Borkmann. 6) Spectre-v1 in mac80211_hwsim, from Jinbum Park. 7) Missing rhashtable_walk_exit() in tipc, from Cong Wang. 8) Revert kvzalloc() conversion of AF_PACKET, it breaks mmap() when kvzalloc() tries to use kmalloc() pages. From Eric Dumazet. 9) Fix deadlock in hv_netvsc, from Dexuan Cui. 10) Do not restart timewait timer on RST, from Florian Westphal. 11) Fix double lwstate refcount grab in ipv6, from Alexey Kodanev. 12) Unsolicit report count handling is off-by-one, fix from Hangbin Liu. 13) Sleep-in-atomic in cadence driver, from Jia-Ju Bai. 14) Respect ttl-inherit in ip6 tunnel driver, from Hangbin Liu. 15) Use-after-free in act_ife, fix from Cong Wang. 16) Missing hold to meta module in act_ife, from Vlad Buslov. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (91 commits) net: phy: sfp: Handle unimplemented hwmon limits and alarms net: sched: action_ife: take reference to meta module act_ife: fix a potential use-after-free net/mlx5: Fix SQ offset in QPs with small RQ tipc: correct spelling errors for tipc_topsrv_queue_evt() comments tipc: correct spelling errors for struct tipc_bc_base's comment bnxt_en: Do not adjust max_cp_rings by the ones used by RDMA. bnxt_en: Clean up unused functions. bnxt_en: Fix firmware signaled resource change logic in open. sctp: not traverse asoc trans list if non-ipv6 trans exists for ipv6_flowlabel sctp: fix invalid reference to the index variable of the iterator net/ibm/emac: wrong emac_calc_base call was used by typo net: sched: null actions array pointer before releasing action vhost: fix VHOST_GET_BACKEND_FEATURES ioctl request definition r8169: add support for NCube 8168 network card ip6_tunnel: respect ttl inherit for ip6tnl mac80211: shorten the IBSS debug messages mac80211: don't Tx a deauth frame if the AP forbade Tx mac80211: Fix station bandwidth setting after channel switch mac80211: fix a race between restart and CSA flows ...	2018-09-04 12:45:11 -07:00
Alexander Popov	964c9dff00	stackleak: Allow runtime disabling of kernel stack erasing Introduce CONFIG_STACKLEAK_RUNTIME_DISABLE option, which provides 'stack_erasing' sysctl. It can be used in runtime to control kernel stack erasing for kernels built with CONFIG_GCC_PLUGIN_STACKLEAK. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Alexander Popov <alex.popov@linux.com> Tested-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org>	2018-09-04 10:35:48 -07:00
Alexander Popov	c8d126275a	fs/proc: Show STACKLEAK metrics in the /proc file system Introduce CONFIG_STACKLEAK_METRICS providing STACKLEAK information about tasks via the /proc file system. In particular, /proc/<pid>/stack_depth shows the maximum kernel stack consumption for the current and previous syscalls. Although this information is not precise, it can be useful for estimating the STACKLEAK performance impact for your workloads. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Alexander Popov <alex.popov@linux.com> Tested-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org>	2018-09-04 10:35:48 -07:00
Alexander Popov	10e9ae9fab	gcc-plugins: Add STACKLEAK plugin for tracking the kernel stack The STACKLEAK feature erases the kernel stack before returning from syscalls. That reduces the information which kernel stack leak bugs can reveal and blocks some uninitialized stack variable attacks. This commit introduces the STACKLEAK gcc plugin. It is needed for tracking the lowest border of the kernel stack, which is important for the code erasing the used part of the kernel stack at the end of syscalls (comes in a separate commit). The STACKLEAK feature is ported from grsecurity/PaX. More information at: https://grsecurity.net/ https://pax.grsecurity.net/ This code is modified from Brad Spengler/PaX Team's code in the last public patch of grsecurity/PaX based on our understanding of the code. Changes or omissions from the original code are ours and don't reflect the original grsecurity/PaX code. Signed-off-by: Alexander Popov <alex.popov@linux.com> Tested-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org>	2018-09-04 10:35:47 -07:00
Alexander Popov	afaef01c00	x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls The STACKLEAK feature (initially developed by PaX Team) has the following benefits: 1. Reduces the information that can be revealed through kernel stack leak bugs. The idea of erasing the thread stack at the end of syscalls is similar to CONFIG_PAGE_POISONING and memzero_explicit() in kernel crypto, which all comply with FDP_RIP.2 (Full Residual Information Protection) of the Common Criteria standard. 2. Blocks some uninitialized stack variable attacks (e.g. CVE-2017-17712, CVE-2010-2963). That kind of bugs should be killed by improving C compilers in future, which might take a long time. This commit introduces the code filling the used part of the kernel stack with a poison value before returning to userspace. Full STACKLEAK feature also contains the gcc plugin which comes in a separate commit. The STACKLEAK feature is ported from grsecurity/PaX. More information at: https://grsecurity.net/ https://pax.grsecurity.net/ This code is modified from Brad Spengler/PaX Team's code in the last public patch of grsecurity/PaX based on our understanding of the code. Changes or omissions from the original code are ours and don't reflect the original grsecurity/PaX code. Performance impact: Hardware: Intel Core i7-4770, 16 GB RAM Test #1: building the Linux kernel on a single core 0.91% slowdown Test #2: hackbench -s 4096 -l 2000 -g 15 -f 25 -P 4.2% slowdown So the STACKLEAK description in Kconfig includes: "The tradeoff is the performance impact: on a single CPU system kernel compilation sees a 1% slowdown, other systems and workloads may vary and you are advised to test this feature on your expected workload before deploying it". Signed-off-by: Alexander Popov <alex.popov@linux.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org>	2018-09-04 10:35:47 -07:00
Linus Torvalds	60c1f89241	dma-mapping fixes for 4.19-rc2 A few fixes for the fallout of being a little more pedantic about dma masks. -----BEGIN PGP SIGNATURE----- iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAluMGbsLHGhjaEBsc3Qu ZGUACgkQD55TZVIEUYP0yxAAkCjQxXhQdcSvHGnJlePdvv6O7xbMoNGnn2kiJRr/ Ao5Db+vFmsVIeUEHNzFmlcI2003VMkhBTSefr3rh+EfMW8bncXseN4tizc22DgCM JabYsmaNatWIzGJTz02MHwZw+eTuI+bmTMM6Fi+mnLZDv3kNMWZUFwMPcn1JoSwk If75Qzx6SfAMbv1QG0z0AhwmTSOwg14v1N6B0vc06CXZBgAFeLvP1wGZnFK4zBEy U9RR7sBOPn5NDExzKEMjMMw2FueVmD2oBPPAWBT1neIz8IpP0TMdcconV+2tThxD rr3s9zltc5S9xthPjgjBnfkJWa3a4im/ZGJpJq2RJqhyXPkEB4U8mZE9vN91UCVq WwBys0KVJaq7/KrWt2yNCJGxhMnMzVL48kHdUzGmcHzi6pPt68PfXEti+vle6HA7 NnYZNxYO81JCpRoHUyf8jjI1fP+EVPVLOHdd6StZweR/1KojWyZrl67lVhniMa8U xoOJiCy87Fj9QlQ4zx478n3Rzgr9qEd8+OA++bOrdrlVJuxfjSJr13OOgM8FLV1G +VxmMCSiSU3lHe1GmrCrzVxTaH52mkflRuZ4MNfRNJB/fQqENOAT8Rr9pBLfjzib W0EXKnY4EFX64XsOg2f77CAioPEltJpap6zhxHjHIlhRiR/Az2iEXQOKC8eaVdfO X7o= =GFai -----END PGP SIGNATURE----- Merge tag 'dma-mapping-4.19-2' of git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping fixes from Christoph Hellwig: "A few fixes for the fallout of being a little more pedantic about dma masks" * tag 'dma-mapping-4.19-2' of git://git.infradead.org/users/hch/dma-mapping: of/platform: initialise AMBA default DMA masks sparc: set a default 32-bit dma mask for OF devices kernel/dma/direct: take DMA offset into account in dma_direct_supported	2018-09-02 20:09:36 -07:00
John Fastabend	597222f72a	bpf: avoid misuse of psock when TCP_ULP_BPF collides with another ULP Currently we check sk_user_data is non NULL to determine if the sk exists in a map. However, this is not sufficient to ensure the psock or the ULP ops are not in use by another user, such as kcm or TLS. To avoid this when adding a sock to a map also verify it is of the correct ULP type. Additionally, when releasing a psock verify that it is the TCP_ULP_BPF type before releasing the ULP. The error case where we abort an update due to ULP collision can cause this error path. For example, __sock_map_ctx_update_elem() [...] err = tcp_set_ulp_id(sock, TCP_ULP_BPF) <- collides with TLS if (err) <- so err out here goto out_free [...] out_free: smap_release_sock() <- calling tcp_cleanup_ulp releases the TLS ULP incorrectly. Fixes: `2f857d0460` ("bpf: sockmap, remove STRPARSER map_flags and add multi-map support") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-09-02 22:31:10 +02:00
Linus Torvalds	1395d109cd	Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull CPU hotplug fix from Thomas Gleixner: "Remove the stale skip_onerr member from the hotplug states" * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu/hotplug: Remove skip_onerr field from cpuhp_step structure	2018-09-02 10:09:35 -07:00
Linus Torvalds	501dacbc24	Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core fixes from Thomas Gleixner: "A small set of updates for core code: - Prevent tracing in functions which are called from trace patching via stop_machine() to prevent executing half patched function trace entries. - Remove old GCC workarounds - Remove pointless includes of notifier.h" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Remove workaround for unreachable warnings from old GCC notifier: Remove notifier header file wherever not used watchdog: Mark watchdog touch functions as notrace	2018-09-02 09:41:45 -07:00
Christoph Hellwig	c1d0af1a1d	kernel/dma/direct: take DMA offset into account in dma_direct_supported When a device has a DMA offset the dma capable result will change due to the difference between the physical and DMA address. Take that into account. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Robin Murphy <robin.murphy@arm.com>	2018-09-01 15:42:28 +02:00
Mukesh Ojha	6fb86d9720	cpu/hotplug: Remove skip_onerr field from cpuhp_step structure When notifiers were there, `skip_onerr` was used to avoid calling particular step startup/teardown callbacks in the CPU up/down rollback path, which made the hotplug asymmetric. As notifiers are gone now after the full state machine conversion, the `skip_onerr` field is no longer required. Remove it from the structure and its usage. Signed-off-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1535439294-31426-1-git-send-email-mojha@codeaurora.org	2018-08-31 14:13:03 +02:00
Paul E. McKenney	b56ada1209	Merge branches 'doc.2018.08.30a', 'dynticks.2018.08.30b', 'srcu.2018.08.30b' and 'torture.2018.08.29a' into HEAD doc.2018.08.30a: Documentation updates dynticks.2018.08.30b: RCU flavor consolidation updates and cleanups srcu.2018.08.30b: SRCU updates torture.2018.08.29a: Torture-test updates	2018-08-30 16:12:53 -07:00
Paul E. McKenney	4e6ea4ef56	srcu: Make early-boot call_srcu() reuse workqueue lists Allocating a list_head structure that is almost never used, and, when used, is used only during early boot (rcu_init() and earlier), is a bit wasteful. This commit therefore eliminates that list_head in favor of the one in the work_struct structure. This is safe because the work_struct structure cannot be used until after rcu_init() returns. Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-08-30 16:10:49 -07:00
Paul E. McKenney	e0fcba9ac0	srcu: Make call_srcu() available during very early boot Event tracing is moving to SRCU in order to take advantage of the fact that SRCU may be safely used from idle and even offline CPUs. However, event tracing can invoke call_srcu() very early in the boot process, even before workqueue_init_early() is invoked (let alone rcu_init()). Therefore, call_srcu()'s attempts to queue work fail miserably. This commit therefore detects this situation, and refrains from attempting to queue work before rcu_init() time, but does everything else that it would have done, and in addition, adds the srcu_struct to a global list. The rcu_init() function now invokes a new srcu_init() function, which is empty if CONFIG_SRCU=n. Otherwise, srcu_init() queues work for each srcu_struct on the list. This all happens early enough in boot that there is but a single CPU with interrupts disabled, which allows synchronization to be dispensed with. Of course, the queued work won't actually be invoked until after workqueue_init() is invoked, which happens shortly after the scheduler is up and running. This means that although call_srcu() may be invoked any time after per-CPU variables have been set up, there is still a very narrow window when synchronize_srcu() won't work, and this window extends from the time that the scheduler starts until the time that workqueue_init() returns. This can be fixed in a manner similar to the fix for synchronize_rcu_expedited() and friends, but until someone actually needs to use synchronize_srcu() during this window, this fix is added churn for no benefit. Finally, note that Tree SRCU's new srcu_init() function invokes queue_work() rather than the queue_delayed_work() function that is invoked post-boot. The reason is that queue_delayed_work() will (as you would expect) post a timer, and timers have not yet been initialized. So use of queue_work() avoids the complaints about use of uninitialized spinlocks that would otherwise result. Besides, some delay is already provide by the aforementioned fact that the queued work won't actually be invoked until after the scheduler is up and running. Requested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-08-30 16:10:19 -07:00
Mike Galbraith	894d45bbf7	rcu: Convert rcu_state.ofl_lock to raw_spinlock_t `1e64b15a4b` ("rcu: Fix grace-period hangs due to race with CPU offline") added spinlock_t ofl_lock to the rcu_state structure, then takes it with preemption disabled during CPU offline, which gives the -rt patchset's sleeping spinlock heartburn. This commit therefore converts ->ofl_lock to raw_spinlock_t. Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2018-08-30 16:03:54 -07:00
Paul E. McKenney	8d8a9d0e7e	rcu: Remove obsolete ->dynticks_fqs and ->cond_resched_completed The rcu_data structure's ->dynticks_fqs is incremented but never accesses. Its ->cond_resched_completed field isn't used at all. This commit therefore removes both fields. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:53 -07:00
Paul E. McKenney	dc5a4f2932	rcu: Switch ->dynticks to rcu_data structure, remove rcu_dynticks This commit move ->dynticks from the rcu_dynticks structure to the rcu_data structure, replacing the field of the same name. It also updates the code to access ->dynticks from the rcu_data structure and to use the rcu_data structure rather than following to now-gone ->dynticks field to the now-gone rcu_dynticks structure. While in the area, this commit also fixes up comments. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:52 -07:00
Paul E. McKenney	4c5273bf2b	rcu: Switch dyntick nesting counters to rcu_data structure This commit removes ->dynticks_nesting and ->dynticks_nmi_nesting from the rcu_dynticks structure and updates the code to access them from the rcu_data structure. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:51 -07:00
Paul E. McKenney	2dba13f0b6	rcu: Switch urgent quiescent-state requests to rcu_data structure This commit removes ->rcu_need_heavy_qs and ->rcu_urgent_qs from the rcu_dynticks structure and updates the code to access them from the rcu_data structure. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:50 -07:00
Paul E. McKenney	c458a89e96	rcu: Switch lazy counts to rcu_data structure This commit removes ->all_lazy, ->nonlazy_posted and ->nonlazy_posted_snap from the rcu_dynticks structure and updates the code to access them from the rcu_data structure. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:49 -07:00
Paul E. McKenney	5998a75adb	rcu: Switch last accelerate/advance to rcu_data structure This commit removes ->last_accelerate and ->last_advance_all from the rcu_dynticks structure and updates the code to access them from the rcu_data structure. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:48 -07:00
Paul E. McKenney	0fd79e7521	rcu: Switch ->tick_nohz_enabled_snap to rcu_data structure This commit removes ->tick_nohz_enabled_snap from the rcu_dynticks structure and updates the code to access it from the rcu_data structure. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:47 -07:00
Paul E. McKenney	cc72046cc3	rcu: Merge rcu_dynticks structure into rcu_data structure Now that there is only ever one rcu_data structure per CPU, there is no need for a separate rcu_dynticks structure. This commit therefore adds the rcu_dynticks fields into the rcu_data structure in preparation for removing the rcu_dynticks structure entirely. Note that the ->dynticks field will be handled specially because there is a field by that name in both structures. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:47 -07:00
Paul E. McKenney	df63fa5bc1	rcu: Convert "1UL << x" to "BIT(x)" This commit saves a few characters by converting "1UL << x" to "BIT(x)". Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:46 -07:00
Paul E. McKenney	fced9c8cfe	rcu: Avoid resched_cpu() when rescheduling the current CPU The resched_cpu() interface is quite handy, but it does acquire the specified CPU's runqueue lock, which does not come for free. This commit therefore substitutes the following when directing resched_cpu() at the current CPU: set_tsk_need_resched(current); set_preempt_need_resched(); Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org>	2018-08-30 16:03:45 -07:00
Paul E. McKenney	d3052109c0	rcu: More aggressively enlist scheduler aid for nohz_full CPUs Because nohz_full CPUs can leave the scheduler-clock interrupt disabled even when in kernel mode, RCU cannot rely on rcu_check_callbacks() to enlist the scheduler's aid in extracting a quiescent state from such CPUs. This commit therefore more aggressively uses resched_cpu() on nohz_full CPUs that fail to pass through a quiescent state in a timely manner. By default, the resched_cpu() beating starts 300 milliseconds into the quiescent state. While in the neighborhood, add a ->last_fqs_resched field to the rcu_data structure in order to rate-limit resched_cpu() calls from the RCU grace-period kthread. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:44 -07:00
Paul E. McKenney	c06aed0e31	rcu: Compute jiffies_till_sched_qs from other kernel parameters The jiffies_till_sched_qs value used to determine how old a grace period must be before RCU enlists the help of the scheduler to force a quiescent state on the holdout CPU. Currently, this defaults to HZ/10 regardless of system size and may be set only at boot time. This can be a problem for very large systems, because if the values of the jiffies_till_first_fqs and jiffies_till_next_fqs kernel parameters are left at their defaults, they are calculated to increase as the number of CPUs actually configured on the system increases. Thus, on a sufficiently large system, RCU would enlist the help of the scheduler before the grace-period kthread had a chance to scan for idle CPUs, which wastes CPU time. This commit therefore allows jiffies_till_sched_qs to be set, if desired, but if left as default, computes is as jiffies_till_first_fqs plus twice jiffies_till_next_fqs, thus allowing three force-quiescent-state scans for idle CPUs. This scales with the number of CPUs, providing sensible default values. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:43 -07:00
Paul E. McKenney	74de6960c9	rcu: Provide functions for determining if call_rcu() has been invoked This commit adds rcu_head_init() and rcu_head_after_call_rcu() functions to help RCU users detect when another CPU has passed the specified rcu_head structure and function to call_rcu(). The rcu_head_init() should be invoked before making the structure visible to RCU readers, and then the rcu_head_after_call_rcu() may be invoked from within an RCU read-side critical section on an rcu_head structure that was obtained during a traversal of the data structure in question. The rcu_head_after_call_rcu() function will return true if the rcu_head structure has already been passed (with the specified function) to call_rcu(), otherwise it will return false. If rcu_head_init() has not been invoked on the rcu_head structure or if the rcu_head (AKA callback) has already been invoked, then rcu_head_after_call_rcu() will do WARN_ON_ONCE(). Reported-by: NeilBrown <neilb@suse.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Apply neilb naming feedback. ]	2018-08-30 16:03:42 -07:00
Paul E. McKenney	7e28c5af4e	rcu: Eliminate ->rcu_qs_ctr from the rcu_dynticks structure The ->rcu_qs_ctr counter was intended to allow providing a lightweight report of a quiescent state to all RCU flavors. But now that there is only one flavor of RCU in any one running kernel, there is no point in having this feature. This commit therefore removes the ->rcu_qs_ctr field from the rcu_dynticks structure and the ->rcu_qs_ctr_snap field from the rcu_data structure. This results in the "rqc" option to the rcu_fqs trace event no longer being used, so this commit also removes the "rqc" description from the header comment. While in the neighborhood, this commit also causes the forward-progress request .rcu_need_heavy_qs be set one jiffies_till_sched_qs interval later in the grace period than the first setting of .rcu_urgent_qs. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:42 -07:00
Paul E. McKenney	c5bacd9417	rcu: Motivate Tiny RCU forward progress If a long-running CPU-bound in-kernel task invokes call_rcu(), the callback won't be invoked until the next context switch. If there are no other runnable tasks (which is not an uncommon situation on deep embedded systems), the callback might never be invoked. This commit therefore causes rcu_check_callbacks() to ask the scheduler for a context switch if there are callbacks posted that are still waiting for a grace period. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:41 -07:00
Paul E. McKenney	c116dba68d	rcutorture: Dump reader protection sequence if failures or close calls Now that RCU can have readers with multiple segments, it is quite possible that a specific sequence of reader segments might result in an rcutorture failure (reader spans a full grace period as detected by one of the grace-period primitives) or an rcutorture close call (reader potentially spans a full grace period based on reading out the RCU implementation's grace-period counter, but with no ordering). In such cases, it would clearly ease debugging if the offending specific sequence was known. For the first reader encountering a failure or a close call, this commit therefore dumps out the segments, delay durations, and whether or not the reader was preempted. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Mark variables static, as suggested by kbuild test robot. ]	2018-08-30 16:03:40 -07:00
Paul E. McKenney	a0ef9ec241	rcu: Provide improved interrupt-from-idle check in rcu_check_callbacks() The patch making need_resched() respond to urgent RCU-QS needs used is_idle_task(current) to detect an interrupt from idle, which does work reasonably, but is (in theory at least) vulnerable to loops containing need_resched() invoked from within RCU_NONIDLE() or its tracepoint equivalent. This commit therefore moves rcu_is_cpu_rrupt_from_idle() to a place from which rcu_check_callbacks() can invoke it and replaces the is_idle_task(current) with rcu_is_cpu_rrupt_from_idle(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:39 -07:00
Paul E. McKenney	92aa39e9dc	rcu: Make need_resched() respond to urgent RCU-QS needs The per-CPU rcu_dynticks.rcu_urgent_qs variable communicates an urgent need for an RCU quiescent state from the force-quiescent-state processing within the grace-period kthread to context switches and to cond_resched(). Unfortunately, such urgent needs are not communicated to need_resched(), which is sometimes used to decide when to invoke cond_resched(), for but one example, within the KVM vcpu_run() function. As of v4.15, this can result in synchronize_sched() being delayed by up to ten seconds, which can be problematic, to say nothing of annoying. This commit therefore checks rcu_dynticks.rcu_urgent_qs from within rcu_check_callbacks(), which is invoked from the scheduling-clock interrupt handler. If the current task is not an idle task and is not executing in usermode, a context switch is forced, and either way, the rcu_dynticks.rcu_urgent_qs variable is set to false. If the current task is an idle task, then RCU's dyntick-idle code will detect the quiescent state, so no further action is required. Similarly, if the task is executing in usermode, other code in rcu_check_callbacks() and its called functions will report the corresponding quiescent state. Reported-by: Marius Hillenbrand <mhillenb@amazon.de> Reported-by: David Woodhouse <dwmw2@infradead.org> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:39 -07:00
Paul E. McKenney	dd46a7882c	rcu: Inline _rcu_barrier() into its sole remaining caller Because rcu_barrier() is a one-line wrapper function for _rcu_barrier() and because nothing else calls _rcu_barrier(), this commit inlines _rcu_barrier() into rcu_barrier(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:39 -07:00
Paul E. McKenney	395a2f097e	rcu: Define rcu_all_qs() only in !PREEMPT builds Now that rcu_all_qs() is used only in !PREEMPT builds, move it to tree_plugin.h so that it is defined only in those builds. This in turn means that rcu_momentary_dyntick_idle() is only used in !PREEMPT builds, but it is simply marked __maybe_unused in order to keep it near the rest of the dyntick-idle code. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:37 -07:00
Paul E. McKenney	06462efc80	rcu: Clean up flavor-related definitions and comments in update.c Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:36 -07:00
Paul E. McKenney	0ae86a2726	rcu: Clean up flavor-related definitions and comments in tree_plugin.h Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:35 -07:00
Paul E. McKenney	8fa946d428	rcu: Clean up flavor-related definitions and comments in tree_exp.h Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:35 -07:00
Paul E. McKenney	49918a54e6	rcu: Clean up flavor-related definitions and comments in tree.c Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:34 -07:00
Paul E. McKenney	679d3f3092	rcu: Clean up flavor-related definitions and comments in tiny.c Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:34 -07:00
Paul E. McKenney	6eb95cc450	rcu: Clean up flavor-related definitions and comments in srcutree.h Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:34 -07:00
Paul E. McKenney	62a1a94536	rcu: Clean up flavor-related definitions and comments in rcutorture.c Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:33 -07:00
Paul E. McKenney	7f87c036fe	rcu: Clean up flavor-related definitions and comments in rcu.h Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:33 -07:00
Paul E. McKenney	8c1cf2da6f	rcu: Clean up flavor-related definitions and comments in Kconfig Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:32 -07:00
Paul E. McKenney	de3875d302	rcu: Remove now-unused rcutorture APIs This commit removes rcu_sched_get_gp_seq(), rcu_bh_get_gp_seq(), rcu_exp_batches_completed_sched(), rcu_sched_force_quiescent_state(), and rcu_bh_force_quiescent_state(), which are no longer used because rcutorture no longer does "rcu_bh" and "rcu_sched" torture types. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:30 -07:00
Paul E. McKenney	620d246065	rcuperf: Remove the "rcu_bh" and "sched" torture types Now that the RCU-bh and RCU-sched update-side functions are simple wrappers around their RCU counterparts, there isn't a whole lot of point in testing them. This commit therefore removes the "rcu_bh" and "sched" torture types from rcuperf. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:30 -07:00
Paul E. McKenney	c770c82a23	rcutorture: Remove the "rcu_bh" and "sched" torture types Now that the RCU-bh and RCU-sched update-side functions are simple wrappers around their RCU counterparts, there isn't a whole lot of point in testing them. This commit therefore removes the "rcu_bh" and "sched" torture types from rcutorture. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:29 -07:00
Paul E. McKenney	72ce30dd1f	rcu: Stop testing RCU-bh and RCU-sched Now that the RCU-bh and RCU-sched update-side functions are simple wrappers around their RCU counterparts, there isn't a whole lot of point in testing them. This commit therefore removes the self-test capability and removes the corresponding kernel-boot parameters. It also updates the various rcutorture .boot files to remove the kernel boot parameters that call for testing RCU-bh and RCU-sched. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:29 -07:00
Paul E. McKenney	2ceebc0350	rcutorture: Add RCU-bh and RCU-sched support for extended readers Since there is now a single consolidated RCU flavor, rcutorture needs to test extending of RCU readers via rcu_read_lock_bh() and rcu_read_lock_sched(). This commit adds this support, with added checks (just like for local_bh_enable()) to ensure that rcu_read_unlock_bh() will not be invoked while interrupts are disabled. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:27 -07:00
Paul E. McKenney	a8bb74acd8	rcu: Consolidate RCU-sched update-side function definitions This commit saves a few lines by consolidating the RCU-sched function definitions at the end of include/linux/rcupdate.h. This consolidation also makes it easier to remove them all when the time comes. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:26 -07:00
Paul E. McKenney	4c7e9c1434	rcu: Consolidate RCU-bh update-side function definitions This commit saves a few lines by consolidating the RCU-bh function definitions at the end of include/linux/rcupdate.h. This consolidation also makes it easier to remove them all when the time comes. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:25 -07:00
Paul E. McKenney	c3854a055b	rcu: Pull rcu_gp_kthread() FQS loop into separate function The rcu_gp_kthread() function is long and deeply indented, so this commit pulls the loop that repeatedly invokes rcu_gp_fqs() into a new rcu_gp_fqs_loop() function. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:24 -07:00
Paul E. McKenney	4e95020cdd	rcu: Inline increment_cpu_stall_ticks() into its sole caller Consolidation of the RCU flavors into one makes increment_cpu_stall_ticks() a trivial one-line function with only one caller. This commit therefore inlines it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:23 -07:00
Paul E. McKenney	8ff0b90780	rcu: Fix typo in force_qs_rnp()'s parameter's parameter Pointers to rcu_data structures should be named rdp, not rsp. This commit therefore makes this change. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:23 -07:00
Paul E. McKenney	eb7a665388	rcu: Eliminate initialization-time use of rsp Now that there is only one rcu_state structure, there is less point in maintaining a pointer to it. This commit therefore replaces rsp with &rcu_state in rcu_cpu_starting() and rcu_init_one(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:22 -07:00
Paul E. McKenney	ec9f5835f7	rcu: Eliminate RCU-barrier use of rsp Now that there is only one rcu_state structure, there is less point in maintaining a pointer to it. This commit therefore replaces rsp with &rcu_state in rcu_barrier_callback(), rcu_barrier_func(), and _rcu_barrier(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:22 -07:00
Paul E. McKenney	67a0edbf3c	rcu: Eliminate quiescent-state and grace-period-nonstart use of rsp Now that there is only one rcu_state structure, there is less point in maintaining a pointer to it. This commit therefore replaces rsp with &rcu_state in rcu_report_qs_rnp(), force_quiescent_state(), and rcu_check_gp_start_stall(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:21 -07:00
Paul E. McKenney	3c779dfef2	rcu: Eliminate callback-invocation/invocation use of rsp Now that there is only one rcu_state structure, there is less point in maintaining a pointer to it. This commit therefore replaces rsp with &rcu_state in rcu_do_batch(), invoke_rcu_callbacks(), and __call_rcu(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:21 -07:00
Paul E. McKenney	9cbc5b9702	rcu: Eliminate grace-period management code use of rsp Now that there is only one rcu_state structure, there is less point in maintaining a pointer to it. This commit therefore replaces rsp with &rcu_state in rcu_start_this_gp(), rcu_accelerate_cbs(), __note_gp_changes(), rcu_gp_init(), rcu_gp_fqs(), rcu_gp_cleanup(), rcu_gp_kthread(), and rcu_report_qs_rsp(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:20 -07:00
Paul E. McKenney	4c6ed43708	rcu: Eliminate stall-warning use of rsp Now that there is only one rcu_state structure, there is less point in maintaining a pointer to it. This commit therefore replaces rsp with &rcu_state in print_other_cpu_stall(), print_cpu_stall(), and check_cpu_stall(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:20 -07:00
Paul E. McKenney	7cba4775ba	rcu: Restructure rcu_check_gp_kthread_starvation() This commit removes the rsp and gpa local variables, repurposes the j local variable and adds a gpk (GP kthread) local to improve readability. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:19 -07:00
Paul E. McKenney	f7dd7d44fd	rcu: Simplify rcutorture_get_gp_data() This commit restructures rcutorture_get_gp_data() to take advantage of the fact that there is only one flavor of RCU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:19 -07:00
Paul E. McKenney	b97d23c51c	rcu: Remove for_each_rcu_flavor() flavor-traversal macro Now that there is only ever a single flavor of RCU in a given kernel build, there isn't a whole lot of point in having a flavor-traversal macro. This commit therefore removes it and converts calls to it to straightline code, inlining trivial functions as appropriate. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:18 -07:00
Paul E. McKenney	564a9ae604	rcu: Remove last non-flavor-traversal rsp local variable from tree_plugin.h This commit removes the last non-flavor-traversal rsp local variable from kernel/rcu/tree_plugin.h in favor of &rcu_state. The flavor-traversal locals will be removed with the removal of flavor traversal. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:17 -07:00
Paul E. McKenney	88d1bead85	rcu: Remove rcu_data structure's ->rsp field Now that there is only one rcu_state structure, there is no need for the rcu_data structure to indicate which it corresponds to. This commit therefore removes the rcu_data structure's ->rsp field, replacing all remaining uses of it with &rcu_state. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:17 -07:00
Paul E. McKenney	aedf4ba984	rcu: Remove rsp parameter from rcu_node tree accessor macros There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's rcu_node tree's accessor macros. This commit therefore removes the rsp parameter from those macros in kernel/rcu/rcu.h, and removes some now-unused rsp local variables while in the area. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:16 -07:00
Paul E. McKenney	63d4c8c979	rcu: Remove rsp parameter from expedited grace-period functions There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from the code in kernel/rcu/tree_exp.h, and removes all of the rsp local variables while in the area. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:14 -07:00
Paul E. McKenney	4580b0541b	rcu: Remove rsp parameter from no-CBs CPU functions There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_nocb_cpu_needs_barrier(), rcu_spawn_one_nocb_kthread(), rcu_organize_nocb_kthreads(), rcu_nocb_cpu_needs_barrier(), and rcu_nohz_full_cpu(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:13 -07:00
Paul E. McKenney	b21ebed951	rcu: Remove rsp parameter from print_cpu_stall_info() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from print_cpu_stall_info(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:12 -07:00
Paul E. McKenney	6dbfdc1409	rcu: Remove rsp parameter from rcu_spawn_one_boost_kthread() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_spawn_one_boost_kthread(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:11 -07:00
Paul E. McKenney	81ab59a3ad	rcu: Remove rsp parameter from dump_blkd_tasks() and friend There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from dump_blkd_tasks() and rcu_preempt_blocked_readers_cgp(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:10 -07:00
Paul E. McKenney	a2887cd85f	rcu: Remove rsp parameter from rcu_print_detail_task_stall() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_print_detail_task_stall(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:09 -07:00
Paul E. McKenney	b8bb1f63cf	rcu: Remove rsp parameter from rcu_init_one() and friends There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_init_one() and rcu_dump_rcu_node_tree(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:09 -07:00
Paul E. McKenney	53b46303da	rcu: Remove rsp parameter from rcu_boot_init_percpu_data() and friends There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_boot_init_percpu_data(), rcu_init_percpu_data(), rcu_cleanup_dying_idle_cpu(), and rcu_migrate_callbacks(). While in the neighborhood, line the last three into rcutree_prepare_cpu(), rcu_report_dead() and rcutree_migrate_callbacks(), respectively. This also gets rid of the for_each_rcu_flavor() calls that were in those tree functions. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:08 -07:00
Paul E. McKenney	8344b871b1	rcu: Remove rsp parameter from _rcu_barrier() and friends There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from _rcu_barrier_trace() and _rcu_barrier(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:08 -07:00
Paul E. McKenney	98ece508b5	rcu: Remove rsp parameter from __rcu_pending() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from __rcu_pending(), and also inlines it into rcu_pending(), removing the for_each_rcu_flavor() while in the neighborhood.. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:07 -07:00
Paul E. McKenney	5c7d89676b	rcu: Remove rsp parameter from __call_rcu() and friend There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from __call_rcu_core() and __call_rcu(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:07 -07:00
Paul E. McKenney	b049fdf8e3	rcu: Remove rsp parameter from __rcu_process_callbacks() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from __rcu_process_callbacks(), and also inlines it into rcu_process_callbacks(), removing the for_each_rcu_flavor() while in the neighborhood. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:06 -07:00
Paul E. McKenney	b96f9dc4fb	rcu: Remove rsp parameter from rcu_check_gp_start_stall() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_check_gp_start_stall(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:06 -07:00
Paul E. McKenney	e9ecb780fe	rcu: Remove rsp parameter from force-quiescent-state functions There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from force_qs_rnp() and force_quiescent_state(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:05 -07:00
Paul E. McKenney	5bb5d09cc4	rcu: Remove rsp parameter from rcu_do_batch() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_do_batch(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:05 -07:00
Paul E. McKenney	780cd59083	rcu: Remove rsp parameter from CPU hotplug functions There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_cleanup_dying_cpu() and rcu_cleanup_dead_cpu(). And, as long as we are in the neighborhood, inlines them into rcutree_dying_cpu() and rcutree_dead_cpu(), respectively. This also eliminates a pair of for_each_rcu_flavor() loops. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:04 -07:00
Paul E. McKenney	8087d3e3c4	rcu: Remove rsp parameter from rcu_check_quiescent_state() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_check_quiescent_state(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:04 -07:00
Paul E. McKenney	0854a05c9f	rcu: Remove rsp parameter from rcu_gp_kthread() and friends There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_gp_init(), rcu_gp_fqs_check_wake(), rcu_gp_fqs(), rcu_gp_cleanup(), and rcu_gp_kthread(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:03 -07:00
Paul E. McKenney	22212332c1	rcu: Remove rsp parameter from rcu_gp_slow() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_gp_slow(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:03 -07:00
Paul E. McKenney	15cabdffbb	rcu: Remove rsp parameter from note_gp_changes() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from note_gp_changes(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:02 -07:00
Paul E. McKenney	c7e48f7ba3	rcu: Remove rsp parameter from __note_gp_changes() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from __note_gp_changes(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:01 -07:00
Paul E. McKenney	834f56bf54	rcu: Remove rsp parameter from rcu_advance_cbs() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_advance_cbs(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:01 -07:00
Paul E. McKenney	c6e09b97b9	rcu: Remove rsp parameter from rcu_accelerate_cbs_unlocked() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_accelerate_cbs_unlocked(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:00 -07:00
Paul E. McKenney	02f501423d	rcu: Remove rsp parameter from rcu_accelerate_cbs() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_accelerate_cbs(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:00 -07:00
Paul E. McKenney	532c00c97f	rcu: Remove rsp parameter from rcu_gp_kthread_wake() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_gp_kthread_wake(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:59 -07:00
Paul E. McKenney	3481f2eab0	rcu: Remove rsp parameter from rcu_future_gp_cleanup() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_future_gp_cleanup(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:59 -07:00
Paul E. McKenney	ea12ff2b7d	rcu: Remove rsp parameter from check_cpu_stall() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from check_cpu_stall(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:58 -07:00
Paul E. McKenney	4e8b8e08f9	rcu: Remove rsp parameter from print_cpu_stall() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from print_cpu_stall(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:58 -07:00
Paul E. McKenney	a91e7e58b1	rcu: Remove rsp parameter from print_other_cpu_stall() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from print_other_cpu_stall(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:57 -07:00
Paul E. McKenney	e1741c69d4	rcu: Remove rsp parameter from rcu_stall_kick_kthreads() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_stall_kick_kthreads(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:57 -07:00
Paul E. McKenney	33dbdbf025	rcu: Remove rsp parameter from rcu_dump_cpu_stacks() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_dump_cpu_stacks(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:56 -07:00
Paul E. McKenney	8fd119b652	rcu: Remove rsp parameter from rcu_check_gp_kthread_starvation() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_check_gp_kthread_starvation(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:56 -07:00
Paul E. McKenney	ad3832e974	rcu: Remove rsp parameter from record_gp_stall_check_time() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from record_gp_stall_check_time(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:55 -07:00
Paul E. McKenney	336a4f6c45	rcu: Remove rsp parameter from rcu_get_root() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_get_root(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:55 -07:00
Paul E. McKenney	de8e87305a	rcu: Remove rsp parameter from rcu_gp_in_progress() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_gp_in_progress(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:54 -07:00
Paul E. McKenney	33085c469a	rcu: Remove rsp parameter from rcu_report_qs_rdp() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_report_qs_rdp(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:53 -07:00
Paul E. McKenney	139ad4da5a	rcu: Remove rsp parameter from rcu_report_unblock_qs_rnp() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_report_unblock_qs_rnp(), which is particularly appropriate in this case given that this parameter is no longer used. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:53 -07:00
Paul E. McKenney	aff4e9ede5	rcu: Remove rsp parameter from rcu_report_qs_rsp() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_report_qs_rsp(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:52 -07:00
Paul E. McKenney	b50912d0b5	rcu: Remove rsp parameter from rcu_report_qs_rnp() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_report_qs_rnp(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:51 -07:00
Paul E. McKenney	2280ee5a7d	rcu: Remove rcu_data_p pointer to default rcu_data structure The rcu_data_p pointer references the default set of per-CPU rcu_data structures, that is, those that call_rcu() uses, as opposed to call_rcu_bh() and sometimes call_rcu_sched(). But there is now only one set of per-CPU rcu_data structures, so that one set is by definition the default, which means that the rcu_data_p pointer no longer serves any useful purpose. This commit therefore removes it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:51 -07:00
Paul E. McKenney	16fc9c600b	rcu: Remove rcu_state_p pointer to default rcu_state structure The rcu_state_p pointer references the default rcu_state structure, that is, the one that call_rcu() uses, as opposed to call_rcu_bh() and sometimes call_rcu_sched(). But there is now only one rcu_state structure, so that one structure is by definition the default, which means that the rcu_state_p pointer no longer serves any useful purpose. This commit therefore removes it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:50 -07:00
Paul E. McKenney	da1df50d16	rcu: Remove rcu_state structure's ->rda field The rcu_state structure's ->rda field was used to find the per-CPU rcu_data structures corresponding to that rcu_state structure. But now there is only one rcu_state structure (creatively named "rcu_state") and one set of per-CPU rcu_data structures (creatively named "rcu_data"). Therefore, uses of the ->rda field can always be replaced by "rcu_data, and this commit makes that change and removes the ->rda field. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:49 -07:00
Paul E. McKenney	ec5dd444b6	rcu: Eliminate rcu_state structure's ->call field The rcu_state structure's ->call field references the corresponding RCU flavor's call_rcu() function. However, now that there is only ever one rcu_state structure in a given build of the Linux kernel, and that flavor uses plain old call_rcu(), there is not a lot of point in continuing to have the ->call field. This commit therefore removes it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:48 -07:00
Paul E. McKenney	358be2d368	rcu: Remove RCU_STATE_INITIALIZER() Now that a given build of the Linux kernel has only one set of rcu_state, rcu_node, and rcu_data structures, there is no point in creating a macro to declare and compile-time initialize them. This commit therefore just does normal declaration and compile-time initialization of these structures. While in the area, this commit also removes #ifndefs of the no-longer-ever-defined preprocessor macro RCU_TREE_NONCORE. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:47 -07:00
Paul E. McKenney	709fdce754	rcu: Express Tiny RCU updates in terms of RCU rather than RCU-sched This commit renames Tiny RCU functions so that the lowest level of functionality is RCU (e.g., synchronize_rcu()) rather than RCU-sched (e.g., synchronize_sched()). This provides greater naming compatibility with Tree RCU, which will in turn permit more LoC removal once the RCU-sched and RCU-bh update-side API is removed. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Fix Tiny call_rcu()'s EXPORT_SYMBOL() in response to a bug report from kbuild test robot. ]	2018-08-30 16:02:46 -07:00
Paul E. McKenney	45975c7d21	rcu: Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds Now that RCU-preempt knows about preemption disabling, its implementation of synchronize_rcu() works for synchronize_sched(), and likewise for the other RCU-sched update-side API members. This commit therefore confines the RCU-sched update-side code to CONFIG_PREEMPT=n builds, and defines RCU-sched's update-side API members in terms of those of RCU-preempt. This means that any given build of the Linux kernel has only one update-side flavor of RCU, namely RCU-preempt for CONFIG_PREEMPT=y builds and RCU-sched for CONFIG_PREEMPT=n builds. This in turn means that kernels built with CONFIG_RCU_NOCB_CPU=y have only one rcuo kthread per CPU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Andi Kleen <ak@linux.intel.com>	2018-08-30 16:02:45 -07:00
Paul E. McKenney	4cf439a200	rcu: Fix typo in rcu_get_gp_kthreads_prio() header comment Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:43 -07:00
Paul E. McKenney	2bbfc25b09	rcu: Drop "wake" parameter from rcu_report_exp_rdp() The rcu_report_exp_rdp() function is always invoked with its "wake" argument set to "true", so this commit drops this parameter. The only potential call site that would use "false" is in the code driving the expedited grace period, and that code uses rcu_report_exp_cpu_mult() instead, which therefore retains its "wake" parameter. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:43 -07:00
Paul E. McKenney	82fcecfa81	rcu: Update comments and help text for no more RCU-bh updaters This commit updates comments and help text to account for the fact that RCU-bh update-side functions are now simple wrappers for their RCU or RCU-sched counterparts. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:42 -07:00
Paul E. McKenney	65cfe3583b	rcu: Define RCU-bh update API in terms of RCU Now that the main RCU API knows about softirq disabling and softirq's quiescent states, the RCU-bh update code can be dispensed with. This commit therefore removes the RCU-bh update-side implementation and defines RCU-bh's update-side API in terms of that of either RCU-preempt or RCU-sched, depending on the setting of the CONFIG_PREEMPT Kconfig option. In kernels built with CONFIG_RCU_NOCB_CPU=y this has the knock-on effect of reducing by one the number of rcuo kthreads per CPU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:40 -07:00
Paul E. McKenney	ba1c64c272	rcu: Report expedited grace periods at context-switch time This commit reduces the latency of expedited RCU grace periods by reporting a quiescent state for the CPU at context-switch time. In CONFIG_PREEMPT=y kernels, if the outgoing task is still within an RCU read-side critical section (and thus still blocking some grace period, perhaps including this expedited grace period), then that task will already have been placed on one of the leaf rcu_node structures' ->blkd_tasks list. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:38 -07:00
Paul E. McKenney	d28139c4e9	rcu: Apply RCU-bh QSes to RCU-sched and RCU-preempt when safe One necessary step towards consolidating the three flavors of RCU is to make sure that the resulting consolidated "one flavor to rule them all" correctly handles networking denial-of-service attacks. One thing that allows RCU-bh to do so is that __do_softirq() invokes rcu_bh_qs() every so often, and so something similar has to happen for consolidated RCU. This must be done carefully. For example, if a preemption-disabled region of code takes an interrupt which does softirq processing before returning, consolidated RCU must ignore the resulting rcu_bh_qs() invocations -- preemption is still disabled, and that means an RCU reader for the consolidated flavor. This commit therefore creates a new rcu_softirq_qs() that is called only from the ksoftirqd task, thus avoiding the interrupted-a-preempted-region problem. This new rcu_softirq_qs() function invokes rcu_sched_qs(), rcu_preempt_qs(), and rcu_preempt_deferred_qs(). The latter call handles any deferred quiescent states. Note that __do_softirq() still invokes rcu_bh_qs(). It will continue to do so until a later stage of cleanup when the RCU-bh flavor is removed. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Fix !SMP issue located by kbuild test robot. ]	2018-08-30 16:02:38 -07:00
Paul E. McKenney	e11ec65cc8	rcu: Add warning to detect half-interrupts RCU's dyntick-idle code is written to tolerate half-interrupts, that it, either an interrupt that invokes rcu_irq_enter() but never invokes the corresponding rcu_irq_exit() on the one hand, or an interrupt that never invokes rcu_irq_enter() but does invoke the "corresponding" rcu_irq_exit() on the other. These things really did happen at one time, as evidenced by this ca-2011 LKML post: http://lkml.kernel.org/r/20111014170019.GE2428@linux.vnet.ibm.com The reason why RCU tolerates half-interrupts is that usermode helpers used exceptions to invoke a system call from within the kernel such that the system call did a normal return (not a return from exception) to the calling context. This caused rcu_irq_enter() to be invoked without a matching rcu_irq_exit(). However, usermode helpers have since been rewritten to make much more housebroken use of workqueues, kernel threads, and do_execve(), and therefore should no longer produce half-interrupts. No one knows of any other source of half-interrupts, but then again, no one seems insane enough to go audit the entire kernel to verify that half-interrupts really are a relic of the past. This commit therefore adds a pair of WARN_ON_ONCE() calls that will trigger in the presence of half interrupts, which the code will continue to handle correctly. If neither of these WARN_ON_ONCE() trigger by mid-2021, then perhaps RCU can stop handling half-interrupts, which would be a considerable simplification. Reported-by: Steven Rostedt <rostedt@goodmis.org> Reported-by: Joel Fernandes <joel@joelfernandes.org> Reported-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>	2018-08-30 16:02:36 -07:00
Paul E. McKenney	fcc878e4df	rcu: Remove now-unused ->b.exp_need_qs field from the rcu_special union The ->b.exp_need_qs field is now set only to false, so this commit removes it. The job this field used to do is now done by the rcu_data structure's ->deferred_qs field, which is a consequence of a better split between task-based (the rcu_node structure's ->exp_tasks field) and CPU-based (the aforementioned rcu_data structure's ->deferred_qs field) tracking of quiescent states for RCU-preempt expedited grace periods. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:36 -07:00
Paul E. McKenney	27c744e32a	rcu: Allow processing deferred QSes for exiting RCU-preempt readers If an RCU-preempt read-side critical section is exiting, that is, ->rcu_read_lock_nesting is negative, then it is a good time to look at the possibility of reporting deferred quiescent states. This commit therefore updates the checks in rcu_preempt_need_deferred_qs() to allow exiting critical sections to report deferred quiescent states. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:35 -07:00
Paul E. McKenney	c0335743c5	rcutorture: Test extended "rcu" read-side critical sections This commit makes the "rcu" torture type test extended read-side critical sections in order to test the deferral of RCU-preempt quiescent-state testing. In CONFIG_PREEMPT=n kernels, this simply duplicates the setup already in place for the "sched" torture type. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:35 -07:00
Paul E. McKenney	3e31009898	rcu: Defer reporting RCU-preempt quiescent states when disabled This commit defers reporting of RCU-preempt quiescent states at rcu_read_unlock_special() time when any of interrupts, softirq, or preemption are disabled. These deferred quiescent states are reported at a later RCU_SOFTIRQ, context switch, idle entry, or CPU-hotplug offline operation. Of course, if another RCU read-side critical section has started in the meantime, the reporting of the quiescent state will be further deferred. This also means that disabling preemption, interrupts, and/or softirqs will act as an RCU-preempt read-side critical section. This is enforced by checking preempt_count() as needed. Some special cases must be handled on an ad-hoc basis, for example, context switch is a quiescent state even though both the scheduler and do_exit() disable preemption. In these cases, additional calls to rcu_preempt_deferred_qs() override the preemption disabling. Similar logic overrides disabled interrupts in rcu_preempt_check_callbacks() because in this case the quiescent state happened just before the corresponding scheduling-clock interrupt. In theory, this change lifts a long-standing restriction that required that if interrupts were disabled across a call to rcu_read_unlock() that the matching rcu_read_lock() also be contained within that interrupts-disabled region of code. Because the reporting of the corresponding RCU-preempt quiescent state is now deferred until after interrupts have been enabled, it is no longer possible for this situation to result in deadlocks involving the scheduler's runqueue and priority-inheritance locks. This may allow some code simplification that might reduce interrupt latency a bit. Unfortunately, in practice this would also defer deboosting a low-priority task that had been subjected to RCU priority boosting, so real-time-response considerations might well force this restriction to remain in place. Because RCU-preempt grace periods are now blocked not only by RCU read-side critical sections, but also by disabling of interrupts, preemption, and softirqs, it will be possible to eliminate RCU-bh and RCU-sched in favor of RCU-preempt in CONFIG_PREEMPT=y kernels. This may require some additional plumbing to provide the network denial-of-service guarantees that have been traditionally provided by RCU-bh. Once these are in place, CONFIG_PREEMPT=n kernels will be able to fold RCU-bh into RCU-sched. This would mean that all kernels would have but one flavor of RCU, which would open the door to significant code cleanup. Moving to a single flavor of RCU would also have the beneficial effect of reducing the NOCB kthreads by at least a factor of two. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Apply rcu_read_unlock_special() preempt_count() feedback from Joel Fernandes. ] [ paulmck: Adjust rcu_eqs_enter() call to rcu_preempt_deferred_qs() in response to bug reports from kbuild test robot. ] [ paulmck: Fix bug located by kbuild test robot involving recursion via rcu_preempt_deferred_qs(). ]	2018-08-30 16:02:34 -07:00
Byungchul Park	cf7614e13c	rcu: Refactor rcu_{nmi,irq}_{enter,exit}() When entering or exiting irq or NMI handlers, the current code uses ->dynticks_nmi_nesting to detect if it is in the outermost handler, that is, the one interrupting or returning to an RCU-idle context (the idle loop or nohz_full usermode execution). When entering the outermost handler via an interrupt (as opposed to NMI), it is necessary to invoke rcu_dynticks_task_exit() just before the CPU is marked non-idle from an RCU perspective and to invoke rcu_cleanup_after_idle() just after the CPU is marked non-idle. Similarly, when exiting the outermost handler via an interrupt, it is necessary to invoke rcu_prepare_for_idle() just before marking the CPU idle and to invoke rcu_dynticks_task_enter() just after marking the CPU idle. The decision to execute these four functions is currently taken in rcu_irq_enter() and rcu_irq_exit() as follows: rcu_irq_enter() /* A conditional branch with ->dynticks_nmi_nesting / rcu_nmi_enter() / A conditional branch with ->dynticks / / A conditional branch with ->dynticks_nmi_nesting / rcu_irq_exit() / A conditional branch with ->dynticks_nmi_nesting / rcu_nmi_exit() / A conditional branch with ->dynticks_nmi_nesting / / A conditional branch with ->dynticks_nmi_nesting / rcu_nmi_enter() / A conditional branch with ->dynticks / rcu_nmi_exit() / A conditional branch with ->dynticks_nmi_nesting / This works, but the conditional branches in rcu_irq_enter() and rcu_irq_exit() are redundant with those in rcu_nmi_enter() and rcu_nmi_exit(), respectively. Redundant branches are not something we want in the to/from-idle fastpaths, so this commit refactors rcu_{nmi,irq}_{enter,exit}() so they use a common inlined function passed a constant argument as follows: rcu_irq_enter() inlining rcu_nmi_enter_common(irq=true) / A conditional branch with ->dynticks / rcu_irq_exit() inlining rcu_nmi_exit_common(irq=true) / A conditional branch with ->dynticks_nmi_nesting / rcu_nmi_enter() inlining rcu_nmi_enter_common(irq=false) / A conditional branch with ->dynticks / rcu_nmi_exit() inlining rcu_nmi_exit_common(irq=false) / A conditional branch with ->dynticks_nmi_nesting */ The combination of the constant function argument and the inlining allows the compiler to discard the conditionals that previously controlled execution of rcu_dynticks_task_exit(), rcu_cleanup_after_idle(), rcu_prepare_for_idle(), and rcu_dynticks_task_enter(). This reduces both the to-idle and from-idle path lengths by two conditional branches each, and improves readability as well. This commit also changes order of execution from this: rcu_dynticks_task_exit(); rcu_dynticks_eqs_exit(); trace_rcu_dyntick(); rcu_cleanup_after_idle(); To this: rcu_dynticks_task_exit(); rcu_dynticks_eqs_exit(); rcu_cleanup_after_idle(); trace_rcu_dyntick(); In other words, the calls to rcu_cleanup_after_idle() and trace_rcu_dyntick() are reversed. This has no functional effect because the real concern is whether a given call is before or after the call to rcu_dynticks_eqs_exit(), and this patch does not change that. Before the call to rcu_dynticks_eqs_exit(), RCU is not yet watching the current CPU and after that call RCU is watching. A similar switch in calling order happens on the idle-entry path, with similar lack of effect for the same reasons. Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Applied Steven Rostedt feedback. ] Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-08-30 16:00:46 -07:00
Jiri Olsa	bf06278c3f	perf/hw_breakpoint: Simplify breakpoint enable in perf_event_modify_breakpoint We can safely enable the breakpoint back for both the fail and success paths by checking only the bp->attr.disabled, which either holds the new 'requested' disabled state or the original breakpoint state. Committer testing: At the end of the series, the 'perf test' entry introduced as the first patch now runs to completion without finding the fixed issues: # perf test "bp modify" 62: x86 bp modify : Ok # In verbose mode: # perf test -v "bp modify" 62: x86 bp modify : --- start --- test child forked, pid 5161 rip 5950a0, bp_1 0x5950a0 in bp_1 rip 5950a0, bp_1 0x5950a0 in bp_1 test child finished with 0 ---- end ---- x86 bp modify: Ok Suggested-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Milind Chabbi <chabbi.milind@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180827091228.2878-6-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-08-30 14:49:24 -03:00
Jiri Olsa	969558371b	perf/hw_breakpoint: Enable breakpoint in modify_user_hw_breakpoint Currently we enable the breakpoint back only if the breakpoint modification was successful. If it fails we can leave the breakpoint in disabled state with attr->disabled == 0. We can safely enable the breakpoint back for both the fail and success paths by checking the bp->attr.disabled, which either holds the new 'requested' disabled state or the original breakpoint state. Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Milind Chabbi <chabbi.milind@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180827091228.2878-5-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-08-30 14:49:23 -03:00
Jiri Olsa	cb45302d7c	perf/hw_breakpoint: Remove superfluous bp->attr.disabled = 0 Once the breakpoint was succesfully modified, the attr->disabled value is in bp->attr.disabled. So there's no reason to set it again, removing that. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Milind Chabbi <chabbi.milind@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180827091228.2878-4-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-08-30 14:49:23 -03:00
Jiri Olsa	bd14406b78	perf/hw_breakpoint: Modify breakpoint even if the new attr has disabled set We need to change the breakpoint even if the attr with new fields has disabled set to true. Current code prevents following user code to change the breakpoint address: ptrace(PTRACE_POKEUSER, child, offsetof(struct user, u_debugreg[0]), addr_1) ptrace(PTRACE_POKEUSER, child, offsetof(struct user, u_debugreg[0]), addr_2) ptrace(PTRACE_POKEUSER, child, offsetof(struct user, u_debugreg[7]), dr7) The first PTRACE_POKEUSER creates the breakpoint with attr.disabled set to true: ptrace_set_breakpoint_addr(nr = 0) struct perf_event bp = t->ptrace_bps[nr]; ptrace_register_breakpoint(..., disabled = true) ptrace_fill_bp_fields(..., disabled) register_user_hw_breakpoint So the second PTRACE_POKEUSER will be omitted: ptrace_set_breakpoint_addr(nr = 0) struct perf_event bp = t->ptrace_bps[nr]; struct perf_event_attr attr = bp->attr; modify_user_hw_breakpoint(bp, &attr) if (!attr->disabled) modify_user_hw_breakpoint_check Reported-by: Milind Chabbi <chabbi.milind@gmail.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180827091228.2878-3-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-08-30 14:49:23 -03:00
Yonghong Song	c7b27c37af	bpf: add bpffs pretty print for percpu arraymap/hash/lru_hash Added bpffs pretty print for percpu arraymap, percpu hashmap and percpu lru hashmap. For each map <key, value> pair, the format is: <key_value>: { cpu0: <value_on_cpu0> cpu1: <value_on_cpu1> ... cpun: <value_on_cpun> } For example, on my VM, there are 4 cpus, and for test_btf test in the next patch: cat /sys/fs/bpf/pprint_test_percpu_hash You may get: ... 43602: { cpu0: {43602,0,-43602,0x3,0xaa52,0x3,{43602\|[82,170,0,0,0,0,0,0]},ENUM_TWO} cpu1: {43602,0,-43602,0x3,0xaa52,0x3,{43602\|[82,170,0,0,0,0,0,0]},ENUM_TWO} cpu2: {43602,0,-43602,0x3,0xaa52,0x3,{43602\|[82,170,0,0,0,0,0,0]},ENUM_TWO} cpu3: {43602,0,-43602,0x3,0xaa52,0x3,{43602\|[82,170,0,0,0,0,0,0]},ENUM_TWO} } 72847: { cpu0: {72847,0,-72847,0x3,0x11c8f,0x3,{72847\|[143,28,1,0,0,0,0,0]},ENUM_THREE} cpu1: {72847,0,-72847,0x3,0x11c8f,0x3,{72847\|[143,28,1,0,0,0,0,0]},ENUM_THREE} cpu2: {72847,0,-72847,0x3,0x11c8f,0x3,{72847\|[143,28,1,0,0,0,0,0]},ENUM_THREE} cpu3: {72847,0,-72847,0x3,0x11c8f,0x3,{72847\|[143,28,1,0,0,0,0,0]},ENUM_THREE} } ... Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-08-30 14:03:53 +02:00
Mukesh Ojha	13ba17bee1	notifier: Remove notifier header file wherever not used The conversion of the hotplug notifiers to a state machine left the notifier.h includes around in some places. Remove them. Signed-off-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1535114033-4605-1-git-send-email-mojha@codeaurora.org	2018-08-30 12:56:40 +02:00
Vincent Whitchurch	cb9d7fd51d	watchdog: Mark watchdog touch functions as notrace Some architectures need to use stop_machine() to patch functions for ftrace, and the assumption is that the stopped CPUs do not make function calls to traceable functions when they are in the stopped state. Commit `ce4f06dcbb` ("stop_machine: Touch_nmi_watchdog() after MULTI_STOP_PREPARE") added calls to the watchdog touch functions from the stopped CPUs and those functions lack notrace annotations. This leads to crashes when enabling/disabling ftrace on ARM kernels built with the Thumb-2 instruction set. Fix it by adding the necessary notrace annotations. Fixes: `ce4f06dcbb` ("stop_machine: Touch_nmi_watchdog() after MULTI_STOP_PREPARE") Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: oleg@redhat.com Cc: tj@kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180821152507.18313-1-vincent.whitchurch@axis.com	2018-08-30 12:56:40 +02:00
Edward Cree	8efea21d33	bpf/verifier: display non-spill stack slot types in print_verifier_state If a stack slot does not hold a spilled register (STACK_SPILL), then each of its eight bytes could potentially have a different slot_type. This information can be important for debugging, and previously we either did not print anything for the stack slot, or just printed fp-X=0 in the case where its first byte was STACK_ZERO. Instead, print eight characters with either 0 (STACK_ZERO), m (STACK_MISC) or ? (STACK_INVALID) for any stack slot which is neither STACK_SPILL nor entirely STACK_INVALID. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-08-29 18:52:12 -07:00
Edward Cree	679c782de1	bpf/verifier: per-register parent pointers By giving each register its own liveness chain, we elide the skip_callee() logic. Instead, each register's parent is the state it inherits from; both check_func_call() and prepare_func_exit() automatically connect reg states to the correct chain since when they copy the reg state across (r1-r5 into the callee as args, and r0 out as the return value) they also copy the parent pointer. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-08-29 18:52:12 -07:00
Paul E. McKenney	7c590fcca6	rcutorture: Maintain self-propagating CB only during forward-progress test The current forward-progress testing maintains a self-propagating callback during the full test. This could result in false negatives for stutter-end checking, where it might appear that RCU was clearing out old callbacks only because it was being continually motivated by the self-propagating callback. This commit therefore shuts down the self-propagating callback at the end of each forward-progress test interval. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-29 09:20:48 -07:00
Paul E. McKenney	474e59b476	rcutorture: Check GP completion at stutter end The rcu_torture_writer() function invokes stutter_wait() at the end of each writer pass, which occasionally blocks for an extended time period in order to ensure that RCU can handle intermittent loads. But part of handling a busy period is invoking all the callbacks before the end of the idle period induced by stutter_wait(). This commit therefore adds a return value to stutter_wait() indicating whether stutter_wait() actually waited. In addition, this commit causes rcu_torture_writer() to test this value and if set, checks that all the elements of the rcu_tortures[] array have been freed up. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-29 09:20:48 -07:00
Paul E. McKenney	f4de46ed5b	rcutorture: Print forward-progress test interval on error This commit prints the duration of the forward-progress test interval in the case that no forward progress was observed as an aid to debugging. When forward progress does happen, it prints out the number of rcu_torture_writer() versions and grace periods that elapsed during the forward-progress test. At the end of the run, it also prints the number of attempted and actual forward-progress tests. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-29 09:20:48 -07:00
Paul E. McKenney	c04dd09bd3	rcutorture: Adjust number of reader kthreads per CPU-hotplug operations Currently, rcutorture provisions rcu_torture_reader() kthreads based on the initial number of CPUs. This can be problematic when CPU hotplug is enabled, as a system with a very large number of CPUs will provision a very large number of rcu_torture_reader() kthreads. All of these kthreads will continue running even if the CPU-hotplug operations result in only one remaining online CPU. This can result in all sorts of strange artifacts due simply to massive overload. This commit therefore causes the rcu_torture_reader() kthreads to start blocking as the number of online CPUs decreases. This is accomplished by numbering these kthreads, and having each check to make sure that the number of online CPUs is at least as large as its assigned number. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-29 09:20:48 -07:00
Paul E. McKenney	fecad5091f	rcutorture: Reduce priority of forward-progress testing On !SMP tests, the forward-progress kthread might prevent RCU's grace-period kthread from running, which would defeat RCU's forward-progress measures. On PREEMPT tests without RCU priority boosting, the forward-progress kthread might preempt a reader for an extended time period, which would also defeat RCU's forward-progress measures. This commit therefore reduced rcutorture's forward-progress kthread's priority in those cases. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-29 09:20:48 -07:00
Paul E. McKenney	1e69676592	rcutorture: Limit reader duration if irq or bh disabled There are debug checks in some environments that will complain if the duration of a bh-disabled region of code exceeds about 50 milliseconds. Because rcu_read_delay() can produce a 50-millisecond delay and because there could be up to eight reader segments with such delays, this commit limits the maximum delay to 10 milliseconds if either interrupts or softirqs are disabled. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-29 09:20:48 -07:00
Paul E. McKenney	3cff54a830	rcutorture: Increase rcu_read_delay() longdelay_ms RCU now takes certain actions 100 and 200 milliseconds into a grace period by default, but rcutorture only runs RCU read-side critical sections with durations up to 50 milliseconds. This commit therefore increases test coverage by increasing the maximum critical-section duration to 300 milliseconds. Note that the existing code automatically dials down the probability of long delays based on the maximum duration, which means that this change should not significantly change the rate of execution of RCU read-side critical sections. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-29 09:20:48 -07:00
Paul E. McKenney	9fdcb9afe0	rcutorture: Add self-propagating callback to forward-progress testing If rcutorture is run on a quiet system with the rcutorture.stutter module parameter set high, then there can legitimately be an extended period during which no RCU forward progress takes place. This can result in false-positive no-forward-progress splats. This commit therefore makes rcu_torture_fwd_prog() create a self-propagating RCU callback to ensure that grace periods are in progress for the duration of the forward-progress test. Note that the RCU flavor under test must define ->call(), ->sync(), and ->cb_barrier() for this self-propagating callback to be created. If one or more of those rcu_torture_ops fields are NULL, then the rcu_torture_fwd_prog() function will silently proceed without creating the self-propagating callback. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-29 09:20:48 -07:00
Paul E. McKenney	08a7a2ec68	rcutorture: Vary forward-progress test interval Some of the Linux kernel's RCU implementations provide several mechanisms to promote forward progress that operate over different timeframes. This commit therefore causes rcu_torture_fwd_prog() to vary the duration of its forward-progress testing in order to test each such mechanism. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-29 09:20:48 -07:00
Paul E. McKenney	152f4afbfd	rcutorture: Avoid no-test complaint if too few forward-progress tries In a too-short test, random delays can cause each attempt to do forward-progress testing to fail to complete, thus resulting in spurious splats. This commit therefore requires at least five tries before complaining about rcutorture runs that failed to produce at least one valid forward-progress testing attempt. Note that actual forward-progress failures will splat regardless of the number of tries. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-29 09:20:48 -07:00
Paul E. McKenney	119248bec9	rcutorture: Also use GP sequence to judge forward progress Currently, rcutorture relies solely on the progress of rcu_torture_writer() to judge grace-period forward progress. In theory, this is the gold standard of forward progress, but in practice rcutorture separately detects and reports rcu_torture_writer() stalls. This commit therefore adds the grace-period sequence number (when provided) to the judgment of grace-period forward progress, which makes it easier to distinguish between failure of actual grace periods to progress on the one hand and downstream forward-progress failures on the other. For example, given this change, if rcu_torture_writer() stalls, but rcu_torture_fwd_prog() does not complain, then the grace-period computation is working, which is a hint that the failure lies in callback processing, wakeup of the rcu_torture_writer() kthread, or similar. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-29 09:20:48 -07:00
Paul E. McKenney	1b27291b1e	rcutorture: Add forward-progress tests for RCU grace periods This commit adds a kthread that loops going into and out of RCU read-side critical sections, but also including a cond_resched(), optionally guarded by a check of need_resched(), in that same loop. This commit relies solely on rcu_torture_writer() progress to judge the forward progress of grace periods. Note that Tasks RCU and SRCU are exempted from forward-progress testing due their (intentionally) less-robust forward-progress guarantees. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-29 09:20:48 -07:00
Paul E. McKenney	f028806442	rcuperf: Warn on bad perf type for built-in tests When running a built-in rcuperf test, specifying an invalid perf type results in what looks like a hard hang, with the error messages hidden by other boot-time output. This commit therefore executes a WARN_ON() in this case so that the splat appears just following the error messages. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-29 09:20:48 -07:00
Paul E. McKenney	e746b55857	rcutorture: Warn on bad torture type for built-in tests When running a built-in rcutorture test, specifying an invalid torture type results in what looks like a hard hang, with the error messages hidden by other boot-time output. This commit therefore executes a WARN_ON() in this case so that the splat appears just following the error messages. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-29 09:20:48 -07:00
Paul E. McKenney	444da518fd	rcutorture: Force occasional reader waits Deferred quiescent states can interact with the scheduler, but rcu_torture_reader() does not force such interaction all that frequently. This commit therefore blocks for one jiffy after ten jiffies of read-side runtime. This has the beneficial effect of being most likely to block just after long-running readers, and it is exactly these readers that are most likely to have been preempted (in CONFIG_PREEMPT=y kernels). This in turn helps increase the probability that a deferred quiescent state will be seen by RCU's context-switch hooks. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-29 09:20:48 -07:00
YueHaibing	efbaec89c6	bpf: remove duplicated include from syscall.c Remove duplicated include. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-08-29 17:28:00 +02:00
Arnd Bergmann	49c39f8464	y2038: signal: Change rt_sigtimedwait to use __kernel_timespec This changes sys_rt_sigtimedwait() to use get_timespec64(), changing the timeout type to __kernel_timespec, which will be changed to use a 64-bit time_t in the future. Since the do_sigtimedwait() core function changes, we also have to modify the compat version of this system call in the same way. Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2018-08-29 15:42:25 +02:00
Arnd Bergmann	474b9c777b	y2038: sched: Change sched_rr_get_interval to use __kernel_timespec This is a preparation patch for converting sys_sched_rr_get_interval to work with 64-bit time_t on 32-bit architectures. The 'interval' argument is changed to struct __kernel_timespec, which will be redefined using 64-bit time_t in the future. The compat version of the system call in turn is enabled for compilation with CONFIG_COMPAT_32BIT_TIME so the individual 32-bit architectures can share the handling of the traditional argument with 64-bit architectures providing it for their compat mode. Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2018-08-29 15:42:24 +02:00
kbuild test robot	743f5cdb6c	y2038: __get_old_timespec32() can be static The kbuild test robot reports two new warnings with the previous patch: kernel/time/time.c:866:5: sparse: symbol '__get_old_timespec32' was not declared. Should it be static? kernel/time/time.c:882:5: sparse: symbol '__put_old_timespec32' was not declared. Should it be static? These are actually older bugs, but came up now after the symbol got renamed. Fortunately, commit `afef05cf23` ("time: Enable get/put_compat_itimerspec64 always") makes the two functions (__compat_get_timespec64/__compat_get_timespec64) local to time.c already, so we can mark them as 'static'. Fixes: ee16c8f415e4 ("y2038: Globally rename compat_time to old_time32") Signed-off-by: kbuild test robot <fengguang.wu@intel.com> [arnd: added changelog text] Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2018-08-29 15:42:19 +02:00
John Fastabend	501ca81760	bpf: sockmap, decrement copied count correctly in redirect error case Currently, when a redirect occurs in sockmap and an error occurs in the redirect call we unwind the scatterlist once in the error path of bpf_tcp_sendmsg_do_redirect() and then again in sendmsg(). Then in the error path of sendmsg we decrement the copied count by the send size. However, its possible we partially sent data before the error was generated. This can happen if do_tcp_sendpages() partially sends the scatterlist before encountering a memory pressure error. If this happens we need to decrement the copied value (the value tracking how many bytes were actually sent to TCP stack) by the number of remaining bytes _not_ the entire send size. Otherwise we risk confusing userspace. Also we don't need two calls to free the scatterlist one is good enough. So remove the one in bpf_tcp_sendmsg_do_redirect() and then properly reduce copied by the number of remaining bytes which may in fact be the entire send size if no bytes were sent. To do this use bool to indicate if free_start_sg() should do mem accounting or not. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-08-28 09:01:06 +02:00
Daniel Borkmann	15c480efab	bpf, sockmap: fix psock refcount leak in bpf_tcp_recvmsg In bpf_tcp_recvmsg() we first took a reference on the psock, however once we find that there are skbs in the normal socket's receive queue we return with processing them through tcp_recvmsg(). Problem is that we leak the taken reference on the psock in that path. Given we don't really do anything with the psock at this point, move the skb_queue_empty() test before we fetch the psock to fix this case. Fixes: `8934ce2fd0` ("bpf: sockmap redirect ingress support") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-08-27 20:22:05 -07:00
Daniel Borkmann	e06fa9c16c	bpf, sockmap: fix potential use after free in bpf_tcp_close bpf_tcp_close() we pop the psock linkage to a map via psock_map_pop(). A parallel update on the sock hash map can happen between psock_map_pop() and lookup_elem_raw() where we override the element under link->hash / link->key. In bpf_tcp_close()'s lookup_elem_raw() we subsequently only test whether an element is present, but we do not test whether the element is infact the element we were looking for. We lock the sock in bpf_tcp_close() during that time, so do we hold the lock in sock_hash_update_elem(). However, the latter locks the sock which is newly updated, not the one we're purging from the hash table. This means that while one CPU is doing the lookup from bpf_tcp_close(), another CPU is doing the map update in parallel, dropped our sock from the hlist and released the psock. Subsequently the first CPU will find the new sock and attempts to drop and release the old sock yet another time. Fix is that we need to check the elements for a match after lookup, similar as we do in the sock map. Note that the hash tab elems are freed via RCU, so access to their link->hash / link->key is fine since we're under RCU read side there. Fixes: `e9db4ef6bf` ("bpf: sockhash fix omitted bucket lock in sock_close") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-08-27 20:22:05 -07:00
Linus Torvalds	050cdc6c95	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) ICE, E1000, IGB, IXGBE, and I40E bug fixes from the Intel folks. 2) Better fix for AB-BA deadlock in packet scheduler code, from Cong Wang. 3) bpf sockmap fixes (zero sized key handling, etc.) from Daniel Borkmann. 4) Send zero IPID in TCP resets and SYN-RECV state ACKs, to prevent attackers using it as a side-channel. From Eric Dumazet. 5) Memory leak in mediatek bluetooth driver, from Gustavo A. R. Silva. 6) Hook up rt->dst.input of ipv6 anycast routes properly, from Hangbin Liu. 7) hns and hns3 bug fixes from Huazhong Tan. 8) Fix RIF leak in mlxsw driver, from Ido Schimmel. 9) iova range check fix in vhost, from Jason Wang. 10) Fix hang in do_tcp_sendpages() with tls, from John Fastabend. 11) More r8152 chips need to disable RX aggregation, from Kai-Heng Feng. 12) Memory exposure in TCA_U32_SEL handling, from Kees Cook. 13) TCP BBR congestion control fixes from Kevin Yang. 14) hv_netvsc, ignore non-PCI devices, from Stephen Hemminger. 15) qed driver fixes from Tomer Tayar. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (77 commits) net: sched: Fix memory exposure from short TCA_U32_SEL qed: fix spelling mistake "comparsion" -> "comparison" vhost: correctly check the iova range when waking virtqueue qlge: Fix netdev features configuration. net: macb: do not disable MDIO bus at open/close time Revert "net: stmmac: fix build failure due to missing COMMON_CLK dependency" net: macb: Fix regression breaking non-MDIO fixed-link PHYs mlxsw: spectrum_switchdev: Do not leak RIFs when removing bridge i40e: fix condition of WARN_ONCE for stat strings i40e: Fix for Tx timeouts when interface is brought up if DCB is enabled ixgbe: fix driver behaviour after issuing VFLR ixgbe: Prevent unsupported configurations with XDP ixgbe: Replace GFP_ATOMIC with GFP_KERNEL igb: Replace mdelay() with msleep() in igb_integrated_phy_loopback() igb: Replace GFP_ATOMIC with GFP_KERNEL in igb_sw_init() igb: Use an advanced ctx descriptor for launchtime e1000: ensure to free old tx/rx rings in set_ringparam() e1000: check on netif_running() before calling e1000_up() ixgb: use dma_zalloc_coherent instead of allocator/memset ice: Trivial formatting fixes ...	2018-08-27 11:59:39 -07:00
Arnd Bergmann	9afc5eee65	y2038: globally rename compat_time to old_time32 Christoph Hellwig suggested a slightly different path for handling backwards compatibility with the 32-bit time_t based system calls: Rather than simply reusing the compat_sys_* entry points on 32-bit architectures unchanged, we get rid of those entry points and the compat_time types by renaming them to something that makes more sense on 32-bit architectures (which don't have a compat mode otherwise), and then share the entry points under the new name with the 64-bit architectures that use them for implementing the compatibility. The following types and interfaces are renamed here, and moved from linux/compat_time.h to linux/time32.h: old new --- --- compat_time_t old_time32_t struct compat_timeval struct old_timeval32 struct compat_timespec struct old_timespec32 struct compat_itimerspec struct old_itimerspec32 ns_to_compat_timeval() ns_to_old_timeval32() get_compat_itimerspec64() get_old_itimerspec32() put_compat_itimerspec64() put_old_itimerspec32() compat_get_timespec64() get_old_timespec32() compat_put_timespec64() put_old_timespec32() As we already have aliases in place, this patch addresses only the instances that are relevant to the system call interface in particular, not those that occur in device drivers and other modules. Those will get handled separately, while providing the 64-bit version of the respective interfaces. I'm not renaming the timex, rusage and itimerval structures, as we are still debating what the new interface will look like, and whether we will need a replacement at all. This also doesn't change the names of the syscall entry points, which can be done more easily when we actually switch over the 32-bit architectures to use them, at that point we need to change COMPAT_SYSCALL_DEFINEx to SYSCALL_DEFINEx with a new name, e.g. with a _time32 suffix. Suggested-by: Christoph Hellwig <hch@infradead.org> Link: https://lore.kernel.org/lkml/20180705222110.GA5698@infradead.org/ Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2018-08-27 14:48:48 +02:00
Arnd Bergmann	33e2641819	y2038: make do_gettimeofday() and get_seconds() inline get_seconds() and do_gettimeofday() are only used by a few modules now any more (waiting for the respective patches to get accepted), and they are among the last holdouts of code that is not y2038 safe in the core kernel. Move the implementation into the timekeeping32.h header to clean up the core kernel and isolate the old interfaces further. Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2018-08-27 14:45:58 +02:00
Arnd Bergmann	976516404f	y2038: remove unused time interfaces After many small patches, at least some of the deprecated interfaces have no remaining users any more and can be removed: current_kernel_time do_settimeofday get_monotonic_boottime get_monotonic_boottime64 get_monotonic_coarse get_monotonic_coarse64 getrawmonotonic64 ktime_get_real_ts timekeeping_clocktai timespec_trunc timespec_valid_strict time_to_tm For many of the remaining time functions, we are missing one or two patches that failed to make it into 4.19, they will be removed in the following merge window. The replacement functions for the removed interfaces are documented in Documentation/core-api/timekeeping.rst. Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2018-08-27 12:59:56 +02:00
Linus Torvalds	d207ea8e74	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Thomas Gleixner: "Kernel: - Improve kallsyms coverage - Add x86 entry trampolines to kcore - Fix ARM SPE handling - Correct PPC event post processing Tools: - Make the build system more robust - Small fixes and enhancements all over the place - Update kernel ABI header copies - Preparatory work for converting libtraceevnt to a shared library - License cleanups" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (100 commits) tools arch: Update arch/x86/lib/memcpy_64.S copy used in 'perf bench mem memcpy' tools arch x86: Update tools's copy of cpufeatures.h perf python: Fix pyrf_evlist__read_on_cpu() interface perf mmap: Store real cpu number in 'struct perf_mmap' perf tools: Remove ext from struct kmod_path perf tools: Add gzip_is_compressed function perf tools: Add lzma_is_compressed function perf tools: Add is_compressed callback to compressions array perf tools: Move the temp file processing into decompress_kmodule perf tools: Use compression id in decompress_kmodule() perf tools: Store compression id into struct dso perf tools: Add compression id into 'struct kmod_path' perf tools: Make is_supported_compression() static perf tools: Make decompress_to_file() function static perf tools: Get rid of dso__needs_decompress() call in __open_dso() perf tools: Get rid of dso__needs_decompress() call in symbol__disassemble() perf tools: Get rid of dso__needs_decompress() call in read_object_code() tools lib traceevent: Change to SPDX License format perf llvm: Allow passing options to llc in addition to clang perf parser: Improve error message for PMU address filters ...	2018-08-26 11:25:21 -07:00
Linus Torvalds	a9ce323378	Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull licking update from Thomas Gleixner: "Mark the switch cases which fall through to the next case with the proper comment so the fallthrough compiler checks can be enabled" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Mark expected switch fall-throughs	2018-08-26 09:47:00 -07:00
Linus Torvalds	2923b27e54	libnvdimm-for-4.19_dax-memory-failure * memory_failure() gets confused by dev_pagemap backed mappings. The recovery code has specific enabling for several possible page states that needs new enabling to handle poison in dax mappings. Teach memory_failure() about ZONE_DEVICE pages. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE5DAy15EJMCV1R6v9YGjFFmlTOEoFAlt9ui8ACgkQYGjFFmlT OEpNRw//XGj9s7sezfJFeol4psJlRUd935yii/gmJRgi/yPf2VxxQG9qyM6SMBUc 75jASfOL6FSsfxHz0kplyWzMDNdrTkNNAD+9rv80FmY7GqWgcas9DaJX7jZ994vI 5SRO7pfvNZcXlo7IhqZippDw3yxkIU9Ufi0YQKaEUm7GFieptvCZ0p9x3VYfdvwM BExrxQe0X1XUF4xErp5P78+WUbKxP47DLcucRDig8Q7dmHELUdyNzo3E1SVoc7m+ 3CmvyTj6XuFQgOZw7ZKun1BJYfx/eD5ZlRJLZbx6wJHRtTXv/Uea8mZ8mJ31ykN9 F7QVd0Pmlyxys8lcXfK+nvpL09QBE0/PhwWKjmZBoU8AdgP/ZvBXLDL/D6YuMTg6 T4wwtPNJorfV4lVD06OliFkVI4qbKbmNsfRq43Ns7PCaLueu4U/eMaSwSH99UMaZ MGbO140XW2RZsHiU9yTRUmZq73AplePEjxtzR8oHmnjo45nPDPy8mucWPlkT9kXA oUFMhgiviK7dOo19H4eaPJGqLmHM93+x5tpYxGqTr0dUOXUadKWxMsTnkID+8Yi7 /kzQWCFvySz3VhiEHGuWkW08GZT6aCcpkREDomnRh4MEnETlZI8bblcuXYOCLs6c nNf1SIMtLdlsl7U1fEX89PNeQQ2y237vEDhFQZftaalPeu/JJV0= =Ftop -----END PGP SIGNATURE----- Merge tag 'libnvdimm-for-4.19_dax-memory-failure' of gitolite.kernel.org:pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm memory-failure update from Dave Jiang: "As it stands, memory_failure() gets thoroughly confused by dev_pagemap backed mappings. The recovery code has specific enabling for several possible page states and needs new enabling to handle poison in dax mappings. In order to support reliable reverse mapping of user space addresses: 1/ Add new locking in the memory_failure() rmap path to prevent races that would typically be handled by the page lock. 2/ Since dev_pagemap pages are hidden from the page allocator and the "compound page" accounting machinery, add a mechanism to determine the size of the mapping that encompasses a given poisoned pfn. 3/ Given pmem errors can be repaired, change the speculatively accessed poison protection, mce_unmap_kpfn(), to be reversible and otherwise allow ongoing access from the kernel. A side effect of this enabling is that MADV_HWPOISON becomes usable for dax mappings, however the primary motivation is to allow the system to survive userspace consumption of hardware-poison via dax. Specifically the current behavior is: mce: Uncorrected hardware memory error in user-access at af34214200 {1}[Hardware Error]: It has been corrected by h/w and requires no further action mce: [Hardware Error]: Machine check events logged {1}[Hardware Error]: event severity: corrected Memory failure: 0xaf34214: reserved kernel page still referenced by 1 users [..] Memory failure: 0xaf34214: recovery action for reserved kernel page: Failed mce: Memory error not recovered <reboot> ...and with these changes: Injecting memory failure for pfn 0x20cb00 at process virtual address 0x7f763dd00000 Memory failure: 0x20cb00: Killing dax-pmd:5421 due to hardware memory corruption Memory failure: 0x20cb00: recovery action for dax page: Recovered Given all the cross dependencies I propose taking this through nvdimm.git with acks from Naoya, x86/core, x86/RAS, and of course dax folks" * tag 'libnvdimm-for-4.19_dax-memory-failure' of gitolite.kernel.org:pub/scm/linux/kernel/git/nvdimm/nvdimm: libnvdimm, pmem: Restore page attributes when clearing errors x86/memory_failure: Introduce {set, clear}_mce_nospec() x86/mm/pat: Prepare {reserve, free}_memtype() for "decoy" addresses mm, memory_failure: Teach memory_failure() about dev_pagemap pages filesystem-dax: Introduce dax_lock_mapping_entry() mm, memory_failure: Collect mapping size in collect_procs() mm, madvise_inject_error: Let memory_failure() optionally take a page reference mm, dev_pagemap: Do not clear ->mapping on final put mm, madvise_inject_error: Disable MADV_SOFT_OFFLINE for ZONE_DEVICE pages filesystem-dax: Set page->index device-dax: Set page->index device-dax: Enable page_mapping() device-dax: Convert to vmf_insert_mixed and vm_fault_t	2018-08-25 18:43:59 -07:00
Linus Torvalds	596766102a	Merge branch 'for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "Just one commit from Steven to take out spin lock from trace event handlers" * 'for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup/tracing: Move taking of spin lock out of trace event handlers	2018-08-24 13:19:27 -07:00
Linus Torvalds	9022ada8ab	Merge branch 'for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue updates from Tejun Heo: "Over the lockdep cross-release churn, workqueue lost some of the existing annotations. Johannes Berg restored it and also improved them" * 'for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: re-add lockdep dependencies for flushing workqueue: skip lockdep wq dependency in cancel_work_sync()	2018-08-24 13:16:36 -07:00
Linus Torvalds	4def196360	Merge branch 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace fixes from Eric Biederman: "This is a set of four fairly obvious bug fixes: - a switch from d_find_alias to d_find_any_alias because the xattr code perversely takes a dentry - two mutex vs copy_to_user fixes from Jann Horn - a fix to use a sanitized size not the size userspace passed in from Christian Brauner" * 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: getxattr: use correct xattr length sys: don't hold uts_sem while accessing userspace memory userns: move user access out of the mutex cap_inode_getsecurity: use d_find_any_alias() instead of d_find_alias()	2018-08-24 09:25:39 -07:00
Linus Torvalds	33e17876ea	Merge branch 'akpm' (patches from Andrew) Merge yet more updates from Andrew Morton: - the rest of MM - various misc fixes and tweaks * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits) mm: Change return type int to vm_fault_t for fault handlers lib/fonts: convert comments to utf-8 s390: ebcdic: convert comments to UTF-8 treewide: convert ISO_8859-1 text comments to utf-8 drivers/gpu/drm/gma500/: change return type to vm_fault_t docs/core-api: mm-api: add section about GFP flags docs/mm: make GFP flags descriptions usable as kernel-doc docs/core-api: split memory management API to a separate file docs/core-api: move {str,mem}dup to "String Manipulation" docs/core-api: kill trailing whitespace in kernel-api.rst mm/util: add kernel-doc for kvfree mm/util: make strndup_user description a kernel-doc comment fs/proc/vmcore.c: hide vmcoredd_mmap_dumps() for nommu builds treewide: correct "differenciate" and "instanciate" typos fs/afs: use new return type vm_fault_t drivers/hwtracing/intel_th/msu.c: change return type to vm_fault_t mm: soft-offline: close the race against page allocation mm: fix race on soft-offlining free huge pages namei: allow restricted O_CREAT of FIFOs and regular files hfs: prevent crash on exit from failed search ...	2018-08-23 19:20:12 -07:00
Souptick Joarder	2b74030354	mm: Change return type int to vm_fault_t for fault handlers Use new return type vm_fault_t for fault handler. For now, this is just documenting that the function returns a VM_FAULT value rather than an errno. Once all instances are converted, vm_fault_t will become a distinct type. Ref-> commit `1c8f422059` ("mm: change return type to vm_fault_t") The aim is to change the return type of finish_fault() and handle_mm_fault() to vm_fault_t type. As part of that clean up return type of all other recursively called functions have been changed to vm_fault_t type. The places from where handle_mm_fault() is getting invoked will be change to vm_fault_t type but in a separate patch. vmf_error() is the newly introduce inline function in 4.17-rc6. [akpm@linux-foundation.org: don't shadow outer local `ret' in __do_huge_pmd_anonymous_page()] Link: http://lkml.kernel.org/r/20180604171727.GA20279@jordon-HP-15-Notebook-PC Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-23 18:48:44 -07:00
Arnd Bergmann	3723c63247	treewide: convert ISO_8859-1 text comments to utf-8 Almost all files in the kernel are either plain text or UTF-8 encoded. A couple however are ISO_8859-1, usually just a few characters in a C comments, for historic reasons. This converts them all to UTF-8 for consistency. Link: http://lkml.kernel.org/r/20180724111600.4158975-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Simon Horman <horms@verge.net.au> [IPVS portion] Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [IIO] Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Acked-by: Rob Herring <robh@kernel.org> Cc: Joe Perches <joe@perches.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Rob Herring <robh+dt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-23 18:48:43 -07:00
Salvatore Mesoraca	30aba6656f	namei: allow restricted O_CREAT of FIFOs and regular files Disallows open of FIFOs or regular files not owned by the user in world writable sticky directories, unless the owner is the same as that of the directory or the file is opened without the O_CREAT flag. The purpose is to make data spoofing attacks harder. This protection can be turned on and off separately for FIFOs and regular files via sysctl, just like the symlinks/hardlinks protection. This patch is based on Openwall's "HARDEN_FIFO" feature by Solar Designer. This is a brief list of old vulnerabilities that could have been prevented by this feature, some of them even allow for privilege escalation: CVE-2000-1134 CVE-2007-3852 CVE-2008-0525 CVE-2009-0416 CVE-2011-4834 CVE-2015-1838 CVE-2015-7442 CVE-2016-7489 This list is not meant to be complete. It's difficult to track down all vulnerabilities of this kind because they were often reported without any mention of this particular attack vector. In fact, before hardlinks/symlinks restrictions, fifos/regular files weren't the favorite vehicle to exploit them. [s.mesoraca16@gmail.com: fix bug reported by Dan Carpenter] Link: https://lkml.kernel.org/r/20180426081456.GA7060@mwanda Link: http://lkml.kernel.org/r/1524829819-11275-1-git-send-email-s.mesoraca16@gmail.com [keescook@chromium.org: drop pr_warn_ratelimited() in favor of audit changes in the future] [keescook@chromium.org: adjust commit subjet] Link: http://lkml.kernel.org/r/20180416175918.GA13494@beast Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Suggested-by: Solar Designer <solar@openwall.com> Suggested-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-23 18:48:43 -07:00
Linus Torvalds	06e386a1db	fbdev changes for v4.19: - add support for deferred console takeover, when enabled defers fbcon taking over the console from the dummy console until the first text is displayed on the console - together with the "quiet" kernel commandline option this allows fbcon to still be used together with a smooth graphical bootup (Hans de Goede) - improve console locking debugging code (Thomas Zimmermann) - copy the ACPI BGRT boot graphics to the framebuffer when deferred console takeover support is used in efifb driver (Hans de Goede) - update udlfb driver - fix lost console when the user unplugs a USB adapter, fix the screen corruption issue, fix locking and add some performance optimizations (Mikulas Patocka) - update pxafb driver - fix using uninitialized memory, switch to devm_* API, handle initialization errors and add support for lcd-supply regulator (Daniel Mack) - add support for boards booted with a DeviceTree in pxa3xx_gcu driver (Daniel Mack) - rename omap2 module to omap2fb.ko to avoid conflicts with omap1 driver (Arnd Bergmann) - enable ACPI-based enumeration for goldfishfb driver (Yu Ning) - fix goldfishfb driver to make user space Android code use 60 fps (Christoffer Dall) - print big fat warning when nomodeset kernel parameter is used in vgacon driver (Lyude Paul) - remove VLA usage from fsl-diu-fb driver (Kees Cook) - misc fixes (Julia Lawall, Geert Uytterhoeven, Fredrik Noring, Yisheng Xie, Dan Carpenter, Daniel Vetter, Anton Vasilyev, Randy Dunlap, Gustavo A. R. Silva, Colin Ian King, Fengguang Wu) - misc cleanups (Roman Kiryanov, Yisheng Xie, Colin Ian King) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABCAAGBQJbfqB3AAoJEH4ztj+gR8ILakkP/RO8oXz11Gkb/EzqrH4woHa/ Wpzia5e3NDYllSWAP/t5UeaDbL5WuHzzLkeCEKXFUhFInR5jM19TF9sdUR8xUzGu dKCAdUaQWsO0rSe04fMHlJXk6RCSl0n0gfd7JM6qhM9YEuEpFM1k17/aKoy8pcwX m3bqfQHZAbLxo4UveBVeorTkx7F5bpanbQoCa4YDaDkU3CCTwohBLNR32zPdeMis Sy0WDdRjuvQvFui2eTU3kPFnHMKETijoS17rq72s6PNtI86yVBCvEE+08jIJZ6Hi A2iePfUuQd5FwzKilRoTR3X8XDIvkZbQ2idKeWgwYJOGM7C2JaDyMXvtkgS3/QeS muiwxLAdqZWG2/8Hb3HviRDOGqvAMWRzQbRgQsndM6pgI3MJ2CyfpvGfI1EhQoe3 WFp0eLa8rOqZ2jUx8mcEhuYOzO/PobMD4sYW+GrxGYFvrMwexvm4sJ3rxf4xx49F upvyDDDTdHHjUrOTiIM7bwTYnMMWSfZQhUBPPy7fJJsGU6/GqPUM5ReOclMCtC8m 5sbiJKuHR/SyJtbRinVV3e/cfmFXHCauD3L4wFIkpC0ZQlRaHC/bjz0Pdbwxgxua Ug1+5CFFYqh8cKUTelfNTm1g02zHqEaMh2zvHYrqS7DIHidj3Bvn6SQxj7ndeHZG QTjrpbGhTc68bSOPZZ+b =7UE9 -----END PGP SIGNATURE----- Merge tag 'fbdev-v4.19' of https://github.com/bzolnier/linux Pull fbdev updates from Bartlomiej Zolnierkiewicz: "Mostly small fixes and cleanups for fb drivers (the biggest updates are for udlfb and pxafb drivers). This also adds deferred console takeover support to the console code and efifb driver. Summary: - add support for deferred console takeover, when enabled defers fbcon taking over the console from the dummy console until the first text is displayed on the console - together with the "quiet" kernel commandline option this allows fbcon to still be used together with a smooth graphical bootup (Hans de Goede) - improve console locking debugging code (Thomas Zimmermann) - copy the ACPI BGRT boot graphics to the framebuffer when deferred console takeover support is used in efifb driver (Hans de Goede) - update udlfb driver - fix lost console when the user unplugs a USB adapter, fix the screen corruption issue, fix locking and add some performance optimizations (Mikulas Patocka) - update pxafb driver - fix using uninitialized memory, switch to devm_* API, handle initialization errors and add support for lcd-supply regulator (Daniel Mack) - add support for boards booted with a DeviceTree in pxa3xx_gcu driver (Daniel Mack) - rename omap2 module to omap2fb.ko to avoid conflicts with omap1 driver (Arnd Bergmann) - enable ACPI-based enumeration for goldfishfb driver (Yu Ning) - fix goldfishfb driver to make user space Android code use 60 fps (Christoffer Dall) - print big fat warning when nomodeset kernel parameter is used in vgacon driver (Lyude Paul) - remove VLA usage from fsl-diu-fb driver (Kees Cook) - misc fixes (Julia Lawall, Geert Uytterhoeven, Fredrik Noring, Yisheng Xie, Dan Carpenter, Daniel Vetter, Anton Vasilyev, Randy Dunlap, Gustavo A. R. Silva, Colin Ian King, Fengguang Wu) - misc cleanups (Roman Kiryanov, Yisheng Xie, Colin Ian King)" * tag 'fbdev-v4.19' of https://github.com/bzolnier/linux: (54 commits) Documentation/fb: corrections for fbcon.txt fbcon: Do not takeover the console from atomic context dummycon: Stop exporting dummycon_[un]register_output_notifier fbcon: Only defer console takeover if the current console driver is the dummycon fbcon: Only allow FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER if fbdev is builtin fbdev: omap2: omapfb: fix ifnullfree.cocci warnings fbdev: omap2: omapfb: fix bugon.cocci warnings fbdev: omap2: omapfb: fix boolreturn.cocci warnings fb: amifb: fix build warnings when not builtin fbdev/core: Disable console-lock warnings when fb.lockless_register_fb is set console: Replace #if 0 with atomic var 'ignore_console_lock_warning' udlfb: use spin_lock_irq instead of spin_lock_irqsave udlfb: avoid prefetch udlfb: optimization - test the backing buffer udlfb: allow reallocating the framebuffer udlfb: set line_length in dlfb_ops_set_par udlfb: handle allocation failure udlfb: set optimal write delay udlfb: make a local copy of fb_ops udlfb: don't switch if we are switching to the same videomode ...	2018-08-23 15:44:58 -07:00
Linus Torvalds	452938cbd8	Masami found an off by one bug in the code that keeps "notrace" functions from being traced by kprobes. During my testing, I found that there's places that we may want to add kprobes to notrace, thus we may end up changing this code before 4.19 is released. The history behind this change is that we found that adding kprobes to various notrace functions caused the kernel to crashed. We took the safe route and decided not to allow kprobes to trace any notrace function. But because notrace is added to functions that just cause weird side effects to the function tracer, but are still safe, preventing kprobes for all notrace functios may be too much of a big hammer. One such place is __schedule() is marked notrace, to keep function tracer from doing strange recursive loops when it gets traced with NEED_RESCHED set. With this change, one can not add kprobes to the scheduler. Masami also added code to use gcov on ftrace. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCW34RpBQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qhwzAQDITRob1VjOF8RavMbfBZZuhwHPQROF XsIFcD+jghFOUgEA1zXX4w+SwOV7S+chpGKMh3DlNKIbYPA+8eiMNFfDsA4= =u+oq -----END PGP SIGNATURE----- Merge tag 'trace-v4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Masami found an off by one bug in the code that keeps "notrace" functions from being traced by kprobes. During my testing, I found that there's places that we may want to add kprobes to notrace, thus we may end up changing this code before 4.19 is released. The history behind this change is that we found that adding kprobes to various notrace functions caused the kernel to crashed. We took the safe route and decided not to allow kprobes to trace any notrace function. But because notrace is added to functions that just cause weird side effects to the function tracer, but are still safe, preventing kprobes for all notrace functios may be too much of a big hammer. One such place is __schedule() is marked notrace, to keep function tracer from doing strange recursive loops when it gets traced with NEED_RESCHED set. With this change, one can not add kprobes to the scheduler. Masami also added code to use gcov on ftrace" * tag 'trace-v4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing/kprobes: Fix to check notrace function with correct range tracing: Allow gcov profiling on only ftrace subsystem	2018-08-23 13:07:00 -07:00
Daniel Borkmann	c020347576	bpf: use per htab salt for bucket hash All BPF hash and LRU maps currently have a known and global seed we feed into jhash() which is 0. This is suboptimal, thus fix it by generating a random seed upon hashtab setup time which we can later on feed into jhash() on lookup, update and deletions. Fixes: `0f8e4bd8a1` ("bpf: add hashtable type of eBPF maps") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Reviewed-by: Eduardo Valentin <eduval@amazon.com>	2018-08-23 18:45:47 +02:00
Linus Torvalds	5bed49adfe	for-4.19/post-20180822 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlt9on8QHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpj1xEADKBmJlV9aVyxc5w6XggqAGeHqI4afFrl+v 9fW6WUQMAaBUrr7PMIEJQ0Zm4B7KxgBaEWNtuuj4ULkjpgYm2AuGUuTJSyKz41rS Ma+KNyCA2Zmq4SvwGFbcdCuCbUqnoxTycscAgCjuDvIYLW0+nFGNc47ibmu9lZIV 33Ef5LrxuCjhC2zyNxEdWpUxDCjoYzock85LW+wYyIYLU9uKdoExS+YmT8U+ebA/ AkXBcxPztNDxwkcsIwgGVoTjwxiowqGz3uueWfyEmYgaCPiNOsxkoNQAtjX4ykQE hnqnHWyzJkRwbYo7Vd/bRAZXvszKGYE1YcJmu5QrNf0dK5MSq2o5OYJAEJWbucPj m0R2u7O9qbS2JEnxGrm5+oYJwBzwNY5/Lajr15WkljTqobKnqcvn/Hdgz/XdGtek 0S1QHkkBsF7e+cax8sePWK+O3ilY7pl9CzyZKB/tJngl8A45Jv8xVojg0v3O7oS+ zZib0rwWg/bwR/uN6uPCDcEsQusqL5YovB7m6NRVshwz6cV1zVNp2q+iOulk7KuC MprW4Du9CJf8HA19XtyJIG1XLstnuz+Exy+i5BiimUJ5InoEFDuj/6OZa6Qaczbo SrDDvpGtSf4h7czKpE5kV4uZiTOrjuI30TrI+4csdZ7HQIlboxNL72seNTLJs55F nbLjRM8L6g== =FS7e -----END PGP SIGNATURE----- Merge tag 'for-4.19/post-20180822' of git://git.kernel.dk/linux-block Pull more block updates from Jens Axboe: - Set of bcache fixes and changes (Coly) - The flush warn fix (me) - Small series of BFQ fixes (Paolo) - wbt hang fix (Ming) - blktrace fix (Steven) - blk-mq hardware queue count update fix (Jianchao) - Various little fixes * tag 'for-4.19/post-20180822' of git://git.kernel.dk/linux-block: (31 commits) block/DAC960.c: make some arrays static const, shrinks object size blk-mq: sync the update nr_hw_queues with blk_mq_queue_tag_busy_iter blk-mq: init hctx sched after update ctx and hctx mapping block: remove duplicate initialization tracing/blktrace: Fix to allow setting same value pktcdvd: fix setting of 'ret' error return for a few cases block: change return type to bool block, bfq: return nbytes and not zero from struct cftype .write() method block, bfq: improve code of bfq_bfqq_charge_time block, bfq: reduce write overcharge block, bfq: always update the budget of an entity when needed block, bfq: readd missing reset of parent-entity service blk-wbt: fix IO hang in wbt_wait() block: don't warn for flush on read-only device bcache: add the missing comments for smp_mb()/smp_wmb() bcache: remove unnecessary space before ioctl function pointer arguments bcache: add missing SPDX header bcache: move open brace at end of function definitions to next line bcache: add static const prefix to char * array declarations bcache: fix code comments style ...	2018-08-22 13:38:05 -07:00
John Fastabend	9b2e0388be	bpf: sockmap: write_space events need to be passed to TCP handler When sockmap code is using the stream parser it also handles the write space events in order to handle the case where (a) verdict redirects skb to another socket and (b) the sockmap then sends the skb but due to memory constraints (or other EAGAIN errors) needs to do a retry. But the initial code missed a third case where the skb_send_sock_locked() triggers an sk_wait_event(). A typically case would be when sndbuf size is exceeded. If this happens because we do not pass the write_space event to the lower layers we never wake up the event and it will wait for sndtimeo. Which as noted in ktls fix may be rather large and look like a hang to the user. To reproduce the best test is to reduce the sndbuf size and send 1B data chunks to stress the memory handling. To fix this pass the event from the upper layer to the lower layer. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-08-22 21:58:20 +02:00
Linus Torvalds	cd9b44f907	Merge branch 'akpm' (patches from Andrew) Merge more updates from Andrew Morton: - the rest of MM - procfs updates - various misc things - more y2038 fixes - get_maintainer updates - lib/ updates - checkpatch updates - various epoll updates - autofs updates - hfsplus - some reiserfs work - fatfs updates - signal.c cleanups - ipc/ updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (166 commits) ipc/util.c: update return value of ipc_getref from int to bool ipc/util.c: further variable name cleanups ipc: simplify ipc initialization ipc: get rid of ids->tables_initialized hack lib/rhashtable: guarantee initial hashtable allocation lib/rhashtable: simplify bucket_table_alloc() ipc: drop ipc_lock() ipc/util.c: correct comment in ipc_obtain_object_check ipc: rename ipcctl_pre_down_nolock() ipc/util.c: use ipc_rcu_putref() for failues in ipc_addid() ipc: reorganize initialization of kern_ipc_perm.seq ipc: compute kern_ipc_perm.id under the ipc lock init/Kconfig: remove EXPERT from CHECKPOINT_RESTORE fs/sysv/inode.c: use ktime_get_real_seconds() for superblock stamp adfs: use timespec64 for time conversion kernel/sysctl.c: fix typos in comments drivers/rapidio/devices/rio_mport_cdev.c: remove redundant pointer md fork: don't copy inconsistent signal handler state to child signal: make get_signal() return bool signal: make sigkill_pending() return bool ...	2018-08-22 12:34:08 -07:00
Daniel Borkmann	eb29429d81	bpf, sockmap: fix sock hash count in alloc_sock_hash_elem When we try to allocate a new sock hash entry and the allocation fails, then sock hash map fails to reduce the map element counter, meaning we keep accounting this element although it was never used. Fix it by dropping the element counter on error. Fixes: `8111038444` ("bpf: sockmap, add hash map support") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com>	2018-08-22 20:35:18 +02:00
Daniel Borkmann	b845c898b2	bpf, sockmap: fix sock_hash_alloc and reject zero-sized keys Currently, it is possible to create a sock hash map with key size of 0 and have the kernel return a fd back to user space. This is invalid for hash maps (and kernel also hasn't been tested for zero key size support in general at this point). Thus, reject such configuration. Fixes: `8111038444` ("bpf: sockmap, add hash map support") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Song Liu <songliubraving@fb.com>	2018-08-22 20:34:51 +02:00
Randy Dunlap	5f733e8a2d	kernel/sysctl.c: fix typos in comments Fix a few typos/spellos in kernel/sysctl.c. Link: http://lkml.kernel.org/r/bb09a8b9-f984-6dd4-b07b-3ecaf200862e@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Kees Cook <keescook@chromium.org> Cc: "Luis R. Rodriguez" <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:51 -07:00
Jann Horn	06e62a46bb	fork: don't copy inconsistent signal handler state to child Before this change, if a multithreaded process forks while one of its threads is changing a signal handler using sigaction(), the memcpy() in copy_sighand() can race with the struct assignment in do_sigaction(). It isn't clear whether this can cause corruption of the userspace signal handler pointer, but it definitely can cause inconsistency between different fields of struct sigaction. Take the appropriate spinlock to avoid this. I have tested that this patch prevents inconsistency between sa_sigaction and sa_flags, which is possible before this patch. Link: http://lkml.kernel.org/r/20180702145108.73189-1-jannh@google.com Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Rik van Riel <riel@redhat.com> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:51 -07:00
Christian Brauner	20ab7218d2	signal: make get_signal() return bool make get_signal() already behaves like a boolean function. Let's actually declare it as such too. Link: http://lkml.kernel.org/r/20180602103653.18181-18-christian@brauner.io Signed-off-by: Christian Brauner <christian@brauner.io> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Morris <james.morris@microsoft.com> Cc: Kees Cook <keescook@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:51 -07:00
Christian Brauner	f99e9d8c5c	signal: make sigkill_pending() return bool sigkill_pending() already behaves like a boolean function. Let's actually declare it as such too. Link: http://lkml.kernel.org/r/20180602103653.18181-17-christian@brauner.io Signed-off-by: Christian Brauner <christian@brauner.io> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Morris <james.morris@microsoft.com> Cc: Kees Cook <keescook@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:51 -07:00
Christian Brauner	a19e2c01f7	signal: make legacy_queue() return bool legacy_queue() already behaves like a boolean function. Let's actually declare it as such too. Link: http://lkml.kernel.org/r/20180602103653.18181-16-christian@brauner.io Signed-off-by: Christian Brauner <christian@brauner.io> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Morris <james.morris@microsoft.com> Cc: Kees Cook <keescook@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:51 -07:00
Christian Brauner	acd14e62f0	signal: make wants_signal() return bool wants_signal() already behaves like a boolean function. Let's actually declare it as such too. Link: http://lkml.kernel.org/r/20180602103653.18181-15-christian@brauner.io Signed-off-by: Christian Brauner <christian@brauner.io> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Morris <james.morris@microsoft.com> Cc: Kees Cook <keescook@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:51 -07:00
Christian Brauner	8f11351eee	signal: make flush_sigqueue_mask() void The return value of flush_sigqueue_mask() is never checked anywhere. Link: http://lkml.kernel.org/r/20180602103653.18181-14-christian@brauner.io Signed-off-by: Christian Brauner <christian@brauner.io> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Morris <james.morris@microsoft.com> Cc: Kees Cook <keescook@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:51 -07:00
Christian Brauner	67a48a2447	signal: make unhandled_signal() return bool unhandled_signal() already behaves like a boolean function. Let's actually declare it as such too. All callers treat it as such too. Link: http://lkml.kernel.org/r/20180602103653.18181-13-christian@brauner.io Signed-off-by: Christian Brauner <christian@brauner.io> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Morris <james.morris@microsoft.com> Cc: Kees Cook <keescook@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:51 -07:00
Christian Brauner	09ae854edb	signal: make recalc_sigpending_tsk() return bool recalc_sigpending_tsk() already behaves like a boolean function. Let's actually declare it as such too. Link: http://lkml.kernel.org/r/20180602103653.18181-12-christian@brauner.io Signed-off-by: Christian Brauner <christian@brauner.io> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Morris <james.morris@microsoft.com> Cc: Kees Cook <keescook@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:51 -07:00
Christian Brauner	938696a829	signal: make has_pending_signals() return bool has_pending_signals() already behaves like a boolean function. Let's actually declare it as such too. Link: http://lkml.kernel.org/r/20180602103653.18181-11-christian@brauner.io Signed-off-by: Christian Brauner <christian@brauner.io> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Morris <james.morris@microsoft.com> Cc: Kees Cook <keescook@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:51 -07:00
Christian Brauner	6a0cdcd788	signal: make sig_ignored() return bool sig_ignored() already behaves like a boolean function. Let's actually declare it as such too. Link: http://lkml.kernel.org/r/20180602103653.18181-10-christian@brauner.io Signed-off-by: Christian Brauner <christian@brauner.io> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Morris <james.morris@microsoft.com> Cc: Kees Cook <keescook@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:51 -07:00
Christian Brauner	41aaa48119	signal: make sig_task_ignored() return bool sig_task_ignored() already behaves like a boolean function. Let's actually declare it as such too. Link: http://lkml.kernel.org/r/20180602103653.18181-9-christian@brauner.io Signed-off-by: Christian Brauner <christian@brauner.io> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Morris <james.morris@microsoft.com> Cc: Kees Cook <keescook@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:50 -07:00
Christian Brauner	e4a8b4efbf	signal: make sig_handler_ignored() return bool sig_handler_ignored() already behaves like a boolean function. Let's actually declare it as such too. Link: http://lkml.kernel.org/r/20180602103653.18181-8-christian@brauner.io Signed-off-by: Christian Brauner <christian@brauner.io> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Morris <james.morris@microsoft.com> Cc: Kees Cook <keescook@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:50 -07:00
Christian Brauner	2a9b909409	signal: make kill_ok_by_cred() return bool kill_ok_by_cred() already behaves like a boolean function. Let's actually declare it as such too. Link: http://lkml.kernel.org/r/20180602103653.18181-7-christian@brauner.io Signed-off-by: Christian Brauner <christian@brauner.io> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Morris <james.morris@microsoft.com> Cc: Kees Cook <keescook@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:50 -07:00
Christian Brauner	d8f993b3db	signal: simplify rt_sigaction() The goto is not needed and does not add any clarity. Simply return -EINVAL on unexpected sigset_t struct size directly. Link: http://lkml.kernel.org/r/20180602103653.18181-6-christian@brauner.io Signed-off-by: Christian Brauner <christian@brauner.io> Acked-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Morris <james.morris@microsoft.com> Cc: Kees Cook <keescook@chromium.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:50 -07:00
Christian Brauner	b1d294c803	signal: make do_sigpending() void do_sigpending() returned 0 unconditionally so it doesn't make sense to have it return at all. This allows us to simplify a bunch of syscall callers. Link: http://lkml.kernel.org/r/20180602103653.18181-5-christian@brauner.io Signed-off-by: Christian Brauner <christian@brauner.io> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Morris <james.morris@microsoft.com> Cc: Kees Cook <keescook@chromium.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:50 -07:00
Christian Brauner	6527de9533	signal: make may_ptrace_stop() return bool may_ptrace_stop() already behaves like a boolean function. Let's actually declare it as such too. Link: http://lkml.kernel.org/r/20180602103653.18181-4-christian@brauner.io Signed-off-by: Christian Brauner <christian@brauner.io> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Morris <james.morris@microsoft.com> Cc: Kees Cook <keescook@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:50 -07:00
Christian Brauner	bb17fcca07	signal: make kill_as_cred_perm() return bool kill_as_cred_perm() already behaves like a boolean function. Let's actually declare it as such too. Link: http://lkml.kernel.org/r/20180602103653.18181-3-christian@brauner.io Signed-off-by: Christian Brauner <christian@brauner.io> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Morris <james.morris@microsoft.com> Cc: Kees Cook <keescook@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:50 -07:00
Christian Brauner	52cba1a274	signal: make force_sigsegv() void Patch series "signal: refactor some functions", v3. This series refactors a bunch of functions in signal.c to simplify parts of the code. The greatest single change is declaring the static do_sigpending() helper as void which makes it possible to remove a bunch of unnecessary checks in the syscalls later on. This patch (of 17): force_sigsegv() returned 0 unconditionally so it doesn't make sense to have it return at all. In addition, there are no callers that check force_sigsegv()'s return value. Link: http://lkml.kernel.org/r/20180602103653.18181-2-christian@brauner.io Signed-off-by: Christian Brauner <christian@brauner.io> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Morris <james.morris@microsoft.com> Cc: Kees Cook <keescook@chromium.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:50 -07:00
Christoph Hellwig	e05a8e4d88	sched/wait: assert the wait_queue_head lock is held in __wake_up_common Better ensure we actually hold the lock using lockdep than just commenting on it. Due to the various exported _locked interfaces it is far too easy to get the locking wrong. Link: http://lkml.kernel.org/r/20171214152344.6880-4-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Baron <jbaron@akamai.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:47 -07:00
Ard Biesheuvel	46e0c9be20	kernel: tracepoints: add support for relative references To avoid the need for relocating absolute references to tracepoint structures at boot time when running relocatable kernels (which may take a disproportionate amount of space), add the option to emit these tables as relative references instead. Link: http://lkml.kernel.org/r/20180704083651.24360-7-ard.biesheuvel@linaro.org Acked-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morris <james.morris@microsoft.com> Cc: James Morris <jmorris@namei.org> Cc: Jessica Yu <jeyu@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Nicolas Pitre <nico@linaro.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Russell King <linux@armlinux.org.uk> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Thomas Garnier <thgarnie@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:47 -07:00
Ard Biesheuvel	1b1eeca7e4	init: allow initcall tables to be emitted using relative references Allow the initcall tables to be emitted using relative references that are only half the size on 64-bit architectures and don't require fixups at runtime on relocatable kernels. Link: http://lkml.kernel.org/r/20180704083651.24360-5-ard.biesheuvel@linaro.org Acked-by: James Morris <james.morris@microsoft.com> Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Petr Mladek <pmladek@suse.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morris <jmorris@namei.org> Cc: Jessica Yu <jeyu@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Nicolas Pitre <nico@linaro.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Russell King <linux@armlinux.org.uk> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Garnier <thgarnie@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:47 -07:00
Ard Biesheuvel	7290d58095	module: use relative references for __ksymtab entries An ordinary arm64 defconfig build has ~64 KB worth of __ksymtab entries, each consisting of two 64-bit fields containing absolute references, to the symbol itself and to a char array containing its name, respectively. When we build the same configuration with KASLR enabled, we end up with an additional ~192 KB of relocations in the .init section, i.e., one 24 byte entry for each absolute reference, which all need to be processed at boot time. Given how the struct kernel_symbol that describes each entry is completely local to module.c (except for the references emitted by EXPORT_SYMBOL() itself), we can easily modify it to contain two 32-bit relative references instead. This reduces the size of the __ksymtab section by 50% for all 64-bit architectures, and gets rid of the runtime relocations entirely for architectures implementing KASLR, either via standard PIE linking (arm64) or using custom host tools (x86). Note that the binary search involving __ksymtab contents relies on each section being sorted by symbol name. This is implemented based on the input section names, not the names in the ksymtab entries, so this patch does not interfere with that. Given that the use of place-relative relocations requires support both in the toolchain and in the module loader, we cannot enable this feature for all architectures. So make it dependent on whether CONFIG_HAVE_ARCH_PREL32_RELOCATIONS is defined. Link: http://lkml.kernel.org/r/20180704083651.24360-4-ard.biesheuvel@linaro.org Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Jessica Yu <jeyu@kernel.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Will Deacon <will.deacon@arm.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morris <james.morris@microsoft.com> Cc: James Morris <jmorris@namei.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Nicolas Pitre <nico@linaro.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Russell King <linux@armlinux.org.uk> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Garnier <thgarnie@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:47 -07:00
Dmitry Vyukov	a2e5144538	kernel/hung_task.c: allow to set checking interval separately from timeout Currently task hung checking interval is equal to timeout, as the result hung is detected anywhere between timeout and 2*timeout. This is fine for most interactive environments, but this hurts automated testing setups (syzbot). In an automated setup we need to strictly order CPU lockup < RCU stall < workqueue lockup < task hung < silent loss, so that RCU stall is not detected as task hung and task hung is not detected as silent machine loss. The large variance in task hung detection timeout requires setting silent machine loss timeout to a very large value (e.g. if task hung is 3 mins, then silent loss need to be set to ~7 mins). The additional 3 minutes significantly reduce testing efficiency because usually we crash kernel within a minute, and this can add hours to bug localization process as it needs to do dozens of tests. Allow setting checking interval separately from timeout. This allows to set timeout to, say, 3 minutes, but checking interval to 10 secs. The interval is controlled via a new hung_task_check_interval_secs sysctl, similar to the existing hung_task_timeout_secs sysctl. The default value of 0 results in the current behavior: checking interval is equal to timeout. [akpm@linux-foundation.org: update hung_task_timeout_max's comment] Link: http://lkml.kernel.org/r/20180611111004.203513-1-dvyukov@google.com Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:47 -07:00
Arnd Bergmann	91bc9aaf74	kernel/crash_core.c: print timestamp using time64_t The get_seconds() call returns a 32-bit timestamp on some architectures, and will overflow in the future. The newer ktime_get_real_seconds() always returns a 64-bit timestamp that does not suffer from this problem. Link: http://lkml.kernel.org/r/20180618150329.941903-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Petr Tesarik <ptesarik@suse.cz> Cc: Marc-Andr Lureau <marcandre.lureau@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:47 -07:00
Anna-Maria Gleixner	ce0a568d32	userns: use irqsave variant of refcount_dec_and_lock() The irqsave variant of refcount_dec_and_lock handles irqsave/restore when taking/releasing the spin lock. With this variant the call of local_irq_save/restore is no longer required. [bigeasy@linutronix.de: s@atomic_dec_and_lock@refcount_dec_and_lock@g] Link: http://lkml.kernel.org/r/20180703200141.28415-7-bigeasy@linutronix.de Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:47 -07:00
Sebastian Andrzej Siewior	fc37191272	userns: use refcount_t for reference counting instead atomic_t refcount_t type and corresponding API should be used instead of atomic_t wh en the variable is used as a reference counter. This avoids accidental refcounter overflows that might lead to use-after-free situations. Link: http://lkml.kernel.org/r/20180703200141.28415-6-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Suggested-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:46 -07:00
Omar Sandoval	23c85094fe	proc/kcore: add vmcoreinfo note to /proc/kcore The vmcoreinfo information is useful for runtime debugging tools, not just for crash dumps. A lot of this information can be determined by other means, but this is much more convenient, and it only adds a page at most to the file. Link: http://lkml.kernel.org/r/fddbcd08eed76344863303878b12de1c1e2a04b6.1531953780.git.osandov@fb.com Signed-off-by: Omar Sandoval <osandov@fb.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Bhupesh Sharma <bhsharma@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:46 -07:00
Omar Sandoval	eff4345e7f	crash_core: use VMCOREINFO_SYMBOL_ARRAY() for swapper_pg_dir This is preparation for allowing CRASH_CORE to be enabled for any architecture. swapper_pg_dir is always either an array or a macro expanding to NULL. In the latter case, VMCOREINFO_SYMBOL() won't work, as it tries to take the address of the given symbol: #define VMCOREINFO_SYMBOL(name) \ vmcoreinfo_append_str("SYMBOL(%s)=%lx\n", #name, (unsigned long)&name) Instead, use VMCOREINFO_SYMBOL_ARRAY(), which uses the value: #define VMCOREINFO_SYMBOL_ARRAY(name) \ vmcoreinfo_append_str("SYMBOL(%s)=%lx\n", #name, (unsigned long)name) This is the same thing for the array case but isn't an error for the macro case. Link: http://lkml.kernel.org/r/c05f9781ec204f40fc96f95086e7b6de6a3eb2c3.1532563124.git.osandov@fb.com Signed-off-by: Omar Sandoval <osandov@fb.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Bhupesh Sharma <bhsharma@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:46 -07:00
Andrew Morton	a670468f5e	mm: zero out the vma in vma_init() Rather than in vm_area_alloc(). To ensure that the various oddball stack-based vmas are in a good state. Some of the callers were zeroing them out, others were not. Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:44 -07:00
Johannes Berg	87915adc3f	workqueue: re-add lockdep dependencies for flushing In flush_work(), we need to create a lockdep dependency so that the following scenario is appropriately tagged as a problem: work_function() { mutex_lock(&mutex); ... } other_function() { mutex_lock(&mutex); flush_work(&work); // or cancel_work_sync(&work); } This is a problem since the work might be running and be blocked on trying to acquire the mutex. Similarly, in flush_workqueue(). These were removed after cross-release partially caught these problems, but now cross-release was reverted anyway. IMHO the removal was erroneous anyway though, since lockdep should be able to catch potential problems, not just actual ones, and cross-release would only have caught the problem when actually invoking wait_for_completion(). Fixes: `fd1a5b04df` ("workqueue: Remove now redundant lock acquisitions wrt. workqueue flushes") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2018-08-22 08:31:38 -07:00
Johannes Berg	d6e89786be	workqueue: skip lockdep wq dependency in cancel_work_sync() In cancel_work_sync(), we can only have one of two cases, even with an ordered workqueue: * the work isn't running, just cancelled before it started * the work is running, but then nothing else can be on the workqueue before it Thus, we need to skip the lockdep workqueue dependency handling, otherwise we get false positive reports from lockdep saying that we have a potential deadlock when the workqueue also has other work items with locking, e.g. work1_function() { mutex_lock(&mutex); ... } work2_function() { /* nothing / } other_function() { queue_work(ordered_wq, &work1); queue_work(ordered_wq, &work2); mutex_lock(&mutex); cancel_work_sync(&work2); } As described above, this isn't a problem, but lockdep will currently flag it as if cancel_work_sync() was flush_work(), which is* a problem. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2018-08-22 08:31:37 -07:00
Linus Torvalds	dfec4a8478	More power management updates for 4.19-rc1 - Make the idle loop handle stopped scheduler tick correctly (Rafael Wysocki). - Prevent the menu cpuidle governor from letting CPUs spend too much time in shallow idle states when it is invoked with scheduler tick stopped and clean it up somewhat (Rafael Wysocki). - Avoid invoking the platform firmware to make the platform enter the ACPI S3 sleep state with suspended PCIe root ports which may confuse the firmware and cause it to crash (Rafael Wysocki). - Fix sysfs-related race in the ondemand and conservative cpufreq governors which may cause the system to crash if the governor module is removed during an update of CPU frequency limits (Henry Willard). - Select SRCU when building the system wakeup framework to avoid a build issue in it (zhangyi). - Make the descriptions of ACPI C-states vendor-neutral to avoid confusion (Prarit Bhargava). -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJbfSnVAAoJEILEb/54YlRxBn4QAKQ8PqkSYkBby+1hb90ET4dk VaLkbCYXuzLK5rIDvnbYOALhVKo4B29Ex5GdCLN7cWkZMkrVKe7oX8QQTnp3/7lF URjTKgTNec5uJG652PrE3ESAa3X/kYggj6aeQOxDR4iYKzcpJEQ92ekFW+SoJTNp Jc2kZh3qkC2On64GB3ibsZaKnmHfPvLg0t4agwzuYq/Gff8NRJFk7kMwAPzqGzZo b2UVRcYFWIRkJjgmU9iInoeHIY8mBdT3IiKwTemZP1dOhb5T1AHOXwGTk6/cS+RH A9qx4eg7I3R00KmnYvO8WytYJeOu2qb83GIUx4fIJGOqfvevm5xkxB9F+nfE+ouj ROBqO4+X4XfQGPw8slayg0rJjI9JSkXLnLdl0Qw2WRlbc4/fVWntra1C57EeKFBR EG9UAF9+7nUUx0bOCLsfFF3+r9R3SDUjk7b4thyhYncyQRsYC+FL7ztlxnMzVtzW M5SF2sPrpcQzqmcszdUdbESI10n5X8m/crJW4rsbTxBpAM+coO+uLcvHWOY4MpkW BgBsR6bMDAlG/VlTFgeeP/tkCRd5zNlJi7yBFItXuOoVKXpnHCJuxq2WkZ1Rb74M Gk1d3TduekHJm8VsLEdCJR/tEk1cMc0zVUD/a1yzI4Z21QxvXUCqMDdws4/Ey184 qmKgNR9R94vSC5xIPRhM =9GrU -----END PGP SIGNATURE----- Merge tag 'pm-4.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management updates from Rafael Wysocki: "These fix the main idle loop and the menu cpuidle governor, clean up the latter, fix a mistake in the PCI bus type's support for system suspend and resume, fix the ondemand and conservative cpufreq governors, address a build issue in the system wakeup framework and make the ACPI C-states desciptions less confusing. Specifics: - Make the idle loop handle stopped scheduler tick correctly (Rafael Wysocki). - Prevent the menu cpuidle governor from letting CPUs spend too much time in shallow idle states when it is invoked with scheduler tick stopped and clean it up somewhat (Rafael Wysocki). - Avoid invoking the platform firmware to make the platform enter the ACPI S3 sleep state with suspended PCIe root ports which may confuse the firmware and cause it to crash (Rafael Wysocki). - Fix sysfs-related race in the ondemand and conservative cpufreq governors which may cause the system to crash if the governor module is removed during an update of CPU frequency limits (Henry Willard). - Select SRCU when building the system wakeup framework to avoid a build issue in it (zhangyi). - Make the descriptions of ACPI C-states vendor-neutral to avoid confusion (Prarit Bhargava)" * tag 'pm-4.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpuidle: menu: Handle stopped tick more aggressively sched: idle: Avoid retaining the tick when it has been stopped PCI / ACPI / PM: Resume all bridges on suspend-to-RAM cpuidle: menu: Update stale polling override comment cpufreq: governor: Avoid accessing invalid governor_data x86/ACPI/cstate: Make APCI C1 FFH MWAIT C-state description vendor-neutral cpuidle: menu: Fix white space PM / sleep: wakeup: Fix build error caused by missing SRCU support	2018-08-22 07:42:36 -07:00
Linus Torvalds	0214f46b3a	Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull core signal handling updates from Eric Biederman: "It was observed that a periodic timer in combination with a sufficiently expensive fork could prevent fork from every completing. This contains the changes to remove the need for that restart. This set of changes is split into several parts: - The first part makes PIDTYPE_TGID a proper pid type instead something only for very special cases. The part starts using PIDTYPE_TGID enough so that in __send_signal where signals are actually delivered we know if the signal is being sent to a a group of processes or just a single process. - With that prep work out of the way the logic in fork is modified so that fork logically makes signals received while it is running appear to be received after the fork completes" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (22 commits) signal: Don't send signals to tasks that don't exist signal: Don't restart fork when signals come in. fork: Have new threads join on-going signal group stops fork: Skip setting TIF_SIGPENDING in ptrace_init_task signal: Add calculate_sigpending() fork: Unconditionally exit if a fatal signal is pending fork: Move and describe why the code examines PIDNS_ADDING signal: Push pid type down into complete_signal. signal: Push pid type down into __send_signal signal: Push pid type down into send_signal signal: Pass pid type into do_send_sig_info signal: Pass pid type into send_sigio_to_task & send_sigurg_to_task signal: Pass pid type into group_send_sig_info signal: Pass pid and pid type into send_sigqueue posix-timers: Noralize good_sigevent signal: Use PIDTYPE_TGID to clearly store where file signals will be sent pid: Implement PIDTYPE_TGID pids: Move the pgrp and session pid pointers from task_struct to signal_struct kvm: Don't open code task_pid in kvm_vcpu_ioctl pids: Compute task_tgid using signal->leader_pid ...	2018-08-21 13:47:29 -07:00
Rafael J. Wysocki	01ac7c4c2e	Merge branches 'pm-cpufreq', 'pm-pci' and 'pm-sleep' Merge fixes for the ondemand and conservative cpufreq governors, PCI power management and system wakeup framework. * pm-cpufreq: cpufreq: governor: Avoid accessing invalid governor_data * pm-pci: PCI / ACPI / PM: Resume all bridges on suspend-to-RAM * pm-sleep: PM / sleep: wakeup: Fix build error caused by missing SRCU support	2018-08-21 22:39:24 +02:00
Masami Hiramatsu	9161a864ff	tracing/kprobes: Fix to check notrace function with correct range Fix within_notrace_func() to check notrace function correctly. Since the ftrace_location_range(start, end) function checks the range inclusively (start <= ftrace-loc <= end), the end address must not include the entry address of next function. However, within_notrace_func() uses kallsyms_lookup_size_offset() to get the function size and calculate the end address from adding the size to the entry address. This means the end address is the entry address of the next function. In the result, within_notrace_func() fails to find notrace function if the next function of the target function is ftraced. Let's subtract 1 from the end address so that ftrace_location_range() can check it correctly. Link: http://lkml.kernel.org/r/153485669706.16611.17726752296213785504.stgit@devbox Fixes: commit `45408c4f92` ("tracing: kprobes: Prohibit probing on notrace function") Reported-by: Michael Rodin <michael@rodin.online> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-08-21 09:41:12 -04:00
Masami Hiramatsu	6b7dca401c	tracing: Allow gcov profiling on only ftrace subsystem Add GCOV_PROFILE_FTRACE to allow gcov profiling on only files in ftrace subsystem. This config option will be used for checking kselftest/ftrace coverage. Link: http://lkml.kernel.org/r/153483647755.32472.4746349899604275441.stgit@devbox Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-08-21 09:11:49 -04:00
Linus Torvalds	7140ad3898	Updates for v4.19: - Restructure of lockdep and latency tracers This is the biggest change. Joel Fernandes restructured the hooks from irqs and preemption disabling and enabling. He got rid of a lot of the preprocessor #ifdef mess that they caused. He turned both lockdep and the latency tracers to use trace events inserted in the preempt/irqs disabling paths. But unfortunately, these started to cause issues in corner cases. Thus, parts of the code was reverted back to where lockde and the latency tracers just get called directly (without using the trace events). But because the original change cleaned up the code very nicely we kept that, as well as the trace events for preempt and irqs disabling, but they are limited to not being called in NMIs. - Have trace events use SRCU for "rcu idle" calls. This was required for the preempt/irqs off trace events. But it also had to not allow them to be called in NMI context. Waiting till Paul makes an NMI safe SRCU API. - New notrace SRCU API to allow trace events to use SRCU. - Addition of mcount-nop option support - SPDX headers replacing GPL templates. - Various other fixes and clean ups. - Some fixes are marked for stable, but were not fully tested before the merge window opened. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCW3ruhRQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qiM7AP47NhYdSnCFCRUJfrt6PovXmQtuCHt3 c3QMoGGdvzh9YAEAqcSXwh7uLhpHUp1LjMAPkXdZVwNddf4zJQ1zyxQ+EAU= =vgEr -----END PGP SIGNATURE----- Merge tag 'trace-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: - Restructure of lockdep and latency tracers This is the biggest change. Joel Fernandes restructured the hooks from irqs and preemption disabling and enabling. He got rid of a lot of the preprocessor #ifdef mess that they caused. He turned both lockdep and the latency tracers to use trace events inserted in the preempt/irqs disabling paths. But unfortunately, these started to cause issues in corner cases. Thus, parts of the code was reverted back to where lockdep and the latency tracers just get called directly (without using the trace events). But because the original change cleaned up the code very nicely we kept that, as well as the trace events for preempt and irqs disabling, but they are limited to not being called in NMIs. - Have trace events use SRCU for "rcu idle" calls. This was required for the preempt/irqs off trace events. But it also had to not allow them to be called in NMI context. Waiting till Paul makes an NMI safe SRCU API. - New notrace SRCU API to allow trace events to use SRCU. - Addition of mcount-nop option support - SPDX headers replacing GPL templates. - Various other fixes and clean ups. - Some fixes are marked for stable, but were not fully tested before the merge window opened. * tag 'trace-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (44 commits) tracing: Fix SPDX format headers to use C++ style comments tracing: Add SPDX License format tags to tracing files tracing: Add SPDX License format to bpf_trace.c blktrace: Add SPDX License format header s390/ftrace: Add -mfentry and -mnop-mcount support tracing: Add -mcount-nop option support tracing: Avoid calling cc-option -mrecord-mcount for every Makefile tracing: Handle CC_FLAGS_FTRACE more accurately Uprobe: Additional argument arch_uprobe to uprobe_write_opcode() Uprobes: Simplify uprobe_register() body tracepoints: Free early tracepoints after RCU is initialized uprobes: Use synchronize_rcu() not synchronize_sched() tracing: Fix synchronizing to event changes with tracepoint_synchronize_unregister() ftrace: Remove unused pointer ftrace_swapper_pid tracing: More reverting of "tracing: Centralize preemptirq tracepoints and unify their usage" tracing/irqsoff: Handle preempt_count for different configs tracing: Partial revert of "tracing: Centralize preemptirq tracepoints and unify their usage" tracing: irqsoff: Account for additional preempt_disable trace: Use rcu_dereference_raw for hooks from trace-event subsystem tracing/kprobes: Fix within_notrace_func() to check only notrace functions ...	2018-08-20 18:32:00 -07:00
Linus Torvalds	3933ec73cd	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching Pull livepatching updates from Jiri Kosina: "Code cleanups from Kamalesh Babulal" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: livepatch: Validate module/old func name length livepatch: Remove reliable stacktrace check in klp_try_switch_task()	2018-08-20 16:10:47 -07:00
Jiri Kosina	badf58a272	Merge branch 'for-4.19/upstream' into for-linus	2018-08-20 18:33:50 +02:00
Gustavo A. R. Silva	b639186ffe	futex: Mark expected switch fall-throughs In preparation of enabling -Wimplicit-fallthrough, mark switch cases which fall through. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <dvhart@infradead.org> Link: https://lkml.kernel.org/r/20180816172124.GA2407@embeddedor.com	2018-08-20 18:23:00 +02:00
Rafael J. Wysocki	7059b36636	sched: idle: Avoid retaining the tick when it has been stopped If the tick has been stopped already, but the governor has not asked to stop it (which it can do sometimes), the idle loop should invoke tick_nohz_idle_stop_tick(), to let tick_nohz_stop_tick() take care of this case properly. Fixes: `554c8aa8ec` (sched: idle: Select idle state before stopping the tick) Cc: 4.17+ <stable@vger.kernel.org> # 4.17+ Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2018-08-20 11:25:55 +02:00
Linus Torvalds	2ad0d52699	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Fix races in IPVS, from Tan Hu. 2) Missing unbind in matchall classifier, from Hangbin Liu. 3) Missing act_ife action release, from Vlad Buslov. 4) Cure lockdep splats in ila, from Cong Wang. 5) veth queue leak on link delete, from Toshiaki Makita. 6) Disable isdn's IIOCDBGVAR ioctl, it exposes kernel addresses. From Kees Cook. 7) RCU usage fixup in XDP, from Tariq Toukan. 8) Two TCP ULP fixes from Daniel Borkmann. 9) r8169 needs REALTEK_PHY as a Kconfig dependency, from Heiner Kallweit. 10) Always take tcf_lock with BH disabled, otherwise we can deadlock with rate estimator code paths. From Vlad Buslov. 11) Don't use MSI-X on RTL8106e r8169 chips, they don't resume properly. From Jian-Hong Pan. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits) ip6_vti: fix creating fallback tunnel device for vti6 ip_vti: fix a null pointer deferrence when create vti fallback tunnel r8169: don't use MSI-X on RTL8106e net: lan743x_ptp: convert to ktime_get_clocktai_ts64 net: sched: always disable bh when taking tcf_lock ip6_vti: simplify stats handling in vti6_xmit bpf: fix redirect to map under tail calls r8169: add missing Kconfig dependency tools/bpf: fix bpf selftest test_cgroup_storage failure bpf, sockmap: fix sock_map_ctx_update_elem race with exist/noexist bpf, sockmap: fix map elem deletion race with smap_stop_sock bpf, sockmap: fix leakage of smap_psock_map_entry tcp, ulp: fix leftover icsk_ulp_ops preventing sock from reattach tcp, ulp: add alias for all ulp modules bpf: fix a rcu usage warning in bpf_prog_array_copy_core() samples/bpf: all XDP samples should unload xdp/bpf prog on SIGTERM net/xdp: Fix suspicious RCU usage warning net/mlx5e: Delete unneeded function argument Documentation: networking: ti-cpsw: correct cbs parameters for Eth1 100Mb isdn: Disable IIOCDBGVAR ...	2018-08-19 11:51:45 -07:00
David S. Miller	6e3bf9b04f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2018-08-18 The following pull-request contains BPF updates for your net tree. The main changes are: 1) Fix a BPF selftest failure in test_cgroup_storage due to rlimit restrictions, from Yonghong. 2) Fix a suspicious RCU rcu_dereference_check() warning triggered from removing a device's XDP memory allocator by using the correct rhashtable lookup function, from Tariq. 3) A batch of BPF sockmap and ULP fixes mainly fixing leaks and races as well as enforcing module aliases for ULPs. Another fix for BPF map redirect to make them work again with tail calls, from Daniel. 4) Fix XDP BPF samples to unload their programs upon SIGTERM, from Jesper. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-08-18 10:02:49 -07:00
Ingo Molnar	5804b11034	perf/core improvements ad fixes: kernel: . kallsyms, x86: Export addresses of PTI entry trampolines (Alexander Shishkin) . kallsyms: Simplify update_iter_mod() (Adrian Hunter) . x86: Add entry trampolines to kcore (Adrian Hunter) Hardware tracing: . Fix auxtrace queue resize (Adrian Hunter) Arch specific: . Fix uninitialized ARM SPE record error variable (Kim Phillips) . Fix trace event post-processing in powerpc (Sandipan Das) Build: . Fix check-headers.sh AND list path of execution (Alexander Kapshuk) . Remove -mcet and -fcf-protection when building the python binding with older clang versions (Arnaldo Carvalho de Melo) . Make check-headers.sh check based on kernel dir (Jiri Olsa) . Move syscall_64.tbl check into check-headers.sh (Jiri Olsa) Infrastructure: . Check for null when copying nsinfo. (Benno Evers) Libraries: . Rename libtraceevent prefixes, prep work for making it a shared library generaly available (Tzvetomir Stoyanov (VMware)) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEELb9bqkb7Te0zijNb1lAW81NSqkAFAlt0OP8ACgkQ1lAW81NS qkD0lg/8Dcjd6bCgHzrRJYcR4VUNgHq4LnpJEp3VVvtV29AmptVW3hOCF6siuwDI H8rtMzE0gflMVHae310qTaUIzo1A6/ugoRwxUKLKU8aWkA1ikl9iJn6uaTttOCIG H4a/mExILDicGfxMk6kAPdyDbYr7r+1UoF58asrWVjPQNPxoJSALJCPtMnLK7Cn8 qMZN68TqIL5zifbRe6UHKCH/SmDuowVmEIz4Nin3QtwKFPH+I02TtSdYkNWTC5WK o469/zy9cceA6a8Q+bVEUP0OD1mU8BvRnZogOiZ5SdMiYZlDkFSqG5MzgTtJUXQC DxOKkANdWu7zON/KywGDX8kcIBySzd5toTiXLvsHNNhxR8pT3bU57QvrscqLIMX+ SVbCR03h5EwLeJopvDKZjwtcSwCWp1aCXCrmLP04tcB1zi+mUIRohbzf2vZDPfwo IRtoHvPAgV9UAA7M+UAAtNc7G0Gg/K2c6LlfiuxDhgjk9Jqg4NbOz3cn6rvXtGQX B4RM9pdhoNi9tqrXNYCnLzinzKtPhWjyEbb0FBlgvXTNQNiDVjhrkt3pC1fH1zK9 GX1F6L78x24z3bZl5hUyYXmxOkktpAeprjICXAikYvwwige6aNENVLhCI/fVFHcx ByxXbFMom/dSIgU0tBfdqktd7ZmSLm94obSuA6BW8/htf2JIztM= =W1go -----END PGP SIGNATURE----- Merge tag 'perf-core-for-mingo-4.19-20180815' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: kernel: - kallsyms, x86: Export addresses of PTI entry trampolines (Alexander Shishkin) - kallsyms: Simplify update_iter_mod() (Adrian Hunter) - x86: Add entry trampolines to kcore (Adrian Hunter) Hardware tracing: - Fix auxtrace queue resize (Adrian Hunter) Arch specific: - Fix uninitialized ARM SPE record error variable (Kim Phillips) - Fix trace event post-processing in powerpc (Sandipan Das) Build: - Fix check-headers.sh AND list path of execution (Alexander Kapshuk) - Remove -mcet and -fcf-protection when building the python binding with older clang versions (Arnaldo Carvalho de Melo) - Make check-headers.sh check based on kernel dir (Jiri Olsa) - Move syscall_64.tbl check into check-headers.sh (Jiri Olsa) Infrastructure: - Check for null when copying nsinfo. (Benno Evers) Libraries: - Rename libtraceevent prefixes, prep work for making it a shared library generaly available (Tzvetomir Stoyanov (VMware)) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-08-18 13:11:51 +02:00
Linus Torvalds	6ada4e2826	Merge branch 'akpm' (patches from Andrew) Merge updates from Andrew Morton: - a few misc things - a few Y2038 fixes - ntfs fixes - arch/sh tweaks - ocfs2 updates - most of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (111 commits) mm/hmm.c: remove unused variables align_start and align_end fs/userfaultfd.c: remove redundant pointer uwq mm, vmacache: hash addresses based on pmd mm/list_lru: introduce list_lru_shrink_walk_irq() mm/list_lru.c: pass struct list_lru_node* as an argument to __list_lru_walk_one() mm/list_lru.c: move locking from __list_lru_walk_one() to its caller mm/list_lru.c: use list_lru_walk_one() in list_lru_walk_node() mm, swap: make CONFIG_THP_SWAP depend on CONFIG_SWAP mm/sparse: delete old sparse_init and enable new one mm/sparse: add new sparse_init_nid() and sparse_init() mm/sparse: move buffer init/fini to the common place mm/sparse: use the new sparse buffer functions in non-vmemmap mm/sparse: abstract sparse buffer allocations mm/hugetlb.c: don't zero 1GiB bootmem pages mm, page_alloc: double zone's batchsize mm/oom_kill.c: document oom_lock mm/hugetlb: remove gigantic page support for HIGHMEM mm, oom: remove sleep from under oom_lock kernel/dma: remove unsupported gfp_mask parameter from dma_alloc_from_contiguous() mm/cma: remove unsupported gfp_mask parameter from cma_alloc() ...	2018-08-17 16:49:31 -07:00
Marek Szyprowski	d834c5ab83	kernel/dma: remove unsupported gfp_mask parameter from dma_alloc_from_contiguous() The CMA memory allocator doesn't support standard gfp flags for memory allocation, so there is no point having it as a parameter for dma_alloc_from_contiguous() function. Replace it by a boolean no_warn argument, which covers all the underlaying cma_alloc() function supports. This will help to avoid giving false feeling that this function supports standard gfp flags and callers can pass __GFP_ZERO to get zeroed buffer, what has already been an issue: see commit `dd65a941f6` ("arm64: dma-mapping: clear buffers allocated with FORCE_CONTIGUOUS flag"). Link: http://lkml.kernel.org/r/20180709122020eucas1p21a71b092975cb4a3b9954ffc63f699d1~-sqUFoa-h2939329393eucas1p2Y@eucas1p2.samsung.com Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Michał Nazarewicz <mina86@mina86.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Laura Abbott <labbott@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-17 16:20:32 -07:00
Marek Szyprowski	6518202970	mm/cma: remove unsupported gfp_mask parameter from cma_alloc() cma_alloc() doesn't really support gfp flags other than __GFP_NOWARN, so convert gfp_mask parameter to boolean no_warn parameter. This will help to avoid giving false feeling that this function supports standard gfp flags and callers can pass __GFP_ZERO to get zeroed buffer, what has already been an issue: see commit `dd65a941f6` ("arm64: dma-mapping: clear buffers allocated with FORCE_CONTIGUOUS flag"). Link: http://lkml.kernel.org/r/20180709122019eucas1p2340da484acfcc932537e6014f4fd2c29~-sqTPJKij2939229392eucas1p2j@eucas1p2.samsung.com Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Michał Nazarewicz <mina86@mina86.com> Acked-by: Laura Abbott <labbott@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-17 16:20:32 -07:00
Andrey Ryabinin	0207df4fa1	kernel/memremap, kasan: make ZONE_DEVICE with work with KASAN KASAN learns about hotadded memory via the memory hotplug notifier. devm_memremap_pages() intentionally skips calling memory hotplug notifiers. So KASAN doesn't know anything about new memory added by devm_memremap_pages(). This causes a crash when KASAN tries to access non-existent shadow memory: BUG: unable to handle kernel paging request at ffffed0078000000 RIP: 0010:check_memory_region+0x82/0x1e0 Call Trace: memcpy+0x1f/0x50 pmem_do_bvec+0x163/0x720 pmem_make_request+0x305/0xac0 generic_make_request+0x54f/0xcf0 submit_bio+0x9c/0x370 submit_bh_wbc+0x4c7/0x700 block_read_full_page+0x5ef/0x870 do_read_cache_page+0x2b8/0xb30 read_dev_sector+0xbd/0x3f0 read_lba.isra.0+0x277/0x670 efi_partition+0x41a/0x18f0 check_partition+0x30d/0x5e9 rescan_partitions+0x18c/0x840 __blkdev_get+0x859/0x1060 blkdev_get+0x23f/0x810 __device_add_disk+0x9c8/0xde0 pmem_attach_disk+0x9a8/0xf50 nvdimm_bus_probe+0xf3/0x3c0 driver_probe_device+0x493/0xbd0 bus_for_each_drv+0x118/0x1b0 __device_attach+0x1cd/0x2b0 bus_probe_device+0x1ac/0x260 device_add+0x90d/0x1380 nd_async_device_register+0xe/0x50 async_run_entry_fn+0xc3/0x5d0 process_one_work+0xa0a/0x1810 worker_thread+0x87/0xe80 kthread+0x2d7/0x390 ret_from_fork+0x3a/0x50 Add kasan_add_zero_shadow()/kasan_remove_zero_shadow() - post mm_init() interface to map/unmap kasan_zero_page at requested virtual addresses. And use it to add/remove the shadow memory for hotplugged/unplugged device memory. Link: http://lkml.kernel.org/r/20180629164932.740-1-aryabinin@virtuozzo.com Fixes: `41e94a8513` ("add devm_memremap_pages") Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reported-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Dan Williams <dan.j.williams@intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-17 16:20:30 -07:00
Shakeel Butt	d46eb14b73	fs: fsnotify: account fsnotify metadata to kmemcg Patch series "Directed kmem charging", v8. The Linux kernel's memory cgroup allows limiting the memory usage of the jobs running on the system to provide isolation between the jobs. All the kernel memory allocated in the context of the job and marked with __GFP_ACCOUNT will also be included in the memory usage and be limited by the job's limit. The kernel memory can only be charged to the memcg of the process in whose context kernel memory was allocated. However there are cases where the allocated kernel memory should be charged to the memcg different from the current processes's memcg. This patch series contains two such concrete use-cases i.e. fsnotify and buffer_head. The fsnotify event objects can consume a lot of system memory for large or unlimited queues if there is either no or slow listener. The events are allocated in the context of the event producer. However they should be charged to the event consumer. Similarly the buffer_head objects can be allocated in a memcg different from the memcg of the page for which buffer_head objects are being allocated. To solve this issue, this patch series introduces mechanism to charge kernel memory to a given memcg. In case of fsnotify events, the memcg of the consumer can be used for charging and for buffer_head, the memcg of the page can be charged. For directed charging, the caller can use the scope API memalloc_[un]use_memcg() to specify the memcg to charge for all the __GFP_ACCOUNT allocations within the scope. This patch (of 2): A lot of memory can be consumed by the events generated for the huge or unlimited queues if there is either no or slow listener. This can cause system level memory pressure or OOMs. So, it's better to account the fsnotify kmem caches to the memcg of the listener. However the listener can be in a different memcg than the memcg of the producer and these allocations happen in the context of the event producer. This patch introduces remote memcg charging API which the producer can use to charge the allocations to the memcg of the listener. There are seven fsnotify kmem caches and among them allocations from dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and inotify_inode_mark_cachep happens in the context of syscall from the listener. So, SLAB_ACCOUNT is enough for these caches. The objects from fsnotify_mark_connector_cachep are not accounted as they are small compared to the notification mark or events and it is unclear whom to account connector to since it is shared by all events attached to the inode. The allocations from the event caches happen in the context of the event producer. For such caches we will need to remote charge the allocations to the listener's memcg. Thus we save the memcg reference in the fsnotify_group structure of the listener. This patch has also moved the members of fsnotify_group to keep the size same, at least for 64 bit build, even with additional member by filling the holes. [shakeelb@google.com: use GFP_KERNEL_ACCOUNT rather than open-coding it] Link: http://lkml.kernel.org/r/20180702215439.211597-1-shakeelb@google.com Link: http://lkml.kernel.org/r/20180627191250.209150-2-shakeelb@google.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Greg Thelen <gthelen@google.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Roman Gushchin <guro@fb.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-17 16:20:30 -07:00
Daniel Borkmann	f6069b9aa9	bpf: fix redirect to map under tail calls Commits `109980b894` ("bpf: don't select potentially stale ri->map from buggy xdp progs") and `7c30013133` ("bpf: fix ri->map_owner pointer on bpf_prog_realloc") tried to mitigate that buggy programs using bpf_redirect_map() helper call do not leave stale maps behind. Idea was to add a map_owner cookie into the per CPU struct redirect_info which was set to prog->aux by the prog making the helper call as a proof that the map is not stale since the prog is implicitly holding a reference to it. This owner cookie could later on get compared with the program calling into BPF whether they match and therefore the redirect could proceed with processing the map safely. In (obvious) hindsight, this approach breaks down when tail calls are involved since the original caller's prog->aux pointer does not have to match the one from one of the progs out of the tail call chain, and therefore the xdp buffer will be dropped instead of redirected. A way around that would be to fix the issue differently (which also allows to remove related work in fast path at the same time): once the life-time of a redirect map has come to its end we use it's map free callback where we need to wait on synchronize_rcu() for current outstanding xdp buffers and remove such a map pointer from the redirect info if found to be present. At that time no program is using this map anymore so we simply invalidate the map pointers to NULL iff they previously pointed to that instance while making sure that the redirect path only reads out the map once. Fixes: `97f91a7cf0` ("bpf: add bpf_redirect_map helper routine") Fixes: `109980b894` ("bpf: don't select potentially stale ri->map from buggy xdp progs") Reported-by: Sebastiano Miano <sebastiano.miano@polito.it> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-08-17 15:56:23 -07:00
Linus Torvalds	d190775206	Modules updates for v4.19 Summary of modules changes for the 4.19 merge window: - Fix modules kallsyms for livepatch. Livepatch modules can have SHN_UNDEF symbols in their module symbol tables for later symbol resolution, but kallsyms shouldn't be returning these symbols - Some code cleanups and minor reshuffling in load_module() were done to log the module name when module signature verification fails Signed-off-by: Jessica Yu <jeyu@kernel.org> -----BEGIN PGP SIGNATURE----- iQIcBAABCgAGBQJbdmqCAAoJEMBFfjjOO8FyR+AP/3nbWaCUYmxvae9QPt1ycXxm UV0TnRYJbrEZVlUpL1X/efd63jVizmhJSMPTGumje4s3nhSNGMbEYa+/VN0mcja3 egKqKGD/zjUrOZxwu/zxlV4lVd6Wt8mbO+pFnc/0MlmtWVnkWxRXsf/k8Uo5bzSE l6aGjQJjF1Yj5hwgVoLlL0L+6mw5Usu1tKzvEvx5IZO9IvzenwifZ1N8JowSp7G2 APd0TjxDTuZlR430JVONKVpzFCWtMqUzMxHQZl26uKxyrLkJ32GarQKTUxStOAXg LTZ1Nrjn4UToRZLx4VUmeaXBlW/dhnzF17WPwrN1AXFgcXdR6wlw0otDSJdZVh8X FM8mCANZt9Zi/HAmBTMypnnCGqyoY+rHz1TWcsNDgknwBb0lg+3Kx781fl1Z1v0g RHV8JYRw7hCt/WDfPuyrhDO5wmdqAPF3A1WlJ7ItNvLB4tP0jILCoOP5csSp8+hs q/7o1A66KoGgOCWJ4NnZ/uPadr1ahTH0CYcq4dziQWiEuzOU3XdgHJ2V+XBgs36v UkuJiqM9NyBs90+r9+8YBCkC9gCv8Rs/43aTN9sVoDoihmX57339jgVcKfLIqSNw djmJHsUW7eFU0I6m5c8bNt3VDgL4iIJp4gAwu/uuDV7nFnbaJQK9cENH8IJ23Wto y0CYhBExs26z6axi4BCv =HzLj -----END PGP SIGNATURE----- Merge tag 'modules-for-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux Pull modules updates from Jessica Yu: "Summary of modules changes for the 4.19 merge window: - Fix modules kallsyms for livepatch. Livepatch modules can have SHN_UNDEF symbols in their module symbol tables for later symbol resolution, but kallsyms shouldn't be returning these symbols - Some code cleanups and minor reshuffling in load_module() were done to log the module name when module signature verification fails" * tag 'modules-for-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: kernel/module: Use kmemdup to replace kmalloc+memcpy ARM: module: fix modsign build error modsign: log module name in the event of an error module: replace VMLINUX_SYMBOL_STR() with __stringify() or string literal module: print sensible error code module: setup load info before module_sig_check() module: make it clear when we're handling the module copy in info->hdr module: exclude SHN_UNDEF symbols from kallsyms api	2018-08-17 10:51:22 -07:00
Linus Torvalds	2645b9d1a4	\n -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAlt2mBQACgkQnJ2qBz9k QNntGQgAluTTnuJLjoUDjFfT37Fjf2x1ve8rg6xmYS3YIhYTWWA1oazUIeyBDfwa soutlfAZ/ix2bP1UEmeULxFhrCIXYBbWAe8s5MRqO/7s01QftNf0M72ASmd7gZRy rSVt2/BWpr745mWI38tEKlIF4sQJVD7IGrnc1cQslPzleeCqsCXA+uBkBPMlcDpJ ZWni2qK023y9E2dsg6RsJc1HemkQvrJtoLSVqRsdhty9GEuWseMbssdgz1zMXljQ eXIALE5BssoxISIpH6qVKZRlr7UWGxOmV4CDPmku7DFLOSiwMk/Ml0V80BwzjNNY hY8qfxcJOFOGZ8t82pWkVGMjgOAKjA== =IN6Y -----END PGP SIGNATURE----- Merge tag 'fsnotify_for_v4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "fsnotify cleanups from Amir and a small inotify improvement" * tag 'fsnotify_for_v4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: inotify: Add flag IN_MASK_CREATE for inotify_add_watch() fanotify: factor out helpers to add/remove mark fsnotify: add helper to get mask from connector fsnotify: let connector point to an abstract object fsnotify: pass connp and object type to fsnotify_add_mark() fsnotify: use typedef fsnotify_connp_t for brevity	2018-08-17 09:41:28 -07:00
Steven Rostedt (VMware)	bb730b5833	tracing: Fix SPDX format headers to use C++ style comments The Linux kernel adopted the SPDX License format headers to ease license compliance management, and uses the C++ '//' style comments for the SPDX header tags. Some files in the tracing directory used the C style /* / comments for them. To be consistent across all files, replace the / */ C style SPDX tags with the C++ // SPDX tags. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-08-16 19:08:06 -04:00
Steven Rostedt (VMware)	bcea3f96e1	tracing: Add SPDX License format tags to tracing files Add the SPDX License header to ease license compliance management. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-08-16 19:08:06 -04:00
Steven Rostedt (VMware)	179a0cc4e0	tracing: Add SPDX License format to bpf_trace.c Add the SPDX License header to ease license compliance management. Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-08-16 19:07:36 -04:00
Daniel Borkmann	585f5a6252	bpf, sockmap: fix sock_map_ctx_update_elem race with exist/noexist The current code in sock_map_ctx_update_elem() allows for BPF_EXIST and BPF_NOEXIST map update flags. While on array-like maps this approach is rather uncommon, e.g. bpf_fd_array_map_update_elem() and others enforce map update flags to be BPF_ANY such that xchg() can be used directly, the current implementation in sock map does not guarantee that such operation with BPF_EXIST / BPF_NOEXIST is atomic. The initial test does a READ_ONCE(stab->sock_map[i]) to fetch the socket from the slot which is then tested for NULL / non-NULL. However later after __sock_map_ctx_update_elem(), the actual update is done through osock = xchg(&stab->sock_map[i], sock). Problem is that in the meantime a different CPU could have updated / deleted a socket on that specific slot and thus flag contraints won't hold anymore. I've been thinking whether best would be to just break UAPI and do an enforcement of BPF_ANY to check if someone actually complains, however trouble is that already in BPF kselftest we use BPF_NOEXIST for the map update, and therefore it might have been copied into applications already. The fix to keep the current behavior intact would be to add a map lock similar to the sock hash bucket lock only for covering the whole map. Fixes: `174a79ff95` ("bpf: sockmap with sk redirect support") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-08-16 14:58:08 -07:00
Daniel Borkmann	166ab6f0a0	bpf, sockmap: fix map elem deletion race with smap_stop_sock The smap_start_sock() and smap_stop_sock() are each protected under the sock->sk_callback_lock from their call-sites except in the case of sock_map_delete_elem() where we drop the old socket from the map slot. This is racy because the same sock could be part of multiple sock maps, so we run smap_stop_sock() in parallel, and given at that point psock->strp_enabled might be true on both CPUs, we might for example wrongly restore the sk->sk_data_ready / sk->sk_write_space. Therefore, hold the sock->sk_callback_lock as well on delete. Looks like `2f857d0460` ("bpf: sockmap, remove STRPARSER map_flags and add multi-map support") had this right, but later on `e9db4ef6bf` ("bpf: sockhash fix omitted bucket lock in sock_close") removed it again from delete leaving this smap_stop_sock() instance unprotected. Fixes: `e9db4ef6bf` ("bpf: sockhash fix omitted bucket lock in sock_close") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-08-16 14:58:08 -07:00
Daniel Borkmann	d40b0116c9	bpf, sockmap: fix leakage of smap_psock_map_entry While working on sockmap I noticed that we do not always kfree the struct smap_psock_map_entry list elements which track psocks attached to maps. In the case of sock_hash_ctx_update_elem(), these map entries are allocated outside of __sock_map_ctx_update_elem() with their linkage to the socket hash table filled. In the case of sock array, the map entries are allocated inside of __sock_map_ctx_update_elem() and added with their linkage to the psock->maps. Both additions are under psock->maps_lock each. Now, we drop these elements from their psock->maps list in a few occasions: i) in sock array via smap_list_map_remove() when an entry is either deleted from the map from user space, or updated via user space or BPF program where we drop the old socket at that map slot, or the sock array is freed via sock_map_free() and drops all its elements; ii) for sock hash via smap_list_hash_remove() in exactly the same occasions as just described for sock array; iii) in the bpf_tcp_close() where we remove the elements from the list via psock_map_pop() and iterate over them dropping themselves from either sock array or sock hash; and last but not least iv) once again in smap_gc_work() which is a callback for deferring the work once the psock refcount hit zero and thus the socket is being destroyed. Problem is that the only case where we kfree() the list entry is in case iv), which at that point should have an empty list in normal cases. So in cases from i) to iii) we unlink the elements without freeing where they go out of reach from us. Hence fix is to properly kfree() them as well to stop the leakage. Given these are all handled under psock->maps_lock there is no need for deferred RCU freeing. I later also ran with kmemleak detector and it confirmed the finding as well where in the state before the fix the object goes unreferenced while after the patch no kmemleak report related to BPF showed up. [...] unreferenced object 0xffff880378eadae0 (size 64): comm "test_sockmap", pid 2225, jiffies 4294720701 (age 43.504s) hex dump (first 32 bytes): 00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de ................ 50 4d 75 5d 03 88 ff ff 00 00 00 00 00 00 00 00 PMu]............ backtrace: [<000000005225ac3c>] sock_map_ctx_update_elem.isra.21+0xd8/0x210 [<0000000045dd6d3c>] bpf_sock_map_update+0x29/0x60 [<00000000877723aa>] ___bpf_prog_run+0x1e1f/0x4960 [<000000002ef89e83>] 0xffffffffffffffff unreferenced object 0xffff880378ead240 (size 64): comm "test_sockmap", pid 2225, jiffies 4294720701 (age 43.504s) hex dump (first 32 bytes): 00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de ................ 00 44 75 5d 03 88 ff ff 00 00 00 00 00 00 00 00 .Du]............ backtrace: [<000000005225ac3c>] sock_map_ctx_update_elem.isra.21+0xd8/0x210 [<0000000030e37a3a>] sock_map_update_elem+0x125/0x240 [<000000002e5ce36e>] map_update_elem+0x4eb/0x7b0 [<00000000db453cc9>] __x64_sys_bpf+0x1f9/0x360 [<0000000000763660>] do_syscall_64+0x9a/0x300 [<00000000422a2bb2>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [<000000002ef89e83>] 0xffffffffffffffff [...] Fixes: `e9db4ef6bf` ("bpf: sockhash fix omitted bucket lock in sock_close") Fixes: `54fedb42c6` ("bpf: sockmap, fix smap_list_map_remove when psock is in many maps") Fixes: `2f857d0460` ("bpf: sockmap, remove STRPARSER map_flags and add multi-map support") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-08-16 14:58:08 -07:00
Steven Rostedt (VMware)	757d914007	tracing/blktrace: Fix to allow setting same value Masami Hiramatsu reported: Current trace-enable attribute in sysfs returns an error if user writes the same setting value as current one, e.g. # cat /sys/block/sda/trace/enable 0 # echo 0 > /sys/block/sda/trace/enable bash: echo: write error: Invalid argument # echo 1 > /sys/block/sda/trace/enable # echo 1 > /sys/block/sda/trace/enable bash: echo: write error: Device or resource busy But this is not a preferred behavior, it should ignore if new setting is same as current one. This fixes the problem as below. # cat /sys/block/sda/trace/enable 0 # echo 0 > /sys/block/sda/trace/enable # echo 1 > /sys/block/sda/trace/enable # echo 1 > /sys/block/sda/trace/enable Link: http://lkml.kernel.org/r/20180816103802.08678002@gandalf.local.home Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Cc: stable@vger.kernel.org Fixes: `cd649b8bb8` ("blktrace: remove sysfs_blk_trace_enable_show/store()") Reported-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-08-16 14:13:56 -06:00
Yonghong Song	965931e3a8	bpf: fix a rcu usage warning in bpf_prog_array_copy_core() Commit `394e40a297` ("bpf: extend bpf_prog_array to store pointers to the cgroup storage") refactored the bpf_prog_array_copy_core() to accommodate new structure bpf_prog_array_item which contains bpf_prog array itself. In the old code, we had perf_event_query_prog_array(): mutex_lock(...) bpf_prog_array_copy_call(): prog = rcu_dereference_check(array, 1)->progs bpf_prog_array_copy_core(prog, ...) mutex_unlock(...) With the above commit, we had perf_event_query_prog_array(): mutex_lock(...) bpf_prog_array_copy_call(): bpf_prog_array_copy_core(array, ...): item = rcu_dereference(array)->items; ... mutex_unlock(...) The new code will trigger a lockdep rcu checking warning. The fix is to change rcu_dereference() to rcu_dereference_check() to prevent such a warning. Reported-by: syzbot+6e72317008eef84a216b@syzkaller.appspotmail.com Fixes: `394e40a297` ("bpf: extend bpf_prog_array to store pointers to the cgroup storage") Cc: Roman Gushchin <guro@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Roman Gushchin <guro@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-08-16 21:55:32 +02:00
Steven Rostedt (VMware)	91c1e6ba39	blktrace: Add SPDX License format header Add the SPDX License header to ease license compliance management. Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-08-16 15:49:02 -04:00
Vasily Gorbik	2f4df0017b	tracing: Add -mcount-nop option support -mcount-nop gcc option generates the calls to the profiling functions as nops which allows to avoid patching mcount jump with NOP instructions initially. -mcount-nop gcc option will be activated if platform selects HAVE_NOP_MCOUNT and gcc actually supports it. In addition to that CC_USING_NOP_MCOUNT is defined and could be used by architectures to adapt ftrace patching behavior. Link: http://lkml.kernel.org/r/patch-3.thread-aa7b8d.git-e02ed2dc082b.your-ad-here.call-01533557518-ext-9465@work.hours Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-08-15 22:38:38 -04:00
Linus Torvalds	54dbe75bbf	drm pull for 4.19-rc1 -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJbc41pAAoJEAx081l5xIa+ZrAP/AzKj4i4pBLVJcvNZ2BwD+UD ZNSNj2iqCJ5+Jo/WtIwQ8tLct9UqfVssUwBke6tZksiLdTigGPTUyVIAdK+9kyWD D00m3x/pToJrSF2D0FwxQlPUtPkohp9N+E6+TU7gd1oCasZfBzmcEpoVAmZf+NWE kN1xXpmGxZWpu0wc7JA2lv9MuUTijCwIqJqa5E0bB3z06G5mw+PJ89kYzMx19OyA ZYQK8y3A40ZGl8UbajZ4xg9pqFCRYFFHGqfYlpUWWTh0XMAXu8+Yqzh3dJxmak7r 4u2pdQBsxPMZO8qKBHpVvI7Zhoe0Ntnolc0XVD+2IbqqnTprVbQs0bWf3YyfUlQi 1/9bWFK67W0LEuzac6M7a7EQqFNiHF13Btao7aqENTIe/GaCZJoopaiRMAmh6EHD 4PezeYqrW8cSaPj6OKouL1BhW9Bjixsg0bvjS/uB6m4KekFCt1++BDFGzkqvm6Mo SVW7nkJoCFpCASaR7DhUEOPexaHeJ65HCDDUvYdqz9jd2w1TgvvanEZWual1NwEm ImA8A4wGZ/3KijpyyKm0gE96RX7+zMMZ3brW6p1vhUUKVYJCrvSr5jrXH5+2k6Aw Y455doGL87IRkwyje/YbQF0I8pbUZD9QS5wII13tLGwOH9/uC/Xl6dHNM40gtqyh W4gEdY+NAMJmYLvRNawa =g9rD -----END PGP SIGNATURE----- Merge tag 'drm-next-2018-08-15' of git://anongit.freedesktop.org/drm/drm Pull drm updates from Dave Airlie: "This is the main drm pull request for 4.19. Rob has some new hardware support for new qualcomm hw that I'll send along separately. This has the display part of it, the remaining pull is for the acceleration engine. This also contains a wound-wait/wait-die mutex rework, Peter has acked it for merging via my tree. Otherwise mostly the usual level of activity. Summary: core: - Wound-wait/wait-die mutex rework - Add writeback connector type - Add "content type" property for HDMI - Move GEM bo to drm_framebuffer - Initial gpu scheduler documentation - GPU scheduler fixes for dying processes - Console deferred fbcon takeover support - Displayport support for CEC tunneling over AUX panel: - otm8009a panel driver fixes - Innolux TV123WAM and G070Y2-L01 panel driver - Ilitek ILI9881c panel driver - Rocktech RK070ER9427 LCD - EDT ETM0700G0EDH6 and EDT ETM0700G0BDH6 - DLC DLC0700YZG-1 - BOE HV070WSA-100 - newhaven, nhd-4.3-480272ef-atxl LCD - DataImage SCF0700C48GGU18 - Sharp LQ035Q7DB03 - p079zca: Refactor to support multiple panels tinydrm: - ILI9341 display panel New driver: - vkms - virtual kms driver to testing. i915: - Icelake: Display enablement DSI support IRQ support Powerwell support - GPU reset fixes and improvements - Full ppgtt support refactoring - PSR fixes and improvements - Execlist improvments - GuC related fixes amdgpu: - Initial amdgpu documentation - JPEG engine support on VCN - CIK uses powerplay by default - Move to using core PCIE functionality for gens/lanes - DC/Powerplay interface rework - Stutter mode support for RV - Vega12 Powerplay updates - GFXOFF fixes - GPUVM fault debugging - Vega12 GFXOFF - DC improvements - DC i2c/aux changes - UVD 7.2 fixes - Powerplay fixes for Polaris12, CZ/ST - command submission bo_list fixes amdkfd: - Raven support - Power management fixes udl: - Cleanups and fixes nouveau: - misc fixes and cleanups. msm: - DPU1 support display controller in sdm845 - GPU coredump support. vmwgfx: - Atomic modesetting validation fixes - Support for multisample surfaces armada: - Atomic modesetting support completed. exynos: - IPPv2 fixes - Move g2d to component framework - Suspend/resume support cleanups - Driver cleanups imx: - CSI configuration improvements - Driver cleanups - Use atomic suspend/resume helpers - ipu-v3 V4L2 XRGB32/XBGR32 support pl111: - Add Nomadik LCDC variant v3d: - GPU scheduler jobs management sun4i: - R40 display engine support - TCON TOP driver mediatek: - MT2712 SoC support rockchip: - vop fixes omapdrm: - Workaround for DRA7 errata i932 - Fix mm_list locking mali-dp: - Writeback implementation PM improvements - Internal error reporting debugfs tilcdc: - Single fix for deferred probing hdlcd: - Teardown fixes tda998x: - Converted to a bridge driver. etnaviv: - Misc fixes" * tag 'drm-next-2018-08-15' of git://anongit.freedesktop.org/drm/drm: (1506 commits) drm/amdgpu/sriov: give 8s for recover vram under RUNTIME drm/scheduler: fix param documentation drm/i2c: tda998x: correct PLL divider calculation drm/i2c: tda998x: get rid of private fill_modes function drm/i2c: tda998x: move mode_valid() to bridge drm/i2c: tda998x: register bridge outside of component helper drm/i2c: tda998x: cleanup from previous changes drm/i2c: tda998x: allocate tda998x_priv inside tda998x_create() drm/i2c: tda998x: convert to bridge driver drm/scheduler: fix timeout worker setup for out of order job completions drm/amd/display: display connected to dp-1 does not light up drm/amd/display: update clk for various HDMI color depths drm/amd/display: program display clock on cache match drm/amd/display: Add NULL check for enabling dp ss drm/amd/display: add vbios table check for enabling dp ss drm/amd/display: Don't share clk source between DP and HDMI drm/amd/display: Fix DP HBR2 Eye Diagram Pattern on Carrizo drm/amd/display: Use calculated disp_clk_khz value for dce110 drm/amd/display: Implement custom degamma lut on dcn drm/amd/display: Destroy aux_engines only once ...	2018-08-15 17:39:07 -07:00
Linus Torvalds	9a76aba02a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: "Highlights: - Gustavo A. R. Silva keeps working on the implicit switch fallthru changes. - Support 802.11ax High-Efficiency wireless in cfg80211 et al, From Luca Coelho. - Re-enable ASPM in r8169, from Kai-Heng Feng. - Add virtual XFRM interfaces, which avoids all of the limitations of existing IPSEC tunnels. From Steffen Klassert. - Convert GRO over to use a hash table, so that when we have many flows active we don't traverse a long list during accumluation. - Many new self tests for routing, TC, tunnels, etc. Too many contributors to mention them all, but I'm really happy to keep seeing this stuff. - Hardware timestamping support for dpaa_eth/fsl-fman from Yangbo Lu. - Lots of cleanups and fixes in L2TP code from Guillaume Nault. - Add IPSEC offload support to netdevsim, from Shannon Nelson. - Add support for slotting with non-uniform distribution to netem packet scheduler, from Yousuk Seung. - Add UDP GSO support to mlx5e, from Boris Pismenny. - Support offloading of Team LAG in NFP, from John Hurley. - Allow to configure TX queue selection based upon RX queue, from Amritha Nambiar. - Support ethtool ring size configuration in aquantia, from Anton Mikaev. - Support DSCP and flowlabel per-transport in SCTP, from Xin Long. - Support list based batching and stack traversal of SKBs, this is very exciting work. From Edward Cree. - Busyloop optimizations in vhost_net, from Toshiaki Makita. - Introduce the ETF qdisc, which allows time based transmissions. IGB can offload this in hardware. From Vinicius Costa Gomes. - Add parameter support to devlink, from Moshe Shemesh. - Several multiplication and division optimizations for BPF JIT in nfp driver, from Jiong Wang. - Lots of prepatory work to make more of the packet scheduler layer lockless, when possible, from Vlad Buslov. - Add ACK filter and NAT awareness to sch_cake packet scheduler, from Toke Høiland-Jørgensen. - Support regions and region snapshots in devlink, from Alex Vesker. - Allow to attach XDP programs to both HW and SW at the same time on a given device, with initial support in nfp. From Jakub Kicinski. - Add TLS RX offload and support in mlx5, from Ilya Lesokhin. - Use PHYLIB in r8169 driver, from Heiner Kallweit. - All sorts of changes to support Spectrum 2 in mlxsw driver, from Ido Schimmel. - PTP support in mv88e6xxx DSA driver, from Andrew Lunn. - Make TCP_USER_TIMEOUT socket option more accurate, from Jon Maxwell. - Support for templates in packet scheduler classifier, from Jiri Pirko. - IPV6 support in RDS, from Ka-Cheong Poon. - Native tproxy support in nf_tables, from Máté Eckl. - Maintain IP fragment queue in an rbtree, but optimize properly for in-order frags. From Peter Oskolkov. - Improvde handling of ACKs on hole repairs, from Yuchung Cheng" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1996 commits) bpf: test: fix spelling mistake "REUSEEPORT" -> "REUSEPORT" hv/netvsc: Fix NULL dereference at single queue mode fallback net: filter: mark expected switch fall-through xen-netfront: fix warn message as irq device name has '/' cxgb4: Add new T5 PCI device ids 0x50af and 0x50b0 net: dsa: mv88e6xxx: missing unlock on error path rds: fix building with IPV6=m inet/connection_sock: prefer _THIS_IP_ to current_text_addr net: dsa: mv88e6xxx: bitwise vs logical bug net: sock_diag: Fix spectre v1 gadget in __sock_diag_cmd() ieee802154: hwsim: using right kind of iteration net: hns3: Add vlan filter setting by ethtool command -K net: hns3: Set tx ring' tc info when netdev is up net: hns3: Remove tx ring BD len register in hns3_enet net: hns3: Fix desc num set to default when setting channel net: hns3: Fix for phy link issue when using marvell phy driver net: hns3: Fix for information of phydev lost problem when down/up net: hns3: Fix for command format parsing error in hclge_is_all_function_id_zero net: hns3: Add support for serdes loopback selftest bnxt_en: take coredump_record structure off stack ...	2018-08-15 15:04:25 -07:00
Linus Torvalds	fa1b5d09d0	Consolidation of Kconfig files by Christoph Hellwig. Move the source statements of arch-independent Kconfig files instead of duplicating the includes in every arch/$(SRCARCH)/Kconfig. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJbdFsfAAoJED2LAQed4NsGxHsP/1tmA57OOOj8oGxO2OXhXVbr Q0MZqCoV4bqMvK/hgCQdl9f+tp0m+j12x4xDLdVf4OqnTXMbqvPDu3uQVKvaj/k1 gHhsFA1tFgSbuJ8InltUsrPEQqbceeJsj50xHVAKijqI6LYeRPPSU7aE9obn+OzH n2nd5sLKvMI/dqdJvW6i5KPydqTH3r3iA7D+ne/XQj0s0EMXvXUPmDT1+ijTnM4a yfm6W5p7L/c3Ugf1Pz5PfnPl4BxBwZMfW5ie/UO8j5C6Rl0iPaOGuuHurocaaJb3 MefR/7NEAR3G8MhJyL2+70jbbwhjpqR2b5ooz1vpuulPHxjeU45BY60XIBWq1afR ewsc12MMCYB695ieYWoHdaWgxD/jhffyRuajfpkXKIZEMgDxS03sMhdULXENVMx1 M0ZQ01g/NLWt9ti9DY3eTKB3ymOhnBa1sa77nGGUHkITq4DQKwPX1J9FP/HT6RNt uOvzeH5kGzc7tqOlZAO0kHbwhQG1uqGcd78IYd4lgf/XfkSgDERTWjnJmnQbwr9m 3PFuST2u8eyO+8Lh1MK76TXOEkXsHMdFugPmb6SlgtMEPKGVLDPlsj52o/LFtgzl eygfMiBFr2+ttkZ6IpNcpmQ4IztmDpz6XoMk3PqDAfUTUSYpCnq1gAEuff/eisCM Odva1ZZaeQ7WpxhsP8rr =gsQJ -----END PGP SIGNATURE----- Merge tag 'kconfig-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kconfig consolidation from Masahiro Yamada: "Consolidation of Kconfig files by Christoph Hellwig. Move the source statements of arch-independent Kconfig files instead of duplicating the includes in every arch/$(SRCARCH)/Kconfig" * tag 'kconfig-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kconfig: add a Memory Management options" menu kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt kconfig: use a menu in arch/Kconfig to reduce clutter kconfig: include kernel/Kconfig.preempt from init/Kconfig Kconfig: consolidate the "Kernel hacking" menu kconfig: include common Kconfig files from top-level Kconfig kconfig: remove duplicate SWAP symbol defintions um: create a proper drivers Kconfig um: cleanup Kconfig files um: stop abusing KBUILD_KCONFIG	2018-08-15 13:05:12 -07:00
Linus Torvalds	e026bcc561	Kbuild updates for v4.19 - verify depmod is installed before modules_install - support build salt in case build ids must be unique between builds - allow users to specify additional host compiler flags via HOSTFLAGS, and rename internal variables to KBUILD_HOSTFLAGS - update buildtar script to drop vax support, add arm64 support - update builddeb script for better debarch support - document the pit-fall of if_changed usage - fix parallel build of UML with O= option - make 'samples' target depend on headers_install to fix build errors - remove deprecated host-progs variable - add a new coccinelle script for refcount_t vs atomic_t check - improve double-test coccinelle script - misc cleanups and fixes -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJbdFZ0AAoJED2LAQed4NsGcHYP/23txxk3GRP7O4UkfPw9Rtky MHiXTgcoy2vbG+l12BgzWX+qFii8XTUe3dQtK4HnGQFUIBtEBV/hpZPJtxfgGSev Zou5cv1kr5rNzTkCn//TG3O6/WIkTBCe2hahDCtmGDI3kd/cPK4dHbU/q6KpaqIJ qzZYBXIvCeu2GM8idQoCRrwdMpgu1pBz1gz2sDje1yHH2toI7T6cXHRLQDgx+HPq LIP7W9GUsoDdXjecvPD51LiW89E6BUxETBh5Ft9r9uzwB5ylQQMcw6Qyu2DiYDUX PPsHCMiolYV+Ttcy+vj/67KOvKmEaFotssck+RD/xDCF17zKhRkup+YM8kPLHTVZ TcAUZadbnT6U/s2W6GFwvVbN/P7cc3aif+aNCC/Pl23yagp3pydlSCocYxQgiVR7 /rx48haYDEgu/MJ1X0dOpSO0ErY7zu2OoAlNerW+D9QizwbP+WtZO/CJH8SxQRuN dQ1xmyNrie+ODgi9tbc4eBrsb+1rioX927TP5MbJcfXt5CTsxDmIqop5XwyYIoQN ZWWlzC8Ii3P2trAVpBgM2IEbngSxwr6T9Wbf1ScJnPKr/o1rq+pBk49cYstTz3kQ OwJ8gPwUrkW4R+hlD7L6mL/WcrKzZBQS0Ij1QW2kVSEhRrsKo99psE1/rGehnHu9 KGB0LYYCqGSOHR4zOjg0 =VjfG -----END PGP SIGNATURE----- Merge tag 'kbuild-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - verify depmod is installed before modules_install - support build salt in case build ids must be unique between builds - allow users to specify additional host compiler flags via HOSTFLAGS, and rename internal variables to KBUILD_HOSTFLAGS - update buildtar script to drop vax support, add arm64 support - update builddeb script for better debarch support - document the pit-fall of if_changed usage - fix parallel build of UML with O= option - make 'samples' target depend on headers_install to fix build errors - remove deprecated host-progs variable - add a new coccinelle script for refcount_t vs atomic_t check - improve double-test coccinelle script - misc cleanups and fixes * tag 'kbuild-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (41 commits) coccicheck: return proper error code on fail Coccinelle: doubletest: reduce side effect false positives kbuild: remove deprecated host-progs variable kbuild: make samples really depend on headers_install um: clean up archheaders recipe kbuild: add %asm-generic to no-dot-config-targets um: fix parallel building with O= option scripts: Add Python 3 support to tracing/draw_functrace.py builddeb: Add automatic support for sh{3,4}{,eb} architectures builddeb: Add automatic support for riscv* architectures builddeb: Add automatic support for m68k architecture builddeb: Add automatic support for or1k architecture builddeb: Add automatic support for sparc64 architecture builddeb: Add automatic support for mips{,64}r6{,el} architectures builddeb: Add automatic support for mips64el architecture builddeb: Add automatic support for ppc64 and powerpcspe architectures builddeb: Introduce functions to simplify kconfig tests in set_debarch builddeb: Drop check for 32-bit s390 builddeb: Change architecture detection fallback to use dpkg-architecture builddeb: Skip architecture detection when KBUILD_DEBARCH is set ...	2018-08-15 12:09:03 -07:00
Linus Torvalds	b125d90388	Printk changes for 4.19 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABAgAGBQJbct5hAAoJEFKgDEdIgJTysFoP/jRoozkNpUr4fqcbN+1GCSjY /vaWMJ6cQBHm9b5AE+EYwomTqx5gkaaAw8oR3tnHvBa2NGsE4xzO2LUyuH95FgMo cE74MGS492xeCskEBHk0pbrdD+DOuIIqWlQcDP0mH0OhlBiuW/+j72HhDfOU5yTv uE/X+SyxnYgWW9rKGrkmHqVwJPiEgjtr1tsfsewWVF4MNndCAFYeOgpFZBfhBdnq g57IRjTHLnkAClFrOzUpGIZ/tvN6iyENug860EORYFzSO2SVZ3Jm7v8AcXLQ6lj0 LKmBAoFWeJ95rqJS+rBq3s0hWQLtbVJgQyz2ujN5+a+oZ/+brtQmTHjD9Nri0cSf cKUmSfMX4tqdMcGjc7B2RqAtdthst2kYy6RXlp4BzYcdoxDr4U45VTqS8gZQuPSt cdLUgoUEEX7nt1zlQER6GDzWc3FVKDXSHbDM+4OykFF3k1JCgoDq0kemAmsZw2A6 2QGs7RqBKYidR7DB+beZj2YV0ihRkykwb5c+NUASJV5mIB8TjjmOEw52Q0Fc9Hl1 WbA0wm/+pNG7ur9hF0VZ7zy9Rfxgx2Ec5ekWGTgt8xog3B1Dsqix5RpGRDX/f1u8 /dJxHD70pzADwdXXF5hrdCRtCXsTd/27SnWREYhMhfqwu/awc3boOKonOITCYmvv 7eiffcIaUbfiKsIhgrf+ =TTFG -----END PGP SIGNATURE----- Merge tag 'printk-for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull printk updates from Petr Mladek: - Different vendors have a different expectation about a console quietness. Make it configurable to reduce bike-shedding about the upstream default - Decide about the message visibility when the message is stored. It avoids races caused by a delayed console handling - Always store printk() messages into the per-CPU buffers again in NMI. The only exception is when flushing trace log in panic(). There the risk of loosing messages is worth an eventual reordering - Handle invalid %pO printf modifiers correctly - Better handle %p printf modifier tests before crng is initialized - Some clean up * tag 'printk-for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: lib/vsprintf: Do not handle %pO[^F] as %px printk: Fix warning about unused suppress_message_printing printk/nmi: Prevent deadlock when accessing the main log buffer in NMI printk: Create helper function to queue deferred console handling printk: Split the code for storing a message into the log buffer printk: Clean up syslog_print_all() printk: Remove unnecessary kmalloc() from syslog during clear printk: Make CONSOLE_LOGLEVEL_QUIET configurable printk: make sure to print log on console. lib/test_printf.c: accept "ptrval" as valid result for plain 'p' tests	2018-08-15 11:18:53 -07:00
Linus Torvalds	8c32685030	audit/stable-4.18 PR 20180814 -----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEEcQCq365ubpQNLgrWVeRaWujKfIoFAltzOI8UHHBhdWxAcGF1 bC1tb29yZS5jb20ACgkQVeRaWujKfIqvOw/9F48G0G87ETN2M1O3fFLpANd2GtPH cYtg6SG8b3nZzkllUBOVZjoj7y0OyTgi+ksm5XSmgcR4npELDHWRhXaCma1s8Hxw Zpk2SWrEmwdudec9zZ8sBtvd6t63P9TkVSk3S4z+1HgH6uzPg6lMes4w/cQIdn0e uEIiuNkOOHXxWLxgphxPJ/XffQCfV6ltm9j41Z2S6RZHe/0UsJSKCwD+LwW+jLe3 0e9oUPWVnxZlhpJoctWpgtaYmG8rtqxfXruFwg2mR5d2VBs966h3FCz9b9qhtA+R HPy/KagJuUFRL+6ZlpX1dtkkNi07LKjsx3OJYBNHNKmYDA5yJF3X/mMV4uMkroFz q8k88Wi1dYWKJvLhAGALRGSbiYljLgDFJH31tN2FWHb93+l3CXLhjmwIyUxrtiW1 DfUJYM8/ZzMjghpb6YoHi1Dq+Z1cG07S1Eo0pTmtxVT2g43eXSL4OvxFhJKkCFv9 n1J/fo6JVw9i+4DqWkk+8NFGXYYZFlfrGl14szTp+8Q9Gw6OlWWcnF/o3+KCtPez 5z5+E64pAHhLa9/gaHu62eK4lIpiXv3IE8fA7OvFxrUpSJbtYc0RnGv8eKhO8J6v Q9TQKJGbbOLwP+mF/8sPexsREDaU3fB68dTj/FF5AHLH8KFUgAQYcm0bIzpNDBSd w+REW+TTiht8W6E= =tnPz -----END PGP SIGNATURE----- Merge tag 'audit-pr-20180814' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit patches from Paul Moore: "Twelve audit patches for v4.19 and they run the full gamut from fixes to features. Notable changes include the ability to use the "exe" audit filter field in a wider variety of filter types, a fix for our comparison of GID/EGID in audit filter rules, better association of related audit records (connecting related audit records together into one audit event), and a fix for a potential use-after-free in audit_add_watch(). All the patches pass the audit-testsuite and merge cleanly on your current master branch" * tag 'audit-pr-20180814' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: fix use-after-free in audit_add_watch audit: use ktime_get_coarse_real_ts64() for timestamps audit: use ktime_get_coarse_ts64() for time access audit: simplify audit_enabled check in audit_watch_log_rule_change() audit: check audit_enabled in audit_tree_log_remove_rule() cred: conditionally declare groups-related functions audit: eliminate audit_enabled magic number comparison audit: rename FILTER_TYPE to FILTER_EXCLUDE audit: Fix extended comparison of GID/EGID audit: tie ANOM_ABEND records to syscall audit: tie SECCOMP records to syscall audit: allow other filter list types for AUDIT_EXE	2018-08-15 10:46:54 -07:00
Linus Torvalds	92d4a03674	Merge branch 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: - kstrdup() return value fix from Eric Biggers - Add new security_load_data hook to differentiate security checking of kernel-loaded binaries in the case of there being no associated file descriptor, from Mimi Zohar. - Add ability to IMA to specify a policy at build-time, rather than just via command line params or by loading a custom policy, from Mimi. - Allow IMA and LSMs to prevent sysfs firmware load fallback (e.g. if using signed firmware), from Mimi. - Allow IMA to deny loading of kexec kernel images, as they cannot be measured by IMA, from Mimi. * 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: security: check for kstrdup() failure in lsm_append() security: export security_kernel_load_data function ima: based on policy warn about loading firmware (pre-allocated buffer) module: replace the existing LSM hook in init_module ima: add build time policy ima: based on policy require signed firmware (sysfs fallback) firmware: add call to LSM hook before firmware sysfs fallback ima: based on policy require signed kexec kernel images kexec: add call to LSM hook in original kexec_load syscall security: define new LSM hook named security_kernel_load_data MAINTAINERS: remove the outdated "LINUX SECURITY MODULE (LSM) FRAMEWORK" entry	2018-08-15 10:25:26 -07:00
Linus Torvalds	1202f4fdbc	arm64 updates for 4.19 A bunch of good stuff in here: - Wire up support for qspinlock, replacing our trusty ticket lock code - Add an IPI to flush_icache_range() to ensure that stale instructions fetched into the pipeline are discarded along with the I-cache lines - Support for the GCC "stackleak" plugin - Support for restartable sequences, plus an arm64 port for the selftest - Kexec/kdump support on systems booting with ACPI - Rewrite of our syscall entry code in C, which allows us to zero the GPRs on entry from userspace - Support for chained PMU counters, allowing 64-bit event counters to be constructed on current CPUs - Ensure scheduler topology information is kept up-to-date with CPU hotplug events - Re-enable support for huge vmalloc/IO mappings now that the core code has the correct hooks to use break-before-make sequences - Miscellaneous, non-critical fixes and cleanups -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABCgAGBQJbbV41AAoJELescNyEwWM0WoEIALhrKtsIn6vqFlSs/w6aDuJL cMWmFxjTaKLmIq2+cJIdFLOJ3CH80Pu9gB+nEv/k+cZdCTfUVKfRf28HTpmYWsht bb4AhdHMC7yFW752BHk+mzJspeC8h/2Rm8wMuNVplZ3MkPrwo3vsiuJTofLhVL/y BihlU3+5sfBvCYIsWnuEZIev+/I/s/qm1ASiqIcKSrFRZP6VTt5f9TC75vFI8seW 7yc3odKb0CArexB8yBjiPNziehctQF42doxQyL45hezLfWw4qdgHOSiwyiOMxEz9 Fwwpp8Tx33SKLNJgqoqYznGW9PhYJ7n2Kslv19uchJrEV+mds82vdDNaWRULld4= =kQn6 -----END PGP SIGNATURE----- Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "A bunch of good stuff in here. Worth noting is that we've pulled in the x86/mm branch from -tip so that we can make use of the core ioremap changes which allow us to put down huge mappings in the vmalloc area without screwing up the TLB. Much of the positive diffstat is because of the rseq selftest for arm64. Summary: - Wire up support for qspinlock, replacing our trusty ticket lock code - Add an IPI to flush_icache_range() to ensure that stale instructions fetched into the pipeline are discarded along with the I-cache lines - Support for the GCC "stackleak" plugin - Support for restartable sequences, plus an arm64 port for the selftest - Kexec/kdump support on systems booting with ACPI - Rewrite of our syscall entry code in C, which allows us to zero the GPRs on entry from userspace - Support for chained PMU counters, allowing 64-bit event counters to be constructed on current CPUs - Ensure scheduler topology information is kept up-to-date with CPU hotplug events - Re-enable support for huge vmalloc/IO mappings now that the core code has the correct hooks to use break-before-make sequences - Miscellaneous, non-critical fixes and cleanups" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (90 commits) arm64: alternative: Use true and false for boolean values arm64: kexec: Add comment to explain use of __flush_icache_range() arm64: sdei: Mark sdei stack helper functions as static arm64, kaslr: export offset in VMCOREINFO ELF notes arm64: perf: Add cap_user_time aarch64 efi/libstub: Only disable stackleak plugin for arm64 arm64: drop unused kernel_neon_begin_partial() macro arm64: kexec: machine_kexec should call __flush_icache_range arm64: svc: Ensure hardirq tracing is updated before return arm64: mm: Export __sync_icache_dcache() for xen-privcmd drivers/perf: arm-ccn: Use devm_ioremap_resource() to map memory arm64: Add support for STACKLEAK gcc plugin arm64: Add stack information to on_accessible_stack drivers/perf: hisi: update the sccl_id/ccl_id when MT is supported arm64: fix ACPI dependencies rseq/selftests: Add support for arm64 arm64: acpi: fix alignment fault in accessing ACPI efi/arm: map UEFI memory map even w/o runtime services enabled efi/arm: preserve early mapping of UEFI memory map longer for BGRT drivers: acpi: add dependency of EFI for arm64 ...	2018-08-14 16:39:13 -07:00
Alexander Shishkin	d83212d5dd	kallsyms, x86: Export addresses of PTI entry trampolines Currently, the addresses of PTI entry trampolines are not exported to user space. Kernel profiling tools need these addresses to identify the kernel code, so add a symbol and address for each CPU's PTI entry trampoline. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/1528289651-4113-3-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-08-14 19:12:29 -03:00
Adrian Hunter	b966794220	kallsyms: Simplify update_iter_mod() The logic in update_iter_mod() is overcomplicated and gets worse every time another get_ksymbol_* function is added. In preparation for adding another get_ksymbol_* function, simplify logic in update_iter_mod(). Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: (ftrace changes only) Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/1528289651-4113-2-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-08-14 19:10:23 -03:00
zhangyi (F)	3df6f61fff	PM / sleep: wakeup: Fix build error caused by missing SRCU support Commit `ea0212f40c` (power: auto select CONFIG_SRCU) made the code in drivers/base/power/wakeup.c use SRCU instead of RCU, but it forgot to select CONFIG_SRCU in Kconfig, which leads to the following build error if CONFIG_SRCU is not selected somewhere else: drivers/built-in.o: In function `wakeup_source_remove': (.text+0x3c6fc): undefined reference to `synchronize_srcu' drivers/built-in.o: In function `pm_print_active_wakeup_sources': (.text+0x3c7a8): undefined reference to `__srcu_read_lock' drivers/built-in.o: In function `pm_print_active_wakeup_sources': (.text+0x3c84c): undefined reference to `__srcu_read_unlock' drivers/built-in.o: In function `device_wakeup_arm_wake_irqs': (.text+0x3d1d8): undefined reference to `__srcu_read_lock' drivers/built-in.o: In function `device_wakeup_arm_wake_irqs': (.text+0x3d228): undefined reference to `__srcu_read_unlock' drivers/built-in.o: In function `device_wakeup_disarm_wake_irqs': (.text+0x3d24c): undefined reference to `__srcu_read_lock' drivers/built-in.o: In function `device_wakeup_disarm_wake_irqs': (.text+0x3d29c): undefined reference to `__srcu_read_unlock' drivers/built-in.o:(.data+0x4158): undefined reference to `process_srcu' Fix this error by selecting CONFIG_SRCU when PM_SLEEP is enabled. Fixes: `ea0212f40c` (power: auto select CONFIG_SRCU) Cc: 4.2+ <stable@vger.kernel.org> # 4.2+ Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> [ rjw: Minor subject/changelog fixups ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2018-08-15 00:07:08 +02:00
Abel Vesa	269777aa53	cpu/hotplug: Non-SMP machines do not make use of booted_once Commit `0cc3cd2165` ("cpu/hotplug: Boot HT siblings at least once") breaks non-SMP builds. [ I suspect the 'bool' fields should just be made to be bitfields and be exposed regardless of configuration, but that's a separate cleanup that I'll leave to the owners of this file for later. - Linus ] Fixes: `0cc3cd2165` ("cpu/hotplug: Boot HT siblings at least once") Cc: Dave Hansen <dave.hansen@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Abel Vesa <abelvesa@linux.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-14 15:00:00 -07:00
Linus Torvalds	e6ecec342f	This was a moderately busy cycle for docs, with the usual collection of small fixes and updates. We also have new ktime_get_() docs from Arnd, some kernel-doc fixes, a new set of Italian translations (non so se vale la pena, ma non fa male - speriamo bene), and some extensive early memory-management documentation improvements from Mike Rapoport. -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJbcZVtAAoJEI3ONVYwIuV64ekP/jgTlMi/fErRu6zlsrsWgiIK ir8ueCQ1OSiwOA+N2fUb+2+zlLlfTLgQ+o5IwmZk6rizG87fQ3Rp6i+bvYAZWITh YUuls3VhtRlJZqG1EW7gww1Q2IhRO6GhcpsIamAvhrSLFPaCKiN3JomJi/X47Pfj Ibl24HsruI2fDM1JwWRwCtE5J6vCL9lH1/5v4zVv7xdrVgTrwkZ/hAsE7HBNNat5 dSku2u9HSAXa4KR4sLWrVJ8UI5+fylwilz/57HhCeduQDwKCHE/mfhxLdqL4Oa4q oHTCNq2zTUj4w7GTvHS1g0P3y/iWMYjAzH2is+BokilpIC65NwwsKx2ybZd3Srdh zwP/kYk5U+mYSgdDlyNqwPCibw8KDXB3srKMzyQSN6tkosKCOHFSXF0Js0eupi7t NqmGigl3Qozj1uvU6Wy7vh58u+GFeuO4PF566t2m70Jp0cWzuVKLrBvgNO1X37rB aEBrpOYB/H54t/qf79IFW//pptWXFNZ3S9AgyDVIcmX5C2ihaCoaPNRTom+KbH/D QEoH9rwWSoCi2DGoR83D+G8thCUfB4yfEGulSSIA4pUR7qvIR5rd1ZioI/qtgAHm l7MjTbLpPwiMnpFkBrxxxlFFb4gbETakMBGYoYee8ww5WbQLu0qA93AbwIXyjhE8 mqCOLyBdCAZ0mNxqPSsc =x/P0 -----END PGP SIGNATURE----- Merge tag 'docs-4.19' of git://git.lwn.net/linux Pull documentation update from Jonathan Corbet: "This was a moderately busy cycle for docs, with the usual collection of small fixes and updates. We also have new ktime_get_() docs from Arnd, some kernel-doc fixes, a new set of Italian translations (non so se vale la pena, ma non fa male - speriamo bene), and some extensive early memory-management documentation improvements from Mike Rapoport" * tag 'docs-4.19' of git://git.lwn.net/linux: (52 commits) Documentation: corrections to console/console.txt Documentation: add ioctl number entry for v4l2-subdev.h Remove gendered language from management style documentation scripts/kernel-doc: Escape all literal braces in regexes docs/mm: add description of boot time memory management docs/mm: memblock: add overview documentation docs/mm: memblock: add kernel-doc description for memblock types docs/mm: memblock: add kernel-doc comments for memblock_add[_node] docs/mm: memblock: update kernel-doc comments mm/memblock: add a name for memblock flags enumeration docs/mm: bootmem: add overview documentation docs/mm: bootmem: add kernel-doc description of 'struct bootmem_data' docs/mm: bootmem: fix kernel-doc warnings docs/mm: nobootmem: fixup kernel-doc comments mm/bootmem: drop duplicated kernel-doc comments Documentation: vm.txt: Adding 'nr_hugepages_mempolicy' parameter description. doc:it_IT: translation for kernel-hacking docs: Fix the reference labels in Locking.rst doc: tracing: Fix a typo of trace_stat mm: Introduce new type vm_fault_t ...	2018-08-14 14:29:31 -07:00
Linus Torvalds	b018fc9800	Power management updates for 4.19-rc1 - Add a new framework for CPU idle time injection (Daniel Lezcano). - Add AVS support to the armada-37xx cpufreq driver (Gregory CLEMENT). - Add support for current CPU frequency reporting to the ACPI CPPC cpufreq driver (George Cherian). - Rework the cooling device registration in the imx6q/thermal driver (Bastian Stender). - Make the pcc-cpufreq driver refuse to work with dynamic scaling governors on systems with many CPUs to avoid scalability issues with it (Rafael Wysocki). - Fix the intel_pstate driver to report different maximum CPU frequencies on systems where they really are different and to ignore the turbo active ratio if hardware-managend P-states (HWP) are in use; make it use the match_string() helper (Xie Yisheng, Srinivas Pandruvada). - Fix a minor deferred probe issue in the qcom-kryo cpufreq driver (Niklas Cassel). - Add a tracepoint for the tracking of frequency limits changes (from Andriod) to the cpufreq core (Ruchi Kandoi). - Fix a circular lock dependency between CPU hotplug and sysfs locking in the cpufreq core reported by lockdep (Waiman Long). - Avoid excessive error reports on driver registration failures in the ARM cpuidle driver (Sudeep Holla). - Add a new device links flag to the driver core to make links go away automatically on supplier driver removal (Vivek Gautam). - Eliminate potential race condition between system-wide power management transitions and system shutdown (Pingfan Liu). - Add a quirk to save NVS memory on system suspend for the ASUS 1025C laptop (Willy Tarreau). - Make more systems use suspend-to-idle (instead of ACPI S3) by default (Tristian Celestin). - Get rid of stack VLA usage in the low-level hibernation code on 64-bit x86 (Kees Cook). - Fix error handling in the hibernation core and mark an expected fall-through switch in it (Chengguang Xu, Gustavo Silva). - Extend the generic power domains (genpd) framework to support attaching a device to a power domain by name (Ulf Hansson). - Fix device reference counting and user limits initialization in the devfreq core (Arvind Yadav, Matthias Kaehlcke). - Fix a few issues in the rk3399_dmc devfreq driver and improve its documentation (Enric Balletbo i Serra, Lin Huang, Nick Milner). - Drop a redundant error message from the exynos-ppmu devfreq driver (Markus Elfring). -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJbcqOqAAoJEILEb/54YlRxOxMP/2ZFvnXU0pey/VX/+TelLMS7 /ROVGQ+s75QP1c9P/3BjvnXc0dsMRLRFPog+7wyoG/2DbEIV25COyAYsmSE0TRni XUaZO6YAx4/e3pm2AfamYbLCPvjw85eucHg5QJQ4b1mSVRNJOsNv+fUo6lmxwvnm j9kHvfttFeIhoa/3wa7hbhPKLln46atnpVSxCIceY7L5EFNhkKBvQt6B5yx9geb9 QMY6ohgkyN+bnK9QySXX+trcWpzx1uGX0apI07NkX7n9QGFdU4lCW8lsAf8jMC3g PPValTsUQsdRONUJJsrgqBioq4tvtgQWibyS2tfRrOGXYvHpJNpGmHVplfsrf/SE cvlsciR47YbmrXZuqg/r8hql+qefNN16/rnZIZ9VnbcG806VBy2z8IzI5wcdWR7p vzxhbCqVqOHcEdEwRwvuM2io67MWvkGtKsbCP+33DBh8SubpsECpKN4nIDboa3SE CJ15RUqXnF6enmmfCKOoHZeu7iXWDz6Pi71XmRzaj9DqbITVV281IerqLgV3rbal BVa53+202iD0IP+2b7KedGe/5ALlI97ffN0gB+L/eB832853DKSZQKzcvvpRhEN7 Iv2crnUwuQED9ns8P7hzp1Bk9CFCAOLW8UM43YwZRPWnmdeSsPJusJ5lzkAf7bss wfsFoUE3RaY4msnuHyCh =kv2M -----END PGP SIGNATURE----- Merge tag 'pm-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These add a new framework for CPU idle time injection, to be used by all of the idle injection code in the kernel in the future, fix some issues and add a number of relatively small extensions in multiple places. Specifics: - Add a new framework for CPU idle time injection (Daniel Lezcano). - Add AVS support to the armada-37xx cpufreq driver (Gregory CLEMENT). - Add support for current CPU frequency reporting to the ACPI CPPC cpufreq driver (George Cherian). - Rework the cooling device registration in the imx6q/thermal driver (Bastian Stender). - Make the pcc-cpufreq driver refuse to work with dynamic scaling governors on systems with many CPUs to avoid scalability issues with it (Rafael Wysocki). - Fix the intel_pstate driver to report different maximum CPU frequencies on systems where they really are different and to ignore the turbo active ratio if hardware-managend P-states (HWP) are in use; make it use the match_string() helper (Xie Yisheng, Srinivas Pandruvada). - Fix a minor deferred probe issue in the qcom-kryo cpufreq driver (Niklas Cassel). - Add a tracepoint for the tracking of frequency limits changes (from Andriod) to the cpufreq core (Ruchi Kandoi). - Fix a circular lock dependency between CPU hotplug and sysfs locking in the cpufreq core reported by lockdep (Waiman Long). - Avoid excessive error reports on driver registration failures in the ARM cpuidle driver (Sudeep Holla). - Add a new device links flag to the driver core to make links go away automatically on supplier driver removal (Vivek Gautam). - Eliminate potential race condition between system-wide power management transitions and system shutdown (Pingfan Liu). - Add a quirk to save NVS memory on system suspend for the ASUS 1025C laptop (Willy Tarreau). - Make more systems use suspend-to-idle (instead of ACPI S3) by default (Tristian Celestin). - Get rid of stack VLA usage in the low-level hibernation code on 64-bit x86 (Kees Cook). - Fix error handling in the hibernation core and mark an expected fall-through switch in it (Chengguang Xu, Gustavo Silva). - Extend the generic power domains (genpd) framework to support attaching a device to a power domain by name (Ulf Hansson). - Fix device reference counting and user limits initialization in the devfreq core (Arvind Yadav, Matthias Kaehlcke). - Fix a few issues in the rk3399_dmc devfreq driver and improve its documentation (Enric Balletbo i Serra, Lin Huang, Nick Milner). - Drop a redundant error message from the exynos-ppmu devfreq driver (Markus Elfring)" * tag 'pm-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (35 commits) PM / reboot: Eliminate race between reboot and suspend PM / hibernate: Mark expected switch fall-through cpufreq: intel_pstate: Ignore turbo active ratio in HWP cpufreq: Fix a circular lock dependency problem cpu/hotplug: Add a cpus_read_trylock() function x86/power/hibernate_64: Remove VLA usage cpufreq: trace frequency limits change cpufreq: intel_pstate: Show different max frequency with turbo 3 and HWP cpufreq: pcc-cpufreq: Disable dynamic scaling on many-CPU systems cpufreq: qcom-kryo: Silently error out on EPROBE_DEFER cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC cpufreq: armada-37xx: Add AVS support dt-bindings: marvell: Add documentation for the Armada 3700 AVS binding PM / devfreq: rk3399_dmc: Fix duplicated opp table on reload. PM / devfreq: Init user limits from OPP limits, not viceversa PM / devfreq: rk3399_dmc: fix spelling mistakes. PM / devfreq: rk3399_dmc: do not print error when get supply and clk defer. dt-bindings: devfreq: rk3399_dmc: move interrupts to be optional. PM / devfreq: rk3399_dmc: remove wait for dcf irq event. dt-bindings: clock: add rk3399 DDR3 standard speed bins. ...	2018-08-14 13:12:24 -07:00
Linus Torvalds	f66dc72320	dma-mapping updates for 4.19 - a series from Robin to fix bus imposed dma limits by adding a separate mask for them to struct device instead of trying to squeeze a second meaning out of the existing dma mask as we did before. This has ACKs from the various other subsystems touched - a small swiotlb cleanup from Kees (acked by Konrad) - conversion of nios2 and sh to the new generic dma-noncoherent code. Various other architecture conversions will come through the architectures maintainers trees. -----BEGIN PGP SIGNATURE----- iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAltxLuILHGhjaEBsc3Qu ZGUACgkQD55TZVIEUYPKMA//WY7UGhTrQ9kSjrfCRYJnSgLPjkqwY1wuYdXg1FSw rUYtQXCem59W48aX9QqHzGiCXf3uVJDBCO7/ikP8R/hHJTC/ih6tTJAtKMBTaDm5 NEVU4lZCzHfzliF3vhj/3+ifWThI0gSo5QQolf4hD5255a95HMLZgJ9zt0IRVgdj aKfFfKG824LX+G4iuN27jwmvfOJM4KwzaAFID6Rc4Pt1BXy3CmLkED63p6YvTAk/ pCR9LZpYWPccEuA1YfHoDJrB4onTw7zkVHFBtMZ5xW+ypJwRACSpLp/8IjeG841x /vA3BsDNu8zrRj2cHpD+fViPriMm/qdMew45keII/0jCeCs640wg9Cy7f6yl2QQh nsCJA8DEKvY8ebaX12XyWC3GNbLNru1XORrMl9+Lqu0kUHbdNxRDg47fWILLdxGc QvgassFOxb7wg9GwIsoGpOItagl2hHP3j0vMrHlfWaBNefi5ZtWiZs8aJBgm2rJm kzjmXGzx9IBy5SWSpou5UDCvf4xF7CkTA11YKPyBt2H3hquFKBH6R+6SzAm0n4rn WcAlqx4BQX1no/p0hPM4b01yPwgd3OmNmITAaYKWTvDXKpImRkPYWIfvLQBx4/PW 9TyVd1QWO1wMbCziVCaHVwpsNAHFTK/kw41OLjK4H1P+HMorGH0crhimfl5BiaJY XJk= =WNwP -----END PGP SIGNATURE----- Merge tag 'dma-mapping-4.19' of git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping updates from Christoph Hellwig: - a series from Robin to fix bus imposed dma limits by adding a separate mask for them to struct device instead of trying to squeeze a second meaning out of the existing dma mask as we did before. This has ACKs from the various other subsystems touched - a small swiotlb cleanup from Kees (acked by Konrad) - conversion of nios2 and sh to the new generic dma-noncoherent code. Various other architecture conversions will come through the architectures maintainers trees. * tag 'dma-mapping-4.19' of git://git.infradead.org/users/hch/dma-mapping: sh: use generic dma_noncoherent_ops sh: split arch/sh/mm/consistent.c sh: use dma_direct_ops for the CONFIG_DMA_COHERENT case sh: introduce a sh_cacheop_vaddr helper sh: simplify get_arch_dma_ops OF: Don't set default coherent DMA mask ACPI/IORT: Don't set default coherent DMA mask iommu/dma: Respect bus DMA limit for IOVAs of/device: Set bus DMA mask as appropriate ACPI/IORT: Set bus DMA mask as appropriate dma-mapping: Generalise dma_32bit_limit flag ACPI/IORT: Support address size limit for root complexes of/platform: Initialise default DMA masks nios2: use generic dma_noncoherent_ops swiotlb: clean up reporting dma-mapping: relax warning for per-device areas	2018-08-14 11:11:52 -07:00
Linus Torvalds	73ba2fb33c	for-4.19/block-20180812 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAltwvasQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpv65EACTq5gSLnJBI6ZPr1RAHruVDnjfzO2Veitl tUtjm0XfWmnEiwQ3dYvnyhy99xbyaG3900d9BClCTlH6xaUdSiQkDpcKG/R2F36J 5mZitYukQcpFAQJWF8YKsTTE7JPl4VglCIDqYiC4+C3rOSVi8lrKn2qp4J4MMCFn thRg3jCcq7c5s9Eigsop1pXWQSasubkXfk55Krcp4oybKYpYRKXXf74Mj14QAbwJ QHN3VisyAUWoBRg7UQZo1Npe2oPk6bbnJypnjf8M0M2EnlvddEkIlHob91sodka8 6p4APOEu5cbyXOBCAQsw/koff14mb8aEadqeQA68WvXfIdX9ZjfxCX0OoC3sBEXk yqJhZ0C980AM13zIBD8ejv4uasGcPca8W+47mE5P8sRiI++5kBsFWDZPCtUBna0X 2Kh24NsmEya9XRR5vsB84dsIPQ3tLMkxg/IgQRVDaSnfJz0c/+zm54xDyKRaFT4l 5iERk2WSkm9+8jNfVmWG0edrv6nRAXjpGwFfOCPh6/LCSCi4xQRULYN7sVzsX8ZK FRjt24HftBI8mJbh4BtweJvg+ppVe1gAk3IO3HvxAQhv29Hz+uvFYe9kL+3N8LJA Qosr9n9O4+wKYizJcDnw+5iPqCHfAwOm9th4pyedR+R7SmNcP3yNC8AbbheNBiF5 Zolos5H+JA== =b9ib -----END PGP SIGNATURE----- Merge tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-block Pull block updates from Jens Axboe: "First pull request for this merge window, there will also be a followup request with some stragglers. This pull request contains: - Fix for a thundering heard issue in the wbt block code (Anchal Agarwal) - A few NVMe pull requests: * Improved tracepoints (Keith) * Larger inline data support for RDMA (Steve Wise) * RDMA setup/teardown fixes (Sagi) * Effects log suppor for NVMe target (Chaitanya Kulkarni) * Buffered IO suppor for NVMe target (Chaitanya Kulkarni) * TP4004 (ANA) support (Christoph) * Various NVMe fixes - Block io-latency controller support. Much needed support for properly containing block devices. (Josef) - Series improving how we handle sense information on the stack (Kees) - Lightnvm fixes and updates/improvements (Mathias/Javier et al) - Zoned device support for null_blk (Matias) - AIX partition fixes (Mauricio Faria de Oliveira) - DIF checksum code made generic (Max Gurtovoy) - Add support for discard in iostats (Michael Callahan / Tejun) - Set of updates for BFQ (Paolo) - Removal of async write support for bsg (Christoph) - Bio page dirtying and clone fixups (Christoph) - Set of bcache fix/changes (via Coly) - Series improving blk-mq queue setup/teardown speed (Ming) - Series improving merging performance on blk-mq (Ming) - Lots of other fixes and cleanups from a slew of folks" * tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-block: (190 commits) blkcg: Make blkg_root_lookup() work for queues in bypass mode bcache: fix error setting writeback_rate through sysfs interface null_blk: add lock drop/acquire annotation Blk-throttle: reduce tail io latency when iops limit is enforced block: paride: pd: mark expected switch fall-throughs block: Ensure that a request queue is dissociated from the cgroup controller block: Introduce blk_exit_queue() blkcg: Introduce blkg_root_lookup() block: Remove two superfluous #include directives blk-mq: count the hctx as active before allocating tag block: bvec_nr_vecs() returns value for wrong slab bcache: trivial - remove tailing backslash in macro BTREE_FLAG bcache: make the pr_err statement used for ENOENT only in sysfs_attatch section bcache: set max writeback rate when I/O request is idle bcache: add code comments for bset.c bcache: fix mistaken comments in request.c bcache: fix mistaken code comments in bcache.h bcache: add a comment in super.c bcache: avoid unncessary cache prefetch bch_btree_node_get() bcache: display rate debug parameters to 0 when writeback is not running ...	2018-08-14 10:23:25 -07:00
Linus Torvalds	958f338e96	Merge branch 'l1tf-final' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Merge L1 Terminal Fault fixes from Thomas Gleixner: "L1TF, aka L1 Terminal Fault, is yet another speculative hardware engineering trainwreck. It's a hardware vulnerability which allows unprivileged speculative access to data which is available in the Level 1 Data Cache when the page table entry controlling the virtual address, which is used for the access, has the Present bit cleared or other reserved bits set. If an instruction accesses a virtual address for which the relevant page table entry (PTE) has the Present bit cleared or other reserved bits set, then speculative execution ignores the invalid PTE and loads the referenced data if it is present in the Level 1 Data Cache, as if the page referenced by the address bits in the PTE was still present and accessible. While this is a purely speculative mechanism and the instruction will raise a page fault when it is retired eventually, the pure act of loading the data and making it available to other speculative instructions opens up the opportunity for side channel attacks to unprivileged malicious code, similar to the Meltdown attack. While Meltdown breaks the user space to kernel space protection, L1TF allows to attack any physical memory address in the system and the attack works across all protection domains. It allows an attack of SGX and also works from inside virtual machines because the speculation bypasses the extended page table (EPT) protection mechanism. The assoicated CVEs are: CVE-2018-3615, CVE-2018-3620, CVE-2018-3646 The mitigations provided by this pull request include: - Host side protection by inverting the upper address bits of a non present page table entry so the entry points to uncacheable memory. - Hypervisor protection by flushing L1 Data Cache on VMENTER. - SMT (HyperThreading) control knobs, which allow to 'turn off' SMT by offlining the sibling CPU threads. The knobs are available on the kernel command line and at runtime via sysfs - Control knobs for the hypervisor mitigation, related to L1D flush and SMT control. The knobs are available on the kernel command line and at runtime via sysfs - Extensive documentation about L1TF including various degrees of mitigations. Thanks to all people who have contributed to this in various ways - patches, review, testing, backporting - and the fruitful, sometimes heated, but at the end constructive discussions. There is work in progress to provide other forms of mitigations, which might be less horrible performance wise for a particular kind of workloads, but this is not yet ready for consumption due to their complexity and limitations" * 'l1tf-final' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits) x86/microcode: Allow late microcode loading with SMT disabled tools headers: Synchronise x86 cpufeatures.h for L1TF additions x86/mm/kmmio: Make the tracer robust against L1TF x86/mm/pat: Make set_memory_np() L1TF safe x86/speculation/l1tf: Make pmd/pud_mknotpresent() invert x86/speculation/l1tf: Invert all not present mappings cpu/hotplug: Fix SMT supported evaluation KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry x86/speculation: Use ARCH_CAPABILITIES to skip L1D flush on vmentry x86/speculation: Simplify sysfs report of VMX L1TF vulnerability Documentation/l1tf: Remove Yonah processors from not vulnerable list x86/KVM/VMX: Don't set l1tf_flush_l1d from vmx_handle_external_intr() x86/irq: Let interrupt handlers set kvm_cpu_l1tf_flush_l1d x86: Don't include linux/irq.h from asm/hardirq.h x86/KVM/VMX: Introduce per-host-cpu analogue of l1tf_flush_l1d x86/irq: Demote irq_cpustat_t::__softirq_pending to u16 x86/KVM/VMX: Move the l1tf_flush_l1d test to vmx_l1d_flush() x86/KVM/VMX: Replace 'vmx_l1d_flush_always' with 'vmx_l1d_flush_cond' x86/KVM/VMX: Don't set l1tf_flush_l1d to true from vmx_l1d_flush() cpu/hotplug: detect SMT disabled by BIOS ...	2018-08-14 09:46:06 -07:00
Petr Mladek	9f68cb5791	Merge branch 'for-4.19-nmi' into for-linus	2018-08-14 13:36:15 +02:00
Rafael J. Wysocki	7425ecd5e3	Merge branch 'pm-cpufreq' Merge cpufreq changes for 4.19. These are driver extensions, some driver and core fixes and a new tracepoint for the tracking of frequency limits changes (coming from Android). * pm-cpufreq: cpufreq: intel_pstate: Ignore turbo active ratio in HWP cpufreq: Fix a circular lock dependency problem cpu/hotplug: Add a cpus_read_trylock() function cpufreq: trace frequency limits change cpufreq: intel_pstate: Show different max frequency with turbo 3 and HWP cpufreq: pcc-cpufreq: Disable dynamic scaling on many-CPU systems cpufreq: qcom-kryo: Silently error out on EPROBE_DEFER cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC cpufreq: armada-37xx: Add AVS support dt-bindings: marvell: Add documentation for the Armada 3700 AVS binding cpufreq: imx6q/thermal: imx: register cooling device depending on OF cpufreq: intel_pstate: use match_string() helper	2018-08-14 09:57:34 +02:00
Rafael J. Wysocki	17bc3432e3	Merge branches 'pm-core', 'pm-domains', 'pm-sleep', 'acpi-pm' and 'pm-cpuidle' Merge changes in the PM core, system-wide PM infrastructure, generic power domains (genpd) framework, ACPI PM infrastructure and cpuidle for 4.19. * pm-core: driver core: Add flag to autoremove device link on supplier unbind driver core: Rename flag AUTOREMOVE to AUTOREMOVE_CONSUMER * pm-domains: PM / Domains: Introduce dev_pm_domain_attach_by_name() PM / Domains: Introduce option to attach a device by name to genpd PM / Domains: dt: Add a power-domain-names property * pm-sleep: PM / reboot: Eliminate race between reboot and suspend PM / hibernate: Mark expected switch fall-through x86/power/hibernate_64: Remove VLA usage PM / hibernate: cast PAGE_SIZE to int when comparing with error code * acpi-pm: ACPI / PM: save NVS memory for ASUS 1025C laptop ACPI / PM: Default to s2idle in all machines supporting LP S0 * pm-cpuidle: ARM: cpuidle: silence error on driver registration failure	2018-08-14 09:48:10 +02:00
Linus Torvalds	e5a32b5b21	Here are the main MIPS changes for 4.19. An overview of the general architecture changes: - Massive DMA ops refactoring from Christoph Hellwig (huzzah for deleting crufty code!). - We introduce NT_MIPS_DSP & NT_MIPS_FP_MODE ELF notes & corresponding regsets to expose DSP ASE & floating point mode state respectively, both for live debugging & core dumps. - We better optimize our code by hard-coding cpu_has_* macros at compile time where their values are known due to the ISA revision that the kernel build is targeting. - The EJTAG exception handler now better handles SMP systems, where it was previously possible for CPUs to clobber a register value saved by another CPU. - Our implementation of memset() gained a couple of fixes for MIPSr6 systems to return correct values in some cases where stores fault. - We now implement ioremap_wc() using the uncached-accelerated cache coherency attribute where supported, which is detected during boot, and fall back to plain uncached access where necessary. The MIPS-specific (and unused in tree) ioremap_uncached_accelerated() & ioremap_cacheable_cow() are removed. - The prctl(PR_SET_FP_MODE, ...) syscall is better supported for SMP systems by reworking the way we ensure remote CPUs that may be running threads within the affected process switch mode. - Systems using the MIPS Coherence Manager will now set the MIPS_IC_SNOOPS_REMOTE flag to avoid some unnecessary cache maintenance overhead when flushing the icache. - A few fixes were made for building with clang/LLVM, which now sucessfully builds kernels for many of our platforms. - Miscellaneous cleanups all over. And some platform-specific changes: - ar7 gained stubs for a few clock API functions to fix build failures for some drivers. - ath79 gained support for a few new SoCs, a few fixes & better gpio-keys support. - Ci20 now exposes its SPI bus using the spi-gpio driver. - The generic platform can now auto-detect a suitable value for PHYS_OFFSET based upon the memory map described by the device tree, allowing us to avoid wasting memory on page book-keeping for systems where RAM starts at a non-zero physical address. - Ingenic systems using the jz4740 platform code now link their vmlinuz higher to allow for kernels of a realistic size. - Loongson32 now builds the kernel targeting MIPSr1 rather than MIPSr2 to avoid CPU errata. - Loongson64 gains a couple of fixes, a workaround for a write buffering issue & support for the Loongson 3A R3.1 CPU. - Malta now uses the piix4-poweroff driver to handle powering down. - Microsemi Ocelot gained support for its SPI bus & NOR flash, its second MDIO bus and can now be supported by a FIT/.itb image. - Octeon saw a bunch of header cleanups which remove a lot of duplicate or unused code. -----BEGIN PGP SIGNATURE----- iIsEABYIADMWIQRgLjeFAZEXQzy86/s+p5+stXUA3QUCW3G6JxUccGF1bC5idXJ0 b25AbWlwcy5jb20ACgkQPqefrLV1AN0n/gD/Rpdgay31G/4eTTKBmBrcaju6Shjt /2Iu6WC5Sj4hDHUBAJSbuI+B9YjcNsjekBYxB/LLD7ImcLBl6nLMIvKmXLAL =cUiF -----END PGP SIGNATURE----- Merge tag 'mips_4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS updates from Paul Burton: "Here are the main MIPS changes for 4.19. An overview of the general architecture changes: - Massive DMA ops refactoring from Christoph Hellwig (huzzah for deleting crufty code!). - We introduce NT_MIPS_DSP & NT_MIPS_FP_MODE ELF notes & corresponding regsets to expose DSP ASE & floating point mode state respectively, both for live debugging & core dumps. - We better optimize our code by hard-coding cpu_has_* macros at compile time where their values are known due to the ISA revision that the kernel build is targeting. - The EJTAG exception handler now better handles SMP systems, where it was previously possible for CPUs to clobber a register value saved by another CPU. - Our implementation of memset() gained a couple of fixes for MIPSr6 systems to return correct values in some cases where stores fault. - We now implement ioremap_wc() using the uncached-accelerated cache coherency attribute where supported, which is detected during boot, and fall back to plain uncached access where necessary. The MIPS-specific (and unused in tree) ioremap_uncached_accelerated() & ioremap_cacheable_cow() are removed. - The prctl(PR_SET_FP_MODE, ...) syscall is better supported for SMP systems by reworking the way we ensure remote CPUs that may be running threads within the affected process switch mode. - Systems using the MIPS Coherence Manager will now set the MIPS_IC_SNOOPS_REMOTE flag to avoid some unnecessary cache maintenance overhead when flushing the icache. - A few fixes were made for building with clang/LLVM, which now sucessfully builds kernels for many of our platforms. - Miscellaneous cleanups all over. And some platform-specific changes: - ar7 gained stubs for a few clock API functions to fix build failures for some drivers. - ath79 gained support for a few new SoCs, a few fixes & better gpio-keys support. - Ci20 now exposes its SPI bus using the spi-gpio driver. - The generic platform can now auto-detect a suitable value for PHYS_OFFSET based upon the memory map described by the device tree, allowing us to avoid wasting memory on page book-keeping for systems where RAM starts at a non-zero physical address. - Ingenic systems using the jz4740 platform code now link their vmlinuz higher to allow for kernels of a realistic size. - Loongson32 now builds the kernel targeting MIPSr1 rather than MIPSr2 to avoid CPU errata. - Loongson64 gains a couple of fixes, a workaround for a write buffering issue & support for the Loongson 3A R3.1 CPU. - Malta now uses the piix4-poweroff driver to handle powering down. - Microsemi Ocelot gained support for its SPI bus & NOR flash, its second MDIO bus and can now be supported by a FIT/.itb image. - Octeon saw a bunch of header cleanups which remove a lot of duplicate or unused code" * tag 'mips_4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (123 commits) MIPS: Remove remnants of UASM_ISA MIPS: netlogic: xlr: Remove erroneous check in nlm_fmn_send() MIPS: VDSO: Force link endianness MIPS: Always specify -EB or -EL when using clang MIPS: Use dins to simplify __write_64bit_c0_split() MIPS: Use read-write output operand in __write_64bit_c0_split() MIPS: Avoid using array as parameter to write_c0_kpgd() MIPS: vdso: Allow clang's --target flag in VDSO cflags MIPS: genvdso: Remove GOT checks MIPS: Remove obsolete MIPS checks for DST node "chosen@0" MIPS: generic: Remove input symbols from defconfig MIPS: Delete unused code in linux32.c MIPS: Remove unused sys_32_mmap2 MIPS: Remove nabi_no_regargs mips: dts: mscc: enable spi and NOR flash support on ocelot PCB123 mips: dts: mscc: Add spi on Ocelot MIPS: Loongson: Merge load addresses MIPS: Loongson: Set Loongson32 to MIPS32R1 MIPS: mscc: ocelot: add interrupt controller properties to GPIO controller MIPS: generic: Select MIPS_AUTO_PFN_OFFSET ...	2018-08-13 19:24:32 -07:00
Linus Torvalds	2280a5360e	Merge branch 'parisc-4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc updates from Helge Deller: - parisc now uses the generic dma_noncoherent_ops implementation (Christoph Hellwig) - further memory barrier and spinlock improvements (John David Anglin) - prepare removal of current_text_addr() functions (Nick Desaulniers) - improve kernel stack unwinding on parisc (me) - drop ENOTSUP which was defined on parisc only (me) * 'parisc-4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Fix and improve kernel stack unwinding parisc: Remove unnecessary barriers from spinlock.h parisc: Remove ordered stores from syscall.S parisc: prefer _THIS_IP_ and _RET_IP_ statement expressions parisc: Add HAVE_REGS_AND_STACK_ACCESS_API feature parisc: Drop architecture-specific ENOTSUP define parisc: use generic dma_noncoherent_ops parisc: always use flush_kernel_dcache_range for DMA cache maintainance parisc: merge pcx_dma_ops and pcxl_dma_ops	2018-08-13 19:18:02 -07:00
Linus Torvalds	13e091b6dd	Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 timer updates from Thomas Gleixner: "Early TSC based time stamping to allow better boot time analysis. This comes with a general cleanup of the TSC calibration code which grew warts and duct taping over the years and removes 250 lines of code. Initiated and mostly implemented by Pavel with help from various folks" * 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits) x86/kvmclock: Mark kvm_get_preset_lpj() as __init x86/tsc: Consolidate init code sched/clock: Disable interrupts when calling generic_sched_clock_init() timekeeping: Prevent false warning when persistent clock is not available sched/clock: Close a hole in sched_clock_init() x86/tsc: Make use of tsc_calibrate_cpu_early() x86/tsc: Split native_calibrate_cpu() into early and late parts sched/clock: Use static key for sched_clock_running sched/clock: Enable sched clock early sched/clock: Move sched clock initialization and merge with generic clock x86/tsc: Use TSC as sched clock early x86/tsc: Initialize cyc2ns when tsc frequency is determined x86/tsc: Calibrate tsc only once ARM/time: Remove read_boot_clock64() s390/time: Remove read_boot_clock64() timekeeping: Default boot time offset to local_clock() timekeeping: Replace read_boot_clock64() with read_persistent_wall_and_boot_offset() s390/time: Add read_persistent_wall_and_boot_offset() x86/xen/time: Output xen sched_clock time from 0 x86/xen/time: Initialize pv xen time in init_hypervisor_platform() ...	2018-08-13 18:28:19 -07:00
Ravi Bangoria	6d43743e90	Uprobe: Additional argument arch_uprobe to uprobe_write_opcode() Add addition argument 'arch_uprobe' to uprobe_write_opcode(). We need this in later set of patches. Link: http://lkml.kernel.org/r/20180809041856.1547-3-ravi.bangoria@linux.ibm.com Reviewed-by: Song Liu <songliubraving@fb.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-08-13 20:08:33 -04:00
Ravi Bangoria	38e967ae1e	Uprobes: Simplify uprobe_register() body Simplify uprobe_register() function body and let __uprobe_register() handle everything. Also move dependency functions around to avoid build failures. Link: http://lkml.kernel.org/r/20180809041856.1547-2-ravi.bangoria@linux.ibm.com Reviewed-by: Song Liu <songliubraving@fb.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-08-13 20:07:06 -04:00
Linus Torvalds	203b4fc903	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Thomas Gleixner: - Make lazy TLB mode even lazier to avoid pointless switch_mm() operations, which reduces CPU load by 1-2% for memcache workloads - Small cleanups and improvements all over the place * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Remove redundant check for kmem_cache_create() arm/asm/tlb.h: Fix build error implicit func declaration x86/mm/tlb: Make clear_asid_other() static x86/mm/tlb: Skip atomic operations for 'init_mm' in switch_mm_irqs_off() x86/mm/tlb: Always use lazy TLB mode x86/mm/tlb: Only send page table free TLB flush to lazy TLB CPUs x86/mm/tlb: Make lazy TLB mode lazier x86/mm/tlb: Restructure switch_mm_irqs_off() x86/mm/tlb: Leave lazy TLB mode at page table free time mm: Allocate the mm_cpumask (mm->cpu_bitmap[]) dynamically based on nr_cpu_ids x86/mm: Add TLB purge to free pmd/pte page interfaces ioremap: Update pgtable free interfaces with addr x86/mm: Disable ioremap free page handling on x86-PAE	2018-08-13 16:29:35 -07:00
Linus Torvalds	1e45e9a95e	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "The timers departement more or less proudly presents: - More Y2038 timekeeping work mostly in the core code. The work is slowly, but steadily targeting the actuall syscalls. - Enhanced timekeeping suspend/resume support by utilizing clocksources which do not stop during suspend, but are otherwise not the main timekeeping clocksources. - Make NTP adjustmets more accurate and immediate when the frequency is set directly and not incrementally. - Sanitize the overrung handing of posix timers - A new timer driver for Mediatek SoCs - The usual pile of fixes and updates all over the place" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits) clockevents: Warn if cpu_all_mask is used as cpumask tick/broadcast-hrtimer: Use cpu_possible_mask for ce_broadcast_hrtimer clocksource/drivers/arm_arch_timer: Fix bogus cpu_all_mask usage clocksource: ti-32k: Remove CLOCK_SOURCE_SUSPEND_NONSTOP flag timers: Clear timer_base::must_forward_clk with timer_base::lock held clocksource/drivers/sprd: Register one always-on timer to compensate suspend time clocksource/drivers/timer-mediatek: Add support for system timer clocksource/drivers/timer-mediatek: Convert the driver to timer-of clocksource/drivers/timer-mediatek: Use specific prefix for GPT clocksource/drivers/timer-mediatek: Rename mtk_timer to timer-mediatek clocksource/drivers/timer-mediatek: Add system timer bindings clocksource/drivers: Set clockevent device cpumask to cpu_possible_mask time: Introduce one suspend clocksource to compensate the suspend time time: Fix extra sleeptime injection when suspend fails timekeeping/ntp: Constify some function arguments ntp: Use kstrtos64 for s64 variable ntp: Remove redundant arguments timer: Fix coding style ktime: Provide typesafe ktime_to_ns() hrtimer: Improve kernel message printing ...	2018-08-13 13:02:31 -07:00
Linus Torvalds	8603596a32	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf update from Thomas Gleixner: "The perf crowd presents: Kernel updates: - Removal of jprobes - Cleanup and consolidatation the handling of kprobes - Cleanup and consolidation of hardware breakpoints - The usual pile of fixes and updates to PMUs and event descriptors Tooling updates: - Updates and improvements all over the place. Nothing outstanding, just the (good) boring incremental grump work" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits) perf trace: Do not require --no-syscalls to suppress strace like output perf bpf: Include uapi/linux/bpf.h from the 'perf trace' script's bpf.h perf tools: Allow overriding MAX_NR_CPUS at compile time perf bpf: Show better message when failing to load an object perf list: Unify metric group description format with PMU event description perf vendor events arm64: Update ThunderX2 implementation defined pmu core events perf cs-etm: Generate branch sample for CS_ETM_TRACE_ON packet perf cs-etm: Generate branch sample when receiving a CS_ETM_TRACE_ON packet perf cs-etm: Support dummy address value for CS_ETM_TRACE_ON packet perf cs-etm: Fix start tracing packet handling perf build: Fix installation directory for eBPF perf c2c report: Fix crash for empty browser perf tests: Fix indexing when invoking subtests perf trace: Beautify the AF_INET & AF_INET6 'socket' syscall 'protocol' args perf trace beauty: Add beautifiers for 'socket''s 'protocol' arg perf trace beauty: Do not print NULL strarray entries perf beauty: Add a generator for IPPROTO_ socket's protocol constants tools include uapi: Grab a copy of linux/in.h perf tests: Fix complex event name parsing perf evlist: Fix error out while applying initial delay and LBR ...	2018-08-13 12:55:49 -07:00
Linus Torvalds	de5d1b39ea	Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking/atomics update from Thomas Gleixner: "The locking, atomics and memory model brains delivered: - A larger update to the atomics code which reworks the ordering barriers, consolidates the atomic primitives, provides the new atomic64_fetch_add_unless() primitive and cleans up the include hell. - Simplify cmpxchg() instrumentation and add instrumentation for xchg() and cmpxchg_double(). - Updates to the memory model and documentation" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits) locking/atomics: Rework ordering barriers locking/atomics: Instrument cmpxchg_double() locking/atomics: Instrument xchg() locking/atomics: Simplify cmpxchg() instrumentation locking/atomics/x86: Reduce arch_cmpxchg64() instrumentation tools/memory-model: Rename litmus tests to comply to norm7 tools/memory-model/Documentation: Fix typo, smb->smp sched/Documentation: Update wake_up() & co. memory-barrier guarantees locking/spinlock, sched/core: Clarify requirements for smp_mb__after_spinlock() sched/core: Use smp_mb() in wake_woken_function() tools/memory-model: Add informal LKMM documentation to MAINTAINERS locking/atomics/Documentation: Describe atomic_set() as a write operation tools/memory-model: Make scripts executable tools/memory-model: Remove ACCESS_ONCE() from model tools/memory-model: Remove ACCESS_ONCE() from recipes locking/memory-barriers.txt/kokr: Update Korean translation to fix broken DMA vs. MMIO ordering example MAINTAINERS: Add Daniel Lustig as an LKMM reviewer tools/memory-model: Fix ISA2+pooncelock+pooncelock+pombonce name tools/memory-model: Add litmus test for full multicopy atomicity locking/refcount: Always allow checked forms ...	2018-08-13 12:23:39 -07:00
Linus Torvalds	1c59477428	Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull CPU hotplug update from Thomas Gleixner: "A trivial name fix for the hotplug state machine" * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu/hotplug: Clarify CPU hotplug step name for timers	2018-08-13 12:21:46 -07:00
Linus Torvalds	f7951c33f0	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Thomas Gleixner: - Cleanup and improvement of NUMA balancing - Refactoring and improvements to the PELT (Per Entity Load Tracking) code - Watchdog simplification and related cleanups - The usual pile of small incremental fixes and improvements * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits) watchdog: Reduce message verbosity stop_machine: Reflow cpu_stop_queue_two_works() sched/numa: Move task_numa_placement() closer to numa_migrate_preferred() sched/numa: Use group_weights to identify if migration degrades locality sched/numa: Update the scan period without holding the numa_group lock sched/numa: Remove numa_has_capacity() sched/numa: Modify migrate_swap() to accept additional parameters sched/numa: Remove unused task_capacity from 'struct numa_stats' sched/numa: Skip nodes that are at 'hoplimit' sched/debug: Reverse the order of printing faults sched/numa: Use task faults only if numa_group is not yet set up sched/numa: Set preferred_node based on best_cpu sched/numa: Simplify load_too_imbalanced() sched/numa: Evaluate move once per node sched/numa: Remove redundant field sched/debug: Show the sum wait time of a task group sched/fair: Remove #ifdefs from scale_rt_capacity() sched/core: Remove get_cpu() from sched_fork() sched/cpufreq: Clarify sugov_get_util() sched/sysctl: Remove unused sched_time_avg_ms sysctl ...	2018-08-13 11:25:07 -07:00
Linus Torvalds	2406fb8d94	Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Thomas Gleixner: "A single bugfix to prevent a pinned thread which queues stomp machine work to be preempted by the stopper thread on its CPU which causes a live lock as it is unable to wake the second CPUs stopper thread" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: stop_machine: Atomically queue and wake stopper threads	2018-08-13 11:21:34 -07:00
Linus Torvalds	b99cdfdf0b	Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Thomas Gleixner: "A large update to RCU: Preparatory work for consolidating the RCU flavors: - Introduce grace-period sequence numbers to the RCU-bh, RCU-preempt, and RCU-sched flavors, replacing the old ->gpnum and ->completed pair of fields. This change allows lockless code to obtain the complete grace-period state with a single READ_ONCE(), which is needed to maintain tolerable lock contention during the upcoming consolidation of the three RCU flavors. Note that grace-period sequence numbers are already used by rcu_barrier(), expedited RCU grace periods, and SRCU, and are thus already heavily used and well-tested. Joel Fernandes contributed a number of excellent fixes and improvements. - Clean up some grace-period-reporting loose ends, including improving the handling of quiescent states from offline CPUs and fixing some false-positive WARN_ON_ONCE() invocations. (Strictly speaking, the WARN_ON_ONCE() invocations were quite correct, but their invariants were (harmlessly) violated by the earlier sloppy handling of quiescent states from offline CPUs.) In addition, improve grace-period forward-progress guarantees so as to allow removal of fail-safe checks that required otherwise needless lock acquisitions. Finally, add more diagnostics to help debug the upcoming consolidation of the RCU-bh, RCU-preempt, and RCU-sched flavors. The rest: - SRCU updates - Updates to rcutorture and associated scripting. - The usual pile of miscellaneous fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (118 commits) rcutorture: Fix rcu_barrier successes counter rcutorture: Add support to detect if boost kthread prio is too low rcutorture: Use monotonic timestamp for stall detection rcutorture: Make boost test more robust rcutorture: Disable RT throttling for boost tests rcutorture: Emphasize testing of single reader protection type rcutorture: Handle extended read-side critical sections rcutorture: Make rcu_torture_timer() use rcu_torture_one_read() rcutorture: Use per-CPU random state for rcu_torture_timer() rcutorture: Use atomic increment for n_rcu_torture_timers rcutorture: Extract common code from rcu_torture_reader() rcuperf: Remove unused torturing_tasks() function rcu: Remove rcutorture test version and sequence number rcutorture: Change units of onoff_interval to jiffies rcu: Assign higher prio to RCU threads if rcutorture is built-in rculist: Improve documentation for list_for_each_entry_from_rcu() srcu: Add grace-period number to rcutorture statistics printout rcu: Print stall-warning NMI dyntick state in hexadecimal MAINTAINERS: Update RCU, SRCU, and TORTURE-TEST entries rcu: Make rcu_seq_diff() more exact ...	2018-08-13 10:49:41 -07:00
Linus Torvalds	d0daaeaf60	Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull genirq updates from Thomas Gleixner: "The irq departement provides: - A synchronization fix for free_irq() to synchronize just the removed interrupt thread on shared interrupt lines. - Consolidate the multi low level interrupt entry handling and mvoe it to the generic code instead of adding yet another copy for RISC-V - Refactoring of the ARM LPI allocator and LPI exposure to the hypervisor - Yet another interrupt chip driver for the JZ4725B SoC - Speed up for /proc/interrupts as people seem to love reading this file with high frequency - Miscellaneous fixes and updates" * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) irqchip/gic-v3-its: Make its_lock a raw_spin_lock_t genirq/irqchip: Remove MULTI_IRQ_HANDLER as it's now obselete openrisc: Use the new GENERIC_IRQ_MULTI_HANDLER arm64: Use the new GENERIC_IRQ_MULTI_HANDLER ARM: Convert to GENERIC_IRQ_MULTI_HANDLER irqchip: Port the ARM IRQ drivers to GENERIC_IRQ_MULTI_HANDLER irqchip/gic-v3-its: Reduce minimum LPI allocation to 1 for PCI devices dt-bindings: irqchip: renesas-irqc: Document r8a77980 support dt-bindings: irqchip: renesas-irqc: Document r8a77470 support irqchip/ingenic: Add support for the JZ4725B SoC irqchip/stm32: Add exti0 translation for stm32mp1 genirq: Remove redundant NULL pointer check in __free_irq() irqchip/gic-v3-its: Honor hypervisor enforced LPI range irqchip/gic-v3: Expose GICD_TYPER in the rdist structure irqchip/gic-v3-its: Drop chunk allocation compatibility irqchip/gic-v3-its: Move minimum LPI requirements to individual busses irqchip/gic-v3-its: Use full range of LPIs irqchip/gic-v3-its: Refactor LPI allocator genirq: Synchronize only with single thread on free_irq() genirq: Update code comments wrt recycled thread_mask ...	2018-08-13 10:47:26 -07:00
David S. Miller	c1617fb4c5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2018-08-13 The following pull-request contains BPF updates for your net-next tree. The main changes are: 1) Add driver XDP support for veth. This can be used in conjunction with redirect of another XDP program e.g. sitting on NIC so the xdp_frame can be forwarded to the peer veth directly without modification, from Toshiaki. 2) Add a new BPF map type REUSEPORT_SOCKARRAY and prog type SK_REUSEPORT in order to provide more control and visibility on where a SO_REUSEPORT sk should be located, and the latter enables to directly select a sk from the bpf map. This also enables map-in-map for application migration use cases, from Martin. 3) Add a new BPF helper bpf_skb_ancestor_cgroup_id() that returns the id of cgroup v2 that is the ancestor of the cgroup associated with the skb at the ancestor_level, from Andrey. 4) Implement BPF fs map pretty-print support based on BTF data for regular hash table and LRU map, from Yonghong. 5) Decouple the ability to attach BTF for a map from the key and value pretty-printer in BPF fs, and enable further support of BTF for maps for percpu and LPM trie, from Daniel. 6) Implement a better BPF sample of using XDP's CPU redirect feature for load balancing SKB processing to remote CPU. The sample implements the same XDP load balancing as Suricata does which is symmetric hash based on IP and L4 protocol, from Jesper. 7) Revert adding NULL pointer check with WARN_ON_ONCE() in __xdp_return()'s critical path as it is ensured that the allocator is present, from Björn. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-08-13 10:07:23 -07:00
Helge Deller	93cb8e20d5	parisc: Drop architecture-specific ENOTSUP define parisc is the only Linux architecture which has defined a value for ENOTSUP. All other architectures #define ENOTSUP as EOPNOTSUPP in their libc headers. Having an own value for ENOTSUP which is different than EOPNOTSUPP often gives problems with userspace programs which expect both to be the same. One such example is a build error in the libuv package, as can be seen in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=900237. Since we dropped HP-UX support, there is no real benefit in keeping an own value for ENOTSUP. This patch drops the parisc value for ENOTSUP from the kernel sources. glibc needs no patch, it reuses the exported headers. Signed-off-by: Helge Deller <deller@gmx.de>	2018-08-13 09:30:41 +02:00
Daniel Borkmann	e8d2bec045	bpf: decouple btf from seq bpf fs dump and enable more maps Commit `a26ca7c982` ("bpf: btf: Add pretty print support to the basic arraymap") and `699c86d6ec` ("bpf: btf: add pretty print for hash/lru_hash maps") enabled support for BTF and dumping via BPF fs for array and hash/lru map. However, both can be decoupled from each other such that regular BPF maps can be supported for attaching BTF key/value information, while not all maps necessarily need to dump via map_seq_show_elem() callback. The basic sanity check which is a prerequisite for all maps is that key/value size has to match in any case, and some maps can have extra checks via map_check_btf() callback, e.g. probing certain types or indicating no support in general. With that we can also enable retrieving BTF info for per-cpu map types and lpm. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com>	2018-08-13 00:52:45 +02:00
Linus Torvalds	b5b1404d08	init: rename and re-order boot_cpu_state_init() This is purely a preparatory patch for upcoming changes during the 4.19 merge window. We have a function called "boot_cpu_state_init()" that isn't really about the bootup cpu state: that is done much earlier by the similarly named "boot_cpu_init()" (note lack of "state" in name). This function initializes some hotplug CPU state, and needs to run after the percpu data has been properly initialized. It even has a comment to that effect. Except it _doesn't_ actually run after the percpu data has been properly initialized. On x86 it happens to do that, but on at least arm and arm64, the percpu base pointers are initialized by the arch-specific 'smp_prepare_boot_cpu()' hook, which ran _after_ boot_cpu_state_init(). This had some unexpected results, and in particular we have a patch pending for the merge window that did the obvious cleanup of using 'this_cpu_write()' in the cpu hotplug init code: - per_cpu_ptr(&cpuhp_state, smp_processor_id())->state = CPUHP_ONLINE; + this_cpu_write(cpuhp_state.state, CPUHP_ONLINE); which is obviously the right thing to do. Except because of the ordering issue, it actually failed miserably and unexpectedly on arm64. So this just fixes the ordering, and changes the name of the function to be 'boot_cpu_hotplug_init()' to make it obvious that it's about cpu hotplug state, because the core CPU state was supposed to have already been done earlier. Marked for stable, since the (not yet merged) patch that will show this problem is marked for stable. Reported-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Mian Yousaf Kaukab <yousaf.kaukab@suse.com> Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-12 12:19:42 -07:00
David S. Miller	6a92ef08a1	Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net	2018-08-11 17:52:00 -07:00
Jann Horn	42a0cc3478	sys: don't hold uts_sem while accessing userspace memory Holding uts_sem as a writer while accessing userspace memory allows a namespace admin to stall all processes that attempt to take uts_sem. Instead, move data through stack buffers and don't access userspace memory while uts_sem is held. Cc: stable@vger.kernel.org Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2018-08-11 02:05:53 -05:00
Jann Horn	5820f140ed	userns: move user access out of the mutex The old code would hold the userns_state_mutex indefinitely if memdup_user_nul stalled due to e.g. a userfault region. Prevent that by moving the memdup_user_nul in front of the mutex_lock(). Note: This changes the error precedence of invalid buf/count/*ppos vs map already written / capabilities missing. Fixes: `22d917d80e` ("userns: Rework the user_namespace adding uid/gid...") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Christian Brauner <christian@brauner.io> Acked-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2018-08-11 02:05:53 -05:00
Martin KaFai Lau	2dbb9b9e6d	bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT This patch adds a BPF_PROG_TYPE_SK_REUSEPORT which can select a SO_REUSEPORT sk from a BPF_MAP_TYPE_REUSEPORT_ARRAY. Like other non SK_FILTER/CGROUP_SKB program, it requires CAP_SYS_ADMIN. BPF_PROG_TYPE_SK_REUSEPORT introduces "struct sk_reuseport_kern" to store the bpf context instead of using the skb->cb[48]. At the SO_REUSEPORT sk lookup time, it is in the middle of transiting from a lower layer (ipv4/ipv6) to a upper layer (udp/tcp). At this point, it is not always clear where the bpf context can be appended in the skb->cb[48] to avoid saving-and-restoring cb[]. Even putting aside the difference between ipv4-vs-ipv6 and udp-vs-tcp. It is not clear if the lower layer is only ipv4 and ipv6 in the future and will it not touch the cb[] again before transiting to the upper layer. For example, in udp_gro_receive(), it uses the 48 byte NAPI_GRO_CB instead of IP[6]CB and it may still modify the cb[] after calling the udp[46]_lib_lookup_skb(). Because of the above reason, if sk->cb is used for the bpf ctx, saving-and-restoring is needed and likely the whole 48 bytes cb[] has to be saved and restored. Instead of saving, setting and restoring the cb[], this patch opts to create a new "struct sk_reuseport_kern" and setting the needed values in there. The new BPF_PROG_TYPE_SK_REUSEPORT and "struct sk_reuseport_(kern\|md)" will serve all ipv4/ipv6 + udp/tcp combinations. There is no protocol specific usage at this point and it is also inline with the current sock_reuseport.c implementation (i.e. no protocol specific requirement). In "struct sk_reuseport_md", this patch exposes data/data_end/len with semantic similar to other existing usages. Together with "bpf_skb_load_bytes()" and "bpf_skb_load_bytes_relative()", the bpf prog can peek anywhere in the skb. The "bind_inany" tells the bpf prog that the reuseport group is bind-ed to a local INANY address which cannot be learned from skb. The new "bind_inany" is added to "struct sock_reuseport" which will be used when running the new "BPF_PROG_TYPE_SK_REUSEPORT" bpf prog in order to avoid repeating the "bind INANY" test on "sk_v6_rcv_saddr/sk->sk_rcv_saddr" every time a bpf prog is run. It can only be properly initialized when a "sk->sk_reuseport" enabled sk is adding to a hashtable (i.e. during "reuseport_alloc()" and "reuseport_add_sock()"). The new "sk_select_reuseport()" is the main helper that the bpf prog will use to select a SO_REUSEPORT sk. It is the only function that can use the new BPF_MAP_TYPE_REUSEPORT_ARRAY. As mentioned in the earlier patch, the validity of a selected sk is checked in run time in "sk_select_reuseport()". Doing the check in verification time is difficult and inflexible (consider the map-in-map use case). The runtime check is to compare the selected sk's reuseport_id with the reuseport_id that we want. This helper will return -EXXX if the selected sk cannot serve the incoming request (e.g. reuseport_id not match). The bpf prog can decide if it wants to do SK_DROP as its discretion. When the bpf prog returns SK_PASS, the kernel will check if a valid sk has been selected (i.e. "reuse_kern->selected_sk != NULL"). If it does , it will use the selected sk. If not, the kernel will select one from "reuse->socks[]" (as before this patch). The SK_DROP and SK_PASS handling logic will be in the next patch. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-08-11 01:58:46 +02:00
Martin KaFai Lau	5dc4c4b7d4	bpf: Introduce BPF_MAP_TYPE_REUSEPORT_SOCKARRAY This patch introduces a new map type BPF_MAP_TYPE_REUSEPORT_SOCKARRAY. To unleash the full potential of a bpf prog, it is essential for the userspace to be capable of directly setting up a bpf map which can then be consumed by the bpf prog to make decision. In this case, decide which SO_REUSEPORT sk to serve the incoming request. By adding BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, the userspace has total control and visibility on where a SO_REUSEPORT sk should be located in a bpf map. The later patch will introduce BPF_PROG_TYPE_SK_REUSEPORT such that the bpf prog can directly select a sk from the bpf map. That will raise the programmability of the bpf prog attached to a reuseport group (a group of sk serving the same IP:PORT). For example, in UDP, the bpf prog can peek into the payload (e.g. through the "data" pointer introduced in the later patch) to learn the application level's connection information and then decide which sk to pick from a bpf map. The userspace can tightly couple the sk's location in a bpf map with the application logic in generating the UDP payload's connection information. This connection info contact/API stays within the userspace. Also, when used with map-in-map, the userspace can switch the old-server-process's inner map to a new-server-process's inner map in one call "bpf_map_update_elem(outer_map, &index, &new_reuseport_array)". The bpf prog will then direct incoming requests to the new process instead of the old process. The old process can finish draining the pending requests (e.g. by "accept()") before closing the old-fds. [Note that deleting a fd from a bpf map does not necessary mean the fd is closed] During map_update_elem(), Only SO_REUSEPORT sk (i.e. which has already been added to a reuse->socks[]) can be used. That means a SO_REUSEPORT sk that is "bind()" for UDP or "bind()+listen()" for TCP. These conditions are ensured in "reuseport_array_update_check()". A SO_REUSEPORT sk can only be added once to a map (i.e. the same sk cannot be added twice even to the same map). SO_REUSEPORT already allows another sk to be created for the same IP:PORT. There is no need to re-create a similar usage in the BPF side. When a SO_REUSEPORT is deleted from the "reuse->socks[]" (e.g. "close()"), it will notify the bpf map to remove it from the map also. It is done through "bpf_sk_reuseport_detach()" and it will only be called if >=1 of the "reuse->sock[]" has ever been added to a bpf map. The map_update()/map_delete() has to be in-sync with the "reuse->socks[]". Hence, the same "reuseport_lock" used by "reuse->socks[]" has to be used here also. Care has been taken to ensure the lock is only acquired when the adding sk passes some strict tests. and freeing the map does not require the reuseport_lock. The reuseport_array will also support lookup from the syscall side. It will return a sock_gen_cookie(). The sock_gen_cookie() is on-demand (i.e. a sk's cookie is not generated until the very first map_lookup_elem()). The lookup cookie is 64bits but it goes against the logical userspace expectation on 32bits sizeof(fd) (and as other fd based bpf maps do also). It may catch user in surprise if we enforce value_size=8 while userspace still pass a 32bits fd during update. Supporting different value_size between lookup and update seems unintuitive also. We also need to consider what if other existing fd based maps want to return 64bits value from syscall's lookup in the future. Hence, reuseport_array supports both value_size 4 and 8, and assuming user will usually use value_size=4. The syscall's lookup will return ENOSPC on value_size=4. It will will only return 64bits value from sock_gen_cookie() when user consciously choose value_size=8 (as a signal that lookup is desired) which then requires a 64bits value in both lookup and update. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-08-11 01:58:46 +02:00
Steven Rostedt (VMware)	f8a79d5c7e	tracepoints: Free early tracepoints after RCU is initialized When enabling trace events via the kernel command line, I hit this warning: WARNING: CPU: 0 PID: 13 at kernel/rcu/srcutree.c:236 check_init_srcu_struct+0xe/0x61 Modules linked in: CPU: 0 PID: 13 Comm: watchdog/0 Not tainted 4.18.0-rc6-test+ #6 Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014 RIP: 0010:check_init_srcu_struct+0xe/0x61 Code: 48 c7 c6 ec 8a 65 b4 e8 ff 79 fe ff 48 89 df 31 f6 e8 f2 fa ff ff 5a 5b 41 5c 5d c3 0f 1f 44 00 00 83 3d 68 94 b8 01 01 75 02 <0f> 0b 48 8b 87 f0 0a 00 00 a8 03 74 45 55 48 89 e5 41 55 41 54 4c RSP: 0000:ffff96eb9ea03e68 EFLAGS: 00010246 RAX: ffff96eb962b5b01 RBX: ffffffffb4a87420 RCX: 0000000000000001 RDX: ffffffffb3107969 RSI: ffff96eb962b5b40 RDI: ffffffffb4a87420 RBP: ffff96eb9ea03eb0 R08: ffffabbd00cd7f48 R09: 0000000000000000 R10: ffff96eb9ea03e68 R11: ffffffffb4a6eec0 R12: ffff96eb962b5b40 R13: ffff96eb9ea03ef8 R14: ffffffffb3107969 R15: ffffffffb3107948 FS: 0000000000000000(0000) GS:ffff96eb9ea00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff96eb13ab2000 CR3: 0000000192a1e001 CR4: 00000000001606f0 Call Trace: <IRQ> ? __call_srcu+0x2d/0x290 ? rcu_process_callbacks+0x26e/0x448 ? allocate_probes+0x2b/0x2b call_srcu+0x13/0x15 rcu_free_old_probes+0x1f/0x21 rcu_process_callbacks+0x2ed/0x448 __do_softirq+0x172/0x336 irq_exit+0x62/0xb2 smp_apic_timer_interrupt+0x161/0x19e apic_timer_interrupt+0xf/0x20 </IRQ> The problem is that the enabling of trace events before RCU is set up will cause SRCU to give this warning. To avoid this, add a list to store probes that need to be freed till after RCU is initialized, and then free them then. Link: http://lkml.kernel.org/r/20180810113554.1df28050@gandalf.local.home Link: http://lkml.kernel.org/r/20180810123517.5e9714ad@gandalf.local.home Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Fixes: `e6753f23d9` ("tracepoint: Make rcuidle tracepoint callers use SRCU") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-08-10 15:32:53 -04:00
Steven Rostedt (VMware)	016f8ffc48	uprobes: Use synchronize_rcu() not synchronize_sched() While debugging another bug, I was looking at all the synchronize() functions being used in kernel/trace, and noticed that trace_uprobes was using synchronize_sched(), with a comment to synchronize with {u,ret}_probe_trace_func(). When looking at those functions, the data is protected with "rcu_read_lock()" and not with "rcu_read_lock_sched()". This is using the wrong synchronize_() function. Link: http://lkml.kernel.org/r/20180809160553.469e1e32@gandalf.local.home Cc: stable@vger.kernel.org Fixes: `70ed91c6ec` ("tracing/uprobes: Support ftrace_event_file base multibuffer") Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-08-10 15:32:28 -04:00
Steven Rostedt (VMware)	e0a568dcd1	tracing: Fix synchronizing to event changes with tracepoint_synchronize_unregister() Now that some trace events can be protected by srcu_read_lock(tracepoint_srcu), we need to make sure all locations that depend on this are also protected. There were many places that did a synchronize_sched() thinking that it was enough to protect againts access to trace events. This use to be the case, but now that we use SRCU for _rcuidle() trace events, they may not be protected by synchronize_sched(), as they may be called in paths that RCU is not watching for preempt disable. Fixes: `e6753f23d9` ("tracepoint: Make rcuidle tracepoint callers use SRCU") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-08-10 15:12:01 -04:00
Colin Ian King	b207de3ec5	ftrace: Remove unused pointer ftrace_swapper_pid Pointer ftrace_swapper_pid is defined but is never used hence it is redundant and can be removed. The use of this variable was removed in commit `345ddcc882` ("ftrace: Have set_ftrace_pid use the bitmap like events do"). Cleans up clang warning: warning: 'ftrace_swapper_pid' defined but not used [-Wunused-const-variable=] Link: http://lkml.kernel.org/r/20180809125609.13142-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-08-10 15:12:01 -04:00
Steven Rostedt (VMware)	3f1756dc21	tracing: More reverting of "tracing: Centralize preemptirq tracepoints and unify their usage" Joel Fernandes created a nice patch that cleaned up the duplicate hooks used by lockdep and irqsoff latency tracer. It made both use tracepoints. But the latency tracer is triggering warnings when using tracepoints to call into the latency tracer's routines. Mainly, they can be called from NMI context. If that happens, then the SRCU may not work properly because on some architectures, SRCU is not safe to be called in both NMI and non-NMI context. This is a partial revert of the clean up patch `c3bc8fd637` ("tracing: Centralize preemptirq tracepoints and unify their usage") that adds back the direct calls into the latency tracer. It also only calls the trace events when not in NMI. Link: http://lkml.kernel.org/r/20180809210654.622445925@goodmis.org Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Fixes: `c3bc8fd637` ("tracing: Centralize preemptirq tracepoints and unify their usage") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-08-10 15:12:00 -04:00
Steven Rostedt (VMware)	f27107fa20	tracing/irqsoff: Handle preempt_count for different configs I was hitting the following warning: WARNING: CPU: 0 PID: 1 at kernel/trace/trace_irqsoff.c:631 tracer_hardirqs_off+0x15/0x2a Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-rc6-test+ #13 Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014 EIP: tracer_hardirqs_off+0x15/0x2a Code: ff 85 c0 74 0e 8b 45 00 8b 50 04 8b 45 04 e8 35 ff ff ff 5d c3 55 64 a1 cc 37 51 c1 a9 ff ff ff 7f 89 e5 53 89 d3 89 ca 75 02 <0f> 0b e8 90 fc ff ff 85 c0 74 07 89 d8 e8 0c ff ff ff 5b 5d c3 55 EAX: 80000000 EBX: c04337f0 ECX: c04338e3 EDX: c04338e3 ESI: c04337f0 EDI: c04338e3 EBP: f2aa1d68 ESP: f2aa1d64 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210046 CR0: 80050033 CR2: 00000000 CR3: 01668000 CR4: 001406f0 Call Trace: trace_irq_disable_rcuidle+0x63/0x6c trace_hardirqs_off+0x26/0x30 default_send_IPI_mask_allbutself_logical+0x31/0x93 default_send_IPI_allbutself+0x37/0x48 native_send_call_func_ipi+0x4d/0x6a smp_call_function_many+0x165/0x19d ? add_nops+0x34/0x34 ? trace_hardirqs_on_caller+0x2d/0x2d ? add_nops+0x34/0x34 smp_call_function+0x1f/0x23 on_each_cpu+0x15/0x43 ? trace_hardirqs_on_caller+0x2d/0x2d ? trace_hardirqs_on_caller+0x2d/0x2d ? trace_irq_disable_rcuidle+0x1/0x6c text_poke_bp+0xa0/0xc2 ? trace_hardirqs_on_caller+0x2d/0x2d arch_jump_label_transform+0xa7/0xcb ? trace_irq_disable_rcuidle+0x5/0x6c __jump_label_update+0x3e/0x6d jump_label_update+0x7d/0x81 static_key_slow_inc_cpuslocked+0x58/0x6d static_key_slow_inc+0x19/0x20 tracepoint_probe_register_prio+0x19e/0x1d1 ? start_critical_timings+0x1c/0x1c tracepoint_probe_register+0xf/0x11 irqsoff_tracer_init+0x21/0xf2 tracer_init+0x16/0x1a trace_selftest_startup_irqsoff+0x25/0xc4 run_tracer_selftest+0xca/0x131 register_tracer+0xd5/0x172 ? trace_event_define_fields_preemptirq_template+0x45/0x45 init_irqsoff_tracer+0xd/0x11 do_one_initcall+0xab/0x1e8 ? rcu_read_lock_sched_held+0x3d/0x44 ? trace_initcall_level+0x52/0x86 kernel_init_freeable+0x195/0x21a ? rest_init+0xb4/0xb4 kernel_init+0xd/0xe4 ret_from_fork+0x2e/0x38 It is due to running a CONFIG_PREEMPT_VOLUNTARY kernel, which would trigger this warning every time: WARN_ON_ONCE(preempt_count()); Because on CONFIG_PREEMPT_VOLUNTARY, preempt_count() is always zero. This warning is to make sure preempt_count is set because tracer_hardirqs_on() does a preempt_enable_notrace() to make the preempt_trace() work properly, as being called by a trace event, the trace event code disables preemption, and the tracer wants to know what the preemption was before it was called. Instead of enabling preemption like this, just record the preempt_count, subtract PREEMPT_DISABLE_OFFSET from it (which is zero with !CONFIG_PREEMPT set), and pass that value to the necessary functions, which should use the passed in parameter instead of calling preempt_count() directly. Fixes: `da5b3ebb45` ("tracing: irqsoff: Account for additional preempt_disable") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-08-10 15:12:00 -04:00
Steven Rostedt (VMware)	bff1b208a5	tracing: Partial revert of "tracing: Centralize preemptirq tracepoints and unify their usage" Joel Fernandes created a nice patch that cleaned up the duplicate hooks used by lockdep and irqsoff latency tracer. It made both use tracepoints. But it caused lockdep to trigger several false positives. We have not figured out why yet, but removing lockdep from using the trace event hooks and just call its helper functions directly (like it use to), makes the problem go away. This is a partial revert of the clean up patch `c3bc8fd637` ("tracing: Centralize preemptirq tracepoints and unify their usage") that adds direct calls for lockdep, but also keeps most of the clean up done to get rid of the horrible preprocessor if statements. Link: http://lkml.kernel.org/r/20180806155058.5ee875f4@gandalf.local.home Cc: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Fixes: `c3bc8fd637` ("tracing: Centralize preemptirq tracepoints and unify their usage") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-08-10 15:11:25 -04:00
Yonghong Song	699c86d6ec	bpf: btf: add pretty print for hash/lru_hash maps Commit `a26ca7c982` ("bpf: btf: Add pretty print support to the basic arraymap") added pretty print support to array map. This patch adds pretty print for hash and lru_hash maps. The following example shows the pretty-print result of a pinned hashmap: struct map_value { int count_a; int count_b; }; cat /sys/fs/bpf/pinned_hash_map: 87907: {87907,87908} 57354: {37354,57355} 76625: {76625,76626} ... Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-08-10 20:54:07 +02:00
Yonghong Song	dc1508a579	bpf: fix bpffs non-array map seq_show issue In function map_seq_next() of kernel/bpf/inode.c, the first key will be the "0" regardless of the map type. This works for array. But for hash type, if it happens key "0" is in the map, the bpffs map show will miss some items if the key "0" is not the first element of the first bucket. This patch fixed the issue by guaranteeing to get the first element, if the seq_show is just started, by passing NULL pointer key to map_get_next_key() callback. This way, no missing elements will occur for bpffs hash table show even if key "0" is in the map. Fixes: `a26ca7c982` ("bpf: btf: Add pretty print support to the basic arraymap") Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-08-10 20:54:07 +02:00
Jesper Dangaard Brouer	1bf9116d08	xdp: fix bug in devmap teardown code path Like cpumap teardown, the devmap teardown code also flush remaining xdp_frames, via bq_xmit_all() in case map entry is removed. The code can call xdp_return_frame_rx_napi, from the the wrong context, in-case ndo_xdp_xmit() fails. Fixes: `389ab7f01a` ("xdp: introduce xdp_return_frame_rx_napi") Fixes: `735fc4054b` ("xdp: change ndo_xdp_xmit API to support bulking") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-08-09 21:50:44 +02:00
Jesper Dangaard Brouer	ad0ab027fc	xdp: fix bug in cpumap teardown code path When removing a cpumap entry, a number of syncronization steps happen. Eventually the teardown code __cpu_map_entry_free is invoked from/via call_rcu. The teardown code __cpu_map_entry_free() flushes remaining xdp_frames, by invoking bq_flush_to_queue, which calls xdp_return_frame_rx_napi(). The issues is that the teardown code is not running in the RX NAPI code path. Thus, it is not allowed to invoke the NAPI variant of xdp_return_frame. This bug was found and triggered by using the --stress-mode option to the samples/bpf program xdp_redirect_cpu. It is hard to trigger, because the ptr_ring have to be full and cpumap bulk queue max contains 8 packets, and a remote CPU is racing to empty the ptr_ring queue. Fixes: `389ab7f01a` ("xdp: introduce xdp_return_frame_rx_napi") Tested-by: Jean-Tsung Hsiao <jhsiao@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-08-09 21:50:44 +02:00
Eric W. Biederman	c3ad2c3b02	signal: Don't restart fork when signals come in. Wen Yang <wen.yang99@zte.com.cn> and majiang <ma.jiang@zte.com.cn> report that a periodic signal received during fork can cause fork to continually restart preventing an application from making progress. The code was being overly pessimistic. Fork needs to guarantee that a signal sent to multiple processes is logically delivered before the fork and just to the forking process or logically delivered after the fork to both the forking process and it's newly spawned child. For signals like periodic timers that are always delivered to a single process fork can safely complete and let them appear to logically delivered after the fork(). While examining this issue I also discovered that fork today will miss signals delivered to multiple processes during the fork and handled by another thread. Similarly the current code will also miss blocked signals that are delivered to multiple process, as those signals will not appear pending during fork. Add a list of each thread that is currently forking, and keep on that list a signal set that records all of the signals sent to multiple processes. When fork completes initialize the new processes shared_pending signal set with it. The calculate_sigpending function will see those signals and set TIF_SIGPENDING causing the new task to take the slow path to userspace to handle those signals. Making it appear as if those signals were received immediately after the fork. It is not possible to send real time signals to multiple processes and exceptions don't go to multiple processes, which means that that are no signals sent to multiple processes that require siginfo. This means it is safe to not bother collecting siginfo on signals sent during fork. The sigaction of a child of fork is initially the same as the sigaction of the parent process. So a signal the parent ignores the child will also initially ignore. Therefore it is safe to ignore signals sent to multiple processes and ignored by the forking process. Signals sent to only a single process or only a single thread and delivered during fork are treated as if they are received after the fork, and generally not dealt with. They won't cause any problems. V2: Added removal from the multiprocess list on failure. V3: Use -ERESTARTNOINTR directly V4: - Don't queue both SIGCONT and SIGSTOP - Initialize signal_struct.multiprocess in init_task - Move setting of shared_pending to before the new task is visible to signals. This prevents signals from comming in before shared_pending.signal is set to delayed.signal and being lost. V5: - rework list add and delete to account for idle threads v6: - Use sigdelsetmask when removing stop signals Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200447 Reported-by: Wen Yang <wen.yang99@zte.com.cn> and Reported-by: majiang <ma.jiang@zte.com.cn> Fixes: `4a2c7a7837` ("[PATCH] make fork() atomic wrt pgrp/session signals") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-08-09 13:07:01 -05:00
Daniel Borkmann	7c81c71730	bpf, sockmap: fix leak in bpf_tcp_sendmsg wait for mem path In bpf_tcp_sendmsg() the sk_alloc_sg() may fail. In the case of ENOMEM, it may also mean that we've partially filled the scatterlist entries with pages. Later jumping to sk_stream_wait_memory() we could further fail with an error for several reasons, however we miss to call free_start_sg() if the local sk_msg_buff was used. Fixes: `4f738adba3` ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-08-08 12:06:17 -07:00
Daniel Borkmann	5121700b34	bpf, sockmap: fix bpf_tcp_sendmsg sock error handling While working on bpf_tcp_sendmsg() code, I noticed that when a sk->sk_err is set we error out with err = sk->sk_err. However this is problematic since sk->sk_err is a positive error value and therefore we will neither go into sk_stream_error() nor will we report an error back to user space. I had this case with EPIPE and user space was thinking sendmsg() succeeded since EPIPE is a positive value, thinking we submitted 32 bytes. Fix it by negating the sk->sk_err value. Fixes: `4f738adba3` ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-08-08 12:06:17 -07:00
David S. Miller	1ba982806c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2018-08-07 The following pull-request contains BPF updates for your net-next tree. The main changes are: 1) Add cgroup local storage for BPF programs, which provides a fast accessible memory for storing various per-cgroup data like number of transmitted packets, etc, from Roman. 2) Support bpf_get_socket_cookie() BPF helper in several more program types that have a full socket available, from Andrey. 3) Significantly improve the performance of perf events which are reported from BPF offload. Also convert a couple of BPF AF_XDP samples overto use libbpf, both from Jakub. 4) seg6local LWT provides the End.DT6 action, which allows to decapsulate an outer IPv6 header containing a Segment Routing Header. Adds this action now to the seg6local BPF interface, from Mathieu. 5) Do not mark dst register as unbounded in MOV64 instruction when both src and dst register are the same, from Arthur. 6) Define u_smp_rmb() and u_smp_wmb() to their respective barrier instructions on arm64 for the AF_XDP sample code, from Brian. 7) Convert the tcp_client.py and tcp_server.py BPF selftest scripts over from Python 2 to Python 3, from Jeremy. 8) Enable BTF build flags to the BPF sample code Makefile, from Taeung. 9) Remove an unnecessary rcu_read_lock() in run_lwt_bpf(), from Taehee. 10) Several improvements to the README.rst from the BPF documentation to make it more consistent with RST format, from Tobin. 11) Replace all occurrences of strerror() by calls to strerror_r() in libbpf and fix a FORTIFY_SOURCE build error along with it, from Thomas. 12) Fix a bug in bpftool's get_btf() function to correctly propagate an error via PTR_ERR(), from Yue. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-08-07 11:02:05 -07:00
Roman Gushchin	85fc4b16aa	bpf: introduce update_effective_progs() __cgroup_bpf_attach() and __cgroup_bpf_detach() functions have a good amount of duplicated code, which is possible to eliminate by introducing the update_effective_progs() helper function. The update_effective_progs() calls compute_effective_progs() and then in case of success it calls activate_effective_progs() for each descendant cgroup. In case of failure (OOM), it releases allocated prog arrays and return the error code. Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-08-07 14:29:55 +02:00
Thomas Gleixner	bc2d8d262c	cpu/hotplug: Fix SMT supported evaluation Josh reported that the late SMT evaluation in cpu_smt_state_init() sets cpu_smt_control to CPU_SMT_NOT_SUPPORTED in case that 'nosmt' was supplied on the kernel command line as it cannot differentiate between SMT disabled by BIOS and SMT soft disable via 'nosmt'. That wreckages the state and makes the sysfs interface unusable. Rework this so that during bringup of the non boot CPUs the availability of SMT is determined in cpu_smt_allowed(). If a newly booted CPU is not a 'primary' thread then set the local cpu_smt_available marker and evaluate this explicitely right after the initial SMP bringup has finished. SMT evaulation on x86 is a trainwreck as the firmware has all the information _before_ booting the kernel, but there is no interface to query it. Fixes: `73d5e2b472` ("cpu/hotplug: detect SMT disabled by BIOS") Reported-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2018-08-07 12:25:30 +02:00
Joel Fernandes (Google)	da5b3ebb45	tracing: irqsoff: Account for additional preempt_disable Recently we tried to make the preemptirqsoff tracer to use irqsoff tracepoint probes. However this causes issues as reported by Masami: [2.271078] Testing tracer preemptirqsoff: .. no entries found ..FAILED! [2.381015] WARNING: CPU: 0 PID: 1 at /home/mhiramat/ksrc/linux/kernel/ trace/trace.c:1512 run_tracer_selftest+0xf3/0x154 This is due to the tracepoint code increasing the preempt nesting count by calling an additional preempt_disable before calling into the preemptoff tracer which messes up the preempt_count() check in tracer_hardirqs_off. To fix this, make the irqsoff tracer probes balance the additional outer preempt_disable with a preempt_enable_notrace. The other way to fix this is to just use SRCU for all tracepoints. However we can't do that because we can't use NMIs from RCU context. Link: http://lkml.kernel.org/r/20180806034049.67949-1-joel@joelfernandes.org Fixes: `c3bc8fd637` ("tracing: Centralize preemptirq tracepoints and unify their usage") Fixes: `e6753f23d9` ("tracepoint: Make rcuidle tracepoint callers use SRCU") Reported-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-08-06 21:55:24 -04:00
Thomas Gleixner	9e90c79852	irqchip updates for 4.19 - GICv3 ITS LPI allocation revamp - GICv3 support for hypervisor-enforced LPI range - GICv3 ITS conversion to raw spinlock -----BEGIN PGP SIGNATURE----- iQJJBAABCAAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAltoBXMVHG1hcmMuenlu Z2llckBhcm0uY29tAAoJECPQ0LrRPXpDyUYP/1feAq3F7ZmhCIZka4c6y/m4EBpq BjWEEgOAGMEyyB4s98flsRtZcEUxxp6CqEXo2FgCsd1Nj+og7oA7vwOlqy3aGzsi 9f/Z5Wi6SlG06lH5tmYNkyVbGk2tE3s2FzkH5Rg8qZGk+X3OCOdNs/+G20pYAkSp ESePWSapbQUJSExJ1MqzfdHFidtVA1V+ev8BKdIp2ykl1NRae8LJeKHIbqac49Ym JclfCLFpQM1M1ElB9j0E8hAvZhz10oOz7TtBR737O/1QEifVyFqGBckPzldvwIJM zZ+nR+Yzj1ruD109xwaF1iKy9AinZWhiqrtN7UXJ3jwHtNih+sy0R6FQ38GMNoOC 0K02n/qStR5xglGr4BmAcWlOuFtBYWfz6HpSVMqaTWWmOxHEiqS6pXtEA+dV/YyI wHLbo0YzpWTQm6t1+b/PoByAJ0/hOcD1nOD57b+NGjX7tZV0sGjpGsecvFhTSywh BN3COBi9k/FOBrOTGDX1qUAI+mEf76vc2BAC+BkkoiiMg3WlY0E9qfQJguUxHdrb 0LS3lDZoHCNoz8RZLrUyenTT0NYGcjPGUTinMDJWG79VGXOWFexTDdCuX0kF90CK 1Zie3O6lrTYolmaiyLUxwukKp1SVUyoA5IpKVwfDJQYUhEfk27yvlzg2MBMcHDRA uy3QSkmjx9vw/sAu =gKw8 -----END PGP SIGNATURE----- Merge tag 'irqchip-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core Pull irqchip updates from Marc Zyngier: - GICv3 ITS LPI allocation revamp - GICv3 support for hypervisor-enforced LPI range - GICv3 ITS conversion to raw spinlock	2018-08-06 12:45:42 +02:00
Pingfan Liu	55f2503c3b	PM / reboot: Eliminate race between reboot and suspend At present, "systemctl suspend" and "shutdown" can run in parrallel. A system can suspend after devices_shutdown(), and resume. Then the shutdown task goes on to power off. This causes many devices are not really shut off. Hence replacing reboot_mutex with system_transition_mutex (renamed from pm_mutex) to achieve the exclusion. The renaming of pm_mutex as system_transition_mutex can be better to reflect the purpose of the mutex. Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2018-08-06 12:35:20 +02:00
Gustavo A. R. Silva	82837ad5bd	PM / hibernate: Mark expected switch fall-through In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. This addresses Coverity-ID: 114713 ("Missing break in switch"). Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2018-08-06 10:32:32 +02:00
Rafael J. Wysocki	6ccbe1dcdd	Merge back cpufreq changes for 4.19.	2018-08-06 10:09:52 +02:00
Jens Axboe	05b9ba4b55	Linux 4.18-rc6 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAltU8z0eHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG5X8H/2fJr7m3k242+t76 sitwvx1eoPqTgryW59dRKm9IuXAGA+AjauvHzaz1QxomeQa50JghGWefD0eiJfkA 1AphQ/24EOiAbbVk084dAI/C2p122dE4D5Fy7CrfLnuouyrbFaZI5STbnrRct7sR 9deeYW0GDHO1Uenp4WDCj0baaqJqaevZ+7GG09DnWpya2nQtSkGBjqn6GpYmrfOU mqFuxAX8mEOW6cwK16y/vYtnVjuuMAiZ63/OJ8AQ6d6ArGLwAsdn7f8Fn4I4tEr2 L0d3CRLUyegms4++Dmlu05k64buQu46WlPhjCZc5/Ts4kjrNxBuHejj2/jeSnUSt vJJlibI= =42a5 -----END PGP SIGNATURE----- Merge tag 'v4.18-rc6' into for-4.19/block2 Pull in 4.18-rc6 to get the NVMe core AEN change to avoid a merge conflict down the line. Signed-of-by: Jens Axboe <axboe@kernel.dk>	2018-08-05 19:32:09 -06:00
David S. Miller	c1c8626fce	Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net Lots of overlapping changes, mostly trivial in nature. The mlxsw conflict was resolving using the example resolution at: https://github.com/jpirko/linux_mlxsw/blob/combined_queue/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c Signed-off-by: David S. Miller <davem@davemloft.net>	2018-08-05 13:04:31 -07:00
Prasad Sodagudi	cfd355145c	stop_machine: Atomically queue and wake stopper threads When cpu_stop_queue_work() releases the lock for the stopper thread that was queued into its wake queue, preemption is enabled, which leads to the following deadlock: CPU0 CPU1 sched_setaffinity(0, ...) __set_cpus_allowed_ptr() stop_one_cpu(0, ...) stop_two_cpus(0, 1, ...) cpu_stop_queue_work(0, ...) cpu_stop_queue_two_works(0, ..., 1, ...) -grabs lock for migration/0- -spins with preemption disabled, waiting for migration/0's lock to be released- -adds work items for migration/0 and queues migration/0 to its wake_q- -releases lock for migration/0 and preemption is enabled- -current thread is preempted, and __set_cpus_allowed_ptr has changed the thread's cpu allowed mask to CPU1 only- -acquires migration/0 and migration/1's locks- -adds work for migration/0 but does not add migration/0 to wake_q, since it is already in a wake_q- -adds work for migration/1 and adds migration/1 to its wake_q- -releases migration/0 and migration/1's locks, wakes migration/1, and enables preemption- -since migration/1 is requested to run, migration/1 begins to run and waits on migration/0, but migration/0 will never be able to run, since the thread that can wake it is affine to CPU1- Disable preemption in cpu_stop_queue_work() before queueing works for stopper threads, and queueing the stopper thread in the wake queue, to ensure that the operation of queueing the works and waking the stopper threads is atomic. Fixes: `0b26351b91` ("stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock") Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org> Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: peterz@infradead.org Cc: matt@codeblueprint.co.uk Cc: bigeasy@linutronix.de Cc: gregkh@linuxfoundation.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1533329766-4856-1-git-send-email-isaacm@codeaurora.org Co-Developed-by: Isaac J. Manjarres <isaacm@codeaurora.org>	2018-08-05 21:58:31 +02:00
Linus Torvalds	2f3672cbf9	Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "Two oneliners addressing NOHZ failures: - Use a bitmask to check for the pending timer softirq and not the bit number. The existing code using the bit number checked for the wrong bit, which caused timers to either expire late or stop completely. - Make the nohz evaluation on interrupt exit more robust. The existing code did not re-arm the hardware when interrupting a running softirq in task context (ksoftirqd or tail of local_bh_enable()), which caused timers to either expire late or stop completely" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: nohz: Fix missing tick reprogram when interrupting an inline softirq nohz: Fix local_timer_softirq_pending()	2018-08-05 09:25:29 -07:00
Thomas Gleixner	f2701b77bb	Merge 4.18-rc7 into master to pick up the KVM dependcy Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2018-08-05 16:39:29 +02:00
Eric W. Biederman	924de3b8c9	fork: Have new threads join on-going signal group stops There are only two signals that are delivered to every member of a signal group: SIGSTOP and SIGKILL. Signal delivery requires every signal appear to be delivered either before or after a clone syscall. SIGKILL terminates the clone so does not need to be considered. Which leaves only SIGSTOP that needs to be considered when creating new threads. Today in the event of a group stop TIF_SIGPENDING will get set and the fork will restart ensuring the fork syscall participates in the group stop. A fork (especially of a process with a lot of memory) is one of the most expensive system so we really only want to restart a fork when necessary. It is easy so check to see if a SIGSTOP is ongoing and have the new thread join it immediate after the clone completes. Making it appear the clone completed happened just before the SIGSTOP. The calculate_sigpending function will see the bits set in jobctl and set TIF_SIGPENDING to ensure the new task takes the slow path to userspace. V2: The call to task_join_group_stop was moved before the new task is added to the thread group list. This should not matter as sighand->siglock is held over both the addition of the threads, the call to task_join_group_stop and do_signal_stop. But the change is trivial and it is one less thing to worry about when reading the code. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-08-03 20:20:14 -05:00
Eric W. Biederman	088fe47ce9	signal: Add calculate_sigpending() Add a function calculate_sigpending to test to see if any signals are pending for a new task immediately following fork. Signals have to happen either before or after fork. Today our practice is to push all of the signals to before the fork, but that has the downside that frequent or periodic signals can make fork take much much longer than normal or prevent fork from completing entirely. So we need move signals that we can after the fork to prevent that. This updates the code to set TIF_SIGPENDING on a new task if there are signals or other activities that have moved so that they appear to happen after the fork. As the code today restarts if it sees any such activity this won't immediately have an effect, as there will be no reason for it to set TIF_SIGPENDING immediately after the fork. Adding calculate_sigpending means the code in fork can safely be changed to not always restart if a signal is pending. The new calculate_sigpending function sets sigpending if there are pending bits in jobctl, pending signals, the freezer needs to freeze the new task or the live kernel patching framework need the new thread to take the slow path to userspace. I have verified that setting TIF_SIGPENDING does make a new process take the slow path to userspace before it executes it's first userspace instruction. I have looked at the callers of signal_wake_up and the code paths setting TIF_SIGPENDING and I don't see anything else that needs to be handled. The code probably doesn't need to set TIF_SIGPENDING for the kernel live patching as it uses a separate thread flag as well. But at this point it seems safer reuse the recalc_sigpending logic and get the kernel live patching folks to sort out their story later. V2: I have moved the test into schedule_tail where siglock can be grabbed and recalc_sigpending can be reused directly. Further as the last action of setting up a new task this guarantees that TIF_SIGPENDING will be properly set in the new process. The helper calculate_sigpending takes the siglock and uncontitionally sets TIF_SIGPENDING and let's recalc_sigpending clear TIF_SIGPENDING if it is unnecessary. This allows reusing the existing code and keeps maintenance of the conditions simple. Oleg Nesterov <oleg@redhat.com> suggested the movement and pointed out the need to take siglock if this code was going to be called while the new task is discoverable. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-08-03 20:10:31 -05:00
Frederic Weisbecker	0a0e0829f9	nohz: Fix missing tick reprogram when interrupting an inline softirq The full nohz tick is reprogrammed in irq_exit() only if the exit is not in a nesting interrupt. This stands as an optimization: whether a hardirq or a softirq is interrupted, the tick is going to be reprogrammed when necessary at the end of the inner interrupt, with even potential new updates on the timer queue. When soft interrupts are interrupted, it's assumed that they are executing on the tail of an interrupt return. In that case tick_nohz_irq_exit() is called after softirq processing to take care of the tick reprogramming. But the assumption is wrong: softirqs can be processed inline as well, ie: outside of an interrupt, like in a call to local_bh_enable() or from ksoftirqd. Inline softirqs don't reprogram the tick once they are done, as opposed to interrupt tail softirq processing. So if a tick interrupts an inline softirq processing, the next timer will neither be reprogrammed from the interrupting tick's irq_exit() nor after the interrupted softirq processing. This situation may leave the tick unprogrammed while timers are armed. To fix this, simply keep reprogramming the tick even if a softirq has been interrupted. That can be optimized further, but for now correctness is more important. Note that new timers enqueued in nohz_full mode after a softirq gets interrupted will still be handled just fine through self-IPIs triggered by the timer code. Reported-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: stable@vger.kernel.org # 4.14+ Link: https://lkml.kernel.org/r/1533303094-15855-1-git-send-email-frederic@kernel.org	2018-08-03 15:52:10 +02:00
Joel Fernandes (Google)	da25a672cf	trace: Use rcu_dereference_raw for hooks from trace-event subsystem Since we switched to using SRCU for tracepoints used in the idle path, we can no longer use rcu_dereference_sched for dereferencing points in trace-event hooks. Since tracepoints can now use either SRCU or sched-RCU, just use rcu_dereference_raw for traceevents just like we're doing when dereferencing the tracepoint table. This prevents an RCU warning reported by Masami: [ 282.060593] WARNING: can't dereference registers at 00000000f3c7f62b [ 282.063200] ============================= [ 282.064082] WARNING: suspicious RCU usage [ 282.064963] 4.18.0-rc6+ #15 Tainted: G W [ 282.066048] ----------------------------- [ 282.066923] /home/mhiramat/ksrc/linux/kernel/trace/trace_events.c:242 suspicious rcu_dereference_check() usage! [ 282.068974] [ 282.068974] other info that might help us debug this: [ 282.068974] [ 282.070770] [ 282.070770] RCU used illegally from idle CPU! [ 282.070770] rcu_scheduler_active = 2, debug_locks = 1 [ 282.072938] RCU used illegally from extended quiescent state! [ 282.074183] no locks held by swapper/0/0. [ 282.075071] [ 282.075071] stack backtrace: [ 282.076121] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W [ 282.077782] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) [ 282.079604] Call Trace: [ 282.080212] <IRQ> [ 282.080755] dump_stack+0x85/0xcb [ 282.081523] trace_event_ignore_this_pid+0x66/0x70 [ 282.082541] trace_event_raw_event_preemptirq_template+0xa2/0xb0 [ 282.083774] ? interrupt_entry+0xc4/0xe0 [ 282.084665] ? trace_hardirqs_off_thunk+0x1a/0x1c [ 282.085669] trace_hardirqs_off_caller+0x90/0xd0 [ 282.086597] trace_hardirqs_off_thunk+0x1a/0x1c [ 282.087433] ? call_function_interrupt+0xa/0x20 [ 282.088201] interrupt_entry+0xc4/0xe0 [ 282.088848] ? call_function_interrupt+0xa/0x20 [ 282.089579] </IRQ> [ 282.090029] ? native_safe_halt+0x2/0x10 [ 282.090695] ? default_idle+0x1f/0x160 [ 282.091330] ? default_idle_call+0x24/0x40 [ 282.091997] ? do_idle+0x210/0x250 [ 282.092658] ? cpu_startup_entry+0x6f/0x80 [ 282.093338] ? start_kernel+0x49d/0x4bd [ 282.093987] ? secondary_startup_64+0xa5/0xb0 Link: http://lkml.kernel.org/r/20180803023407.225852-1-joel@joelfernandes.org Reported-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Masami Hiramatsu <mhiramat@kernel.org> Fixes: `e6753f23d9` ("tracepoint: Make rcuidle tracepoint callers use SRCU") Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-08-03 09:38:39 -04:00
Thomas Gleixner	d1f0301b33	genirq: Make force irq threading setup more robust The support of force threading interrupts which are set up with both a primary and a threaded handler wreckaged the setup of regular requested threaded interrupts (primary handler == NULL). The reason is that it does not check whether the primary handler is set to the default handler which wakes the handler thread. Instead it replaces the thread handler with the primary handler as it would do with force threaded interrupts which have been requested via request_irq(). So both the primary and the thread handler become the same which then triggers the warnon that the thread handler tries to wakeup a not configured secondary thread. Fortunately this only happens when the driver omits the IRQF_ONESHOT flag when requesting the threaded interrupt, which is normaly caught by the sanity checks when force irq threading is disabled. Fix it by skipping the force threading setup when a regular threaded interrupt is requested. As a consequence the interrupt request which lacks the IRQ_ONESHOT flag is rejected correctly instead of silently wreckaging it. Fixes: `2a1d3ab898` ("genirq: Handle force threading of irqs with primary and thread handler") Reported-by: Kurt Kanzenbach <kurt.kanzenbach@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Kurt Kanzenbach <kurt.kanzenbach@linutronix.de> Cc: stable@vger.kernel.org	2018-08-03 15:19:01 +02:00
Sinan Kaya	1b6266ebe3	watchdog: Reduce message verbosity Code is emitting the following error message during boot on systems without PMU hardware support while probing NMI capability. NMI watchdog: Perf event create on CPU 0 failed with -2 This error is emitted as the perf subsystem returns -ENOENT due to lack of PMUs in the system. It is followed by the warning that NMI watchdog is disabled: NMI watchdog: Perf NMI watchdog permanently disabled While NMI disabled information is useful for ordinary users, seeing a PERF event create failed with error code -2 is not. Reduce the message severity to debug so that if debugging is still possible in case the error code returned by perf is required for analysis. Signed-off-by: Sinan Kaya <okaya@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Don Zickus <dzickus@redhat.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Colin Ian King <colin.king@canonical.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=599368 Link: https://lkml.kernel.org/r/20180803060943.2643-1-okaya@kernel.org	2018-08-03 12:19:08 +02:00
Palmer Dabbelt	4f7799d96e	genirq/irqchip: Remove MULTI_IRQ_HANDLER as it's now obselete Now that every user of MULTI_IRQ_HANDLER has been convereted over to use GENERIC_IRQ_MULTI_HANDLER remove the references to MULTI_IRQ_HANDLER. Signed-off-by: Palmer Dabbelt <palmer@sifive.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux@armlinux.org.uk Cc: catalin.marinas@arm.com Cc: Will Deacon <will.deacon@arm.com> Cc: jonas@southpole.se Cc: stefan.kristiansson@saunalahti.fi Cc: shorne@gmail.com Cc: jason@lakedaemon.net Cc: marc.zyngier@arm.com Cc: Arnd Bergmann <arnd@arndb.de> Cc: nicolas.pitre@linaro.org Cc: vladimir.murzin@arm.com Cc: keescook@chromium.org Cc: jinb.park7@gmail.com Cc: yamada.masahiro@socionext.com Cc: alexandre.belloni@bootlin.com Cc: pombredanne@nexb.com Cc: Greg KH <gregkh@linuxfoundation.org> Cc: kstewart@linuxfoundation.org Cc: jhogan@kernel.org Cc: mark.rutland@arm.com Cc: ard.biesheuvel@linaro.org Cc: james.morse@arm.com Cc: linux-arm-kernel@lists.infradead.org Cc: openrisc@lists.librecores.org Link: https://lkml.kernel.org/r/20180622170126.6308-6-palmer@sifive.com	2018-08-03 12:14:10 +02:00
Roman Gushchin	cd33943176	bpf: introduce the bpf_get_local_storage() helper function The bpf_get_local_storage() helper function is used to get a pointer to the bpf local storage from a bpf program. It takes a pointer to a storage map and flags as arguments. Right now it accepts only cgroup storage maps, and flags argument has to be 0. Further it can be extended to support other types of local storage: e.g. thread local storage etc. Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-08-03 00:47:32 +02:00
Roman Gushchin	7b5dd2bde7	bpf: don't allow create maps of cgroup local storages As there is one-to-one relation between a bpf program and cgroup local storage map, there is no sense in creating a map of cgroup local storage maps. Forbid it explicitly to avoid possible side effects. Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-08-03 00:47:32 +02:00
Roman Gushchin	3e6a4b3e02	bpf/verifier: introduce BPF_PTR_TO_MAP_VALUE BPF_MAP_TYPE_CGROUP_STORAGE maps are special in a way that the access from the bpf program side is lookup-free. That means the result is guaranteed to be a valid pointer to the cgroup storage; no NULL-check is required. This patch introduces BPF_PTR_TO_MAP_VALUE return type, which is required to cause the verifier accept programs, which are not checking the map value pointer for being NULL. Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-08-03 00:47:32 +02:00
Roman Gushchin	394e40a297	bpf: extend bpf_prog_array to store pointers to the cgroup storage This patch converts bpf_prog_array from an array of prog pointers to the array of struct bpf_prog_array_item elements. This allows to save a cgroup storage pointer for each bpf program efficiently attached to a cgroup. Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-08-03 00:47:32 +02:00
Roman Gushchin	d7bf2c10af	bpf: allocate cgroup storage entries on attaching bpf programs If a bpf program is using cgroup local storage, allocate a bpf_cgroup_storage structure automatically on attaching the program to a cgroup and save the pointer into the corresponding bpf_prog_list entry. Analogically, release the cgroup local storage on detaching of the bpf program. Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-08-03 00:47:32 +02:00
Roman Gushchin	aa0ad5b039	bpf: pass a pointer to a cgroup storage using pcpu variable This commit introduces the bpf_cgroup_storage_set() helper, which will be used to pass a pointer to a cgroup storage to the bpf helper. Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-08-03 00:47:32 +02:00
Roman Gushchin	de9cbbaadb	bpf: introduce cgroup storage maps This commit introduces BPF_MAP_TYPE_CGROUP_STORAGE maps: a special type of maps which are implementing the cgroup storage. >From the userspace point of view it's almost a generic hash map with the (cgroup inode id, attachment type) pair used as a key. The only difference is that some operations are restricted: 1) a user can't create new entries, 2) a user can't remove existing entries. The lookup from userspace is o(log(n)). Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-08-03 00:47:32 +02:00
Roman Gushchin	0a4c58f570	bpf: add ability to charge bpf maps memory dynamically This commits extends existing bpf maps memory charging API to support dynamic charging/uncharging. This is required to account memory used by maps, if all entries are created dynamically after the map initialization. Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-08-03 00:47:31 +02:00
David S. Miller	89b1698c93	Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net The BTF conflicts were simple overlapping changes. The virtio_net conflict was an overlap of a fix of statistics counter, happening alongisde a move over to a bonafide statistics structure rather than counting value on the stack. Signed-off-by: David S. Miller <davem@davemloft.net>	2018-08-02 10:55:32 -07:00
Masami Hiramatsu	6bc6c77cfc	tracing/kprobes: Fix within_notrace_func() to check only notrace functions Fix within_notrace_func() to check only notrace functions and to ignore the kprobe-event which can not solve symbol addresses. within_notrace_func() returns true if the given kprobe events probe point seems to be out-of-range. But that is not the correct place to check for it, it should be done in kprobes afterwards. kprobe-events allow users to define a probe point on "currently unloaded module" so that it can trace the event during module load. In this case, the user will put a probe on a symbol which is not in kallsyms yet and it hits the within_notrace_func(). As a result, kprobe-events always refuses if user defines a probe on a "currenly unloaded module". Fixes: commit `45408c4f92` ("tracing: kprobes: Prohibit probing on notrace function") Link: http://lkml.kernel.org/r/153319624799.29007.13604430345640129926.stgit@devbox Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-08-02 12:34:41 -04:00
zhong jiang	9be936f4b3	kernel/module: Use kmemdup to replace kmalloc+memcpy we prefer to the kmemdup rather than kmalloc+memcpy. so just replace them. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: Jessica Yu <jeyu@kernel.org>	2018-08-02 18:03:17 +02:00
Peter Zijlstra	b80a2bfce8	stop_machine: Reflow cpu_stop_queue_two_works() The code flow in cpu_stop_queue_two_works() is a little arcane; fix this by lifting the preempt_disable() to the top to create more natural nesting wrt the spinlocks and make the wake_up_q() and preempt_enable() unconditional at the end. Furthermore, enable preemption in the -EDEADLK case, such that we spin-wait with preemption enabled. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: isaacm@codeaurora.org Cc: matt@codeblueprint.co.uk Cc: psodagud@codeaurora.org Cc: gregkh@linuxfoundation.org Cc: pkondeti@codeaurora.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180730112140.GH2494@hirez.programming.kicks-ass.net	2018-08-02 15:25:20 +02:00
Sudeep Holla	fbfa926008	clockevents: Warn if cpu_all_mask is used as cpumask Using cpu_all_mask in clockevents cpumask may result in issues while comparing multiple clockevent devices to choose the preferred one. On one of the platforms with 2 system (i.e. non per-CPU) timers with different ratings, having cpu_all_mask for one of the device resulted in a boot hang due to a endless loop in clockevents_notify_released() as both were clocksources were selected as preferred. In order to prevent such issues in the future, warn if any clockevent driver sets cpu_all_mask as it's cpumask and just override it to use cpu_possible_mask. All the existing occurrences of cpu_all_mask are already replaced with cpu_possible_mask. Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arm-kernel@lists.infradead.org Link: https://lkml.kernel.org/r/1531308264-24220-3-git-send-email-sudeep.holla@arm.com	2018-08-02 14:55:53 +02:00
Sudeep Holla	234b3840d7	tick/broadcast-hrtimer: Use cpu_possible_mask for ce_broadcast_hrtimer This is the last instance of cpu_all_mask usage in the core framework. Replace it with cpu_possible_mask like all other instances in the clockevent drivers. This makes it possible to add a warning in the core clockevents_register_device on usage of cpu_all_mask from any clockevent drivers in the future. Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arm-kernel@lists.infradead.org Link: https://lkml.kernel.org/r/1531308264-24220-2-git-send-email-sudeep.holla@arm.com	2018-08-02 14:55:52 +02:00
Gaurav Kohli	363e934d88	timers: Clear timer_base::must_forward_clk with timer_base::lock held timer_base::must_forward_clock is indicating that the base clock might be stale due to a long idle sleep. The forwarding of the base clock takes place in the timer softirq or when a timer is enqueued to a base which is idle. If the enqueue of timer to an idle base happens from a remote CPU, then the following race can happen: CPU0 CPU1 run_timer_softirq mod_timer base = lock_timer_base(timer); base->must_forward_clk = false if (base->must_forward_clk) forward(base); -> skipped enqueue_timer(base, timer, idx); -> idx is calculated high due to stale base unlock_timer_base(timer); base = lock_timer_base(timer); forward(base); The root cause is that timer_base::must_forward_clk is cleared outside the timer_base::lock held region, so the remote queuing CPU observes it as cleared, but the base clock is still stale. This can cause large granularity values for timers, i.e. the accuracy of the expiry time suffers. Prevent this by clearing the flag with timer_base::lock held, so that the forwarding takes place before the cleared flag is observable by a remote CPU. Signed-off-by: Gaurav Kohli <gkohli@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: john.stultz@linaro.org Cc: sboyd@kernel.org Cc: linux-arm-msm@vger.kernel.org Link: https://lkml.kernel.org/r/1533199863-22748-1-git-send-email-gkohli@codeaurora.org	2018-08-02 12:52:38 +02:00
Ingo Molnar	16e0e6a83b	Merge branch 'perf/urgent' into perf/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-08-02 09:59:20 +02:00
Gustavo A. R. Silva	44ec3ec01f	ftrace: Use true and false for boolean values in ops_references_rec() Return statements in functions returning bool should use true or false instead of an integer value. This code was detected with the help of Coccinelle. Link: http://lkml.kernel.org/r/20180802010056.GA31012@embeddedor.com Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-08-01 21:15:31 -04:00
Steven Rostedt (VMware)	d7224c0e12	ring-buffer: Make ring_buffer_record_is_set_on() return bool The value of ring_buffer_record_is_set_on() is either true or false, so have its return value be bool. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-08-01 21:09:50 -04:00
Steven Rostedt (VMware)	3ebea280d7	ring-buffer: Make ring_buffer_record_is_on() return bool The value of ring_buffer_record_is_on() is either true or false, so have its return value be bool. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-08-01 21:08:30 -04:00
Christoph Hellwig	87a4c37599	kconfig: include kernel/Kconfig.preempt from init/Kconfig Almost all architectures include it. Add a ARCH_NO_PREEMPT symbol to disable preempt support for alpha, hexagon, non-coldfire m68k and user mode Linux. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>	2018-08-02 08:06:54 +09:00
Steven Rostedt (VMware)	ec57350883	tracing: Make tracer_tracing_is_on() return bool There's code that expects tracer_tracing_is_on() to be either true or false, not some random number. Currently, it should only return one or zero, but just in case, change its return value to bool, to enforce it. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-08-01 16:08:57 -04:00
Steven Rostedt (VMware)	978defee11	tracing: Do a WARN_ON() if start_thread() in hwlat is called when thread exists The start function of the hwlat tracer should never be called when the hwlat thread already exists. If it is called, do a WARN_ON(). Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-08-01 16:06:02 -04:00
Erica Bugden	82fbc8c48a	ftrace: Add missing check for existing hwlat thread The hwlat tracer uses a kernel thread to measure latencies. The function that creates this kernel thread, start_kthread(), can be called when the tracer is initialized and when the tracer is explicitly enabled. start_kthread() does not check if there is an existing hwlat kernel thread and will create a new one each time it is called. This causes the reference to the previous thread to be lost. Without the thread reference, the old kernel thread becomes unstoppable and continues to use CPU time even after the hwlat tracer has been disabled. This problem can be observed when a system is booted with tracing enabled and the hwlat tracer is configured like this: echo hwlat > current_tracer; echo 1 > tracing_on Add the missing check for an existing kernel thread in start_kthread() to prevent this problem. This function and the rest of the hwlat kernel thread setup and teardown are already serialized because they are called through the tracer core code with trace_type_lock held. [ Note, this only fixes the symptom. The real fix was not to call this function when tracing_on was already one. But this still makes the code more robust, so we'll add it. ] Link: http://lkml.kernel.org/r/1533120354-22923-1-git-send-email-erica.bugden@linutronix.de Signed-off-by: Erica Bugden <erica.bugden@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-08-01 16:04:24 -04:00
Steven Rostedt (VMware)	f143641bfe	tracing: Do not call start/stop() functions when tracing_on does not change Currently, when one echo's in 1 into tracing_on, the current tracer's "start()" function is executed, even if tracing_on was already one. This can lead to strange side effects. One being that if the hwlat tracer is enabled, and someone does "echo 1 > tracing_on" into tracing_on, the hwlat tracer's start() function is called again which will recreate another kernel thread, and make it unable to remove the old one. Link: http://lkml.kernel.org/r/1533120354-22923-1-git-send-email-erica.bugden@linutronix.de Cc: stable@vger.kernel.org Fixes: `2df8f8a6a8` ("tracing: Fix regression with irqsoff tracer and tracing_on file") Reported-by: Erica Bugden <erica.bugden@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-08-01 16:01:02 -04:00
Josef Bacik	2c323017e3	blk-cgroup: clear the throttle queue on fork We were hitting a panic in production where we put too many times on the request queue. This is because we'd get the throttle_queue of the parent if we fork()'ed while we needed to be throttled, but we didn't have a reference on it. Instead just clear these flags on fork so the child doesn't pay for the sins of its father. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-08-01 09:16:04 -06:00
Linus Torvalds	37b71411b7	audit/stable-4.18 PR 20180731 -----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEEcQCq365ubpQNLgrWVeRaWujKfIoFAltguv8UHHBhdWxAcGF1 bC1tb29yZS5jb20ACgkQVeRaWujKfIqYOA/9GgMzBJYU+bVCbNSagcq6LluWFoYV ObZb9sfsf23wL0YgtKgkWaCefWAAYnWOr6bUvDa+5oMRLVR+bsP+YEkCVK45CJr0 g44oe4VH9t5inX2F2JSkoVbkUDZIwwOxiTi/L4Emqhv8cT9zc89tcKRjYhqt50d1 4Gm4++jZcTHQNKkzYUIIpKc0TZmKW5mRNmFaGogWPi72FWrhbDfjKLZZUvd+kIUC HSKnv6pKnwbxLPhd9i0p5NchuTM6kRCptGzN07UUzeww6UVvs8t62+DzHUM1o3Ft sraIx7BLenGC8OBCgi8aNkE+yseQE4h2OTym3paEkLVJsl/9qcsSyXL1dwO4Z96U HFq/TpDZoBieZihHDBk4ry7ox942mE5N51QTDUh+cygEWeNvqGwqpAUbI14J23oh 3p7w7hgXAtdtuj4pzqUARemHvIR0Xbpn8ritH9cx1s1mDdycyyBDn9mFw3Ehigom XIpUrSJtdfJYFj+z6wA4vXssvXe4TITrJTUmPAM1Alk1p+LhRkTA8JxBjHmL3qjR mFIxA40t+ON5OtCqTtGsapaoJy2Jj97dPEp5i5Jg49BQclQoTG2rpYuIu/aKrixG EZwdezckD3DPQUQdQidru7dS1J/phIaDDvEauq291ERHPfNAxQuMllXHeczzyJkc eVRMkj0/E5lihlE= =MN3z -----END PGP SIGNATURE----- Merge tag 'audit-pr-20180731' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit fix from Paul Moore: "A single small audit fix to guard against memory allocation failures when logging information about a kernel module load. It's small, easy to understand, and self-contained; while nothing is zero risk, this should be pretty low" * tag 'audit-pr-20180731' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: fix potential null dereference 'context->module.name'	2018-07-31 13:17:46 -07:00
Arthur Fabre	fbeb1603bf	bpf: verifier: MOV64 don't mark dst reg unbounded When check_alu_op() handles a BPF_MOV64 between two registers, it calls check_reg_arg(DST_OP) on the dst register, marking it as unbounded. If the src and dst register are the same, this marks the src as unbounded, which can lead to unexpected errors for further checks that rely on bounds info. For example: BPF_MOV64_IMM(BPF_REG_2, 0), BPF_MOV64_REG(BPF_REG_2, BPF_REG_2), BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_2), BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN(), Results in: "math between ctx pointer and register with unbounded min value is not allowed" check_alu_op() now uses check_reg_arg(DST_OP_NO_MARK), and MOVs that need to mark the dst register (MOVIMM, MOV32) do so. Added a test case for MOV64 dst == src, and dst != src. Signed-off-by: Arthur Fabre <afabre@cloudflare.com> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-07-31 22:09:33 +02:00
Anna-Maria Gleixner	80d20d35af	nohz: Fix local_timer_softirq_pending() local_timer_softirq_pending() checks whether the timer softirq is pending with: local_softirq_pending() & TIMER_SOFTIRQ. This is wrong because TIMER_SOFTIRQ is the softirq number and not a bitmask. So the test checks for the wrong bit. Use BIT(TIMER_SOFTIRQ) instead. Fixes: `5d62c183f9` ("nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()") Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Cc: bigeasy@linutronix.de Cc: peterz@infradead.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180731161358.29472-1-anna-maria@linutronix.de	2018-07-31 22:08:44 +02:00
Joel Fernandes (Google)	c3bc8fd637	tracing: Centralize preemptirq tracepoints and unify their usage This patch detaches the preemptirq tracepoints from the tracers and keeps it separate. Advantages: * Lockdep and irqsoff event can now run in parallel since they no longer have their own calls. * This unifies the usecase of adding hooks to an irqsoff and irqson event, and a preemptoff and preempton event. 3 users of the events exist: - Lockdep - irqsoff and preemptoff tracers - irqs and preempt trace events The unification cleans up several ifdefs and makes the code in preempt tracer and irqsoff tracers simpler. It gets rid of all the horrific ifdeferry around PROVE_LOCKING and makes configuration of the different users of the tracepoints more easy and understandable. It also gets rid of the time_* function calls from the lockdep hooks used to call into the preemptirq tracer which is not needed anymore. The negative delta in lines of code in this patch is quite large too. In the patch we introduce a new CONFIG option PREEMPTIRQ_TRACEPOINTS as a single point for registering probes onto the tracepoints. With this, the web of config options for preempt/irq toggle tracepoints and its users becomes: PREEMPT_TRACER PREEMPTIRQ_EVENTS IRQSOFF_TRACER PROVE_LOCKING \| \| \ \| \| \ (selects) / \ \ (selects) / TRACE_PREEMPT_TOGGLE ----> TRACE_IRQFLAGS \ / \ (depends on) / PREEMPTIRQ_TRACEPOINTS Other than the performance tests mentioned in the previous patch, I also ran the locking API test suite. I verified that all tests cases are passing. I also injected issues by not registering lockdep probes onto the tracepoints and I see failures to confirm that the probes are indeed working. This series + lockdep probes not registered (just to inject errors): [ 0.000000] hard-irqs-on + irq-safe-A/21: ok \| ok \| ok \| [ 0.000000] soft-irqs-on + irq-safe-A/21: ok \| ok \| ok \| [ 0.000000] sirq-safe-A => hirqs-on/12:FAILED\|FAILED\| ok \| [ 0.000000] sirq-safe-A => hirqs-on/21:FAILED\|FAILED\| ok \| [ 0.000000] hard-safe-A + irqs-on/12:FAILED\|FAILED\| ok \| [ 0.000000] soft-safe-A + irqs-on/12:FAILED\|FAILED\| ok \| [ 0.000000] hard-safe-A + irqs-on/21:FAILED\|FAILED\| ok \| [ 0.000000] soft-safe-A + irqs-on/21:FAILED\|FAILED\| ok \| [ 0.000000] hard-safe-A + unsafe-B #1/123: ok \| ok \| ok \| [ 0.000000] soft-safe-A + unsafe-B #1/123: ok \| ok \| ok \| With this series + lockdep probes registered, all locking tests pass: [ 0.000000] hard-irqs-on + irq-safe-A/21: ok \| ok \| ok \| [ 0.000000] soft-irqs-on + irq-safe-A/21: ok \| ok \| ok \| [ 0.000000] sirq-safe-A => hirqs-on/12: ok \| ok \| ok \| [ 0.000000] sirq-safe-A => hirqs-on/21: ok \| ok \| ok \| [ 0.000000] hard-safe-A + irqs-on/12: ok \| ok \| ok \| [ 0.000000] soft-safe-A + irqs-on/12: ok \| ok \| ok \| [ 0.000000] hard-safe-A + irqs-on/21: ok \| ok \| ok \| [ 0.000000] soft-safe-A + irqs-on/21: ok \| ok \| ok \| [ 0.000000] hard-safe-A + unsafe-B #1/123: ok \| ok \| ok \| [ 0.000000] soft-safe-A + unsafe-B #1/123: ok \| ok \| ok \| Link: http://lkml.kernel.org/r/20180730222423.196630-4-joel@joelfernandes.org Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-07-31 11:32:27 -04:00
Thomas Zimmermann	56e6c104e4	console: Replace #if 0 with atomic var 'ignore_console_lock_warning' The macro WARN_CONSOLE_UNLOCKED prints a warning when a thread enters the console's critical section without having acquired the console lock. The console lock can be ignored when debugging the console using printk, but this makes WARN_CONSOLE_UNLOCKED generate unnecessary warnings. The variable ignore_console_lock_warning temporarily disables WARN_CONSOLE_UNLOCKED. Developers interested in debugging the console's critical sections should increment it before entering the CS and decrement it after leaving the CS. Setting ignore_console_lock_warning is only for debugging. Regular operation should not manipulate it. Acknoledgements: This patch is based on an earlier version by Steven Rostedt. The use of atomic increment/decrement was suggested by Petr Mladek. Link: http://lkml.kernel.org/r/717e6337-e7a6-7a92-1c1b-8929a25696b5@suse.de Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Acked-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> [b.zolnierkie: use <linux/atomic.h>] Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>	2018-07-31 13:06:57 +02:00
Michael O'Farrell	9d2dcc8fc6	arm64: perf: Add cap_user_time aarch64 It is useful to get the running time of a thread. Doing so in an efficient manner can be important for performance of user applications. Avoiding system calls in `clock_gettime` when handling CLOCK_THREAD_CPUTIME_ID is important. Other clocks are handled in the VDSO, but CLOCK_THREAD_CPUTIME_ID falls back on the system call. CLOCK_THREAD_CPUTIME_ID is not handled in the VDSO since it would have costs associated with maintaining updated user space accessible time offsets. These offsets have to be updated everytime the a thread is scheduled/descheduled. However, for programs regularly checking the running time of a thread, this is a performance improvement. This patch takes a middle ground, and adds support for cap_user_time an optional feature of the perf_event API. This way costs are only incurred when the perf_event api is enabled. This is done the same way as it is in x86. Ultimately this allows calculating the thread running time in userspace on aarch64 as follows (adapted from perf_event_open manpage): u32 seq, time_mult, time_shift; u64 running, count, time_offset, quot, rem, delta; struct perf_event_mmap_page pc; pc = buf; // buf is the perf event mmaped page as documented in the API. if (pc->cap_usr_time) { do { seq = pc->lock; barrier(); running = pc->time_running; count = readCNTVCT_EL0(); // Read ARM hardware clock. time_offset = pc->time_offset; time_mult = pc->time_mult; time_shift = pc->time_shift; barrier(); } while (pc->lock != seq); quot = (count >> time_shift); rem = count & (((u64)1 << time_shift) - 1); delta = time_offset + quot time_mult + ((rem * time_mult) >> time_shift); running += delta; // running now has the current nanosecond level thread time. } Summary of changes in the patch: For aarch64 systems, make arch_perf_update_userpage update the timing information stored in the perf_event page. Requiring the following calculations: - Calculate the appropriate time_mult, and time_shift factors to convert ticks to nano seconds for the current clock frequency. - Adjust the mult and shift factors to avoid shift factors of 32 bits. (possibly unnecessary) - The time_offset userspace should apply when doing calculations: negative the current sched time (now), because time_running and time_enabled fields of the perf_event page have just been updated. Toggle bits to appropriate values: - Enable cap_user_time Signed-off-by: Michael O'Farrell <micpof@gmail.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-07-31 10:14:00 +01:00
Linus Torvalds	f67077deb4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: "Several smallish fixes, I don't think any of this requires another -rc but I'll leave that up to you: 1) Don't leak uninitialzed bytes to userspace in xfrm_user, from Eric Dumazet. 2) Route leak in xfrm_lookup_route(), from Tommi Rantala. 3) Premature poll() returns in AF_XDP, from Björn Töpel. 4) devlink leak in netdevsim, from Jakub Kicinski. 5) Don't BUG_ON in fib_compute_spec_dst, the condition can legitimately happen. From Lorenzo Bianconi. 6) Fix some spectre v1 gadgets in generic socket code, from Jeremy Cline. 7) Don't allow user to bind to out of range multicast groups, from Dmitry Safonov with a follow-up by Dmitry Safonov. 8) Fix metrics leak in fib6_drop_pcpu_from(), from Sabrina Dubroca" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits) netlink: Don't shift with UB on nlk->ngroups net/ipv6: fix metrics leak xen-netfront: wait xenbus state change when load module manually can: ems_usb: Fix memory leak on ems_usb_disconnect() openvswitch: meter: Fix setting meter id for new entries netlink: Do not subscribe to non-existent groups NET: stmmac: align DMA stuff to largest cache line length tcp_bbr: fix bw probing to raise in-flight data for very small BDPs net: socket: Fix potential spectre v1 gadget in sock_is_registered net: socket: fix potential spectre v1 gadget in socketcall net: mdio-mux: bcm-iproc: fix wrong getter and setter pair ipv4: remove BUG_ON() from fib_compute_spec_dst enic: handle mtu change for vf properly net: lan78xx: fix rx handling before first packet is send nfp: flower: fix port metadata conversion bug bpf: use GFP_ATOMIC instead of GFP_KERNEL in bpf_parse_prog() bpf: fix bpf_skb_load_bytes_relative pkt length check perf build: Build error in libbpf missing initialization net: ena: Fix use of uninitialized DMA address bits field bpf: btf: Use exact btf value_size match in map_check_btf() ...	2018-07-30 21:40:37 -07:00
Joel Fernandes (Google)	e6753f23d9	tracepoint: Make rcuidle tracepoint callers use SRCU In recent tests with IRQ on/off tracepoints, a large performance overhead ~10% is noticed when running hackbench. This is root caused to calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the tracepoint code. Following a long discussion on the list [1] about this, we concluded that srcu is a better alternative for use during rcu idle. Although it does involve extra barriers, its lighter than the sched-rcu version which has to do additional RCU calls to notify RCU idle about entry into RCU sections. In this patch, we change the underlying implementation of the trace_*_rcuidle API to use SRCU. This has shown to improve performance alot for the high frequency irq enable/disable tracepoints. Test: Tested idle and preempt/irq tracepoints. Here are some performance numbers: With a run of the following 30 times on a single core x86 Qemu instance with 1GB memory: hackbench -g 4 -f 2 -l 3000 Completion times in seconds. CONFIG_PROVE_LOCKING=y. No patches (without this series) Mean: 3.048 Median: 3.025 Std Dev: 0.064 With Lockdep using irq tracepoints with RCU implementation: Mean: 3.451 (-11.66 %) Median: 3.447 (-12.22%) Std Dev: 0.049 With Lockdep using irq tracepoints with SRCU implementation (this series): Mean: 3.020 (I would consider the improvement against the "without this series" case as just noise). Median: 3.013 Std Dev: 0.033 [1] https://patchwork.kernel.org/patch/10344297/ [remove rcu_read_lock_sched_notrace as its the equivalent of preempt_disable_notrace and is unnecessary to call in tracepoint code] Link: http://lkml.kernel.org/r/20180730222423.196630-3-joel@joelfernandes.org Cleaned-up-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> [ Simplified WARN_ON_ONCE() ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-07-30 19:13:03 -04:00
Joel Fernandes (Google)	01f38497c6	lockdep: Use this_cpu_ptr instead of get_cpu_var stats get_cpu_var disables preemption which has the potential to call into the preemption disable trace points causing some complications. There's also no need to disable preemption in uses of get_lock_stats anyway since preempt is already disabled. So lets simplify the code. Link: http://lkml.kernel.org/r/20180730222423.196630-2-joel@joelfernandes.org Suggested-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-07-30 19:06:54 -04:00
Francis Deslauriers	d899926f55	selftest/ftrace: Move kprobe selftest function to separate compile unit Move selftest function to its own compile unit so it can be compiled with the ftrace cflags (CC_FLAGS_FTRACE) allowing it to be probed during the ftrace startup tests. Link: http://lkml.kernel.org/r/153294604271.32740.16490677128630177030.stgit@devbox Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-07-30 18:41:04 -04:00
Masami Hiramatsu	45408c4f92	tracing: kprobes: Prohibit probing on notrace function Prohibit kprobe-events probing on notrace functions. Since probing on a notrace function can cause a recursive event call. In most cases those are just skipped, but in some cases it falls into an infinite recursive call. This protection can be disabled by the kconfig CONFIG_KPROBE_EVENTS_ON_NOTRACE=y, but it is highly recommended to keep it "n" for normal kernel builds. Note that this is only available if "kprobes on ftrace" has been implemented on the target arch and CONFIG_KPROBES_ON_FTRACE=y. Link: http://lkml.kernel.org/r/153294601436.32740.10557881188933661239.stgit@devbox Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Francis Deslauriers <francis.deslauriers@efficios.com> [ Slight grammar and spelling fixes ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-07-30 18:28:52 -04:00
Yi Wang	b305f7ed0f	audit: fix potential null dereference 'context->module.name' The variable 'context->module.name' may be null pointer when kmalloc return null, so it's better to check it before using to avoid null dereference. Another one more thing this patch does is using kstrdup instead of (kmalloc + strcpy), and signal a lost record via audit_log_lost. Cc: stable@vger.kernel.org # 4.11 Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>	2018-07-30 18:09:37 -04:00
Mukesh Ojha	d018031f56	cpu/hotplug: Clarify CPU hotplug step name for timers After commit 249d4a9b3246 ("timers: Reinitialize per cpu bases on hotplug") i.e. the introduction of state CPUHP_TIMERS_PREPARE instead of CPUHP_TIMERS_DEAD the step name "timers:dead" is not longer accurate. Rename it to "timers:prepare". [ tglx: Massaged changelog ] Signed-off-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: gkohli@codeaurora.org Cc: neeraju@codeaurora.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Brendan Jackman <brendan.jackman@arm.com> Cc: Mathieu Malaterre <malat@debian.org> Link: https://lkml.kernel.org/r/1532443668-26810-1-git-send-email-mojha@codeaurora.org	2018-07-30 21:30:52 +02:00
Linus Torvalds	ae3e10aba5	Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Misc fixes: - a deadline scheduler related bug fix which triggered a kernel warning - an RT_RUNTIME_SHARE fix - a stop_machine preemption fix - a potential NULL dereference fix in sched_domain_debug_one()" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/rt: Restore rt_runtime after disabling RT_RUNTIME_SHARE sched/deadline: Update rq_clock of later_rq when pushing a task stop_machine: Disable preemption after queueing stopper threads sched/topology: Check variable group before dereferencing it	2018-07-30 12:13:56 -07:00
Linus Torvalds	0634922a78	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc fixes: - AMD IBS data corruptor fix (uncovered by UBSAN) - an Intel PEBS entry unwind error fix - a HW-tracing crash fix - a MAINTAINERS update" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix crash when using HW tracing kernel filters perf/x86/intel: Fix unwind errors from PEBS entries (mk-II) MAINTAINERS: Add Naveen N. Rao as kprobes co-maintainer perf/x86/amd/ibs: Don't access non-started event	2018-07-30 11:45:30 -07:00
Linus Torvalds	fb20c03d37	Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: "A paravirt UP-patching fix, and an I2C MUX driver lockdep warning fix" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/pvqspinlock/x86: Use LOCK_PREFIX in __pv_queued_spin_unlock() assembly code i2c/mux, locking/core: Annotate the nested rt_mutex usage locking/rtmutex: Allow specifying a subclass for nested locking	2018-07-30 11:37:16 -07:00
Pavel Tatashin	bd9f943e5d	sched/clock: Disable interrupts when calling generic_sched_clock_init() sched_clock_init() used be called early during boot when interrupts were still disabled. After the recent changes to utilize sched clock early the sched_clock_init() call happens when interrupts are already enabled, which triggers the following warning: WARNING: CPU: 0 PID: 0 at kernel/time/sched_clock.c:180 sched_clock_register+0x44/0x278 [<c001a13c>] (warn_slowpath_null) from [<c052367c>] (sched_clock_register+0x44/0x278) [<c052367c>] (sched_clock_register) from [<c05238d8>] (generic_sched_clock_init+0x28/0x88) [<c05238d8>] (generic_sched_clock_init) from [<c0521a00>] (sched_clock_init+0x54/0x74) [<c0521a00>] (sched_clock_init) from [<c0519c18>] (start_kernel+0x310/0x3e4) [<c0519c18>] (start_kernel) from [<00000000>] ( (null)) Disable IRQs for the duration of generic_sched_clock_init(). Fixes: `857baa87b6` ("sched/clock: Enable sched clock early") Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Link: https://lkml.kernel.org/r/20180730135252.24599-1-pasha.tatashin@oracle.com	2018-07-30 19:33:35 +02:00
Pavel Tatashin	684ad537ab	timekeeping: Prevent false warning when persistent clock is not available On arches with no persistent clock a message like this is printed during boot: [ 0.000000] Persistent clock returned invalid value The value is not invalid: Zero means that no persistent clock is available and the absence of persistent clock should be quietly accepted. Fixes: `3eca993740` ("timekeeping: Replace read_boot_clock64() with read_persistent_wall_and_boot_offset()") Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: sboyd@kernel.org Cc: john.stultz@linaro.org Link: https://lkml.kernel.org/r/20180725200018.23722-1-pasha.tatashin@oracle.com	2018-07-30 19:32:29 +02:00
Rafael J. Wysocki	5a4c996764	Merge back cpufreq material for 4.19.	2018-07-30 11:27:01 +02:00
Dave Airlie	3fce461827	BackMerge v4.18-rc7 into drm-next rmk requested this for armada and I think we've had a few conflicts build up. Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-07-30 10:39:22 +10:00
David S. Miller	958b4cd8fa	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2018-07-28 The following pull-request contains BPF updates for your net tree. The main changes are: 1) API fixes for libbpf's BTF mapping of map key/value types in order to make them compatible with iproute2's BPF_ANNOTATE_KV_PAIR() markings, from Martin. 2) Fix AF_XDP to not report POLLIN prematurely by using the non-cached consumer pointer of the RX queue, from Björn. 3) Fix __xdp_return() to check for NULL pointer after the rhashtable lookup that retrieves the allocator object, from Taehee. 4) Fix x86-32 JIT to adjust ebp register in prologue and epilogue by 4 bytes which got removed from overall stack usage, from Wang. 5) Fix bpf_skb_load_bytes_relative() length check to use actual packet length, from Daniel. 6) Fix uninitialized return code in libbpf bpf_perf_event_read_simple() handler, from Thomas. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-28 21:02:21 -07:00
kbuild test robot	518eeca05c	tracing: preemptirq_delay_run() can be static Automatically found by kbuild test robot. Fixes: ffdc73a3b2ad ("lib: Add module for testing preemptoff/irqsoff latency tracers") Signed-off-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-07-27 17:58:34 -04:00
Linus Torvalds	864af0d40c	Merge branch 'akpm' (patches from Andrew) Merge misc fixes from Andrew Morton: "11 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: kvm, mm: account shadow page tables to kmemcg zswap: re-check zswap_is_full() after do zswap_shrink() include/linux/eventfd.h: include linux/errno.h mm: fix vma_is_anonymous() false-positives mm: use vma_init() to initialize VMAs on stack and data segments mm: introduce vma_init() mm: fix exports that inadvertently make put_page() EXPORT_SYMBOL_GPL ipc/sem.c: prevent queue.status tearing in semop mm: disallow mappings that conflict for devm_memremap_pages() kasan: only select SLUB_DEBUG with SYSFS=y delayacct: fix crash in delayacct_blkio_end() after delayacct init failure	2018-07-27 10:30:47 -07:00
Robin Murphy	f07d141fe9	dma-mapping: Generalise dma_32bit_limit flag Whilst the notion of an upstream DMA restriction is most commonly seen in PCI host bridges saddled with a 32-bit native interface, a more general version of the same issue can exist on complex SoCs where a bus or point-to-point interconnect link from a device's DMA master interface to another component along the path to memory (often an IOMMU) may carry fewer address bits than the interfaces at both ends nominally support. In order to properly deal with this, the first step is to expand the dma_32bit_limit flag into an arbitrary mask. To minimise the impact on existing code, we'll make sure to only consider this new mask valid if set. That makes sense anyway, since a mask of zero would represent DMA not being wired up at all, and that would be better handled by not providing valid ops in the first place. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-07-27 19:01:04 +02:00
Linus Torvalds	3ebb6fb03d	Various fixes to the tracing infrastructure: - Fix double free when the reg() call fails in event_trigger_callback() - Fix anomoly of snapshot causing tracing_on flag to change - Add selftest to test snapshot and tracing_on affecting each other - Fix setting of tracepoint flag on error that prevents probes from being deleted. - Fix another possible double free that is similar to event_trigger_callback() - Quiet a gcc warning of a false positive unused variable - Fix crash of partial exposed task->comm to trace events -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCW1pToBQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qijEAQCzqQsnlO6YBCYajRBq2wFaM7J6tVnJ LxLZlVE8lJlHZQD/YpyGOPq98CB81BfQV7RA/CAVd4RZAhTjldDgGyfL/QI= =wU8I -----END PGP SIGNATURE----- Merge tag 'trace-v4.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Various fixes to the tracing infrastructure: - Fix double free when the reg() call fails in event_trigger_callback() - Fix anomoly of snapshot causing tracing_on flag to change - Add selftest to test snapshot and tracing_on affecting each other - Fix setting of tracepoint flag on error that prevents probes from being deleted. - Fix another possible double free that is similar to event_trigger_callback() - Quiet a gcc warning of a false positive unused variable - Fix crash of partial exposed task->comm to trace events" * tag 'trace-v4.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: kthread, tracing: Don't expose half-written comm when creating kthreads tracing: Quiet gcc warning about maybe unused link variable tracing: Fix possible double free in event_enable_trigger_func() tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure selftests/ftrace: Add snapshot and tracing_on test case ring_buffer: tracing: Inherit the tracing setting to next ring buffer tracing: Fix double free of event_trigger_data	2018-07-27 09:50:33 -07:00
Steven Rostedt (VMware)	87107a25a2	tracing/kprobes: Simplify the logic of enable_trace_kprobe() The function enable_trace_kprobe() performs slightly differently if the file parameter is passed in as NULL on non-NULL. Instead of checking file twice, move the code between the two tests into a static inline helper function to make the code easier to follow. Link: http://lkml.kernel.org/r/20180725224728.7b1d5db2@vmware.local.home Link: http://lkml.kernel.org/r/20180726121152.4dd54330@gandalf.local.home Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-07-27 09:36:20 -04:00
Kirill A. Shutemov	027232da7c	mm: introduce vma_init() Not all VMAs allocated with vm_area_alloc(). Some of them allocated on stack or in data segment. The new helper can be use to initialize VMA properly regardless where it was allocated. Link: http://lkml.kernel.org/r/20180724121139.62570-2-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-07-26 19:38:03 -07:00
Dan Williams	31c5bda3a6	mm: fix exports that inadvertently make put_page() EXPORT_SYMBOL_GPL Commit `e763848843` ("mm: introduce MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPS") added two EXPORT_SYMBOL_GPL() symbols, but these symbols are required by the inlined put_page(), thus accidentally making put_page() a GPL export only. This breaks OpenAFS (at least). Mark them EXPORT_SYMBOL() instead. Link: http://lkml.kernel.org/r/153128611970.2928.11310692420711601254.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: `e763848843` ("mm: introduce MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPS") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Joe Gorse <jhgorse@gmail.com> Reported-by: John Hubbard <jhubbard@nvidia.com> Tested-by: Joe Gorse <jhgorse@gmail.com> Tested-by: John Hubbard <jhubbard@nvidia.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Mark Vitale <mvitale@sinenomine.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-07-26 19:38:03 -07:00
Dave Jiang	15d36fecd0	mm: disallow mappings that conflict for devm_memremap_pages() When pmem namespaces created are smaller than section size, this can cause an issue during removal and gpf was observed: general protection fault: 0000 1 SMP PTI CPU: 36 PID: 3941 Comm: ndctl Tainted: G W 4.14.28-1.el7uek.x86_64 #2 task: ffff88acda150000 task.stack: ffffc900233a4000 RIP: 0010:__put_page+0x56/0x79 Call Trace: devm_memremap_pages_release+0x155/0x23a release_nodes+0x21e/0x260 devres_release_all+0x3c/0x48 device_release_driver_internal+0x15c/0x207 device_release_driver+0x12/0x14 unbind_store+0xba/0xd8 drv_attr_store+0x27/0x31 sysfs_kf_write+0x3f/0x46 kernfs_fop_write+0x10f/0x18b __vfs_write+0x3a/0x16d vfs_write+0xb2/0x1a1 SyS_write+0x55/0xb9 do_syscall_64+0x79/0x1ae entry_SYSCALL_64_after_hwframe+0x3d/0x0 Add code to check whether we have a mapping already in the same section and prevent additional mappings from being created if that is the case. Link: http://lkml.kernel.org/r/152909478401.50143.312364396244072931.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Robert Elliott <elliott@hpe.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-07-26 19:38:03 -07:00
Martin KaFai Lau	5f300e8004	bpf: btf: Use exact btf value_size match in map_check_btf() The current map_check_btf() in BPF_MAP_TYPE_ARRAY rejects '> map->value_size' to ensure map_seq_show_elem() will not access things beyond an array element. Yonghong suggested that using '!=' is a more correct check. The 8 bytes round_up on value_size is stored in array->elem_size. Hence, using '!=' on map->value_size is a proper check. This patch also adds new tests to check the btf array key type and value type. Two of these new tests verify the btf's value_size (the change in this patch). It also fixes two existing tests that wrongly encoded a btf's type size (pprint_test) and the value_type_id (in one of the raw_tests[]). However, that do not affect these two BTF verification tests before or after this test changes. These two tests mainly failed at array creation time after this patch. Fixes: `a26ca7c982` ("bpf: btf: Add pretty print support to the basic arraymap") Suggested-by: Yonghong Song <yhs@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-07-27 03:45:49 +02:00
Masami Hiramatsu	1fee4f7752	doc: tracing: Fix a typo of trace_stat The name of the directory for per-cpu function statistics is trace_stat, not trace_stats. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2018-07-26 15:48:10 -06:00
Masami Hiramatsu	72809cbf67	tracing: Remove orphaned function ftrace_nr_registered_ops() Remove ftrace_nr_registered_ops() because it is no longer used. ftrace_nr_registered_ops() has been introduced by commit `ea701f11da` ("ftrace: Add selftest to test function trace recursion protection"), but its caller has been removed by commit `05cbbf643b` ("tracing: Fix selftest function recursion accounting"). So it is not called anymore. Link: http://lkml.kernel.org/r/153260907227.12474.5234899025934963683.stgit@devbox Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-07-26 10:58:43 -04:00
Masami Hiramatsu	7b144b6c79	tracing: Remove orphaned function using_ftrace_ops_list_func(). Remove using_ftrace_ops_list_func() since it is no longer used. Using ftrace_ops_list_func() has been introduced by commit `7eea4fce02` ("tracing/stack_trace: Skip 4 instead of 3 when using ftrace_ops_list_func") as a helper function, but its caller has been removed by commit `72ac426a5b` ("tracing: Clean up stack tracing and fix fentry updates"). So it is not called anymore. Link: http://lkml.kernel.org/r/153260904427.12474.9952096317439329851.stgit@devbox Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-07-26 10:53:05 -04:00
Steven Rostedt (VMware)	f6b7425cfb	tracing: Make unregister_trigger() static Nothing uses unregister_trigger() outside of trace_events_trigger.c file, thus it should be static. Not sure why this was ever converted, because its counter part, register_trigger(), was always static. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-07-26 10:50:18 -04:00
Joel Fernandes (Google)	f96e8577da	lib: Add module for testing preemptoff/irqsoff latency tracers Here we introduce a test module for introducing a long preempt or irq disable delay in the kernel which the preemptoff or irqsoff tracers can detect. This module is to be used only for test purposes and is default disabled. Following is the expected output (only briefly shown) that can be parsed to verify that the tracers are working correctly. We will use this from the kselftests in future patches. For the preemptoff tracer: echo preemptoff > /d/tracing/current_tracer sleep 1 insmod ./preemptirq_delay_test.ko test_mode=preempt delay=500000 sleep 1 bash-4.3# cat /d/tracing/trace preempt -1066 2...2 0us@: preemptirq_delay_run <-preemptirq_delay_run preempt -1066 2...2 500002us : preemptirq_delay_run <-preemptirq_delay_run preempt -1066 2...2 500004us : tracer_preempt_on <-preemptirq_delay_run preempt -1066 2...2 500012us : <stack trace> => kthread => ret_from_fork For the irqsoff tracer: echo irqsoff > /d/tracing/current_tracer sleep 1 insmod ./preemptirq_delay_test.ko test_mode=irq delay=500000 sleep 1 bash-4.3# cat /d/tracing/trace irq dis -1069 1d..1 0us@: preemptirq_delay_run irq dis -1069 1d..1 500001us : preemptirq_delay_run irq dis -1069 1d..1 500002us : tracer_hardirqs_on <-preemptirq_delay_run irq dis -1069 1d..1 500005us : <stack trace> => ret_from_fork Link: http://lkml.kernel.org/r/20180712213611.GA8743@joelaf.mtv.corp.google.com Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Julia Cartwright <julia@ni.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Glexiner <tglx@linutronix.de> Cc: Todd Kjos <tkjos@google.com> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> [ Erick is a co-developer of this commit ] Signed-off-by: Erick Reyes <erickreyes@google.com> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-07-26 10:50:17 -04:00
Joel Fernandes (Google)	2b27ece6c5	tracing/irqsoff: Split reset into separate functions Split reset functions into seperate functions in preparation of future patches that need to do tracer specific reset. Link: http://lkml.kernel.org/r/20180628182149.226164-4-joel@joelfernandes.org Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-07-26 10:50:17 -04:00

... 6 7 8 9 10 ...

28872 Commits