mirror of
https://github.com/AuxXxilium/linux_dsm_epyc7002.git
synced 2024-11-24 18:10:52 +07:00
d67a537209
29353 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Petr Mladek
|
d67a537209 |
livepatch: Remove ordering (stacking) of the livepatches
The atomic replace and cumulative patches were introduced as a more secure way to handle dependent patches. They simplify the logic: + Any new cumulative patch is supposed to take over shadow variables and changes made by callbacks from previous livepatches. + All replaced patches are discarded and the modules can be unloaded. As a result, there is only one scenario when a cumulative livepatch gets disabled. The different handling of "normal" and cumulative patches might cause confusion. It would make sense to keep only one mode. On the other hand, it would be rude to enforce using the cumulative livepatches even for trivial and independent (hot) fixes. However, the stack of patches is not really necessary any longer. The patch ordering was never clearly visible via the sysfs interface. Also the "normal" patches need a lot of caution anyway. Note that the list of enabled patches is still necessary but the ordering is not longer enforced. Otherwise, the code is ready to disable livepatches in an random order. Namely, klp_check_stack_func() always looks for the function from the livepatch that is being disabled. klp_func structures are just removed from the related func_stack. Finally, the ftrace handlers is removed only when the func_stack becomes empty. Signed-off-by: Petr Mladek <pmladek@suse.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> |
||
Petr Mladek
|
d697bad588 |
livepatch: Remove Nop structures when unused
Replaced patches are removed from the stack when the transition is finished. It means that Nop structures will never be needed again and can be removed. Why should we care? + Nop structures give the impression that the function is patched even though the ftrace handler has no effect. + Ftrace handlers do not come for free. They cause slowdown that might be visible in some workloads. The ftrace-related slowdown might actually be the reason why the function is no longer patched in the new cumulative patch. One would expect that cumulative patch would help solve these problems as well. + Cumulative patches are supposed to replace any earlier version of the patch. The amount of NOPs depends on which version was replaced. This multiplies the amount of scenarios that might happen. One might say that NOPs are innocent. But there are even optimized NOP instructions for different processors, for example, see arch/x86/kernel/alternative.c. And klp_ftrace_handler() is much more complicated. + It sounds natural to clean up a mess that is no longer needed. It could only be worse if we do not do it. This patch allows to unpatch and free the dynamic structures independently when the transition finishes. The free part is a bit tricky because kobject free callbacks are called asynchronously. We could not wait for them easily. Fortunately, we do not have to. Any further access can be avoided by removing them from the dynamic lists. Signed-off-by: Petr Mladek <pmladek@suse.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> |
||
Jason Baron
|
e1452b607c |
livepatch: Add atomic replace
Sometimes we would like to revert a particular fix. Currently, this is not easy because we want to keep all other fixes active and we could revert only the last applied patch. One solution would be to apply new patch that implemented all the reverted functions like in the original code. It would work as expected but there will be unnecessary redirections. In addition, it would also require knowing which functions need to be reverted at build time. Another problem is when there are many patches that touch the same functions. There might be dependencies between patches that are not enforced on the kernel side. Also it might be pretty hard to actually prepare the patch and ensure compatibility with the other patches. Atomic replace && cumulative patches: A better solution would be to create cumulative patch and say that it replaces all older ones. This patch adds a new "replace" flag to struct klp_patch. When it is enabled, a set of 'nop' klp_func will be dynamically created for all functions that are already being patched but that will no longer be modified by the new patch. They are used as a new target during the patch transition. The idea is to handle Nops' structures like the static ones. When the dynamic structures are allocated, we initialize all values that are normally statically defined. The only exception is "new_func" in struct klp_func. It has to point to the original function and the address is known only when the object (module) is loaded. Note that we really need to set it. The address is used, for example, in klp_check_stack_func(). Nevertheless we still need to distinguish the dynamically allocated structures in some operations. For this, we add "nop" flag into struct klp_func and "dynamic" flag into struct klp_object. They need special handling in the following situations: + The structures are added into the lists of objects and functions immediately. In fact, the lists were created for this purpose. + The address of the original function is known only when the patched object (module) is loaded. Therefore it is copied later in klp_init_object_loaded(). + The ftrace handler must not set PC to func->new_func. It would cause infinite loop because the address points back to the beginning of the original function. + The various free() functions must free the structure itself. Note that other ways to detect the dynamic structures are not considered safe. For example, even the statically defined struct klp_object might include empty funcs array. It might be there just to run some callbacks. Also note that the safe iterator must be used in the free() functions. Otherwise already freed structures might get accessed. Special callbacks handling: The callbacks from the replaced patches are _not_ called by intention. It would be pretty hard to define a reasonable semantic and implement it. It might even be counter-productive. The new patch is cumulative. It is supposed to include most of the changes from older patches. In most cases, it will not want to call pre_unpatch() post_unpatch() callbacks from the replaced patches. It would disable/break things for no good reasons. Also it should be easier to handle various scenarios in a single script in the new patch than think about interactions caused by running many scripts from older patches. Not to say that the old scripts even would not expect to be called in this situation. Removing replaced patches: One nice effect of the cumulative patches is that the code from the older patches is no longer used. Therefore the replaced patches can be removed. It has several advantages: + Nops' structs will no longer be necessary and might be removed. This would save memory, restore performance (no ftrace handler), allow clear view on what is really patched. + Disabling the patch will cause using the original code everywhere. Therefore the livepatch callbacks could handle only one scenario. Note that the complication is already complex enough when the patch gets enabled. It is currently solved by calling callbacks only from the new cumulative patch. + The state is clean in both the sysfs interface and lsmod. The modules with the replaced livepatches might even get removed from the system. Some people actually expected this behavior from the beginning. After all a cumulative patch is supposed to "completely" replace an existing one. It is like when a new version of an application replaces an older one. This patch does the first step. It removes the replaced patches from the list of patches. It is safe. The consistency model ensures that they are no longer used. By other words, each process works only with the structures from klp_transition_patch. The removal is done by a special function. It combines actions done by __disable_patch() and klp_complete_transition(). But it is a fast track without all the transaction-related stuff. Signed-off-by: Jason Baron <jbaron@akamai.com> [pmladek@suse.com: Split, reuse existing code, simplified] Signed-off-by: Petr Mladek <pmladek@suse.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jessica Yu <jeyu@kernel.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Miroslav Benes <mbenes@suse.cz> Acked-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> |
||
Jason Baron
|
20e5502595 |
livepatch: Use lists to manage patches, objects and functions
Currently klp_patch contains a pointer to a statically allocated array of struct klp_object and struct klp_objects contains a pointer to a statically allocated array of klp_func. In order to allow for the dynamic allocation of objects and functions, link klp_patch, klp_object, and klp_func together via linked lists. This allows us to more easily allocate new objects and functions, while having the iterator be a simple linked list walk. The static structures are added to the lists early. It allows to add the dynamically allocated objects before klp_init_object() and klp_init_func() calls. Therefore it reduces the further changes to the code. This patch does not change the existing behavior. Signed-off-by: Jason Baron <jbaron@akamai.com> [pmladek@suse.com: Initialize lists before init calls] Signed-off-by: Petr Mladek <pmladek@suse.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Joe Lawrence <joe.lawrence@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> |
||
Petr Mladek
|
958ef1e39d |
livepatch: Simplify API by removing registration step
The possibility to re-enable a registered patch was useful for immediate patches where the livepatch module had to stay until the system reboot. The improved consistency model allows to achieve the same result by unloading and loading the livepatch module again. Also we are going to add a feature called atomic replace. It will allow to create a patch that would replace all already registered patches. The aim is to handle dependent patches more securely. It will obsolete the stack of patches that helped to handle the dependencies so far. Then it might be unclear when a cumulative patch re-enabling is safe. It would be complicated to support the many modes. Instead we could actually make the API and code easier to understand. Therefore, remove the two step public API. All the checks and init calls are moved from klp_register_patch() to klp_enabled_patch(). Also the patch is automatically freed, including the sysfs interface when the transition to the disabled state is completed. As a result, there is never a disabled patch on the top of the stack. Therefore we do not need to check the stack in __klp_enable_patch(). And we could simplify the check in __klp_disable_patch(). Also the API and logic is much easier. It is enough to call klp_enable_patch() in module_init() call. The patch can be disabled by writing '0' into /sys/kernel/livepatch/<patch>/enabled. Then the module can be removed once the transition finishes and sysfs interface is freed. The only problem is how to free the structures and kobjects safely. The operation is triggered from the sysfs interface. We could not put the related kobject from there because it would cause lock inversion between klp_mutex and kernfs locks, see kn->count lockdep map. Therefore, offload the free task to a workqueue. It is perfectly fine: + The patch can no longer be used in the livepatch operations. + The module could not be removed until the free operation finishes and module_put() is called. + The operation is asynchronous already when the first klp_try_complete_transition() fails and another call is queued with a delay. Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> |
||
Petr Mladek
|
68007289bf |
livepatch: Don't block the removal of patches loaded after a forced transition
module_put() is currently never called in klp_complete_transition() when klp_force is set. As a result, we might keep the reference count even when klp_enable_patch() fails and klp_cancel_transition() is called. This might give the impression that a module might get blocked in some strange init state. Fortunately, it is not the case. The reference count is ignored when mod->init fails and erroneous modules are always removed. Anyway, this might be confusing. Instead, this patch moves the global klp_forced flag into struct klp_patch. As a result, we block only modules that might still be in use after a forced transition. Newly loaded livepatches might be eventually completely removed later. It is not a big deal. But the code is at least consistent with the reality. Signed-off-by: Petr Mladek <pmladek@suse.com> Acked-by: Joe Lawrence <joe.lawrence@redhat.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> |
||
Petr Mladek
|
0430f78bf3 |
livepatch: Consolidate klp_free functions
The code for freeing livepatch structures is a bit scattered and tricky: + direct calls to klp_free_*_limited() and kobject_put() are used to release partially initialized objects + klp_free_patch() removes the patch from the public list and releases all objects except for patch->kobj + object_put(&patch->kobj) and the related wait_for_completion() are called directly outside klp_mutex; this code is duplicated; Now, we are going to remove the registration stage to simplify the API and the code. This would require handling more situations in klp_enable_patch() error paths. More importantly, we are going to add a feature called atomic replace. It will need to dynamically create func and object structures. We will want to reuse the existing init() and free() functions. This would create even more error path scenarios. This patch implements more straightforward free functions: + checks kobj_added flag instead of @limit[*] + initializes patch->list early so that the check for empty list always works + The action(s) that has to be done outside klp_mutex are done in separate klp_free_patch_finish() function. It waits only when patch->kobj was really released via the _start() part. The patch does not change the existing behavior. [*] We need our own flag to track that the kobject was successfully added to the hierarchy. Note that kobj.state_initialized only indicates that kobject has been initialized, not whether is has been added (and needs to be removed on cleanup). Signed-off-by: Petr Mladek <pmladek@suse.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Jessica Yu <jeyu@kernel.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Jason Baron <jbaron@akamai.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> |
||
Petr Mladek
|
26c3e98e2f |
livepatch: Shuffle klp_enable_patch()/klp_disable_patch() code
We are going to simplify the API and code by removing the registration step. This would require calling init/free functions from enable/disable ones. This patch just moves the code to prevent more forward declarations. This patch does not change the code except for two forward declarations. Signed-off-by: Petr Mladek <pmladek@suse.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Joe Lawrence <joe.lawrence@redhat.com> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> |
||
Petr Mladek
|
19514910d0 |
livepatch: Change unsigned long old_addr -> void *old_func in struct klp_func
The address of the to be patched function and new function is stored in struct klp_func as: void *new_func; unsigned long old_addr; The different naming scheme and type are derived from the way the addresses are set. @old_addr is assigned at runtime using kallsyms-based search. @new_func is statically initialized, for example: static struct klp_func funcs[] = { { .old_name = "cmdline_proc_show", .new_func = livepatch_cmdline_proc_show, }, { } }; This patch changes unsigned long old_addr -> void *old_func. It removes some confusion when these address are later used in the code. It is motivated by a followup patch that adds special NOP struct klp_func where we want to assign func->new_func = func->old_addr respectively func->new_func = func->old_func. This patch does not modify the existing behavior. Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Joe Lawrence <joe.lawrence@redhat.com> Acked-by: Alice Ferrazzi <alice.ferrazzi@gmail.com> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> |
||
Linus Torvalds
|
a65981109f |
Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton: - procfs updates - various misc bits - lib/ updates - epoll updates - autofs - fatfs - a few more MM bits * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (58 commits) mm/page_io.c: fix polled swap page in checkpatch: add Co-developed-by to signature tags docs: fix Co-Developed-by docs drivers/base/platform.c: kmemleak ignore a known leak fs: don't open code lru_to_page() fs/: remove caller signal_pending branch predictions mm/: remove caller signal_pending branch predictions arch/arc/mm/fault.c: remove caller signal_pending_branch predictions kernel/sched/: remove caller signal_pending branch predictions kernel/locking/mutex.c: remove caller signal_pending branch predictions mm: select HAVE_MOVE_PMD on x86 for faster mremap mm: speed up mremap by 20x on large regions mm: treewide: remove unused address argument from pte_alloc functions initramfs: cleanup incomplete rootfs scripts/gdb: fix lx-version string output kernel/kcov.c: mark write_comp_data() as notrace kernel/sysctl: add panic_print into sysctl panic: add options to print system info when panic happens bfs: extra sanity checking and static inode bitmap exec: separate MM_ANONPAGES and RLIMIT_STACK accounting ... |
||
Davidlohr Bueso
|
34ec35ad8f |
kernel/sched/: remove caller signal_pending branch predictions
This is already done for us internally by the signal machinery. Link: http://lkml.kernel.org/r/20181116002713.8474-3-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Davidlohr Bueso
|
3bb5f4ac55 |
kernel/locking/mutex.c: remove caller signal_pending branch predictions
This is already done for us internally by the signal machinery. Link: http://lkml.kernel.org/r/20181116002713.8474-2-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Anders Roxell
|
6347244316 |
kernel/kcov.c: mark write_comp_data() as notrace
Since __sanitizer_cov_trace_const_cmp4 is marked as notrace, the
function called from __sanitizer_cov_trace_const_cmp4 shouldn't be
traceable either. ftrace_graph_caller() gets called every time func
write_comp_data() gets called if it isn't marked 'notrace'. This is the
backtrace from gdb:
#0 ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:179
#1 0xffffff8010201920 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:151
#2 0xffffff8010439714 in write_comp_data (type=5, arg1=0, arg2=0, ip=18446743524224276596) at ../kernel/kcov.c:116
#3 0xffffff8010439894 in __sanitizer_cov_trace_const_cmp4 (arg1=<optimized out>, arg2=<optimized out>) at ../kernel/kcov.c:188
#4 0xffffff8010201874 in prepare_ftrace_return (self_addr=18446743524226602768, parent=0xffffff801014b918, frame_pointer=18446743524223531344) at ./include/generated/atomic-instrumented.h:27
#5 0xffffff801020194c in ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:182
Rework so that write_comp_data() that are called from
__sanitizer_cov_trace_*_cmp*() are marked as 'notrace'.
Commit
|
||
Feng Tang
|
81c9d43f94 |
kernel/sysctl: add panic_print into sysctl
So that we can also runtime chose to print out the needed system info for panic, other than setting the kernel cmdline. Link: http://lkml.kernel.org/r/1543398842-19295-3-git-send-email-feng.tang@intel.com Signed-off-by: Feng Tang <feng.tang@intel.com> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Feng Tang
|
d999bd9392 |
panic: add options to print system info when panic happens
Kernel panic issues are always painful to debug, partially because it's not easy to get enough information of the context when panic happens. And we have ramoops and kdump for that, while this commit tries to provide a easier way to show the system info by adding a cmdline parameter, referring some idea from sysrq handler. Link: http://lkml.kernel.org/r/1543398842-19295-2-git-send-email-feng.tang@intel.com Signed-off-by: Feng Tang <feng.tang@intel.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Yi Wang
|
fb5bf31722 |
fork: fix some -Wmissing-prototypes warnings
We get a warning when building kernel with W=1:
kernel/fork.c:167:13: warning: no previous prototype for `arch_release_thread_stack' [-Wmissing-prototypes]
kernel/fork.c:779:13: warning: no previous prototype for `fork_init' [-Wmissing-prototypes]
Add the missing declaration in head file to fix this.
Also, remove arch_release_thread_stack() completely because no arch
seems to implement it since
|
||
Tetsuo Handa
|
304ae42739 |
kernel/hung_task.c: break RCU locks based on jiffies
check_hung_uninterruptible_tasks() is currently calling rcu_lock_break() for every 1024 threads. But check_hung_task() is very slow if printk() was called, and is very fast otherwise. If many threads within some 1024 threads called printk(), the RCU grace period might be extended enough to trigger RCU stall warnings. Therefore, calling rcu_lock_break() for every some fixed jiffies will be safer. Link: http://lkml.kernel.org/r/1544800658-11423-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Liu, Chuansheng
|
168e06f793 |
kernel/hung_task.c: force console verbose before panic
Based on commit
|
||
Cheng Lin
|
09be178400 |
proc/sysctl: fix return error for proc_doulongvec_minmax()
If the number of input parameters is less than the total parameters, an EINVAL error will be returned. For example, we use proc_doulongvec_minmax to pass up to two parameters with kern_table: { .procname = "monitor_signals", .data = &monitor_sigs, .maxlen = 2*sizeof(unsigned long), .mode = 0644, .proc_handler = proc_doulongvec_minmax, }, Reproduce: When passing two parameters, it's work normal. But passing only one parameter, an error "Invalid argument"(EINVAL) is returned. [root@cl150 ~]# echo 1 2 > /proc/sys/kernel/monitor_signals [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals 1 2 [root@cl150 ~]# echo 3 > /proc/sys/kernel/monitor_signals -bash: echo: write error: Invalid argument [root@cl150 ~]# echo $? 1 [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals 3 2 [root@cl150 ~]# The following is the result after apply this patch. No error is returned when the number of input parameters is less than the total parameters. [root@cl150 ~]# echo 1 2 > /proc/sys/kernel/monitor_signals [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals 1 2 [root@cl150 ~]# echo 3 > /proc/sys/kernel/monitor_signals [root@cl150 ~]# echo $? 0 [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals 3 2 [root@cl150 ~]# There are three processing functions dealing with digital parameters, __do_proc_dointvec/__do_proc_douintvec/__do_proc_doulongvec_minmax. This patch deals with __do_proc_doulongvec_minmax, just as __do_proc_dointvec does, adding a check for parameters 'left'. In __do_proc_douintvec, its code implementation explicitly does not support multiple inputs. static int __do_proc_douintvec(...){ ... /* * Arrays are not supported, keep this simple. *Do not* add * support for them. */ if (vleft != 1) { *lenp = 0; return -EINVAL; } ... } So, just __do_proc_doulongvec_minmax has the problem. And most use of proc_doulongvec_minmax/proc_doulongvec_ms_jiffies_minmax just have one parameter. Link: http://lkml.kernel.org/r/1544081775-15720-1-git-send-email-cheng.lin130@zte.com.cn Signed-off-by: Cheng Lin <cheng.lin130@zte.com.cn> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Linus Torvalds
|
594cc251fd |
make 'user_access_begin()' do 'access_ok()'
Originally, the rule used to be that you'd have to do access_ok() separately, and then user_access_begin() before actually doing the direct (optimized) user access. But experience has shown that people then decide not to do access_ok() at all, and instead rely on it being implied by other operations or similar. Which makes it very hard to verify that the access has actually been range-checked. If you use the unsafe direct user accesses, hardware features (either SMAP - Supervisor Mode Access Protection - on x86, or PAN - Privileged Access Never - on ARM) do force you to use user_access_begin(). But nothing really forces the range check. By putting the range check into user_access_begin(), we actually force people to do the right thing (tm), and the range check vill be visible near the actual accesses. We have way too long a history of people trying to avoid them. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Linus Torvalds
|
96d4f267e4 |
Remove 'type' argument from access_ok() function
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument of the user address range verification function since we got rid of the old racy i386-only code to walk page tables by hand. It existed because the original 80386 would not honor the write protect bit when in kernel mode, so you had to do COW by hand before doing any user access. But we haven't supported that in a long time, and these days the 'type' argument is a purely historical artifact. A discussion about extending 'user_access_begin()' to do the range checking resulted this patch, because there is no way we're going to move the old VERIFY_xyz interface to that model. And it's best done at the end of the merge window when I've done most of my merges, so let's just get this done once and for all. This patch was mostly done with a sed-script, with manual fix-ups for the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form. There were a couple of notable cases: - csky still had the old "verify_area()" name as an alias. - the iter_iov code had magical hardcoded knowledge of the actual values of VERIFY_{READ,WRITE} (not that they mattered, since nothing really used it) - microblaze used the type argument for a debug printout but other than those oddities this should be a total no-op patch. I tried to fix up all architectures, did fairly extensive grepping for access_ok() uses, and the changes are trivial, but I may have missed something. Any missed conversion should be trivially fixable, though. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Linus Torvalds
|
43d86ee8c6 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller: "Several fixes here. Basically split down the line between newly introduced regressions and long existing problems: 1) Double free in tipc_enable_bearer(), from Cong Wang. 2) Many fixes to nf_conncount, from Florian Westphal. 3) op->get_regs_len() can throw an error, check it, from Yunsheng Lin. 4) Need to use GFP_ATOMIC in *_add_hash_mac_address() of fsl/fman driver, from Scott Wood. 5) Inifnite loop in fib_empty_table(), from Yue Haibing. 6) Use after free in ax25_fillin_cb(), from Cong Wang. 7) Fix socket locking in nr_find_socket(), also from Cong Wang. 8) Fix WoL wakeup enable in r8169, from Heiner Kallweit. 9) On 32-bit sock->sk_stamp is not thread-safe, from Deepa Dinamani. 10) Fix ptr_ring wrap during queue swap, from Cong Wang. 11) Missing shutdown callback in hinic driver, from Xue Chaojing. 12) Need to return NULL on error from ip6_neigh_lookup(), from Stefano Brivio. 13) BPF out of bounds speculation fixes from Daniel Borkmann" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (57 commits) ipv6: Consider sk_bound_dev_if when binding a socket to an address ipv6: Fix dump of specific table with strict checking bpf: add various test cases to selftests bpf: prevent out of bounds speculation on pointer arithmetic bpf: fix check_map_access smin_value test when pointer contains offset bpf: restrict unknown scalars of mixed signed bounds for unprivileged bpf: restrict stack pointer arithmetic for unprivileged bpf: restrict map value pointer arithmetic for unprivileged bpf: enable access to ax register also from verifier rewrite bpf: move tmp variable into ax register in interpreter bpf: move {prev_,}insn_idx into verifier env isdn: fix kernel-infoleak in capi_unlocked_ioctl ipv6: route: Fix return value of ip6_neigh_lookup() on neigh_create() error net/hamradio/6pack: use mod_timer() to rearm timers net-next/hinic:add shutdown callback net: hns3: call hns3_nic_net_open() while doing HNAE3_UP_CLIENT ip: validate header length on virtual device xmit tap: call skb_probe_transport_header after setting skb->dev ptr_ring: wrap back ->producer in __ptr_ring_swap_queue() net: rds: remove unnecessary NULL check ... |
||
Linus Torvalds
|
e6b9257280 |
NFS client updates for Linux 4.21
Note that there is a conflict with the rdma tree in this pull request, since we delete a file that has been changed in the rdma tree. Hopefully that's easy enough to resolve! We also were unable to track down a maintainer for Neil Brown's changes to the generic cred code that are prerequisites to his RPC cred cleanup patches. We've been asking around for several months without any response, so hopefully it's okay to include those patches in this pull request. Stable bugfixes: - xprtrdma: Yet another double DMA-unmap # v4.20 Features: - Allow some /proc/sys/sunrpc entries without CONFIG_SUNRPC_DEBUG - Per-xprt rdma receive workqueues - Drop support for FMR memory registration - Make port= mount option optional for RDMA mounts Other bugfixes and cleanups: - Remove unused nfs4_xdev_fs_type declaration - Fix comments for behavior that has changed - Remove generic RPC credentials by switching to 'struct cred' - Fix crossing mountpoints with different auth flavors - Various xprtrdma fixes from testing and auditing the close code - Fixes for disconnect issues when using xprtrdma with krb5 - Clean up and improve xprtrdma trace points - Fix NFS v4.2 async copy reboot recovery -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAlwtO50ACgkQ18tUv7Cl QOtZWQ//e5Hhp2TnQZ6U+99YKedjwBHP6psH3GKSEdeHSNdlSpZ5ckgHxvMb9TBa 6t4ecgv5P/uYLIePQ0u2ubUFc9+TlyGi7Iacx13/YhK7kihGHDPnZhfl0QbYixV7 rwa9bFcKmOrXs8ld+Hw3P2UL22G1gMf/LHDhPNshbW7LFZmcshKz+mKTk70kwkq9 v7tFC59p6GwV8Sr2YI2NXn2fOWsUS00sQfgj2jceJYJ8PsNa+wHYF4wPj2IY5NsE D5Oq2kLPbytBhCllOHgopNZaf4qb5BfqhVETyc1O+kDF3BZKUhQ1PoDi2FPinaHM 5/d8hS+5fr3eMBsQrPWQLXYjWQFUXnkQQJvU3Bo52AIgomsk/8uBq3FvH7XmFcBd C8sgnuUAkAS8feMes8GCS50BTxclnGuYGdyFJyCRXoG9Kn9rMrw9EKitky6EVq0v NmXhW79jK84a3yDXVlAIpZ8Y9BU/HQ3GviGX8lQEdZU9YiYRzDIHvpMFwzMgqaBi XvLbr8PlLOm8GZokThS8QYT/G2Wu6IwfUq/AufVjVD4+HiL3duKKfWSGAvcm6aAa GoRF6UG+OmjWlzKojtRc1dI+sy22Fzh+DW+Mx6tuf/b/66wkmYnW7eKcV4rt6Tm5 /JEhvTMo9q7elL/4FgCoMCcdoc5eXqQyXRXrQiOU7YHLzn2aWU0= =DvVW -----END PGP SIGNATURE----- Merge tag 'nfs-for-4.21-1' of git://git.linux-nfs.org/projects/anna/linux-nfs Pull NFS client updates from Anna Schumaker: "Stable bugfixes: - xprtrdma: Yet another double DMA-unmap # v4.20 Features: - Allow some /proc/sys/sunrpc entries without CONFIG_SUNRPC_DEBUG - Per-xprt rdma receive workqueues - Drop support for FMR memory registration - Make port= mount option optional for RDMA mounts Other bugfixes and cleanups: - Remove unused nfs4_xdev_fs_type declaration - Fix comments for behavior that has changed - Remove generic RPC credentials by switching to 'struct cred' - Fix crossing mountpoints with different auth flavors - Various xprtrdma fixes from testing and auditing the close code - Fixes for disconnect issues when using xprtrdma with krb5 - Clean up and improve xprtrdma trace points - Fix NFS v4.2 async copy reboot recovery" * tag 'nfs-for-4.21-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (63 commits) sunrpc: convert to DEFINE_SHOW_ATTRIBUTE sunrpc: Add xprt after nfs4_test_session_trunk() sunrpc: convert unnecessary GFP_ATOMIC to GFP_NOFS sunrpc: handle ENOMEM in rpcb_getport_async NFS: remove unnecessary test for IS_ERR(cred) xprtrdma: Prevent leak of rpcrdma_rep objects NFSv4.2 fix async copy reboot recovery xprtrdma: Don't leak freed MRs xprtrdma: Add documenting comment for rpcrdma_buffer_destroy xprtrdma: Replace outdated comment for rpcrdma_ep_post xprtrdma: Update comments in frwr_op_send SUNRPC: Fix some kernel doc complaints SUNRPC: Simplify defining common RPC trace events NFS: Fix NFSv4 symbolic trace point output xprtrdma: Trace mapping, alloc, and dereg failures xprtrdma: Add trace points for calls to transport switch methods xprtrdma: Relocate the xprtrdma_mr_map trace points xprtrdma: Clean up of xprtrdma chunk trace points xprtrdma: Remove unused fields from rpcrdma_ia xprtrdma: Cull dprintk() call sites ... |
||
Daniel Borkmann
|
979d63d50c |
bpf: prevent out of bounds speculation on pointer arithmetic
Jann reported that the original commit back in |
||
Daniel Borkmann
|
b7137c4eab |
bpf: fix check_map_access smin_value test when pointer contains offset
In check_map_access() we probe actual bounds through __check_map_access() with offset of reg->smin_value + off for lower bound and offset of reg->umax_value + off for the upper bound. However, even though the reg->smin_value could have a negative value, the final result of the sum with off could be positive when pointer arithmetic with known and unknown scalars is combined. In this case we reject the program with an error such as "R<x> min value is negative, either use unsigned index or do a if (index >=0) check." even though the access itself would be fine. Therefore extend the check to probe whether the actual resulting reg->smin_value + off is less than zero. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Daniel Borkmann
|
9d7eceede7 |
bpf: restrict unknown scalars of mixed signed bounds for unprivileged
For unknown scalars of mixed signed bounds, meaning their smin_value is negative and their smax_value is positive, we need to reject arithmetic with pointer to map value. For unprivileged the goal is to mask every map pointer arithmetic and this cannot reliably be done when it is unknown at verification time whether the scalar value is negative or positive. Given this is a corner case, the likelihood of breaking should be very small. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Daniel Borkmann
|
e4298d2583 |
bpf: restrict stack pointer arithmetic for unprivileged
Restrict stack pointer arithmetic for unprivileged users in that arithmetic itself must not go out of bounds as opposed to the actual access later on. Therefore after each adjust_ptr_min_max_vals() with a stack pointer as a destination we simulate a check_stack_access() of 1 byte on the destination and once that fails the program is rejected for unprivileged program loads. This is analog to map value pointer arithmetic and needed for masking later on. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Daniel Borkmann
|
0d6303db79 |
bpf: restrict map value pointer arithmetic for unprivileged
Restrict map value pointer arithmetic for unprivileged users in that arithmetic itself must not go out of bounds as opposed to the actual access later on. Therefore after each adjust_ptr_min_max_vals() with a map value pointer as a destination it will simulate a check_map_access() of 1 byte on the destination and once that fails the program is rejected for unprivileged program loads. We use this later on for masking any pointer arithmetic with the remainder of the map value space. The likelihood of breaking any existing real-world unprivileged eBPF program is very small for this corner case. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Daniel Borkmann
|
9b73bfdd08 |
bpf: enable access to ax register also from verifier rewrite
Right now we are using BPF ax register in JIT for constant blinding as well as in interpreter as temporary variable. Verifier will not be able to use it simply because its use will get overridden from the former in bpf_jit_blind_insn(). However, it can be made to work in that blinding will be skipped if there is prior use in either source or destination register on the instruction. Taking constraints of ax into account, the verifier is then open to use it in rewrites under some constraints. Note, ax register already has mappings in every eBPF JIT. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Daniel Borkmann
|
144cd91c4c |
bpf: move tmp variable into ax register in interpreter
This change moves the on-stack 64 bit tmp variable in ___bpf_prog_run() into the hidden ax register. The latter is currently only used in JITs for constant blinding as a temporary scratch register, meaning the BPF interpreter will never see the use of ax. Therefore it is safe to use it for the cases where tmp has been used earlier. This is needed to later on allow restricted hidden use of ax in both interpreter and JITs. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Daniel Borkmann
|
c08435ec7f |
bpf: move {prev_,}insn_idx into verifier env
Move prev_insn_idx and insn_idx from the do_check() function into the verifier environment, so they can be read inside the various helper functions for handling the instructions. It's easier to put this into the environment rather than changing all call-sites only to pass it along. insn_idx is useful in particular since this later on allows to hold state in env->insn_aux_data[env->insn_idx]. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Linus Torvalds
|
d9a7fa67b4 |
Merge branch 'next-seccomp' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull seccomp updates from James Morris: - Add SECCOMP_RET_USER_NOTIF - seccomp fixes for sparse warnings and s390 build (Tycho) * 'next-seccomp' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: seccomp, s390: fix build for syscall type change seccomp: fix poor type promotion samples: add an example of seccomp user trap seccomp: add a return code to trap to userspace seccomp: switch system call argument type to void * seccomp: hoist struct seccomp_data recalculation higher |
||
Linus Torvalds
|
fcf010449e |
kgdb patches for 4.20-rc1
Mostly clean ups although whilst Doug's was chasing down a odd lockdep warning he also did some work to improved debugger resilience when some CPUs fail to respond to the round up request. The main changes are: * Fixing a lockdep warning on architectures that cannot use an NMI for the round up plus related changes to make CPU round up and all CPU backtrace more resilient. * Constify the arch ops tables * A couple of other small clean ups Two of the three patchsets here include changes that spill over into arch/. Changes in the arch space are relatively narrow in scope (and directly related to kgdb). Didn't get comprehensive acks but all impacted maintainers were Cc:ed in good time. Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> -----BEGIN PGP SIGNATURE----- iQJPBAABCAA5FiEELzVBU1D3lWq6cKzwfOMlXTn3iKEFAlwonoUbHGRhbmllbC50 aG9tcHNvbkBsaW5hcm8ub3JnAAoJEHzjJV0594ihmooP/1uzSMGQIoQMB8XeU/jT Da2iILybi6hGp7ILA27d0yN3tsJBxWGWs8wzNdzMo3NQ3J0o4foAUnS/R0Vjkg9w uphe5EA4HDsIrH05OouNb984BeEgNaC9HSqtyr9fXuh024NboULFKIm7REYm+QHT C5SrBtmonL1xE7FmAhudLWjl7ZlvxM6DJeoVViH4kKq0raTiILt6VJaGl9JfcAdL m9GEf9r/nh0sCq3GNgyc0y4BvHed+Kxzy1fsIi3jE6t8elaYYR72gNRQ5LaFxcnQ F04/UtH75qB4rqYsqqV1q0rFi+tj+p9wYTmxixaGWsVDX4Gb5KXuLWJhaRb5IvwC bdq/0IAXRr4vUL3y0tFWfCj7pHGaVc/gfXi8aieRXLGAZG+tdfuu99NCiulIZTfc QqZz12Z+99/qi6dK7dBQtaN8SyPeB1QXKWefeGo2Bt5QqiBmcKHxsQYMUo3nkf3J UXHpj4LG6Ldsi/w8VZfvXmM0/vbO/jrus9m+X2v+4tJyisjrsyv0FRnREI4avfbC l09P1ajv7RrAaxtab0smV9krqWZ/mSn0zcgcaD6RdKe0+SwsiP/CEx1z1Wb1MH9c wjEiClXjdVB39YVT0YVfG2Ho7qH8WRErxVyNb/f4QKHMXL1Mu91hFWhBBpUOGUj2 7Jrq2zK1uWramtt7GBDpHYYH =Aqlc -----END PGP SIGNATURE----- Merge tag 'kgdb-4.21-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux Pull kgdb updates from Daniel Thompson: "Mostly clean ups although while Doug's was chasing down a odd lockdep warning he also did some work to improved debugger resilience when some CPUs fail to respond to the round up request. The main changes are: - Fixing a lockdep warning on architectures that cannot use an NMI for the round up plus related changes to make CPU round up and all CPU backtrace more resilient. - Constify the arch ops tables - A couple of other small clean ups Two of the three patchsets here include changes that spill over into arch/. Changes in the arch space are relatively narrow in scope (and directly related to kgdb). Didn't get comprehensive acks but all impacted maintainers were Cc:ed in good time" * tag 'kgdb-4.21-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux: kgdb/treewide: constify struct kgdb_arch arch_kgdb_ops mips/kgdb: prepare arch_kgdb_ops for constness kdb: use bool for binary state indicators kdb: Don't back trace on a cpu that didn't round up kgdb: Don't round up a CPU that failed rounding up before kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function() kgdb: Remove irq flags from roundup |
||
Linus Torvalds
|
495d714ad1 |
Tracing changes for v4.21:
- Rework of the kprobe/uprobe and synthetic events to consolidate all the dynamic event code. This will make changes in the future easier. - Partial rewrite of the function graph tracing infrastructure. This will allow for multiple users of hooking onto functions to get the callback (return) of the function. This is the ground work for having kprobes and function graph tracer using one code base. - Clean up of the histogram code that will facilitate adding more features to the histograms in the future. - Addition of str_has_prefix() and a few use cases. There currently is a similar function strstart() that is used in a few places, but only returns a bool and not a length. These instances will be removed in the future to use str_has_prefix() instead. - A few other various clean ups as well. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXCawlBQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qhbcAQCFeT0fWWTUxofBQz5jqsHaRnVg21+9 X4sTldYRYEn4YgEAmWOyiwq7zvrsAu4ZwkNBMeqxn3tVymYHiGOGe3Y4BAw= =u96o -----END PGP SIGNATURE----- Merge tag 'trace-v4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: - Rework of the kprobe/uprobe and synthetic events to consolidate all the dynamic event code. This will make changes in the future easier. - Partial rewrite of the function graph tracing infrastructure. This will allow for multiple users of hooking onto functions to get the callback (return) of the function. This is the ground work for having kprobes and function graph tracer using one code base. - Clean up of the histogram code that will facilitate adding more features to the histograms in the future. - Addition of str_has_prefix() and a few use cases. There currently is a similar function strstart() that is used in a few places, but only returns a bool and not a length. These instances will be removed in the future to use str_has_prefix() instead. - A few other various clean ups as well. * tag 'trace-v4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (57 commits) tracing: Use the return of str_has_prefix() to remove open coded numbers tracing: Have the historgram use the result of str_has_prefix() for len of prefix tracing: Use str_has_prefix() instead of using fixed sizes tracing: Use str_has_prefix() helper for histogram code string.h: Add str_has_prefix() helper function tracing: Make function ‘ftrace_exports’ static tracing: Simplify printf'ing in seq_print_sym tracing: Avoid -Wformat-nonliteral warning tracing: Merge seq_print_sym_short() and seq_print_sym_offset() tracing: Add hist trigger comments for variable-related fields tracing: Remove hist trigger synth_var_refs tracing: Use hist trigger's var_ref array to destroy var_refs tracing: Remove open-coding of hist trigger var_ref management tracing: Use var_refs[] for hist trigger reference checking tracing: Change strlen to sizeof for hist trigger static strings tracing: Remove unnecessary hist trigger struct field tracing: Fix ftrace_graph_get_ret_stack() to use task and not current seq_buf: Use size_t for len in seq_buf_puts() seq_buf: Make seq_buf_puts() null-terminate the buffer arm64: Use ftrace_graph_get_ret_stack() instead of curr_ret_stack ... |
||
Linus Torvalds
|
e3ed513bcf |
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
"This is a revert for a lockup in cgroups-intense workloads - the real
fixes will come later"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix infinite loop in update_blocked_averages() by reverting
|
||
Linus Torvalds
|
c40f7d74c7 |
sched/fair: Fix infinite loop in update_blocked_averages() by reverting a9e7f6544b
Zhipeng Xie, Xie XiuQi and Sargun Dhillon reported lockups in the scheduler under high loads, starting at around the v4.18 time frame, and Zhipeng Xie tracked it down to bugs in the rq->leaf_cfs_rq_list manipulation. Do a (manual) revert of: |
||
Nicholas Mc Guire
|
7faedcd4de |
kdb: use bool for binary state indicators
defcmd_in_progress is the state trace for command group processing - within a command group or not - usable is an indicator if a command set is valid (allocated/non-empty) - so use a bool for those binary indication here. Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> |
||
Douglas Anderson
|
162bc7f5af |
kdb: Don't back trace on a cpu that didn't round up
If you have a CPU that fails to round up and then run 'btc' you'll end up crashing in kdb becaue we dereferenced NULL. Let's add a check. It's wise to also set the task to NULL when leaving the debugger so that if we fail to round up on a later entry into the debugger we won't backtrace a stale task. Signed-off-by: Douglas Anderson <dianders@chromium.org> Acked-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> |
||
Douglas Anderson
|
87b0959285 |
kgdb: Don't round up a CPU that failed rounding up before
If we're using the default implementation of kgdb_roundup_cpus() that uses smp_call_function_single_async() we can end up hanging kgdb_roundup_cpus() if we try to round up a CPU that failed to round up before. Specifically smp_call_function_single_async() will try to wait on the csd lock for the CPU that we're trying to round up. If the previous round up never finished then that lock could still be held and we'll just sit there hanging. There's not a lot of use trying to round up a CPU that failed to round up before. Let's keep a flag that indicates whether the CPU started but didn't finish to round up before. If we see that flag set then we'll skip the next round up. In general we have a few goals here: - We never want to end up calling smp_call_function_single_async() when the csd is still locked. This is accomplished because flush_smp_call_function_queue() unlocks the csd _before_ invoking the callback. That means that when kgdb_nmicallback() runs we know for sure the the csd is no longer locked. Thus when we set "rounding_up = false" we know for sure that the csd is unlocked. - If there are no timeouts rounding up we should never skip a round up. NOTE #1: In general trying to continue running after failing to round up CPUs doesn't appear to be supported in the debugger. When I simulate this I find that kdb reports "Catastrophic error detected" when I try to continue. I can overrule and continue anyway, but it should be noted that we may be entering the land of dragons here. Possibly the "Catastrophic error detected" was added _because_ of the future failure to round up, but even so this is an area of the code that hasn't been strongly tested. NOTE #2: I did a bit of testing before and after this change. I introduced a 10 second hang in the kernel while holding a spinlock that I could invoke on a certain CPU with 'taskset -c 3 cat /sys/...". Before this change if I did: - Invoke hang - Enter debugger - g (which warns about Catastrophic error, g again to go anyway) - g - Enter debugger ...I'd hang the rest of the 10 seconds without getting a debugger prompt. After this change I end up in the debugger the 2nd time after only 1 second with the standard warning about 'Timed out waiting for secondary CPUs.' I'll also note that once the CPU finished waiting I could actually debug it (aka "btc" worked) I won't promise that everything works perfectly if the errant CPU comes back at just the wrong time (like as we're entering or exiting the debugger) but it certainly seems like an improvement. NOTE #3: setting 'kgdb_info[cpu].rounding_up = false' is in kgdb_nmicallback() instead of kgdb_call_nmi_hook() because some implementations override kgdb_call_nmi_hook(). It shouldn't hurt to have it in kgdb_nmicallback() in any case. NOTE #4: this logic is really only needed because there is no API call like "smp_try_call_function_single_async()" or "smp_csd_is_locked()". If such an API existed then we'd use it instead, but it seemed a bit much to add an API like this just for kgdb. Signed-off-by: Douglas Anderson <dianders@chromium.org> Acked-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> |
||
Douglas Anderson
|
3cd99ac355 |
kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()
When I had lockdep turned on and dropped into kgdb I got a nice splat on my system. Specifically it hit: DEBUG_LOCKS_WARN_ON(current->hardirq_context) Specifically it looked like this: sysrq: SysRq : DEBUG ------------[ cut here ]------------ DEBUG_LOCKS_WARN_ON(current->hardirq_context) WARNING: CPU: 0 PID: 0 at .../kernel/locking/lockdep.c:2875 lockdep_hardirqs_on+0xf0/0x160 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0 #27 pstate: 604003c9 (nZCv DAIF +PAN -UAO) pc : lockdep_hardirqs_on+0xf0/0x160 ... Call trace: lockdep_hardirqs_on+0xf0/0x160 trace_hardirqs_on+0x188/0x1ac kgdb_roundup_cpus+0x14/0x3c kgdb_cpu_enter+0x53c/0x5cc kgdb_handle_exception+0x180/0x1d4 kgdb_compiled_brk_fn+0x30/0x3c brk_handler+0x134/0x178 do_debug_exception+0xfc/0x178 el1_dbg+0x18/0x78 kgdb_breakpoint+0x34/0x58 sysrq_handle_dbg+0x54/0x5c __handle_sysrq+0x114/0x21c handle_sysrq+0x30/0x3c qcom_geni_serial_isr+0x2dc/0x30c ... ... irq event stamp: ...45 hardirqs last enabled at (...44): [...] __do_softirq+0xd8/0x4e4 hardirqs last disabled at (...45): [...] el1_irq+0x74/0x130 softirqs last enabled at (...42): [...] _local_bh_enable+0x2c/0x34 softirqs last disabled at (...43): [...] irq_exit+0xa8/0x100 ---[ end trace adf21f830c46e638 ]--- Looking closely at it, it seems like a really bad idea to be calling local_irq_enable() in kgdb_roundup_cpus(). If nothing else that seems like it could violate spinlock semantics and cause a deadlock. Instead, let's use a private csd alongside smp_call_function_single_async() to round up the other CPUs. Using smp_call_function_single_async() doesn't require interrupts to be enabled so we can remove the offending bit of code. In order to avoid duplicating this across all the architectures that use the default kgdb_roundup_cpus(), we'll add a "weak" implementation to debug_core.c. Looking at all the people who previously had copies of this code, there were a few variants. I've attempted to keep the variants working like they used to. Specifically: * For arch/arc we passed NULL to kgdb_nmicallback() instead of get_irq_regs(). * For arch/mips there was a bit of extra code around kgdb_nmicallback() NOTE: In this patch we will still get into trouble if we try to round up a CPU that failed to round up before. We'll try to round it up again and potentially hang when we try to grab the csd lock. That's not new behavior but we'll still try to do better in a future patch. Suggested-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paul.burton@mips.com> Cc: James Hogan <jhogan@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> |
||
Douglas Anderson
|
9ef7fa507d |
kgdb: Remove irq flags from roundup
The function kgdb_roundup_cpus() was passed a parameter that was documented as: > the flags that will be used when restoring the interrupts. There is > local_irq_save() call before kgdb_roundup_cpus(). Nobody used those flags. Anyone who wanted to temporarily turn on interrupts just did local_irq_enable() and local_irq_disable() without looking at them. So we can definitely remove the flags. Signed-off-by: Douglas Anderson <dianders@chromium.org> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paul.burton@mips.com> Cc: James Hogan <jhogan@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> |
||
Linus Torvalds
|
769e47094d |
Kconfig updates for v4.21
- support -y option for merge_config.sh to avoid downgrading =y to =m - remove S_OTHER symbol type, and touch include/config/*.h files correctly - fix file name and line number in lexer warnings - fix memory leak when EOF is encountered in quotation - resolve all shift/reduce conflicts of the parser - warn no new line at end of file - make 'source' statement more strict to take only string literal - rewrite the lexer and remove the keyword lookup table - convert to SPDX License Identifier - compile C files independently instead of including them from zconf.y - fix various warnings of gconfig - misc cleanups -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJcJieuAAoJED2LAQed4NsGHlIP/1s0fQ86XD9dIMyHzAO0gh2f 7rylfe2kEXJgIzJ0DyZdLu4iZtwbkEUqTQrRS1abriNGVemPkfBAnZdM5d92lOQX 3iREa700AJ2xo7V7gYZ6AbhZoG3p0S9U9Q2qE5S+tFTe8c2Gy4xtjnODF+Vel85r S0P8tF5sE1/d00lm+yfMI/CJVfDjyNaMm+aVEnL0kZTPiRkaktjWgo6Fc2p4z1L5 HFmMMP6/iaXmRZ+tHJGPQ2AT70GFVZw5ePxPcl50EotUP25KHbuUdzs8wDpYm3U/ rcESVsIFpgqHWmTsdBk6dZk0q8yFZNkMlkaP/aYukVZpUn/N6oAXgTFckYl8dmQL fQBkQi6DTfr9EBPVbj18BKm7xI3Y4DdQ2fzTfYkJ2XwNRGFA5r9N3sjd7ZTVGjxC aeeMHCwvGdSx1x8PeZAhZfsUHW8xVDMSQiT713+ljBY+6cwzA+2NF0kP7B6OAqwr ETFzd4Xu2/lZcL7gQRH8WU3L2S5iedmDG6RnZgJMXI0/9V4qAA+nlsWaCgnl1TgA mpxYlLUMrd6AUJevE34FlnyFdk8IMn9iKRFsvF0f3doO5C7QzTVGqFdJu5a0CuWO 4NBJvZjFT8/4amoWLfnDlfApWXzTfwLbKG+r6V2F30fLuXpYg5LxWhBoGRPYLZSq oi4xN1Mpx3TvXz6WcKVZ =r3Fl -----END PGP SIGNATURE----- Merge tag 'kconfig-v4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kconfig updates from Masahiro Yamada: - support -y option for merge_config.sh to avoid downgrading =y to =m - remove S_OTHER symbol type, and touch include/config/*.h files correctly - fix file name and line number in lexer warnings - fix memory leak when EOF is encountered in quotation - resolve all shift/reduce conflicts of the parser - warn no new line at end of file - make 'source' statement more strict to take only string literal - rewrite the lexer and remove the keyword lookup table - convert to SPDX License Identifier - compile C files independently instead of including them from zconf.y - fix various warnings of gconfig - misc cleanups * tag 'kconfig-v4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (39 commits) kconfig: surround dbg_sym_flags with #ifdef DEBUG to fix gconf warning kconfig: split images.c out of qconf.cc/gconf.c to fix gconf warnings kconfig: add static qualifiers to fix gconf warnings kconfig: split the lexer out of zconf.y kconfig: split some C files out of zconf.y kconfig: convert to SPDX License Identifier kconfig: remove keyword lookup table entirely kconfig: update current_pos in the second lexer kconfig: switch to ASSIGN_VAL state in the second lexer kconfig: stop associating kconf_id with yylval kconfig: refactor end token rules kconfig: stop supporting '.' and '/' in unquoted words treewide: surround Kconfig file paths with double quotes microblaze: surround string default in Kconfig with double quotes kconfig: use T_WORD instead of T_VARIABLE for variables kconfig: use specific tokens instead of T_ASSIGN for assignments kconfig: refactor scanning and parsing "option" properties kconfig: use distinct tokens for type and default properties kconfig: remove redundant token defines kconfig: rename depends_list to comment_option_list ... |
||
Linus Torvalds
|
6f9d71c9c7 |
Merge branch 'for-4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo: - Waiman's cgroup2 cpuset support has been finally merged closing one of the last remaining feature gaps. - cgroup.procs could show non-leader threads when cgroup2 threaded mode was used in certain ways. I forgot to push the fix during the last cycle. - A patch to fix mount option parsing when all mount options have been consumed by someone else (LSM). - cgroup_no_v1 boot param can now block named cgroup1 hierarchies too. * 'for-4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: Add named hierarchy disabling to cgroup_no_v1 boot param cgroup: fix parsing empty mount option string cpuset: Remove set but not used variable 'cs' cgroup: fix CSS_TASK_ITER_PROCS cgroup: Add .__DEBUG__. prefix to debug file names cpuset: Minor cgroup2 interface updates cpuset: Expose cpuset.cpus.subpartitions with cgroup_debug cpuset: Add documentation about the new "cpuset.sched.partition" flag cpuset: Use descriptive text when reading/writing cpuset.sched.partition cpuset: Expose cpus.effective and mems.effective on cgroup v2 root cpuset: Make generate_sched_domains() work with partition cpuset: Make CPU hotplug work with partition cpuset: Track cpusets that use parent's effective_cpus cpuset: Add an error state to cpuset.sched.partition cpuset: Add new v2 cpuset.sched.partition flag cpuset: Simply allocation and freeing of cpumasks cpuset: Define data structures to support scheduling partition cpuset: Enable cpuset controller in default hierarchy cgroup: remove unnecessary unlikely() |
||
Linus Torvalds
|
b07039b79c |
Driver core patches for 4.21-rc1
Here is the "big" set of driver core patches for 4.21-rc1. It's not really big, just a number of small changes for some reported issues, some documentation updates to hopefully make it harder for people to abuse the driver model, and some other minor cleanups. All of these have been in linux-next for a while with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXCY/dA8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ylZrgCeIi+rWj0mqlyKZk0A+gurH2BPmfwAniGfiHJp w60Fr5/EbCqUr1d1wQIO =4N7R -----END PGP SIGNATURE----- Merge tag 'driver-core-4.21-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the "big" set of driver core patches for 4.21-rc1. It's not really big, just a number of small changes for some reported issues, some documentation updates to hopefully make it harder for people to abuse the driver model, and some other minor cleanups. All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-4.21-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: mm, memory_hotplug: update a comment in unregister_memory() component: convert to DEFINE_SHOW_ATTRIBUTE sysfs: Disable lockdep for driver bind/unbind files driver core: Add missing dev->bus->need_parent_lock checks kobject: return error code if writing /sys/.../uevent fails driver core: Move async_synchronize_full call driver core: platform: Respect return code of platform_device_register_full() kref/kobject: Improve documentation drivers/base/memory.c: Use DEVICE_ATTR_RO and friends driver core: Replace simple_strto{l,ul} by kstrtou{l,ul} kernfs: Improve kernfs_notify() poll notification latency kobject: Fix warnings in lib/kobject_uevent.c kobject: drop unnecessary cast "%llu" for u64 driver core: fix comments for device_block_probing() driver core: Replace simple_strtol by kstrtoint |
||
Linus Torvalds
|
f346b0becb |
Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton: - large KASAN update to use arm's "software tag-based mode" - a few misc things - sh updates - ocfs2 updates - just about all of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (167 commits) kernel/fork.c: mark 'stack_vm_area' with __maybe_unused memcg, oom: notify on oom killer invocation from the charge path mm, swap: fix swapoff with KSM pages include/linux/gfp.h: fix typo mm/hmm: fix memremap.h, move dev_page_fault_t callback to hmm hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization memory_hotplug: add missing newlines to debugging output mm: remove __hugepage_set_anon_rmap() include/linux/vmstat.h: remove unused page state adjustment macro mm/page_alloc.c: allow error injection mm: migrate: drop unused argument of migrate_page_move_mapping() blkdev: avoid migration stalls for blkdev pages mm: migrate: provide buffer_migrate_page_norefs() mm: migrate: move migrate_page_lock_buffers() mm: migrate: lock buffers before migrate_page_move_mapping() mm: migration: factor out code to compute expected number of page references mm, page_alloc: enable pcpu_drain with zone capability kmemleak: add config to select auto scan mm/page_alloc.c: don't call kasan_free_pages() at deferred mem init ... |
||
Linus Torvalds
|
af7ddd8a62 |
DMA mapping updates for Linux 4.21
A huge update this time, but a lot of that is just consolidating or removing code: - provide a common DMA_MAPPING_ERROR definition and avoid indirect calls for dma_map_* error checking - use direct calls for the DMA direct mapping case, avoiding huge retpoline overhead for high performance workloads - merge the swiotlb dma_map_ops into dma-direct - provide a generic remapping DMA consistent allocator for architectures that have devices that perform DMA that is not cache coherent. Based on the existing arm64 implementation and also used for csky now. - improve the dma-debug infrastructure, including dynamic allocation of entries (Robin Murphy) - default to providing chaining scatterlist everywhere, with opt-outs for the few architectures (alpha, parisc, most arm32 variants) that can't cope with it - misc sparc32 dma-related cleanups - remove the dma_mark_clean arch hook used by swiotlb on ia64 and replace it with the generic noncoherent infrastructure - fix the return type of dma_set_max_seg_size (Niklas Söderlund) - move the dummy dma ops for not DMA capable devices from arm64 to common code (Robin Murphy) - ensure dma_alloc_coherent returns zeroed memory to avoid kernel data leaks through userspace. We already did this for most common architectures, but this ensures we do it everywhere. dma_zalloc_coherent has been deprecated and can hopefully be removed after -rc1 with a coccinelle script. -----BEGIN PGP SIGNATURE----- iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAlwctQgLHGhjaEBsc3Qu ZGUACgkQD55TZVIEUYMxgQ//dBpAfS4/J76CdAbYry2zqgcOUU9hIrD6NHiEMWov ltJxyvEl3LsUmIdEj3aCrYL9jZN0qsnCzn5BVj2c3jDIVgD64fAr7HDf/PbEEfKb j6/GgEnVLPZV+sQMvhNA5jOzHrkseaqPa4/pNLFZ/l8jnuZ2d+btusDWJpMoVDer TXVwtIfgeIu0gTygYOShLYXd5qptWKWsZEpbTZOO2sE6+x+ZJX7yQYUxYDTlcOIj JWVO2l5QNHPc5T9o2at+6L5aNUvnZOxT79sWgyZLn0Kc+FagKAVwfLqUEl0v7foG 8k/xca5/8p3afB1DfrIrtplJqis7cVgdyGxriwuuoO8X4F0nPyWwpGmxsBhrWwwl xTqC4UorEJ7QwoP6Azopk/vYI2QXIUBLjuCJCuFXZj9+2BGf4IfvBY1S2cLM9qLs HMcxQonuXJii044KEFS96ePEuiT+igVINweIFBKWcgNCEG0UQtyL6RQ1U5297ipF JiWZAqD+p9X52UdKS+oKfAiZEekMXn6Xyo97+YCiNpfOo0GP5eEcwhL+JpY4AiRq apPXtsRy2o1s8yfjdraUIM2Mc2n62vFKb35oUbGCd/QO9piPrFQHl6T0HHcHk4YR XrUXcHieFZBCYqh7ZVa4RL8Msq1wvGuTL4Dxl43mXdsMoUFRR6eSNWLoAV4IpOLZ WgA= =in72 -----END PGP SIGNATURE----- Merge tag 'dma-mapping-4.21' of git://git.infradead.org/users/hch/dma-mapping Pull DMA mapping updates from Christoph Hellwig: "A huge update this time, but a lot of that is just consolidating or removing code: - provide a common DMA_MAPPING_ERROR definition and avoid indirect calls for dma_map_* error checking - use direct calls for the DMA direct mapping case, avoiding huge retpoline overhead for high performance workloads - merge the swiotlb dma_map_ops into dma-direct - provide a generic remapping DMA consistent allocator for architectures that have devices that perform DMA that is not cache coherent. Based on the existing arm64 implementation and also used for csky now. - improve the dma-debug infrastructure, including dynamic allocation of entries (Robin Murphy) - default to providing chaining scatterlist everywhere, with opt-outs for the few architectures (alpha, parisc, most arm32 variants) that can't cope with it - misc sparc32 dma-related cleanups - remove the dma_mark_clean arch hook used by swiotlb on ia64 and replace it with the generic noncoherent infrastructure - fix the return type of dma_set_max_seg_size (Niklas Söderlund) - move the dummy dma ops for not DMA capable devices from arm64 to common code (Robin Murphy) - ensure dma_alloc_coherent returns zeroed memory to avoid kernel data leaks through userspace. We already did this for most common architectures, but this ensures we do it everywhere. dma_zalloc_coherent has been deprecated and can hopefully be removed after -rc1 with a coccinelle script" * tag 'dma-mapping-4.21' of git://git.infradead.org/users/hch/dma-mapping: (73 commits) dma-mapping: fix inverted logic in dma_supported dma-mapping: deprecate dma_zalloc_coherent dma-mapping: zero memory returned from dma_alloc_* sparc/iommu: fix ->map_sg return value sparc/io-unit: fix ->map_sg return value arm64: default to the direct mapping in get_arch_dma_ops PCI: Remove unused attr variable in pci_dma_configure ia64: only select ARCH_HAS_DMA_COHERENT_TO_PFN if swiotlb is enabled dma-mapping: bypass indirect calls for dma-direct vmd: use the proper dma_* APIs instead of direct methods calls dma-direct: merge swiotlb_dma_ops into the dma_direct code dma-direct: use dma_direct_map_page to implement dma_direct_map_sg dma-direct: improve addressability error reporting swiotlb: remove dma_mark_clean swiotlb: remove SWIOTLB_MAP_ERROR ACPI / scan: Refactor _CCA enforcement dma-mapping: factor out dummy DMA ops dma-mapping: always build the direct mapping code dma-mapping: move dma_cache_sync out of line dma-mapping: move various slow path functions out of line ... |
||
Linus Torvalds
|
0e9da3fbf7 |
for-4.21/block-20181221
-----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlwb7R8QHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpjiID/97oDjMhNT7rwpuMbHw855h62j1hEN/m+N3 FI0uxivYoYZLD+eJRnMcBwHlKjrCX8iJQAcv9ffI3ThtFW7dnZT3atUacaZVR/Dt IrxdymdBP3qsmuaId5NYBug7rJ+AiqFJKjEvCcSPu5X397J4I3SEbzhfvYLJ/aZX 16o0HJlVVIrcbmq1IP4HwiIIOaKXvPaw04L4z4fpeynRSWG7EAi8NLSnhlR4Rxbb BTiMkCTsjRCFdyO6da4fvNQKWmPGPa3bJkYy3qR99cvJCeIbQjRyCloQlWNJRRgi 3eJpCHVxqFmN0/+DNTJVQEEr4H8o0AVucrLVct1Jc4pessenkpoUniP8vELqwlng Z2VHLkhTfCEmvFlk82grrYdNvGATRsrbswt/PlP4T7rBfr1IpDk8kXDWF59EL2dy ly35Sk3wJGHBl8qa+vEPXOAnaWdqJXuVGpwB4ifOIatOls8mOxwfZjiRc7x05/fC 1O4rR2IfLwRqwoYHs0AJ+h6ohOSn1mkGezl2Tch1VSFcJUOHmuYvraTaUi6hblpA SslaAoEhO39hRBL0HsvsMeqVWM9uzqvFkLDCfNPdiA81H1258CIbo4vF8z6czCIS eeXnTJxVhPVbZgb3a1a93SPwM6KIDZFoIijyd+NqjpU94thlnhYD0QEcKJIKH7os 2p4aHs6ktw== =TRdW -----END PGP SIGNATURE----- Merge tag 'for-4.21/block-20181221' of git://git.kernel.dk/linux-block Pull block updates from Jens Axboe: "This is the main pull request for block/storage for 4.21. Larger than usual, it was a busy round with lots of goodies queued up. Most notable is the removal of the old IO stack, which has been a long time coming. No new features for a while, everything coming in this week has all been fixes for things that were previously merged. This contains: - Use atomic counters instead of semaphores for mtip32xx (Arnd) - Cleanup of the mtip32xx request setup (Christoph) - Fix for circular locking dependency in loop (Jan, Tetsuo) - bcache (Coly, Guoju, Shenghui) * Optimizations for writeback caching * Various fixes and improvements - nvme (Chaitanya, Christoph, Sagi, Jay, me, Keith) * host and target support for NVMe over TCP * Error log page support * Support for separate read/write/poll queues * Much improved polling * discard OOM fallback * Tracepoint improvements - lightnvm (Hans, Hua, Igor, Matias, Javier) * Igor added packed metadata to pblk. Now drives without metadata per LBA can be used as well. * Fix from Geert on uninitialized value on chunk metadata reads. * Fixes from Hans and Javier to pblk recovery and write path. * Fix from Hua Su to fix a race condition in the pblk recovery code. * Scan optimization added to pblk recovery from Zhoujie. * Small geometry cleanup from me. - Conversion of the last few drivers that used the legacy path to blk-mq (me) - Removal of legacy IO path in SCSI (me, Christoph) - Removal of legacy IO stack and schedulers (me) - Support for much better polling, now without interrupts at all. blk-mq adds support for multiple queue maps, which enables us to have a map per type. This in turn enables nvme to have separate completion queues for polling, which can then be interrupt-less. Also means we're ready for async polled IO, which is hopefully coming in the next release. - Killing of (now) unused block exports (Christoph) - Unification of the blk-rq-qos and blk-wbt wait handling (Josef) - Support for zoned testing with null_blk (Masato) - sx8 conversion to per-host tag sets (Christoph) - IO priority improvements (Damien) - mq-deadline zoned fix (Damien) - Ref count blkcg series (Dennis) - Lots of blk-mq improvements and speedups (me) - sbitmap scalability improvements (me) - Make core inflight IO accounting per-cpu (Mikulas) - Export timeout setting in sysfs (Weiping) - Cleanup the direct issue path (Jianchao) - Export blk-wbt internals in block debugfs for easier debugging (Ming) - Lots of other fixes and improvements" * tag 'for-4.21/block-20181221' of git://git.kernel.dk/linux-block: (364 commits) kyber: use sbitmap add_wait_queue/list_del wait helpers sbitmap: add helpers for add/del wait queue handling block: save irq state in blkg_lookup_create() dm: don't reuse bio for flushes nvme-pci: trace SQ status on completions nvme-rdma: implement polling queue map nvme-fabrics: allow user to pass in nr_poll_queues nvme-fabrics: allow nvmf_connect_io_queue to poll nvme-core: optionally poll sync commands block: make request_to_qc_t public nvme-tcp: fix spelling mistake "attepmpt" -> "attempt" nvme-tcp: fix endianess annotations nvmet-tcp: fix endianess annotations nvme-pci: refactor nvme_poll_irqdisable to make sparse happy nvme-pci: only set nr_maps to 2 if poll queues are supported nvmet: use a macro for default error location nvmet: fix comparison of a u16 with -1 blk-mq: enable IO poll if .nr_queues of type poll > 0 blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight() blk-mq: skip zero-queue maps in blk_mq_map_swqueue ... |
||
Linus Torvalds
|
b12a9124ee |
y2038: more syscalls and cleanups
This concludes the main part of the system call rework for 64-bit time_t, which has spread over most of year 2018, the last six system calls being - ppoll - pselect6 - io_pgetevents - recvmmsg - futex - rt_sigtimedwait As before, nothing changes for 64-bit architectures, while 32-bit architectures gain another entry point that differs only in the layout of the timespec structure. Hopefully in the next release we can wire up all 22 of those system calls on all 32-bit architectures, which gives us a baseline version for glibc to start using them. This does not include the clock_adjtime, getrusage/waitid, and getitimer/setitimer system calls. I still plan to have new versions of those as well, but they are not required for correct operation of the C library since they can be emulated using the old 32-bit time_t based system calls. Aside from the system calls, there are also a few cleanups here, removing old kernel internal interfaces that have become unused after all references got removed. The arch/sh cleanups are part of this, there were posted several times over the past year without a reaction from the maintainers, while the corresponding changes made it into all other architectures. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJcHCCRAAoJEGCrR//JCVInkqsP/3TuLgSyQwolFRXcoBOjR1Ar JoX33GuDlAxHSqPadButVfflmRIWvL3aNMFFwcQM4uYgQ593FoHbmnusCdFgHcQ7 Q13pGo7szbfEFxydhnDMVust/hxd5C9Y5zNSJ+eMLGLLJXosEyjd9YjRoHDROWal oDLqpPCArlLN1B1XFhjH8J847+JgS+hUrAfk3AOU0B2TuuFkBnRImlCGCR5JcgPh XIpHRBOgEMP4kZ3LjztPfS3v/XJeGrguRcbD3FsPKdPeYO9QRUiw0vahEQRr7qXL 9hOgDq1YHPUQeUFhy3hJPCZdsDFzWoIE7ziNkZCZvGBw+qSw9i8KChGUt6PcSNlJ nqKJY5Wneb4svu+kOdK7d8ONbTdlVYvWf5bj/sKoNUA4BVeIjNcDXplvr3cXiDzI e40CcSQ3oLEvrIxMcoyNPPG63b+FYG8nMaCOx4dB4pZN7sSvZUO9a1DbDBtzxMON xy5Kfk1n5gIHcfBJAya5CnMQ1Jm4FCCu/LHVanYvb/nXA/2jEegSm24Md17icE/Q VA5jJqIdICExor4VHMsG0lLQxBJsv/QqYfT2OCO6Oykh28mjFqf+X+9Ctz1w6KVG VUkY1u97x8jB0M4qolGO7ZGn6P1h0TpNVFD1zDNcDt2xI63cmuhgKWiV2pv5b7No ty6insmmbJWt3tOOPyfb =yIAT -----END PGP SIGNATURE----- Merge tag 'y2038-for-4.21' of ssh://gitolite.kernel.org:/pub/scm/linux/kernel/git/arnd/playground Pull y2038 updates from Arnd Bergmann: "More syscalls and cleanups This concludes the main part of the system call rework for 64-bit time_t, which has spread over most of year 2018, the last six system calls being - ppoll - pselect6 - io_pgetevents - recvmmsg - futex - rt_sigtimedwait As before, nothing changes for 64-bit architectures, while 32-bit architectures gain another entry point that differs only in the layout of the timespec structure. Hopefully in the next release we can wire up all 22 of those system calls on all 32-bit architectures, which gives us a baseline version for glibc to start using them. This does not include the clock_adjtime, getrusage/waitid, and getitimer/setitimer system calls. I still plan to have new versions of those as well, but they are not required for correct operation of the C library since they can be emulated using the old 32-bit time_t based system calls. Aside from the system calls, there are also a few cleanups here, removing old kernel internal interfaces that have become unused after all references got removed. The arch/sh cleanups are part of this, there were posted several times over the past year without a reaction from the maintainers, while the corresponding changes made it into all other architectures" * tag 'y2038-for-4.21' of ssh://gitolite.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: timekeeping: remove obsolete time accessors vfs: replace current_kernel_time64 with ktime equivalent timekeeping: remove timespec_add/timespec_del timekeeping: remove unused {read,update}_persistent_clock sh: remove board_time_init() callback sh: remove unused rtc_sh_get/set_time infrastructure sh: sh03: rtc: push down rtc class ops into driver sh: dreamcast: rtc: push down rtc class ops into driver y2038: signal: Add compat_sys_rt_sigtimedwait_time64 y2038: signal: Add sys_rt_sigtimedwait_time32 y2038: socket: Add compat_sys_recvmmsg_time64 y2038: futex: Add support for __kernel_timespec y2038: futex: Move compat implementation into futex.c io_pgetevents: use __kernel_timespec pselect6: use __kernel_timespec ppoll: use __kernel_timespec signal: Add restore_user_sigmask() signal: Add set_user_sigmask() |
||
Matthew Wilcox
|
1a80dade01 |
Fix failure path in alloc_pid()
The failure path removes the allocated PIDs from the wrong namespace.
This could lead to us inadvertently reusing PIDs in the leaf namespace
and leaking PIDs in parent namespaces.
Fixes:
|
||
YueHaibing
|
0f4991e8fd |
kernel/fork.c: mark 'stack_vm_area' with __maybe_unused
Fixes gcc '-Wunused-but-set-variable' warning when CONFIG_VMAP_STACK is not set: kernel/fork.c: In function 'dup_task_struct': kernel/fork.c:843:20: warning: variable 'stack_vm_area' set but not used [-Wunused-but-set-variable] Link: http://lkml.kernel.org/r/1545965190-2381-1-git-send-email-yuehaibing@huawei.com Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |