linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-05 02:36:53 +07:00

Author	SHA1	Message	Date
Arnaldo Carvalho de Melo	fad2918ed5	perf report: Move logic to warn about kptr_restrict'ed kernels to separate function It's too big, better have a separate function for it so that the main logic gets shorter/clearer. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-ahh6vfzyh8fsygjwrsbroeu0@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2014-01-13 10:06:25 -03:00
Cody P Schafer	88aca8d966	tools perf: Comment typo fix s/temr/term/ Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1389199434-21761-1-git-send-email-cody@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2014-01-13 10:06:24 -03:00
Andi Kleen	8f3dd2b096	perf stat: Fix --delay option in man page The --delay option was documented as --initial-delay in the manpage. Fix this. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1389132847-31982-1-git-send-email-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2014-01-13 10:06:24 -03:00
Jiri Olsa	a18382b68f	perf tools: Make perf_event__synthesize_mmap_events global Making perf_event__synthesize_mmap_events global, it will be used in following patch from test code. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jean Pihet <jean.pihet@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1389098853-14466-4-git-send-email-jolsa@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2014-01-13 10:06:24 -03:00
Jiri Olsa	14bd6d20fe	perf machine: Fix id_hdr_size initialization The id_hdr_size field was not properly initialized, set it to zero, as the machine struct may have come from some non zeroing allocation routine or from the stack without any field being initialized. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jean Pihet <jean.pihet@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1389098853-14466-3-git-send-email-jolsa@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2014-01-13 10:06:24 -03:00
Jiri Olsa	c4eb6c0e7a	perf tools: Automate setup of FEATURE_CHECK_(C|LD)FLAGS-all variables Instead of explicitly adding the same value into FEATURE_CHECK_(C|LD)FLAGS-all variables we can do that automatically. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jean Pihet <jean.pihet@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1389098853-14466-2-git-send-email-jolsa@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2014-01-13 10:06:24 -03:00
Arnaldo Carvalho de Melo	98eafce6bd	perf trace: Pack 'struct trace' Initial struct stats: /* size: 368, cachelines: 6, members: 24 */ /* sum members: 353, holes: 3, sum holes: 15 */ /* last cacheline: 48 bytes */ After reorg: [acme@ssdandy linux]$ pahole -C trace ~/bin/trace | tail -4 /* size: 360, cachelines: 6, members: 24 */ /* padding: 7 */ /* last cacheline: 40 bytes */ }; [acme@ssdandy linux]$ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-6jimc80yu89qkx6zb8465s6t@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2014-01-13 10:06:23 -03:00
Arnaldo Carvalho de Melo	3ba4d2e1a8	perf header: Pack 'struct perf_session_env' Initial struct: [acme@ssdandy linux]$ pahole -C perf_session_env ~/bin/perf struct perf_session_env { char * hostname; /* 0 8 */ char * os_release; /* 8 8 */ char * version; /* 16 8 */ char * arch; /* 24 8 */ int nr_cpus_online; /* 32 4 */ int nr_cpus_avail; /* 36 4 */ char * cpu_desc; /* 40 8 */ char * cpuid; /* 48 8 */ long long unsigned int total_mem; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ int nr_cmdline; /* 64 4 */ /* XXX 4 bytes hole, try to pack */ char * cmdline; /* 72 8 */ int nr_sibling_cores; /* 80 4 */ /* XXX 4 bytes hole, try to pack */ char * sibling_cores; /* 88 8 */ int nr_sibling_threads; /* 96 4 */ /* XXX 4 bytes hole, try to pack */ char * sibling_threads; /* 104 8 */ int nr_numa_nodes; /* 112 4 */ /* XXX 4 bytes hole, try to pack */ char * numa_nodes; /* 120 8 */ /* --- cacheline 2 boundary (128 bytes) --- */ int nr_pmu_mappings; /* 128 4 */ /* XXX 4 bytes hole, try to pack */ char * pmu_mappings; /* 136 8 */ int nr_groups; /* 144 4 */ /* size: 152, cachelines: 3, members: 20 */ /* sum members: 128, holes: 5, sum holes: 20 */ /* padding: 4 */ /* last cacheline: 24 bytes */ }; [acme@ssdandy linux]$ [acme@ssdandy linux]$ pahole -C perf_session_env --reorganize --show_reorg_steps ~/bin/perf | grep ^/ | grep -v Final /* Moving 'nr_sibling_cores' from after 'cmdline' to after 'nr_cmdline' */ /* Moving 'nr_numa_nodes' from after 'sibling_threads' to after 'nr_sibling_threads' */ /* Moving 'nr_groups' from after 'pmu_mappings' to after 'nr_pmu_mappings' */ [acme@ssdandy linux]$ Final struct stats: [acme@ssdandy linux]$ pahole -C perf_session_env --reorganize --show_reorg_steps ~/bin/perf | tail -4 /* --- cacheline 2 boundary (128 bytes) --- */ /* size: 128, cachelines: 2, members: 20 */ }; /* saved 24 bytes and 1 cacheline! */ [acme@ssdandy linux]$ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-3d9tshamloinzxcqeb7mtd1n@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2014-01-13 10:06:23 -03:00
Jiri Olsa	9bb8e5edcf	tools lib traceevent: Shut up plugins make message Getting rid of following build output: $ make O=/tmp/build/perf -C tools/perf/ install-bin ... make[3]: Nothing to be done for `plugins'. make[2]: Nothing to be done for `plugins'. ... which triggers when traceevent library needs to be rebuilt, but we have plugins built already. Adding extra 'plugins' target with nop which is visible and triggers in both Makefile parts (for detached output directory (O=...) the traceevent Makefile spawns sub make for the build itself). Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1388595050-23005-2-git-send-email-jolsa@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2014-01-13 10:06:23 -03:00
Jiri Olsa	198430b56d	tools lib traceevent: Replace tabs with spaces for all non-commands statements The tabbed indentation in non-commands statements could be sometimes considered as follow up for the rule command in the Makefile. This error is hard to find, so as a precaution replacing tabs with spaces for all non-commands statements. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://marc.info/?t=136484403900003&r=1&w=2 Link: http://lkml.kernel.org/r/20140102095304.GA1196@krava.brq.redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2014-01-13 10:06:23 -03:00
Jiri Olsa	f7c6447424	perf tests: Fix installation tests path setup Currently installation tests work only over x86_64, adding arch check to make it work over i386 as well. NOTE looks like x86 is the only arch running tests, we need some IS_(32/64) flag to make this generic. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1388759553-12974-2-git-send-email-jolsa@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2014-01-13 10:06:23 -03:00
Jiri Olsa	a6cf5f3923	perf tools: Move arch setup into separate Makefile I need to use arch related setup in the tests/make, so moving arch setup into Makefile.arch. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1388759553-12974-1-git-send-email-jolsa@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2014-01-13 10:06:22 -03:00
Arnaldo Carvalho de Melo	41cde47675	perf stat: Remove misplaced __maybe_unused That 'argc' argument _is_ being used. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-t2gsxc15zulkorieg8zq996o@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2014-01-13 10:06:22 -03:00
Arnaldo Carvalho de Melo	2d4352c077	perf tests: Fixup leak on error path in parse events test We need to call the evlist destructor when failing to parse events. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-ilslu69s7v7bpvdgqtrlp8f5@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2014-01-13 10:06:22 -03:00
Arnaldo Carvalho de Melo	983874d173	perf evlist: Auto unmap on destructor Removing further boilerplate after making sure perf_evlist__munmap can be called multiple times for the same evlist. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-o0luenuld4abupm4nmrgzm6f@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2014-01-13 10:06:22 -03:00
Arnaldo Carvalho de Melo	f26e1c7cb2	perf evlist: Close fds on destructor Since it is safe to call perf_evlist__close() multiple times, autoclose it and remove the calls to the close from existing tools, reducing the tooling boilerplate. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-2kq9v7p1rude1tqxa0aue2tk@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2014-01-13 10:06:22 -03:00
Arnaldo Carvalho de Melo	03ad9747c5	perf evlist: Move destruction of maps to evlist destructor Instead of requiring tools to do an extra destructor call just before calling perf_evlist__delete. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-0jd2ptzyikxb5wp7inzz2ah2@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2014-01-13 10:06:21 -03:00
Arnaldo Carvalho de Melo	3e2be2da8f	perf record: Remove old evsel_list usage To be consistent with other places, use just 'evlist' for the evsel list variable, and since we have it in 'struct record', use it directly from there. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-396bnfvmlxrsj3o2tk47b8t1@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2014-01-13 10:06:21 -03:00
Arnaldo Carvalho de Melo	735f7e0bbe	perf evlist: Move the SIGUSR1 error reporting logic to prepare_workload So that we have the boilerplate in the preparation method, instead of open coded in tools wanting the reporting when the exec fails. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-purbdzcphdveskh7wwmnm4t7@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2014-01-13 10:06:21 -03:00
Arnaldo Carvalho de Melo	f33cbe72e6	perf evlist: Send the errno in the signal when workload fails When a tool uses perf_evlist__start_workload and the supplied workload fails (e.g.: its binary wasn't found), perror was being used to print the error reason. This is undesirable, as the caller may be a GUI, when it wants to have total control of the error reporting process. So move to using sigaction(SA_SIGINFO) + siginfo_t->sa_value->sival_int to communicate to the caller the errno and let it print it using the UI of its choosing. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-epgcv7kjq8ll2udqfken92pz@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2014-01-13 10:06:21 -03:00
Arnaldo Carvalho de Melo	6af206fd91	perf stat: Don't show counter information when workload fails When starting a workload 'stat' wasn't using prepare_workload evlist method's signal based exec() error reporting mechanism. Use it so that we don't report 'not counted' counters. Before: [acme@zoo linux]$ perf stat dfadsfa dfadsfa: No such file or directory Performance counter stats for 'dfadsfa': <not counted> task-clock <not counted> context-switches <not counted> cpu-migrations <not counted> page-faults <not counted> cycles <not counted> stalled-cycles-frontend <not supported> stalled-cycles-backend <not counted> instructions <not counted> branches <not counted> branch-misses 0.001831462 seconds time elapsed [acme@zoo linux]$ After: [acme@zoo linux]$ perf stat dfadsfa dfadsfa: No such file or directory [acme@zoo linux]$ Reported-by: David Ahern <dsahern@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-5yui3bv7e3hitxucnjsn6z8q@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2014-01-13 10:06:21 -03:00
Davidlohr Bueso	b0c29f79ec	futexes: Avoid taking the hb->lock if there's nothing to wake up In futex_wake() there is clearly no point in taking the hb->lock if we know beforehand that there are no tasks to be woken. While the hash bucket's plist head is a cheap way of knowing this, we cannot rely 100% on it as there is a racy window between the futex_wait call and when the task is actually added to the plist. To this end, we couple it with the spinlock check as tasks trying to enter the critical region are most likely potential waiters that will be added to the plist, thus preventing tasks sleeping forever if wakers don't acknowledge all possible waiters. Furthermore, the futex ordering guarantees are preserved, ensuring that waiters either observe the changed user space value before blocking or are woken by a concurrent waker. For wakers, this is done by relying on the barriers in get_futex_key_refs() -- for archs that do not have implicit mb in atomic_inc(), we explicitly add them through a new futex_get_mm function. For waiters we rely on the fact that spin_lock calls already update the head counter, so spinners are visible even if the lock hasn't been acquired yet. For more details please refer to the updated comments in the code and related discussion: https://lkml.org/lkml/2013/11/26/556 Special thanks to tglx for careful review and feedback. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Darren Hart <dvhart@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Scott Norton <scott.norton@hp.com> Cc: Tom Vaden <tom.vaden@hp.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Waiman Long <Waiman.Long@hp.com> Cc: Jason Low <jason.low2@hp.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1389569486-25487-5-git-send-email-davidlohr@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 11:45:21 +01:00
Thomas Gleixner	99b60ce697	futexes: Document multiprocessor ordering guarantees That's essential, if you want to hack on futexes. Reviewed-by: Darren Hart <dvhart@linux.intel.com> Reviewed-by: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Scott Norton <scott.norton@hp.com> Cc: Tom Vaden <tom.vaden@hp.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Waiman Long <Waiman.Long@hp.com> Cc: Jason Low <jason.low2@hp.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1389569486-25487-4-git-send-email-davidlohr@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 11:45:19 +01:00
Davidlohr Bueso	a52b89ebb6	futexes: Increase hash table size for better performance Currently, the futex global hash table suffers from its fixed, smallish (for today's standards) size of 256 entries, as well as its lack of NUMA awareness. Large systems, using many futexes, can be prone to high amounts of collisions; where these futexes hash to the same bucket and lead to extra contention on the same hb->lock. Furthermore, cacheline bouncing is a reality when we have multiple hb->locks residing on the same cacheline and different futexes hash to adjacent buckets. This patch keeps the current static size of 16 entries for small systems, or otherwise, 256 * ncpus (or larger as we need to round the number to a power of 2). Note that this number of CPUs accounts for all CPUs that can ever be available in the system, taking into consideration things like hotplugging. While we do impose extra overhead at bootup by making the hash table larger, this is a one time thing, and does not shadow the benefits of this patch. Furthermore, as suggested by tglx, by cache aligning the hash buckets we can avoid access across cacheline boundaries and also avoid massive cache line bouncing if multiple cpus are hammering away at different hash buckets which happen to reside in the same cache line. Also, similar to other core kernel components (pid, dcache, tcp), by using alloc_large_system_hash() we benefit from its NUMA awareness and thus the table is distributed among the nodes instead of in a single one. For a custom microbenchmark that pounds on the uaddr hashing -- making the wait path fail at futex_wait_setup() returning -EWOULDBLOCK for large amounts of futexes, we can see the following benefits on a 80-core, 8-socket 1Tb server:
+---------+--------------------+------------------------+-----------------------+-------------------------------+
| threads | baseline (ops/sec) | aligned-only (ops/sec) | large table (ops/sec) | large table+aligned (ops/sec) |
+---------+--------------------+------------------------+-----------------------+-------------------------------+
|     512 |              32426 | 50531 (+55.8%)         | 255274 (+687.2%)      | 292553 (+802.2%)              |
|     256 |              65360 | 99588 (+52.3%)         | 443563 (+578.6%)      | 508088 (+677.3%)              |
|     128 |             125635 | 200075 (+59.2%)        | 742613 (+491.1%)      | 835452 (+564.9%)              |
|      80 |             193559 | 323425 (+67.1%)        | 1028147 (+431.1%)     | 1130304 (+483.9%)             |
|      64 |             247667 | 443740 (+79.1%)        | 997300 (+302.6%)      | 1145494 (+362.5%)             |
|      32 |             628412 | 721401 (+14.7%)        | 965996 (+53.7%)       | 1122115 (+78.5%)              |
+---------+--------------------+------------------------+-----------------------+-------------------------------+
Reviewed-by: Darren Hart <dvhart@linux.intel.com> Reviewed-by: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Waiman Long <Waiman.Long@hp.com> Reviewed-and-tested-by: Jason Low <jason.low2@hp.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Scott Norton <scott.norton@hp.com> Cc: Tom Vaden <tom.vaden@hp.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Link: http://lkml.kernel.org/r/1389569486-25487-3-git-send-email-davidlohr@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 11:45:18 +01:00
Jason Low	0d00c7b20c	futexes: Clean up various details - Remove unnecessary head variables. - Delete unused parameter in queue_unlock(). Reviewed-by: Darren Hart <dvhart@linux.intel.com> Reviewed-by: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jason Low <jason.low2@hp.com> Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Scott Norton <scott.norton@hp.com> Cc: Tom Vaden <tom.vaden@hp.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Waiman Long <Waiman.Long@hp.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1389569486-25487-2-git-send-email-davidlohr@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 11:45:17 +01:00
Ingo Molnar	1c62448e39	Linux 3.13-rc8 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJS0miqAAoJEHm+PkMAQRiGbfgIAJSWEfo8ludknhPcHJabBtxu 75SQAKJlL3sBVnxEc58Rtt8gsKYQIrm4IY5Slunklsn04RxuDUIQMgFoAYR5gQwz +Myqkw/HOqDe5VStGxtLYpWnfglxVwGDCd7ISfL9AOVy5adMWBxh4Tv+qqQc7aIZ eF7dy+DD+C6Q3Z5OoV8s0FZDxse29vOf17Nki7+7t8WMqyegYwjoOqNeqocGKsPi eHLrJgTl4T6jB4l9LKKC154DSKjKOTSwZMWgwK8mToyNLT/ufCiKgXloIjEvZZcY VVKUtncdHiTf+iqVojgpGBzOEeB5DM83iiapFeDiJg8C9yBzvT8lBtA9aPb5Wgw= =lEeV -----END PGP SIGNATURE----- Merge tag 'v3.13-rc8' into core/locking Refresh the tree with the latest fixes, before applying new changes. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 11:44:41 +01:00
Michael Schmitz	a0b7b24226	m68k/irq - Use polled IRQ flag for MFP timer cascaded interrupts Some Atari hardware has no capacity to raise interrupts (e.g. network or USB adapter hardware attached via ROM port). The driver interrupt routine is called from a timer interrupt (timer D) in these cases, using chained device specific pseudo interrupts (IRQ_MFP_TIMER1 ff.) These interrupts will more often than not, return IRQ_NONE as there is not always work for the device handler when called. Too many unhandled interrupts will result in the interrupt being disabled by the stuck interrupt watchdog. As preferred option to flag interrupts as needing exclusion from the watchdog mechanism, tglx added the IRQ_IS_POLLED flag for use in such a case. Currently, two interrupts need to use this flag. Add more users as needed. Signed-off-by: Michael Schmitz <schmitz@debian.org> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>	2014-01-13 09:29:10 +01:00
Linus Torvalds	a6da83f982	Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc fix from Ben Herrenschmidt: "Here's one regression fix for 3.13 that I would appreciate if you could still pull in. It was an "interesting" one to debug, basically it's an old bug that got somewhat "exposed" by new code breaking the boot on PA Semi boards (yes, it does appear that some people are still using these!)" * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: powerpc: Check return value of instance-to-package OF call	2014-01-13 10:59:05 +07:00
Linus Torvalds	061f49ec2d	Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: "Sorry, meant to push out this batch earlier this weekend" * 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, fpu, amd: Clear exceptions in AMD FXSAVE workaround ftrace/x86: Load ftrace_ops in parameter not the variable holding it	2014-01-13 07:28:49 +07:00
Benjamin Herrenschmidt	10348f5976	powerpc: Check return value of instance-to-package OF call On PA-Semi firmware, the instance-to-package callback doesn't seem to be implemented. We didn't check for error, however, thus subsequently passed the -1 value returned into stdout_node to things like prom_getprop etc... This caused the firmware to load values around 0 (physical) internally as node structures. It somewhat "worked" as long as we had a NULL in the right place (address 8) at the beginning of the kernel, we didn't "see" the bug. But commit `5c0484e25e` "powerpc: Endian safe trampoline" changed the kernel entry point causing that old bug to now cause a crash early during boot. This fixes booting on PA-Semi board by properly checking the return value from instance-to-package. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Tested-by: Olof Johansson <olof@lixom.net>	2014-01-13 09:49:17 +11:00
Ingo Molnar	1341f3e4c0	User visible changes: Improvements: . Support showing source code, asking for variables to be collected at probe time and other 'perf probe' operations that use DWARF information. This supports only binaries with debugging information at this time, detached debuginfo (aka debuginfo packages) support should come in later patches. (Masami Hiramatsu) . Add a perf.data file header window in the 'perf report' TUI, associated with the 'i' hotkey, providing a counterpart to the --header option in the stdio UI. (Namhyung Kim) . Guest related improvements to 'perf kvm', including allowing to specify a directory with guest specific /proc information. (Dongsheng Yang) . Print session information only if --stdio is given (Namhyung Kim) Developer stuff: Fixes: . Get rid of a duplicate va_end() in error reporting (Namhyung Kim) . If a hist entry doesn't have symbol information, compare it with its address. Affects upcoming new feature (--cumulate) (Namhyung Kim) Improvements: . Make libtraceevent install target quieter (Jiri Olsa) . Make tests/make output more compact (Jiri Olsa) . Ignore generated files in feature-checks (Chunwei Chen) New APIs: . Introduce pevent_filter_strerror() in libtraceevent, similar in purpose to libc's strerror() function. (Namhyung Kim) Refactorings: . Use perf_data_file methods to write output file in 'record' and 'inject' (Jiri Olsa) . Use pr_*() functions where applicable in 'report' (Namhyung Kim) . Add 'machine' 'addr_location' struct to have full picture (machine, thread, map, symbol, addr) for a (partially) resolved address, reducing function signatures (Arnaldo Carvalho de Melo) . Reduce code duplication in the histogram entry creation/insertion. (Arnaldo Carvalho de Melo) . Auto allocate annotation histogram data structures, (Arnaldo Carvalho de Melo) . No need to test against NULL before calling free, also set freed memory in struct pointers to NULL, to help fixing use after free bugs.
(Arnaldo Carvalho de Melo> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.15 (GNU/Linux) iQIcBAABAgAGBQJSveX1AAoJENZQFvNTUqpAHsoP/R6raJVlZOQsOw/GxpWdazs/ 1Hca1YZ3JKKySrL15yPD19P2kp9B9TtF18zIdLtqymt2oWOz7r9uHP8KgUEdRXOn l055PlVFy8O2HWRJnK1By3tdtR9YzSZRGblX84mnXmrAGcogpA07jPD0oZtem+l0 9jC9szOcHRHXmlI1xgXEKBad9+P0Y+VXNQjzKQ2ZW5U44rISY0jpjMOmk//Rjwpz /mATGyzxFG8bNGt3Z4g/2MfJu4t6c6blilDyUFGvLCtUGdfQJ+f5uq3ayLeDWQxo iq8Lf3LBqAniTw14vh5TvfO2/Myz1QaJfcU/Y+rSv1+F/eDORCJuI2LaxU84xTCV euDtmlk/ro95QVJXPNgqYwLWcYv3cUu6Q82aPwiip5OwY76RezIBHNy23FpFIA+b BbVNS+BGUNqwFOTb20RzpH3af2BUog1wVShcBeeXRovDQLPyf+R31U3pKKNVMJP5 lU4YZM7eK6wjhVcQyxqftvay8XXzANwoyaKcJJWzTOJM8jUhZ3xFtDivL7tuXgYm SKTrEbNp89Ui7i14r+ABrPDHsS73eChK2/ylKfYT0I7VVx1JtA2KtIMEx+tmk3lh MWJNgW74X2DWZRRbnkJMvahXcY3t2cJ6lWantHoDz0QIlZnlkfAaPHSy/q7E6+B+ gZw0dnqM23vZLAheolDG =lh8y -----END PGP SIGNATURE----- Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf updates from Arnaldo Carvalho de Melo: User visible changes: Improvements: Support showing source code, asking for variables to be collected at probe time and other 'perf probe' operations that use DWARF information. This supports only binaries with debugging information at this time, detached debuginfo (aka debuginfo packages) support should come in later patches. (Masami Hiramatsu) * Add a perf.data file header window in the 'perf report' TUI, associated with the 'i' hotkey, providing a counterpart to the --header option in the stdio UI. (Namhyung Kim) * Guest related improvements to 'perf kvm', including allowing to specify a directory with guest specific /proc information. (Dongsheng Yang) * Print session information only if --stdio is given (Namhyung Kim) Developer stuff: Fixes: * Get rid of a duplicate va_end() in error reporting (Namhyung Kim) * If a hist entry doesn't have symbol information, compare it with its address. 
Affects upcoming new feature (--cumulate) (Namhyung Kim) Improvements: * Make libtraceevent install target quieter (Jiri Olsa) * Make tests/make output more compact (Jiri Olsa) * Ignore generated files in feature-checks (Chunwei Chen) New APIs: * Introduce pevent_filter_strerror() in libtraceevent, similar in purpose to libc's strerror() function. (Namhyung Kim) Refactorings: * Use perf_data_file methods to write output file in 'record' and 'inject' (Jiri Olsa) * Use pr_() functions where applicable in 'report' (Namhyung Kim) * Add 'machine' 'addr_location' struct to have full picture (machine, thread, map, symbol, addr) for a (partially) resolved address, reducing function signatures (Arnaldo Carvalho de Melo) * Reduce code duplication in the histogram entry creation/insertion. (Arnaldo Carvalho de Melo) * Auto allocate annotation histogram data structures, (Arnaldo Carvalho de Melo) * No need to test against NULL before calling free, also set freed memory in struct pointers to NULL, to help fixing use after free bugs. (Arnaldo Carvalho de Melo) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-12 17:39:47 +01:00
Taras Kondratiuk	b25f3e1c35	ARM: 7938/1: OMAP4/highbank: Flush L2 cache before disabling Kexec disables outer cache before jumping to reboot code, but it doesn't flush it explicitly. Flush is done implicitly inside of l2x0_disable(). But some SoCs override the default .disable handler and don't flush the cache. This may lead to corrupted memory during Kexec reboot on these platforms. This patch adds cache flush inside of OMAP4 and Highbank outer_cache.disable() handlers to make it consistent with default l2x0_disable(). Acked-by: Rob Herring <rob.herring@calxeda.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Taras Kondratiuk <taras.kondratiuk@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2014-01-12 14:15:27 +00:00
Prarit Bhargava	9345005f4e	x86/irq: Fix do_IRQ() interrupt warning for cpu hotplug retriggered irqs During heavy CPU-hotplug operations the following spurious kernel warnings can trigger: do_IRQ: No ... irq handler for vector (irq -1) [ See: https://bugzilla.kernel.org/show_bug.cgi?id=64831 ] When downing a cpu it is possible that there are unhandled irqs left in the APIC IRR register. The following code path shows how the problem can occur: 1. CPU 5 is to go down. 2. cpu_disable() on CPU 5 executes with interrupt flag cleared by local_irq_save() via stop_machine(). 3. IRQ 12 asserts on CPU 5, setting IRR but not ISR because interrupt flag is cleared (CPU unable to handle the irq) 4. IRQs are migrated off of CPU 5, and the vectors' irqs are set to -1. 5. stop_machine() finishes cpu_disable() 6. cpu_die() for CPU 5 executes in normal context. 7. CPU 5 attempts to handle IRQ 12 because the IRR is set for IRQ 12. The code attempts to find the vector's IRQ and cannot because it has been set to -1. 8. do_IRQ() warning displays warning about CPU 5 IRQ 12. I added a debug printk to output which CPU & vector was retriggered and discovered that we are getting bogus events. I see a 100% correlation between this debug printk in fixup_irqs() and the do_IRQ() warning. This patchset resolves this by adding definitions for VECTOR_UNDEFINED(-1) and VECTOR_RETRIGGERED(-2) and modifying the code to use them. Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=64831 Signed-off-by: Prarit Bhargava <prarit@redhat.com> Reviewed-by: Rui Wang <rui.y.wang@intel.com> Cc: Michel Lespinasse <walken@google.com> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Yang Zhang <yang.z.zhang@Intel.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: janet.morgan@Intel.com Cc: tony.luck@Intel.com Cc: ruiv.wang@gmail.com Link: http://lkml.kernel.org/r/1388938252-16627-1-git-send-email-prarit@redhat.com [ Cleaned up the code a bit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-12 13:13:02 +01:00
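The sentinel scheme from the x86/irq changelog above can be modelled in user space. This is only a sketch under stated assumptions: the two sentinel values come from the changelog, but the table size and the helpers `fixup_vector()` and `handle_vector()` are hypothetical stand-ins, not the kernel's `fixup_irqs()` or `do_IRQ()`, and the exact point at which the kernel resets a retriggered vector is assumed here.

```c
#include <stdbool.h>
#include <stdio.h>

/* Sentinel values named in the changelog. */
#define VECTOR_UNDEFINED   (-1)
#define VECTOR_RETRIGGERED (-2)

#define NR_VECTORS 16 /* hypothetical table size for this model */

static int vector_irq[NR_VECTORS];

/*
 * Hypothetical model of the CPU-down fixup: the irq is migrated away.
 * If the vector was still pending in the IRR it is marked as
 * retriggered instead of plain undefined.
 */
static void fixup_vector(int vector, bool pending_in_irr)
{
	vector_irq[vector] = pending_in_irr ? VECTOR_RETRIGGERED
					    : VECTOR_UNDEFINED;
}

/*
 * Hypothetical model of the interrupt entry path. Returns true when a
 * "No irq handler" warning would be printed.
 */
static bool handle_vector(int vector)
{
	int irq = vector_irq[vector];

	if (irq >= 0)
		return false;	/* normal delivery, no warning */

	if (irq == VECTOR_RETRIGGERED) {
		/* Expected leftover from hotplug: stay quiet, then reset. */
		vector_irq[vector] = VECTOR_UNDEFINED;
		return false;
	}

	fprintf(stderr, "No irq handler for vector %d (irq %d)\n", vector, irq);
	return true;
}
```

In this model, a vector that was pending when the CPU went down is absorbed silently once, while a later, genuinely unexpected interrupt on the same vector still warns.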
Linus Torvalds	7e22e91102	Linux 3.13-rc8	2014-01-12 17:04:18 +07:00
Steven Rostedt	3dc91d4338	SELinux: Fix possible NULL pointer dereference in selinux_inode_permission() While running stress tests on adding and deleting ftrace instances I hit this bug: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 IP: selinux_inode_permission+0x85/0x160 PGD 63681067 PUD 7ddbe067 PMD 0 Oops: 0000 [#1] PREEMPT CPU: 0 PID: 5634 Comm: ftrace-test-mki Not tainted 3.13.0-rc4-test-00033-gd2a6dde-dirty #20 Hardware name: /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006 task: ffff880078375800 ti: ffff88007ddb0000 task.ti: ffff88007ddb0000 RIP: 0010:[<ffffffff812d8bc5>] [<ffffffff812d8bc5>] selinux_inode_permission+0x85/0x160 RSP: 0018:ffff88007ddb1c48 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000800000 RCX: ffff88006dd43840 RDX: 0000000000000001 RSI: 0000000000000081 RDI: ffff88006ee46000 RBP: ffff88007ddb1c88 R08: 0000000000000000 R09: ffff88007ddb1c54 R10: 6e6576652f6f6f66 R11: 0000000000000003 R12: 0000000000000000 R13: 0000000000000081 R14: ffff88006ee46000 R15: 0000000000000000 FS: 00007f217b5b6700(0000) GS:ffffffff81e21000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000020 CR3: 000000006a0fe000 CR4: 00000000000007f0 Call Trace: security_inode_permission+0x1c/0x30 __inode_permission+0x41/0xa0 inode_permission+0x18/0x50 link_path_walk+0x66/0x920 path_openat+0xa6/0x6c0 do_filp_open+0x43/0xa0 do_sys_open+0x146/0x240 SyS_open+0x1e/0x20 system_call_fastpath+0x16/0x1b Code: 84 a1 00 00 00 81 e3 00 20 00 00 89 d8 83 c8 02 40 f6 c6 04 0f 45 d8 40 f6 c6 08 74 71 80 cf 02 49 8b 46 38 4c 8d 4d cc 45 31 c0 <0f> b7 50 20 8b 70 1c 48 8b 41 70 89 d9 8b 78 04 e8 36 cf ff ff RIP selinux_inode_permission+0x85/0x160 CR2: 0000000000000020 Investigating, I found that the inode->i_security was NULL, and the dereference of it caused the oops. 
in selinux_inode_permission(): isec = inode->i_security; rc = avc_has_perm_noaudit(sid, isec->sid, isec->sclass, perms, 0, &avd); Note, the crash came from stressing the deletion and reading of debugfs files. I was not able to recreate this via normal files. But I'm not sure they are safe. It may just be that the race window is much harder to hit. What seems to have happened (and what I have traced), is the file is being opened at the same time the file or directory is being deleted. As the dentry and inode locks are not held during the path walk, nor are the inode's ref counts being incremented, there is nothing saving these structures from being discarded except for an rcu_read_lock(). The rcu_read_lock() protects against freeing of the inode, but it does not protect freeing of the inode_security_struct. Now if the freeing of the i_security happens with a call_rcu(), and the i_security field of the inode is not changed (it gets freed as the inode gets freed) then there will be no issue here. (Linus Torvalds suggested not setting the field to NULL such that we do not need to check if it is NULL in the permission check). Note, this is a hack, but it fixes the problem at hand. A real fix is to restructure the destroy_inode() to call all the destructor handlers from the RCU callback. But that is a major job to do, and requires a lot of work. For now, we just band-aid this bug with this fix (it works), and work on a more maintainable solution in the future. Link: http://lkml.kernel.org/r/20140109101932.0508dec7@gandalf.local.home Link: http://lkml.kernel.org/r/20140109182756.17abaaa8@gandalf.local.home Cc: stable@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-01-12 16:53:13 +07:00
Hugh Dickins	eecc1e426d	thp: fix copy_page_rep GPF by testing is_huge_zero_pmd once only We see General Protection Fault on RSI in copy_page_rep: that RSI is what you get from a NULL struct page pointer. RIP: 0010:[<ffffffff81154955>] [<ffffffff81154955>] copy_page_rep+0x5/0x10 RSP: 0000:ffff880136e15c00 EFLAGS: 00010286 RAX: ffff880000000000 RBX: ffff880136e14000 RCX: 0000000000000200 RDX: 6db6db6db6db6db7 RSI: db73880000000000 RDI: ffff880dd0c00000 RBP: ffff880136e15c18 R08: 0000000000000200 R09: 000000000005987c R10: 000000000005987c R11: 0000000000000200 R12: 0000000000000001 R13: ffffea00305aa000 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f195752f700(0000) GS:ffff880c7fc20000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000093010000 CR3: 00000001458e1000 CR4: 00000000000027e0 Call Trace: copy_user_huge_page+0x93/0xab do_huge_pmd_wp_page+0x710/0x815 handle_mm_fault+0x15d8/0x1d70 __do_page_fault+0x14d/0x840 do_page_fault+0x2f/0x90 page_fault+0x22/0x30 do_huge_pmd_wp_page() tests is_huge_zero_pmd(orig_pmd) four times: but since shrink_huge_zero_page() can free the huge_zero_page, and we have no hold of our own on it here (except where the fourth test holds page_table_lock and has checked pmd_same), it's possible for it to answer yes the first time, but no to the second or third test. Change all those last three to tests for NULL page. (Note: this is not the same issue as trinity's DEBUG_PAGEALLOC BUG in copy_page_rep with RSI: ffff88009c422000, reported by Sasha Levin in https://lkml.org/lkml/2013/3/29/103. I believe that one is due to the source page being split, and a tail page freed, while copy is in progress; and not a problem without DEBUG_PAGEALLOC, since the pmd_same check will prevent a miscopy from being made visible.) 
Fixes: `97ae17497e` ("thp: implement refcounting for huge zero page") Signed-off-by: Hugh Dickins <hughd@google.com> Cc: stable@vger.kernel.org # v3.10 v3.11 v3.12 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-01-12 16:47:15 +07:00
Peter Zijlstra	47933ad41a	arch: Introduce smp_load_acquire(), smp_store_release() A number of situations currently require the heavyweight smp_mb(), even though there is no need to order prior stores against later loads. Many architectures have much cheaper ways to handle these situations, but the Linux kernel currently has no portable way to make use of them. This commit therefore supplies smp_load_acquire() and smp_store_release() to remedy this situation. The new smp_load_acquire() primitive orders the specified load against any subsequent reads or writes, while the new smp_store_release() primitive orders the specified store against any prior reads or writes. These primitives allow array-based circular FIFOs to be implemented without an smp_mb(), and also allow a theoretical hole in rcu_assign_pointer() to be closed at no additional expense on most architectures. In addition, the RCU experience transitioning from explicit smp_read_barrier_depends() and smp_wmb() to rcu_dereference() and rcu_assign_pointer(), respectively, resulted in substantial improvements in readability. It therefore seems likely that replacing other explicit barriers with smp_load_acquire() and smp_store_release() will provide similar benefits. It appears that roughly half of the explicit barriers in core kernel code might be so replaced. [Changelog by PaulMck] Reviewed-by: "Paul E. 
McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Michael Neuling <mikey@neuling.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Victor Kaplansky <VICTORK@il.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/20131213150640.908486364@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-12 10:37:17 +01:00
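The array-based circular FIFO mentioned in the smp_load_acquire()/smp_store_release() changelog can be illustrated with a user-space analogue. This is a minimal sketch, assuming C11 `<stdatomic.h>` acquire loads and release stores as stand-ins for the kernel primitives; `struct ring`, `ring_put()` and `ring_get()` are hypothetical names, and the code assumes exactly one producer and one consumer.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8 /* must be a power of two */

struct ring {
	int buf[RING_SIZE];
	_Atomic size_t head; /* written only by the producer */
	_Atomic size_t tail; /* written only by the consumer */
};

/* Producer side: returns false if the ring is full. */
static bool ring_put(struct ring *r, int v)
{
	size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	/* Acquire: do not overwrite a slot before the consumer is done with it. */
	size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == RING_SIZE)
		return false;

	r->buf[head & (RING_SIZE - 1)] = v;
	/* Release: the store to buf[] is ordered before the new head is visible. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return true;
}

/* Consumer side: returns false if the ring is empty. */
static bool ring_get(struct ring *r, int *out)
{
	size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	/* Acquire: reads of buf[] below cannot move before this load of head. */
	size_t head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (head == tail)
		return false;

	*out = r->buf[tail & (RING_SIZE - 1)];
	/* Release: the read of buf[] is ordered before the slot is handed back. */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return true;
}
```

Neither path needs a full barrier: each side only has to order its own buffer access against the index it publishes, which is the case the changelog says acquire/release covers.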
Peter Zijlstra	93ea02bb84	arch: Clean up asm/barrier.h implementations using asm-generic/barrier.h We're going to be adding a few new barrier primitives, and in order to avoid endless duplication make more aggressive use of asm-generic/barrier.h. Change the asm-generic/barrier.h such that it allows partial barrier definitions and fills out the rest with defaults. There are a few architectures (m32r, m68k) that could probably do away with their barrier.h file entirely but are kept for now due to their unconventional nop() implementation. Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Michael Neuling <mikey@neuling.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Victor Kaplansky <VICTORK@il.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/20131213150640.846368594@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-12 10:37:15 +01:00
Peter Zijlstra	1de7da377b	arch: Move smp_mb__{before,after}_atomic_{inc,dec}.h into asm/atomic.h Move the barrier functions that depend on the atomic implementation into the atomic implementation. Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Vineet Gupta <vgupta@synopsys.com> [for arch/arc bits] Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20131213150640.786183683@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-12 10:37:14 +01:00
Peter Zijlstra	2e4f5382d1	locking/doc: Rename LOCK/UNLOCK to ACQUIRE/RELEASE The LOCK and UNLOCK barriers as described in our barrier document are generally known as ACQUIRE and RELEASE barriers in other literature. Since we plan to introduce the acquire and release nomenclature in generic kernel primitives we should amend the document to avoid confusion as to what an acquire/release means. Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Michael Neuling <mikey@neuling.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Victor Kaplansky <VICTORK@il.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/20131217092435.GC21999@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-12 10:37:13 +01:00
Ming Lei	518d00b749	block: null_blk: fix queue leak inside removing device When queue_mode is NULL_Q_MQ and null_blk is being removed, blk_cleanup_queue() isn't called to cleanup queue, so the queue allocated won't be freed. This patch calls blk_cleanup_queue() for MQ to drain all pending requests first and release the reference counter of queue kobject, then blk_mq_free_queue() will be called in queue kobject's release handler when queue kobject's reference counter drops to zero. Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-01-12 16:22:42 +07:00
Yann Droneaud	a21b0b354d	perf: Introduce a flag to enable close-on-exec in perf_event_open() Unlike recent modern userspace APIs such as: epoll_create1 (EPOLL_CLOEXEC), eventfd (EFD_CLOEXEC), fanotify_init (FAN_CLOEXEC), inotify_init1 (IN_CLOEXEC), signalfd (SFD_CLOEXEC), timerfd_create (TFD_CLOEXEC), or the venerable general purpose open (O_CLOEXEC), the perf_event_open() syscall lacks a flag to atomically set the FD_CLOEXEC (i.e. close-on-exec) flag on the file descriptor it returns to userspace. The present patch adds a PERF_FLAG_FD_CLOEXEC flag to allow the perf_event_open() syscall to atomically set close-on-exec. Having this flag will enable userspace to remove the file descriptor from the list of file descriptors being inherited across exec, without the need to call fcntl(fd, F_SETFD, FD_CLOEXEC) and the associated race condition between the current thread and another thread calling fork(2) then execve(2). Links: - Secure File Descriptor Handling (Ulrich Drepper, 2008) http://udrepper.livejournal.com/20407.html - Excuse me son, but your code is leaking !!! (Dan Walsh, March 2012) http://danwalsh.livejournal.com/53603.html - Notes in DMA buffer sharing: leak and security hole http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/dma-buf-sharing.txt?id=v3.13-rc3#n428 Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/8c03f54e1598b1727c19706f3af03f98685d9fe6.1388952061.git.ydroneaud@opteya.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-12 10:16:59 +01:00
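From userspace, the flag described in the perf_event_open() changelog above might be used as follows. This is a sketch rather than perf's own code: the helper names `perf_open_cloexec()` and `fd_has_cloexec()` are hypothetical, the choice of a software CPU-clock event is an assumption made only to keep the example small, and the fallback `#define` assumes the flag value from the kernel UAPI header.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for syscall() under strict -std= modes */
#endif
#include <fcntl.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef PERF_FLAG_FD_CLOEXEC
#define PERF_FLAG_FD_CLOEXEC (1UL << 3) /* assumed value for old headers */
#endif

/*
 * Open a software CPU-clock counter on the calling thread with
 * close-on-exec set atomically by the kernel. Returns the fd, or -1
 * with errno set (for example EINVAL on kernels that predate the flag,
 * or EACCES/EPERM where perf events are restricted).
 */
static int perf_open_cloexec(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_SOFTWARE;
	attr.config = PERF_COUNT_SW_CPU_CLOCK;
	attr.disabled = 1;
	attr.exclude_kernel = 1;
	attr.exclude_hv = 1;

	/* pid = 0 (this thread), cpu = -1 (any), group_fd = -1 (no group) */
	return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1,
			    PERF_FLAG_FD_CLOEXEC);
}

/* Returns 1 if FD_CLOEXEC is set on fd, 0 if not, -1 on error. */
static int fd_has_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	return flags < 0 ? -1 : !!(flags & FD_CLOEXEC);
}
```

Because the flag is applied inside the syscall, there is no window between obtaining the descriptor and a separate `fcntl(fd, F_SETFD, FD_CLOEXEC)` call in which another thread could fork and exec with the descriptor still inheritable.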
Stephane Eranian	f228c5b882	perf/x86/intel: Add Intel RAPL PP1 energy counter support This patch adds support for the Intel RAPL energy counter PP1 (Power Plane 1). On client processors, it usually corresponds to the energy consumption of the builtin graphic card. That is why the sysfs event is called energy-gpu. New event: - name: power/energy-gpu/ - code: event=0x4 - unit: 2^-32 Joules On processors without graphics, this should count 0. The patch only enables this event on client processors. Reviewed-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Signed-off-by: Stephane Eranian <eranian@google.com> Cc: ak@linux.intel.com Cc: acme@redhat.com Cc: jolsa@redhat.com Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: vincent.weaver@maine.edu Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1389176153-3128-3-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-12 10:16:08 +01:00
Stephane Eranian	f3ae75de98	perf/x86: Fix active_entry initialization This patch fixes a problem with the initialization of the struct perf_event active_entry field. It is defined inside an anonymous union and was initialized in perf_event_alloc() using INIT_LIST_HEAD(). However at that time, we do not know whether the event is going to use active_entry or hlist_entry (SW). Or at least, we don't want to make that determination there. The problem is that hlist and list_head are not initialized the same way. One is okay with NULL (from kzalloc), the other needs its pointers to point to itself. This patch resolves this problem by dropping the union. This will avoid problems later on, if someone starts using active_entry or hlist_entry without verifying that they actually overlap. This also solves the initialization problem. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: ak@linux.intel.com Cc: acme@redhat.com Cc: jolsa@redhat.com Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: vincent.weaver@maine.edu Cc: maria.n.dimakopoulou@gmail.com Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1389176153-3128-2-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-12 10:16:07 +01:00
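The initialization conflict described in the active_entry changelog above can be reproduced with minimal user-space copies of the two node layouts. This is a sketch under assumptions: `struct event_union` and `struct event_split` are hypothetical stand-ins for the relevant part of `struct perf_event` before and after the change, and the helpers mirror, but are not, the kernel's list macros.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Minimal user-space copies of the two kernel list node layouts. */
struct list_head  { struct list_head *next, *prev; };
struct hlist_node { struct hlist_node *next, **pprev; };

/* An empty list_head points at itself. */
static void init_list_head(struct list_head *l)
{
	l->next = l;
	l->prev = l;
}

static bool list_is_empty(const struct list_head *l)
{
	return l->next == l;
}

/* An unhashed hlist_node is all-NULL, which is what zeroed memory gives. */
static bool hlist_is_unhashed(const struct hlist_node *n)
{
	return !n->pprev;
}

/* Hypothetical stand-in for the old layout: both views share storage. */
struct event_union {
	union {
		struct hlist_node hlist_entry;
		struct list_head  active_entry;
	};
};

/* Hypothetical stand-in for the layout after the fix: separate fields. */
struct event_split {
	struct hlist_node hlist_entry;
	struct list_head  active_entry;
};
```

With the union, initializing `active_entry` leaves non-NULL pointers in the storage that `hlist_entry` also occupies, so the node no longer looks unhashed. With separate fields, each one keeps the initial state its own list type expects.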
John Stultz	7a06c41cbe	sched_clock: Disable seqlock lockdep usage in sched_clock() Unfortunately the seqlock lockdep enablement can't be used in sched_clock(), since the lockdep infrastructure eventually calls into sched_clock(), which causes a deadlock. Thus, this patch changes all generic sched_clock() usage to use the raw_* methods. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Reported-by: Krzysztof Hałasa <khalasa@piap.pl> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1388704274-5278-2-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-12 10:14:00 +01:00
John Stultz	0c3351d451	seqlock: Use raw_ prefix instead of _no_lockdep Linus disliked the _no_lockdep() naming, so instead use the more-consistent raw_* prefix to the non-lockdep enabled seqcount methods. This also adds raw_ methods for the write operations as well, which will be utilized in a following patch. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Krzysztof Hałasa <khalasa@piap.pl> Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Willy Tarreau <w@1wt.eu> Link: http://lkml.kernel.org/r/1388704274-5278-1-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-12 10:13:59 +01:00
Rik van Riel	9722c2dac7	sched: Calculate effective load even if local weight is 0 Thomas Hellstrom bisected a regression where erratic 3D performance is experienced on virtual machines as measured by glxgears. It identified commit `58d081b5` ("sched/numa: Avoid overloading CPUs on a preferred NUMA node") as the problem which had modified the behaviour of effective_load. Effective load calculates the difference to the system-wide load if a scheduling entity was moved to another CPU. The task group is not heavier as a result of the move but overall system load can increase/decrease as a result of the change. Commit `58d081b5` ("sched/numa: Avoid overloading CPUs on a preferred NUMA node") changed effective_load to make it suitable for calculating if a particular NUMA node was compute overloaded. To reduce the cost of the function, it assumed that a current sched entity weight of 0 was uninteresting but that is not the case. wake_affine() uses a weight of 0 for sync wakeups on the grounds that it is assuming the waking task will sleep and not contribute to load in the near future. In this case, we still want to calculate the effective load of the sched entity hierarchy. As effective_load is no longer used by task_numa_compare since commit `fb13c7ee` (sched/numa: Use a system-wide search to find swap/migration candidates), this patch simply restores the historical behaviour. Reported-and-tested-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Rik van Riel <riel@redhat.com> [ Wrote changelog] Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140106113912.GC6178@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-12 09:22:15 +01:00
Linus Torvalds	26bef1318a	x86, fpu, amd: Clear exceptions in AMD FXSAVE workaround Before we do an EMMS in the AMD FXSAVE information leak workaround we need to clear any pending exceptions, otherwise we trap with a floating-point exception inside this code. Reported-by: halfdog <me@halfdog.net> Tested-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/CA%2B55aFxQnY_PCG_n4=0w-VG=YLXL-yr7oMxyy0WU2gCBAf3ydg@mail.gmail.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-01-11 19:15:52 -08:00
Taras Kondratiuk	d6cd989477	ARM: 7939/1: traps: fix opcode endianness when read from user memory Currently the code has inverted logic: the opcode from user memory is swapped to the proper endianness only in case of a read error, whereas the opcode should be swapped only if it was read correctly from user memory. Reviewed-by: Victor Kamensky <victor.kamensky@linaro.org> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: Taras Kondratiuk <taras.kondratiuk@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2014-01-11 12:06:59 +00:00
Stephen Boyd	261521f142	ARM: 7937/1: perf_event: Silence sparse warning arch/arm/kernel/perf_event_cpu.c:274:25: warning: incorrect type in assignment (different modifiers) arch/arm/kernel/perf_event_cpu.c:274:25: expected int ( *init_fn )( ... ) arch/arm/kernel/perf_event_cpu.c:274:25: got void const *const data Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2014-01-11 12:06:58 +00:00


415060 Commits