There is a potential kernel crash when the MMU notifier calls the
invalidation routines in the hfi1 pinned page caching code for sdma.
The invalidation routine could call the remove callback
for the node, which in turn ends up dereferencing the
current task_struct to get a pointer to the mm_struct.
However, the mm_struct pointer could be NULL resulting in
the following backtrace:
BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
IP: [<ffffffffa041f75a>] sdma_rb_remove+0xaa/0x100 [hfi1]
15
task: ffff88085e66e080 ti: ffff88085c244000 task.ti: ffff88085c244000
RIP: 0010:[<ffffffffa041f75a>] [<ffffffffa041f75a>] sdma_rb_remove+0xaa/0x100 [hfi1]
RSP: 0000:ffff88085c245878 EFLAGS: 00010002
RAX: 0000000000000000 RBX: ffff88105b9bbd40 RCX: ffffea003931a830
RDX: 0000000000000004 RSI: ffff88105754a9c0 RDI: ffff88105754a9c0
RBP: ffff88085c245890 R08: ffff88105b9bbd70 R09: 00000000fffffffb
R10: ffff88105b9bbd58 R11: 0000000000000013 R12: ffff88105754a9c0
R13: 0000000000000001 R14: 0000000000000001 R15: ffff88105b9bbd40
FS: 0000000000000000(0000) GS:ffff88107ef40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000a8 CR3: 0000000001a0b000 CR4: 00000000001407e0
Stack:
ffff88105b9bbd40 ffff88080ec481a8 ffff88080ec481b8 ffff88085c2458c0
ffffffffa03fa00e ffff88080ec48190 ffff88080ed9cd00 0000000001024000
0000000000000000 ffff88085c245920 ffffffffa03fa0e7 0000000000000282
Call Trace:
[<ffffffffa03fa00e>] __mmu_rb_remove.isra.5+0x5e/0x70 [hfi1]
[<ffffffffa03fa0e7>] mmu_notifier_mem_invalidate+0xc7/0xf0 [hfi1]
[<ffffffffa03fa143>] mmu_notifier_page+0x13/0x20 [hfi1]
[<ffffffff81156dd0>] __mmu_notifier_invalidate_page+0x50/0x70
[<ffffffff81140bbb>] try_to_unmap_one+0x20b/0x470
[<ffffffff81141ee7>] try_to_unmap_anon+0xa7/0x120
[<ffffffff81141fad>] try_to_unmap+0x4d/0x60
[<ffffffff8111fd7b>] shrink_page_list+0x2eb/0x9d0
[<ffffffff81120ab3>] shrink_inactive_list+0x243/0x490
[<ffffffff81121491>] shrink_lruvec+0x4c1/0x640
[<ffffffff81121641>] shrink_zone+0x31/0x100
[<ffffffff81121b0f>] kswapd_shrink_zone.constprop.62+0xef/0x1c0
[<ffffffff811229e3>] kswapd+0x403/0x7e0
[<ffffffff811225e0>] ? shrink_all_memory+0xf0/0xf0
[<ffffffff81068ac0>] kthread+0xc0/0xd0
[<ffffffff81068a00>] ? insert_kthread_work+0x40/0x40
[<ffffffff814ff8ec>] ret_from_fork+0x7c/0xb0
[<ffffffff81068a00>] ? insert_kthread_work+0x40/0x40
To correct this, the mm_struct passed to us by the MMU notifier is
used (which is what should have been done to begin with). This avoids
the broken derefences and ensures that the correct mm_struct is used.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
To reduce the size of builtin-trace.c.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-11zxg3qitk6bw2x30135k9z4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
With 'perf record --switch-output' without -a, record__synthesize() in
record__switch_output() won't generate tracking events because there's
no thread_map in evlist. Which causes newly created perf.data doesn't
contain map and comm information.
This patch creates a fake thread_map and directly call
perf_event__synthesize_thread_map() for those events.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1461178794-40467-8-git-send-email-wangnan0@huawei.com
Signed-off-by: He Kuang <hekuang@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The cost of buildid cache processing is high: reading all events in
output perf.data, opening each elf file to read buildids then copying
them into ~/.debug directory. In switch output mode, these heavy works
block perf from receiving perf events for too long.
Enable no-buildid and no-buildid-cache by default if --switch-output is
provided. Still allow user use --no-no-buildid to explicitly enable
buildid in this case.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1461178794-40467-6-git-send-email-wangnan0@huawei.com
Signed-off-by: He Kuang <hekuang@huawei.com>
[ Updated man page ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Without this patch, the last output doesn't have timestamp appended if
--timestamp-filename is not explicitly provided. For example:
# perf record -a --switch-output &
[1] 11224
# kill -s SIGUSR2 11224
[ perf record: dump data: Woken up 1 times ]
# [ perf record: Dump perf.data.2015122622372823 ]
# fg
perf record -a --switch-output
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.027 MB perf.data (540 samples) ]
# ls -l
total 836
-rw------- 1 root root 33256 Dec 26 22:37 perf.data <---- *Odd*
-rw------- 1 root root 817156 Dec 26 22:37 perf.data.2015122622372823
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1461178794-40467-5-git-send-email-wangnan0@huawei.com
Signed-off-by: He Kuang <hekuang@huawei.com>
[ Updated man page, that also got an entry for --timestamp-filename ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
auxtrace_snapshot_state matches the trigger model. Use trigger to
implement it. auxtrace_snapshot_state and auxtrace_snapshot_err are
absorbed.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1461178794-40467-3-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use 'trigger' to model operations which need to be executed when an
event (a signal, for example) is observed.
States and transits:
OFF--(on)--> READY --(hit)--> HIT
^ |
| (ready)
| |
\_____________/
is_hit and is_ready are two key functions to query the state of a
trigger. is_hit means the event already happen; is_ready means the
trigger is waiting for the event.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1461178794-40467-2-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The error messages returned by this method should not have an ending
newline, fix the two cases where it was.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-8af0pazzhzl3dluuh8p7ar7p@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When the kernel allows tweaking perf_event_max_stack and the event being
setup has PERF_SAMPLE_CALLCHAIN in its perf_event_attr.sample_type, tell
the user that tweaking /proc/sys/kernel/perf_event_max_stack may solve
the problem.
Before:
# echo 32000 > /proc/sys/kernel/perf_event_max_stack
# perf record -g usleep 1
Error:
The sys_perf_event_open() syscall returned with 12 (Cannot allocate memory) for event (cycles:ppp).
/bin/dmesg may provide additional information.
No CONFIG_PERF_EVENTS=y kernel support configured?
#
After:
# echo 64000 > /proc/sys/kernel/perf_event_max_stack
# perf record -g usleep 1
Error:
Not enough memory to setup event with callchain.
Hint: Try tweaking /proc/sys/kernel/perf_event_max_stack
Hint: Current value: 64000
#
Suggested-by: David Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-ebv0orelj1s1ye857vhb82ov@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Coverity flagged this under CID 1354884 as a sizeof mismatch, it turns
out that the argument "attr" passed to syscall should have been a
pointer to attr in the first place.
Reported-by: coverity (CID 1354884)
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Wang Nan <wangnan0@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Fixes: 8f9e05fb29 ("perf tools: Fix PowerPC native building")
Link: http://lkml.kernel.org/r/1461551694-5512-3-git-send-email-f.fainelli@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Assigning "attr" to "attr" does not have any effect, but was caught by
Coverity, so let's remove this.
Reported-by: coverity (CID 1354720)
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Fixes: 1b76c13e4b ("bpf tools: Introduce 'bpf' library and add bpf feature check")
Link: http://lkml.kernel.org/r/1461551694-5512-2-git-send-email-f.fainelli@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
eMMC HS-DDR no longer works on the A80, despite it working when support
for this developed.
Disable it for now.
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Skylake processor supports a new set of RAPL registers for controlling
entire SoC instead of just CPU package. This is useful for thermal
and power control when source of power/thermal is not just CPU/GPU.
This change adds a new platform domain (AKA PSys) to the current
power capping Intel RAPL driver.
PSys also supports PL1 (long term) and PL2 (short term) control like
package domain. This also follows same MSRs for energy and time
units as package domain.
Unlike package domain, PSys support requires more than just processor
level implementation. The other parts in the system need additional
implementation, which OEMs needs to support. So not all Skylake
systems will support PSys.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: jacob.jun.pan@linux.intel.com
Cc: rjw@rjwysocki.net
Link: http://lkml.kernel.org/r/1460930581-29748-3-git-send-email-srinivas.pandruvada@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch fixes a bug which was introduced by:
b16a5b52eb ("perf/x86: Add option to disable reading branch flags/cycles")
In this patch, lbr_sel_mask is used to mask the lbr_select. But LBR_SEL_MASK
doesn't include the bit for LBR_CALL_STACK. So LBR call stack will never be
set in lbr_select.
This patch corrects the LBR_SEL_MASK by including all valid bits in
LBR_SELECT. Also, the LBR_CALL_STACK bit is different as other bit in
LBR_SELECT. It does not operate in suppress mode, so it needs to be
specially handled in intel_pmu_setup_hw_lbr_filter.
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1461231010-4399-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Some versions of Intel PT do not support tracing across VMXON, more
specifically, VMXON will clear TraceEn control bit and any attempt to
set it before VMXOFF will throw a #GP, which in the current state of
things will crash the kernel. Namely:
$ perf record -e intel_pt// kvm -nographic
on such a machine will kill it.
To avoid this, notify the intel_pt driver before VMXON and after
VMXOFF so that it knows when not to enable itself.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/87oa9dwrfk.fsf@ashishki-desk.ger.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jann reported that the ptrace_may_access() check in
find_lively_task_by_vpid() is racy against exec().
Specifically:
perf_event_open() execve()
ptrace_may_access()
commit_creds()
... if (get_dumpable() != SUID_DUMP_USER)
perf_event_exit_task();
perf_install_in_context()
would result in installing a counter across the creds boundary.
Fix this by wrapping lots of perf_event_open() in cred_guard_mutex.
This should be fine as perf_event_exit_task() is already called with
cred_guard_mutex held, so all perf locks already nest inside it.
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The entry for PERF_COUNT_HW_REF_CPU_CYCLES is not used on AMD, but is
referenced by filter_events() which expects undefined events to have a
value of 0.
Found via KASAN:
UBSAN: Undefined behaviour in arch/x86/events/amd/core.c:132:30
index 9 is out of range for type 'u64 [9]'
UBSAN: Undefined behaviour in arch/x86/events/amd/core.c:132:9
load of address ffffffff81c021c8 with insufficient space for an object of type 'const u64'
Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1461749731-30979-1-git-send-email-kilobyte@angband.pl
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A while ago, commit 9875201e10 ("rbd: fix use-after free of
rbd_dev->disk") fixed rbd unmap vs notify race by introducing
an exported wrapper for flushing notifies and sticking it into
do_rbd_remove().
A similar problem exists on the rbd map path, though: the watch is
registered in rbd_dev_image_probe(), while the disk is set up quite
a few steps later, in rbd_dev_device_setup(). Nothing prevents
a notify from coming in and crashing on a NULL rbd_dev->disk:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
Call Trace:
[<ffffffffa0508344>] rbd_watch_cb+0x34/0x180 [rbd]
[<ffffffffa04bd290>] do_event_work+0x40/0xb0 [libceph]
[<ffffffff8109d5db>] process_one_work+0x17b/0x470
[<ffffffff8109e3ab>] worker_thread+0x11b/0x400
[<ffffffff8109e290>] ? rescuer_thread+0x400/0x400
[<ffffffff810a5acf>] kthread+0xcf/0xe0
[<ffffffff810b41b3>] ? finish_task_switch+0x53/0x170
[<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
[<ffffffff81645dd8>] ret_from_fork+0x58/0x90
[<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
RIP [<ffffffffa050828a>] rbd_dev_refresh+0xfa/0x180 [rbd]
If an error occurs during rbd map, we have to error out, potentially
tearing down a watch. Just like on rbd unmap, notifies have to be
flushed, otherwise rbd_watch_cb() may end up trying to read in the
image header after rbd_dev_image_release() has run:
Assertion failure in rbd_dev_header_info() at line 4722:
rbd_assert(rbd_image_format_valid(rbd_dev->image_format));
Call Trace:
[<ffffffff81cccee0>] ? rbd_parent_request_create+0x150/0x150
[<ffffffff81cd4e59>] rbd_dev_refresh+0x59/0x390
[<ffffffff81cd5229>] rbd_watch_cb+0x69/0x290
[<ffffffff81fde9bf>] do_event_work+0x10f/0x1c0
[<ffffffff81107799>] process_one_work+0x689/0x1a80
[<ffffffff811076f7>] ? process_one_work+0x5e7/0x1a80
[<ffffffff81132065>] ? finish_task_switch+0x225/0x640
[<ffffffff81107110>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffff81108c69>] worker_thread+0xd9/0x1320
[<ffffffff81108b90>] ? process_one_work+0x1a80/0x1a80
[<ffffffff8111b02d>] kthread+0x21d/0x2e0
[<ffffffff8111ae10>] ? kthread_stop+0x550/0x550
[<ffffffff82022802>] ret_from_fork+0x22/0x40
[<ffffffff8111ae10>] ? kthread_stop+0x550/0x550
RIP [<ffffffff81ccd8f9>] rbd_dev_header_info+0xa19/0x1e30
To fix this, a) check if RBD_DEV_FLAG_EXISTS is set before calling
revalidate_disk(), b) move ceph_osdc_flush_notifies() call into
rbd_dev_header_unwatch_sync() to cover rbd map error paths and c) turn
header read-in into a critical section. The latter also happens to
take care of rbd map foo@bar vs rbd snap rm foo@bar race.
Fixes: http://tracker.ceph.com/issues/15490
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
If x86_vector_alloc_irq() fails x86_vector_free_irqs() is invoked to cleanup
the already allocated vectors. This subsequently calls clear_vector_irq().
The failed irq has no vector assigned, which triggers the BUG_ON(!vector) in
clear_vector_irq().
We cannot suppress the call to x86_vector_free_irqs() for the failed
interrupt, because the other data related to this irq must be cleaned up as
well. So calling clear_vector_irq() with vector == 0 is legitimate.
Remove the BUG_ON and return if vector is zero,
[ tglx: Massaged changelog ]
Fixes: b5dc8e6c21 "x86/irq: Use hierarchical irqdomain to manage CPU interrupt vectors"
Signed-off-by: Keith Busch <keith.busch@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Power allocator's parameters are S32 type, so use %d to print them.
Acked-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
When calculate temperature, old code firstly do division and then
convert to "millicelsius" unit. This will lose resolution and only can
read back temperature with "Celsius" unit.
So firstly scale step value to "millicelsius" and then do division, so
finally we can increase resolution for temperature value. Also refine
the calculation from temperature value to step value.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Pull workqueue fix from Tejun Heo:
"So, it turns out we had a silly bug in the most fundamental part of
workqueue for a very long time. AFAICS, this dates back to pre-git
era and has quite likely been there from the time workqueue was first
introduced.
A work item uses its PENDING bit to synchronize multiple queuers.
Anyone who wins the PENDING bit owns the pending state of the work
item. Whether a queuer wins or loses the race, one thing should be
guaranteed - there will soon be at least one execution of the work
item - where "after" means that the execution instance would be able
to see all the changes that the queuer has made prior to the queueing
attempt.
Unfortunately, we were missing a smp_mb() after clearing PENDING for
execution, so nothing guaranteed visibility of the changes that a
queueing loser has made, which manifested as a reproducible blk-mq
stall.
Lots of kudos to Roman for debugging the problem. The patch for
-stable is the minimal one. For v3.7, Peter is working on a patch to
make the code path slightly more efficient and less fragile"
* 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: fix ghost PENDING flag while doing MQ IO
Pull cgroup fixes from Tejun Heo:
"Two patches to fix a deadlock which can be easily triggered if memcg
charge moving is used.
This bug was introduced while converting threadgroup locking to a
global percpu_rwsem and is caused by cgroup controller task migration
path depending on the ability to create new kthreads. cpuset had a
similar issue which was fixed by performing heavy-lifting operations
asynchronous to task migration. The two patches fix the same issue in
memcg in a similar way. The first patch makes the mechanism generic
and the second relocates memcg charge moving outside the migration
path.
Given that we don't want to perform heavy operations while
writelocking threadgroup lock anyway, moving them out of the way is a
desirable solution. One thing to note is that the problem was
difficult to debug because lockdep couldn't figure out the deadlock
condition. Looking into how to improve that"
* 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
memcg: relocate charge moving from ->attach to ->post_attach
cgroup, cpuset: replace cpuset_post_attach_flush() with cgroup_subsys->post_attach callback
Pull i2c fixes from Wolfram Sang:
"I2C has one buildfix, one ABBA deadlock fix, and three simple 'add ID'
patches"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: exynos5: Fix possible ABBA deadlock by keeping I2C clock prepared
i2c: cpm: Fix build break due to incompatible pointer types
i2c: ismt: Add Intel DNV PCI ID
i2c: xlp9xx: add support for Broadcom Vulcan
i2c: rk3x: add support for rk3228
- LOCKDEP now words for ARCv2 builds
- Enabling DT reserved-memory binding to work (for forthcoming HDMI driver)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXIMkcAAoJEGnX8d3iisJeHdkP/1Eb+V6asWOPVGinbdKs0nhs
xdnFiBVesgdfXKBxY/K1GN6bVNUqaPNVRk7Y1fWFzS6O2tgVMMDFz+4FebU6sDMv
wiQuxqzEYhyaiqNvw2JUH60Y5GiCjWFLMfgR2FKCkUM99NZm/2DFos2hPE81K2yc
4JfEQoPcsuYYN6nheMKa0sjomGHg6qIL4wi3uvB3RCrbqs1MbauKpsbdbNF3thqZ
xpeHavHLRLUTnN/lIf7Z1SwSh6S0Ey7YFLePxsC48vZCL0a8L1HfFSfvjcxuHBMU
cGsRXWQmwVjPUeEC9JIPcDkbEfQ1nlezU8lZcg7PJHOt4DByxN3sCngAPKt6Skls
I2Ql5tP12IUmuWv4zpM7VP/ZZvC5EOh4RmG2xQwuV+rtDilkYHZrUPx7PdilRt3S
a+A+FoYMgczdTTSCJnI0kJYADmPtz/6e1N9rSyzzmVnDmSPR9hClO9dENtpSwiXD
Jtpo4tBsMLyw6+Oj68e70c58t8ek9PObR1lIqZ3zJ97hvgACjjpeDytVjv7oPpYH
scTUfx69s6qF96Wn+y44Iw1gRV896UFlLmHtX9Hk0BtuVdjZuwOIF1Kqfsk6SYsJ
0WFdnxoNJHPJVWkp+Pmrz7g8BaUxfAtc7Ly6C+8yUUa1nQswYQmrK+84uKLVjiWl
E2sBDIJl2Xu2E7amTJss
=rlDa
-----END PGP SIGNATURE-----
Merge tag 'arc-4.6-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
- lockdep now works for ARCv2 builds
- enable DT reserved-memory binding (for forthcoming HDMI driver)
* tag 'arc-4.6-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: add support for reserved memory defined by device tree
ARC: support generic per-device coherent dma mem
Documentation: dt: arc: fix spelling mistakes
ARCv2: Enable LOCKDEP
nios2: memset: use the right constraint modifier for the %4 output operand
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXIH+FAAoJEFWoEK+e3syCfRoQAMPXIiWR/V/dLn3OX8f8CeA6
I9duVqMrKrVh/a+bwxzmVJkumm0xzqYnOyhOpX5fZd3Nx44Q4NynJakwgWpMDVAI
+xXxNtHZUhjcRC4EuqJW677plR0Uq8bWY2UibpARPHfB9d0arJOCuL11vdGCAjkg
lWeVzUFg7iB9n0tRFwvsN29EcRZDo7+WbJh3cGIfTYNbcihJfiAAlmNyXS7XFiBY
DqSYyTXIc8scH1q66gArTnyDryvM7cEZ+zyYoX9v8/E/+xPLLLhogthtqf3u4opn
J/70k1LBgzgHCYrlEG8vvd1kCr114PLo7RlgkwJqdAtVhtMAtcGZFfkVQkSo7R3h
gHQHf8f0exg3JKp0VesB443FyaIvpCNkth3eGdNMWunhCPB4bXE5W+hg5J5gZBNi
1Ft9VB9Ug/8sh9Es4muinNX1kR5Fc8IWQqIa2U/OCt4O2wR1aFanJvaRqeCozbES
SpbRAOoXtzOZ0xRPZGPQpqP6ggfizq9Zil2ZTeXbNPBRybFvmEpgewdJYqvrNwZj
pbgB1+7zVcsfGTiMhJ+d1rLUX/oeMuUWT+eHY1jM1k+gTQVUo6k8KNVoaiNKa9z5
cZh+XqgWKE6Qv4DVRnj+ouvDuglhwqIAyI3oXElghrYmCWxjvVKCXowQhnSq727M
th2iyocnFMjIVWd5u3b7
=PRJ4
-----END PGP SIGNATURE-----
Merge tag 'nios2-v4.6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2
Pull arch/nios2 fix from Ley Foon Tan:
"memset: use the right constraint modifier for the %4 output operand"
* tag 'nios2-v4.6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2:
nios2: memset: use the right constraint modifier for the %4 output operand
V2: disable all vm interrupts in late_init()
Signed-off-by: Flora Cui <Flora.Cui@amd.com>
Reviewed-by: Ken Wang <Qingqing.Wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When crtc/timing is disabled on boot the dig block
should be stopped in order ignore timing from crtc,
reset the steering fifo otherwise we get display
corruption or hung in dp sst mode.
v2: agd: fix coding style
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes the following scenario:
1. Page table bo allocated in vram and linked to man->lru.
tbo->list_kref.refcount=2
2. Page table bo is swapped out and removed from man->lru.
tbo->list_kref.refcount=1
3. Command submission from userspace. Page table bo is moved
to vram. ttm_bo_move_to_lru_tail() link it to man->lru and
don't increase the kref count.
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Flora Cui <Flora.Cui@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
toshiba_acpi:
- Fix regression caused by hotkey enabling value
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJXIEZSAAoJEKbMaAwKp364mLwH/3j01EDn0JF1FIIP+kxVgeeL
g8xI+0tlFzxmdcBqW3n4q0apzVuCmHr0pbOik289l3dv7hQ5PEvdmK/VhVPYmJDL
2u/4EWmW7cvYMUAVhGB499pKac38fMUN5y97dkmoikiTQO6VaWsvdczvXuhuz/dP
OcQzRR/UttCLMe/ERxz3xh4R9kbY5Hzh4slW8Ay/sGDRrgOUFRLT8Zg3Uo7MY27i
Kq++SrH96edL1dW6XkWFIqO7NzWGlbBxTMlTlh+xmGUkOtVxUyzAID3NEDIaw6zC
7QU61eyfIJToa2SxHZ/mT9bEFNHNbJR4KoLREG6K2LbRyMhsQfMxaTym8MNzT/Q=
=+IXa
-----END PGP SIGNATURE-----
Merge tag 'platform-drivers-x86-v4.6-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86
Pull x86 platform driver fix from Darren Hart:
"Fix regression caused by hotkey enabling value in toshiba_acpi"
* tag 'platform-drivers-x86-v4.6-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86:
toshiba_acpi: Fix regression caused by hotkey enabling value
This is a fairly large collection of fixes but almost all driver
specific ones, especially to the new Intel drivers which have had a lot
of recent development. The one core fix is a change to the debugfs code
to avoid crashes in some relatively unusual configurations.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJXINZtAAoJECTWi3JdVIfQ+BwH/1eLqMfCSZM9nsDr1QMvOCDP
SO4ZoWqvYplBcS8pYKbJmqtuo8jMxT3VIQF+b5hPAVhgpLwMmy9qeFtatqCQ2WDC
GfCqW8LSKtrzwUwmoRrtHx7vfBLP1/z78F8ORQzwhrplTCBhvPLbUOrV51EFj6tf
Dfo2tW0uxww9iCZduYu4LadOhFOfuw+5shUrJk5A5f975Zbdgyke4CbRnlbDPXLq
d4i7bNfiISkSJiKMpdZFeiOQCd0+uXHh2WkMtVYSGVTA2Kf7d7HtX+JpEFFmaJgJ
8CndjgNJ1ZXtMHl1pMYmNqKJ5mEgmVtbGGJWY4QmQBva0EfQ+vLZt78BG3qvJwk=
=SXH2
-----END PGP SIGNATURE-----
Merge tag 'asoc-fix-v4.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v4.6
This is a fairly large collection of fixes but almost all driver
specific ones, especially to the new Intel drivers which have had a lot
of recent development. The one core fix is a change to the debugfs code
to avoid crashes in some relatively unusual configurations.
User visible:
- perf trace --pf maj/min/all works with --call-graph: (Arnaldo Carvalho de Melo)
Tracing write syscalls and major page faults with callchains while starting
firefox, limiting the stack to 5 frames:
# perf trace -e write --pf maj --max-stack 5 firefox
589.549 ( 0.014 ms): firefox/15377 write(fd: 4, buf: 0x7fff80acc898, count: 151) = 151
[0xfaed] (/usr/lib64/libpthread-2.22.so)
fire_glxtest_process+0x5c (/usr/lib64/firefox/libxul.so)
InstallGdkErrorHandler+0x41 (/usr/lib64/firefox/libxul.so)
XREMain::XRE_mainInit+0x12c (/usr/lib64/firefox/libxul.so)
XREMain::XRE_main+0x1e4 (/usr/lib64/firefox/libxul.so)
760.704 ( 0.000 ms): firefox/15332 majfault [gtk_tree_view_accessible_get_type+0x0] => /usr/lib64/libgtk-3.so.0.1800.9@0xa0850 (x.)
gtk_tree_view_accessible_get_type+0x0 (/usr/lib64/libgtk-3.so.0.1800.9)
gtk_tree_view_class_intern_init+0x1a54 (/usr/lib64/libgtk-3.so.0.1800.9)
g_type_class_ref+0x6dd (/usr/lib64/libgobject-2.0.so.0.4600.2)
[0x115378] (/usr/lib64/libgnutls.so.30.6.3)
This automagically selects "--call-graph dwarf", use "--call-graph fp" on systems
where -fno-omit-frame-pointer was used to built the components of interest, to
incur in less overhead, or tune "--call-graph dwarf" appropriately, see 'perf record --help'.
- Allow /proc/sys/kernel/perf_event_max_stack, that defaults to the old hard coded value
of PERF_MAX_STACK_DEPTH (127), useful for huge callstacks for things like Groovy, Ruby, etc,
and also to reduce overhead by limiting it to a smaller value, upcoming work will allow
this to be done per-event (Arnaldo Carvalho de Melo)
- Make 'perf trace --min-stack' be honoured by --pf and --event (Arnaldo Carvalho de Melo)
- Make 'perf evlist -v' decode perf_event_attr->branch_sample_type (Arnaldo Carvalho de Melo)
# perf record --call lbr usleep 1
# perf evlist -v
cycles:ppp: ... sample_type: IP|TID|TIME|CALLCHAIN|PERIOD|BRANCH_STACK, ...
branch_sample_type: USER|CALL_STACK|NO_FLAGS|NO_CYCLES
#
- Clear dummy entry accumulated period, fixing such 'perf top/report' output
as: (Kan Liang)
4769.98% 0.01% 0.00% 0.01% tchain_edit [kernel] [k] update_fast_timekeeper
- System calls with pid_t arguments gets them augmented with the COMM event
more thoroughly:
# trace -e perf_event_open perf stat -e cycles -p 15608
6.876 ( 0.014 ms): perf_event_open(attr_uptr: 0x2ae20d8, pid: 15608 (hexchat), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3
6.882 ( 0.005 ms): perf_event_open(attr_uptr: 0x2ae20d8, pid: 15639 (gmain), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
6.889 ( 0.005 ms): perf_event_open(attr_uptr: 0x2ae20d8, pid: 15640 (gdbus), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 5
^^^^^^^^^^^^^^^^^^
^C
- Fix offline module name mismatch issue in 'perf probe' (Ravi Bangoria)
- Fix module probe issue if no dwarf support in (Ravi Bangoria)
Assorted fixes:
- Fix off-by-one in write_buildid() (Andrey Ryabinin)
- Fix segfault when printing callchains in 'perf script' (Chris Phlipot)
- Replace assignment with comparison on assert check in 'perf test' entry (Colin Ian King)
- Fix off-by-one comparison in intel-pt code (Colin Ian King)
- Close target file on error path in 'perf probe' (Masami Hiramatsu)
- Set default kprobe group name if not given in 'perf probe' (Masami Hiramatsu)
- Avoid partial perf_event_header reads (Wang Nan)
Infrastructure:
- Update x86's syscall_64.tbl copy, adding preadv2 & pwritev2 (Arnaldo Carvalho de Melo)
- Make the x86 clean quiet wrt syscall table removal (Jiri Olsa)
Cleanups:
- Simplify wrapper for LOCK_PI in 'perf bench futex' (Davidlohr Bueso)
- Remove duplicate const qualifier (Eric Engestrom)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJXIMqAAAoJENZQFvNTUqpA/vEQAJIPQb+D9vHYrJ59Q27ZX4WE
OjBcKTfAXRtkg+5GmGruFqGF0BgU98fdxsG9r/vOpZfpUsiERbf2uXJp8NOZsePG
9MT0znfimGHbbTKQPEg5VzHPRyL+CjYF6Nu4jAFNHs1Hx4LUrC3k/CR13GwQ8cW9
lTHQIESl+E0FMrrCNESCwcYvjpeOefdDaj+vbT8Csmy6zwDGKwzlS0fp4cTQm3U1
IZ1JRHX0RDq7Lvs6OA+IY2BDh8bC8tokzmldJedq4uQjyjkT9pM/KYtYogJ6+70D
B+JV77Tc6ORq8/WgneS78GeeQiVJzqkX7DBNjFrHjMagaiqrUwBPvwVhVbMXOfkm
LQhkaytFBn1mM9DZZMPY9mK97/V2NUWRO2e/iOjDZaj7349TxBtDHiXZbF+kG0MH
bH4AMopuVI0ULjKmwloACsuL5djgOIUu87AWJ2oLkgPxcRv0LR4jk7Tjt/q6nnKM
2/RVsJd8U4kZGoy4B+jOsSB7swaT8t0HhfNebLP4yGHAS8BpMmRx06BFqot+FoaV
+mHflfOADrPqO9mh30t93FwQV0dnSXg9sKn9hSXf/rweqQCzX+/BYSeJipaUDXS7
vrx2FFcMbDzh+HEfIUa+j+TLnh2ZNlwdFi+YkUHjF8VRXfHt4scIuReobl5s1Ywc
GpT5GLccSfopST1WjrNX
=MzXA
-----END PGP SIGNATURE-----
Merge tag 'perf-core-for-mingo-20160427' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
- perf trace --pf maj/min/all works with --call-graph: (Arnaldo Carvalho de Melo)
Tracing write syscalls and major page faults with callchains while starting
firefox, limiting the stack to 5 frames:
# perf trace -e write --pf maj --max-stack 5 firefox
589.549 ( 0.014 ms): firefox/15377 write(fd: 4, buf: 0x7fff80acc898, count: 151) = 151
[0xfaed] (/usr/lib64/libpthread-2.22.so)
fire_glxtest_process+0x5c (/usr/lib64/firefox/libxul.so)
InstallGdkErrorHandler+0x41 (/usr/lib64/firefox/libxul.so)
XREMain::XRE_mainInit+0x12c (/usr/lib64/firefox/libxul.so)
XREMain::XRE_main+0x1e4 (/usr/lib64/firefox/libxul.so)
760.704 ( 0.000 ms): firefox/15332 majfault [gtk_tree_view_accessible_get_type+0x0] => /usr/lib64/libgtk-3.so.0.1800.9@0xa0850 (x.)
gtk_tree_view_accessible_get_type+0x0 (/usr/lib64/libgtk-3.so.0.1800.9)
gtk_tree_view_class_intern_init+0x1a54 (/usr/lib64/libgtk-3.so.0.1800.9)
g_type_class_ref+0x6dd (/usr/lib64/libgobject-2.0.so.0.4600.2)
[0x115378] (/usr/lib64/libgnutls.so.30.6.3)
This automagically selects "--call-graph dwarf", use "--call-graph fp" on systems
where -fno-omit-frame-pointer was used to built the components of interest, to
incur in less overhead, or tune "--call-graph dwarf" appropriately, see 'perf record --help'.
- Allow /proc/sys/kernel/perf_event_max_stack, that defaults to the old hard coded value
of PERF_MAX_STACK_DEPTH (127), useful for huge callstacks for things like Groovy, Ruby, etc,
and also to reduce overhead by limiting it to a smaller value, upcoming work will allow
this to be done per-event (Arnaldo Carvalho de Melo)
- Make 'perf trace --min-stack' be honoured by --pf and --event (Arnaldo Carvalho de Melo)
- Make 'perf evlist -v' decode perf_event_attr->branch_sample_type (Arnaldo Carvalho de Melo)
# perf record --call lbr usleep 1
# perf evlist -v
cycles:ppp: ... sample_type: IP|TID|TIME|CALLCHAIN|PERIOD|BRANCH_STACK, ...
branch_sample_type: USER|CALL_STACK|NO_FLAGS|NO_CYCLES
#
- Clear dummy entry accumulated period, fixing such 'perf top/report' output
as: (Kan Liang)
4769.98% 0.01% 0.00% 0.01% tchain_edit [kernel] [k] update_fast_timekeeper
- System calls with pid_t arguments gets them augmented with the COMM event
more thoroughly:
# trace -e perf_event_open perf stat -e cycles -p 15608
6.876 ( 0.014 ms): perf_event_open(attr_uptr: 0x2ae20d8, pid: 15608 (hexchat), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3
6.882 ( 0.005 ms): perf_event_open(attr_uptr: 0x2ae20d8, pid: 15639 (gmain), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
6.889 ( 0.005 ms): perf_event_open(attr_uptr: 0x2ae20d8, pid: 15640 (gdbus), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 5
^^^^^^^^^^^^^^^^^^
^C
- Fix offline module name mismatch issue in 'perf probe' (Ravi Bangoria)
- Fix module probe issue if no dwarf support in (Ravi Bangoria)
Assorted fixes:
- Fix off-by-one in write_buildid() (Andrey Ryabinin)
- Fix segfault when printing callchains in 'perf script' (Chris Phlipot)
- Replace assignment with comparison on assert check in 'perf test' entry (Colin Ian King)
- Fix off-by-one comparison in intel-pt code (Colin Ian King)
- Close target file on error path in 'perf probe' (Masami Hiramatsu)
- Set default kprobe group name if not given in 'perf probe' (Masami Hiramatsu)
- Avoid partial perf_event_header reads (Wang Nan)
Infrastructure changes:
- Update x86's syscall_64.tbl copy, adding preadv2 & pwritev2 (Arnaldo Carvalho de Melo)
- Make the x86 clean quiet wrt syscall table removal (Jiri Olsa)
Cleanups:
- Simplify wrapper for LOCK_PI in 'perf bench futex' (Davidlohr Bueso)
- Remove duplicate const qualifier (Eric Engestrom)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There is an upper limit to what tooling considers a valid callchain,
and it was tied to the hardcoded value in the kernel,
PERF_MAX_STACK_DEPTH (127), now that this can be tuned via a sysctl,
make it read it and use that as the upper limit, falling back to
PERF_MAX_STACK_DEPTH for kernels where this sysctl isn't present.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-yjqsd30nnkogvj5oyx9ghir9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The default remains 127, which is good for most cases, and not even hit
most of the time, but then for some cases, as reported by Brendan, 1024+
deep frames are appearing on the radar for things like groovy, ruby.
And in some workloads putting a _lower_ cap on this may make sense. One
that is per event still needs to be put in place tho.
The new file is:
# cat /proc/sys/kernel/perf_event_max_stack
127
Chaging it:
# echo 256 > /proc/sys/kernel/perf_event_max_stack
# cat /proc/sys/kernel/perf_event_max_stack
256
But as soon as there is some event using callchains we get:
# echo 512 > /proc/sys/kernel/perf_event_max_stack
-bash: echo: write error: Device or resource busy
#
Because we only allocate the callchain percpu data structures when there
is a user, which allows for changing the max easily, its just a matter
of having no callchain users at that point.
Reported-and-Tested-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lkml.kernel.org/r/20160426002928.GB16708@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Depending on the size of the area to be memset'ed, the nios2 memset implementation
either uses a naive loop (for buffers smaller or equal than 8 bytes) or a more optimized
implementation (for buffers larger than 8 bytes). This implementation does 4-byte stores
rather than 1-byte stores to speed up memset.
However, we discovered that on our nios2 platform, memset() was not properly setting the
buffer to the expected value. A memset of 0xff would not set the entire buffer to 0xff, but to:
0xff 0x00 0xff 0x00 0xff 0x00 0xff 0x00 ...
Which is obviously incorrect. Our investigation has revealed that the problem lies in the
incorrect constraints used in the inline assembly.
The following piece of assembly, from the nios2 memset implementation, is supposed to
create a 4-byte value that repeats 4 times the 1-byte pattern passed as memset argument:
/* fill8 %3, %5 (c & 0xff) */
" slli %4, %5, 8\n"
" or %4, %4, %5\n"
" slli %3, %4, 16\n"
" or %3, %3, %4\n"
However, depending on the compiler and optimization level, this code might be compiled as:
34: 280a923a slli r5,r5,8
38: 294ab03a or r5,r5,r5
3c: 2808943a slli r4,r5,16
40: 2148b03a or r4,r4,r5
This is wrong because r5 gets used both for %5 and %4, which leads to the final pattern
stored in r4 to be 0xff00ff00 rather than the expected 0xffffffff.
%4 is defined with the "=r" constraint, i.e as an output operand. However, as explained in
http://www.ethernut.de/en/documents/arm-inline-asm.html, this does not prevent gcc from
using the same register for an output operand (%4) and input operand (%5). By using the
constraint modifier '&', we indicate that the register should be used for output only. With this
change, we get the following assembly output:
34: 2810923a slli r8,r5,8
38: 4150b03a or r8,r8,r5
3c: 400e943a slli r7,r8,16
40: 3a0eb03a or r7,r7,r8
Which correctly produces the 0xffffffff pattern when 0xff is passed as the memset() pattern.
It is worth mentioning the observed consequence of this bug: we were hitting the kernel
BUG() in mm/bootmem.c:__free() that verifies when marking a page as free that it was
previously marked as occupied (i.e that the bit was set to 1). The entire bootmem bitmap is
set to 0xff bit via a memset() during the bootmem initialization. The bootmem_free() call right
after the initialization was finding some bits to be set to 0, which didn't make sense since the
bitmap has just been memset'ed to 0xff. Except that due to the bug explained above, the
bitmap was in fact initialized to 0xff00ff00.
Thanks to Marek Vasut for his help and feedback.
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Acked-by: Marek Vasut <marex@denx.de>
Acked-by: Ley Foon Tan <lftan@altera.com>
The sclp_ctl_ioctl_sccb function uses two copy_from_user calls to
retrieve the sclp request from user space. The first copy_from_user
fetches the length of the request which is stored in the first two
bytes of the request. The second copy_from_user gets the complete
sclp request, but this copies the length field a second time.
A malicious user may have changed the length in the meantime.
Reported-by: Pengfei Wang <wpengfeinudt@gmail.com>
Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Wire up preadv2/pwritev2 in the same way as preadv/pwritev. Fixes two
build warnings on ppc64.
mpe: Lightly tested with fio (slightly hacked to add the syscall
wrappers):
fio-4217 [009] .... 1304.635300: sys_preadv2(fd: 3, vec:
10025821de0, vlen: 1, pos_l: 6253000, pos_h: 0, flags: 1)
fio-4217 [009] .... 1304.635474: sys_preadv2 -> 0x1000
Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When detaching contexts, we may still have interrupts in the system
which are yet to be delivered to any CPU and be acked in the PSL.
This can result in a subsequent unrelated process getting an spurious
IRQ or an interrupt for a non-existent context.
This polls the PSL to ensure that the PSL is clear of IRQs for the
detached context, before removing the context from the idr.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Tested-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Keep IRQ mappings on context teardown. This won't leak IRQs as if we
allocate the mapping again, the generic code will give the same
mapping used last time.
Doing this works around a race in the generic code. Masking the
interrupt introduces a race which can crash the kernel or result in
IRQ that is never EOIed. The lost of EOI results in all subsequent
mappings to the same HW IRQ never receiving an interrupt.
We've seen this race with cxl test cases which are doing heavy context
startup and teardown at the same time as heavy interrupt load.
A fix to the generic code is being investigated also.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: stable@vger.kernel.org # 3.8
Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Tested-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
virtio_gpu was failing to send vblank events when using the atomic IOCTL
with the DRM_MODE_PAGE_FLIP_EVENT flag set. This patch fixes each and
enables atomic pageflips updates.
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>