AddressSanitizer (or ASan) and UndefinedBehaviorSanitizer (or UBSan) are
very useful tools to detect program bugs:
- AddressSanitizer (or ASan) is a GCC feature that detects memory
corruption bugs such as buffer overflows and memory leaks.
- UndefinedBehaviorSanitizer (or UBSan) is a fast undefined behavior
detector supported by GCC. UBSan detects undefined behaviors of programs
at runtime.
This patch adds a document about how to use them on perf. Later patches will fix
some of the issues disclosed by them.
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20190316080556.3075-2-changbin.du@gmail.com
[ Make some changes based on comments made by Jiri Olsa ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The -c option to enable multiplex scaling has been useless for quite
some time because scaling is default.
It's only useful as --no-scale to disable scaling. But the non scaling
code path has bitrotted and doesn't print anything because perf output
code relies on value run/ena information.
Also even when we don't want to scale a value it's still useful to show
its multiplex percentage.
This patch:
- Fixes help and documentation to show --no-scale instead of -c
- Removes -c, only keeps the long option because -c doesn't support negatives.
- Enables running/enabled even with --no-scale
- And fixes some other problems in the no-scale output.
Before:
$ perf stat --no-scale -e cycles true
Performance counter stats for 'true':
<not counted> cycles
0.000984154 seconds time elapsed
After:
$ ./perf stat --no-scale -e cycles true
Performance counter stats for 'true':
706,070 cycles
0.001219821 seconds time elapsed
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
LPU-Reference: 20190314225002.30108-9-andi@firstfloor.org
Link: https://lkml.kernel.org/n/tip-xggjvwcdaj2aqy8ib3i4b1g6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When comparing time stamps in 'perf script' traces it can be annoying to
work with the full perf time stamps.
Add a --reltime option that displays time stamps relative to the trace
start to make it easier to read the traces.
Note: not currently supported for --time. Report an error in this
case.
Before:
% perf script
swapper 0 [000] 245402.891216: 1 cycles:ppp: ffffffffa0068814 native_write_msr+0x4 ([kernel.kallsyms])
swapper 0 [000] 245402.891223: 1 cycles:ppp: ffffffffa0068814 native_write_msr+0x4 ([kernel.kallsyms])
swapper 0 [000] 245402.891227: 5 cycles:ppp: ffffffffa0068814 native_write_msr+0x4 ([kernel.kallsyms])
swapper 0 [000] 245402.891231: 41 cycles:ppp: ffffffffa0068816 native_write_msr+0x6 ([kernel.kallsyms])
swapper 0 [000] 245402.891235: 355 cycles:ppp: ffffffffa000dd51 intel_bts_enable_local+0x21 ([kernel.kallsyms])
swapper 0 [000] 245402.891239: 3084 cycles:ppp: ffffffffa0a0150a end_repeat_nmi+0x48 ([kernel.kallsyms])
After:
% perf script --reltime
swapper 0 [000] 0.000000: 1 cycles:ppp: ffffffffa0068814 native_write_msr+0x4 ([kernel.kallsyms])
swapper 0 [000] 0.000006: 1 cycles:ppp: ffffffffa0068814 native_write_msr+0x4 ([kernel.kallsyms])
swapper 0 [000] 0.000010: 5 cycles:ppp: ffffffffa0068814 native_write_msr+0x4 ([kernel.kallsyms])
swapper 0 [000] 0.000014: 41 cycles:ppp: ffffffffa0068816 native_write_msr+0x6 ([kernel.kallsyms])
swapper 0 [000] 0.000018: 355 cycles:ppp: ffffffffa000dd51 intel_bts_enable_local+0x21 ([kernel.kallsyms])
swapper 0 [000] 0.000022: 3084 cycles:ppp: ffffffffa0a0150a end_repeat_nmi+0x48 ([kernel.kallsyms])
Committer notes:
Do not use 'time' as the name of a variable, as this breaks the build on
older glibcs:
cc1: warnings being treated as errors
builtin-script.c: In function 'perf_sample__fprintf_start':
builtin-script.c:691: warning: declaration of 'time' shadows a global declaration
/usr/include/time.h:187: warning: shadowed declaration is here
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
LPU-Reference: 20190314225002.30108-8-andi@firstfloor.org
Link: https://lkml.kernel.org/n/tip-bpahyi6pr9r399mvihu65fvc@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Show all the supported sort keys in the command line help output, so
that it's not needed to refer to the manpage.
Before:
% perf report -h
...
-s, --sort <key[,key2...]>
sort by key(s): pid, comm, dso, symbol, parent, cpu, srcline, ... Please refer the man page for the complete list.
After:
% perf report -h
...
-s, --sort <key[,key2...]>
sort by key(s): overhead overhead_sys overhead_us overhead_guest_sys overhead_guest_us overhead_children sample period pid comm dso symbol parent cpu ...
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
LPU-Reference: 20190314225002.30108-5-andi@firstfloor.org
Link: https://lkml.kernel.org/n/tip-9r3uz2ch4izoi1uln3f889co@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The help description for --switch-output looks like there are multiple
comma separated fields. But it's actually a choice of different options.
Make it clear and less confusing.
Before:
% perf record -h
...
--switch-output[=<signal,size,time>]
Switch output when receive SIGUSR2 or cross size,time threshold
After:
% perf record -h
...
--switch-output[=<signal or size[BKMG] or time[smhd]>]
Switch output when receiving SIGUSR2 (signal) or cross a size or time threshold
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
LPU-Reference: 20190314225002.30108-4-andi@firstfloor.org
Link: https://lkml.kernel.org/n/tip-9yecyuha04nyg8toyd1b2pgi@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We only need to clear the bit in a 32bit integer.
This fixes a crah on ARM64 and PPC64LE caused by
"drm/amdgpu: update the vm invalidation engine layout V2"
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This reverts commit 8466cc61da.
It can trigger a reference counter bug in TTM. Need to investigate further, but
for now revert the offending change.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Commit 7f147f9bfd ("scsi: qla2xxx: Fix N2N target discovery with Local
loop") fixed N2N target discovery for local loop. However, same code is
used for FC-AL discovery as well. Added check to make sure we are bypassing
area and domain check only in N2N topology for target discovery.
Fixes: 7f147f9bfd ("scsi: qla2xxx: Fix N2N target discovery with Local loop")
Cc: stable@vger.kernel.org # 5.0+
Signed-off-by: Quinn Tran <qtran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Two fixes:
1. platform/chrome: Fix locking pattern in wilco_ec_mailbox()
- Closes a potential race condition in the new wilco_ec driver.
2. platform/chrome: cros_ec_debugfs: cancel/schedule logging work only if supported
- Fixes a warning in cros_ec_debugfs on systems that do not support
console logging, such as the Asus C201.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE6gYDF28Li+nEiKLaHwn1ewov5lgFAlyRMPgACgkQHwn1ewov
5li4Qg/+PX4CWUuw1f1Jy00y//JDuzT06yuReeIE5a3Gq4O9u6tijeeZ5fUSFN3T
9FhM3zQ52qdLRL3gK81iNF5Fat4bs8sMM+znAAuezZcBK5LBt7IJxXQI3KBJf1wX
s33/9nnD+efLraQixESxGsfGRVVp3ocvYNxQsuxm3oUYY5kk8wPDYEcf59YtYF5E
GHRFwo+HB06IapkBpXRPDsEsN1p5Ky9uYShvkS7Ad3Xuu/C2S9xjBxCbPwk7/xYF
uY+NPJaPp+ndcp8lfvFlrn3jPYY0QrGPHncP6k7ZrELmAIQb2gOUiGDYv3HpkT7t
jtMIxLShL64szOtMGNr17waAoK0Q/W/MfNKfgyLZjUCPFiRoClUiHm30NJJMP+yZ
YIWH03T0pc5WtY7hr766L2gt2QMFmG4T/ITZOGz3KKgPcOBc5J3kAVQ8WFEA2QGX
uPGui58QpZe5DSH1jsuuvRzxCgj+qT/QLKGbyBQKeUohCs2oKyq4m+NQ7UmPdYqU
xBpidVWr51BJi/M6qEE5uPcbdBw+oURcjfTmkrsQjIaeMZu2Aev3sJQqHDFAk7H3
niDgM55w+/Qx06UOayoYKlPBc2sULpWjCfzeOSpF3KuKY8hyO6Zo+4S1PB+8BzQC
tME8dW+fOrA6/3Wg/HI8ixlOph7ukz576dAgCLnI0xIqE6M3wiA=
=7sfA
-----END PGP SIGNATURE-----
Merge tag 'tag-chrome-platform-fixes-for-v5.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux
Pull chrome platform fixes from Benson Leung:
"Two fixes:
- Fix locking and close a potential race condition in the new
wilco_ec driver.
- Fix a warning in cros_ec_debugfs on systems that do not support
console logging, such as the Asus C201"
* tag 'tag-chrome-platform-fixes-for-v5.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
platform/chrome: cros_ec_debugfs: cancel/schedule logging work only if supported
platform/chrome: Fix locking pattern in wilco_ec_mailbox()
Since scsi_device_quiesce() skips SCSI devices that have another state than
RUNNING, OFFLINE or TRANSPORT_OFFLINE, scsi_device_resume() should not
complain about SCSI devices that have been skipped. Hence this patch. This
patch avoids that the following warning appears during resume:
WARNING: CPU: 3 PID: 1039 at blk_clear_pm_only+0x2a/0x30
CPU: 3 PID: 1039 Comm: kworker/u8:49 Not tainted 5.0.0+ #1
Hardware name: LENOVO 4180F42/4180F42, BIOS 83ET75WW (1.45 ) 05/10/2013
Workqueue: events_unbound async_run_entry_fn
RIP: 0010:blk_clear_pm_only+0x2a/0x30
Call Trace:
? scsi_device_resume+0x28/0x50
? scsi_dev_type_resume+0x2b/0x80
? async_run_entry_fn+0x2c/0xd0
? process_one_work+0x1f0/0x3f0
? worker_thread+0x28/0x3c0
? process_one_work+0x3f0/0x3f0
? kthread+0x10c/0x130
? __kthread_create_on_node+0x150/0x150
? ret_from_fork+0x1f/0x30
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: Martin Steigerwald <martin@lichtvoll.de>
Cc: <stable@vger.kernel.org>
Reported-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Tested-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Fixes: 3a0a529971 ("block, scsi: Make SCSI quiesce and resume work reliably") # v4.15
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
cmd->rcu is initialized by scsi_initialize_rq(). For passthrough
requests, blk_get_request() calls scsi_initialize_rq(). For filesystem
requests, scsi_init_command() calls scsi_initialize_rq(). Make sure
that destroy_rcu_head() is called for passthrough requests.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Reported-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
- An interrupt masking fix for Loongson-based Lemote 2F systems (fixing
a regression from v3.19).
- A relocation fix for configurations in which the devicetree is stored
in an ELF section (fixing a regression from v4.7).
- Fix jump labels for MIPSr6 kernels where they previously could
inadvertently place a control transfer instruction in a forbidden slot
& take unexpected exceptions (fixing MIPSr6 support added in v4.0).
- Extend an existing USB power workaround for the Netgear WNDR3400 to v2
boards in addition to the v3 ones that already used it.
- Remove the custom MIPS32 definition of __kernel_fsid_t to make it
consistent with MIPS64 & every other architecture, in particular
resolving issues for code which tries to print the val field whose
type previously differed (though had identical memory layout).
-----BEGIN PGP SIGNATURE-----
iIsEABYIADMWIQRgLjeFAZEXQzy86/s+p5+stXUA3QUCXJARJxUccGF1bC5idXJ0
b25AbWlwcy5jb20ACgkQPqefrLV1AN0qJAEAg6i9PnkuHZFXjlaUsvBWyVJRrpgR
Y9vLYXTGJZdb1BwA/i17C6xD7i41Ef2/TtOuPc5fJ6IfEbt74nKJEeBxNTUO
=V6Ds
-----END PGP SIGNATURE-----
Merge tag 'mips_fixes_5.1_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fixes from Paul Burton:
"A small batch of MIPS fixes for 5.1:
- An interrupt masking fix for Loongson-based Lemote 2F systems
(fixing a regression from v3.19)
- A relocation fix for configurations in which the devicetree is
stored in an ELF section (fixing a regression from v4.7)
- Fix jump labels for MIPSr6 kernels where they previously could
inadvertently place a control transfer instruction in a forbidden
slot & take unexpected exceptions (fixing MIPSr6 support added in
v4.0)
- Extend an existing USB power workaround for the Netgear WNDR3400 to
v2 boards in addition to the v3 ones that already used it
- Remove the custom MIPS32 definition of __kernel_fsid_t to make it
consistent with MIPS64 & every other architecture, in particular
resolving issues for code which tries to print the val field whose
type previously differed (though had identical memory layout)"
* tag 'mips_fixes_5.1_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: Remove custom MIPS32 __kernel_fsid_t type
mips: bcm47xx: Enable USB power on Netgear WNDR3400v2
MIPS: Fix kernel crash for R6 in jump label branch function
MIPS: Ensure ELF appended dtb is relocated
mips: loongson64: lemote-2f: Add IRQF_NO_SUSPEND to "cascade" irqaction.
turbostat failed to return a non-zero exit status even though the
supplied command (turbostat <command>) failed. Currently when turbostat
forks a command it returns zero instead of the actual exit status of the
command. Modify the code to return the exit status.
Signed-off-by: David Arcari <darcari@redhat.com>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fix sparse warning:
drivers/base/swnode.c:475:22: warning: symbol 'software_node_get_parent' was not declared. Should it be static?
drivers/base/swnode.c:484:22: warning: symbol 'software_node_get_next_child' was not declared. Should it be static?
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There is no usage of 'nr_expired'.
The 'nr_expired' was introduced by commit 1d9bd5161b ("blk-mq: replace
timeout synchronization with a RCU and generation based scheme"). Its usage
was removed since commit 12f5b93145 ("blk-mq: Remove generation
seqeunce").
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When doing long term recording and waiting for some event to snapshot
on, we often only care about the last minute or so.
The --switch-output command line option supports rotating the perf.data
file when the size exceeds a threshold. But the disk would still be
filled with unnecessary old files.
Add a new option to only keep a number of rotated files, so that the
disk space usage can be limited.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
LPU-Reference: 20190314225002.30108-3-andi@firstfloor.org
Link: https://lkml.kernel.org/n/tip-y5u2lik0ragt4vlktz6qc9ks@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When a filter is specified on the command line, filter the metrics too.
Before:
% perf list foo
List of pre-defined events (to be used in -e):
Metric Groups:
DSB:
DSB_Coverage
[Fraction of Uops delivered by the DSB (aka Decoded Icache; or Uop Cache)]
... more metrics ...
After:
% perf list foo
List of pre-defined events (to be used in -e):
Metric Groups:
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
LPU-Reference: 20190314225002.30108-1-andi@firstfloor.org
Link: https://lkml.kernel.org/n/tip-1y8oi2s8c4jhjtykgs5zvda1@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
HiSilicon Taishan v110 CPUs didn't implement CSV3 field of the
ID_AA64PFR0_EL1 and are not susceptible to Meltdown, so whitelist
the MIDR in kpti_safe_list[] table.
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Reviewed-by: John Garry <john.garry@huawei.com>
Reviewed-by: Zhangshaokun <zhangshaokun@hisilicon.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Adding the MIDR encodings for HiSilicon Taishan v110 CPUs,
which is used in Kunpeng ARM64 server SoCs. TSV110 is the
abbreviation of Taishan v110.
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Reviewed-by: John Garry <john.garry@huawei.com>
Reviewed-by: Zhangshaokun <zhangshaokun@hisilicon.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The ARM64 implements the save_stack_trace_regs function, but it is
unusable for any diagnostic tooling compiled as a kernel module due
the missing EXPORT_SYMBOL_GPL for the function. Export
save_stack_trace_regs() to align with other architectures such as
s390, openrisc, and powerpc. This is similar to the ARM64 export of
save_stack_trace_tsk() added in git commit e27c7fa015.
Signed-off-by: William Cohen <wcohen@redhat.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Fujitsu erratum 010001 applies to A64FX v0r0 and v1r0, and we try to
handle either by masking MIDR with MIDR_FUJITSU_ERRATUM_010001_MASK
before comparing it to MIDR_FUJITSU_ERRATUM_010001.
Unfortunately, MIDR_FUJITSU_ERRATUM_010001 is constructed incorrectly
using MIDR_VARIANT(), which is intended to extract the variant field
from MIDR_EL1, rather than generate the field in-place. This results in
MIDR_FUJITSU_ERRATUM_010001 being all-ones, and we only match A64FX
v0r0.
This patch uses MIDR_CPU_VAR_REV() to generate an in-place mask for the
variant field, ensuring the we match both v0r0 and v1r0.
Fixes: 3e32131abc ("arm64: Add workaround for Fujitsu A64FX erratum 010001")
Reported-by: "Okamoto, Takayuki" <tokamoto@jp.fujitsu.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
[catalin.marinas@arm.com: fixed the patch author]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Use arch_populate_kprobe_blacklist() instead of
arch_within_kprobe_blacklist() so that we can see the full
blacklisted symbols under the debugfs.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
[catalin.marinas@arm.com: Add arch_populate_kprobe_blacklist() comment]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Move exception/irqentry text address check in blacklist,
since those are symbol based rejection.
If we prohibit probing on the symbols in exception_text,
those should be blacklisted.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Remove unneeded RODATA check from arch_prepare_kprobe().
Since check_kprobe_address_safe() already ensured that
the probe address is in kernel text, we don't need to
check whether the address in RODATA or not. That must
be always false.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Move extable address check into arch_prepare_kprobe() from
arch_within_kprobe_blacklist().
The blacklist is exposed via debugfs as a list of symbols.
The extable entries are smaller, so must be filtered out
by arch_prepare_kprobe().
Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Since commit:
ad67b74d24 ("printk: hash addresses printed with %p")
at boot "____ptrval____" is printed instead of actual addresses:
found SMP MP-table at [mem 0x000f5cc0-0x000f5ccf] mapped at [(____ptrval____)]
Instead of changing the print to "%px", and leaking a kernel addresses,
just remove the print completely, like in:
071929dbdd ("arm64: Stop printing the virtual memory layout").
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The LLC NOHZ condition will become true as soon as >=2 CPUs in a
single LLC domain are busy. On big.LITTLE systems, this translates to
two or more CPUs of a "cluster" (big or LITTLE) being busy.
Issuing a NOHZ kick in these conditions isn't desired for asymmetric
systems, as if the busy CPUs can provide enough compute capacity to
the running tasks, then we can leave the NOHZ CPUs in peace.
Skip the LLC NOHZ condition for asymmetric systems, and rely on
nr_running & capacity checks to trigger NOHZ kicks when the system
actually needs them.
Suggested-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dietmar.Eggemann@arm.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: vincent.guittot@linaro.org
Link: https://lkml.kernel.org/r/20190211175946.4961-4-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In this commit:
3b1baa6496 ("sched/fair: Add 'group_misfit_task' load-balance type")
we set rq->misfit_task_load whenever the current running task has a
utilization greater than 80% of rq->cpu_capacity. A non-zero value in
this field enables misfit load balancing.
However, if the task being looked at is already running on a CPU of
highest capacity, there's nothing more we can do for it. We can
currently spot this in update_sd_pick_busiest(), which prevents us
from selecting a sched_group of group_type == group_misfit_task as the
busiest group, but we don't do any of that in nohz_balancer_kick().
This means that we could repeatedly kick NOHZ CPUs when there's no
improvements in terms of load balance to be done.
Introduce a check_misfit_status() helper that returns true iff there
is a CPU in the system that could give more CPU capacity to a rq's
misfit task - IOW, there exists a CPU of higher capacity_orig or the
rq's CPU is severely pressured by rt/IRQ.
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dietmar.Eggemann@arm.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: morten.rasmussen@arm.com
Cc: vincent.guittot@linaro.org
Link: https://lkml.kernel.org/r/20190211175946.4961-3-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We now have a comment explaining the first sched_domain based NOHZ kick,
so might as well comment them all.
While at it, unwrap a line that fits under 80 characters.
Co-authored-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dietmar.Eggemann@arm.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: morten.rasmussen@arm.com
Cc: vincent.guittot@linaro.org
Link: https://lkml.kernel.org/r/20190211175946.4961-2-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Vincent Wang reported that get_next_freq() has a mult overflow bug on
32-bit platforms in the IOWAIT boost case, since in that case {util,max}
are in freq units instead of capacity units.
Solve this by moving the IOWAIT boost to capacity units. And since this
means @max is constant; simplify the code.
Reported-by: Vincent Wang <vincent.wang@unisoc.com>
Tested-by: Vincent Wang <vincent.wang@unisoc.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chunyan Zhang <zhang.lyra@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190305083202.GU32494@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When file handle is embedded inside fanotify_event and usercopy checks
are enabled, we get a warning like:
Bad or missing usercopy whitelist? Kernel memory exposure attempt detected
from SLAB object 'fanotify_event' (offset 40, size 8)!
WARNING: CPU: 1 PID: 7649 at mm/usercopy.c:78 usercopy_warn+0xeb/0x110
mm/usercopy.c:78
Annotate handling in fanotify_event properly to mark copying it to
userspace is fine.
Reported-by: syzbot+2c49971e251e36216d1f@syzkaller.appspotmail.com
Fixes: a8b13aa20a ("fanotify: enable FAN_REPORT_FID init flag")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Recently we found the audio jack detection stop working after suspend
on many machines with Realtek codec. Sometimes the audio selection
dialogue didn't show up after users plugged headhphone/headset into
the headset jack, sometimes after uses plugged headphone/headset, then
click the sound icon on the upper-right corner of gnome-desktop, it
also showed the speaker rather than the headphone.
The root cause is that before suspend, the codec already call the
runtime_suspend since this codec is not used by any apps, then in
resume, it will not call runtime_resume for this codec. But for some
realtek codec (so far, alc236, alc255 and alc891) with the specific
BIOS, if it doesn't run runtime_resume after suspend, all codec
functions including jack detection stop working anymore.
This problem existed for a long time, but it was not exposed, that is
because when problem happens, if users play sound or open
sound-setting to check audio device, this will trigger calling to
runtime_resume (via snd_hda_power_up), then the codec starts working
again before users notice this problem.
Since we don't know how many codec and BIOS combinations have this
problem, to fix it, let the driver call runtime_resume for all codecs
in pm_resume, maybe for some codecs, this is not needed, but it is
harmless. After a codec is runtime resumed, if it is not used by any
apps, it will be runtime suspended soon and furthermore we don't run
suspend frequently, this change will not add much power consumption.
Fixes: cc72da7d4d ("ALSA: hda - Use standard runtime PM for codec power-save control")
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The commit 3baffc4a84 (ALSA: hda/intel: Refactoring PM code) changed
the behaviour of azx_resume(), it triggers the jackpoll_work after
applying this commit.
This change introduced a new issue, all codecs are runtime active
after S3, and will not call runtime_suspend() automatically.
The root cause is the jackpoll_work calls snd_hda_power_up/down_pm,
and it calls up_pm before snd_hdac_enter_pm is called, while calls
the down_pm in the middle of enter_pm and leave_pm is called. This
makes the dev->power.usage_count unbalanced after S3.
To fix it, let azx_resume() don't trigger jackpoll_work as before
it did.
Fixes: 3baffc4a84 ("ALSA: hda/intel: Refactoring PM code")
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
We assumed that vm_mmap() would reject an attempt to mmap past the end of
the filp (our object), but we were wrong.
Applications that tried to use the mmap beyond the end of the object
would be greeted by a SIGBUS. After this patch, those applications will
be told about the error on creating the mmap, rather than at a random
moment on later access.
Reported-by: Antonio Argenziano <antonio.argenziano@intel.com>
Testcase: igt/gem_mmap/bad-size
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Antonio Argenziano <antonio.argenziano@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190314075829.16838-1-chris@chris-wilson.co.uk
(cherry picked from commit 794a11cb67)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
ffs() is 1-indexed, but we want to use it as an index into an array, so
use __ffs() instead.
Fixes: eb8d0f5af4 ("drm/i915: Remove GPU reset dependence on struct_mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190315163933.19352-1-chris@chris-wilson.co.uk
(cherry picked from commit 9073e5b267)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
We rely on VBT DDI port info for eDP detection on GEN9 platforms and
above. This breaks GEN9 platforms which don't have VBT because port A
eDP now defaults to false. Fix this by defaulting to true when VBT is
missing.
Fixes: a98d9c1d7e ("drm/i915/ddi: Rely on VBT DDI port info for eDP detection")
Signed-off-by: Thomas Preston <thomas.preston@codethink.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190306200618.17405-1-thomas.preston@codethink.co.uk
(cherry picked from commit 2131bc0ced)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Now that we have alloc_size that controls our discard behavior, it
doesn't make sense to have these set to object (set) size. alloc_size
defaults to 64k, but because discard_granularity is likely 4M, only
ranges that are equal to or bigger than 4M can be considered during
fstrim. A smaller io_min is also more likely to be met, resulting in
fewer deferred writes on bluestore OSDs.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Before, ec->data_buffer could be written to from multiple
contexts at the same time. Since the ec is shared data,
it needs to be inside the mutex as well.
Fixes: 7b3d4f44ab ("platform/chrome: Add new driver for Wilco EC")
Signed-off-by: Nick Crews <ncrews@chromium.org>
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Signed-off-by: Benson Leung <bleung@chromium.org>
Several driver bug fixes post in the last three weeks
- First part of a race condition fix in mlx4 with CATAS errors
- Bad interaction with FW causing resource leaks in the mlx5 DCT flow
- Bad reporting of link speed/width in new mlx5 devices
- Userspace triggable OOPs in i40iw
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlyO7B8ACgkQOG33FX4g
mxoIPA//bfnWR/l0BdaGaOpZATqhYIvJXfGptmQD+/6SjNcv1ig0uDKh5Mfd4hf2
0ltxsDmrbLtd4JOgI0IvqOdXNnbwWJIxnGCyIX6UyVKogzZnJIuRKz0YLnALHgxt
KbuPURZPDSgMLZZnj/3WghOVw8cae5K2xPt8so+x51mUzVdJeR6GwDJ7Iuslddqv
c4IwG5zzsxoIZcgSSQZt2nUpUpVCly5rvII+guOXdctOCb/3BtjYfbVbAWVsK284
vTowvUS4OQJ/g2Jmej9ge9DY3Rw52+BW4NCzciJVz+QmY4hf85xt/E/6joQslRKl
DdAYafvrg/XKW9z8FqCWm0PXPHtQ63MeOLHkLQi9n8NjXhyppHJR5fW69lOAtsrV
V/yTTvLai+lA0Jv4bTDf/M9CkruFDjB1rlhzepytxvLGr5QcL5hwlSH0qCpVz+Xu
v+ftS2hGKpi5uK4YNsye9tzP11eCLAzBUI5VkgvoOGWkUibUfkzOXLG0OzzkoUMK
vaXGlkHus0dw/ojPQsNEVONZCsnB4V2LOF8wApcGtRqJocdxOzpbz0zZDLVCd7Nj
JuOe44hWGzzDWrecqOepnV9sPi9UCPcZC6LWVdduF00RW38wxBMlLZTdpb8sEZje
DB8QNgougQWg2UMfHAhDBFbV7/FsM/wHH7ZgIS7Xmyo8QdaAK2I=
=klIw
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Several driver bug fixes post in the last three weeks
- first part of a race condition fix in mlx4 with CATAS errors
- bad interaction with FW causing resource leaks in the mlx5 DCT flow
- bad reporting of link speed/width in new mlx5 devices
- user triggable OOPS in i40iw"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
i40iw: Avoid panic when handling the inetdev event
IB/mlx5: Fix mapping of link-mode to IB width and speed
IB/mlx5: Use mlx5 core to create/destroy a DEVX DCT
net/mlx5: Fix DCT creation bad flow
IB/mlx4: Fix race condition between catas error reset and aliasguid flows
If bio_iov_iter_get_pages() is called on an iov_iter that is flagged
with NO_REF, then we don't need to add a page reference for the pages
that we add.
Add BIO_NO_PAGE_REF to track this in the bio, so IO completion knows
not to drop a reference to these pages.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
For ITER_BVEC, if we're holding on to kernel pages, the caller
doesn't need to grab a reference to the bvec pages, and drop that
same reference on IO completion. This is essentially safe for any
ITER_BVEC, but some use cases end up reusing pages and uncondtionally
dropping a page reference on completion. And example of that is
sendfile(2), that ends up being a splice_in + splice_out on the
pipe pages.
Add a flag that tells us it's fine to not grab a page reference
to the bvec pages, since that caller knows not to drop a reference
when it's done with the pages.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
I've seen cases where bulk alloc fails, since the bulk alloc API
is all-or-nothing - either we get the number we ask for, or it
returns 0 as number of entries.
If we fail a batch bulk alloc, retry a "normal" kmem_cache_alloc()
and just use that instead of failing with -EAGAIN.
While in there, ensure we use GFP_KERNEL. That was an oversight in
the original code, when we switched away from GFP_ATOMIC.
Signed-off-by: Jens Axboe <axboe@kernel.dk>