The proc files, sorted in alphabetical order, are evenly assigned to
several synthesize threads to be processed in parallel.
For 'perf top', the number of threads is hard-coded to the number of
online CPUs. The following patch will introduce an option to set it.
For other perf tools, the thread number is 1, because the process
function, e.g. process_synthesized_event, is not ready for
multithreading.
This patch series only supports multithreaded event synthesis for 'perf
top'. For other tools, it can be done separately later.
With multithreading applied, the total processing time gets up to a
1.56x speedup on Knights Mill for 'perf top'.
For a specific single event, the processing time could increase because
of lock contention, so proc_map_timeout may need to be increased.
Otherwise some proc maps will be truncated.
Based on my test, increasing proc_map_timeout has a small impact on the
total processing time. The total processing time still gets a 1.49x
speedup on Knights Mill after increasing proc_map_timeout.
The patch itself doesn't increase proc_map_timeout.
There is no need to implement multithreading for per-task monitoring,
perf_event__synthesize_thread_map, since it doesn't have a performance
issue.
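As a rough illustration of the even split described above (this is not
the perf code, which is written in C), a sorted list of /proc entries can
be divided into contiguous, near-equal slices, one per worker thread. The
function name split_evenly and the sample PIDs are hypothetical.

```python
def split_evenly(entries, nr_threads):
    """Split sorted entries into nr_threads contiguous, near-equal slices."""
    entries = sorted(entries)  # alphabetical order, as for the proc files
    base, extra = divmod(len(entries), nr_threads)
    slices = []
    start = 0
    for i in range(nr_threads):
        # The first `extra` workers take one additional entry.
        size = base + (1 if i < extra else 0)
        slices.append(entries[start:start + size])
        start += size
    return slices


# Hypothetical directory names as they might appear under /proc.
pids = ["1", "10", "1024", "2", "37", "512", "9"]
chunks = split_evenly(pids, 4)
```

Each slice could then be handed to its own thread. How perf distributes
any remainder is not stated in this message, so the rule used here is an
assumption.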
Committer testing:
# getconf _NPROCESSORS_ONLN
4
# perf trace --no-inherit -e clone -o /tmp/output perf top
# tail -4 /tmp/output
0.124 ( 0.041 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3eb3a8f30, parent_tidptr: 0x7fc3eb3a99d0, child_tidptr: 0x7fc3eb3a99d0, tls: 0x7fc3eb3a9700) = 9548 (perf)
0.246 ( 0.023 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3eaba7f30, parent_tidptr: 0x7fc3eaba89d0, child_tidptr: 0x7fc3eaba89d0, tls: 0x7fc3eaba8700) = 9549 (perf)
0.286 ( 0.019 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3ea3a6f30, parent_tidptr: 0x7fc3ea3a79d0, child_tidptr: 0x7fc3ea3a79d0, tls: 0x7fc3ea3a7700) = 9550 (perf)
246.540 ( 0.047 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3ea3a6f30, parent_tidptr: 0x7fc3ea3a79d0, child_tidptr: 0x7fc3ea3a79d0, tls: 0x7fc3ea3a7700) = 9551 (perf)
#
Signed-off-by: Kan Liang <kan.liang@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1506696477-146932-4-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add comm_str_lock to protect comm_str rb tree.
The lock is only needed for multithreaded code, so use the mutex wrappers
provided by the perf tool.
Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1506696477-146932-3-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add two locks to protect namespaces_list and comm_list.
The locks are only needed for multithreaded code, so use the mutex
wrappers provided by the perf tool.
Not all accesses to comm_list/namespaces_list are protected, e.g.
thread__exec_comm, because the multithreaded code for perf top event
synthesizing does not touch them. They don't need a lock.
Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1506696477-146932-2-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The command 'perf test -v 16' (Setup struct perf_event_attr test) always
reports success even if the test case fails. It works correctly if you
also specify -F (for don't fork).
[root@s35lp76 perf]# ./perf test -v 16
15: Setup struct perf_event_attr :
--- start ---
running './tests/attr/test-record-no-delay'
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB /tmp/tmp4E1h7R/perf.data
(1 samples) ]
expected task=0, got 1
expected precise_ip=0, got 3
expected wakeup_events=1, got 0
FAILED './tests/attr/test-record-no-delay' - match failure
test child finished with 0
---- end ----
Setup struct perf_event_attr: Ok
The reason for the wrong error reporting is the return value of the
system() library call. It is called in run_dir() in tests/attr.c and
returns the wait status of the command, in the above case 0xff00.
This value is passed as the parameter to exit(), which can only
handle values 0-0xff.
The child process therefore terminates with an exit value of 0 and the
parent does not detect any error.
This patch corrects the error reporting and prints the correct test
result.
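The arithmetic behind the lost failure can be checked with a short
sketch. It uses Python's os module on a Unix system purely for
illustration and is not the fix applied by this patch. The value 0xff00
is taken from the description above.

```python
import os

status = 0xff00  # value returned by system() in the failing run above

# exit() keeps only the low 8 bits of its argument, so the failure is lost.
truncated = status & 0xff

# The child's real exit code is encoded in bits 8-15 of the wait status.
exit_code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```

Here truncated is 0 while exit_code is 255, which is why the value from
system() has to be decoded, or at least tested for non-zero, before it is
handed on.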
Signed-off-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
LPU-Reference: 20170913081209.39570-2-tmricht@linux.vnet.ibm.com
Link: http://lkml.kernel.org/n/tip-rdube6rfcjsr1nzue72c7lqn@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit d78ada4a76 ("perf tests attr: Do not store failed events") does
not create an event file in the /tmp directory when the
perf_event_open() system call fails.
This can lead to a situation where no /tmp/event-xx-yy-zz result file
exists at all (for example in an s390x virtual machine environment where
no CPUMF hardware is available).
The following command then fails with a Python traceback instead
of printing a failure:
[root@s8360046 perf]# /usr/bin/python2 ./tests/attr.py -d ./tests/attr/ \
-p ./perf -v -ttest-stat-basic
running './tests/attr//test-stat-basic'
Traceback (most recent call last):
File "./tests/attr.py", line 379, in <module>
main()
File "./tests/attr.py", line 370, in main
run_tests(options)
File "./tests/attr.py", line 311, in run_tests
Test(f, options).run()
File "./tests/attr.py", line 300, in run
self.compare(self.expect, self.result)
File "./tests/attr.py", line 248, in compare
exp_event.diff(res_event)
UnboundLocalError: local variable 'res_event' referenced before assignment
[root@s8360046 perf]#
This patch catches this pitfall and prints an error message instead:
[root@s8360047 perf]# /usr/bin/python2 ./tests/attr.py -d ./tests/attr/ \
-p ./perf -vvv -ttest-stat-basic
running './tests/attr//test-stat-basic'
loading expected events
Event event:base-stat
fd = 1
group_fd = -1
flags = 0|8
[....]
sample_regs_user = 0
sample_stack_user = 0
'PERF_TEST_ATTR=/tmp/tmpJbMQMP ./perf stat -o /tmp/tmpJbMQMP/perf.data -e cycles kill >/dev/null 2>&1' ret '1', expected '1'
loading result events
compare
matching [event:base-stat]
match: [event:base-stat] matches []
res_event is empty
FAILED './tests/attr//test-stat-basic' - match failure
[root@s8360047 perf]#
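The failure mode and one way to guard against it can be sketched as
follows. This is a simplified, hypothetical stand-in for the compare
logic, not the code in tests/attr.py. The class Fail and the dictionary
layout are assumptions made for the example.

```python
class Fail(Exception):
    """Raised when expected and result events do not match."""


def compare(expect, result):
    """Match every expected event against the result events."""
    for exp_name, exp_event in expect.items():
        res_event = None  # defined even when `result` is empty
        matched = []
        for res_name, candidate in result.items():
            res_event = candidate
            if exp_event == candidate:
                matched.append(res_name)
        if not matched:
            if res_event is None:
                # No result file was produced at all.
                raise Fail("res_event is empty")
            raise Fail("match failure for %s" % exp_name)
    return True
```

Initializing res_event before the inner loop is what turns the
UnboundLocalError into an ordinary, reportable test failure.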
Signed-off-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
LPU-Reference: 20170913081209.39570-1-tmricht@linux.vnet.ibm.com
Link: http://lkml.kernel.org/n/tip-04d63nn7svfgxdhi60gq2mlm@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The perf_event_attr::task is 1 by default for the first (tracking) event
in the session. Set task=1 as the default and add task=0 for cases that
need it.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20170703145030.12903-16-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently free running PEBS is disabled when user or interrupt
registers are requested. Most of the registers are actually
available in the PEBS record and can be supported.
So we just need to check for the supported registers and then
allow it: all registers are supported except for the segment registers.
For user registers this only works when the counter is limited
to ring 3, so this also needs to be checked.
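The check described above amounts to a bitmask test. The sketch below
uses made-up bit positions and a hypothetical helper name; it only
illustrates the logic and does not reflect the kernel's register layout
or the actual patch.

```python
# Hypothetical bit positions, for illustration only.
REG_AX, REG_BX, REG_IP, REG_SP, REG_DS, REG_ES = (1 << i for i in range(6))

# Registers assumed to be available in the PEBS record (no segment registers).
PEBS_REGS = REG_AX | REG_BX | REG_IP | REG_SP


def can_use_free_running(regs_intr, regs_user, exclude_kernel):
    """Return True if the requested registers allow free running PEBS."""
    if regs_intr & ~PEBS_REGS:
        return False  # an unsupported interrupt register was requested
    if regs_user:
        if not exclude_kernel:
            return False  # user registers need a ring-3-only counter
        if regs_user & ~PEBS_REGS:
            return False  # an unsupported user register was requested
    return True
```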
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170831214630.21892-1-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
To clarify why atomic_inc_return(&perf_sched_events) is not sufficient and
a mutex is needed to order static branch enabling vs the atomic counter
increment, this adds a comment with a short explanation.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170829140103.6563-1-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The following commit:
d9a50b0256 ("perf/aux: Ensure aux_wakeup represents most recent wakeup index")
changed the AUX wakeup position calculation to rounddown(), which causes
a division-by-zero in AUX overwrite mode (aka "snapshot mode").
The zero denominator results from the fact that perf record doesn't set
aux_watermark to anything, in which case the kernel will set it to half
the AUX buffer size, but only for non-overwrite mode. In the overwrite
mode aux_watermark stays zero.
The good news is that in AUX overwrite mode wakeups don't happen and
the related bookkeeping is not relevant, so we can simply forgo the
wakeup updates altogether.
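The division-by-zero and the guard can be modelled in a few lines. The
helper update_aux_wakeup is hypothetical and greatly simplified. Only
rounddown() mirrors the arithmetic of the kernel macro of the same name.

```python
def rounddown(x, y):
    """Round x down to a multiple of y, like the kernel's rounddown()."""
    return x - (x % y)  # divides by y: fails when y == 0


def update_aux_wakeup(head, aux_watermark, overwrite):
    """Return the new wakeup position, or None when nothing is tracked."""
    if overwrite:
        # aux_watermark stays 0 in overwrite mode; skip the bookkeeping.
        return None
    return rounddown(head, aux_watermark)
```

With a watermark of 4096 and a head of 10000 the wakeup position is 8192.
With overwrite set, the zero watermark is never used as a divisor.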
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/20170906160811.16510-1-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge tag 'acpi-4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"This fixes an APEI problem that may cause a reported error to be
missed due to a race condition"
* tag 'acpi-4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / APEI: clear error status before acknowledging the error
Merge tag 'pm-4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix a deadlock in the operating performance points (OPP)
framework introduced during the 4.11 cycle, more issues with duplicate
device objects for cpufreq-dt and cpufreq documentation.
Specifics:
- Fix a deadlock in the operating performance points (OPP) framework
caused by a notifier callback taking a lock that's already held by
its caller (Viresh Kumar).
- Prevent the ti-cpufreq and cpufreq-dt-platdev drivers from
attempting to register conflicting device objects which triggers a
warning from sysfs (Suniel Mahesh).
- Drop a stale reference to a piece of intel_pstate documentation
that's not in the tree any more (Rafael Wysocki)"
* tag 'pm-4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: docs: Drop intel-pstate.txt from index.txt
cpufreq: dt: Fix sysfs duplicate filename creation for platform-device
PM / OPP: Call notifier without holding opp_table->lock
Merge tag 'xfs-4.14-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
- fix various problems with the copy-on-write extent maps getting freed
at the wrong time
- fix printk format specifier problems
- report zeroing operation outcomes instead of dropping them on the
floor
- fix some crashes when dio operations partially fail
- fix a race condition between unwritten extent conversion & dio read
- fix some incorrect tests in the inode log item processing
- correct the delayed allocation space reservations on rmap filesystems
- fix some problems checking for dax support
* tag 'xfs-4.14-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: revert "xfs: factor rmap btree size into the indlen calculations"
xfs: Capture state of the right inode in xfs_iflush_done
xfs: perag initialization should only touch m_ag_max_usable for AG 0
xfs: update i_size after unwritten conversion in dio completion
iomap_dio_rw: Allocate AIO completion queue before submitting dio
xfs: validate bdev support for DAX inode flag
xfs: remove redundant re-initialization of total_nr_pages
xfs: Output warning message when discard option was enabled even though the device does not support discard
xfs: report zeroed or not correctly in xfs_zero_range()
xfs: kill meaningless variable 'zero'
fs/xfs: Use %pS printk format for direct addresses
xfs: evict CoW fork extents when performing finsert/fcollapse
xfs: don't unconditionally clear the reflink flag on zero-block files
This reverts commit dbbccdc4ce.
It turns out that the "legacy" users aren't so legacy at all, and that
turning off the legacy ioctl will break the current Qt bluetooth stack
for bluetooth LE devices that were released just a couple of months ago.
So it's simply not true that this was a legacy interface that hasn't
been needed and is only limited to old legacy BT devices. Because I
actually read Kconfig help messages, and actively try to turn off
features that I don't need, I turned the option off.
Then I spent _way_ too much time debugging BLE issues until I realized
that it wasn't the Qt and subsurface development that had broken one of
my dive computer BLE downloads, but simply my broken kernel config.
Maybe in a decade it will be true that this is a legacy interface. And
maybe with a better help-text and correct dependencies, this kind of
legacy removal might be acceptable. But as things are right now both
the commit message and the Kconfig help text were misleading, and the
Kconfig option had the wrong dependencies.
There's no reason to keep that broken Kconfig option in the tree.
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma fixes from Doug Ledford:
"Second -rc update for 4.14.
Both Mellanox and Intel had a series of -rc fixes that landed this
week. The Mellanox bunch is spread throughout the stack and not just
in their driver, whereas the Intel bunch was mostly in the hfi1
driver. And, several of the fixes in the hfi1 driver were more than
just simple 5 line fixes. As a result, the hfi1 driver fixes have a
sizable LOC count.
Everything else is as one would expect in an RC cycle in terms of LOC
count. One item that might jump out and make you think "That's not an
rc item" is the fix that corrects a typo. But, that change fixes a
typo in a user visible API that was just added in this merge window,
so if we fix it now, we can fix it. If we don't, the typo is in the
API forever. Another that might not appear to be a fix at first glance
is the Simplify mlx5_ib_cont_pages patch, but the simplification
allows them to fix a bug in the existing function whenever the length
of an SGE exceeded page size. We also had to revert one patch from the
merge window that was wrong.
Summary:
- a few core fixes
- a few ipoib fixes
- a few mlx5 fixes
- a 7-patch hfi1 related series"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
IB/hfi1: Unsuccessful PCIe caps tuning should not fail driver load
IB/hfi1: On error, fix use after free during user context setup
Revert "IB/ipoib: Update broadcast object if PKey value was changed in index 0"
IB/hfi1: Return correct value in general interrupt handler
IB/hfi1: Check eeprom config partition validity
IB/hfi1: Only reset QSFP after link up and turn off AOC TX
IB/hfi1: Turn off AOC TX after offline substates
IB/mlx5: Fix NULL deference on mlx5_ib_update_xlt failure
IB/mlx5: Simplify mlx5_ib_cont_pages
IB/ipoib: Fix inconsistency with free_netdev and free_rdma_netdev
IB/ipoib: Fix sysfs Pkey create<->remove possible deadlock
IB: Correct MR length field to be 64-bit
IB/core: Fix qp_sec use after free access
IB/core: Fix typo in the name of the tag-matching cap struct
On s390x perf test 1 failed. It turned out that commit cf6383f73c
("perf report: Fix kernel symbol adjustment for s390x") was incorrect.
The previous implementation in dso__load_sym() is also suitable for
s390x.
Therefore this patch undoes commit cf6383f73c.
Signed-off-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Cc: Zvonko Kosic <zvonko.kosic@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Fixes: cf6383f73c ("perf report: Fix kernel symbol adjustment for s390x")
LPU-Reference: 20170915071404.58398-2-tmricht@linux.vnet.ibm.com
Link: http://lkml.kernel.org/n/tip-v101o8k25vuja2ogosgf15yy@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
On s390x perf test 1 failed. It turned out that commit 4a084ecfc8
("perf report: Fix module symbol adjustment for s390x") was incorrect.
The previous implementation in dso__load_sym() is also suitable for
s390x.
Therefore this patch undoes commit 4a084ecfc8.
Signed-off-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Zvonko Kosic <zvonko.kosic@de.ibm.com>
Fixes: 4a084ecfc8 ("perf report: Fix module symbol adjustment for s390x")
LPU-Reference: 20170915071404.58398-1-tmricht@linux.vnet.ibm.com
Link: http://lkml.kernel.org/n/tip-5ani7ly57zji7s0hmzkx416l@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Kprobes don't need to disable IRQs if they are called from the
ftrace/jump trampoline code, because Documentation/kprobes.txt says:
-----
Probe handlers are run with preemption disabled. Depending on the
architecture and optimization state, handlers may also run with
interrupts disabled (e.g., kretprobe handlers and optimized kprobe
handlers run without interrupt disabled on x86/x86-64).
-----
So let's remove IRQ disabling from those handlers.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/150581534039.32348.11331736206004264553.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Disable preemption in ftrace-based jprobe handlers as
described in Documentation/kprobes.txt:
"Probe handlers are run with preemption disabled."
This will fix jprobes behavior when CONFIG_PREEMPT=y.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/150581530024.32348.9863783558598926771.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Disable preemption in optprobe handler as described
in Documentation/kprobes.txt, which says:
"Probe handlers are run with preemption disabled."
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/150581525942.32348.6359217983269060829.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Warn if an optprobe handler tries to change the execution path.
As described in Documentation/kprobes.txt, with optprobe the
user handler cannot change the instruction pointer. In that
case the user must avoid optimizing the kprobes by setting
post_handler or break_handler.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/150581521955.32348.3615624715034787365.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since get_kprobe_ctlblk() accesses per-cpu variables, which calls
smp_processor_id(), it must be called with preemption or IRQs
disabled.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/150581517952.32348.2655896843219158446.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add a preemptible check to each handler. Handlers are called in a
non-preemptible context, which is guaranteed by Documentation/kprobes.txt.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/150581513991.32348.7956810394499654272.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The following commit:
54a7d50b92 ("x86: mark kprobe templates as character arrays, not single characters")
changed optprobe_template_* to arrays, so we can remove the address-of
operators (&) from those symbols.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/150304469798.17009.15886717935027472863.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Make the insn buffer always ROX and use text_poke() to write
the copied instructions instead of set_memory_*().
This hardens the instruction buffer against other kernel
subsystems because there is no time window in which the
buffer can be modified.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/150304463032.17009.14195368040691676813.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As Chris explains, get_seccomp_filter() and put_seccomp_filter() can end
up using different filters. Once we drop ->siglock it is possible for
task->seccomp.filter to have been replaced by SECCOMP_FILTER_FLAG_TSYNC.
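A toy model of the mismatch, with hypothetical classes and helpers that
only mimic reference counting. It shows why the put must operate on the
same object that was pinned rather than re-reading the task's filter
after the lock is dropped.

```python
class Filter:
    def __init__(self, name):
        self.name = name
        self.refs = 1  # reference held by the task


class Task:
    def __init__(self, filt):
        self.filter = filt


def get_filter(filt):
    filt.refs += 1


def put_filter(filt):
    filt.refs -= 1


def dump_buggy(task, replace):
    get_filter(task.filter)  # reference taken on the current filter
    replace(task)            # lock dropped: the filter may be swapped here
    put_filter(task.filter)  # BUG: may release a different filter


def dump_fixed(task, replace):
    filt = task.filter       # remember exactly which filter was pinned
    get_filter(filt)
    replace(task)
    put_filter(filt)         # release the same object


# Buggy path: the old filter leaks a reference, the new one loses one.
old, new = Filter("old"), Filter("new")
dump_buggy(Task(old), lambda t: setattr(t, "filter", new))

# Fixed path: both counts are back to their starting value.
old2, new2 = Filter("old"), Filter("new")
dump_fixed(Task(old2), lambda t: setattr(t, "filter", new2))
```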
Fixes: f8e529ed94 ("seccomp, ptrace: add support for dumping seccomp filters")
Reported-by: Chris Salls <chrissalls5@gmail.com>
Cc: stable@vger.kernel.org # needs s/refcount_/atomic_/ for v4.12 and earlier
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
[tycho: add __get_seccomp_filter vs. open coding refcount_inc()]
Signed-off-by: Tycho Andersen <tycho@docker.com>
[kees: tweak commit log]
Signed-off-by: Kees Cook <keescook@chromium.org>
Commit 33fc30b470 (cpufreq: intel_pstate: Document the current
behavior and user interface) dropped the intel-pstate.txt file
from Documentation/cpu-freq/, but it did not update the index.txt
file in there accordingly, so do that now.
Fixes: 33fc30b470 (cpufreq: intel_pstate: Document the current behavior and user interface)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently we acknowledge errors before clearing the error status.
This could cause a new error to be populated by firmware in-between
the error acknowledgment and the error status clearing which would
cause the second error's status to be cleared without being handled.
So, clear the error status before acknowledging the errors.
Also, make sure to acknowledge the error if the error status read
fails.
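The ordering problem can be reproduced with a deterministic toy model.
The Firmware class below is hypothetical and assumes the worst case, in
which firmware writes the next error into the status block immediately
after the acknowledgment.

```python
class Firmware:
    """Toy model: the next error is reported as soon as one is acked."""

    def __init__(self, errors):
        self.queue = list(errors)
        self.status = self.queue.pop(0) if self.queue else None

    def ack(self):
        # After an acknowledgment, firmware may populate the next error.
        if self.queue:
            self.status = self.queue.pop(0)

    def clear_status(self):
        self.status = None


def handle(fw, clear_first):
    """Process reported errors and return the ones actually handled."""
    handled = []
    while fw.status is not None:
        handled.append(fw.status)
        if clear_first:
            fw.clear_status()
            fw.ack()
        else:
            fw.ack()           # the second error may land here ...
            fw.clear_status()  # ... and is wiped without being handled
    return handled


lost = handle(Firmware(["e1", "e2"]), clear_first=False)
kept = handle(Firmware(["e1", "e2"]), clear_first=True)
```

With acknowledge-then-clear only "e1" is handled. With clear-then-
acknowledge both errors are seen.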
Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull quota and isofs fixes from Jan Kara:
"Two quota fixes (fallout of the quota locking changes) and an isofs
build fix"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: Fix quota corruption with generic/232 test
isofs: fix build regression
quota: add missing lock into __dquot_transfer()
Merge tag 'linux-kselftest-4.14-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
"This update consists of:
- fixes to several existing tests
- a test for regression introduced by b9470c2760 ("inet: kill
smallest_size and smallest_port")
- seccomp support for glibc 2.26 siginfo_t.h
- fixes to kselftest framework and tests to run make O=dir use-case
- fixes to silence unnecessary test output to de-clutter test results"
* tag 'linux-kselftest-4.14-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (28 commits)
selftests: timers: set-timer-lat: Fix hang when testing unsupported alarms
selftests: timers: set-timer-lat: fix hang when std out/err are redirected
selftests/memfd: correct run_tests.sh permission
selftests/seccomp: Support glibc 2.26 siginfo_t.h
selftests: futex: Makefile: fix for loops in targets to run silently
selftests: Makefile: fix for loops in targets to run silently
selftests: mqueue: Use full path to run tests from Makefile
selftests: futex: copy sub-dir test scripts for make O=dir run
selftests: lib.mk: copy test scripts and test files for make O=dir run
selftests: sync: kselftest and kselftest-clean fail for make O=dir case
selftests: sync: use TEST_CUSTOM_PROGS instead of TEST_PROGS
selftests: lib.mk: add TEST_CUSTOM_PROGS to allow custom test run/install
selftests: watchdog: fix to use TEST_GEN_PROGS and remove clean
selftests: lib.mk: fix test executable status check to use full path
selftests: Makefile: clear LDFLAGS for make O=dir use-case
selftests: lib.mk: kselftest and kselftest-clean fail for make O=dir case
Makefile: kselftest and kselftest-clean fail for make O=dir case
selftests/net: msg_zerocopy enable build with older kernel headers
selftests: actually run the various net selftests
selftest: add a reuseaddr test
...
Pull x86 fpu fixes and cleanups from Ingo Molnar:
"This is _way_ more cleanups than fixes, but the bugs were subtle and
hard to hit, and the primary reason for them existing was the
unnecessary historical complexity of some of the x86/fpu interfaces.
The first bunch of commits clean up and simplify the xstate user copy
handling functions, in reaction to the collective head-scratching
about the xstate user-copy handling code that leads up to the fix for
this SkyLake xstate handling bug:
0852b37417: x86/fpu: Add FPU state copying quirk to handle XRSTOR failure on Intel Skylake CPUs
The cleanups don't change any functionality, they just (hopefully)
make it all clearer, more consistent, more debuggable and more robust.
Note that most of the linecount increase comes from these commits,
where we better split the user/kernel copy logic by having more
variants, instead of repeating fragile patterns of:
	if (kbuf) {
		memcpy(kbuf + pos, data, copy);
	} else {
		if (__copy_to_user(ubuf + pos, data, copy))
			return -EFAULT;
	}
The next bunch of commits simplify the FPU state-machine to get rid of
old lazy-FPU idiosyncrasies - a defensive simplification to make all
the code easier to review and fix. No change in functionality.
Then there's a couple of additional debugging tweaks: a static checker
warning fix and moving an FPU related warning under WARN_ON_FPU(),
followed by another bunch of commits that represent a finegrained
split-up of the fixes from Eric Biggers to handle weird xstate bits
properly.
I did this finegrained split-up because some of these fixes also
impact the ABI for weird xstate handling, for which we'd like to have
good bisection results, should they cause any problems. (We also had
one regression with the more monolithic fixes, so splitting it all up
sounded prudent for robustness reasons as well.)
About the whole series: the commits up to 03eaec81ac have been in
-next for months - but I've recently rebased them to remove a state
machine clean-up commit that was objected to, and to make it more
bisectable - so technically it's a new, rebased tree.
Robustness history: this series had some regressions along the way,
and all reported regressions have been fixed. All but one of the
regressions manifested themselves as easy to report warnings. The previous
version of this latest series was also in linux-next, with one
(warning-only) regression reported which is fixed in the latest
version.
Barring last minute brown paper bag bugs (and the commits are now
older by a day which I'd hope helps paperbag reduction), I'm
reasonably confident about its general robustness.
Famous last words ..."
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits)
x86/fpu: Use using_compacted_format() instead of open coded X86_FEATURE_XSAVES
x86/fpu: Use validate_xstate_header() to validate the xstate_header in copy_user_to_xstate()
x86/fpu: Eliminate the 'xfeatures' local variable in copy_user_to_xstate()
x86/fpu: Copy the full header in copy_user_to_xstate()
x86/fpu: Use validate_xstate_header() to validate the xstate_header in copy_kernel_to_xstate()
x86/fpu: Eliminate the 'xfeatures' local variable in copy_kernel_to_xstate()
x86/fpu: Copy the full state_header in copy_kernel_to_xstate()
x86/fpu: Use validate_xstate_header() to validate the xstate_header in __fpu__restore_sig()
x86/fpu: Use validate_xstate_header() to validate the xstate_header in xstateregs_set()
x86/fpu: Introduce validate_xstate_header()
x86/fpu: Rename fpu__activate_fpstate_read/write() to fpu__prepare_[read|write]()
x86/fpu: Rename fpu__activate_curr() to fpu__initialize()
x86/fpu: Simplify and speed up fpu__copy()
x86/fpu: Fix stale comments about lazy FPU logic
x86/fpu: Rename fpu::fpstate_active to fpu::initialized
x86/fpu: Remove fpu__current_fpstate_write_begin/end()
x86/fpu: Fix fpu__activate_fpstate_read() and update comments
x86/fpu: Reinitialize FPU registers if restoring FPU state fails
x86/fpu: Don't let userspace set bogus xcomp_bv
x86/fpu: Turn WARN_ON() in context switch into WARN_ON_FPU()
...
Failure to tune PCIe capabilities should not fail driver load. Currently
it can cause the driver load to fail on systems with any of the following:
1. HFI's parent is not root. Example: HFI card is behind a PCIe bridge.
2. HFI's parent is not PCI Express capable.
In these situations, failure to tune PCIe capabilities should be logged
in the system message logs but not cause the driver load to fail.
This patch also ensures that the PCIe capability word DevCtl is written
only after a successful read, and that the capability tuning process
continues even if the read/write of DevCtl fails.
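A minimal user-space sketch of this best-effort behaviour is shown
below. Everything in it is hypothetical (struct fake_dev, the accessor
names, the bit written to DevCtl); it only illustrates "log and
continue" plus "write only after a successful read", not the hfi1 code.

```c
#include <stdio.h>

/* Hypothetical stand-in for a PCI device and its config space. */
struct fake_dev {
	int read_ok;		/* 1 if reading DevCtl succeeds */
	unsigned short devctl;	/* simulated DevCtl capability word */
	int writes;		/* number of DevCtl writes performed */
};

/* Returns 0 on success, -1 on failure. */
static int read_devctl(struct fake_dev *d, unsigned short *val)
{
	if (!d->read_ok)
		return -1;
	*val = d->devctl;
	return 0;
}

static void write_devctl(struct fake_dev *d, unsigned short val)
{
	d->devctl = val;
	d->writes++;
}

/* Tuning is best effort: problems are logged, never returned. */
static void tune_pcie_caps(struct fake_dev *d)
{
	unsigned short ectl;

	if (read_devctl(d, &ectl))
		fprintf(stderr, "unable to read DevCtl, skipping this step\n");
	else
		write_devctl(d, ectl | 0x0100);	/* illustrative bit only */

	/* Any further tuning steps would continue here regardless. */
}

/* Driver load succeeds whether or not the tuning worked. */
static int load_driver(struct fake_dev *d)
{
	tune_pcie_caps(d);
	return 0;
}
```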
Fixes: c53df62c7a ("IB/hfi1: Check return values from PCI config API calls")
Fixes: bf70a77577 ("staging/rdma/hfi1: Enable WFR PCIe extended tags from the driver")
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
During base context setup, if setup_base_ctxt() fails, the context is
deallocated. This is incorrect because the context is referenced on
return, to notify any waiting subcontext. If there are no subcontexts
the pointer will be invalid.
Reorganize the error path so that deallocate_ctxt() is called after all
the possible subcontexts have been notified.
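The reorganized error path might look roughly like the sketch below.
The types and function names are hypothetical stand-ins, not the hfi1
functions; the point is only the order of operations, where waiters are
notified while the context is still valid and deallocation comes last.

```c
#include <stdbool.h>

enum sketch_step { STEP_NONE, STEP_NOTIFY, STEP_DEALLOC };

/* Hypothetical context object shared with waiting subcontexts. */
struct sketch_ctxt {
	bool allocated;
	int waiters_notified;
	enum sketch_step last_step;
};

/* Simulated base context setup; returns 0 on success, -1 on failure. */
static int setup_base_sketch(bool fail)
{
	return fail ? -1 : 0;
}

/* Touches the context, so it must run before deallocation. */
static void notify_subctxts_sketch(struct sketch_ctxt *ctx)
{
	if (ctx->allocated)
		ctx->waiters_notified++;
	ctx->last_step = STEP_NOTIFY;
}

static void deallocate_sketch(struct sketch_ctxt *ctx)
{
	ctx->allocated = false;
	ctx->last_step = STEP_DEALLOC;
}

static int assign_ctxt_sketch(struct sketch_ctxt *ctx, bool fail)
{
	int ret = setup_base_sketch(fail);

	/* Waiters are told about success and failure alike. */
	notify_subctxts_sketch(ctx);

	/* Only now is it safe to drop the context on failure. */
	if (ret)
		deallocate_sketch(ctx);
	return ret;
}
```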
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Commit 9a9b811269 causes the core to fail to destroy the UD QP on ipoib
unload, which leaks resources.
On a pkey change event, that patch modifies the mgid before calling the
underlying driver to detach it from the QP. The driver's detach_mcast()
then fails to find the modified mgid, because it was never given that
mgid to attach in the first place.
The core's qp->usecnt will never go down, so ib_destroy_qp() will fail.
The IPoIB driver actually does take care of the new broadcast mgid based
on the new pkey, by destroying the old mcast object in
ipoib_mcast_dev_flush():
....
	if (priv->broadcast) {
		rb_erase(&priv->broadcast->rb_node, &priv->multicast_tree);
		list_add_tail(&priv->broadcast->list, &remove_list);
		priv->broadcast = NULL;
	}
...
then, in the restarted ipoib_mcast_join_task(), by creating a new
broadcast mcast object, sending a join request and, on completion,
telling the driver to attach to the reinitialized QP:
...
	if (!priv->broadcast) {
	...
		broadcast = ipoib_mcast_alloc(dev, 0);
	...
		memcpy(broadcast->mcmember.mgid.raw, priv->dev->broadcast + 4,
		       sizeof (union ib_gid));
		priv->broadcast = broadcast;
	...
Fixes: 9a9b811269 ("IB/ipoib: Update broadcast object if PKey value was changed in index 0")
Cc: stable@vger.kernel.org
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The general interrupt handler returns IRQ_HANDLED whether an IRQ
was handled or not.
Determine if an IRQ was handled and return the correct value.
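As a rough user-space illustration of the idea (not the hfi1 handler),
a handler can start from "none" and switch to "handled" only when it
actually services a source. The enum, the bitmask layout and the
function name below are hypothetical.

```c
#include <stdint.h>

/* Sketch-only stand-ins for the kernel's IRQ_NONE / IRQ_HANDLED. */
enum sketch_irqreturn { SKETCH_IRQ_NONE = 0, SKETCH_IRQ_HANDLED = 1 };

#define NUM_SOURCES 8

/*
 * 'pending' has one bit per interrupt source that fired, 'enabled' one
 * bit per source this handler is responsible for. serviced[i] counts
 * how often source i was serviced.
 */
static enum sketch_irqreturn
general_interrupt_sketch(uint8_t pending, uint8_t enabled,
			 int serviced[NUM_SOURCES])
{
	enum sketch_irqreturn ret = SKETCH_IRQ_NONE;
	uint8_t active = pending & enabled;

	for (int bit = 0; bit < NUM_SOURCES; bit++) {
		if (active & (1u << bit)) {
			serviced[bit]++;
			ret = SKETCH_IRQ_HANDLED;	/* real work was done */
		}
	}
	return ret;
}
```

Reporting "none" when nothing was serviced lets the caller tell a
spurious or foreign interrupt apart from one that was really handled.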
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Relying on a trailing magic value is incorrect. It is not always
present, because the trailing magic value has a specific purpose which
is not partition validation. Instead, use the header magic value, which
is present in all variants of the platform configuration and is intended
for validation. This is also used in other locations in the driver.
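A header-based check can be sketched as below. The magic constant, the
assumption that the magic occupies the first four bytes, and the
function name are all hypothetical; the real platform configuration
layout is not described here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical magic value, chosen only for this sketch. */
#define SKETCH_HEADER_MAGIC 0x12345678u

/*
 * Validate a partition by its header magic only. Whatever follows the
 * header, including any trailing marker, is deliberately ignored.
 * Returns 0 if the partition looks valid, -1 otherwise.
 */
static int validate_partition(const uint8_t *buf, size_t len)
{
	uint32_t magic;

	if (len < sizeof(magic))
		return -1;

	/* memcpy avoids unaligned access on the raw byte buffer. */
	memcpy(&magic, buf, sizeof(magic));
	return magic == SKETCH_HEADER_MAGIC ? 0 : -1;
}
```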
Fixes: bc5214ee29 ("IB/hfi1: Handle missing magic values in config file")
Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
QSFP reset enables AOC transmitters by default. They should be off
before moving to high power mode to complete the setup. There is no
need to reset the QSFP during LNI failure as it was reset at link down.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Offline.quietDuration was added in the 8051 firmware, and the driver
only turns off the AOC transmitters when offline.quiet is reached.
However, the AOC transmitters need to be turned off at the new state.
Therefore, turn off the AOC transmitters in any offline substate,
including offline.quiet and offline.quietDuration, then recheck that we
reached offline.quiet to preserve backwards compatibility.
Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Eric has reported that since commit d2faa41516 "quota: Do not acquire
dqio_sem for dquot overwrites in v2 format" test generic/232
occasionally fails due to quota information being incorrect. Indeed that
commit was too eager to remove dqio_sem completely from the path that
just overwrites a quota structure with updated information. Although that
is innocent on its own, another process that inserts a new quota structure
into the same block can perform a read-modify-write cycle of that block,
thus effectively discarding the quota information update if they race in
the wrong way.
Fix the problem by acquiring dqio_sem for reading for overwrites of
quota structure. Note that it *is* possible to completely avoid taking
dqio_sem in the overwrite path; however, that would require modifying the
path inserting / deleting quota structures to avoid RMW cycles of the full
block, and for now it is not clear whether it is worth the hassle.
Fixes: d2faa41516
Reported-and-tested-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
In generic_file_llseek_size, return -ENXIO for negative offsets as well
as offsets beyond EOF. This affects filesystems which don't implement
SEEK_HOLE / SEEK_DATA internally, possibly because they don't support
holes.
Fixes xfstest generic/448.
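For a file without holes, the intended result can be sketched as a
small pure function. This is not generic_file_llseek_size() itself; the
function name and the SKETCH_* constants are assumptions made for the
example, and only the SEEK_DATA / SEEK_HOLE cases are modelled.

```c
#include <errno.h>

/* Sketch-only whence values for the two cases modelled here. */
enum { SKETCH_SEEK_DATA = 3, SKETCH_SEEK_HOLE = 4 };

/*
 * Resolve SEEK_DATA / SEEK_HOLE for a file of size 'eof' that has no
 * holes. Returns the resulting offset or a negative errno value.
 */
static long long seek_no_holes(long long offset, int whence, long long eof)
{
	switch (whence) {
	case SKETCH_SEEK_DATA:
		/* The whole file is data; only [0, eof) is valid. */
		if (offset < 0 || offset >= eof)
			return -ENXIO;
		return offset;
	case SKETCH_SEEK_HOLE:
		/* The only hole is the implicit one at end of file. */
		if (offset < 0 || offset >= eof)
			return -ENXIO;
		return eof;
	default:
		return -EINVAL;
	}
}
```

With eof = 100, an offset of -1 and an offset of 100 both yield -ENXIO
for either whence value.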
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In commit fd26a88093 we added a worst case estimate for rmapbt blocks
needed to satisfy the block mapping request. Since then, we added the
ability to reserve enough space in each AG such that we should never run
out of blocks to grow the rmapbt, which makes this calculation
unnecessary. Revert the commit because it makes the extra delalloc
indlen accounting unnecessary and incorrect.
Reported-by: Eryu Guan <eguan@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
My previous patch, d3a304b629, checks for the XFS_LI_FAILED flag in
xfs_iflush_done(), so the failed item can be properly resubmitted.
In the loop scanning other inodes being completed, it should check the
current item for the XFS_LI_FAILED flag, and not the initial one.
The state of the initial inode is checked after the loop ends.
Kudos to Eric for catching this.
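The shape of the fix can be illustrated with a simplified list walk.
The struct, the flag value and the function names are hypothetical and
stand in for the real log item structures.

```c
#include <stddef.h>

#define SKETCH_LI_FAILED 0x1u

/* Hypothetical, heavily simplified log item. */
struct sketch_item {
	unsigned int flags;
	struct sketch_item *next;
};

/* Count failed items among the items chained after 'first'. */
static int count_failed_followers(const struct sketch_item *first)
{
	int failed = 0;

	for (const struct sketch_item *cur = first->next; cur; cur = cur->next) {
		/* Test 'cur', not 'first': each item has its own flag. */
		if (cur->flags & SKETCH_LI_FAILED)
			failed++;
	}
	return failed;
}

/* The initial item is checked once, after the loop has finished. */
static int count_failed(const struct sketch_item *first)
{
	int failed = count_failed_followers(first);

	if (first->flags & SKETCH_LI_FAILED)
		failed++;
	return failed;
}
```

Had the loop tested first->flags instead, a chain whose first item is
clean but whose second item failed would report no failures at all.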
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
We call __xfs_ag_resv_init to make a per-AG reservation for each AG.
This makes the reservation per-AG, not per-filesystem. Therefore, it
is incorrect to adjust m_ag_max_usable for each AG. Adjust it only
when we're reserving AG 0's blocks so that we only do it once per fs.
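A simplified model of "adjust once per fs" is sketched below. The
struct and function names are hypothetical, and it assumes every AG
asks for the same number of blocks.

```c
#define SKETCH_MAX_AGS 16

/* Hypothetical, simplified mount structure. */
struct sketch_mount {
	unsigned int agcount;			   /* must be <= SKETCH_MAX_AGS */
	unsigned long ag_max_usable;		   /* one value for the whole fs */
	unsigned long ag_reserved[SKETCH_MAX_AGS]; /* per-AG reservation */
};

/* Make the reservation for one AG. */
static void sketch_ag_resv_init(struct sketch_mount *mp, unsigned int agno,
				unsigned long ask)
{
	mp->ag_reserved[agno] = ask;

	/*
	 * ag_max_usable is a single per-filesystem limit, so subtract
	 * the reservation only once, while handling AG 0.
	 */
	if (agno == 0)
		mp->ag_max_usable -= ask;
}

/* Reserve 'ask' blocks in every AG of the filesystem. */
static void sketch_resv_all(struct sketch_mount *mp, unsigned long ask)
{
	for (unsigned int agno = 0; agno < mp->agcount; agno++)
		sketch_ag_resv_init(mp, agno, ask);
}
```

With four AGs, a starting limit of 1000 and a reservation of 10 blocks,
the limit ends at 990, where subtracting for every AG would have left
960.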
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>