Now that grace-period requests use funnel locking and now that they
set ->gp_flags to RCU_GP_FLAG_INIT even when the RCU grace-period
kthread has not yet started, rcu_gp_kthread() no longer needs to check
need_any_future_gp() at startup time. This commit therefore removes
this check.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
Now that RCU no longer relies on failsafe checks, cpu_needs_another_gp()
can be greatly simplified. This simplification eliminates the last
call to rcu_future_needs_gp() and to rcu_segcblist_future_gp_needed(),
both of which which can then be eliminated. And then, because
cpu_needs_another_gp() is called only from __rcu_pending(), it can be
inlined and eliminated.
This commit carries out the simplification, inlining, and elimination
called out above.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
All of the cpu_needs_another_gp() function's checks (except for
newly arrived callbacks) have been subsumed into the rcu_gp_cleanup()
function's scan of the rcu_node tree. This commit therefore drops the
call to cpu_needs_another_gp(). The check for newly arrived callbacks
is supplied by rcu_accelerate_cbs(). Any needed advancing (as in the
earlier rcu_advance_cbs() call) will be supplied when the corresponding
CPU becomes aware of the end of the now-completed grace period.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
If rcu_start_this_gp() is invoked with a requested grace period more
than three in the future, then either the ->need_future_gp[] array
needs to be bigger or the caller needs to be repaired. This commit
therefore adds a WARN_ON_ONCE() checking for this condition.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
The rcu_start_this_gp() function had a simple form of funnel locking that
used only the leaves and root of the rcu_node tree, which is fine for
systems with only a few hundred CPUs, but sub-optimal for systems having
thousands of CPUs. This commit therefore adds full-tree funnel locking.
This variant of funnel locking is unusual in the following ways:
1. The leaf-level rcu_node structure's ->lock is held throughout.
Other funnel-locking implementations drop the leaf-level lock
before progressing to the next level of the tree.
2. Funnel locking can be started at the root, which is convenient
for code that already holds the root rcu_node structure's ->lock.
Other funnel-locking implementations start at the leaves.
3. If an rcu_node structure other than the initial one believes
that a grace period is in progress, it is not necessary to
go further up the tree. This is because grace-period cleanup
scans the full tree, so that marking the need for a subsequent
grace period anywhere in the tree suffices -- but only if
a grace period is currently in progress.
4. It is possible that the RCU grace-period kthread has not yet
started, and this case must be handled appropriately.
However, the general approach of using a tree to control lock contention
is still in place.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
The rcu_accelerate_cbs() function selects a grace-period target, which
it uses to have rcu_segcblist_accelerate() assign numbers to recently
queued callbacks. Then it invokes rcu_start_future_gp(), which selects
a grace-period target again, which is a bit pointless. This commit
therefore changes rcu_start_future_gp() to take the grace-period target as
a parameter, thus avoiding double selection. This commit also changes
the name of rcu_start_future_gp() to rcu_start_this_gp() to reflect
this change in functionality, and also makes a similar change to the
name of trace_rcu_future_gp().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
The rcu_start_gp_advanced() is invoked only from rcu_start_future_gp() and
much of its code is redundant when invoked from that context. This commit
therefore inlines rcu_start_gp_advanced() into rcu_start_future_gp(),
then removes rcu_start_gp_advanced().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
Once the grace period has ended, any RCU_GP_FLAG_FQS requests are
irrelevant: The grace period has ended, so there is no longer any
point in forcing quiescent states in order to try to make it end sooner.
This commit therefore causes rcu_gp_cleanup() to clear any bits other
than RCU_GP_FLAG_INIT from ->gp_flags at the end of the grace period.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
It is true that currently only the low-order two bits are used, so
there should be no problem given modern machines and compilers, but
good hygiene and maintainability dictates use of an unsigned long
instead of an int. This commit therefore makes this change.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
The __rcu_process_callbacks() function currently checks to see if
the current CPU needs a grace period and also if there is any other
reason to kick off a new grace period. This is one of the fail-safe
checks that has been rendered unnecessary by the changes that increase
the accuracy of rcu_gp_cleanup()'s estimate as to whether another grace
period is required. Because this particular fail-safe involved acquiring
the root rcu_node structure's ->lock, which has seen excessive contention
in real life, this fail-safe needs to go.
However, one check must remain, namely the check for newly arrived
RCU callbacks that have not yet been associated with a grace period.
One might hope that the checks in __note_gp_changes(), which is invoked
indirectly from rcu_check_quiescent_state(), would suffice, but this
function won't be invoked at all if RCU is idle. It is therefore necessary
to replace the fail-safe checks with a simpler check for newly arrived
callbacks during an RCU idle period, which is exactly what this commit
does. This change removes the final call to rcu_start_gp(), so this
function is removed as well.
Note that lockless use of cpu_needs_another_gp() is racy, but that
these races are harmless in this case. If RCU really is idle, the
values will not change, so the return value from cpu_needs_another_gp()
will be correct. If RCU is not idle, the resulting redundant call to
rcu_accelerate_cbs() will be harmless, and might even have the benefit
of reducing grace-period latency a bit.
This commit also moves interrupt disabling into the "if" statement to
improve real-time response a bit.
Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
When __call_rcu_core() notices excessive numbers of callbacks pending
on the current CPU, we know that at least one of them is not yet
classified, namely the one that was just now queued. Therefore, it
is not necessary to invoke rcu_start_gp() and thus not necessary to
acquire the root rcu_node structure's ->lock. This commit therefore
replaces the rcu_start_gp() with rcu_accelerate_cbs(), thus replacing
an acquisition of the root rcu_node structure's ->lock with that of
this CPU's leaf rcu_node structure.
This decreases contention on the root rcu_node structure's ->lock.
Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
The rcu_migrate_callbacks() function invokes rcu_advance_cbs()
twice, ignoring the return value. This is OK at pressent because of
failsafe code that does the wakeup when needed. However, this failsafe
code acquires the root rcu_node structure's lock frequently, while
rcu_migrate_callbacks() does so only once per CPU-offline operation.
This commit therefore makes rcu_migrate_callbacks()
wake up the RCU GP kthread when either call to rcu_advance_cbs()
returns true, thus removing need for the failsafe code.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
There is no longer any need for ->need_future_gp[] to count the number of
requests for future grace periods, so this commit converts the additions
to assignments to "true" and reduces the size of each element to one byte.
While we are in the area, fix an obsolete comment.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
Currently, the rcu_future_needs_gp() function checks only the current
element of the ->need_future_gps[] array, which might miss elements that
were offset from the expected element, for example, due to races with
the start or the end of a grace period. This commit therefore makes
rcu_future_needs_gp() use the need_any_future_gp() macro to check all
of the elements of this array.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
The rcu_cbs_completed() function provides the value of ->completed
at which new callbacks can safely be invoked. This is recorded in
two-element ->need_future_gp[] arrays in the rcu_node structure, and
the elements of these arrays corresponding to the just-completed grace
period are zeroed at the end of that grace period. However, the
rcu_cbs_completed() function can return the current ->completed value
plus either one or two, so it is possible for the corresponding
->need_future_gp[] entry to be cleared just after it was set, thus
losing a request for a future grace period.
This commit avoids this race by expanding ->need_future_gp[] to four
elements.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
Currently, rcu_gp_cleanup() scans the rcu_node tree in order to reset
state to reflect the end of the grace period. It also checks to see
whether a new grace period is needed, but in a number of cases, rather
than directly cause the new grace period to be immediately started, it
instead leaves the grace-period-needed state where various fail-safes
can find it. This works fine, but results in higher contention on the
root rcu_node structure's ->lock, which is undesirable, and contention
on that lock has recently become noticeable.
This commit therefore makes rcu_gp_cleanup() immediately start a new
grace period if there is any need for one.
It is quite possible that it will later be necessary to throttle the
grace-period rate, but that can be dealt with when and if.
Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
The rcu_gp_kthread() function immediately sleeps waiting to be notified
of the need for a new grace period, which currently works because there
are a number of code sequences that will provide the needed wakeup later.
However, some of these code sequences need to acquire the root rcu_node
structure's ->lock, and contention on that lock has started manifesting.
This commit therefore makes rcu_gp_kthread() check for early-boot activity
when it starts up, omitting the initial sleep in that case.
Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
Accessors for the ->need_future_gp[] array are currently open-coded,
which makes them difficult to change. To improve maintainability, this
commit adds need_future_gp_mask() to compute the indexing mask from the
array size, need_future_gp_element() to access the element corresponding
to the specified grace-period number, and need_any_future_gp() to
determine if any future grace period is needed. This commit also applies
need_future_gp_element() to existing open-coded single-element accesses.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
The rcu_start_future_gp() function uses a sloppy check for a grace
period being in progress, which works today because there are a number
of code sequences that resolve the resulting races. However, some of
these race-resolution code sequences must acquire the root rcu_node
structure's ->lock, and contention on that lock has started manifesting.
This commit therefore makes rcu_start_future_gp() check more precise,
eliminating the sloppy lockless check of the rcu_state structure's ->gpnum
and ->completed fields. The effect is that rcu_start_future_gp() will
sometimes unnecessarily attempt to start a new grace period, but this
overhead will be reduced later using funnel locking.
Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
When rcu_cbs_completed() is invoked on a non-root rcu_node structure,
it unconditionally assumes that two grace periods must complete before
the callbacks at hand can be invoked. This is overly conservative because
if that non-root rcu_node structure believes that no grace period is in
progress, and if the corresponding rcu_state structure's ->gpnum field
has not yet been incremented, then these callbacks may safely be invoked
after only one grace period has completed.
This change is required to permit grace-period start requests to use
funnel locking, which is in turn permitted to reduce root rcu_node ->lock
contention, which has been observed by Nick Piggin. Furthermore, such
contention will likely be increased by the merging of RCU-bh, RCU-preempt,
and RCU-sched, so it makes sense to take steps to decrease it.
This commit therefore improves the accuracy of rcu_cbs_completed() when
invoked on a non-root rcu_node structure as described above.
Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAlrTQ4sACgkQxWXV+ddt
WDti3Q/+MAeqsLTjvre2RQ3ka5hNyCuVftUIBmcP3YfJbt+xZYQyaewW4Xkfi3cm
cbJE+zehzf5ag+RJhxk3OwFTvNfLGIO9asWs3b08NGUi6VzwL0/8B/iOdZPuHSAV
TrecQIBE2Tp+xax9cQEnxav34D4dUtXNaDweGjp1MIIUkDneQP/I0vlTu7vafBgX
UVxP6riL/MCs7sjTHGIPs0lv8L/fgdmo+dk5SnNuIPTOcFTQXgVrtHjw9IvbKWd4
aq+sbNWoSrhXUfllbFg/wZqDe9tWn9E2f6m/H0ThSoNdxusSVgacOjFRYh20NKLW
WGB8Amd/ItGtJwJ1CIypa7VX2U11UAi0XT7BeiK82rUNEJ6moRqFOXG861gRLoTZ
SpH8uWO+e+CogfXob1KCndn5lot4AM2ZTkCqfrjpM35Nul72PZdne0CxNlmiRupY
Fdt5GB+sg8plcMaRiYr++BbbHP5tggX1MrhLGEbx2XBs2eRdn+2Lv2I1Ig/U4NUb
Vf+xk/tFLKGOTSZlbv7SV7ekXxG/3+7gAuL7A1XMETZCwBF4L3hwyW7CgkEBb3PC
TqX8BwMaRpyp/FgW/QL6edjXZ3a64VaHIfqRPNks3lWWcCHVbzyiVPfEjx+AEJ/0
abx6jXTJhLUBPuPxEgb5rsSv62RoxPoYHqCErrG95XwnZ8rSCts=
=I0y0
-----END PGP SIGNATURE-----
Merge tag 'for-4.17-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull more btrfs updates from David Sterba:
"We have queued a few more fixes (error handling, log replay,
softlockup) and the rest is SPDX updates that touche almost all files
so the diffstat is long"
* tag 'for-4.17-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: Only check first key for committed tree blocks
btrfs: add SPDX header to Kconfig
btrfs: replace GPL boilerplate by SPDX -- sources
btrfs: replace GPL boilerplate by SPDX -- headers
Btrfs: fix loss of prealloc extents past i_size after fsync log replay
Btrfs: clean up resources during umount after trans is aborted
btrfs: Fix possible softlock on single core machines
Btrfs: bail out on error during replay_dir_deletes
Btrfs: fix NULL pointer dereference in log_dir_items
-----BEGIN PGP SIGNATURE-----
iQGwBAABCAAaBQJa0mDhExxzbWZyZW5jaEBnbWFpbC5jb20ACgkQiiy9cAdyT1HA
EQv+KxZAFONG/YGBIdoyJevnSobzZSTYgsrEmox28rIvWSFr6Xkt9K6WN4lki+Td
nacByZBhyeKPp7CxdgjmiNunbGfBZJsWk5LwtgdE2jOLjM9CFq8Z7m2qmkXQ4IkJ
EoH+NWpCUvEe9nj8oaTdSelR2OHH63SG5E5dK29eOB59ixcVe7qKUG29SpJks9qd
mCdZyhYyuzh469CKGFeQ8qfWopBH+QFBs4SrpQ9d4liNtZ3W8zTsQ3ezX069AGlU
Ef2xVTT1DtlZ4/6SiXBKlXGH3jjUg67vEGHzT3ZJwfLVNhTLEpKM/hsPiduSnjS8
JqdVRdFLc2lQzr4HknhNR9yuE5wT5tZM4iZOjNDwT6Sef2Mt0QIBuYI6kb4mv6Qg
aCVv7uXUMdvSrTk0mYiO1dXYozGUC+uFhJCuaiLax583o3ihp+/INDosQyqTbavZ
amRBZpWeNysSJwwJu21RdJiM74+fasR/CDy11U94ssxplg1qE72M8DShK7kXtUIy
g9ZW
=Ah5z
-----END PGP SIGNATURE-----
Merge tag '4.17-rc1SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"SMB3 fixes, a few for stable, and some important cleanup work from
Ronnie of the smb3 transport code"
* tag '4.17-rc1SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: change validate_buf to validate_iov
cifs: remove rfc1002 hardcoded constants from cifs_discard_remaining_data()
cifs: Change SMB2_open to return an iov for the error parameter
cifs: add resp_buf_size to the mid_q_entry structure
smb3.11: replace a 4 with server->vals->header_preamble_size
cifs: replace a 4 with server->vals->header_preamble_size
cifs: add pdu_size to the TCP_Server_Info structure
SMB311: Improve checking of negotiate security contexts
SMB3: Fix length checking of SMB3.11 negotiate request
CIFS: add ONCE flag for cifs_dbg type
cifs: Use ULL suffix for 64-bit constant
SMB3: Log at least once if tree connect fails during reconnect
cifs: smb2pdu: Fix potential NULL pointer dereference
This is a set of minor (and safe changes) that didn't make the initial
pull request plus some bug fixes. The status handling code is
actually a running regression from the previous merge window which had
an incomplete fix (now reverted) and most of the remaining bug fixes
are for problems older than the current merge window.
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCWtMW7SYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishYdtAP97FhqR
x2lDO7J6QT8hMVqwPeQS0Xh5ZPbZLedPmfx9BAD+K1HauGv8J/eMggMDPGrWa/CP
tGrg2UorMrokLLdIbyA=
=fOJs
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is a set of minor (and safe changes) that didn't make the initial
pull request plus some bug fixes.
The status handling code is actually a running regression from the
previous merge window which had an incomplete fix (now reverted) and
most of the remaining bug fixes are for problems older than the
current merge window"
[ Side note: this merge also takes the base kernel git repository to 6+
million objects for the first time. Technically we hit it a couple of
merges ago already if you count all the tag objects, but now it
reaches 6M+ objects reachable from HEAD.
I was joking around that that's when I should switch to 5.0, because
3.0 happened at the 2M mark, and 4.0 happened at 4M objects. But
probably not, even if numerology is about as good a reason as any.
- Linus ]
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: devinfo: Add Microsoft iSCSI target to 1024 sector blacklist
scsi: cxgb4i: silence overflow warning in t4_uld_rx_handler()
scsi: dpt_i2o: Use after free in I2ORESETCMD ioctl
scsi: core: Make scsi_result_to_blk_status() recognize CONDITION MET
scsi: core: Rename __scsi_error_from_host_byte() into scsi_result_to_blk_status()
Revert "scsi: core: return BLK_STS_OK for DID_OK in __scsi_error_from_host_byte()"
scsi: aacraid: Insure command thread is not recursively stopped
scsi: qla2xxx: Correct setting of SAM_STAT_CHECK_CONDITION
scsi: qla2xxx: correctly shift host byte
scsi: qla2xxx: Fix race condition between iocb timeout and initialisation
scsi: qla2xxx: Avoid double completion of abort command
scsi: qla2xxx: Fix small memory leak in qla2x00_probe_one on probe failure
scsi: scsi_dh: Don't look for NULL devices handlers by name
scsi: core: remove redundant assignment to shost->use_blk_mq
- pass HOSTLDFLAGS when compiling single .c host programs
- build genksyms lexer and parser files instead of using shipped
versions
- rename *-asn1.[ch] to *.asn1.[ch] for suffix consistency
- let the top .gitignore globally ignore artifacts generated by
flex, bison, and asn1_compiler
- let the top Makefile globally clean artifacts generated by
flex, bison, and asn1_compiler
- use safer .SECONDARY marker instead of .PRECIOUS to prevent
intermediate files from being removed
- support -fmacro-prefix-map option to make __FILE__ a relative path
- fix # escaping to prepare for the future GNU Make release
- clean up deb-pkg by using debian tools instead of handrolled
source/changes generation
- improve rpm-pkg portability by supporting kernel-install as a
fallback of new-kernel-pkg
- extend Kconfig listnewconfig target to provide more information
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJa0krLAAoJED2LAQed4NsGyCAP/3Vsb8A4sea7sE3LV6/aFUJp
WcAm6PXcip1MXy7GI5yxFciwen3Z3ghQUer7fJKDcHR5c4mRSfKaqWp+TLHd6uux
7I4pV0FNx2PapcPu5T7wNZHN96p3xZC0Z66sq9BCZ/+gNyYmZLIDcBUSIOEk0nzJ
IsvD46zy6R6KtEnycShKVscg4JyPXJIw1UBqsPDEFHg5l16ARkghND7e5zTW62Fi
2MqQxNXAksIKpxxoxPH/fIcNp1kFKVxYBH2CW4LQtOjC3GmrozdeV5PUc7yTezPc
dpqOuEcIAbMH91bkvhhF+ZBi34YrxRoT4S8B3G9iCXRz+2LRZZaitqO4dAH8Kjbn
0KjkqzNc5TosJXQ8RPTcQlRBi+JmE1bHxICvTx3XNJcqJMqIH0vs3ez/LJKOwhB4
DbAROoxQNfVcOdouHcx2EuCSdHn24BEyzaGFhi04LACpbRLxr8IJS7hSGXRloBYp
K3ydRvG/dCZjFRTS+xWWSi3Nzjih2mCctQlH3D4nf4M3vtCX+/k5B9IMEYFfHlvL
KoNlK4/1vP/dAJZj0iOqd2ksCA1G6iLoHrFp3E5pdtmb4sVe2Ez3gMt+pxz3htR9
XvjuHOzkWE9eiihs1NsFgQuyP/o3UmNKpDDW0irQ06IFEPXkA/y1mVmeTU3qtrII
ZDiwGozIkMMEy/MLkcjE
=tD6R
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull more Kbuild updates from Masahiro Yamada:
- pass HOSTLDFLAGS when compiling single .c host programs
- build genksyms lexer and parser files instead of using shipped
versions
- rename *-asn1.[ch] to *.asn1.[ch] for suffix consistency
- let the top .gitignore globally ignore artifacts generated by flex,
bison, and asn1_compiler
- let the top Makefile globally clean artifacts generated by flex,
bison, and asn1_compiler
- use safer .SECONDARY marker instead of .PRECIOUS to prevent
intermediate files from being removed
- support -fmacro-prefix-map option to make __FILE__ a relative path
- fix # escaping to prepare for the future GNU Make release
- clean up deb-pkg by using debian tools instead of handrolled
source/changes generation
- improve rpm-pkg portability by supporting kernel-install as a
fallback of new-kernel-pkg
- extend Kconfig listnewconfig target to provide more information
* tag 'kbuild-v4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kconfig: extend output of 'listnewconfig'
kbuild: rpm-pkg: use kernel-install as a fallback for new-kernel-pkg
Kbuild: fix # escaping in .cmd files for future Make
kbuild: deb-pkg: split generating packaging and build
kbuild: use -fmacro-prefix-map to make __FILE__ a relative path
kbuild: mark $(targets) as .SECONDARY and remove .PRECIOUS markers
kbuild: rename *-asn1.[ch] to *.asn1.[ch]
kbuild: clean up *-asn1.[ch] patterns from top-level Makefile
.gitignore: move *-asn1.[ch] patterns to the top-level .gitignore
kbuild: add %.dtb.S and %.dtb to 'targets' automatically
kbuild: add %.lex.c and %.tab.[ch] to 'targets' automatically
genksyms: generate lexer and parser during build instead of shipping
kbuild: clean up *.lex.c and *.tab.[ch] patterns from top-level Makefile
.gitignore: move *.lex.c *.tab.[ch] patterns to the top-level .gitignore
kbuild: use HOSTLDFLAGS for single .c executables
Pull x86 fixes from Thomas Gleixner:
"A set of fixes and updates for x86:
- Address a swiotlb regression which was caused by the recent DMA
rework and made driver fail because dma_direct_supported() returned
false
- Fix a signedness bug in the APIC ID validation which caused invalid
APIC IDs to be detected as valid thereby bloating the CPU possible
space.
- Fix inconsisten config dependcy/select magic for the MFD_CS5535
driver.
- Fix a corruption of the physical address space bits when encryption
has reduced the address space and late cpuinfo updates overwrite
the reduced bit information with the original value.
- Dominiks syscall rework which consolidates the architecture
specific syscall functions so all syscalls can be wrapped with the
same macros. This allows to switch x86/64 to struct pt_regs based
syscalls. Extend the clearing of user space controlled registers in
the entry patch to the lower registers"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Fix signedness bug in APIC ID validity checks
x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption
x86/olpc: Fix inconsistent MFD_CS5535 configuration
swiotlb: Use dma_direct_supported() for swiotlb_ops
syscalls/x86: Adapt syscall_wrapper.h to the new syscall stub naming convention
syscalls/core, syscalls/x86: Rename struct pt_regs-based sys_*() to __x64_sys_*()
syscalls/core, syscalls/x86: Clean up compat syscall stub naming convention
syscalls/core, syscalls/x86: Clean up syscall stub naming convention
syscalls/x86: Extend register clearing on syscall entry to lower registers
syscalls/x86: Unconditionally enable 'struct pt_regs' based syscalls on x86_64
syscalls/x86: Use 'struct pt_regs' based syscall calling for IA32_EMULATION and x32
syscalls/core: Prepare CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y for compat syscalls
syscalls/x86: Use 'struct pt_regs' based syscall calling convention for 64-bit syscalls
syscalls/core: Introduce CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y
x86/syscalls: Don't pointlessly reload the system call number
x86/mm: Fix documentation of module mapping range with 4-level paging
x86/cpuid: Switch to 'static const' specifier
Pull x86 pti updates from Thomas Gleixner:
"Another series of PTI related changes:
- Remove the manual stack switch for user entries from the idtentry
code. This debloats entry by 5k+ bytes of text.
- Use the proper types for the asm/bootparam.h defines to prevent
user space compile errors.
- Use PAGE_GLOBAL for !PCID systems to gain back performance
- Prevent setting of huge PUD/PMD entries when the entries are not
leaf entries otherwise the entries to which the PUD/PMD points to
and are populated get lost"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/pgtable: Don't set huge PUD/PMD on non-leaf entries
x86/pti: Leave kernel text global for !PCID
x86/pti: Never implicitly clear _PAGE_GLOBAL for kernel image
x86/pti: Enable global pages for shared areas
x86/mm: Do not forbid _PAGE_RW before init for __ro_after_init
x86/mm: Comment _PAGE_GLOBAL mystery
x86/mm: Remove extra filtering in pageattr code
x86/mm: Do not auto-massage page protections
x86/espfix: Document use of _PAGE_GLOBAL
x86/mm: Introduce "default" kernel PTE mask
x86/mm: Undo double _PAGE_PSE clearing
x86/mm: Factor out pageattr _PAGE_GLOBAL setting
x86/entry/64: Drop idtentry's manual stack switch for user entries
x86/uapi: Fix asm/bootparam.h userspace compilation errors
Pull scheduler fixes from Thomas Gleixner:
"A few scheduler fixes:
- Prevent a bogus warning vs. runqueue clock update flags in
do_sched_rt_period_timer()
- Simplify the helper functions which handle requests for skipping
the runqueue clock updat.
- Do not unlock the tunables mutex in the error path of the cpu
frequency scheduler utils. Its not held.
- Enforce proper alignement for 'struct util_est' in sched_avg to
prevent a misalignment fault on IA64"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Force proper alignment of 'struct util_est'
sched/core: Simplify helpers for rq clock update skip requests
sched/rt: Fix rq->clock_update_flags < RQCF_ACT_SKIP warning
sched/cpufreq/schedutil: Fix error path mutex unlock
Pull more perf updates from Thomas Gleixner:
"A rather large set of perf updates:
Kernel:
- Fix various initialization issues
- Prevent creating [ku]probes for not CAP_SYS_ADMIN users
Tooling:
- Show only failing syscalls with 'perf trace --failure' (Arnaldo
Carvalho de Melo)
e.g: See what 'openat' syscalls are failing:
# perf trace --failure -e openat
762.323 ( 0.007 ms): VideoCapture/4566 openat(dfd: CWD, filename: /dev/video2) = -1 ENOENT No such file or directory
<SNIP N /dev/videoN open attempts... sigh, where is that improvised camera lid?!? >
790.228 ( 0.008 ms): VideoCapture/4566 openat(dfd: CWD, filename: /dev/video63) = -1 ENOENT No such file or directory
^C#
- Show information about the event (freq, nr_samples, total
period/nr_events) in the annotate --tui and --stdio2 'perf
annotate' output, similar to the first line in the 'perf report
--tui', but just for the samples for a the annotated symbol
(Arnaldo Carvalho de Melo)
- Introduce 'perf version --build-options' to show what features were
linked, aliased as well as a shorter 'perf -vv' (Jin Yao)
- Add a "dso_size" sort order (Kim Phillips)
- Remove redundant ')' in the tracepoint output in 'perf trace'
(Changbin Du)
- Synchronize x86's cpufeatures.h, no effect on toolss (Arnaldo
Carvalho de Melo)
- Show group details on the title line in the annotate browser and
'perf annotate --stdio2' output, so that the per-event columns can
have headers (Arnaldo Carvalho de Melo)
- Fixup vertical line separating metrics from instructions and
cleaning unused lines at the bottom, both in the annotate TUI
browser (Arnaldo Carvalho de Melo)
- Remove duplicated 'samples' in lost samples warning in
'perf report' (Arnaldo Carvalho de Melo)
- Synchronize i915_drm.h, silencing the perf build process,
automagically adding support for the new DRM_I915_QUERY ioctl
(Arnaldo Carvalho de Melo)
- Make auxtrace_queues__add_buffer() allocate struct buffer, from a
patchkit already applied (Adrian Hunter)
- Fix the --stdio2/TUI annotate output to include group details, be
it for a recorded '{a,b,f}' explicit event group or when forcing
group display using 'perf report --group' for a set of events not
recorded as a group (Arnaldo Carvalho de Melo)
- Fix display artifacts in the ui browser (base class for the
annotate and main report/top TUI browser) related to the extra
title lines work (Arnaldo Carvalho de Melo)
- perf auxtrace refactorings, leftovers from a previously partially
processed patchset (Adrian Hunter)
- Fix the builtin clang build (Sandipan Das, Arnaldo Carvalho de
Melo)
- Synchronize i915_drm.h, silencing a perf build warning and in the
process automagically adding support for a new ioctl command
(Arnaldo Carvalho de Melo)
- Fix a strncpy issue in uprobe tracing"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
perf/core: Need CAP_SYS_ADMIN to create k/uprobe with perf_event_open()
tracing/uprobe_event: Fix strncpy corner case
perf/core: Fix perf_uprobe_init()
perf/core: Fix perf_kprobe_init()
perf/core: Fix use-after-free in uprobe_perf_close()
perf tests clang: Fix function name for clang IR test
perf clang: Add support for recent clang versions
perf tools: Fix perf builds with clang support
perf tools: No need to include namespaces.h in util.h
perf hists browser: Remove leftover from row returned from refresh
perf hists browser: Show extra_title_lines in the 'D' debug hotkey
perf auxtrace: Make auxtrace_queues__add_buffer() do CPU filtering
tools headers uapi: Synchronize i915_drm.h
perf report: Remove duplicated 'samples' in lost samples warning
perf ui browser: Fixup cleaning unused lines at the bottom
perf annotate browser: Fixup vertical line separating metrics from instructions
perf annotate: Show group details on the title line
perf auxtrace: Make auxtrace_queues__add_buffer() allocate struct buffer
perf/x86/intel: Move regs->flags EXACT bit init
perf trace: Remove redundant ')'
...
Pull x86 EFI bootup fixlet from Thomas Gleixner:
"A single fix for an early boot warning caused by invoking
this_cpu_has() before SMP initialization"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Fix bogus warning during EFI bootup, use boot_cpu_has() instead of this_cpu_has() in build_cr3_noflush()
Pull irq affinity fixes from Thomas Gleixner:
- Fix error path handling in the affinity spreading code
- Make affinity spreading smarter to avoid issues on systems which
claim to have hotpluggable CPUs while in fact they can't hotplug
anything.
So instead of trying to spread the vectors (and thereby the
associated device queues) to all possibe CPUs, spread them on all
present CPUs first. If there are left over vectors after that first
step they are spread among the possible, but not present CPUs which
keeps the code backwards compatible for virtual decives and NVME
which allocate a queue per possible CPU, but makes the spreading
smarter for devices which have less queues than possible or present
CPUs.
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/affinity: Spread irq vectors among present CPUs as far as possible
genirq/affinity: Allow irq spreading from a given starting point
genirq/affinity: Move actual irq vector spreading into a helper function
genirq/affinity: Rename *node_to_possible_cpumask as *node_to_cpumask
genirq/affinity: Don't return with empty affinity masks on error
Just one small thing here, it came in a while back but I didnt have
anything in my 4.16 queue, still its the only thing for 4.17 so sending
it alone.
Small cleanup:
- remove unused __ARCH_HAVE_MMU define
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJayP1LAAoJEMOzHC1eZifkoCcQAK5PZ5awA3+DVAPNWN3I9hMF
0tk3f+R+GrF+wBQ8IwXOok4QaXmJrDwgxHnNRrhBFqHROz1OjEDYNnOk4Two3rs6
+yTw7TC9GMyowxWcyf+VHzo4/54XIk/Cwg8thEN5W2JSYC/kn1f0HIqfF6AYfCcw
fLqhDbuM09cslvdhfgsdRQ/ZPDcQjrr9XDP8CUZIeHqCxbPwPWikOUmey4itZKTi
VQO60SAjNjqSYWcIEcH8EfCvvpEndREvzOH8w7um282USKfiQv3/Qkur3ihTT+0e
/7JF2LZFLBVPU3dqZVlZDfukamfU+aRsXMh4foEhM6l6uddiKYnu0wjh25trJ6P2
AlM8BsGL3kjWi7mqR7nxuOgRsLqNcTGPcJaU5X/TGTObDLADATDf7xrxcGxPFMY/
Z3xkBmKHxv7tqSEo5pfUqvNV7mzT3uep5t4Ql9wq6v0dj+feB6sNx1n/mjSLefOf
6g9mTarcVR6QajJZYggVNdY1XGOLh88MKpbV/oYIYqMNC34Q/2JTVvKKqsBmyi3r
Rjk14gZ/fQeKT28XzUgRabDwevCSBzHknDBCQRoQJOjwC/q6Pm7xlOx8XCgEavBn
OgqXEzq4aWrsAOL2Ffi24+dMvnJe6IOGkU0xDQD1elc1FxJedsMPJUgS9FEnPkGN
TC3WReB0EdSJ8QI8/oy7
=edKX
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://github.com/openrisc/linux
Pull OpenRISC fixlet from Stafford Horne:
"Just one small thing here, it came in a while back but I didnt have
anything in my 4.16 queue, still its the only thing for 4.17 so
sending it alone.
Small cleanup: remove unused __ARCH_HAVE_MMU define"
* tag 'for-linus' of git://github.com/openrisc/linux:
openrisc: remove unused __ARCH_HAVE_MMU define
- Fix crashes when loading modules built with a different CONFIG_RELOCATABLE
value by adding CONFIG_RELOCATABLE to vermagic.
- Fix busy loops in the OPAL NVRAM driver if we get certain error conditions
from firmware.
- Remove tlbie trace points from KVM code that's called in real mode, because
it causes crashes.
- Fix checkstops caused by invalid tlbiel on Power9 Radix.
- Ensure the set of CPU features we "know" are always enabled is actually the
minimal set when we build with support for firmware supplied CPU features.
Thanks to:
Aneesh Kumar K.V, Anshuman Khandual, Nicholas Piggin.
-----BEGIN PGP SIGNATURE-----
iQIwBAABCAAaBQJa0oWBExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYAV
EA//UQB7n33HlroMBL3841VswUrIgY36gSi9+QZPjxHiTYGVStI+u0FHq5hm8OMm
1FkronD0sfbN7t8LJ9FS6vCoAlW15vMBj95pWJXAL7NuQneO7cM5JvRbowVKbNbq
GESn4EkUj9bAXl7aTzX2yA7jruzMJIwE/2bgl1J6YkpByHx5x/fgzKTuoCRggQqY
Xd656ZZyWR/uIuWhAjCPFdrPnekVvo7/Gy5Yo5kcR6LbX4MO0JFNOHpiT3wPGGv/
LJedJtdpH1biMk7SrqLfWD7mpMsVJeMelpkftEbe8zUXf17wBd1rl4JkybLTDwZD
HznZ0nbnh0j5Q/KRHwOh0sLDSisJF6pQHH/Bnc4Xqrsn4OERwEk40/Ym/O0kDsjH
n2F3X5a5QLWoBWRsdV3BMyEzc0bgnb++tXeBWOV4jJBEOhoqxwimCU+3fmLiy95A
urJYf8Sljwve9I5kJyXKpruopgeoJ1aumRwopE8YCG9lfBgzvU/90u6+uKqAdkOv
kpZxS5FDT2lNIwbCP9WjEelAOGrtA6JQNWU0iPGwsVAvAxX/p1RwwbQeWVlWs3Ek
UdVF8eGezClpLPa/FQFdDyXfGE6WIf1vK9YnG7k3rUwwkSqoYxKgVQ3o1Vetp0Lg
fzPsDdDi2V2I3KDYNav8s6kohAIIS/a9XYk4BGvT/htRnUg=
=0e7c
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix crashes when loading modules built with a different
CONFIG_RELOCATABLE value by adding CONFIG_RELOCATABLE to vermagic.
- Fix busy loops in the OPAL NVRAM driver if we get certain error
conditions from firmware.
- Remove tlbie trace points from KVM code that's called in real mode,
because it causes crashes.
- Fix checkstops caused by invalid tlbiel on Power9 Radix.
- Ensure the set of CPU features we "know" are always enabled is
actually the minimal set when we build with support for firmware
supplied CPU features.
Thanks to: Aneesh Kumar K.V, Anshuman Khandual, Nicholas Piggin.
* tag 'powerpc-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: Fix CPU_FTRS_ALWAYS vs DT CPU features
powerpc/mm/radix: Fix checkstops caused by invalid tlbiel
KVM: PPC: Book3S HV: trace_tlbie must not be called in realmode
powerpc/8xx: Fix build with hugetlbfs enabled
powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops
powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops
powerpc/fscr: Enable interrupts earlier before calling get_user()
powerpc/64s: Fix section mismatch warnings from setup_rfi_flush()
powerpc/modules: Fix crashes by adding CONFIG_RELOCATABLE to vermagic
Merge yet more updates from Andrew Morton:
- various hotfixes
- kexec_file updates and feature work
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (27 commits)
kernel/kexec_file.c: move purgatories sha256 to common code
kernel/kexec_file.c: allow archs to set purgatory load address
kernel/kexec_file.c: remove mis-use of sh_offset field during purgatory load
kernel/kexec_file.c: remove unneeded variables in kexec_purgatory_setup_sechdrs
kernel/kexec_file.c: remove unneeded for-loop in kexec_purgatory_setup_sechdrs
kernel/kexec_file.c: split up __kexec_load_puragory
kernel/kexec_file.c: use read-only sections in arch_kexec_apply_relocations*
kernel/kexec_file.c: search symbols in read-only kexec_purgatory
kernel/kexec_file.c: make purgatory_info->ehdr const
kernel/kexec_file.c: remove checks in kexec_purgatory_load
include/linux/kexec.h: silence compile warnings
kexec_file, x86: move re-factored code to generic side
x86: kexec_file: clean up prepare_elf64_headers()
x86: kexec_file: lift CRASH_MAX_RANGES limit on crash_mem buffer
x86: kexec_file: remove X86_64 dependency from prepare_elf64_headers()
x86: kexec_file: purge system-ram walking from prepare_elf64_headers()
kexec_file,x86,powerpc: factor out kexec_file_ops functions
kexec_file: make use of purgatory optional
proc: revalidate misc dentries
mm, slab: reschedule cache_reap() on the same CPU
...
The code to verify the new kernels sha digest is applicable for all
architectures. Move it to common code.
One problem is the string.c implementation on x86. Currently sha256
includes x86/boot/string.h which defines memcpy and memset to be gcc
builtins. By moving the sha256 implementation to common code and
changing the include to linux/string.h both functions are no longer
defined. Thus definitions have to be provided in x86/purgatory/string.c
Link: http://lkml.kernel.org/r/20180321112751.22196-12-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For s390 new kernels are loaded to fixed addresses in memory before they
are booted. With the current code this is a problem as it assumes the
kernel will be loaded to an 'arbitrary' address. In particular,
kexec_locate_mem_hole searches for a large enough memory region and sets
the load address (kexec_bufer->mem) to it.
Luckily there is a simple workaround for this problem. By returning 1
in arch_kexec_walk_mem, kexec_locate_mem_hole is turned off. This
allows the architecture to set kbuf->mem by hand. While the trick works
fine for the kernel it does not for the purgatory as here the
architectures don't have access to its kexec_buffer.
Give architectures access to the purgatories kexec_buffer by changing
kexec_load_purgatory to take a pointer to it. With this change
architectures have access to the buffer and can edit it as they need.
A nice side effect of this change is that we can get rid of the
purgatory_info->purgatory_load_address field. As now the information
stored there can directly be accessed from kbuf->mem.
Link: http://lkml.kernel.org/r/20180321112751.22196-11-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current code uses the sh_offset field in purgatory_info->sechdrs to
store a pointer to the current load address of the section. Depending
whether the section will be loaded or not this is either a pointer into
purgatory_info->purgatory_buf or kexec_purgatory. This is not only a
violation of the ELF standard but also makes the code very hard to
understand as you cannot tell if the memory you are using is read-only
or not.
Remove this misuse and store the offset of the section in
pugaroty_info->purgatory_buf in sh_offset.
Link: http://lkml.kernel.org/r/20180321112751.22196-10-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The main loop currently uses quite a lot of variables to update the
section headers. Some of them are unnecessary. So clean them up a
little.
Link: http://lkml.kernel.org/r/20180321112751.22196-9-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To update the entry point there is an extra loop over all section
headers although this can be done in the main loop. So move it there
and eliminate the extra loop and variable to store the 'entry section
index'.
Also, in the main loop, move the usual case, i.e. non-bss section, out
of the extra if-block.
Link: http://lkml.kernel.org/r/20180321112751.22196-8-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When inspecting __kexec_load_purgatory you find that it has two tasks
1) setting up the kexec_buffer for the new kernel and,
2) setting up pi->sechdrs for the final load address.
The two tasks are independent of each other. To improve readability
split up __kexec_load_purgatory into two functions, one for each task,
and call them directly from kexec_load_purgatory.
Link: http://lkml.kernel.org/r/20180321112751.22196-7-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When the relocations are applied to the purgatory only the section the
relocations are applied to is writable. The other sections, i.e. the
symtab and .rel/.rela, are in read-only kexec_purgatory. Highlight this
by marking the corresponding variables as 'const'.
While at it also change the signatures of arch_kexec_apply_relocations* to
take section pointers instead of just the index of the relocation section.
This removes the second lookup and sanity check of the sections in arch
code.
Link: http://lkml.kernel.org/r/20180321112751.22196-6-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The stripped purgatory does not contain a symtab. So when looking for
symbols this is done in read-only kexec_purgatory. Highlight this by
marking the corresponding variables as 'const'.
Link: http://lkml.kernel.org/r/20180321112751.22196-5-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The kexec_purgatory buffer is read-only. Thus all pointers into
kexec_purgatory are read-only, too. Point this out by explicitly
marking purgatory_info->ehdr as 'const' and update the comments in
purgatory_info.
Link: http://lkml.kernel.org/r/20180321112751.22196-4-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Before the purgatory is loaded several checks are done whether the ELF
file in kexec_purgatory is valid or not. These checks are incomplete.
For example they don't check for the total size of the sections defined
in the section header table or if the entry point actually points into
the purgatory.
On the other hand the purgatory, although an ELF file on its own, is
part of the kernel. Thus not trusting the purgatory means not trusting
the kernel build itself.
So remove all validity checks on the purgatory and just trust the kernel
build.
Link: http://lkml.kernel.org/r/20180321112751.22196-3-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "kexec_file: Clean up purgatory load", v2.
Following the discussion with Dave and AKASHI, here are the common code
patches extracted from my recent patch set (Add kexec_file_load support
to s390) [1]. The patches were extracted to allow upstream integration
together with AKASHI's common code patches before the arch code gets
adjusted to the new base.
The reason for this series is to prepare common code for adding
kexec_file_load to s390 as well as cleaning up the mis-use of the
sh_offset field during purgatory load. In detail this series contains:
Patch #1&2: Minor cleanups/fixes.
Patch #3-9: Clean up the purgatory load/relocation code. Especially
remove the mis-use of the purgatory_info->sechdrs->sh_offset field,
currently holding a pointer into either kexec_purgatory (ro) or
purgatory_buf (rw) depending on the section. With these patches the
section address will be calculated verbosely and sh_offset will contain
the offset of the section in the stripped purgatory binary
(purgatory_buf).
Patch #10: Allows architectures to set the purgatory load address. This
patch is important for s390 as the kernel and purgatory have to be
loaded to fixed addresses. In current code this is impossible as the
purgatory load is opaque to the architecture.
Patch #11: Moves x86 purgatories sha implementation to common lib/
directory to allow reuse in other architectures.
This patch (of 11)
When building the kernel with CONFIG_KEXEC_FILE enabled gcc prints a
compile warning multiple times.
In file included from <path>/linux/init/initramfs.c:526:0:
<path>/include/linux/kexec.h:120:9: warning: `struct kimage' declared inside parameter list [enabled by default]
unsigned long cmdline_len);
^
This is because the typedefs for kexec_file_load uses struct kimage
before it is declared. Fix this by simply forward declaring struct
kimage.
Link: http://lkml.kernel.org/r/20180321112751.22196-2-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In the previous patches, commonly-used routines, exclude_mem_range() and
prepare_elf64_headers(), were carved out. Now place them in kexec
common code. A prefix "crash_" is given to each of their names to avoid
possible name collisions.
Link: http://lkml.kernel.org/r/20180306102303.9063-8-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Removing bufp variable in prepare_elf64_headers() makes the code simpler
and more understandable.
Link: http://lkml.kernel.org/r/20180306102303.9063-7-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While CRASH_MAX_RANGES (== 16) seems to be good enough, fixed-number
array is not a good idea in general.
In this patch, size of crash_mem buffer is calculated as before and the
buffer is now dynamically allocated. This change also allows removing
crash_elf_data structure.
Link: http://lkml.kernel.org/r/20180306102303.9063-6-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The code guarded by CONFIG_X86_64 is necessary on some architectures
which have a dedicated kernel mapping outside of linear memory mapping.
(arm64 is among those.)
In this patch, an additional argument, kernel_map, is added to enable/
disable the code removing #ifdef.
Link: http://lkml.kernel.org/r/20180306102303.9063-5-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While prepare_elf64_headers() in x86 looks pretty generic for other
architectures' use, it contains some code which tries to list crash
memory regions by walking through system resources, which is not always
architecture agnostic. To make this function more generic, the related
code should be purged.
In this patch, prepare_elf64_headers() simply scans crash_mem buffer
passed and add all the listed regions to elf header as a PT_LOAD
segment. So walk_system_ram_res(prepare_elf64_headers_callback) have
been moved forward before prepare_elf64_headers() where the callback,
prepare_elf64_headers_callback(), is now responsible for filling up
crash_mem buffer.
Meanwhile exclude_elf_header_ranges() used to be called every time in
this callback it is rather redundant and now called only once in
prepare_elf_headers() as well.
Link: http://lkml.kernel.org/r/20180306102303.9063-4-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>