The increment of delay counter was 2 instructions:
Arithmatic Shfit Left (ASL) + set to 1 on overflow
This can be done in 1 using ROtate Left (ROL)
Suggested-by: Nigel Topham <ntopham@synopsys.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
This is to workaround the llock/scond livelock
HS38x4 could get into a LLOCK/SCOND livelock in case of multiple overlapping
coherency transactions in the SCU. The exclusive line state keeps rotating
among contenting cores leading to a never ending cycle. So break the cycle
by deferring the retry of failed exclusive access (SCOND). The actual delay
needed is function of number of contending cores as well as the unrelated
coherency traffic from other cores. To keep the code simple, start off with
small delay of 1 which would suffice most cases and in case of contention
double the delay. Eventually the delay is sufficient such that the coherency
pipeline is drained, thus a subsequent exclusive access would succeed.
Link: http://lkml.kernel.org/r/1438612568-28265-1-git-send-email-vgupta@synopsys.com
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
This reduces the diff in forth-coming patches and also helps understand
better the incremental changes to inline asm.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Extended testing of quad core configuration revealed that this fix was
insufficient. Specifically LTP open posix shm_op/23-1 would cause the
hardware livelock in llock/scond loop in update_cpu_load_active()
So remove this and make way for a proper workaround
This reverts commit a5c8b52abe.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
A quad core SMP build could get into hardware livelock with concurrent
LLOCK/SCOND. Workaround that by adding a PREFETCHW which is serialized by
SCU (System Coherency Unit). It brings the cache line in Exclusive state
and makes others invalidate their lines. This gives enough time for
winner to complete the LLOCK/SCOND, before others can get the line back.
The prefetchw in the ll/sc loop is not nice but this is the only
software workaround for current version of RTL.
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
- arch_spin_lock/unlock were lacking the ACQUIRE/RELEASE barriers
Since ARCv2 only provides load/load, store/store and all/all, we need
the full barrier
- LLOCK/SCOND based atomics, bitops, cmpxchg, which return modified
values were lacking the explicit smp barriers.
- Non LLOCK/SCOND varaints don't need the explicit barriers since that
is implicity provided by the spin locks used to implement the
critical section (the spin lock barriers in turn are also fixed in
this commit as explained above
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Updated Boot printing
kgdb update for arc gdb 7.5
Bug fixes (some marked for stable)
More code refactoring/consolidation
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQIcBAABAgAGBQJURJFgAAoJEGnX8d3iisJeuFcP/R/GlO+IHMt7MsMZ523qGF5/
sltK7/OBiUUYFGiCE2IjzSXcv3eRxhpE46bHfqPCncbo/LgEEHrnxKLYkQJyTGiF
nhAQ5jwlGzsgOJAuhDSrsdT5ixYzx8Q27T6+9bWHk7iG2QuFVv7oPqVBUciu8jML
43eSfCz5NxlbzXMW4KBTtpQXJ1y/BFFcRl4BDEV2iXfNt/Qkbc7UBdX52cc9Rh68
r3l0xvfSgTc6gWsULaYtNsh6Jeyhk6ocvoV1C/o1vYWSAEdzrEJSek3V33rybROE
dFVoFMJ0Qyj+Z6gMulPoZgfesKDEqUryr7YjNeOCnAGUp6zDIzaZpJj70qkRdTJT
7MQuXA+OIiVuMJ/egLukSEgsb4n8NjWeb69LWEnTHom59TVu9tDlY4Y36W4rsGMc
3Uek7rn+a2QYYGq+aP2R4+j6Y6+/zKmipPQpUaYo6C1WaBGpPeEDDZzSFX83web9
VBb0FmOL37evy1Sv1VhBT1PX8q5HuRrNs9v23GRWMOnonQ0D87TexiTbdpOMoe8I
uOXlUwtkDyh5pHSJW6dtNogKhe12XMVTalWYws7jPKXsFuNMSAE/5F3GsNoAZ61y
RPnss4WPwW3J0oWJCKAmUDOP+VNFsWValMc+Sa7IDryuxs/Dlh7G4paj7KTA4pVR
+FipaGsUuOK7gbOGChVn
=236F
-----END PGP SIGNATURE-----
Merge tag 'arc-3.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC updates from Vineet Gupta:
"Sorry for the late pull request. Current stuff was ready for a while
but I was hoping to squeeze in support for almost ready ARC SDP
platform (and avoid a 2nd pull request), however it seems there are
still some loose ends which warrant more time.
- Platform code reduction/moving-up (TB10X no longer needs any
callbacks)
- updated boot printing
- kgdb update for arc gdb 7.5
- bug fixes (some marked for stable)
- more code refactoring/consolidation"
* tag 'arc-3.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: boot: cpu feature print enhancements
ARC: boot: consolidate cross-checking of h/w and s/w
ARC: unbork FPU save/restore
ARC: remove extraneous __KERNEL__ guards
ARC: Update order of registers in KGDB to match GDB 7.5
ARC: Remove unneeded Kconfig entry NO_DMA
ARC: BUG() dumps stack after @msg (@msg now same as in generic BUG))
ARC: refactoring: reduce the scope of some local vars
ARC: remove gcc mpy heuristics
ARC: RIP @running_on_hw
ARC: Update comments about uncached address space
ARC: rename kconfig option for unaligned emulation
ARC: [nsimosci] Allow "headless" models to boot
ARC: [arcfpga] Get rid of ARC_BOARD_ANGEL4 and ARC_BOARD_ML509
ARC: [arcfpga] Remove more dead code
ARC: [plat*] move code out of .init_machine into common
ARC: [arcfpga] consolidate machine description, DT
ARC: Allow SMP kernel to build/boot on UP-only infrastructure
Many of the atomic op implementations are the same except for one
instruction; fold the lot into a few CPP macros and reduce LoC.
This also prepares for easy addition of new ops.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Link: http://lkml.kernel.org/r/20140508135851.886055622@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The arc mb() implementation is a compiler barrier(), therefore it all
doesn't matter one way or the other. Simply remove the existing
definitions and use whatever is generated by the defaults.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/n/tip-ua48a59wri3ybz1rz8i7uvbr@git.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Move the barriers functions that depend on the atomic implementation
into the atomic implementation.
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Vineet Gupta <vgupta@synopsys.com> [for arch/arc bits]
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20131213150640.786183683@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This covers the UP / SMP (with no hardware assist for atomic r-m-w) as
well as ARC700 LLOCK/SCOND insns based.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>