* 'core/iter-div' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
always_inline timespec_add_ns
add an inlined version of iter_div_u64_rem
common implementation of iterative div/mod
Kumar Gala <galak@kernel.crashing.org> wrote:
We have a case in powerpc in which we want to link some library
routines with all module objects. The routines are intended for
handling out-of-line function call register save/restore so having
them as EXPORT_SYMBOL() is counter productive (we do also need to
link the same "library" code into the kernel).
Without this patch a powerpc build would error out and fail
to build modules with the added register save/restore module.
There were two obvious solutions:
1) To link the .o file before the modpost stage
2) To ignore the symbols in modpost
Option 1) was ruled out because we do not have any separate
linking stage for single file modules.
This patch implements option 2 - and do so only for powerpc.
The symbols we ignore are all undefined symbols named:
_restgpr_*, _savegpr_*, _rest32gpr_*, _save32gpr_*
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
(overflow means weight >= 2^32 here, because inv_weigh = 2^32/weight)
A weight of a cfs_rq is the sum of weights of which entities
are queued on this cfs_rq, so it will overflow when there are
too many entities.
Although, overflow occurs very rarely, but it break fairness when
it occurs. 64-bits systems have more memory than 32-bit systems
and 64-bit systems can create more process usually, so overflow may
occur more frequently.
This patch guarantees fairness when overflow happens on 64-bit systems.
Thanks to the optimization of compiler, it changes nothing on 32-bit.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
I found a bug which can be reproduced by this way:(linux-2.6.26-rc5, x86-64)
(use 2^32, 2^33, ...., 2^63 as shares value)
# mkdir /dev/cpuctl
# mount -t cgroup -o cpu cpuctl /dev/cpuctl
# cd /dev/cpuctl
# mkdir sub
# echo 0x8000000000000000 > sub/cpu.shares
# echo $$ > sub/tasks
oops here! divide by zero.
This is because do_div() expects the 2th parameter to be 32 bits,
but unsigned long is 64 bits in x86_64.
Peter Zijstra pointed it out that the sane thing to do is limit the
shares value to something smaller instead of using an even more
expensive divide.
Also, I found another bug about "the shares value is too large":
pid1 and pid2 are set affinity to cpu#0
pid1 is attached to cg1 and pid2 is attached to cg2
if cg1/cpu.shares = 1024 cg2/cpu.shares = 2000000000
then pid2 got 100% usage of cpu, and pid1 0%
if cg1/cpu.shares = 1024 cg2/cpu.shares = 20000000000
then pid2 got 0% usage of cpu, and pid1 100%
And a weight of a cfs_rq is the sum of weights of which entities
are queued on this cfs_rq, so the shares value should be limited
to a smaller value.
I think that (1UL << 18) is a good limited value:
1) it's not too large, we can create a lot of group before overflow
2) it's several times the weight value for nice=-19 (not too small)
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If the RTNL is held when we invoke flush_scheduled_work() we could
deadlock. One such case is linkwatch, it is a work struct which tries
to grab the RTNL semaphore.
The most common case are net driver ->stop() methods. The
simplest conversion is to instead use cancel_{delayed_}work_sync()
explicitly on the various work struct the driver uses.
This is an OK transformation because these work structs are doing
things like resetting the chip, restarting link negotiation, and so
forth. And if we're bringing down the device, we're about to turn the
chip off and reset it anways. So if we cancel a pending work event,
that's fine here.
Some drivers were working around this deadlock by using a msleep()
polling loop of some sort, and those cases are converted to instead
use cancel_{delayed_}work_sync() as well.
Signed-off-by: David S. Miller <davem@davemloft.net>
timespec_add_ns is used from the x86-64 vdso, which cannot call out to
other kernel code. Make sure that timespec_add_ns is always inlined
(and only uses always_inlined functions) to make sure there are no
unexpected calls.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
iter_div_u64_rem is used in the x86-64 vdso, which cannot call other
kernel code. For this case, provide the always_inlined version,
__iter_div_u64_rem.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We have a few instances of the open-coded iterative div/mod loop, used
when we don't expcet the dividend to be much bigger than the divisor.
Unfortunately modern gcc's have the tendency to strength "reduce" this
into a full mod operation, which isn't necessarily any faster, and
even if it were, doesn't exist if gcc implements it in libgcc.
The workaround is to put a dummy asm statement in the loop to prevent
gcc from performing the transformation.
This patch creates a single implementation of this loop, and uses it
to replace the open-coded versions I know about.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Robert Hancock <hancockr@shaw.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On some boards, the mv643xx_eth MAC isn't connected to a PHY but
directly (via the MII/GMII/RGMII interface) to another MAC-layer
device. This patch allows specifying ->phy_addr = -1 to skip all
PHY-related initialisation and run-time poking in that case.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
During OOM, instead of stopping RX refill when the rx desc ring is
not empty, keep trying to refill the ring as long as it is not full
instead.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Some SoCs have the TX bandwidth control registers in a slightly
different place. This patch detects that case at run time, and
re-directs accesses to those registers to the proper place at
run time if needed.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Newer hardware has a 16-bit instead of a 14-bit RX coalescing
count field in the SDMA_CONFIG register. This patch adds a run-time
check for which of the two we have, and adjusts further writes to the
rx coal count field accordingly.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Under some conditions, the TXQ ('TX queue being served') bit can clear
before all packets queued for that TX queue have been transmitted.
This patch enables TXend interrupts, and uses those to re-kick TX
queues that claim to be idle but still have queued descriptors from
the interrupt handler.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
As with the multiple RX queue support, allow the platform code to
specify that the hardware we are running on supports multiple TX
queues. This patch only uses the highest-numbered enabled queue
to send packets to for now, this can be extended later to enable
QoS and such.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Allow the platform code to specify that we are running on hardware
that is capable of supporting multiple RX queues. If this option
is used, initialise all of the given RX queues instead of just RX
queue zero.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Add an interface for the hardware's per-port and per-subqueue
TX rate control. In this stage, this is mainly so that we can
disable the bandwidth limits during initialisation of the port.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
General cleanup of the mv643xx_eth driver. Mainly fixes coding
style / indentation issues, get rid of some useless 'volatile's,
kill some more superfluous comments, and such.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Remove the write-only ->[rt]x_int_coal members from struct
mv643xx_eth_private. In the process, tweak the RX/TX interrupt
mitigation code so that it is compiled by default, and set the
default coalescing delays to 0 usec.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Split all TX queue related state into 'struct tx_queue', in
preparation for multiple TX queue support.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Split all RX queue related state into 'struct rx_queue', in
preparation for multiple RX queue support.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
The per-port mv643xx_eth_private struct had a private instance
of struct net_device_stats that was never ever written to, only
read (via the ethtool statistics interface). This patch gets
rid of the private instance, and tweaks the ethtool statistics
code in mv643xx_eth to use the statistics in struct net_device
instead.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Since they are no longer used, kill enum FUNC_RET_STATUS and
struct pkt_info (which were a rather roundabout way of communicating
RX/TX status within the same driver).
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
rx_return_buff() is also a remnant of the HAL layering that the
original mv643xx_eth driver used. Moving it into its caller kills
the last reference to FUNC_RET_STATUS/pkt_info.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
The port_receive() function is a remnant of the original mv643xx_eth
HAL split. This patch moves port_receive() into its caller, so that
the top and the bottom half of RX processing no longer communicate
via the HAL FUNC_RET_STATUS/pkt_info mechanism abstraction anymore.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Half of the functions in the mv643xx_eth driver are prefixed by
useless and baroque comment blocks on _what_ those functions do (which
is obvious from the code itself) rather than why, and there's no point
in keeping those comments around.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
A bunch of places in the mv643xx_eth driver use the 'mv643xx_'
prefix. Since the mv643xx is a chip that includes more than just
ethernet, this patch makes all those places use either no prefix
(for some internal-use-only functions), or the full 'mv643xx_eth_'
prefix.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
The fact that mv643xx_eth is an ethernet driver is pretty obvious,
and having a lot of internal-use-only functions and defines prefixed
with ETH_/ethernet_/eth_ prefixes is rather pointless. So, get rid
of most of those prefixes.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Remove the unused rx/tx descriptor field defines, and move the ones
that are actually used to the actual definitions of the rx/tx
descriptor format.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
All except one of the port serial status register bit defines are
unused -- kill the unused ones.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
The only user of the ETH_MIB_VERY_LONG_NAME_HERE defines is the
eth_update_mib_counters() function. Get rid of the defines by
open-coding the register offsets in the latter.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Get rid of RX_BUF_OFFSET (which is synonymous with ETH_HW_IP_ALIGN).
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Replace the nondescriptive names ETH_INT_UNMASK_ALL and
ETH_INT_UNMASK_ALL_EXT by names of the actual fields being masked
and unmasked in the various writes to the interrupt mask registers.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
None of the port status register bit defines are ever used in the
mv643xx_eth driver -- nuke them all.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Over half of the port serial control register bit defines are never
used, and the PORT_SERIAL_CONTROL_DEFAULT_VALUE define is never used
either. Keep only those defines that are actually used.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Delete the defines for SDMA config register bit values that are
never used in the driver, to tidy up the code some more.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
The port config extend register is never changed at run time.
Document the meaning of the initial value, and delete the defines
for the individual bits in this register.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
The mv643xx_eth driver only ever changes bit 0 of the port config
register at run time, the rest of the register bits are fixed (and
always zero). Document the meaning of the chosen default value,
and get rid of all the defines for each of the individual bits.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Shorten the various oversized register names in mv643xx_eth.c, to
increase readability.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
This patch performs a reverse topological sort of all functions in
mv643xx_eth.c, so that we can get rid of all forward declarations,
and end up with a more understandable driver due to related functions
being grouped together.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
This patch removes CVS keyword that weren't updated for a long time.
One of them was printed as part of a printk, which also doesn't make
much sense for a 5 year old and no longer updated keyword.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes CVS keywords that weren't updated for a long time
from comments.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>