Fix a subtle race condition between bnx2_start_xmit() and bnx2_tx_int()
similar to the one in tg3 discovered by Herbert Xu:
CPU0 CPU1
bnx2_start_xmit()
if (tx_ring_full) {
tx_lock
bnx2_tx()
if (!netif_queue_stopped)
netif_stop_queue()
if (!tx_ring_full)
update_tx_ring
netif_wake_queue()
tx_unlock
}
Even though tx_ring is updated before the if statement in bnx2_tx_int() in
program order, it can be re-ordered by the CPU as shown above. This
scenario can cause the tx queue to be stopped forever if bnx2_tx_int() has
just freed up the entire tx_ring. The possibility of this happening
should be very rare though.
The following changes are made, very much identical to the tg3 fix:
1. Add memory barrier to fix the above race condition.
2. Eliminate the private tx_lock altogether and rely solely on
netif_tx_lock. This eliminates one spinlock in bnx2_start_xmit()
when the ring is full.
3. Because of 2, use netif_tx_lock in bnx2_tx_int() before calling
netif_wake_queue().
4. Add memory barrier to bnx2_tx_avail().
5. Add bp->tx_wake_thresh which is set to half the tx ring size.
6. Check for the full wake queue condition before getting
netif_tx_lock in tg3_tx(). This reduces the number of unnecessary
spinlocks when the tx ring is full in a steady-state condition.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
this minor patch fixes the description of net.ipv4.tcp_mem sysctl
in ip-sysctl.txt - the headline names the values "min, pressure, max",
while the description uses the "low, pressure, high" values.
Both tcp_rmem and tcp_wmem descriptions use the "min, pressure, max"
values, so I have changed the tcp_mem to match this and not vice versa.
Signed-off-by: Jan "Yenya" Kasprzak <kas@fi.muni.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a leak of a socket's multicast source filter list structure
on closing a socket with a multicast source filter set on an interface
that does not exist any more.
Signed-off-by: Michal Ruzicka <michal.ruzicka@comstar.cz>
Acked-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Split off __icmpv6_socket's sk->sk_dst_lock class, because it gets
used from softirqs, which is safe for __icmpv6_sockets (because they
never get directly used via userspace syscalls), but unsafe for normal
sockets.
Has no effect on non-lockdep kernels.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
On High end systems (1024 or so cpus) this can potentially cause stack
overflow. Fix the stack usage.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since __vlan_hwaccel_rx() is essentially bypassing the
netif_receive_skb() call that would have occurred if we did the VLAN
decapsulation in software, we are missing the skb_bond() call and the
assosciated checks it does.
Export those checks via an inline function, skb_bond_should_drop(),
and use this in __vlan_hwaccel_rx().
Signed-off-by: David S. Miller <davem@davemloft.net>
Clear HID0[en_attn] at CPU init time on PPC970. Closes CVE-2006-4093.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The code for using the radix tree for reverse mapping of interrupts has
a typo that causes it to create incorrect mappings if the software and
hardware numbers happen to be different. This would, among others, cause
the IDE interrupt to fail on js20's. This fixes it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
- On archs that have no-exec support, we vmalloc() a executable scratch
area of PAGE_SIZE and divide it up into an array of slots of maximum
instruction size for that arch
- On a kprobe registration, the original instruction is copied to the
first available free slot, so if multiple kprobes are registered, chances
are, they get contiguous slots
- On POWER4, due to not having coherent icaches, we could hit a situation
where a probe that is registered on one processor, is hit immediately on
another. This second processor could have fetched the stream of text from
the out-of-line single-stepping area *before* the probe registration
completed, possibly due to an earlier (and a different) kprobe hit and
hence would see stale data at the slot.
Executing such an arbitrary instruction lead to a problem as reported
in LTC bugzilla 23555.
The correct solution is to call flush_icache_range() as soon as the
instruction is copied for out-of-line single-stepping, so the correct
instruction is seen on all processors.
Thanks to Will Schmidt who tracked this down.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Will Schmidt <will_schmidt@vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
To compile kexec on 32-bit we need a few more bits and pieces. Rather
than add empty definitions, we can make crash.c work on 32-bit, with
only a couple of kludges.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We're missing a few functions for kexec to compile on 32-bit. There's
nothing really 64-bit specific about the 64-bit versions, so make them
generic rather than adding empty definitions for 32-bit.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Updating the defconfigs for iseries, pseries, and G5. Sticking with
the defaults, with the following exceptions: I've turned off HW_RANDOM
for all three configs. For G5, I've enabled SND_AOA and friends as
modules; this includes the FABRIC_LAYOUT, ONYX, TAS, TOONIE and
SOUNDBUS* config options.
Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
In the case of a system hang, the user will invoke soft-reset to
initiate the kdump boot. If xmon is enabled, the CPU(s) enter into the
xmon debugger. Unfortunately, the secondary CPU(s) will return to the
hung state when they exit from the debugger (returned from die() ->
system_reset_exception()). This causes a problem in kdump since the
hung CPU(s) will not respond to the IPI sent from kdump. This patch
fixes the issue by calling crash_kexec_secondary() directly from
system_reset_exception() without returning to the previous state. These
secondary CPUs wait 5ms until the kdump boot is started by the primary
CPU. In the case we exited from the debugger to "recover" (command 'x'
in xmon) the primary and the secondary CPUs will all return from die()
-> system_reset_exception() ->crash_kexec_secondary() wait 5ms, then
return to the previous state. A kdump boot is not started in this case.
Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
While going through the code, I found out some memory leaks and potential
crashes in drivers/acpi/hotkey.c Please find the patch to fix them.
This patch does the following,
1. Fixes memory leaks in error paths of hotkey_write_config
2. Fixes freeing unallocated pointers in the error paths of hotkey_write_config
3. Uses a loop instead of linear searching for parsing the userspace
input in get_params
4. Uses array of char * instead of passing 4 pointer parameters
explicitly into the init_{poll_}hotkey_* static functions
Signed-off-by: Andrew Morton <akpm@osdl.org>
Acked-by: Luming Yu <luming.yu@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
A BIOS has been found that resumes from S3 to the routine that invoked suspend,
ignoring the resume vector. This appears to the OS as a failed S3 attempt.
This same system suspend/resume's properly with Windows.
It is possible to invoke the protected mode register restore routine (which
would normally restore the sysenter registers) when the BIOS returns from
S3. This has no effect on a correctly running system and repairs the
damage from the deviant BIOS.
Signed-off-by: William Morrow <william.morrow@amd.com>
Signed-off-by: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Some architectures change $CC in arch/$(ARCH)/Makefile
mips is one example.
That have impact on what options are supported by gcc so move all
$(call cc-option, ...) after include of arch specific Makefile.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
After commit 12bbb2b7be, when SM LID
change or LID change MAD also has a client reregistration bit set,
only CLIENT_REREGISTER event is generated.
As a result, the sa_query module and the cache module don't update the
port information, and ULPs (e.g. IPoIB) stop working. This is the
regression we observe as compared to 2.6.17.
Rather than generate multiple events (which would have negative
performance impact), let us simply let cache and SA query respond to
reregister event in the same way as to LID and SM change events.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In some situations PAV alias devices on LPAR are not accessible.
The initialization procedure required to enable access to PAV alias
devices has to be performed per storage server subsystem and not
only once per storage server.
Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The dasd_page_cache should return page addresses and therefore the
cache must be created with an alignment of PAGE_SIZE.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Change the build options for acpiphp so that it may build without being
dependent on the ACPI_DOCK option, but yet does not allow the option of
acpiphp being built-in when dock is built as a module.
This does not change the previous patch for ACPI_IBM_DOCK Kconfig.
For the following matrix of config options, I built an i386 kernel.
Dock acpiphp should it build? confirmed
y y y y
y n y y
y m y y
m y no - acpiphp should acpiphp was
convert to m converted to m
m n y y
m m y y
n y y y
n n y y
n m y y
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
This is to remove noisy useless message at boot. The message is a ton of
"ACPI Exception (acpi_memory-0492): AE_ERROR, handle is no memory device"
In my emulation, number of memory devices are not so many (only 6), but,
this messages are displayed 114 times.
It is showed by acpi_memory_register_notify_handler() which is called by
acpi_walk_namespace().
acpi_walk_namespace() parses all of ACPI's namespace and execute
acpi_memory_register_notify_handler(). So, it is called for all of the
device which is defined in namespace. If the parsing device is not memory,
acpi_memhotplug ignores it due to "no match" and will parse next device.
This is normal route, not an exception.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Fix acpi_ac/battery boot with acpi=off
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
There is a small but annoying bug in scripts/mod/file2alias.c which causes
it to generate invalid aliases for input devices on 64 bit archs. This causes
joydev.ko to not be automaticly loaded when inserting a joystick, resulting in
a non working joystick (for the average user).
In scripts/mod/file2alias.c is the following code for generating the input
aliases:
static void do_input(char *alias,
kernel_ulong_t *arr, unsigned int min, unsigned int max)
{
unsigned int i;
for (i = min; i < max; i++)
if (arr[i / BITS_PER_LONG] & (1 << (i%BITS_PER_LONG)))
sprintf(alias + strlen(alias), "%X,*", i);
}
On 32 bits systems, this correctly generates "0,*" for the first alias, "8,*"
for the second etc.
However on 64 bits it generates: "0,*20,*" resp "8,*28,*" Notice how it adds 20
+ first entry (hex) ! to the list of hex codes, which is 32 more then the first
entry, thus is because the bit test above wraps at 32 bits instead of 64.
scripts/mod/file2alias.c, line 379 reads:
if (arr[i / BITS_PER_LONG] & (1 << (i%BITS_PER_LONG)))
That should be:
if (arr[i / BITS_PER_LONG] & (1L << (i%BITS_PER_LONG)))
Notice the added 'L' after the 1, otherwise that is an 32 bit int instead of a
64 bit long, and when that int gets shifted >= 32 times, appearantly the number
by which to shift is wrapped at 5 bits ( % 32) causing it to test a bit 32 bits
too low.
The patch below makes the nescesarry 1 char change :)
Signed-off-by: Hans de Goede <j.w.r.degoede@hhs.nl>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Three typos in drivers/char/watchdog/Kconfig...
Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
fcntl(F_SETSIG) no longer works on leases because
lease_release_private_callback() gets called as the lease is copied in
order to initialise it.
The problem is that lease_alloc() performs an unnecessary initialisation,
which sets the lease_manager_ops. Avoid the problem by allocating the
target lease structure using locks_alloc_lock().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Don't let fuse_readpages leave the @pages list not empty when exiting
on error.
[akpm@osdl.org: kernel-doc fixes]
Signed-off-by: Alexander Zarochentsev <zam@namesys.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Use a private lock instead. It protects all per-cpu data structures in
workqueue.c, including the workqueues list.
Fix a bug in schedule_on_each_cpu(): it was forgetting to lock down the
per-cpu resources.
Unfixed long-standing bug: if someone unplugs the CPU identified by
`singlethread_cpu' the kernel will get very sick.
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
linux/backlight.h pulls in header files (eg. ioport.h) that break
compilation of userspace programs. To solve the problem, only include
backlight.h in fb.h if compiling kernel stuff.
Signed-off-by: Michal Januszewski <spock@gentoo.org>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We found this issue last week w/ the -RT kernel, but it seems the same
issue is in mainline as well.
Basically it is possible for futex_unlock_pi to return without actually
freeing the lock. This is due to buggy logic in the use of
futex_handle_fault() and its attempt argument in a failure case.
Looking at futex.c the logic is as follows:
1) In futex_unlock_pi() we start w/ ret=0 and we go down to the first
futex_atomic_cmpxchg_inatomic(), where we find uval==-EFAULT. We then
jump to the pi_faulted label.
2) From pi_faulted: We increment attempt, unlock the sem and hit the
retry label.
3) From the retry label, with ret still zero, we again hit EFAULT on the
first futex_atomic_cmpxchg_inatomic(), and again goto the pi_faulted
label.
4) Again from pi_faulted: we increment attempt and enter the
conditional, where we call futex_handle_fault.
5) futex_handle_fault fails, and we goto the out_unlock_release_sem
label.
6) From out_unlock_release_sem we return, and since ret is still zero,
we return without error, while never actually unlocking the lock.
Issue #1: at the first futex_atomic_cmpxchg_inatomic() we should probably
be setting ret=-EFAULT before jumping to pi_faulted: However in our case
this doesn't really affect anything, as the glibc we're using ignores the
error value from futex_unlock_pi().
Issue #2: Look at futex_handle_fault(), its first conditional will return
-EFAULT if attempt is >= 2. However, from the "if(attempt++)
futex_handle_fault(attempt)" logic above, we'll *never* call
futex_handle_fault when attempt is less then two. So we never get a chance
to even try to fault the page in.
The following patch addresses these two issues by 1) Always setting ret to
-EFAULT if futex_handle_fault fails, and 2) Removing the = in
futex_handle_fault's (attempt >= 2) check.
I'm really not sure this is the right fix, but wanted to bring it up so
folks knew the issue is alive and well in the current -git tree. From
looking at the git logs the logic was first introduced (then later copied
to other places) in the following commit almost a year ago:
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=4732efbeb997189d9f9b04708dc26bf8613ed721;hp=5b039e681b8c5f30aac9cc04385cc94be45d0823
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
sys_getppid() optimization can access a freed memory. On kernels with
DEBUG_SLAB turned ON, this results in Oops. As Dave Hansen noted, this
optimization is also unsafe for memory hotplug.
So this patch always takes the lock to be safe.
[oleg@tv-sign.ru: simplifications]
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Cc: <stable@kernel.org>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Previously the message was "Fatal exception: panic_on_oops", as introduced
in a recent patch whith removed a somewhat dangerous call to ssleep() in
the panic_on_oops path. However, Paul Mackerras suggested that this was
somewhat confusing, leadind people to believe that it was panic_on_oops
that was the root cause of the fatal exception. On his suggestion, this
patch changes the message to simply "Fatal exception". A suitable oops
message should already have been displayed.
Signed-off-by: Simon Horman <horms@verge.net.au>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
kernel/panic.c: In function 'add_taint':
kernel/panic.c:176: warning: implicit declaration of function 'debug_locks_off'
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The percpu variable is used incorrectly in switch_hrtimer_base().
Signed-off-by: Jan Blunck <jblunck@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Eric says:
> I saw an oops down this path when trying to create a new file on a UDF
> filesystem which was internally marked as readonly, but mounted rw:
>
> udf_create
> udf_new_inode
> new_inode
> alloc_inode
> udf_alloc_inode
> udf_new_block
> returns EIO due to readonlyness
> iput (on error)
I ran into the same issue today, but when listing a directory with
invalid/corrupt entries:
udf_lookup
udf_iget
get_new_inode_fast
alloc_inode
udf_alloc_inode
__udf_read_inode
fails for any reason
iput (on error)
...
The following patch to udf_alloc_inode() should take care of both (and
other similar) cases, but I've only tested it with udf_lookup().
Signed-off-by: Dan Bastone <dan@pwienterprises.com>
Cc: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Don't use NULL as a printf control string. Fixes bug #6889.
Cc: Ralph Corderoy <ralph@inputplus.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add basic Machine detection to imacfb and some Ducumentation bits for
imacfb.
Signed-off-by: Edgar Hucek <hostmaster@ed-soft.at>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Enabling PXA DMA for the smc91x on the logicpd pxa270 produces
unacceptable interference with the TFT panel, so disable it. Also
delete the lpd270 versions of the SMC_{in,out}[bl]() macros, as they
aren't used, since the board only supports 16bit accesses.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
The IPv4/IPv6 datagram output path was using skb_trim to trim paged
packets because they know that the packet has not been cloned yet
(since the packet hasn't been given to anything else in the system).
This broke because skb_trim no longer allows paged packets to be
trimmed. Paged packets must be given to one of the pskb_trim functions
instead.
This patch adds a new pskb_trim_unique function to cover the IPv4/IPv6
datagram output path scenario and replaces the corresponding skb_trim
calls with it.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix kernel panic on various SMP machines. The culprit is a null
ub->skb in ulog_send(). If ulog_timer() has already been scheduled on
one CPU and is spinning on the lock, and ipt_ulog_packet() flushes the
queue on another CPU by calling ulog_send() right before it exits,
there will be no skbuff when ulog_timer() acquires the lock and calls
ulog_send(). Cancelling the timer in ulog_send() doesn't help because
it has already been scheduled and is running on the first CPU.
Similar problem exists in ebt_ulog.c and nfnetlink_log.c.
Signed-off-by: Mark Huang <mlhuang@cs.princeton.edu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>