When compacting CQ entries, we need to set the correct value of the
ownership bit in case the value is different between the index we copy
the CQE from and the index we copy it to.
Found by Ronni Zimmerman of Mellanox.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The calculation of max_inline_data in set_kernel_sq_size() is bogus,
since it doesn't take into account the fact that inline segments may
not cross a 64-byte boundary, and hence multiple inline segments will
probably need to be used to post large inline sends.
We don't support inline sends for kernel QPs anyway, so there's no
point in doing this calculation anyway, since the field is just zeroed
out a little later. So just delete the bogus calculation.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
New ConnectX firmware introduces FW command interface revision 2,
which requires that for each QP, a chunk of send queue entries (the
"headroom") is kept marked as invalid, so that the HCA doesn't get
confused if it prefetches entries that haven't been posted yet. Add
code to the driver to do this, and also update the user ABI so that
userspace can request that the prefetcher be turned off for userspace
QPs (we just leave the prefetcher on for all kernel QPs).
Unfortunately, marking send queue entries this way is confuses older
firmware, so we change the driver to allow only FW command interface
revisions 2. This means that users will have to update their firmware
to work with the new driver, but the firmware is changing quickly and
the old firmware has lots of other bugs anyway, so this shouldn't be too
big a deal.
Based on a patch from Jack Morgenstein <jackm@dev.mellanox.co.il>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
As Russell helpfully pointed out on linux-arch:
http://marc.info/?l=linux-arch&m=118208089204630&w=2
We were missing the oops_enter/exit() in the sh die() implementation.
As we do support lockdep, it's beneficial to add these calls so lockdep
properly disables itself in the die() case.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
In the routine acpi_ut_create_package_object(), if the
ACPI_ALLOCATE_ZEROED() fails then ACPI_FREE(package_desc) is called as
part of the cleanup. This should instead be
acpi_ut_remove_reference(package_desc) in order to remove the reference
acquired from acpi_ut_create_internal_object() [see the routine
acpi_ut_create_buffer_object() as an example of proper functionality].
Signed-off-by: Myron Stowe <myron.stowe@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
if acpi_bus_get_device() returns NULL, print nothing
instead of "<NUL" in /proc/acpi/thermal_zone/*/trip_points
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>
We use R0 as the 5th argument of syscall. When the syscall restarts
after signal handling, we should restore the old value of R0.
The attached patch does it. Without this patch, I've experienced random
failures in the situation which signals are issued frequently.
Signed-off-by: Kaz Kojima <kkojima@rr.iij4u.or.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Some user space tools need to identify SYSV shared memory when examining
/proc/<pid>/maps. To do so they look for a block device with major zero, a
dentry named SYSV<sysv key>, and having the minor of the internal sysv
shared memory kernel mount.
To help these tools and to make it easier for people just browsing
/proc/<pid>/maps this patch modifies hugetlb sysv shared memory to use the
SYSV<key> dentry naming convention.
User space tools will still have to be aware that hugetlb sysv shared
memory lives on a different internal kernel mount and so has a different
block device minor number from the rest of sysv shared memory.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Albert Cahalan <acahalan@gmail.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Here's another breakage as a result of shared memory stacked files :(
The NUMA policy for a VMA is determined by checking the following (in the
order given):
1) vma->vm_ops->get_policy() (if defined)
2) vma->vm_policy (if defined)
3) task->mempolicy (if defined)
4) Fall back to default_policy
By switching to stacked files for shared memory, get_policy() is now always
set to shm_get_policy which is a wrapper function. This causes us to stop
at step 1, which yields NULL for hugetlb instead of task->mempolicy which
was the previous (and correct) result.
This patch modifies the shm_get_policy() wrapper to maintain steps 1-3 for
the wrapped vm_ops.
(akpm: the refcounting of mempolicies is busted and this patch does nothing to
improve it)
Signed-off-by: Adam Litke <agl@us.ibm.com>
Acked-by: William Irwin <bill.irwin@oracle.com>
Cc: dean gaudet <dean@arctic.org>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have to take care that when we call udf_discard_prealloc() from
udf_clear_inode() we have to write inode ourselves afterwards (otherwise,
some changes might be lost leading to leakage of blocks, use of free blocks
or improperly aligned extents).
Also udf_discard_prealloc() does two different things - it removes
preallocated blocks and truncates the last extent to exactly match i_size.
We move the latter functionality to udf_truncate_tail_extent(), call
udf_discard_prealloc() when last reference to a file is dropped and call
udf_truncate_tail_extent() when inode is being removed from inode cache
(udf_clear_inode() call).
We cannot call udf_truncate_tail_extent() earlier as subsequent open+write
would find the last block of the file mapped and happily write to the end
of it, although the last extent says it's shorter.
[akpm@linux-foundation.org: Make checkpatch.pl happier]
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Eric Sandeen <sandeen@sandeen.net>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If ARCH_KMALLOC_MINALIGN is set to a value greater than 8 (SLUBs smallest
kmalloc cache) then SLUB may generate duplicate slabs in sysfs (yes again)
because the object size is padded to reach ARCH_KMALLOC_MINALIGN. Thus the
size of the small slabs is all the same.
No arch sets ARCH_KMALLOC_MINALIGN larger than 8 though except mips which
for some reason wants a 128 byte alignment.
This patch increases the size of the smallest cache if
ARCH_KMALLOC_MINALIGN is greater than 8. In that case more and more of the
smallest caches are disabled.
If we do that then the count of the active general caches that is displayed
on boot is not correct anymore since we may skip elements of the kmalloc
array. So count them separately.
This approach was tested by Havard yesterday.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some changes done a while ago to avoid pounding on ptep_set_access_flags and
update_mmu_cache in some race situations break sun4c which requires
update_mmu_cache() to always be called on minor faults.
This patch reworks ptep_set_access_flags() semantics, implementations and
callers so that it's now responsible for returning whether an update is
necessary or not (basically whether the PTE actually changed). This allow
fixing the sparc implementation to always return 1 on sun4c.
[akpm@linux-foundation.org: fixes, cleanups]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: David Miller <davem@davemloft.net>
Cc: Mark Fortescue <mark@mtfhpc.demon.co.uk>
Acked-by: William Lee Irwin III <wli@holomorphy.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(As reported by linux@horizon.com)
Folding is done to minimize the theoretical possibility of systematic
weakness in the particular bits of the SHA1 hash output. The result of
this bug is that 16 out of 80 bits are un-folded. Without a major new
vulnerability being found in SHA1, this is harmless, but still worth
fixing.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: <linux@horizon.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The x86_64 a.out.h got a definition of STACK_TOP_MAX, which interferes with
the UML version. So, just undef it like STACK_TOP.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Distros seem to be removing PAGE_SIZE from asm/page.h. So, the libc side of
UML should stop using it.
I replace it with UM_KERN_PAGE_SIZE, which is defined to be the same as
PAGE_SIZE on the kernel side of the house. I could also use getpagesize(),
but it's more important that UML have the same value of PAGE_SIZE everywhere.
It's conceivable that it could be built with a larger PAGE_SIZE, and use of
getpagesize() would break that badly.
PAGE_MASK got the same treatment, as it is closely tied to PAGE_SIZE.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Update two points in the SPI interface documentation:
- Update description of the "chip stays selected after message ends"
mode. In some cases it's required for correctness; it isn't just a
performance tweak. (Yes: to use this mode on mult-device busses, another
programming interface will be needed. One draft has been circulated
already.)
- Clarify spi_setup(), highlighting that callers must ensure that no
requests are queued (can't change configuration except between I/Os), and
that the device must be deselected when this returns (which is a key part
of why it's called during device init).
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If raid1/repair (which reads all block and fixes any differences it finds)
hits a read error, it doesn't reset the bio for writing before writing
correct data back, so the read error isn't fixed, and the device probably
gets a zero-length write which it might complain about.
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1/ When resyncing a degraded raid10 which has more than 2 copies of each block,
garbage can get synced on top of good data.
2/ We round the wrong way in part of the device size calculation, which
can cause confusion.
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix oops triggered during: echo 0 > /proc/sys/kernel/nmi_watchdog
The culprit seems to be 09198e6850:
[PATCH] i386: Clean up NMI watchdog code
In two places, the parameters to release_{evntsel,perfctr}_nmi
got interchanged during the cleanup.
Fix interchanged parameters to release_{evntsel,perfctr}_nmi.
Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Cc: Stephane Eranian <eranian@hpl.hp.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix oops caused by 'cat /dev/snapshot', reported by Arkadiusz Miskiewicz,
and make it impossible to thaw tasks with the help of the swsusp userland
interface while there is a snapshot image ready to save.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix section error (allyesconfig). The exit function is called from init,
so functions that are called by the exit function cannot be marked __exit.
WARNING: drivers/built-in.o(.text+0xe5bc6): Section mismatch: reference to .exit.
text: (between 'toshiba_acpi_exit' and 'hci_raw')
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Len Brown <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The cpuset code to present a list of tasks using a cpuset to user space could
write to an array that it had kmalloc'd, after a kmalloc request of zero size.
The problem was that the code didn't check for writes past the allocated end
of the array until -after- the first write.
This is a race condition that is likely rare -- it would only show up if a
cpuset went from being empty to having a task in it, during the brief time
between the allocation and the first write.
Prior to roughly 2.6.22 kernels, this was also a benign problem, because a
zero kmalloc returned a few usable bytes anyway, and no harm was done with the
bogus write.
With the 2.6.22 kernel changes to make issue a warning if code tries to write
to the location returned from a zero size allocation, this problem is no
longer benign. This cpuset code would occassionally trigger that warning.
The fix is trivial -- check before storing into the array, not after, whether
the array is big enough to hold the store.
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Serge E. Hallyn" <serue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Paul Menage <menage@google.com>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It is not safe to use pte_update_defer() in ptep_test_and_clear_young():
its only user, /proc/<pid>/clear_refs, drops pte lock before flushing TLB.
Use the safe though less efficient pte_update() paravirtop in its place.
Likewise in ptep_test_and_clear_dirty(), though that has no current use.
These are macros (header file dependency stops them from becoming inline
functions), so be more liberal with the underscores and parentheses.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
shmid used to be stored as inode# for shared memory segments. Some of
the proc-ps tools use this from /proc/pid/maps. Recent cleanups
to newseg() changed it. This patch sets inode number back to shared
memory id to fix breakage.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: "Albert Cahalan" <acahalan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The data structure to manage the information gathered about functions
allocating and freeing objects is allocated when the list_lock has already
been taken. We need to allocate with GFP_ATOMIC instead of GFP_KERNEL.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When disabled through /proc/sys/kernel/nmi_watchdog, the NMI watchdog uses the
stop() method directly, which does not decrement the activity counter, leading
to a BUG(). Use the wrapper function instead to fix that.
Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
At system boot time, the NMI watchdog no longer reserved its MSRs, allowing
other subsystems to mess with them. Fix that.
Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Restore tty locked ioctl handler which was replaced with
an unlocked ioctl handler in hung_up_tty_fops by the patch:
commit e10cc1df1d
Author: Paul Fulghum <paulkf@microgate.com>
Date: Thu May 10 22:22:50 2007 -0700
tty: add compat_ioctl
This was reported in:
[Bug 8473] New: Oops: 0010 [1] SMP
The bug is caused by switching to hung_up_tty_fops in do_tty_hangup. An
ioctl call can be waiting on BLK after testing for existence of the locked
ioctl handler in the normal tty fops, but before calling the locked ioctl
handler. If a hangup occurs at that point, the locked ioctl fop is NULL
and an oops occurs.
(akpm: we can remove my debugging code from do_ioctl() now, but it'll be OK to
do that for 2.6.23)
Signed-off-by: Paul Fulghum <paulkf@microgate.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 9b01bd5b28 introduced a
compat_ioctl handler for RADEON_SETPARAM, the sole purpose of which was
to handle the fact that on i386, alignof(uint64_t)==4.
Unfortunately, this handler was installed for _all_ 64-bit
architectures, instead of only x86_64 and ia64. And thus it breaks
32-bit compatibility on every other arch, where 64-bit integers are
aligned to 8 bytes in 32-bit mode just the same as in 64-bit mode.
Arnd has a cunning plan to use 'compat_u64' with appropriate alignment
attributes according to the 32-bit ABI, but for now let's just make the
compat_radeon_cp_setparam routine entirely disappear on 64-bit machines
whose 32-bit compat support isn't for i386. It would be a no-op with
compat_u64 anyway.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dave Airlie <airlied@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes a problem that occurs when packets cannot be sent across
the ieee1394 bus and we return NETDEV_TX_BUSY in the net driver "hard start
xmit" routine ether1394_tx. When we return NETDEV_TX_BUSY the stack will
call ether1394_tx again with the same skb. So we need to restore the header
to look like it did before we munged it for xmit over ieee1394.
[Stefan Richter: changed whitespace, deleted a local variable]
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
generic_ide_resume() should check if dev->driver is not NULL before applying
to_ide_driver() to it. Fix that.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[RXRPC] net/rxrpc/ar-connection.c: fix NULL dereference
[TCP]: Fix logic breakage due to DSACK separation
[TCP]: Congestion control API RTT sampling fix
When building with memory hotplug enabled and cpu hotplug disabled, we
end up with the following section mismatch:
WARNING: mm/built-in.o(.text+0x4e58): Section mismatch: reference to
.init.text: (between 'free_area_init_node' and '__build_all_zonelists')
This happens as a result of:
-> free_area_init_node()
-> free_area_init_core()
-> zone_pcp_init() <-- all __meminit up to this point
-> zone_batchsize() <-- marked as __cpuinit fo
This happens because CONFIG_HOTPLUG_CPU=n sets __cpuinit to __init, but
CONFIG_MEMORY_HOTPLUG=y unsets __meminit.
Changing zone_batchsize() to __devinit fixes this.
__devinit is the only thing that is common between CONFIG_HOTPLUG_CPU=y and
CONFIG_MEMORY_HOTPLUG=y. In the long run, perhaps this should be moved to
another section identifier completely. Without this, memory hot-add
of offline nodes (via hotadd_new_pgdat()) will oops if CPU hotplug is
not also enabled.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
--
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/cooloney/blackfin-2.6: (30 commits)
Blackfin SMC91X ethernet supporting driver: SMC91C111 LEDs are note drived in the kernel like in uboot
Blackfin SPI driver: fix bug SPI DMA incomplete transmission
Blackfin SPI driver: tweak spi cleanup function to match newer kernel changes
Blackfin RTC drivers: update MAINTAINERS information
Blackfin serial driver: decouple PARODD and CMSPAR checking from PARENB
Blackfin serial driver: actually implement the break_ctl() function
Blackfin serial driver: ignore framing and parity errors
Blackfin serial driver: hook up our UARTs STP bit with userspaces CMSPAR
Blackfin arch: move HI/LO macros into blackfin.h and punt the rest of macros.h as it includes VDSP macros we never use
Blackfin arch: redo our linker script a bit
Blackfin arch: make sure we initialize our L1 Data B section properly based on the linked kernel
Blackfin arch: fix bug can not wakeup from sleep via push buttons
Blackfin arch: add support for Alon Bar-Lev's dynamic kernel command-line
Blackfin arch: add missing gpio.h header to fix compiling in some pm configurations
Blackfin arch: As Mike pointed out range goes form m..MAX_BLACKFIN_GPIO -1
Blackfin arch: fix spelling typo in output
Blackfin arch: try to split up functions like this into smaller units according to LKML review
Blackfin arch: add proper ENDPROC()
Blackfin arch: move more of our startup code to .init so it can be freed once we are up and running
Blackfin arch: unify differences between our diff head.S files -- no functional changes
...
* 'splice-2.6.22' of git://git.kernel.dk/data/git/linux-2.6-block:
splice: only check do_wakeup in splice_to_pipe() for a real pipe
splice: fix leak of pages on short splice to pipe
splice: adjust balance_dirty_pages_ratelimited() call
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
[MIPS] Fix builds where MSC01E_xxx is undefined.
[MIPS] Separate performance counter interrupts
[MIPS] Malta: Fix for SOCitSC based Maltas
* 'for-linus' of git://www.atmel.no/~hskinnemoen/linux/kernel/avr32:
[AVR32] Define ARCH_KMALLOC_MINALIGN to L1_CACHE_BYTES
[AVR32] STK1000: Set SPI_MODE_3 in the ltv350qv board info
[AVR32] gpio_*_cansleep() fix
[AVR32] ratelimit segfault reporting rate
SCSI marks internal commands with REQ_PREEMPT and push it at the front
of the request queue using blk_execute_rq(). When entering suspended
or frozen state, SCSI devices are quiesced using
scsi_device_quiesce(). In quiesced state, only REQ_PREEMPT requests
are processed. This is how SCSI blocks other requests out while
suspending and resuming. As all internal commands are pushed at the
front of the queue, this usually works.
Unfortunately, this interacts badly with ordered requeueing. To
preserve request order on requeueing (due to busy device, active EH or
other failures), requests are sorted according to ordered sequence on
requeue if IO barrier is in progress.
The following sequence deadlocks.
1. IO barrier sequence issues.
2. Suspend requested. Queue is quiesced with part or all of IO
barrier sequence at the front.
3. During suspending or resuming, SCSI issues internal command which
gets deferred and requeued for some reason. As the command is
issued after the IO barrier in #1, ordered requeueing code puts the
request after IO barrier sequence.
4. The device is ready to process requests again but still is in
quiesced state and the first request of the queue isn't
REQ_PREEMPT, so command processing is deadlocked -
suspending/resuming waits for the issued request to complete while
the request can't be processed till device is put back into
running state by resuming.
This can be fixed by always putting !fs requests at the front when
requeueing.
The following thread reports this deadlock.
http://thread.gmane.org/gmane.linux.kernel/537473
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: David Greaves <david@dgreaves.com>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes a NULL dereference spotted by the Coverity checker.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 6f74651ae6 is found guilty
of breaking DSACK counting, which should be done only for the
SACK block reported by the DSACK instead of every SACK block
that is received along with DSACK information.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 164891aadf broke RTT
sampling of congestion control modules. Inaccurate timestamps
could be fed to them without providing any way for them to
identify such cases. Previously RTT sampler was called only if
FLAG_RETRANS_DATA_ACKED was not set filtering inaccurate
timestamps nicely. In addition, the new behavior could give an
invalid timestamp (zero) to RTT sampler if only skbs with
TCPCB_RETRANS were ACKed. This solves both problems.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
We only ever set do_wakeup to non-zero if the pipe has an inode
backing, so it's pointless to check outside the pipe->inode
check.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
If the destination pipe is full and we already transferred
data, we break out instead of waiting for more pipe room.
The exit logic looks at spd->nr_pages to see if we moved
everything inside the spd container, but we decrement that
variable in the loop to decide when spd has emptied.
Instead we want to compare to the original page count in
the spd, so cache that in a local variable.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>