millions of inodes cached and has sparse cluster population, removing
inodes from the cluster hash consumes excessive amounts of CPU time.
Reduce the CPU cost by making removal O(1) via use of a double linked list
for the hash chains.
SGI-PV: 951551
SGI-Modid: xfs-linux-melb:xfs-kern:25683a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
nonblock mode with the new IO path code (since 2.6.16).
SGI-PV: 951662
SGI-Modid: xfs-linux-melb:xfs-kern:25676a
Signed-off-by: Nathan Scott <nathans@sgi.com>
* master.kernel.org:/home/rmk/linux-2.6-arm:
[ARM] 3473/1: Use numbers 0-15 for the VFP double registers
[ARM] 3472/1: Use the D variants of FLDMIA/FSTMIA on ARMv6
[ARM] 3471/1: FTOSI functions should return 0 for NaN
[ARM] 3470/1: Clear the HWCAP bits for the disabled kernel features
[ARM] 3469/1: S3C24XX: clkout missing hclk selector
[ARM] 3468/1: S3C2410: SMDK common include fix
[ARM] 3461/1: ARM: OMAP: Fix clk_get() when using id and name
[ARM] 3460/1: ARM: OMAP: Remove unnecessary nop_release()
[ARM] 3459/1: ixp23xx: fix debug serial macros for big-endian operation
[ARM] Allow decompressor to be built with -ffunction-sections
[ARM] Fix SA110/SA1100 cache flushing
[ARM] ebsa110: Fix incorrect serial port address
[ARM] Fix ebsa110 debug macros
[ARM] Move FLUSH_BASE macros to asm/arch/memory.h
[ARM] Remove unnecessary extra parens in include/asm-arm/memory.h
[ARM] arm's arch_local_page_offset() fix against 2.6.17-rc1
* 'upstream-linus' of git://oss.oracle.com/home/sourcebo/git/ocfs2:
[PATCH] CONFIGFS_FS must depend on SYSFS
[PATCH] Bogus NULL pointer check in fs/configfs/dir.c
ocfs2: Better I/O error handling in heartbeat
ocfs2: test and set teardown flag early in user_dlm_destroy_lock()
ocfs2: Handle the DLM_CANCELGRANT case in user_unlock_ast()
ocfs2: catch an invalid ast case in dlmfs
ocfs2: remove an overly aggressive BUG() in dlmfs
ocfs2: multi node truncate fix
Oleg Nesterov spotted two interesting bugs with the current de_thread
code. The simplest is a long standing double decrement of
__get_cpu_var(process_counts) in __unhash_process. Caused by
two processes exiting when only one was created.
The other is that since we no longer detach from the thread_group list
it is possible for do_each_thread when run under the tasklist_lock to
see the same task_struct twice. Once on the task list as a
thread_group_leader, and once on the thread list of another
thread.
The double appearance in do_each_thread can cause a double increment
of mm_core_waiters in zap_threads resulting in problems later on in
coredump_wait.
To remedy those two problems this patch takes the simple approach
of changing the old thread group leader into a child thread.
The only routine in release_task that cares is __unhash_process,
and it can be trivially seen that we handle cleaning up a
thread group leader properly.
Since de_thread doesn't change the pid of the exiting leader process
and instead shares it with the new leader process. I change
thread_group_leader to recognize group leadership based on the
group_leader field and not based on pids. This should also be
slightly cheaper then the existing thread_group_leader macro.
I performed a quick audit and I couldn't see any user of
thread_group_leader that cared about the difference.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch from Catalin Marinas
This patch changes the double registers numbering to 0-15 from even 0-30,
in preparation for future VFP extensions. It also fixes the VFP_REG_ZERO
bug (value 16 actually represents the 8th double register with the original
numbering).
The original mcrr/mrrc on CP10 were generating FMRRS/FMSRR instead of
FMRRD/FMDRR. The patch changes to CP11 for the correct instructions.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Catalin Marinas
The X variants are deprecated starting with ARMv6. Using the D variants,
the fpmx_state in vfp_hard_struct is no longer needed.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Catalin Marinas
The NaN case was dealed with by the "exponent >= ... + 32" condition but it
was not setting the value "d" to 0.
Signed-off-by: Ken'ichi Kuromusha <musha@aplix.co.jp>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Catalin Marinas
Glibc interprets the HWCAP bits and decides on what features to use.
However, even if the features are present in the hardware, they are not
always supported by the kernel and hence the corresponding bits have to be
cleared from the elf_hwcap variable.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
When allocating gid_cache, use kmalloc(sizeof *gid_cache, ...) rather
than kmalloc(sizeof *pkey_cache, ...). It doesn't really matter which
one is used, since the size ends up the same either way, but it's much
better to say what we mean.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch fixes the a compile error with CONFIG_SYSFS=n
Configfs is creating, as a matter of policy, the /sys/kernel/config
mountpoint. This means it requires CONFIG_SYSFS.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
We check the "group" pointer after we dereference it. This check is
bogus, as it cannot be NULL coming in.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Overriding the whole EH code is a per-transport, not per-host thing.
Move ->eh_strategy_handler to the transport class, same as
->eh_timed_out.
Downside is that scsi_host_alloc can't check for the total lack of EH
anymore, but the transition period from old EH where we needed it is
long gone already.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Rohit found an obscure bug causing buddy list corruption.
page_is_buddy is using a non-atomic test (PagePrivate && page_count == 0)
to determine whether or not a free page's buddy is itself free and in the
buddy lists.
Each of the conjuncts may be true at different times due to unrelated
conditions, so the non-atomic page_is_buddy test may find each conjunct to
be true even if they were not both true at the same time (ie. the page was
not on the buddy lists).
Signed-off-by: Martin Bligh <mbligh@google.com>
Signed-off-by: Rohit Seth <rohitseth@google.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We know ipoib_flush_paths() is called from plain process context with
interrupts enabled, since it does wait_for_completion(). So there's
no need to use spin_lock_irqsave() -- spin_lock_irq() is fine.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ib_sa_cancel_query() must be called with priv->lock held since
a completion might arrive and set path->query to NULL.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The PCI spec recommends against drivers playing with a device's PCI
read burst size, and says that systems software should configure it.
And we actually have users that report that changing it from the
default set by BIOS hurts performance and/or stability for them. On
the other hand, the Mellanox Programmer's Reference Manual recommends
turning it up all the way to the maximum value. Some tests conducted
here in the lab do not show performance improvement from this tuning,
but this might be just me.
As a work-around, make this tuning an option, off by default (safe
value), with an eye towards removing it completely one day if no one
complains.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Make IPoIB's send and receive queue sizes tunable via module
parameters ("send_queue_size" and "recv_queue_size"). This allows the
queue sizes to be enlarged to fix disastrously bad performance on some
platforms and workloads, without bloating memory usage when large
queues aren't needed.
Signed-off-by: Shirley Ma <xma@us.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ipoib_mcast_restart_task() might free an mcast object while a join
request is still outstanding, leading to an oops when the query
completes. Fix this by waiting for query to complete, similar to what
ipoib_stop_thread() is doing. The wait for mcast completion code is
consolidated in wait_for_mcast_join().
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Push translation of static rate to HCA format into low-level drivers,
where it belongs. For static rate encoding, use encoding of rate
field from IB standard PathRecord, with addition of value 0, for
backwards compatibility with current usage. The changes are:
- Add enum ib_rate to midlayer includes.
- Get rid of static rate translation in IPoIB; just use static rate
directly from Path and MulticastGroup records.
- Update mthca driver to translate absolute static rate into the
format used by hardware. This also fixes mthca's static rate
handling for HCAs that are capable of 4X DDR.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This fixes the problem of an oops occuring when a user attempts to add a
key to a non-keyring key [CVE-2006-1522].
The problem is that __keyring_search_one() doesn't check that the
keyring it's been given is actually a keyring.
I've fixed this problem by:
(1) declaring that caller of __keyring_search_one() must guarantee that
the keyring is a keyring; and
(2) making key_create_or_update() check that the keyring is a keyring,
and return -ENOTDIR if it isn't.
This can be tested by:
keyctl add user b b `keyctl add user a a @s`
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
add optional input and output offsets to sys_splice(), for seekable file
descriptors:
asmlinkage long sys_splice(int fd_in, loff_t __user *off_in,
int fd_out, loff_t __user *off_out,
size_t len, unsigned int flags);
semantics are straightforward: f_pos will be updated with the offset
provided by user-space, before the splice transfer is about to begin.
Providing a NULL offset pointer means the existing f_pos will be used
(and updated in situ). Providing an offset for a pipe results in
-ESPIPE. Providing an invalid offset pointer results in -EFAULT.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jens Axboe <axboe@suse.de>
separate out the 'internal pipe object' abstraction, and make it
usable to splice. This cleans up and fixes several aspects of the
internal splice APIs and the pipe code:
- pipes: the allocation and freeing of pipe_inode_info is now more symmetric
and more streamlined with existing kernel practices.
- splice: small micro-optimization: less pointer dereferencing in splice
methods
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Update XFS for the ->splice_read/->splice_write changes.
Signed-off-by: Jens Axboe <axboe@suse.de>
We don't want to call into the read-ahead logic unless we are at the
start of a page, _or_ we have multiple pages to read.
Signed-off-by: Jens Axboe <axboe@suse.de>
We can get to out: with a NULL page, which we probably
don't want to be calling page_cache_release() on.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
Otherwise the build breaks with EXPERIMENTAL disabled
because SPARSEMEM will not get selected properly. See
mm/Kconfig for how that works.
Signed-off-by: David S. Miller <davem@davemloft.net>
1) Take doc-book function comment from i386 implementation.
2) cacheline align call_lock, taken from powerpc
3) Need memory barrier after setting call_data
4) Remove timeout
Signed-off-by: David S. Miller <davem@davemloft.net>
GDB uses a PTRACE_PEEKUSR call with offset 0 to see
if a thread is alive, so provide a success return for
this particular special case.
Signed-off-by: David S. Miller <davem@davemloft.net>
Deinline a few functions which produce 200+ bytes of code.
Size Uses Wasted Name and definition
===== ==== ====== ================================================
429 3 818 __inet6_lookup include/net/inet6_hashtables.h
404 2 384 __inet6_lookup_established include/net/inet6_hashtables.h
206 3 372 __inet6_hash include/net/inet6_hashtables.h
Signed-off-by: Denis Vlasenko <vda@ilport.com.ua>
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise we could compute an inaccurate hash due to the
random seed changing.
Noticed by Zach Brown and patch is based upon some feedback
from Herbert Xu.
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Thomas de Grenier de Latour <degrenier@easyconnect.fr>
On Sun, 9 Apr 2006 21:56:59 +0400,
Sergey Vlasov <vsu@altlinux.ru> wrote:
> However, show_address() does not output anything unless
> dev->reg_state == NETREG_REGISTERED - and this state is set by
> netdev_run_todo() only after netdev_register_sysfs() returns, so in
> the meantime (while netdev_register_sysfs() is busy adding the
> "statistics" attribute group) some process may see an empty "address"
> attribute.
I've tried the attached patch, suggested by Sergey Vlasov on
hotplug-devel@, and as far as i can test it works just fine.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the unused EXPORT_SYMBOL(secure_ipv6_port_ephemeral).
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Can't build with CONFIG_NETFILTER=y/m on IA64, there's a missing
#include in net/ipv6/netfilter.c
net/ipv6/netfilter.c: In function `nf_ip6_checksum':
net/ipv6/netfilter.c:92: warning: implicit declaration of function
`csum_ipv6_magic'
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename policer specific _generic_ methods to be specific to
_act_police_
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Speed up SRAM read and write functions if possible by using MMIO
instead of config. cycles. With this change, the post reset signature
done at the end of D3 power change must now be moved before the D3
power change.
IBM reported a problem on powerpc blades during ethtool self test that
was caused by the memory test taking excessively long. Config. cycles
are very slow on powerpc and the memory test can take more than 10
seconds to complete using config. cycles.
David Miller informed me that an earlier version of the patch caused
problems on sparc64 systems with built-in tg3 chips. This version
fixes the problem by excluding all SUN built-in tg3 chips from doing
MMIO SRAM access.
TG3_FLAG_EEPROM_WRITE_PROT is also set unconditionally when
TG3_FLG2_SUN_570X is set. This should be sane as all SUN chips are
built-in and do not require Vaux switching.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kill the TG3_FLAG_NO_{TX|RX}_PSEUDO_CSUM flags because they are not
very useful. This will free up some bits for new flags.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>