So copy their contents into the asm-m68k files.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/kkeil/ISDN-2.6:
Add DIP switch readout for HFC-4S IOB4ST
Fix remaining big endian issue of hfcmulti
mISDN cleanup user interface
mISDN fix main ISDN Makefile
This reverts commit f9247273cb (and
fb2e405fc1 - "fix fs/nfs/nfsroot.c
compilation" - that fixed a missed conversion).
The changes cause problems for at least the sparc build. Let's re-do
them when the exact issues are resolved.
Requested-by: Andrew Morton <akpm@linux-foundation.org>
Requested-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 2b14290078, since it
seems to break some other USB storage devices (at least a JMicron USB to
ATA bridge). As such, while it apparently fixes some cardreaders, it
would need to be made conditional on the exact reader it fixes in order
to avoid causing regressions.
Cc: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch makes possible for a driver to specify maximal listen interval
The possibility for user to configure listen interval is not implemented
yet, currently the maximum provided by the driver or 1 is used.
Mac80211 uses config handler to set listen interval for to the driver.
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds the dtim_period in ieee80211_bss_conf, this allows the low
level driver to know the dtim_period, and to plan power save accordingly.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
There are a few places where the RDMA CM code handles IPv6 by doing
struct sockaddr addr;
u8 pad[sizeof(struct sockaddr_in6) -
sizeof(struct sockaddr)];
This is fragile and ugly; handle this in a better way with just
struct sockaddr_storage addr;
[ Also roll in patch from Aleksey Senin <alekseys@voltaire.com> to
switch to struct sockaddr_storage and get rid of padding arrays in
struct rdma_addr. ]
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The ipfragok flag controls whether the packet may be fragmented
either on the local host on beyond. The latter is only valid on
IPv4.
In fact, we never want to do the latter even on IPv4 when PMTU is
enabled. This is because even though we can't fragment packets
within SCTP due to the prtocol's inherent faults, we can still
fragment it at IP layer. By setting the DF bit we will improve
the PMTU process.
RFC 2960 only says that we SHOULD clear the DF bit in this case,
so we're compliant even if we set the DF bit. In fact RFC 4960
no longer has this statement.
Once we make this change, we only need to control the local
fragmentation. There is already a bit in the skb which controls
that, local_df. So this patch sets that instead of using the
ipfragok argument.
The only complication is that there isn't a struct sock object
per transport, so for IPv4 we have to resort to changing the
pmtudisc field for every packet. This should be safe though
as the protocol is single-threaded.
Note that after this patch we can remove ipfragok from the rest
of the stack too.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
from include/asm-powerpc. This is the result of a
mkdir arch/powerpc/include/asm
git mv include/asm-powerpc/* arch/powerpc/include/asm
Followed by a few documentation/comment fixups and a couple of places
where <asm-powepc/...> was being used explicitly. Of the latter only
one was outside the arch code and it is a driver only built for powerpc.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We can simply wrap in to the dev_set/get_drvdata(), there's no reason
to track an extra level of private data on top of the struct device.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
These were completely inconsistent. Clean these up to take a maple_driver
pointer directly for consistency.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
drivers/video/console/promcon.c:158: error: implicit declaration of
function 'con_protect_unimap'
Introduced by commit a29ccf6f82
("embedded: fix vc_translate operator precedence").
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Tim Bird <tim.bird@am.sony.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
It is the only legal environment in which this can be
used.
Add some commentary explaining the situation.
Signed-off-by: David S. Miller <davem@davemloft.net>
No file should be explicitly referencing its own platform headers
by specifying an absolute include path. Fix these paths to use
standard <asm/arch/...> includes.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Move platform independent header files to arch/arm/include/asm, leaving
those in asm/arch* and asm/plat* alone.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Fix both the IHEX firmware generation (len field always null, and EOF
marker a byte too short) and loading (struct ihex_binrec needs to be
packed to reflect the on-disk structure).
Signed-off-by: Marc Zyngier <maz@misterjones.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The channelmap should have the same size on 32 and 64 bit systems
and should not depend on endianess.
Thanks to David Woodhouse for spotting this.
Signed-off-by: Karsten Keil <kkeil@suse.de>
This fixes a bug in operator precedence in the newly introduced vc_translate
macro. Without this fix, the translation of some characters on the
kernel console is garbled.
This patch was copied to the e-mail list previously for testing. Now,
all reports confirm that it works, so this is an official post for
application.
Signed-off-by: Tim Bird <tim.bird@am.sony.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Wire up for FRV the system calls that were added in the last merge window.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wire up system calls added in the last merge window for the MN10300 arch.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'kvm-updates-2.6.27' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
KVM: s390: Fix kvm on IBM System z10
KVM: Advertise synchronized mmu support to userspace
KVM: Synchronize guest physical memory map to host virtual memory map
KVM: Allow browsing memslots with mmu_lock
KVM: Allow reading aliases with mmu_lock
ARCH=h8300:
init/main.c:781: undefined reference to `___early_initcall_end'
Same problem have
__start___bug_table
__stop___bug_table
__tracedata_start
__tracedata_end
__per_cpu_start
__per_cpu_end
When defining a symbol in vmlinux.lds, use the VMLINUX_SYMBOL macro.
VMLINUX_SYMBOL adds a prefix charactor.
You can't just use straight symbol names in common header files as they
dont take into consideration weird arch-specific ABI conventions. in the
case of Blackfin/h8300, the ABI dictates that any C-visible symbols have
an underscore prefixed to them. Thus all symbols in vmlinux.lds.h need to
be wrapped in VMLINUX_SYMBOL() so that each arch can put hide this magic
in their own files.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: "Mike Frysinger" <vapier.adi@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
pata_it821x: Driver updates and reworking
libata.h: replace __FUNCTION__ with __func__
ata_piix: subsys 106b:00a3 is apple ich8m too
libata-core: make sure that ata_force_tbl is freed in case of an error
libata: update atapi disable handling
pata_via: add VX800 flag; add function for fixing h/w bugs
pata_ali: misplaced pci_dev_put()
* 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-pull: (64 commits)
[XFS] Remove vn_revalidate calls in xfs.
[XFS] Now that xfs_setattr is only used for attributes set from ->setattr
[XFS] xfs_setattr currently doesn't just handle the attributes set through
[XFS] fix use after free with external logs or real-time devices
[XFS] A bug was found in xfs_bmap_add_extent_unwritten_real(). In a
[XFS] fix compilation without CONFIG_PROC_FS
[XFS] s/XFS_PURGE_INODE/IRELE/g s/VN_HOLD(XFS_ITOV())/IHOLD()/
[XFS] fix mount option parsing in remount
[XFS] Disable queue flag test in barrier check.
[XFS] streamline init/exit path
[XFS] Fix up problem when CONFIG_XFS_POSIX_ACL is not set and yet we still
[XFS] Don't assert if trying to mount with blocksize > pagesize
[XFS] Don't update mtime on rename source
[XFS] Allow xfs_bmbt_split() to fallback to the lowspace allocator
[XFS] Restore the lowspace extent allocator algorithm
[XFS] use minleft when allocating in xfs_bmbt_split()
[XFS] attrmulti cleanup
[XFS] Check for invalid flags in xfs_attrlist_by_handle.
[XFS] Fix CI lookup in leaf-form directories
[XFS] Use the generic xattr methods.
...
My commit 2b2a1ff64a introduced a regression
(sorry about that) for the odd case of exit_signal=0 (e.g. clone_flags=0).
This is not a normal use, but it's used by a case in the glibc test suite.
Dying with exit_signal=0 sends no signal, but it's supposed to wake up a
parent's blocked wait*() calls (unlike the delayed_group_leader case).
This fixes tracehook_notify_death() and its caller to distinguish a
"signal 0" wakeup from the delayed_group_leader case (with no wakeup).
Signed-off-by: Roland McGrath <roland@redhat.com>
Tested-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://neil.brown.name/md:
md: raid10: wake up frozen array
md: do not count blocked devices as spares
md: do not progress the resync process if the stripe was blocked
md: delay notification of 'active_idle' to the recovery thread
md: fix merge error
md: move async_tx_issue_pending_all outside spin_lock_irq
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
[PATCH] ocfs2: Release mutex in error handling code
[PATCH] ocfs2: Fix oops when racing files truncates with writes into an mmap region
[PATCH 2/2] ocfs2: Fix race between mount and recovery
[PATCH 1/2] ocfs2: Add counter in struct ocfs2_dinode to track journal replays
[PATCH] configfs: Convenience macros for attribute definition.
[PATCH] configfs: Pin configfs subsystems separately from new config_items.
[PATCH] configfs: Fix open directory making rmdir() fail
[PATCH] configfs: Lock new directory inodes before removing on cleanup after failure
[PATCH] configfs: Prevent userspace from creating new entries under attaching directories
[PATCH] configfs: Fix failing symlink() making rmdir() fail
[PATCH] configfs: Fix symlink() to a removing item
[PATCH] configfs: Include linux/err.h in linux/configfs.h
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
md: the bitmap code needs to use blk_plug_device_unlocked()
block: add a blk_plug_device_unlocked() that grabs the queue lock
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
ALSA: ASoC: Export dapm_reg_event() fully
ALSA: ASoC: Update Poodle to current ASoC API
ALSA: asoc: restrict sample rate and size in Freescale MPC8610 sound drivers
ALSA: sound/soc/pxa/tosa.c: removed duplicated include
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (46 commits)
tcp: MD5: Fix IPv6 signatures
skbuff: add missing kernel-doc for do_not_encrypt
net/ipv4/route.c: fix build error
tcp: MD5: Fix MD5 signatures on certain ACK packets
ipv6: Fix ip6_xmit to send fragments if ipfragok is true
ipvs: Move userspace definitions to include/linux/ip_vs.h
netdev: Fix lockdep warnings in multiqueue configurations.
netfilter: xt_hashlimit: fix race between htable_destroy and htable_gc
netfilter: ipt_recent: fix race between recent_mt_destroy and proc manipulations
netfilter: nf_conntrack_tcp: decrease timeouts while data in unacknowledged
irda: replace __FUNCTION__ with __func__
nsc-ircc: default to dongle type 9 on IBM hardware
bluetooth: add quirks for a few hci_usb devices
hysdn: remove the packed attribute from PofTimStamp_tag
isdn: use the common ascii hex helpers
tg3: adapt tg3 to use reworked PCI PM code
atm: fix direct casts of pointers to u32 in the InterPhase driver
atm: fix const assignment/discard warnings in the ATM networking driver
net: use the common ascii hex helpers
random32: seeding improvement
...
blk_plug_device() must be called with the queue lock held, so callers
often just grab and release the lock for that purpose. Add a helper
that does just that.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* git://git.infradead.org/mtd-2.6:
[MTD] [NAND] drivers/mtd/nand/nandsim.c: fix printk warnings
[MTD] [NAND] Blackfin NFC Driver: Cleanup the error exit path of bf5xx_nand_probe function
[MTD] [NAND] Blackfin NFC Driver: use standard dev_err() rather than printk()
[MTD] [NAND] Blackfin NFC Driver: enable Blackfin nand HWECC support by default
[MTD] [NAND] Blackfin NFC Driver: add proper devinit/devexit markings to probe/remove functions
[MTD] [NAND] Blackfin NFC Driver: add support for the ECC layout the Blackfin bootrom uses
[MTD] [NAND] Blackfin NFC Driver: fix bug - hw ecc calc by making sure we extract 11 bits from each register instead of 10
[MTD] [NAND] Blackfin NFC Driver: fix bug - do not clobber the status from the first 256 bytes if operating on 512 pages
[MTD] [NAND] diskonchip.c fix sparse endian warnings
[MTD] [NAND] drivers/mtd/nand/nandsim.c needs div64.h
[JFFS2] Fix allocation of summary buffer
Fix rename of at91_nand -> atmel_nand
[MTD] [NOR] drivers/mtd/chips/jedec_probe.c: fix Am29DL800BB device ID
[MTD] MTD_DEBUG always does compile-time typechecks
[MTD] DataFlash: bugfix, binary page sizes now handled
[MTD] [NAND] fsl_elbc_nand.c: fix printk warning
[MTD] [NAND] nandsim: support random page read command
[MTD] [NAND] fix subpage read for small page NAND
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
[PATCH] pass struct path * to do_add_mount()
[PATCH] switch mtd and dm-table to lookup_bdev()
[patch 3/4] vfs: remove unused nameidata argument of may_create()
[PATCH] devpts: switch to IDA
[PATCH 2/2] proc: switch inode number allocation to IDA
[PATCH 1/2] proc: fix inode number bogorithmetic
[PATCH] fix bdev leak in block_dev.c do_open()
[PATCH] fix races and leaks in vfs_quota_on() users
[PATCH] clean dup2() up a bit
[PATCH] merge locate_fd() and get_unused_fd()
[PATCH] ipv4_static_sysctl_init() should be under CONFIG_SYSCTL
Re: BUG at security/selinux/avc.c:883 (was: Re: linux-next: Tree
* git://git.infradead.org/battery-2.6:
power_supply: Sharp SL-6000 (tosa) batteries support
power_supply: fix up CHARGE_COUNTER output to be more precise
power_supply: add CHARGE_COUNTER property and olpc_battery support for it
power_supply: bump EC version check that we refuse to run with in olpc_battery
power_supply: cleanup of the OLPC battery driver
power_supply: add eeprom dump file to olpc_battery's sysfs
power_supply: Support serial number in olpc_battery
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (28 commits)
mm/hugetlb.c must #include <asm/io.h>
video: Fix up hp6xx driver build regressions.
sh: defconfig updates.
sh: Kill off stray mach-rsk7203 reference.
serial: sh-sci: Fix up SH7760/SH7780/SH7785 early printk regression.
sh: Move out individual boards without mach groups.
sh: Make sure AT_SYSINFO_EHDR is exposed to userspace in asm/auxvec.h.
sh: Allow SH-3 and SH-5 to use common headers.
sh: Provide common CPU headers, prune the SH-2 and SH-2A directories.
sh/maple: clean maple bus code
sh: More header path fixups for mach dir refactoring.
sh: Move out the solution engine headers to arch/sh/include/mach-se/
sh: I2C fix for AP325RXA and Migo-R
sh: Shuffle the board directories in to mach groups.
sh: dma-sh: Fix up dreamcast dma.h mach path.
sh: Switch KBUILD_DEFCONFIG to shx3_defconfig.
sh: Add ARCH_DEFCONFIG entries for sh and sh64.
sh: Fix compile error of Solution Engine
sh: Proper __put_user_asm() size mismatch fix.
sh: Stub in a dummy ENTRY_OFFSET for uImage offset calculation.
...
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
[S390] qeth: avoid use of include/asm-s390
[S390] dont use kthread for smp_rescan_cpus().
[S390] virtio console: fix section mismatch warning.
[S390] cio: Include linux/string.h in schid.h.
[S390] qdio: fix section mismatch bug.
[S390] stp: fix section mismatch warning.
[S390] Remove diag 0x260 call from memory detection.
[S390] qdio: make sure qdr is aligned to page size
[S390] Add support for memory hot-remove.
[S390] Wire up new syscalls.
[S390] cio: Memory allocation for idset changed.
[S390] qeth: preallocated qeth header for hiper socket
[S390] Optimize storage key operations for anon pages
[S390] nohz/sclp: disable timer on synchronous waits.
[S390] ipl: Reboot from alternate device does not work when booting from file
[S390] dasd: Add support for enhanced VM UID
[S390] Remove last P390 trace.
After moving the the include files there were a few clean-ups:
1) Some files used #include <asm-ia64/xyz.h>, changed to <asm/xyz.h>
2) Some comments alerted maintainers to look at various header files to
make matching updates if certain code were to be changed. Updated these
comments to use the new include paths.
3) Some header files mentioned their own names in initial comments. Just
deleted these self references.
Signed-off-by: Tony Luck <tony.luck@intel.com>
* new helper: vfs_quota_on_path(); equivalent of vfs_quota_on() sans the
pathname resolution.
* callers of vfs_quota_on() that do their own pathname resolution and
checks based on it are switched to vfs_quota_on_path(); that way we
avoid the races.
* reiserfs leaked dentry/vfsmount references on several failure exits.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
New primitive: alloc_fd(start, flags). get_unused_fd() and
get_unused_fd_flags() become wrappers on top of it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
schid.h needs string.h for memset and memcmp.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
For anonymous pages without a swap cache backing the check in
page_remove_rmap for the physical dirty bit in page_remove_rmap is
unnecessary. The instructions that are used to check and reset the dirty
bit are expensive. Removing the check noticably speeds up process exit.
In addition the clearing of the dirty bit in __SetPageUptodate is
pointless as well. With these two changes there is no storage key
operation for an anonymous page anymore if it does not hit the swap
space.
The micro benchmark which repeatedly executes an empty shell script
gets about 5% faster.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
sclp_sync_wait wait synchronously for an sclp interrupt and disables
timer interrupts. However on the irq enter paths there is an extra
check if a timer interrupt would be due and calls the timer callback.
This would schedule softirqs in the wrong context.
So introduce local_tick_enable/disable which prevents this.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
During startup we check if diag308 works using diag 308 subcode 6,
which stores the actual ipl information. This fails with rc = 0x102, if
the system has been ipled from the HMC using load from CD or load from file.
In the case of rc = 0x102 we have to assume that diag 308 is working,
since it still can be used to ipl from an alternative device.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add missing kernel-doc notation to sk_buff:
Warning(linux-2.6.27-rc1-git2//include/linux/skbuff.h:345): No description found for parameter 'do_not_encrypt'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current versions of ipvsadm include "/usr/src/linux/include/net/ip_vs.h"
directly. This file also contains kernel-only definitions. Normally, public
definitions should live in include/linux, so this patch moves the
definitions shared with userspace to a new file, "include/linux/ip_vs.h".
This also removes the unused NFC_IPVS_PROPERTY bitmask, which was once
used to point into skb->nfcache.
To make old ipvsadms still compile with this, the old header file includes
the new one.
Thanks to Dave Miller and Horms for noting/adding the missing Kbuild entry
for the new header file.
Signed-off-by: Julius Volz <juliusv@google.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
When support for multiple TX queues were added, the
netif_tx_lock() routines we converted to iterate over
all TX queues and grab each queue's spinlock.
This causes heartburn for lockdep and it's not a healthy
thing to do with lots of TX queues anyways.
So modify this to use a top-level lock and a "frozen"
state for the individual TX queues.
Signed-off-by: David S. Miller <davem@davemloft.net>
Sysfs has the _ATTR() and _ATTR_RO() macros to make defining extended
form attributes easier. configfs should have something similiar.
- _CONFIGFS_ATTR() and _CONFIGFS_ATTR_RO() are the counterparts to the
sysfs macros.
- CONFIGFS_ATTR_STRUCT() creates the extended form attribute structure.
- CONFIGFS_ATTR_OPS() defines the show_attribute()/store_attribute()
operations that call the show()/store() operations of the extended
form configfs_attributes.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
We now use PTR_ERR() in the ->make_item() and ->make_group() operations.
Folks including configfs.h need err.h.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
When we traverse the graph, either forwards or backwards, we
are interested in whether a certain property exists somewhere
in a node reachable in the graph.
Therefore it is never necessary to traverse through a node more
than once to get a correct answer to the given query.
Take advantage of this property using a global ID counter so that we
need not clear all the markers in all the lock_class entries before
doing a traversal. A new ID is choosen when we start to traverse, and
we continue through a lock_class only if it's ID hasn't been marked
with the new value yet.
This short-circuiting is essential especially for high CPU count
systems. The scheduler has a runqueue per cpu, and needs to take
two runqueue locks at a time, which leads to long chains of
backwards and forwards subgraphs from these runqueue lock nodes.
Without the short-circuit implemented here, a graph traversal on
a runqueue lock can take up to (1 << (N - 1)) checks on a system
with N cpus.
For anything more than 16 cpus or so, lockdep will eventually bring
the machine to a complete standstill.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Found an interactivity problem on a quad core test-system - simple
CPU loops would occasionally delay the system un an unacceptable way.
After much debugging with Peter Zijlstra it turned out that the problem
is caused by the string of sched_clock() changes - they caused the CPU
clock to jump backwards a bit - which confuses the scheduler arithmetics.
(which is unsigned for performance reasons)
So revert:
# c300ba2: sched_clock: and multiplier for TSC to gtod drift
# c0c8773: sched_clock: only update deltas with local reads.
# af52a90: sched_clock: stop maximum check on NO HZ
# f7cce27: sched_clock: widen the max and min time
This solves the interactivity problems.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Mike Galbraith <efault@gmx.de>
In order to time out dead connections quicker, keep track of outstanding data
and cap the timeout.
Suggested by Herbert Xu.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Add support for the RDC 1010 variant
- Rework the core library to have a read_id method. This allows the hacky
bits of it821x to go and prepares us for pata_hd
- Switch from WARN to BUG in ata_id_string as it will reboot if you get
it wrong so WARN won't be seen
- Allow the issue of command 0xFC on the 821x. This is needed to query
rebuild status.
- Tidy up printk formatting
- Do more ident rewriting on RAID volumes to handle firmware provided
ident data which is rather wonky
- Report the firmware revision and device layout in RAID mode
- Don't try and disable raid on the 8211 or RDC - they don't have the
relevant bits
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
The new kgdb architecture specific handler registers and unregisters
dynamically for exceptions depending on when you configure a kgdb I/O
driver.
Aside from initializing the exceptions earlier in the boot process,
kgdb should have no impact on a device when it is compiled in so long
as an I/O module is not configured for use.
There have been quite a number of contributors during the existence of
this patch (see arch/mips/kernel/kgdb.c). Most recently Jason
re-wrote the mips kgdb logic to use the die notification handlers.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
This patch explicitly removes the kgdb implementation, for mips which
is intended to be followed by a patch that adds a kgdb implementation
for MIPS that makes use of the kgdb core in the kernel.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* Unify calling of early_serial_txx9_setup.
* Use dedicated serial clock on RBTX4938.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* Random cleanups spotted by checkpatch script.
* Do not initialize panic_timeout. "panic=" kernel parameter can be used.
* Do not add "ip=any" or "ip=bootp". This options is not board specific.
* Do not add "root=/dev/nfs". This is default on CONFIG_ROOT_NFS.
* Kill unused error checking.
* Fix IRQ comment to match current code.
* Kill some unneeded includes
* ST0_ERL is already cleared in generic code.
* conswitchp is initialized generic code.
* __init is not needed in prototype.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Make some TX4938 SoC specific code independent from board specific code.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Make some TX3927 SoC specific code independent from board specific code.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/mm: Lockless get_user_pages_fast() for 64-bit (v3)
powerpc: Don't use the wrong thread_struct for ptrace get/set VSX regs
powerpc: Fix ptrace buffer size for VSX
powerpc: Correctly hookup PTRACE_GET/SETVSRREGS for 32 bit processes
ide/powermac: Fix use of uninitialized pointer on media-bay
powerpc: Allow non-hcall return values for lparcfg writes
ipmi/powerpc: Use linux/of_{device,platform}.h instead of asm
powerpc/fsl: proliferate simple-bus compatibility to soc nodes
Documentation: remove old sbc8260 board specific information
cpm2: Rework baud rate generators configuration to support external clocks.
powerpc: rtc_cmos_setup: assign interrupts only if there is i8259 PIC
cpm_uart: Add generic clock API support to set baudrates
cpm_uart: Modem control lines support
powerpc: implement GPIO LIB API on CPM1 Freescale SoC.
cpm2: Implement GPIO LIB API on CPM2 Freescale SoC.
powerpc: Fix 8xx build failure
powerpc: clean up the Book-E HW watchpoint support
when you take the address of the result. Noticed on a sparc64 compile
using a version 3.4.5 cross compiler.
kernel/time/tick-common.c: In function `tick_check_new_device':
kernel/time/tick-common.c:210: error: invalid lvalue in unary `&'
...
Just make it a regular expression.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits)
net: Make "networking" one-click deselectable.
ipv6: Fix useless proc net sockstat6 removal
tcp: MD5: Use MIB counter instead of warning for MD5 mismatch.
pkt_sched: Fix OOPS on ingress qdisc add.
niu: Fix error checking in niu_ethflow_to_class.
IPv6: datagram_send_ctl() should exit immediately when an error occured
mac80211: fix mesh beaconing
PS3: gelic: use unsigned long for irqflags
mac80211: fix cfg80211 hooks for master interface
nl80211: fix dump callbacks
mac80211: partially fix skb->cb use
rtl8187: Improve wireless statistics for RTL8187B
rtl8187: Fix for TX sequence number problem
mac80211: append CONFIG_ to MAC80211_VERBOSE_PS_DEBUG in net/mac80211/tx.c.
mac80211: fix sparse integer as NULL pointer warning
drivers/net/wireless/iwlwifi/iwl-led.c: printk fix
mac80211: return correct error return from ieee80211_wep_init
mac80211: tx, use dev_kfree_skb_any for beacon_get
rt2x00: Clear queue entry flags during initialization
rt2x00: Force full register config after start()
...
Replace the AMO_t typedef by a direct reference to 'struct amo'.
Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
zap_vma_ptes() is intended to be used by drivers to unmap ptes assigned to the
driver private vmas. This interface is similar to zap_page_range() but is
less general & less likely to be abused.
Needed by the GRU driver.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The file kernel.h contains the upper_32_bits macro. This patch adds the
other part, the lower_32_bits macro. Its first use will be in the driver
for AMD IOMMU.
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a BlackBoard user to connector. BlackBoard is part of the TSP GPL
sampling framework (http://savannah.nongnu.org/p/tsp)
[akpm@linux-foundation.org: add comment]
Signed-off-by: Jerome Arbez-Gindre <jeromearbezgindre@gmail.com>
Acked-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It has no user now
Also print out info about adding/removing active regions.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Molnar provided a fix to not call _PPC at processor driver
initialization time in "[PATCH] ACPI: fix cpufreq regression" (git
commit e4233dec74)
But it can still happen that _PPC is called at processor driver
initialization time.
This patch should make sure that this is not possible anymore.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Avoid one-off errors by introducing a resource_size() function.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Cc: Ben Dooks <ben-linux@fluff.org>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current style for debug messages is to ensure they're always
parsed by the compiler and then subjected to dead code removal.
That way builds won't break only when debug options get enabled,
which is common when they are stripped out early by CPP.
This patch makes CONFIG_MTD_DEBUG adopt that convention.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Current implementation of subpage read feature for NAND has issues with
small page devices. Small page NAND do not support RNDOUT command.
So subpage feature is not applicable for them.
This patch disables support of subpage for small page NAND.
The code is verified on nandsim(SP NAND simulation) and on LP NAND
devices.
Thanks a lot to Artem for finding this issue.
Signed-off-by: Alexey Korolev <akorolev@infradead.org>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This adds a regulator driver for the TI bq24022 Single-Chip
Li-Ion Charger with its nCE and ISET2 pins connected to GPIOs.
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Signed-off-by: Liam Girdwood <lg@opensource.wolfsonmicro.com>
This patch adds support for fixed regulators. This class of regulator is
not software controllable but can coexist on machines with software
controlable regulators.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lg@opensource.wolfsonmicro.com>
This interface is for machine specific code and allows the creation of
voltage/current domains (with constraints) for each regulator. It can
provide regulator constraints that will prevent device damage through
overvoltage or over current caused by buggy client drivers. It also
allows the creation of a regulator tree whereby some regulators are
supplied by others (similar to a clock tree).
Signed-off-by: Liam Girdwood <lg@opensource.wolfsonmicro.com>
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
This allows regulator drivers to register their regulators and provide
operations to the core. It also has a notifier call chain for propagating
regulator events to clients.
Signed-off-by: Liam Girdwood <lg@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Add support to allow consumer device drivers to control their regulator
power supply.
This uses a similar API to the kernel clock interface in that consumer
drivers can get and put a regulator (like they can with clocks atm) and
get/set voltage, current limit, mode, enable and disable. This should
allow consumers complete control over their supply voltage and current
limit. This also compiles out if not in use so drivers can be reused in
systems with no regulator based power control.
Signed-off-by: Liam Girdwood <lg@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Implement lockless get_user_pages_fast for 64-bit powerpc.
Page table existence is guaranteed with RCU, and speculative page references
are used to take a reference to the pages without having a prior existence
guarantee on them.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch fixes mac80211 to not use the skb->cb over the queue step
from virtual interfaces to the master. The patch also, for now,
disables aggregation because that would still require requeuing,
will fix that in a separate patch. There are two other places (software
requeue and powersaving stations) where requeue can happen, but that is
not currently used by any drivers/not possible to use respectively.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Reorder fields in struct rfkill and add comments to make it clear
which fields are protected by rfkill->mutex.
Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Acked-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
I forgot this in the previous patch that made it unused.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
dapm_reg_event() is used by devices using SND_SOC_DAPM_REG() so needs to
be exported to support building them as modules and prototyped to avoid
sparse warnings and potential build issues.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
This patch cleans up the handling of the maple bus queue to remove
the risk of races when adding packets. It also removes references to the
redundant connect and disconnect functions.
Signed-off-by: Adrian McMenamin <adrian@mcmen.demon.co.uk>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This IOMMU helper function doesn't work for some architectures:
http://marc.info/?l=linux-kernel&m=121699304403202&w=2
It also breaks POWER and SPARC builds:
http://marc.info/?l=linux-kernel&m=121730388001890&w=2
Currently, only x86 IOMMUs use this so let's move it to x86 for
now.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Synchronize changes to host virtual addresses which are part of
a KVM memory slot to the KVM shadow mmu. This allows pte operations
like swapping, page migration, and madvise() to transparently work
with KVM.
Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
* 'for-linus' of git://git.o-hand.com/linux-mfd:
mfd: accept pure device as a parent, not only platform_device
mfd: add platform_data to mfd_cell
mfd: Coding style fixes
mfd: Use to_platform_device instead of container_of
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (21 commits)
x86/PCI: use dev_printk when possible
PCI: add D3 power state avoidance quirk
PCI: fix bogus "'device' may be used uninitialized" warning in pci_slot
PCI: add an option to allow ASPM enabled forcibly
PCI: disable ASPM on pre-1.1 PCIe devices
PCI: disable ASPM per ACPI FADT setting
PCI MSI: Don't disable MSIs if the mask bit isn't supported
PCI: handle 64-bit resources better on 32-bit machines
PCI: rewrite PCI BAR reading code
PCI: document pci_target_state
PCI hotplug: fix typo in pcie hotplug output
x86 gart: replace to_pages macro with iommu_num_pages
x86, AMD IOMMU: replace to_pages macro with iommu_num_pages
iommu: add iommu_num_pages helper function
dma-coherent: add documentation to new interfaces
Cris: convert to using generic dma-coherent mem allocator
Sh: use generic per-device coherent dma allocator
ARM: support generic per-device coherent dma mem
Generic dma-coherent: fix DMA_MEMORY_EXCLUSIVE
x86: use generic per-device dma coherent allocator
...
When we read some part of a file through pagecache, if there is a
pagecache of corresponding index but this page is not uptodate, read IO
is issued and this page will be uptodate.
I think this is good for pagesize == blocksize environment but there is
room for improvement on pagesize != blocksize environment. Because in
this case a page can have multiple buffers and even if a page is not
uptodate, some buffers can be uptodate.
So I suggest that when all buffers which correspond to a part of a file
that we want to read are uptodate, use this pagecache and copy data from
this pagecache to user buffer even if a page is not uptodate. This can
reduce read IO and improve system throughput.
I wrote a benchmark program and got result number with this program.
This benchmark do:
1: mount and open a test file.
2: create a 512MB file.
3: close a file and umount.
4: mount and again open a test file.
5: pwrite randomly 300000 times on a test file. offset is aligned
by IO size(1024bytes).
6: measure time of preading randomly 100000 times on a test file.
The result was:
2.6.26
330 sec
2.6.26-patched
226 sec
Arch:i386
Filesystem:ext3
Blocksize:1024 bytes
Memory: 1GB
On ext3/4, a file is written through buffer/block. So random read/write
mixed workloads or random read after random write workloads are optimized
with this patch under pagesize != blocksize environment. This test result
showed this.
The benchmark program is as follows:
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mount.h>
#define LEN 1024
#define LOOP 1024*512 /* 512MB */
main(void)
{
unsigned long i, offset, filesize;
int fd;
char buf[LEN];
time_t t1, t2;
if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
perror("cannot mount\n");
exit(1);
}
memset(buf, 0, LEN);
fd = open("/root/test1/testfile", O_CREAT|O_RDWR|O_TRUNC);
if (fd < 0) {
perror("cannot open file\n");
exit(1);
}
for (i = 0; i < LOOP; i++)
write(fd, buf, LEN);
close(fd);
if (umount("/root/test1/") < 0) {
perror("cannot umount\n");
exit(1);
}
if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
perror("cannot mount\n");
exit(1);
}
fd = open("/root/test1/testfile", O_RDWR);
if (fd < 0) {
perror("cannot open file\n");
exit(1);
}
filesize = LEN * LOOP;
for (i = 0; i < 300000; i++){
offset = (random() % filesize) & (~(LEN - 1));
pwrite(fd, buf, LEN, offset);
}
printf("start test\n");
time(&t1);
for (i = 0; i < 100000; i++){
offset = (random() % filesize) & (~(LEN - 1));
pread(fd, buf, LEN, offset);
}
time(&t2);
printf("%ld sec\n", t2-t1);
close(fd);
if (umount("/root/test1/") < 0) {
perror("cannot umount\n");
exit(1);
}
}
Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@ucw.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The original "Pass the bus number we expect the S3C24XX SPI driver to
attach to via the platform data." [1] patch was mis-sent, and missed two
important parts of the diff, which was to actually set the bus_num field
and add the relevant field to the platform data.
The previous commit 50f426b55d promised to
add a bus_num field, but failed to include the two hunks that added this
field to include/asm-arm/arch-s3c2410/spi.h and then pass it to the spi
core when creating the new master field in drivers/spi/spi_s3c24xx.c.
[1] git commit 50f426b55d
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With KVM/GFP/XPMEM there isn't just the primary CPU MMU pointing to pages.
There are secondary MMUs (with secondary sptes and secondary tlbs) too.
sptes in the kvm case are shadow pagetables, but when I say spte in
mmu-notifier context, I mean "secondary pte". In GRU case there's no
actual secondary pte and there's only a secondary tlb because the GRU
secondary MMU has no knowledge about sptes and every secondary tlb miss
event in the MMU always generates a page fault that has to be resolved by
the CPU (this is not the case of KVM where the a secondary tlb miss will
walk sptes in hardware and it will refill the secondary tlb transparently
to software if the corresponding spte is present). The same way
zap_page_range has to invalidate the pte before freeing the page, the spte
(and secondary tlb) must also be invalidated before any page is freed and
reused.
Currently we take a page_count pin on every page mapped by sptes, but that
means the pages can't be swapped whenever they're mapped by any spte
because they're part of the guest working set. Furthermore a spte unmap
event can immediately lead to a page to be freed when the pin is released
(so requiring the same complex and relatively slow tlb_gather smp safe
logic we have in zap_page_range and that can be avoided completely if the
spte unmap event doesn't require an unpin of the page previously mapped in
the secondary MMU).
The mmu notifiers allow kvm/GRU/XPMEM to attach to the tsk->mm and know
when the VM is swapping or freeing or doing anything on the primary MMU so
that the secondary MMU code can drop sptes before the pages are freed,
avoiding all page pinning and allowing 100% reliable swapping of guest
physical address space. Furthermore it avoids the code that teardown the
mappings of the secondary MMU, to implement a logic like tlb_gather in
zap_page_range that would require many IPI to flush other cpu tlbs, for
each fixed number of spte unmapped.
To make an example: if what happens on the primary MMU is a protection
downgrade (from writeable to wrprotect) the secondary MMU mappings will be
invalidated, and the next secondary-mmu-page-fault will call
get_user_pages and trigger a do_wp_page through get_user_pages if it
called get_user_pages with write=1, and it'll re-establishing an updated
spte or secondary-tlb-mapping on the copied page. Or it will setup a
readonly spte or readonly tlb mapping if it's a guest-read, if it calls
get_user_pages with write=0. This is just an example.
This allows to map any page pointed by any pte (and in turn visible in the
primary CPU MMU), into a secondary MMU (be it a pure tlb like GRU, or an
full MMU with both sptes and secondary-tlb like the shadow-pagetable layer
with kvm), or a remote DMA in software like XPMEM (hence needing of
schedule in XPMEM code to send the invalidate to the remote node, while no
need to schedule in kvm/gru as it's an immediate event like invalidating
primary-mmu pte).
At least for KVM without this patch it's impossible to swap guests
reliably. And having this feature and removing the page pin allows
several other optimizations that simplify life considerably.
Dependencies:
1) mm_take_all_locks() to register the mmu notifier when the whole VM
isn't doing anything with "mm". This allows mmu notifier users to keep
track if the VM is in the middle of the invalidate_range_begin/end
critical section with an atomic counter incraese in range_begin and
decreased in range_end. No secondary MMU page fault is allowed to map
any spte or secondary tlb reference, while the VM is in the middle of
range_begin/end as any page returned by get_user_pages in that critical
section could later immediately be freed without any further
->invalidate_page notification (invalidate_range_begin/end works on
ranges and ->invalidate_page isn't called immediately before freeing
the page). To stop all page freeing and pagetable overwrites the
mmap_sem must be taken in write mode and all other anon_vma/i_mmap
locks must be taken too.
2) It'd be a waste to add branches in the VM if nobody could possibly
run KVM/GRU/XPMEM on the kernel, so mmu notifiers will only enabled if
CONFIG_KVM=m/y. In the current kernel kvm won't yet take advantage of
mmu notifiers, but this already allows to compile a KVM external module
against a kernel with mmu notifiers enabled and from the next pull from
kvm.git we'll start using them. And GRU/XPMEM will also be able to
continue the development by enabling KVM=m in their config, until they
submit all GRU/XPMEM GPLv2 code to the mainline kernel. Then they can
also enable MMU_NOTIFIERS in the same way KVM does it (even if KVM=n).
This guarantees nobody selects MMU_NOTIFIER=y if KVM and GRU and XPMEM
are all =n.
The mmu_notifier_register call can fail because mm_take_all_locks may be
interrupted by a signal and return -EINTR. Because mmu_notifier_reigster
is used when a driver startup, a failure can be gracefully handled. Here
an example of the change applied to kvm to register the mmu notifiers.
Usually when a driver startups other allocations are required anyway and
-ENOMEM failure paths exists already.
struct kvm *kvm_arch_create_vm(void)
{
struct kvm *kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL);
+ int err;
if (!kvm)
return ERR_PTR(-ENOMEM);
INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
+ kvm->arch.mmu_notifier.ops = &kvm_mmu_notifier_ops;
+ err = mmu_notifier_register(&kvm->arch.mmu_notifier, current->mm);
+ if (err) {
+ kfree(kvm);
+ return ERR_PTR(err);
+ }
+
return kvm;
}
mmu_notifier_unregister returns void and it's reliable.
The patch also adds a few needed but missing includes that would prevent
kernel to compile after these changes on non-x86 archs (x86 didn't need
them by luck).
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix mm/filemap_xip.c build]
[akpm@linux-foundation.org: fix mm/mmu_notifier.c build]
Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Kanoj Sarcar <kanojsarcar@yahoo.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Cc: Chris Wright <chrisw@redhat.com>
Cc: Marcelo Tosatti <marcelo@kvack.org>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Izik Eidus <izike@qumranet.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm_take_all_locks holds off reclaim from an entire mm_struct. This allows
mmu notifiers to register into the mm at any time with the guarantee that
no mmu operation is in progress on the mm.
This operation locks against the VM for all pte/vma/mm related operations
that could ever happen on a certain mm. This includes vmtruncate,
try_to_unmap, and all page faults.
The caller must take the mmap_sem in write mode before calling
mm_take_all_locks(). The caller isn't allowed to release the mmap_sem
until mm_drop_all_locks() returns.
mmap_sem in write mode is required in order to block all operations that
could modify pagetables and free pages without need of altering the vma
layout (for example populate_range() with nonlinear vmas). It's also
needed in write mode to avoid new anon_vmas to be associated with existing
vmas.
A single task can't take more than one mm_take_all_locks() in a row or it
would deadlock.
mm_take_all_locks() and mm_drop_all_locks are expensive operations that
may have to take thousand of locks.
mm_take_all_locks() can fail if it's interrupted by signals.
When mmu_notifier_register returns, we must be sure that the driver is
notified if some task is in the middle of a vmtruncate for the 'mm' where
the mmu notifier was registered (mmu_notifier_invalidate_range_start/end
is run around the vmtruncation but mmu_notifier_register can run after
mmu_notifier_invalidate_range_start and before
mmu_notifier_invalidate_range_end). Same problem for rmap paths. And
we've to remove page pinning to avoid replicating the tlb_gather logic
inside KVM (and GRU doesn't work well with page pinning regardless of
needing tlb_gather), so without mm_take_all_locks when vmtruncate frees
the page, kvm would have no way to notice that it mapped into sptes a page
that is going into the freelist without a chance of any further
mmu_notifier notification.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Kanoj Sarcar <kanojsarcar@yahoo.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Cc: Chris Wright <chrisw@redhat.com>
Cc: Marcelo Tosatti <marcelo@kvack.org>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Izik Eidus <izike@qumranet.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adding platform_data to mfd_cell allows passing of platform data directly
to the platform_device created for each cell and thus reuse of existing
drivers.
On the other side it can be used as a hook to mfd_cell itself
removing the need in mfd_get_cell method.
Signed-off-by: Mike Rapoport <mike@compulab.co.il>
Acked-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
This follows the sparc changes a439fe51a1.
Most of the moving about was done with Sam's directions at:
http://marc.info/?l=linux-sh&m=121724823706062&w=2
with subsequent hacking and fixups entirely my fault.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Libata has some hacks to deal with certain controllers going silly in D3
state. The right way to handle this is to keep a PCI device flag for
such devices. That can then be generalised for no ATA devices with power
problems.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Disable ASPM on pre-1.1 PCIe devices, as many of them don't implement it
correctly.
Tested-by: Jack Howarth <howarth@bromo.msbb.uc.edu>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The ACPI FADT table includes an ASPM control bit. If the bit is set, do
not enable ASPM since it may indicate that the platform doesn't actually
support the feature.
Tested-by: Jack Howarth <howarth@bromo.msbb.uc.edu>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
As noted by Adrian:
Commit 3ab8352137 (kexec jump)
causes the following build error on sh:
<-- snip -->
...
CC kernel/kexec.o
{standard input}: Assembler messages:
{standard input}:1518: Error: offset to unaligned destination
make[2]: *** [kernel/kexec.o] Error 1
<-- snip -->
If I understand the assembler correctly it fails at
include/asm-sh/kexec.h:59
The issue here is that the mova reference lacks an explicit alignment,
and previous code paths would end up with this on a 16-bit boundary,
so we make the alignment explicit.
Reported-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Clean up and optimize cpumask_of_cpu(), by sharing all the zero words.
Instead of stupidly generating all possible i=0...NR_CPUS 2^i patterns
creating a huge array of constant bitmasks, realize that the zero words
can be shared.
In other words, on a 64-bit architecture, we only ever need 64 of these
arrays - with a different bit set in one single world (with enough zero
words around it so that we can create any bitmask by just offsetting in
that big array). And then we just put enough zeroes around it that we
can point every single cpumask to be one of those things.
So when we have 4k CPU's, instead of having 4k arrays (of 4k bits each,
with one bit set in each array - 2MB memory total), we have exactly 64
arrays instead, each 8k bits in size (64kB total).
And then we just point cpumask(n) to the right position (which we can
calculate dynamically). Once we have the right arrays, getting
"cpumask(n)" ends up being:
static inline const cpumask_t *get_cpu_mask(unsigned int cpu)
{
const unsigned long *p = cpu_bit_bitmap[1 + cpu % BITS_PER_LONG];
p -= cpu / BITS_PER_LONG;
return (const cpumask_t *)p;
}
This brings other advantages and simplifications as well:
- we are not wasting memory that is just filled with a single bit in
various different places
- we don't need all those games to re-create the arrays in some dense
format, because they're already going to be dense enough.
if we compile a kernel for up to 4k CPU's, "wasting" that 64kB of memory
is a non-issue (especially since by doing this "overlapping" trick we
probably get better cache behaviour anyway).
[ mingo@elte.hu:
Converted Linus's mails into a commit. See:
http://lkml.org/lkml/2008/7/27/156http://lkml.org/lkml/2008/7/28/320
Also applied a family filter - which also has the side-effect of leaving
out the bits where Linus calls me an idio... Oh, never mind ;-)
]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (25 commits)
powerpc: Disable 64K hugetlb support when doing 64K SPU mappings
powerpc/powermac: Fixup default serial port device for pmac_zilog
powerpc/powermac: Use sane default baudrate for SCC debugging
powerpc/mm: Implement _PAGE_SPECIAL & pte_special() for 64-bit
powerpc: Show processor cache information in sysfs
powerpc: Make core id information available to userspace
powerpc: Make core sibling information available to userspace
powerpc/vio: More fallout from dma_mapping_error API change
ibmveth: Fix multiple errors with dma_mapping_error conversion
powerpc/pseries: Fix CMO sysdev attribute API change fallout
powerpc: Enable tracehook for the architecture
powerpc: Add TIF_NOTIFY_RESUME support for tracehook
powerpc: Add asm/syscall.h with the tracehook entry points
powerpc: Make syscall tracing use tracehook.h helpers
powerpc: Call tracehook_signal_handler() when setting up signal frames
powerpc: Update cpu_sibling_maps dynamically
powerpc: register_cpu_online should be __cpuinit
powerpc: kill useless SMT code in prom_hold_cpus
powerpc: Fix 8xx build failure
powerpc: Fix vio build warnings
...
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (72 commits)
sh: SuperH Mobile CEU and camera platform data for AP325RXA
sh: Update smc911x platform data for AP325RXA
sh: SuperH Mobile LCDC platform data for AP325RXA
sh: Add SuperH Mobile CEU platform data for Migo-R
sh: Add SuperH Mobile LCDC platform data for Migo-R
sh: Move asid_cache() out of ifdef to fix SH-3/4 nommu build.
sh: Workaround for __put_user_asm() bug with gcc 4.x on big-endian.
sh: Wire up new syscalls.
sh: fix uImage Entry Point
sh_keysc: remove request_mem_region() and release_mem_region()
sh: Don't miss pending signals returning to user mode after signal processing
sh: Use clk_always_enable() on sh7366
sh: Use clk_always_enable() on sh7343 / SE77343
sh: Use clk_always_enable() on sh7722 / Migo-R / SE7722
sh: Use clk_always_enable() on sh7723 / ap325rxa
sh: Introduce clk_always_enable() function
sh: Show all clocks and their state in /proc/clocks
sh: Merge sh7343 and sh7722 clock code
sh: Add SuperH Mobile MSTPCR bits to clock framework
sh: Use arch_flags to simplify sh7722 siu clock code
...
The CPM2 BRG setup functions cpm_setbrg and cpm2_fastbrg don't support
external clocks. This patch adds a new exported __cpm2_setbrg function
that takes the clock rate and clock source as extra parameters, and moves
cpm_setbrg and cpm2_fastbrg to include/asm-powerpc/cpm2.h where they
become inline wrappers around __cpm2_setbrg.
Signed-off-by: Laurent Pinchart <laurentp@cse-semaphore.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
This patch implement GPIO LIB support for the CPM2 GPIOs. The code can
also be used for CPM1 GPIO port E, as both cores are compatible at the
register level.
Based on earlier work by Laurent Pinchart.
Signed-off-by: Jochen Friedrich <jochen@scram.de>
Cc: Laurent Pinchart <laurentp@cse-semaphore.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Allow the platform data to specify the bus bumber that the
new I2C bus will be given. This is to allow the use of the
board registration mechanism to specify the new style of
I2C device registration which allows boards to provide a
list of attached devices.
Note, as discussed on the mailing list, we have dropped
backwards compatibility of adding an dynamic bus number
as it should not affect most boards to have the bus pinned
to 0 if they have either not specified platform data for
driver. Any board supplying platform data will automatically
have the bus_num field set to 0, and anyone who needs the
driver on a different bus number can supply platform data
to set bus_num.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Add Migo-R specific platform data for on-chip sh7722 CEU and ov772x camera.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Use clk_always_enable() on the sh7343 processor and in the board code
for Solution Engine 7343. Remove duplicate MSTPCR register definitions.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Use clk_always_enable() on the sh7722 processor and in the board code
for Migo-R and Solution Engine 7722. Remove duplicate MSTPCR register
definitions.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Add SuperH specific funcion clk_always_enable(), useful to enable MSTPCR
bits in processor or board specific code.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Handle module stop clock bits in MSTPCRn through the clock framework.
The clocks are named after the bits in the data sheet. The association
between bit number and hardware block is processor specific.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Add arch_flags to struct clk so we can keep per-clock private data
somewhere and share code between multiple clocks.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This adds initial support for the Renesas R0P7785LC0011RL board.
This patch supports 29bit address mode only.
Signed-off-by: Yoshihiro Shimoda <shimoda.yoshihiro@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This patch adds physically contiguous memory chunks to the UIO devices.
The same strategy can be used in the future for the CEU as well.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
updated the following codes for Solution Endine 7343:
- fix compile error in arch/sh/boards/se/7343/irq.c
- add nor flash physmaps
- update defconfig
Signed-off-by: Yoshihiro Shimoda <shimoda.yoshihiro@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This patch is based on interrupt acknowledge code for external
interrupt sources on sh3 processors and adds on sh4a processors.
Signed-off-by: Yoshihiro Shimoda <shimoda.yoshihiro@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Add implementation of flush_icache_range() suitable for signal handler
and kprobes. Remove flush_cache_sigtramp() and change signal.c to use
flush_icache_range().
Signed-off-by: Chris Smith <chris.smith@st.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Add support SH-Ether for Hitachi Solution Engine.
Signed-off-by: Nobuhiro Iwamatsu <iwamatsu.nobuhiro@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This patch contains the following cleanups:
- make the following needlessly global code static:
- cf-enabler.c: cf_init()
- cpu/clock.c: __clk_enable()
- cpu/clock.c: __clk_disable()
- process_32.c: default_idle()
- time_32.c: struct clocksource_sh
- timers/timer-tmu.c: struct tmu_timer_ops
- remove the following unused functions (no CONFIG_BLK_DEV_FD on sh):
- process_{32,64}.c: disable_hlt()
- process_{32,64}.c: enable_hlt()
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
The connect and disconnect functions are unnecessary - everything they do can be
accomplished in the initial probe - so remove them.
Signed-off-by: Adrian McMenamin <adrian@mcmen.demon.co.uk>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This patch adds basic support for the SH7763RDP board.
This supports a basic stuff provided in SH7763, like SCIF,
NOR Flash and USB host.
Signed-off-by: Nobuhiro Iwamatsu <iwamatsu.nobuhiro@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This consolidates everything but the bare assembly routines, which we
will sync up in a follow-up patch.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
16kB is a useful size on nommu, while 64kB still tends to be too big to
be useful. Newer MMUs are likely to support this as well, so plug it
in in anticipation of those, too.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This moves get_fs/set_fs() and friends in to asm/segment.h. The
mm_segment_t definition is likewise consolidated from the _32/_64 split.
This is prepatory groundwork for using the generic address space limit
and verification routines across mmu/nommu configs.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
CONFIG_SUPERH32 is currently trickling into userspace unistd.h. Attached
patch uses __SH5__ define in userspace.
Signed-off-by: Khem Raj <raj.khem@gmail.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This add a dcache entry to the dcache for lookup, but changing the name
that is associated with the entry rather than the one passed in to the
lookup routine.
First, it sees if the case-exact match already exists in the dcache and
uses it if one exists. Otherwise, it allocates a new node with the new
name and splices it into the dcache.
Original code from ntfs_lookup in fs/ntfs/namei.c by Anton Altaparmakov.
Signed-off-by: Barry Naujok <bnaujok@sgi.com>
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Acked-by: Christoph Hellwig <hch@infradead.org>
Implement _PAGE_SPECIAL and pte_special() for 64-bit powerpc. This bit will
be used by the fast get_user_pages() to differenciate PTEs that correspond
to a valid struct page from special mappings that don't such as IO mappings
obtained via io_remap_pfn_ranges().
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Existing Open Firmware practice is to report each processor core as a
separate node in the device tree. Report the value of the "reg" OF
property corresponding to a logical CPU's device node as the core_id
attribute in /sys/devices/system/cpu/cpu*/topology/core_id.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Implement the notion of "core siblings" for powerpc. This makes
/sys/devices/system/cpu/cpu*/topology/core_siblings present sensible
values, indicating online CPUs which share an L2 cache.
BenH: Made cpu_to_l2cache() use of_find_node_by_phandle() instead
of IBM-specific open coded search
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds TIF_NOTIFY_RESUME support for powerpc. When set,
we call tracehook_notify_resume() on the way to user mode.
This overloads do_signal() to do the work, but changes its
arguments to it has the TIF_* bits handy in a register and
drops the useless first argument that was always zero.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Add asm/syscall.h for powerpc with all the required entry points.
This will allow arch-independent tracing code for system calls.
BenH: Fixed up use of regs->trap to properly mask low bit
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The 'powerpc ioremap_prot' broke 8xx builds:
include2/asm/pgtable-ppc32.h:555: error: '_PAGE_WRITETHRU' undeclared (first use in this function)
include2/asm/pgtable-ppc32.h:555: error: (Each undeclared identifier is reported only once
include2/asm/pgtable-ppc32.h:555: error: for each function it appears in.)
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Instead of a "cpu" arg with magic values NR_CPUS (any cpu) and ~0 (all
cpus), pass a cpumask_t. Allow NULL for the common case (where we
don't care which CPU the function is run on): temporary cpumask_t's
are usually considered bad for stack space.
This deprecates stop_machine_run, to be removed soon when all the
callers are dead.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
stop_machine creates a kthread which creates kernel threads. We can
create those threads directly and simplify things a little. Some care
must be taken with CPU hotunplug, which has special needs, but that code
seems more robust than it was in the past.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
-allow stop_mahcine_run() to call a function on all cpus. Calling
stop_machine_run() with a 'ALL_CPUS' invokes this new behavior.
stop_machine_run() proceeds as normal until the calling cpu has
invoked 'fn'. Then, we tell all the other cpus to call 'fn'.
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
CC: Adrian Bunk <bunk@stusta.de>
CC: Andi Kleen <andi@firstfloor.org>
CC: Alexey Dobriyan <adobriyan@gmail.com>
CC: Christoph Hellwig <hch@infradead.org>
CC: mingo@elte.hu
CC: akpm@osdl.org
Simplify the code of include/linux/task_io_accounting.h.
It is also more reasonable to have all the task i/o-related statistics in a
single struct (task_io_accounting).
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The majority of this patch was created by the following script:
***
ASM=arch/sparc/include/asm
mkdir -p $ASM
git mv include/asm-sparc64/ftrace.h $ASM
git rm include/asm-sparc64/*
git mv include/asm-sparc/* $ASM
sed -ie 's/asm-sparc64/asm/g' $ASM/*
sed -ie 's/asm-sparc/asm/g' $ASM/*
***
The rest was an update of the top-level Makefile to use sparc
for header files when sparc64 is being build.
And a small fixlet to pick up the correct unistd.h from
sparc64 code.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
firewire: state userland requirements in Kconfig help
firewire: avoid memleak after phy config transmit failure
firewire: fw-ohci: TSB43AB22/A dualbuffer workaround
firewire: queue the right number of data
firewire: warn on unfinished transactions during card removal
firewire: small fw_fill_request cleanup
firewire: fully initialize fw_transaction before marking it pending
firewire: fix race of bus reset with request transmission
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (59 commits)
[SCSI] replace __FUNCTION__ with __func__
[SCSI] extend the last_sector_bug flag to cover more sectors
[SCSI] qla2xxx: Update version number to 8.02.01-k6.
[SCSI] qla2xxx: Additional NPIV corrections.
[SCSI] qla2xxx: suppress uninitialized-var warning
[SCSI] qla2xxx: use memory_read_from_buffer()
[SCSI] qla2xxx: Issue proper ISP callbacks during stop-firmware.
[SCSI] ch: fix ch_remove oops
[SCSI] 3w-9xxx: add MSI support and misc fixes
[SCSI] scsi_lib: use blk_rq_tagged in scsi_request_fn
[SCSI] ibmvfc: Update driver version to 1.0.1
[SCSI] ibmvfc: Add ADISC support
[SCSI] ibmvfc: Miscellaneous fixes
[SCSI] ibmvfc: Fix hang on module removal
[SCSI] ibmvfc: Target refcounting fixes
[SCSI] ibmvfc: Reduce unnecessary log noise
[SCSI] sym53c8xx: free luntbl in sym_hcb_free
[SCSI] scsi_scan.c: Release mutex in error handling code
[SCSI] scsi_eh_prep_cmnd should save scmd->underflow
[SCSI] sd: Support for SCSI disk (SBC) Data Integrity Field
...
* git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6:
avr32: some mmc/sd cleanups
include/video/atmel_lcdc.h must #include <linux/workqueue.h>
avr32: allow system timer to share interrupt to make OProfile work
drivers/misc/atmel-ssc.c: Removed duplicated include
avr32: Add platform data for AC97C platform device
avr32: clean up mci platform code
fix avr32 build errors
* 'kvm-updates-2.6.27' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
KVM: ppc: fix invalidation of large guest pages
KVM: s390: Fix possible host kernel bug on lctl(g) handling
KVM: s390: Fix instruction naming for lctlg
KVM: s390: Fix program check on interrupt delivery handling
KVM: s390: Change guestaddr type in gaccess
KVM: s390: Fix guest kconfig
KVM: s390: Advertise KVM_CAP_USER_MEMORY
KVM: ia64: Fix irq disabling leak in error handling code
KVM: VMX: Fix undefined beaviour of EPT after reload kvm-intel.ko
KVM: VMX: Fix bypass_guest_pf enabling when disable EPT in module parameter
KVM: task switch: translate guest segment limit to virt-extension byte granular field
KVM: Avoid instruction emulation when event delivery is pending
KVM: task switch: use seg regs provided by subarch instead of reading from GDT
KVM: task switch: segment base is linear address
KVM: SVM: allow enabling/disabling NPT by reloading only the architecture module
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next: (25 commits)
setlocalversion: do not describe if there is nothing to describe
kconfig: fix typos: "Suport" -> "Support"
kconfig: make defconfig is no longer chatty
kconfig: make oldconfig is now less chatty
kconfig: speed up all*config + randconfig
kconfig: set all new symbols automatically
kconfig: add diffconfig utility
kbuild: remove Module.markers during mrproper
kbuild: sparse needs CF not CHECKFLAGS
kernel-doc: handle/strip __init
vmlinux.lds: move __attribute__((__cold__)) functions back into final .text section
init: fix URL of "The GNU Accounting Utilities"
kbuild: add arch/$ARCH/include to search path
kbuild: asm symlink support for arch/$ARCH/include
kbuild: support arch/$ARCH/include for tags, cscope
kbuild: prepare headers_* for arch/$ARCH/include
kbuild: install all headers when arch is changed
kbuild: make clean removes *.o.* as well
kbuild: optimize headers_* targets
kbuild: only one call for include/ in make headers_*
...
Put all i/o statistics in struct proc_io_accounting and use inline functions to
initialize and increment statistics, removing a lot of single variable
assignments.
This also reduces the kernel size as following (with CONFIG_TASK_XACCT=y and
CONFIG_TASK_IO_ACCOUNTING=y).
text data bss dec hex filename
11651 0 0 11651 2d83 kernel/exit.o.before
11619 0 0 11619 2d63 kernel/exit.o.after
10886 132 136 11154 2b92 kernel/fork.o.before
10758 132 136 11026 2b12 kernel/fork.o.after
3082029 807968 4818600 8708597 84e1f5 vmlinux.o.before
3081869 807968 4818600 8708437 84e155 vmlinux.o.after
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Acked-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Piss-poor sysctl registration API strikes again, film at 11...
What we really need is _pathname_ required to be present in already
registered table, so that kernel could warn about bad order. That's the
next target for sysctl stuff (and generally saner and more explicit
order of initialization of ipv[46] internals wouldn't hurt either).
For the time being, here are full fixups required by ..._rotable()
stuff; we make per-net sysctl sets descendents of "ro" one and make sure
that sufficient skeleton is there before we start registering per-net
sysctls.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The last_sector_bug flag was added to work around a bug in certain usb
cardreaders, where they would crash if a multiple sector read included the
last sector. The original implementation avoids this by e.g. splitting an 8
sector read which includes the last sector into a 7 sector read, and a single
sector read for the last sector. The flag is enabled for all USB devices.
This revealed a second bug in other usb cardreaders, which crash when they
get a multiple sector read which stops 1 sector short of the last sector.
Affected hardware includes the Kingston "MobileLite" external USB cardreader
and the internal USB cardreader on the Asus EeePC.
Extend the last_sector_bug workaround to ensure that any access which touches
the last 8 hardware sectors of the device is a single sector long. Requests
are shrunk as necessary to meet this constraint.
This gives us a safety margin against potential unknown or future bugs
affecting multi-sector access to the end of the device. The two known bugs
only affect the last 2 sectors. However, they suggest that these devices
are prone to fencepost errors and that multi-sector access to the end of the
device is not well tested. Popular OS's use multi-sector accesses, but they
rarely read the last few sectors. Linux (with udev & vol_id) automatically
reads sectors from the end of the device on insertion. It is assumed that
single sector accesses are more thoroughly tested during development.
Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Tested-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The VID_TYPE defines are V4L1 specific, so copy them back to videodev.h.
In videodev2.h ensure that they are not used in the kernel (you need
to include videodev.h instead) and mark them are deprecated.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
The type and type2 fields were unused and so could be removed.
Instead add a vfl_type field that contains the type of the video
device.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
SPCA505 and SPCA508 added in the pixel formats.
Decode functions and associated resources removed in spca505, 506 and 508.
The decode routines are now found in the V4L library.
Signed-off-by: Jean-Francois Moine <moinejf@free.fr>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
cacheflush.h was doing:
... VIVT only stuff
... VIPT only stuff
... VIVT or VIPT stuff
which is clearly bogus - we would only ever use the "VIVT or VIPT" case
when both VIVT and VIPT are not selected. Fix this.
Add comments to each case, including noting the impossibility of
correctly detecting the cache type of ARM926 and ARMv6 cores from
the cache type register in the "VIVT or VIPT" case.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
When guest invalidates a large tlb map, there may be more than one
corresponding shadow tlb maps that need to be invalidated. Use eaddr and eend
to find these shadow tlb maps.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
IRQT_* and __IRQT_* were obsoleted long ago by patch [3692/1].
Remove them completely. Sed script for the reference:
s/__IRQT_RISEDGE/IRQ_TYPE_EDGE_RISING/g
s/__IRQT_FALEDGE/IRQ_TYPE_EDGE_FALLING/g
s/__IRQT_LOWLVL/IRQ_TYPE_LEVEL_LOW/g
s/__IRQT_HIGHLVL/IRQ_TYPE_LEVEL_HIGH/g
s/IRQT_RISING/IRQ_TYPE_EDGE_RISING/g
s/IRQT_FALLING/IRQ_TYPE_EDGE_FALLING/g
s/IRQT_BOTHEDGE/IRQ_TYPE_EDGE_BOTH/g
s/IRQT_LOW/IRQ_TYPE_LEVEL_LOW/g
s/IRQT_HIGH/IRQ_TYPE_LEVEL_HIGH/g
s/IRQT_PROBE/IRQ_TYPE_PROBE/g
s/IRQT_NOEDGE/IRQ_TYPE_NONE/g
Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Lets fix the name for the lctlg instruction...
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
All registers are unsigned long types. This patch changes all occurences
of guestaddr in gaccess from u64 to unsigned long.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
If NPT is enabled after loading both KVM modules on AMD and it should be
disabled, both KVM modules must be reloaded. If only the architecture module is
reloaded the behavior is undefined. With this patch it is possible to disable
NPT only by reloading the kvm_amd module.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
mlx4: Update/add Mellanox Technologies copyright lines to mlx4 driver files
mlx4_core: Add VLAN tag field to WQE control segment struct
RDMA/nes: CM connection setup/teardown rework
IPoIB: Correct help text for INFINIBAND_IPOIB_DEBUG
IPoIB/cm: Connected mode is no longer EXPERIMENTAL
RDMA/ucm: BKL is not needed for ib_ucm_open()
RDMA/ucma: BKL is not needed for ucma_open()
* git://git.infradead.org/mtd-2.6: (57 commits)
[MTD] [NAND] subpage read feature as a way to increase performance.
CPUFREQ: S3C24XX NAND driver frequency scaling support.
[MTD][NAND] au1550nd: remove unused variable
[MTD] jedec_probe: Fix SST 16-bit chip detection
[MTD][MTDPART] Fix a division by zero bug
[MTD][MTDPART] Cleanup and document the erase region handling
[MTD][MTDPART] Handle most checkpatch findings
[MTD][MTDPART] Seperate main loop from per-partition code in add_mtd_partition
[MTD] physmap: resume already suspended chips on failure to suspend
[MTD] physmap: Fix suspend/resume/shutdown bugs.
[MTD] [NOR] Fix -ETIMEO errors in CFI driver
[MTD] [NAND] fsl_elbc_nand: fix section mismatch with CONFIG_MTD_OF_PARTS=y
[JFFS2] Use .unlocked_ioctl
[MTD] Fix const assignment in the MTD command line partitioning driver
[MTD] [NOR] gen_probe: No debug message when debugging is disabled
[MTD] [NAND] remove __PPC__ hardcoded address from DiskOnChip drivers
[MTD] [MAPS] Remove the bast-flash driver.
[MTD] [NAND] fsl_elbc_nand: ecclayout cleanups
[MTD] [NAND] fsl_elbc_nand: implement support for flash-based BBT
[MTD] [NAND] fsl_elbc_nand: fix OOB workability for large page NAND chips
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc:
atmel-mci: debugfs support
mmc: Add per-card debugfs support
mmc: Export internal host state through debugfs
imxmmc: fix crash when no platform data is provided
imxmmc: fix platform resources
imxmmc: remove DEBUG definition
mmc_spi: put signals to low power off fix
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (39 commits)
[PATCH] fix RLIM_NOFILE handling
[PATCH] get rid of corner case in dup3() entirely
[PATCH] remove remaining namei_{32,64}.h crap
[PATCH] get rid of indirect users of namei.h
[PATCH] get rid of __user_path_lookup_open
[PATCH] f_count may wrap around
[PATCH] dup3 fix
[PATCH] don't pass nameidata to __ncp_lookup_validate()
[PATCH] don't pass nameidata to gfs2_lookupi()
[PATCH] new (local) helper: user_path_parent()
[PATCH] sanitize __user_walk_fd() et.al.
[PATCH] preparation to __user_walk_fd cleanup
[PATCH] kill nameidata passing to permission(), rename to inode_permission()
[PATCH] take noexec checks to very few callers that care
Re: [PATCH 3/6] vfs: open_exec cleanup
[patch 4/4] vfs: immutable inode checking cleanup
[patch 3/4] fat: dont call notify_change
[patch 2/4] vfs: utimes cleanup
[patch 1/4] vfs: utimes: move owner check into inode_change_ok()
[PATCH] vfs: use kstrdup() and check failing allocation
...
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
netns: fix ip_rt_frag_needed rt_is_expired
netfilter: nf_conntrack_extend: avoid unnecessary "ct->ext" dereferences
netfilter: fix double-free and use-after free
netfilter: arptables in netns for real
netfilter: ip{,6}tables_security: fix future section mismatch
selinux: use nf_register_hooks()
netfilter: ebtables: use nf_register_hooks()
Revert "pkt_sched: sch_sfq: dump a real number of flows"
qeth: use dev->ml_priv instead of dev->priv
syncookies: Make sure ECN is disabled
net: drop unused BUG_TRAP()
net: convert BUG_TRAP to generic WARN_ON
drivers/net: convert BUG_TRAP to generic WARN_ON
Remove the following warning when CONFIG_HUGETLB_PAGE is not set:
ipc/shm.c: In function `shm_get_stat':
ipc/shm.c:565: warning: unused variable `h'
[akpm@linux-foundation.org: use tabs, not spaces]
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes an off-by-one error in a comment.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fs.h needs path.h, not namei.h; nfs_fs.h doesn't need it at all.
Several places in the tree needed direct include.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
make it atomic_long_t; while we are at it, get rid of useless checks in affs,
hfs and hpfs - ->open() always has it equal to 1, ->release() - to 0.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* do not pass nameidata; struct path is all the callers want.
* switch to new helpers:
user_path_at(dfd, pathname, flags, &path)
user_path(pathname, &path)
user_lpath(pathname, &path)
user_path_dir(pathname, &path) (fail if not a directory)
The last 3 are trivial macro wrappers for the first one.
* remove nameidata in callers.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add a new ia_valid flag: ATTR_TIMES_SET, to handle the
UTIMES_OMIT/UTIMES_NOW and UTIMES_NOW/UTIMES_OMIT cases. In these
cases neither ATTR_MTIME_SET nor ATTR_ATIME_SET is in the flags, yet
the POSIX draft specifies that permission checking is performed the
same way as if one or both of the times was explicitly set to a
timestamp.
See the path "vfs: utimensat(): fix error checking for
{UTIME_NOW,UTIME_OMIT} case" by Michael Kerrisk for the patch
introducing this behavior.
This is a cleanup, as well as allowing filesystems (NFS/fuse/...) to
perform their own permission checking instead of the default.
CC: Ulrich Drepper <drepper@redhat.com>
CC: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
- use kstrdup() instead of kmalloc() + memcpy()
- return NULL if allocating ->mnt_devname failed
- mnt_devname should be const
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* MAY_CHDIR is redundant - it's an equivalent of MAY_ACCESS
* MAY_ACCESS on fuse should affect only the last step of pathname resolution
* fchdir() and chroot() should pass MAY_ACCESS, for the same reason why
chdir() needs that.
* now that we pass MAY_ACCESS explicitly in all cases, LOOKUP_ACCESS can be
removed; it has no business being in nameidata.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
... so we ought to pass MAY_CHDIR to vfs_permission() instead of having
it triggered on every step of preceding pathname resolution. LOOKUP_CHDIR
is killed by that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Remove the unused mode parameter from vfs_symlink and callers.
Thanks to Tetsuo Handa for noticing.
CC: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
All calls to remove_suid() are made with a file pointer, because
(similarly to file_update_time) it is called when the file is written.
Clean up callers by passing in a file instead of a dentry.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* kill nameidata * argument; map the 3 bits in ->flags anybody cares
about to new MAY_... ones and pass with the mask.
* kill redundant gfs2_iop_permission()
* sanitize ecryptfs_permission()
* fix remaining places where ->permission() instances might barf on new
MAY_... found in mask.
The obvious next target in that direction is permission(9)
folded fix for nfs_permission() breakage from Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* keep references to ctl_table_head and ctl_table in /proc/sys inodes
* grab the former during operations, use the latter for access to
entry if that succeeds
* have ->d_compare() check if table should be seen for one who does lookup;
that allows us to avoid flipping inodes - if we have the same name resolve
to different things, we'll just keep several dentries and ->d_compare()
will reject the wrong ones.
* have ->lookup() and ->readdir() scan the table of our inode first, then
walk all ctl_table_header and scan ->attached_by for those that are
attached to our directory.
* implement ->getattr().
* get rid of insane amounts of tree-walking
* get rid of the need to know dentry in ->permission() and of the contortions
induced by that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
In a sense, that's the heart of the series. It's based on the following
property of the trees we are actually asked to add: they can be split into
stem that is already covered by registered trees and crown that is entirely
new. IOW, if a/b and a/c/d are introduced by our tree, then a/c is also
introduced by it.
That allows to associate tree and table entry with each node in the union;
while directory nodes might be covered by many trees, only one will cover
the node by its crown. And that will allow much saner logics for /proc/sys
in the next patches. This patch introduces the data structures needed to
keep track of that.
When adding a sysctl table, we find a "parent" one. Which is to say,
find the deepest node on its stem that already is present in one of the
tables from our table set or its ancestor sets. That table will be our
parent and that node in it - attachment point. Add our table to list
anchored in parent, have it refer the parent and contents of attachment
point. Also remember where its crown lives.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Massage ipv4 initialization - make sure that net.ipv4 appears as
non-per-net-namespace before it shows up in per-net-namespace sysctls.
That's the only change outside of sysctl.c needed to get sane ordering
rules and data structures for sysctls (esp. for procfs side of that
mess).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Refcount the sucker; instead of freeing it by the end of unregistration
just drop the refcount and free only when it hits zero. Make sure that
we _always_ make ->unregistering non-NULL in start_unregistering().
That allows anybody to get a reference to such puppy, preventing its
freeing and reuse. It does *not* block unregistration. Anybody who
holds such a reference can
* try to grab a "use" reference (ctl_head_grab()); that will
succeeds if and only if it hadn't entered unregistration yet. If it
succeeds, we can use it in all normal ways until we release the "use"
reference (with ctl_head_finish()). Note that this relies on having
->unregistering become non-NULL in all cases when one starts to unregister
the sucker.
* keep pointers to ctl_table entries; they *can* be freed if
the entire thing is unregistered. However, if ctl_head_grab() succeeds,
we know that unregistration had not happened (and will not happen until
ctl_head_finish()) and such pointers can be used safely.
IOW, now we can have inodes under /proc/sys keep references to ctl_table
entries, protecting them with references to ctl_table_header and
grabbing the latter for the duration of operations that require access
to ctl_table. That won't cause deadlocks, since unregistration will not
be stopped by mere keeping a reference to ctl_table_header.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
New object: set of sysctls [currently - root and per-net-ns].
Contains: pointer to parent set, list of tables and "should I see this set?"
method (->is_seen(set)).
Current lists of tables are subsumed by that; net-ns contains such a beast.
->lookup() for ctl_table_root returns pointer to ctl_table_set instead of
that to ->list of that ctl_table_set.
[folded compile fixes by rdd for configs without sysctl]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
As suggested by Patrick McHardy, introduce a __krealloc() that doesn't
free the original buffer to fix a double-free and use-after-free bug
introduced by me in netfilter that uses RCU.
Reported-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Tested-by: Dieter Ries <clip2@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable support for digital audio processing capability.
This module may be used for special applications that require
cross connecting of bchannels, conferencing, dtmf decoding
echo cancelation, tone generation, and Blowfish encryption and
decryption.
It may use hardware features if available.
Signed-off-by: Karsten Keil <kkeil@suse.de>
For each card successfully added to the bus, create a subdirectory under
the host's debugfs root with information about the card.
At the moment, only a single file is added to the card directory for
all cards: "state". It reflects the "state" field in struct mmc_card,
indicating whether the card is present, readonly, etc.
For MMC and SD cards (not SDIO), another file is added: "status".
Reading this file will ask the card about its current status and
return it. This can be useful if the card just refuses to respond to
any commands, which might indicate that the card state is not what the
MMC core thinks it is (due to a missing stop command, for example.)
Signed-off-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
When CONFIG_DEBUG_FS is set, create a few files under /sys/kernel/debug
containing information about an mmc host's internal state. Currently,
just a single file is created, "ios", which contains information about
the current operating parameters for the bus (clock speed, bus width,
etc.)
Host drivers can add additional files and directories under the host's
root directory by passing the debugfs_root field in struct mmc_host as
the 'parent' parameter to debugfs_create_*.
Signed-off-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, AMD IOMMU: include amd_iommu_last_bdf in device initialization
x86: fix IBM Summit based systems' phys_cpu_present_map on 32-bit kernels
x86, RDC321x: remove gpio.h complications
x86, RDC321x: add to mach-default
crashdump: fix undefined reference to `elfcorehdr_addr'
flag parameters: fix compile error of sys_epoll_create1
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/blackfin-2.6: (30 commits)
Blackfin arch: If we double fault, rather than hang forever, reset
Blackfin arch: When icache is off, make sure people know it
Blackfin arch: Fix bug - skip single step in high priority interrupt handler instead of disabling all interrupts in single step debugging.
Blackfin arch: cache the values of vco/sclk/cclk as the overhead of doing so (~24 bytes) is worth avoiding the software mult/div routines
Blackfin arch: fix bug - IMDMA is not type struct dma_register
Blackfin arch: check the EXTBANKS field of the DDRCTL1 register to see if we are using both memory banks
Blackfin arch: Apply Bluetechnix CM-BF527 board support patch
Blackfin arch: Add unwinding for stack info, and a little more detail on trace buffer
Blackfin arch: Add ISP1760 board resources to BF548-EZKIT
Blackfin arch: fix bug - detect 0.1 silicon revision BF527-EZKIT as 0.0 version
Blackfin arch: add missing IORESOURCE_MEM flags to UART3
Blackfin arch: Add return value check in bfin_sir_probe(), remove SSYNC().
Blackfin arch: Extend sram malloc to handle L2 SRAM.
Blackfin arch: Remove useless config option.
Blackfin arch: change L1 malloc to base on slab cache and lists.
Blackfin arch: use local labels and ENDPROC() markings
Blackfin arch: Do not need this dualcore test module in kernel.
Blackfin arch: Allow ptrace to peek and poke application data in L1 data SRAM.
Blackfin arch: Add ANOMALY_05000368 workaround
Blackfin arch: Functional power management support
...
This patch (as1116) fixes a bug in scsi_eh_prep_cmnd() and
scsi_eh_restore_cmnd(). These routines are supposed to save any
values they change and restore them later, but someone forgot to
save & restore scmd->underflow.
This fixes part of the problem reported in Bugzilla #9638.
[jejb: fix up rejections around DIF/DIX]
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Implement support for DMA of protection information for devices that
are data integrity capable.
- Add support for mapping an extra scatter-gather list containing
the protection information.
- Allocate protection scsi_data_buffer if host is DIX (integrity DMA)
capable.
- Accessor function for checking whether a device has protection
enabled.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Controllers that support DMA of protection information must be told
explicitly how to handle the I/O. The controller has no knowledge of
the protection capabilities of the target device so this information
must be passed in the scsi_cmnd.
- The protection operation tells the HBA whether to generate, strip or
verify protection information.
- The protection type tells the HBA which layout the target is
formatted with. This is necessary because the controller must be
able to correctly interpret the included protection information in
order to verify it.
- When a scsi_cmnd is reused for error handling the protection
operation must be cleared and saved while error handling is in
progress.
- prot_op and prot_type are placed in an existing hole in scsi_cmnd
and don't cause the structure to grow.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Controllers that support protection information must indicate this to
the SCSI midlayer so that the ULD can prepare scsi_cmnds accordingly.
This patch implements a host mask and various types of protection:
- DIF Type 1-3 (between HBA and disk)
- DIX Type 0-3 (between OS and HBA)
The patch also allows the HBA to set the guard type to something
different than the T10-mandated CRC.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
multipath keeps a separate device table which may be
more current than the built-in one.
So we should make sure to always call ->attach whenever
a multipath map with hardware handler is instantiated.
And we should call ->detach on removal, too.
[sekharan: update as per comments from agk]
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
This patch converts the EMC device handler to use a proper
state machine. We now also parse the extended INQUIRY
information to determine if long trespass commands are
supported. And we're now using the long trespass command
correctly. And finally there's now an check at init time
to refuse to attach to devices not supporting EMC-specific
VPD pages.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Instead of having each and every driver implement its own
device table scanning code we should rather implement a common
routine and scan the device tables there.
This allows us also to implement a general notifier chain
callback for all device handler instead for one per handler.
[sekharan: Fix rejections caused by conflicting bug fix]
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Daniel Debonzi reports that he has managed to wrap host_no. Increasing
the number of host numbers available to 32-bit from 16-bit allows the
problem to be evaded for another hundred years.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The following functions can now become static:
- rtc_interrupt()
- rtc_get_rtc_time()
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Bernhard Walle <bwalle@suse.de>
Acked-by: Paul Gortmaker <p_gortmaker@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch makes the following needlessly global code static:
- swap_lock
- nr_swapfiles
- struct swap_list
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch makes the needlessly global print_bad_pte() static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch makes the following needlessly global functions static:
- percpu_depopulate()
- __percpu_depopulate_mask()
- percpu_populate()
- __percpu_populate_mask()
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds the new function task_current_syscall() on machines where the
asm/syscall.h interface is supported (CONFIG_HAVE_ARCH_TRACEHOOK). It's
exported for modules to use in the future. This function safely samples
the state of a blocked thread to collect what system call it is blocked
in, and the six system call argument registers.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This extends wait_task_inactive() with a new argument so it can be used in
a "soft" mode where it will check for the task changing state unexpectedly
and back off. There is no change to existing callers. This lays the
groundwork to allow robust, noninvasive tracing that can try to sample a
blocked thread but back off safely if it wakes up.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds asm-generic/syscall.h, which documents what a real
asm-ARCH/syscall.h file should define. This is not used yet, but will
provide all the machine-dependent details of examining a user system call
about to begin, in progress, or just ended.
Each arch should add an asm-ARCH/syscall.h that defines all the entry
points documented in asm-generic/syscall.h, as short inlines if possible.
This lets us write new tracing code that understands user system call
registers, without any new arch-specific work.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds tracehook.h inlines to enable a new arch feature in support of
user debugging/tracing. This is not used yet, but it lays the groundwork
for a debugger to be able to wrangle a task that's possibly running,
without interrupting its syscalls in progress.
Each arch should define TIF_NOTIFY_RESUME, and in their entry.S code treat
it much like TIF_SIGPENDING. That is, it causes you to take the slow path
when returning to user mode, where you get the full user-mode state
accessible as for signal handling or ptrace. The arch code should check
TIF_NOTIFY_RESUME after handling TIF_SIGPENDING. When it's set, clear it
and then call tracehook_notify_resume().
In future, tracing code will call set_notify_resume() when it wants to get
a callback in tracehook_notify_resume().
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This defines a new hook tracehook_force_sigpending() that lets tracing
code decide to force TIF_SIGPENDING on in recalc_sigpending().
This is not used yet, so it compiles away to nothing for now. It lays the
groundwork for new tracing code that can interrupt a task synthetically
without actually sending a signal.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This moves the ptrace logic in task death (exit_notify) into tracehook.h
inlines. Some code is rearranged slightly to make things nicer. There is
no change, only cleanup.
There is one hook called with the tasklist_lock write-locked, as ptrace
needs. There is also a new hook called after exit_state changes and
without locks. This is a better place for tracing work to be in the
future, since it doesn't delay the whole system with locking.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This defines the tracehook_notify_jctl() hook to formalize the ptrace
effects on the job control notifications. There is no change, only
cleanup.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This defines the tracehook_get_signal() hook to allow tracing code to slip
in before normal signal dequeuing. This lays the groundwork for new
tracing features that can inject synthetic signals outside the normal
queue or control the disposition of delivered signals. The calling
convention lets tracehook_get_signal() decide both exactly what will
happen and what signal number to report in the handler/exit.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds standard tracehook.h inlines for arch code to call when
TIF_SYSCALL_TRACE has been set. This replaces having each arch implement
the ptrace guts for its syscall tracing support.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This defines tracehook_consider_fatal_signal() has a fine-grained hook for
deciding to skip the special cases for a fatal signal, as ptrace does.
There is no change, only cleanup.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This defines tracehook_consider_ignored_signal() has a fine-grained hook
for deciding to prevent the normal short-circuit of sending an ignored
signal, as ptrace does. There is no change, only cleanup.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This defines tracehook_signal_handler() as a hook for the arch signal
handling code to call. It gives ptrace the opportunity to stop for a
pseudo-single-step trap immediately after signal handler setup is done.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds tracehook_expect_breakpoints() as a formal hook for the nommu
code to use for its, "Is text-poking likely?" check at mmap time. This
names the actual semantics the code means to test, and documents it.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds the tracehook_tracer_task() hook to consolidate all forms of
"Who is using ptrace on me?" logic. This is used for "TracerPid:" in
/proc and for permission checks. We also clean up the selinux code the
called an identical accessor.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This moves the ptrace-related logic from release_task into tracehook.h and
ptrace.h inlines. It provides clean hooks both before and after locking
tasklist_lock, for future tracing logic to do more cleanup without the
lock.
This also changes release_task() itself in the rare "zap_leader" case to
set the leader to EXIT_DEAD before iterating. This maintains the
invariant that release_task() only ever handles a task in EXIT_DEAD. This
is a common-sense invariant that is already always true except in this one
arcane case of zombie leader whose parent ignores SIGCHLD.
This change is harmless and only costs one store in this one rare case.
It keeps the expected state more consisently sane, which is nicer when
debugging weirdness in release_task(). It also lets some future code in
the tracehook entry points rely on this invariant for bookkeeping.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This moves the PTRACE_EVENT_VFORK_DONE tracing into a tracehook.h inline,
tracehook_report_vfork_done(). The change has no effect, just clean-up.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This moves all the ptrace initialization and tracing logic for task
creation into tracehook.h and ptrace.h inlines. It reorganizes the code
slightly, but should not change any behavior.
There are four tracehook entry points, at each important stage of task
creation. This keeps the interface from the core fork.c code fairly
clean, while supporting the complex setup required for ptrace or something
like it.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This moves the PTRACE_EVENT_EXIT tracing into a tracehook.h inline,
tracehook_report_exec(). The change has no effect, just clean-up.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This moves all the ptrace hooks related to exec into tracehook.h inlines.
This also lifts the calls for tracing out of the binfmt load_binary hooks
into search_binary_handler() after it calls into the binfmt module. This
change has no effect, since all the binfmt modules' load_binary functions
did the call at the end on success, and now search_binary_handler() does
it immediately after return if successful. We consolidate the repeated
code, and binfmt modules no longer need to import ptrace_notify().
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch series introduces the "tracehook" interface layer of inlines in
<linux/tracehook.h>. There are more details in the log entry for patch
01/23 and in the header file comments inside that patch. Most of these
changes move code around with little or no change, and they should not
break anything or change any behavior.
This sets a new standard for uniform arch support to enable clean
arch-independent implementations of new debugging and tracing stuff,
denoted by CONFIG_HAVE_ARCH_TRACEHOOK. Patch 20/23 adds that symbol to
arch/Kconfig, with comments listing everything an arch has to do before
setting "select HAVE_ARCH_TRACEHOOK". These are elaborted a bit at:
http://sourceware.org/systemtap/wiki/utrace/arch/HowTo
The new inlines that arch code must define or call have detailed kerneldoc
comments in the generic header files that say what is required.
No arch is obligated to do any work, and no arch's build should be broken
by these changes. There are several steps that each arch should take so
it can set HAVE_ARCH_TRACEHOOK. Most of these are simple. Providing this
support will let new things people add for doing debugging and tracing of
user-level threads "just work" for your arch in the future. For an arch
that does not provide HAVE_ARCH_TRACEHOOK, some new options for such
features will not be available for config.
I have done some arch work and will submit this to the arch maintainers
after the generic tracehook series settles in. For now, that work is
available in my GIT repositories, and in patch and mbox-of-patches form at
http://people.redhat.com/roland/utrace/2.6-current/
This paves the way for my "utrace" work, to be submitted later. But it is
not innately tied to that. I hope that the tracehook series can go in
soon regardless of what eventually does or doesn't go on top of it. For
anyone implementing any kind of new tracing/debugging plan, or just
understanding all the context of the existing ptrace implementation,
having tracehook.h makes things much easier to find and understand.
This patch:
This adds the new kernel-internal header file <linux/tracehook.h>. This
is not yet used at all. The comments in the header introduce what the
following series of patches is about.
The aim is to formalize and consolidate all the places that the core
kernel code and the arch code now ties into the ptrace implementation.
These patches mostly don't cause any functional change. They just move
the details of ptrace logic out of core code into tracehook.h inlines,
where they are mostly compiled away to the same as before. All that
changes is that everything is thoroughly documented and any future
reworking of ptrace, or addition of something new, would not have to touch
core code all over, just change the tracehook.h inlines.
The new linux/ptrace.h inlines are used by the following patches in the
new tracehook_*() inlines. Using these helpers for the ptrace event stops
makes it simple to change or disable the old ptrace implementation of
these stops conditionally later.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kmem cache passed to constructor is only needed for constructors that are
themselves multiplexeres. Nobody uses this "feature", nor does anybody uses
passed kmem cache in non-trivial way, so pass only pointer to object.
Non-trivial places are:
arch/powerpc/mm/init_64.c
arch/powerpc/mm/hugetlbpage.c
This is flag day, yes.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Jon Tollefson <kniht@linux.vnet.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Matt Mackall <mpm@selenic.com>
[akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
[akpm@linux-foundation.org: fix mm/slab.c]
[akpm@linux-foundation.org: fix ubifs]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mapping->tree_lock has no read lockers. convert the lock from an rwlock
to a spinlock.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If we can be sure that elevating the page_count on a pagecache page will
pin it, we can speculatively run this operation, and subsequently check to
see if we hit the right page rather than relying on holding a lock or
otherwise pinning a reference to the page.
This can be done if get_page/put_page behaves consistently throughout the
whole tree (ie. if we "get" the page after it has been used for something
else, we must be able to free it with a put_page).
Actually, there is a period where the count behaves differently: when the
page is free or if it is a constituent page of a compound page. We need
an atomic_inc_not_zero operation to ensure we don't try to grab the page
in either case.
This patch introduces the core locking protocol to the pagecache (ie.
adds page_cache_get_speculative, and tweaks some update-side code to make
it work).
Thanks to Hugh for pointing out an improvement to the algorithm setting
page_count to zero when we have control of all references, in order to
hold off speculative getters.
[kamezawa.hiroyu@jp.fujitsu.com: fix migration_entry_wait()]
[hugh@veritas.com: fix add_to_page_cache]
[akpm@linux-foundation.org: repair a comment]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce gang_lookup_slot() and gang_lookup_slot_tag() functions, which
are used by lockless pagecache.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement get_user_pages_fast without locking in the fastpath on x86.
Do an optimistic lockless pagetable walk, without taking mmap_sem or any
page table locks or even mmap_sem. Page table existence is guaranteed by
turning interrupts off (combined with the fact that we're always looking
up the current mm, means we can do the lockless page table walk within the
constraints of the TLB shootdown design). Basically we can do this
lockless pagetable walk in a similar manner to the way the CPU's pagetable
walker does not have to take any locks to find present ptes.
This patch (combined with the subsequent ones to convert direct IO to use
it) was found to give about 10% performance improvement on a 2 socket 8
core Intel Xeon system running an OLTP workload on DB2 v9.5
"To test the effects of the patch, an OLTP workload was run on an IBM
x3850 M2 server with 2 processors (quad-core Intel Xeon processors at
2.93 GHz) using IBM DB2 v9.5 running Linux 2.6.24rc7 kernel. Comparing
runs with and without the patch resulted in an overall performance
benefit of ~9.8%. Correspondingly, oprofiles showed that samples from
__up_read and __down_read routines that is seen during thread contention
for system resources was reduced from 2.8% down to .05%. Monitoring the
/proc/vmstat output from the patched run showed that the counter for
fast_gup contained a very high number while the fast_gup_slow value was
zero."
(fast_gup is the old name for get_user_pages_fast, fast_gup_slow is a
counter we had for the number of times the slowpath was invoked).
The main reason for the improvement is that DB2 has multiple threads each
issuing direct-IO. Direct-IO uses get_user_pages, and thus the threads
contend the mmap_sem cacheline, and can also contend on page table locks.
I would anticipate larger performance gains on larger systems, however I
think DB2 uses an adaptive mix of threads and processes, so it could be
that thread contention remains pretty constant as machine size increases.
In which case, we stuck with "only" a 10% gain.
The downside of using get_user_pages_fast is that if there is not a pte
with the correct permissions for the access, we end up falling back to
get_user_pages and so the get_user_pages_fast is a bit of extra work.
However this should not be the common case in most performance critical
code.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: Kconfig fix]
[akpm@linux-foundation.org: Makefile fix/cleanup]
[akpm@linux-foundation.org: warning fix]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce a new get_user_pages_fast mm API, which is basically a
get_user_pages with a less general API (but still tends to be suited to
the common case):
- task and mm are always current and current->mm
- force is always 0
- pages is always non-NULL
- don't pass back vmas
This restricted API can be implemented in a much more scalable way on many
architectures when the ptes are present, by walking the page tables
locklessly (no mmap_sem or page table locks). When the ptes are not
populated, get_user_pages_fast() could be slower.
This is implemented locklessly on x86, and used in some key direct IO call
sites, in later patches, which provides nearly 10% performance improvement
on a threaded database workload.
Lots of other code could use this too, depending on use cases (eg. grep
drivers/). And it might inspire some new and clever ways to use it.
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement the pte_special bit for x86. This is required to support
lockless get_user_pages, because we need to know whether or not we can
refcount a particular page given only its pte (and no vma).
[hugh@veritas.com: fix a BUG]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Allows one to create and use a channel with no associated files. Files
can be initialized later. This is useful in scenarios such as logging in
early code, before VFS is up. Therefore, such channels can be created and
used as soon as kmem_cache_init() completed.
This is needed by kmemtrace to do tracing in early kernel code.
[kosaki.motohiro@jp.fujitsu.com: build fix]
Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A previous patch added the early_initcall(), to allow a cleaner hooking of
pre-SMP initcalls. Now we remove the older interface, converting all
existing users to the new one.
[akpm@linux-foundation.org: cleanups]
[akpm@linux-foundation.org: build fix]
[kosaki.motohiro@jp.fujitsu.com: warning fix]
[kosaki.motohiro@jp.fujitsu.com: warning fix]
Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Added early initcall (pre-SMP) support, using an identical interface to
that of regular initcalls. Functions called from do_pre_smp_initcalls()
could be converted to use this cleaner interface.
This is required by CPU hotplug, because early users have to register
notifiers before going SMP. One such CPU hotplug user is the relay
interface with buffer-only channels, which needs to register such a
notifier, to be usable in early code. This in turn is used by kmemtrace.
Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch implements devices state save/restore before after kexec.
This patch together with features in kexec_jump patch can be used for
following:
- A simple hibernation implementation without ACPI support. You can kexec a
hibernating kernel, save the memory image of original system and shutdown
the system. When resuming, you restore the memory image of original system
via ordinary kexec load then jump back.
- Kernel/system debug through making system snapshot. You can make system
snapshot, jump back, do some thing and make another system snapshot.
- Cooperative multi-kernel/system. With kexec jump, you can switch between
several kernels/systems quickly without boot process except the first time.
This appears like swap a whole kernel/system out/in.
- A general method to call program in physical mode (paging turning
off). This can be used to invoke BIOS code under Linux.
The following user-space tools can be used with kexec jump:
- kexec-tools needs to be patched to support kexec jump. The patches
and the precompiled kexec can be download from the following URL:
source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2
patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2
binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10
- makedumpfile with patches are used as memory image saving tool, it
can exclude free pages from original kernel memory image file. The
patches and the precompiled makedumpfile can be download from the
following URL:
source: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile-src_cvs_kh10.tar.bz2
patches: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile-patches_cvs_kh10.tar.bz2
binary: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile_cvs_kh10
- An initramfs image can be used as the root file system of kexeced
kernel. An initramfs image built with "BuildRoot" can be downloaded
from the following URL:
initramfs image: http://khibernation.sourceforge.net/download/release_v10/initramfs/rootfs_cvs_kh10.gz
All user space tools above are included in the initramfs image.
Usage example of simple hibernation:
1. Compile and install patched kernel with following options selected:
CONFIG_X86_32=y
CONFIG_RELOCATABLE=y
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_PM=y
CONFIG_HIBERNATION=y
CONFIG_KEXEC_JUMP=y
2. Build an initramfs image contains kexec-tool and makedumpfile, or
download the pre-built initramfs image, called rootfs.gz in
following text.
3. Prepare a partition to save memory image of original kernel, called
hibernating partition in following text.
4. Boot kernel compiled in step 1 (kernel A).
5. In the kernel A, load kernel compiled in step 1 (kernel B) with
/sbin/kexec. The shell command line can be as follow:
/sbin/kexec --load-preserve-context /boot/bzImage --mem-min=0x100000
--mem-max=0xffffff --initrd=rootfs.gz
6. Boot the kernel B with following shell command line:
/sbin/kexec -e
7. The kernel B will boot as normal kexec. In kernel B the memory
image of kernel A can be saved into hibernating partition as
follow:
jump_back_entry=`cat /proc/cmdline | tr ' ' '\n' | grep kexec_jump_back_entry | cut -d '='`
echo $jump_back_entry > kexec_jump_back_entry
cp /proc/vmcore dump.elf
Then you can shutdown the machine as normal.
8. Boot kernel compiled in step 1 (kernel C). Use the rootfs.gz as
root file system.
9. In kernel C, load the memory image of kernel A as follow:
/sbin/kexec -l --args-none --entry=`cat kexec_jump_back_entry` dump.elf
10. Jump back to the kernel A as follow:
/sbin/kexec -e
Then, kernel A is resumed.
Implementation point:
To support jumping between two kernels, before jumping to (executing)
the new kernel and jumping back to the original kernel, the devices
are put into quiescent state, and the state of devices and CPU is
saved. After jumping back from kexeced kernel and jumping to the new
kernel, the state of devices and CPU are restored accordingly. The
devices/CPU state save/restore code of software suspend is called to
implement corresponding function.
Known issues:
- Because the segment number supported by sys_kexec_load is limited,
hibernation image with many segments may not be load. This is
planned to be eliminated by adding a new flag to sys_kexec_load to
make a image can be loaded with multiple sys_kexec_load invoking.
Now, only the i386 architecture is supported.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch provides an enhancement to kexec/kdump. It implements the
following features:
- Backup/restore memory used by the original kernel before/after
kexec.
- Save/restore CPU state before/after kexec.
The features of this patch can be used as a general method to call program in
physical mode (paging turning off). This can be used to call BIOS code under
Linux.
kexec-tools needs to be patched to support kexec jump. The patches and
the precompiled kexec can be download from the following URL:
source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2
patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2
binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10
Usage example of calling some physical mode code and return:
1. Compile and install patched kernel with following options selected:
CONFIG_X86_32=y
CONFIG_KEXEC=y
CONFIG_PM=y
CONFIG_KEXEC_JUMP=y
2. Build patched kexec-tool or download the pre-built one.
3. Build some physical mode executable named such as "phy_mode"
4. Boot kernel compiled in step 1.
5. Load physical mode executable with /sbin/kexec. The shell command
line can be as follow:
/sbin/kexec --load-preserve-context --args-none phy_mode
6. Call physical mode executable with following shell command line:
/sbin/kexec -e
Implementation point:
To support jumping without reserving memory. One shadow backup page (source
page) is allocated for each page used by kexeced code image (destination
page). When do kexec_load, the image of kexeced code is loaded into source
pages, and before executing, the destination pages and the source pages are
swapped, so the contents of destination pages are backupped. Before jumping
to the kexeced code image and after jumping back to the original kernel, the
destination pages and the source pages are swapped too.
C ABI (calling convention) is used as communication protocol between
kernel and called code.
A flag named KEXEC_PRESERVE_CONTEXT for sys_kexec_load is added to
indicate that the loaded kernel image is used for jumping back.
Now, only the i386 architecture is supported.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In some cases it may be desirable to ensure that associated driver is not
going to access the media in some period of time. "start" and "stop"
methods are provided therefore to allow it.
Signed-off-by: Alex Dubov <oakad@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some controllers (Jmicron, for instance) can report temporal failure
condition during power-on. It is desirable to account for this using a
return value of "set_param" device method. The return value can also be
handy to distinguish between supported and unsupported device parameters
in run time.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Alex Dubov <oakad@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>