Commit Graph

18607 Commits

Author SHA1 Message Date
Hugh Dickins
e9995ef978 ksm: rmap_walk to remove_migation_ptes
A side-effect of making ksm pages swappable is that they have to be placed
on the LRUs: which then exposes them to isolate_lru_page() and hence to
page migration.

Add rmap_walk() for remove_migration_ptes() to use: rmap_walk_anon() and
rmap_walk_file() in rmap.c, but rmap_walk_ksm() in ksm.c.  Perhaps some
consolidation with existing code is possible, but don't attempt that yet
(try_to_unmap needs to handle nonlinears, but migration pte removal does
not).

rmap_walk() is sadly less general than it appears: rmap_walk_anon(), like
remove_anon_migration_ptes() which it replaces, avoids calling
page_lock_anon_vma(), because that includes a page_mapped() test which
fails when all migration ptes are in place.  That was valid when NUMA page
migration was introduced (holding mmap_sem provided the missing guarantee
that anon_vma's slab had not already been destroyed), but I believe not
valid in the memory hotremove case added since.

For now do the same as before, and consider the best way to fix that
unlikely race later on.  When fixed, we can probably use rmap_walk() on
hwpoisoned ksm pages too: for now, they remain among hwpoison's various
exceptions (its PageKsm test comes before the page is locked, but its
page_lock_anon_vma fails safely if an anon gets upgraded).

Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Izik Eidus <ieidus@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Chris Wright <chrisw@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:20 -08:00
Hugh Dickins
db114b83ab ksm: hold anon_vma in rmap_item
For full functionality, page_referenced_one() and try_to_unmap_one() need
to know the vma: to pass vma down to arch-dependent flushes, or to observe
VM_LOCKED or VM_EXEC.  But KSM keeps no record of vma: nor can it, since
vmas get split and merged without its knowledge.

Instead, note page's anon_vma in its rmap_item when adding to stable tree:
all the vmas which might map that page are listed by its anon_vma.

page_referenced_ksm() and try_to_unmap_ksm() then traverse the anon_vma,
first to find the probable vma, that which matches rmap_item's mm; but if
that is not enough to locate all instances, traverse again to try the
others.  This catches those occasions when fork has duplicated a pte of a
ksm page, but ksmd has not yet come around to assign it an rmap_item.

But each rmap_item in the stable tree which refers to an anon_vma needs to
take a reference to it.  Andrea's anon_vma design cleverly avoided a
reference count (an anon_vma was free when its list of vmas was empty),
but KSM now needs to add that.  Is a 32-bit count sufficient?  I believe
so - the anon_vma is only free when both count is 0 and list is empty.

Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Izik Eidus <ieidus@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Chris Wright <chrisw@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:19 -08:00
Hugh Dickins
5ad6468801 ksm: let shared pages be swappable
Initial implementation for swapping out KSM's shared pages: add
page_referenced_ksm() and try_to_unmap_ksm(), which rmap.c calls when
faced with a PageKsm page.

Most of what's needed can be got from the rmap_items listed from the
stable_node of the ksm page, without discovering the actual vma: so in
this patch just fake up a struct vma for page_referenced_one() or
try_to_unmap_one(), then refine that in the next patch.

Add VM_NONLINEAR to ksm_madvise()'s list of exclusions: it has always been
implicit there (being only set with VM_SHARED, already excluded), but
let's make it explicit, to help justify the lack of nonlinear unmap.

Rely on the page lock to protect against concurrent modifications to that
page's node of the stable tree.

The awkward part is not swapout but swapin: do_swap_page() and
page_add_anon_rmap() now have to allow for new possibilities - perhaps a
ksm page still in swapcache, perhaps a swapcache page associated with one
location in one anon_vma now needed for another location or anon_vma.
(And the vma might even be no longer VM_MERGEABLE when that happens.)

ksm_might_need_to_copy() checks for that case, and supplies a duplicate
page when necessary, simply leaving it to a subsequent pass of ksmd to
rediscover the identity and merge them back into one ksm page.
Disappointingly primitive: but the alternative would have to accumulate
unswappable info about the swapped out ksm pages, limiting swappability.

Remove page_add_ksm_rmap(): page_add_anon_rmap() now has to allow for the
particular case it was handling, so just use it instead.

Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Izik Eidus <ieidus@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Chris Wright <chrisw@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:19 -08:00
Hugh Dickins
08beca44df ksm: stable_node point to page and back
Add a pointer to the ksm page into struct stable_node, holding a reference
to the page while the node exists.  Put a pointer to the stable_node into
the ksm page's ->mapping.

Then we don't need get_ksm_page() while traversing the stable tree: the
page to compare against is sure to be present and correct, even if it's no
longer visible through any of its existing rmap_items.

And we can handle the forked ksm page case more efficiently: no need to
memcmp our way through the tree to find its match.

Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Izik Eidus <ieidus@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:19 -08:00
Hugh Dickins
af8e3354b4 mm: CONFIG_MMU for PG_mlocked
Remove three degrees of obfuscation, left over from when we had
CONFIG_UNEVICTABLE_LRU.  MLOCK_PAGES is CONFIG_HAVE_MLOCKED_PAGE_BIT is
CONFIG_HAVE_MLOCK is CONFIG_MMU.  rmap.o (and memory-failure.o) are only
built when CONFIG_MMU, so don't need such conditions at all.

Somehow, I feel no compulsion to remove the CONFIG_HAVE_MLOCK* lines from
169 defconfigs: leave those to evolve in due course.

Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Izik Eidus <ieidus@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Nick Piggin <npiggin@suse.de>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:17 -08:00
Hugh Dickins
3ca7b3c5b6 mm: define PAGE_MAPPING_FLAGS
At present we define PageAnon(page) by the low PAGE_MAPPING_ANON bit set
in page->mapping, with the higher bits a pointer to the anon_vma; and have
defined PageKsm(page) as that with NULL anon_vma.

But KSM swapping will need to store a pointer there: so in preparation for
that, now define PAGE_MAPPING_FLAGS as the low two bits, including
PAGE_MAPPING_KSM (always set along with PAGE_MAPPING_ANON, until some
other use for the bit emerges).

Declare page_rmapping(page) to return the pointer part of page->mapping,
and page_anon_vma(page) to return the anon_vma pointer when that's what it
is.  Use these in a few appropriate places: notably, unuse_vma() has been
testing page->mapping, but is better to be testing page_anon_vma() (cases
may be added in which flag bits are set without any pointer).

Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Izik Eidus <ieidus@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:17 -08:00
KOSAKI Motohiro
bb3ab59683 vmscan: stop kswapd waiting on congestion when the min watermark is not being met
If reclaim fails to make sufficient progress, the priority is raised.
Once the priority is higher, kswapd starts waiting on congestion.
However, if the zone is below the min watermark then kswapd needs to
continue working without delay as there is a danger of an increased rate
of GFP_ATOMIC allocation failure.

This patch changes the conditions under which kswapd waits on congestion
by only going to sleep if the min watermarks are being met.

[mel@csn.ul.ie: add stats to track how relevant the logic is]
[mel@csn.ul.ie: make kswapd only check its own zones and rename the relevant counters]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:16 -08:00
Mel Gorman
f50de2d381 vmscan: have kswapd sleep for a short interval and double check it should be asleep
After kswapd balances all zones in a pgdat, it goes to sleep.  In the
event of no IO congestion, kswapd can go to sleep very shortly after the
high watermark was reached.  If there are a constant stream of allocations
from parallel processes, it can mean that kswapd went to sleep too quickly
and the high watermark is not being maintained for sufficient length time.

This patch makes kswapd go to sleep as a two-stage process.  It first
tries to sleep for HZ/10.  If it is woken up by another process or the
high watermark is no longer met, it's considered a premature sleep and
kswapd continues work.  Otherwise it goes fully to sleep.

This adds more counters to distinguish between fast and slow breaches of
watermarks.  A "fast" premature sleep is one where the low watermark was
hit in a very short time after kswapd going to sleep.  A "slow" premature
sleep indicates that the high watermark was breached after a very short
interval.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Frans Pop <elendil@planet.nl>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:16 -08:00
Lee Schermerhorn
d4906e1aa5 swap: rework map_swap_page() again
Seems that page_io.c doesn't really need to know that page_private(page)
is the swp_entry 'val'.  Rework map_swap_page() to do what its name says
and map a page to a page offset in the swap space.

The only other caller of map_swap_page() is internal to mm/swapfile.c and
it does want to map a swap entry to the 'sector'.  So rename
map_swap_page() to map_swap_entry(), make it 'static' and and implement
map_swap_page() as a wrapper around that.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:16 -08:00
Hugh Dickins
7509765a29 swap_info: reorder its fields
Reorder (and comment) the fields of swap_info_struct, to make better
use of its cachelines: it's good for swap_duplicate() in particular
if unsigned int max and swap_map are near the start.

Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:16 -08:00
Hugh Dickins
aaa468653b swap_info: note SWAP_MAP_SHMEM
While we're fiddling with the swap_map values, let's assign a particular
value to shmem/tmpfs swap pages: their swap counts are never incremented,
and it helps swapoff's try_to_unuse() a little if it can immediately
distinguish those pages from process pages.

Since we've no use for SWAP_MAP_BAD | COUNT_CONTINUED,
we might as well use that 0xbf value for SWAP_MAP_SHMEM.

Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:16 -08:00
Hugh Dickins
570a335b8e swap_info: swap count continuations
Swap is duplicated (reference count incremented by one) whenever the same
swap page is inserted into another mm (when forking finds a swap entry in
place of a pte, or when reclaim unmaps a pte to insert the swap entry).

swap_info_struct's vmalloc'ed swap_map is the array of these reference
counts: but what happens when the unsigned short (or unsigned char since
the preceding patch) is full? (and its high bit is kept for a cache flag)

We then lose track of it, never freeing, leaving it in use until swapoff:
at which point we _hope_ that a single pass will have found all instances,
assume there are no more, and will lose user data if we're wrong.

Swapping of KSM pages has not yet been enabled; but it is implemented,
and makes it very easy for a user to overflow the maximum swap count:
possible with ordinary process pages, but unlikely, even when pid_max
has been raised from PID_MAX_DEFAULT.

This patch implements swap count continuations: when the count overflows,
a continuation page is allocated and linked to the original vmalloc'ed
map page, and this used to hold the continuation counts for that entry
and its neighbours.  These continuation pages are seldom referenced:
the common paths all work on the original swap_map, only referring to
a continuation page when the low "digit" of a count is incremented or
decremented through SWAP_MAP_MAX.

Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:15 -08:00
Hugh Dickins
8d69aaee80 swap_info: swap_map of chars not shorts
Halve the vmalloc'ed swap_map array from unsigned shorts to unsigned
chars: it's still very unusual to reach a swap count of 126, and the
next patch allows it to be extended indefinitely.

Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:15 -08:00
Hugh Dickins
253d553ba7 swap_info: SWAP_HAS_CACHE cleanups
Though swap_count() is useful, I'm finding that swap_has_cache() and
encode_swapmap() obscure what happens in the swap_map entry, just at
those points where I need to understand it.  Remove them, and pass
more usable "usage" values to scan_swap_map(), swap_entry_free() and
__swap_duplicate(), instead of the SWAP_MAP and SWAP_CACHE enum.

Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:15 -08:00
Hugh Dickins
9625a5f289 swap_info: include first_swap_extent
Make better use of the space by folding first swap_extent into its
swap_info_struct, instead of just the list_head: swap partitions need
only that one, and for others it's used as a circular list anyway.

[jirislaby@gmail.com: fix crash on double swapon]
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:15 -08:00
Hugh Dickins
efa90a981b swap_info: change to array of pointers
The swap_info_struct is only 76 or 104 bytes, but it does seem wrong
to reserve an array of about 30 of them in bss, when most people will
want only one.  Change swap_info[] to an array of pointers.

That does need a "type" field in the structure: pack it as a char with
next type and short prio (aha, char is unsigned by default on PowerPC).
Use the (admittedly peculiar) name "type" throughout for this index.

/proc/swaps does not take swap_lock: I wouldn't want it to, but do take
care with barriers when adding a new item to the array (never removed).

Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:15 -08:00
Hugh Dickins
f29ad6a99b swap_info: private to swapfile.c
The swap_info_struct is mostly private to mm/swapfile.c, with only
one other in-tree user: get_swap_bio().  Adjust its interface to
map_swap_page(), so that we can then remove get_swap_info_struct().

But there is a popular user out-of-tree, TuxOnIce: so leave the
declaration of swap_info_struct in linux/swap.h.

Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Nigel Cunningham <ncunningham@crca.org.au>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:13 -08:00
David Rientjes
bad44b5be8 mm: add gfp flags for NODEMASK_ALLOC slab allocations
Objects passed to NODEMASK_ALLOC() are relatively small in size and are
backed by slab caches that are not of large order, traditionally never
greater than PAGE_ALLOC_COSTLY_ORDER.

Thus, using GFP_KERNEL for these allocations on large machines when
CONFIG_NODES_SHIFT > 8 will cause the page allocator to loop endlessly in
the allocation attempt, each time invoking both direct reclaim or the oom
killer.

This is of particular interest when using NODEMASK_ALLOC() from a
mempolicy context (either directly in mm/mempolicy.c or the mempolicy
constrained hugetlb allocations) since the oom killer always kills current
when allocations are constrained by mempolicies.  So for all present use
cases in the kernel, current would end up being oom killed when direct
reclaim fails.  That would allow the NODEMASK_ALLOC() to succeed but
current would have sacrificed itself upon returning.

This patch adds gfp flags to NODEMASK_ALLOC() to pass to kmalloc() on
CONFIG_NODES_SHIFT > 8; this parameter is a nop on other configurations.
All current use cases either directly from hugetlb code or indirectly via
NODEMASK_SCRATCH() union __GFP_NORETRY to avoid direct reclaim and the oom
killer when the slab allocator needs to allocate additional pages.

The side-effect of this change is that all current use cases of either
NODEMASK_ALLOC() or NODEMASK_SCRATCH() need appropriate -ENOMEM handling
when the allocation fails (never for CONFIG_NODES_SHIFT <= 8).  All
current use cases were audited and do have appropriate error handling at
this time.

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:13 -08:00
Lee Schermerhorn
39da08cb07 hugetlb: offload per node attribute registrations
Offload the registration and unregistration of per node hstate sysfs
attributes to a worker thread rather than attempt the
allocation/attachment or detachment/freeing of the attributes in the
context of the memory hotplug handler.

I don't know that this is absolutely required, but the registration can
sleep in allocations and other mem hot plug handlers do it this way.  If
it turns out this is NOT required, we can drop this patch.

N.B.,  Only tested build, boot, libhugetlbfs regression.
       i.e., no memory hotplug testing.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:13 -08:00
David Rientjes
8fe23e0571 mm: clear node in N_HIGH_MEMORY and stop kswapd when all memory is offlined
When memory is hot-removed, its node must be cleared in N_HIGH_MEMORY if
there are no present pages left.

In such a situation, kswapd must also be stopped since it has nothing left
to do.

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Rik van Riel <riel@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:13 -08:00
Lee Schermerhorn
9a30523066 hugetlb: add per node hstate attributes
Add the per huge page size control/query attributes to the per node
sysdevs:

/sys/devices/system/node/node<ID>/hugepages/hugepages-<size>/
	nr_hugepages       - r/w
	free_huge_pages    - r/o
	surplus_huge_pages - r/o

The patch attempts to re-use/share as much of the existing global hstate
attribute initialization and handling, and the "nodes_allowed" constraint
processing as possible.

Calling set_max_huge_pages() with no node indicates a change to global
hstate parameters.  In this case, any non-default task mempolicy will be
used to generate the nodes_allowed mask.  A valid node id indicates an
update to that node's hstate parameters, and the count argument specifies
the target count for the specified node.  From this info, we compute the
target global count for the hstate and construct a nodes_allowed node mask
contain only the specified node.

Setting the node specific nr_hugepages via the per node attribute
effectively ignores any task mempolicy or cpuset constraints.

With this patch:

(me):ls /sys/devices/system/node/node0/hugepages/hugepages-2048kB
./  ../  free_hugepages  nr_hugepages  surplus_hugepages

Starting from:
Node 0 HugePages_Total:     0
Node 0 HugePages_Free:      0
Node 0 HugePages_Surp:      0
Node 1 HugePages_Total:     0
Node 1 HugePages_Free:      0
Node 1 HugePages_Surp:      0
Node 2 HugePages_Total:     0
Node 2 HugePages_Free:      0
Node 2 HugePages_Surp:      0
Node 3 HugePages_Total:     0
Node 3 HugePages_Free:      0
Node 3 HugePages_Surp:      0
vm.nr_hugepages = 0

Allocate 16 persistent huge pages on node 2:
(me):echo 16 >/sys/devices/system/node/node2/hugepages/hugepages-2048kB/nr_hugepages

[Note that this is equivalent to:
	numactl -m 2 hugeadmin --pool-pages-min 2M:+16
]

Yields:
Node 0 HugePages_Total:     0
Node 0 HugePages_Free:      0
Node 0 HugePages_Surp:      0
Node 1 HugePages_Total:     0
Node 1 HugePages_Free:      0
Node 1 HugePages_Surp:      0
Node 2 HugePages_Total:    16
Node 2 HugePages_Free:     16
Node 2 HugePages_Surp:      0
Node 3 HugePages_Total:     0
Node 3 HugePages_Free:      0
Node 3 HugePages_Surp:      0
vm.nr_hugepages = 16

Global controls work as expected--reduce pool to 8 persistent huge pages:
(me):echo 8 >/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

Node 0 HugePages_Total:     0
Node 0 HugePages_Free:      0
Node 0 HugePages_Surp:      0
Node 1 HugePages_Total:     0
Node 1 HugePages_Free:      0
Node 1 HugePages_Surp:      0
Node 2 HugePages_Total:     8
Node 2 HugePages_Free:      8
Node 2 HugePages_Surp:      0
Node 3 HugePages_Total:     0
Node 3 HugePages_Free:      0
Node 3 HugePages_Surp:      0

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:12 -08:00
Lee Schermerhorn
4e25b2576e hugetlb: add generic definition of NUMA_NO_NODE
Move definition of NUMA_NO_NODE from ia64 and x86_64 arch specific headers
to generic header 'linux/numa.h' for use in generic code.  NUMA_NO_NODE
replaces bare '-1' where it's used in this series to indicate "no node id
specified".  Ultimately, it can be used to replace the -1 elsewhere where
it is used similarly.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:12 -08:00
Lee Schermerhorn
06808b0827 hugetlb: derive huge pages nodes allowed from task mempolicy
This patch derives a "nodes_allowed" node mask from the numa mempolicy of
the task modifying the number of persistent huge pages to control the
allocation, freeing and adjusting of surplus huge pages when the pool page
count is modified via the new sysctl or sysfs attribute
"nr_hugepages_mempolicy".  The nodes_allowed mask is derived as follows:

* For "default" [NULL] task mempolicy, a NULL nodemask_t pointer
  is produced.  This will cause the hugetlb subsystem to use
  node_online_map as the "nodes_allowed".  This preserves the
  behavior before this patch.
* For "preferred" mempolicy, including explicit local allocation,
  a nodemask with the single preferred node will be produced.
  "local" policy will NOT track any internode migrations of the
  task adjusting nr_hugepages.
* For "bind" and "interleave" policy, the mempolicy's nodemask
  will be used.
* Other than to inform the construction of the nodes_allowed node
  mask, the actual mempolicy mode is ignored.  That is, all modes
  behave like interleave over the resulting nodes_allowed mask
  with no "fallback".

See the updated documentation [next patch] for more information
about the implications of this patch.

Examples:

Starting with:

	Node 0 HugePages_Total:     0
	Node 1 HugePages_Total:     0
	Node 2 HugePages_Total:     0
	Node 3 HugePages_Total:     0

Default behavior [with or without this patch] balances persistent
hugepage allocation across nodes [with sufficient contiguous memory]:

	sysctl vm.nr_hugepages[_mempolicy]=32

yields:

	Node 0 HugePages_Total:     8
	Node 1 HugePages_Total:     8
	Node 2 HugePages_Total:     8
	Node 3 HugePages_Total:     8

Of course, we only have nr_hugepages_mempolicy with the patch,
but with default mempolicy, nr_hugepages_mempolicy behaves the
same as nr_hugepages.

Applying mempolicy--e.g., with numactl [using '-m' a.k.a.
'--membind' because it allows multiple nodes to be specified
and it's easy to type]--we can allocate huge pages on
individual nodes or sets of nodes.  So, starting from the
condition above, with 8 huge pages per node, add 8 more to
node 2 using:

	numactl -m 2 sysctl vm.nr_hugepages_mempolicy=40

This yields:

	Node 0 HugePages_Total:     8
	Node 1 HugePages_Total:     8
	Node 2 HugePages_Total:    16
	Node 3 HugePages_Total:     8

The incremental 8 huge pages were restricted to node 2 by the
specified mempolicy.

Similarly, we can use mempolicy to free persistent huge pages
from specified nodes:

	numactl -m 0,1 sysctl vm.nr_hugepages_mempolicy=32

yields:

	Node 0 HugePages_Total:     4
	Node 1 HugePages_Total:     4
	Node 2 HugePages_Total:    16
	Node 3 HugePages_Total:     8

The 8 huge pages freed were balanced over nodes 0 and 1.

[rientjes@google.com: accomodate reworked NODEMASK_ALLOC]
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:12 -08:00
Lee Schermerhorn
c1e6c8d074 hugetlb: factor init_nodemask_of_node()
Factor init_nodemask_of_node() out of the nodemask_of_node() macro.

This will be used to populate the huge pages "nodes_allowed" nodemask for
a single node when basing nodes_allowed on a preferred/local mempolicy or
when a persistent huge page pool page count is modified via a per node
sysfs attribute.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:12 -08:00
David Rientjes
4e7b8a6cef nodemask: make NODEMASK_ALLOC more general
This is a series of patches to provide control over the location of the
allocation and freeing of persistent huge pages on a NUMA platform.
Please consider for merging into mmotm.

This series uses two mechanisms to constrain the nodes from which
persistent huge pages are allocated: 1) the task NUMA mempolicy of the
task modifying a new sysctl "nr_hugepages_mempolicy", based on a
suggestion by Mel Gorman; and 2) a subset of the hugepages hstate sysfs
attributes have been added [in V4] to each node system device under:

	/sys/devices/node/node[0-9]*/hugepages

The per node attibutes allow direct assignment of a huge page count on a
specific node, regardless of the task's mempolicy or cpuset constraints.

This patch:

NODEMASK_ALLOC(x, m) assumes x is a type of struct, which is unnecessary.
It's perfectly reasonable to use this macro to allocate a nodemask_t,
which is anonymous, either dynamically or on the stack depending on
NODES_SHIFT.

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:12 -08:00
Thomas Gleixner
e625cce1b7 perf_event: Convert to raw_spinlock
Convert locks which cannot be sleeping locks in preempt-rt to
raw_spinlocks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
2009-12-14 23:55:34 +01:00
Thomas Gleixner
ecb49d1a63 hrtimers: Convert to raw_spinlocks
Convert locks which cannot be sleeping locks in preempt-rt to
raw_spinlocks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
2009-12-14 23:55:34 +01:00
Thomas Gleixner
239007b844 genirq: Convert irq_desc.lock to raw_spinlock
Convert locks which cannot be sleeping locks in preempt-rt to
raw_spinlocks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
2009-12-14 23:55:33 +01:00
Thomas Gleixner
d209d74d52 rtmutes: Convert rtmutex.lock to raw_spinlock
Convert locks which cannot be sleeping locks in preempt-rt to
raw_spinlocks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
2009-12-14 23:55:33 +01:00
Thomas Gleixner
1d61548254 sched: Convert pi_lock to raw_spinlock
Convert locks which cannot be sleeping locks in preempt-rt to
raw_spinlocks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
2009-12-14 23:55:33 +01:00
Thomas Gleixner
a26724591e plist: Make plist debugging raw_spinlock aware
plists are used with spinlocks and raw_spinlocks. Change the plist
debugging to handle both types.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
2009-12-14 23:55:33 +01:00
Thomas Gleixner
9c1721aa49 locking: Cleanup the name space completely
Make the name space hierarchy of locking functions consistent:
     raw_spin* -> _raw_spin* -> __raw_spin*

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
2009-12-14 23:55:33 +01:00
Thomas Gleixner
9828ea9d75 locking: Further name space cleanups
The name space hierarchy for the internal lock functions is now a bit
backwards. raw_spin* functions map to _spin* which use __spin*, while
we would like to have _raw_spin* and __raw_spin*.

_raw_spin* is already used by lock debugging, so rename those funtions
to do_raw_spin* to free up the _raw_spin* name space.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
2009-12-14 23:55:33 +01:00
Thomas Gleixner
c2f21ce2e3 locking: Implement new raw_spinlock
Now that the raw_spin name space is freed up, we can implement
raw_spinlock and the related functions which are used to annotate the
locks which are not converted to sleeping spinlocks in preempt-rt.

A side effect is that only such locks can be used with the low level
lock fsunctions which circumvent lockdep.

For !rt spin_* functions are mapped to the raw_spin* implementations.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
2009-12-14 23:55:32 +01:00
Thomas Gleixner
e5931943d0 locking: Convert raw_rwlock functions to arch_rwlock
Name space cleanup for rwlock functions. No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
2009-12-14 23:55:32 +01:00
Thomas Gleixner
fb3a6bbc91 locking: Convert raw_rwlock to arch_rwlock
Not strictly necessary for -rt as -rt does not have non sleeping
rwlocks, but it's odd to not have a consistent naming convention.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
2009-12-14 23:55:32 +01:00
Thomas Gleixner
0199c4e68d locking: Convert __raw_spin* functions to arch_spin*
Name space cleanup. No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
2009-12-14 23:55:32 +01:00
Thomas Gleixner
edc35bd72e locking: Rename __RAW_SPIN_LOCK_UNLOCKED to __ARCH_SPIN_LOCK_UNLOCKED
Further name space cleanup. No functional change

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
2009-12-14 23:55:32 +01:00
Thomas Gleixner
445c89514b locking: Convert raw_spinlock to arch_spinlock
The raw_spin* namespace was taken by lockdep for the architecture
specific implementations. raw_spin_* would be the ideal name space for
the spinlocks which are not converted to sleeping locks in preempt-rt.

Linus suggested to convert the raw_ to arch_ locks and cleanup the
name space instead of using an artifical name like core_spin,
atomic_spin or whatever

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
2009-12-14 23:55:32 +01:00
Thomas Gleixner
6b6b4792f8 locking: Separate rwlock api from spinlock api
Move the rwlock smp api defines and functions into a separate header
file. Makes the -rt selection simpler and less intrusive.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
2009-12-14 23:55:32 +01:00
Thomas Gleixner
ef12f10994 locking: Split rwlock from spinlock headers
Move the rwlock defines and inlines into separate header files. This
makes the selection for -rt easier.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
2009-12-14 23:55:32 +01:00
Linus Torvalds
5443040754 Merge branch 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging
* 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
  i2c-core: i2c bus should support PM entries in struct dev_pm_ops
  i2c: Get rid of I2C_CLIENT_MODULE_PARM
  i2c: Drop I2C_CLIENT_INSMOD_2 to 8
  i2c: Drop I2C_CLIENT_INSMOD_1
  i2c: Get rid of struct i2c_client_address_data
  i2c: Drop the kind parameter from detect callbacks
2009-12-14 14:11:56 -08:00
Jean Delvare
7f508118b1 i2c: Get rid of I2C_CLIENT_MODULE_PARM
There is no user left of I2C_CLIENT_MODULE_PARM, so we can finally
get rid of this ugly macro.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Tested-by: Wolfram Sang <w.sang@pengutronix.de>
2009-12-14 21:17:29 +01:00
Jean Delvare
e5e9f44c24 i2c: Drop I2C_CLIENT_INSMOD_2 to 8
These macros simply declare an enum, so drivers might as well declare
it themselves. This puts an end to the arbitrary limit of 8 chip types
per i2c driver.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Tested-by: Wolfram Sang <w.sang@pengutronix.de>
2009-12-14 21:17:27 +01:00
Jean Delvare
1f86df49dd i2c: Drop I2C_CLIENT_INSMOD_1
This macro simply declares an enum, so drivers might as well declare
it themselves.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Tested-by: Wolfram Sang <w.sang@pengutronix.de>
2009-12-14 21:17:26 +01:00
Jean Delvare
c3813d6af1 i2c: Get rid of struct i2c_client_address_data
Struct i2c_client_address_data only contains one field at this point,
which makes its usefulness questionable. Get rid of it and pass simple
address lists around instead.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Tested-by: Wolfram Sang <w.sang@pengutronix.de>
2009-12-14 21:17:25 +01:00
Jean Delvare
310ec79210 i2c: Drop the kind parameter from detect callbacks
The "kind" parameter always has value -1, and nobody is using it any
longer, so we can remove it.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Tested-by: Wolfram Sang <w.sang@pengutronix.de>
2009-12-14 21:17:23 +01:00
Linus Torvalds
478e4e9d7a Merge branch 'next-spi' of git://git.secretlab.ca/git/linux-2.6
* 'next-spi' of git://git.secretlab.ca/git/linux-2.6: (23 commits)
  spi: fix probe/remove section markings
  Add OMAP spi100k driver
  spi-imx: don't access struct device directly but use dev_get_platdata
  spi-imx: Add mx25 support
  spi-imx: use positive logic to distinguish cpu variants
  spi-imx: correct check for platform_get_irq failing
  ARM: NUC900: Add spi driver support for nuc900
  spi: SuperH MSIOF SPI Master driver V2
  spi: fix spidev compilation failure when VERBOSE is defined
  spi/au1550_spi: fix setupxfer not to override cfg with zeros
  spi/mpc8xxx: don't use __exit_p to wrap plat_mpc8xxx_spi_remove
  spi/i.MX: fix broken error handling for gpio_request
  spi/i.mx: drain MXC SPI transfer buffer when probing device
  MAINTAINERS: add SPI co-maintainer.
  spi/xilinx_spi: fix incorrect casting
  spi/mpc52xx-spi: minor cleanups
  xilinx_spi: add a platform driver using the xilinx_spi common module.
  xilinx_spi: add support for the DS570 IP.
  xilinx_spi: Switch to iomem functions and support little endian.
  xilinx_spi: Split into of driver and generic part.
  ...
2009-12-14 10:22:11 -08:00
Linus Torvalds
2205afa7d1 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf sched: Fix build failure on sparc
  perf bench: Add "all" pseudo subsystem and "all" pseudo suite
  perf tools: Introduce perf_session class
  perf symbols: Ditch dso->find_symbol
  perf symbols: Allow lookups by symbol name too
  perf symbols: Add missing "Variables" entry to map_type__name
  perf symbols: Add support for 'variable' symtabs
  perf symbols: Introduce ELF counterparts to symbol_type__is_a
  perf symbols: Introduce symbol_type__is_a
  perf symbols: Rename kthreads to kmaps, using another abstraction for it
  perf tools: Allow building for ARM
  hw-breakpoints: Handle bad modify_user_hw_breakpoint off-case return value
  perf tools: Allow cross compiling
  tracing, slab: Fix no callsite ifndef CONFIG_KMEMTRACE
  tracing, slab: Define kmem_cache_alloc_notrace ifdef CONFIG_TRACING

Trivial conflict due to different fixes to modify_user_hw_breakpoint()
in include/linux/hw_breakpoint.h
2009-12-14 10:13:22 -08:00
David Howells
491424c0f4 PCI: Global variable decls must match the defs in section attributes
Global variable declarations must match the definitions in section attributes
as the compiler is at liberty to vary the method it uses to access a variable,
depending on the section it is in.

When building the FRV arch, I now see:

  drivers/built-in.o: In function `pci_apply_final_quirks':
  drivers/pci/quirks.c:2606: relocation truncated to fit: R_FRV_GPREL12 against symbol `pci_dfl_cache_line_size' defined in .devinit.data section in drivers/built-in.o
  drivers/pci/quirks.c:2623: relocation truncated to fit: R_FRV_GPREL12 against symbol `pci_dfl_cache_line_size' defined in .devinit.data section in drivers/built-in.o
  drivers/pci/quirks.c:2630: relocation truncated to fit: R_FRV_GPREL12 against symbol `pci_dfl_cache_line_size' defined in .devinit.data section in drivers/built-in.o

because the declaration of pci_dfl_cache_line_size in linux/pci.h does not
match the definition in drivers/pci/pci.c.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-14 10:11:34 -08:00
David Howells
5185fb0699 FRV: Fix no-hardware-breakpoint case
If there is no hardware breakpoint support, modify_user_hw_breakpoint()
tries to return a NULL pointer through as an 'int' return value:

  In file included from kernel/exit.c:53:
  include/linux/hw_breakpoint.h: In function 'modify_user_hw_breakpoint':
  include/linux/hw_breakpoint.h:96: warning: return makes integer from pointer without a cast

Return 0 instead.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-14 10:10:55 -08:00
Linus Torvalds
37222e1c9e Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md: (27 commits)
  md: add 'recovery_start' per-device sysfs attribute
  md: rcu_read_lock() walk of mddev->disks in md_do_sync()
  md: integrate spares into array at earliest opportunity.
  md: move compat_ioctl handling into md.c
  md: revise Kconfig help for MD_MULTIPATH
  md: add MODULE_DESCRIPTION for all md related modules.
  raid: improve MD/raid10 handling of correctable read errors.
  md/raid10: print more useful messages on device failure.
  md/bitmap: update dirty flag when bitmap bits are explicitly set.
  md: Support write-intent bitmaps with externally managed metadata.
  md/bitmap: move setting of daemon_lastrun out of bitmap_read_sb
  md: support updating bitmap parameters via sysfs.
  md: factor out parsing of fixed-point numbers
  md: support bitmap offset appropriate for external-metadata arrays.
  md: remove needless setting of thread->timeout in raid10_quiesce
  md: change daemon_sleep to be in 'jiffies' rather than 'seconds'.
  md: move offset, daemon_sleep and chunksize out of bitmap structure
  md: collect bitmap-specific fields into one structure.
  md/raid1: add takeover support for raid5->raid1
  md: add honouring of suspend_{lo,hi} to raid1.
  ...
2009-12-14 10:03:36 -08:00
Linus Torvalds
76b8f82cde Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6: (58 commits)
  mfd: Add twl6030 regulator subdevices
  regulator: Add support for twl6030 regulators
  rtc: Add twl6030 RTC support
  mfd: Add support for twl6030 irq framework
  mfd: Rename twl4030_ routines in twl-regulator.c
  mfd: Rename twl4030_ routines in rtc-twl.c
  mfd: Rename all twl4030_i2c*
  mfd: Rename twl4030* driver files to enable re-use
  mfd: Clarify twl4030 return value for read and write
  mfd: Add all twl4030 regulators to the twl4030 mfd driver
  mfd: Don't set mc13783 ADREFMODE for touch conversions
  mfd: Remove ezx-pcap defines for custom led gpio encoding
  mfd: Near complete mc13783 rewrite
  mfd: Remove build time warning for WM835x register default tables
  mfd: Force I2C to be built in when building WM831x
  mfd: Don't allow wm831x to be built as a module
  mfd: Fix incorrect error check for wm8350-core
  mfd: Fix twl4030 warning
  gpiolib: Implement gpio_to_irq() for wm831x
  mfd: Remove default selection of AB4500
  ...
2009-12-14 10:02:35 -08:00
Linus Torvalds
3f86ce72cf Merge git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (75 commits)
  NFS: Fix nfs_migrate_page()
  rpc: remove unneeded function parameter in gss_add_msg()
  nfs41: Invoke RECLAIM_COMPLETE on all new client ids
  SUNRPC: IS_ERR/PTR_ERR confusion
  NFSv41: Fix a potential state leakage when restarting nfs4_close_prepare
  nfs41: Handle NFSv4.1 session errors in the delegation recall code
  nfs41: Retry delegation return if it failed with session error
  nfs41: Handle session errors during delegation return
  nfs41: Mark stateids in need of reclaim if state manager gets stale clientid
  NFS: Fix up the declaration of nfs4_restart_rpc when NFSv4 not configured
  nfs41: Don't clear DRAINING flag on NFS4ERR_STALE_CLIENTID
  nfs41: nfs41_setup_state_renewal
  NFSv41: More cleanups
  NFSv41: Fix up some bugs in the NFS4CLNT_SESSION_DRAINING code
  NFSv41: Clean up slot table management
  NFSv41: Fix nfs4_proc_create_session
  nfs41: Invoke RECLAIM_COMPLETE
  nfs41: RECLAIM_COMPLETE functionality
  nfs41: RECLAIM_COMPLETE XDR functionality
  Cleanup some NFSv4 XDR decode comments
  ...
2009-12-14 10:00:24 -08:00
Linus Torvalds
d0316554d3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits)
  m68k: rename global variable vmalloc_end to m68k_vmalloc_end
  percpu: add missing per_cpu_ptr_to_phys() definition for UP
  percpu: Fix kdump failure if booted with percpu_alloc=page
  percpu: make misc percpu symbols unique
  percpu: make percpu symbols in ia64 unique
  percpu: make percpu symbols in powerpc unique
  percpu: make percpu symbols in x86 unique
  percpu: make percpu symbols in xen unique
  percpu: make percpu symbols in cpufreq unique
  percpu: make percpu symbols in oprofile unique
  percpu: make percpu symbols in tracer unique
  percpu: make percpu symbols under kernel/ and mm/ unique
  percpu: remove some sparse warnings
  percpu: make alloc_percpu() handle array types
  vmalloc: fix use of non-existent percpu variable in put_cpu_var()
  this_cpu: Use this_cpu_xx in trace_functions_graph.c
  this_cpu: Use this_cpu_xx for ftrace
  this_cpu: Use this_cpu_xx in nmi handling
  this_cpu: Use this_cpu operations in RCU
  this_cpu: Use this_cpu ops for VM statistics
  ...

Fix up trivial (famous last words) global per-cpu naming conflicts in
	arch/x86/kvm/svm.c
	mm/slab.c
2009-12-14 09:58:24 -08:00
NeilBrown
7820f9e1dd md: remove sparse warning:symbol XXX was not declared.
Signed-off-by: NeilBrown <neilb@suse.de>
2009-12-14 12:49:47 +11:00
Rajendra Nayak
9da6653928 mfd: Add twl6030 regulator subdevices
This patch adds initial support for creating twl6030 PMIC
specific voltage regulators in the twl mfd driver.

Board specific regulator configurations will have to be passed from
respective board files.

Signed-off-by: Rajendra Nayak <rnayak@ti.com>
Signed-off-by: Balaji T K <balajitk@ti.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Reviewed-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2009-12-14 00:26:26 +01:00
Rajendra Nayak
441a450554 regulator: Add support for twl6030 regulators
This patch updates the regulator driver to add support
for TWL6030 PMIC specific LDO regulators.
SMPS resources are not yet supported for TWL6030 and
also .set_mode and .get_status for LDO's are yet to
be implemented for TWL6030.

Signed-off-by: Rajendra Nayak <rnayak@ti.com>
Signed-off-by: Balaji T K <balajitk@ti.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Reviewed-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2009-12-14 00:26:20 +01:00
Balaji T K
e8deb28ca8 mfd: Add support for twl6030 irq framework
This patch adds support for phoenix interrupt framework. New iInterrupt
status register A, B, C are introduced in Phoenix and are cleared on write.
Due to the differences in interrupt handling with respect to TWL4030,
twl6030-irq.c is created for TWL6030 PMIC

Signed-off-by: Rajendra Nayak <rnayak@ti.com>
Signed-off-by: Balaji T K <balajitk@ti.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2009-12-14 00:25:31 +01:00
Balaji T K
fc7b92fca4 mfd: Rename all twl4030_i2c*
This patch renames function names like twl4030_i2c_write_u8,
twl4030_i2c_read_u8 to twl_i2c_write_u8, twl_i2c_read_u8
and also common variable in twl-core.c

Signed-off-by: Rajendra Nayak <rnayak@ti.com>
Signed-off-by: Balaji T K <balajitk@ti.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Acked-by: Kevin Hilman <khilman@deeprootsystems.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2009-12-13 21:23:33 +01:00
Santosh Shilimkar
b07682b605 mfd: Rename twl4030* driver files to enable re-use
The upcoming TWL6030 is companion chip for OMAP4 like the current TWL4030
for OMAP3. The common modules like RTC, Regulator creates opportunity
to re-use the most of the code from twl4030.

This patch renames few common drivers twl4030* files to twl* to enable
the code re-use.

Signed-off-by: Rajendra Nayak <rnayak@ti.com>
Signed-off-by: Balaji T K <balajitk@ti.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Acked-by: Kevin Hilman <khilman@deeprootsystems.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2009-12-13 20:05:51 +01:00
Trond Myklebust
52c9948b1f Merge branch 'nfs-for-2.6.33' 2009-12-13 13:56:27 -05:00
Juha Keski-Saari
ab4abe056d mfd: Add all twl4030 regulators to the twl4030 mfd driver
Add all twl4030 regulators to the twl4030 mfd driver and
twl4030_platform_data

Signed-off-by: Juha Keski-Saari <ext-juha.1.keski-saari@nokia.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2009-12-13 19:21:59 +01:00
Antonio Ospite
ed89a7551c mfd: Remove ezx-pcap defines for custom led gpio encoding
We used these, in a first version of leds-pcap driver, in order to encode gpio
enabling and gpio inversion for a led inside the variable used for the gpio
number. In the new leds-pcap driver we rely on gpio_is_valid() to derive if a
led is gpio enabled and we have a dedicated flag to tell if the gpio value has
to be inverted.

Signed-off-by: Antonio Ospite <ospite@studenti.unina.it>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2009-12-13 19:21:56 +01:00
Uwe Kleine-König
9e2726776d mfd: Near complete mc13783 rewrite
This fixes several things while still providing the old API:

 - simplify and fix locking
 - better error handling
 - don't ack all irqs making it impossible to detect a reset of the
   rtc
 - use a timeout variant to wait for completion of ADC conversion
 - provide platform-data to regulator subdevice (This allows making
   struct mc13783 opaque for other drivers after the regulator driver is
   updated to use its platform_data.)
 - expose all interrupts
 - use threaded irq

After all users in mainline are converted to the new API, some things
(e.g. mc13783-private.h) can go away.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2009-12-13 19:21:54 +01:00
Mark Brown
5fb4d38b19 mfd: Move WM831x to generic IRQ
Replace the wm831x-local IRQ infrastructure with genirq, allowing access
to the diagnostic infrastructure of genirq and allowing us to implement
interrupt support for the GPIOs.  The switchover is done within the
wm831x specific IRQ API, further patches will convert the individual
drivers to use genirq directly.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2009-12-13 19:21:41 +01:00
Ilkka Koskinen
1920a61e20 mfd: Initial support for twl5031
TWL5031 introduces two new interrupts in PIH. Moreover, BCI
has changed remarkably and, thus, it's disabled when TWL5031
is in use.

Signed-off-by: Ilkka Koskinen <ilkka.koskinen@nokia.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2009-12-13 19:21:41 +01:00
Mark Brown
5a65edbc12 mfd: Convert wm8350 IRQ handlers to irq_handler_t
This is done as simple code transformation, the semantics of the
IRQ API provided by the core are are still very different to those
of genirq (mainly with regard to masking).

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2009-12-13 19:21:39 +01:00
Ben Dooks
b45440c33a mfd: Allow configuration of VDCDC2 for tps65010
Add function to allow the configuation fo the VDCDC2 register by
external users, to allow changing of the standard and low-power
running modes.

This is needed, for example, for the Simtec IM2440D20 where we need
to use the low-power mode to shutdown the LDO/DCDC that are not needed
during suspend (saving substantial power) and the runtime use of the
low-power mode to change VCore.

Signed-off-by: Ben Dooks <ben@simtec.co.uk>
Signed-off-by: Simtec Linux Team <linux@simtec.co.uk>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2009-12-13 19:21:36 +01:00
Ilkka Koskinen
38a684963f mfd: Enable twl4030 32kHz oscillator low-power mode
Allows TWL's 32kHz oscillator to go in low-power mode when
main battery voltage is running low.

Signed-off-by: Ilkka Koskinen <ilkka.koskinen@nokia.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2009-12-13 19:21:33 +01:00
Mark Brown
75b75722b4 mfd: Allow platforms to specify an IRQ base for WM8350
This is currently unused by the wm8350 drivers but getting it merged
now will reduce merge issues in the future when implementing wm8350
genirq support.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2009-12-13 19:21:31 +01:00
Aaro Koskinen
56baa66797 mfd: fix undefined twl4030-power resconfig value checks
The code tries to skip values initialized with -1, but since the values
are unsigned the comparison is always true.

The patch eliminates the following compiler warnings:

drivers/mfd/twl4030-power.c: In function 'twl4030_configure_resource':
drivers/mfd/twl4030-power.c:338: warning: comparison is always true due to
limited range of data type
drivers/mfd/twl4030-power.c:358: warning: comparison is always true due to
limited range of data type
drivers/mfd/twl4030-power.c:363: warning: comparison is always true due to
limited range of data type

Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2009-12-13 19:21:25 +01:00
Amit Kucheria
b4ead61e57 mfd: Add support for remapping twl4030-power power states
The <RESOURCE>_REMAP register allows configuration of the <RESOURCE> in case
of a sleep or off transition.

Allow this property of resources to be configured (through twl4030_resconfig)
and add code to parse these values to program the registers accordingly.

Signed-off-by: Amit Kucheria <amit.kucheria@verdurent.com>
Cc: linux-omap@vger.kernel.org
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2009-12-13 19:21:24 +01:00
Lars-Peter Clausen
68d641efd8 mfd: Fix memleak in pcf50633_client_dev_register
Since platform_device_add_data copies the passed data, the allocated
subdev_pdata is never freed. A simple fix would be to either free subdev_pdata
or put it onto the stack. But since the pcf50633 child devices can rely on
beeing children of the pcf50633 core device it's much more elegant to get access
to pcf50633 core structure through that link. This allows to get completly rid
of pcf5033_subdev_pdata.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Paul Fertser <fercerpav@gmail.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2009-12-13 19:21:17 +01:00
Mark Brown
0c7229f93a mfd: Convert WM835x IRQ handling to use a data table
Rather than open coding individual IRQs in each function which
manipulates them store data for IRQs in a table which is then
referenced in the users.

This is a substantial code shrink and should be a performance win in
cases where only a single IRQ goes off at once since instead of
reading four of the second level IRQ registers for each interrupt
we read only the sub-registers which have had an interrupt flagged.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2009-12-13 19:21:02 +01:00
Mark Brown
e0a3389ab9 mfd: Split wm8350 IRQ code into a separate file
In preparation for refactoring - it's over 700 lines of well-isolated
code and having it in a file by itself makes things more managable.

While we're at it make sure that we clean up the IRQ if we fail after
acquiring it on init.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2009-12-13 19:20:55 +01:00
Michael Hennerich
a5736e0b62 mfd: Add ADP5520/ADP5501 driver
Base driver for Analog Devices ADP5520/ADP5501 MFD PMICs

Subdevs:
LCD Backlight   : drivers/video/backlight/adp5520_bl.c
LEDs            : drivers/led/leds-adp5520.c
GPIO            : drivers/gpio/adp5520-gpio.c (ADP5520 only)
Keys            : drivers/input/keyboard/adp5520-keys.c (ADP5520 only)

Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2009-12-13 19:20:53 +01:00
Mark Brown
d4e0a89e3d mfd: Add support for WM8320 PMICs
The WM8320 is an integrated power management subsystem providing
voltage regulators, RTC, watchdog and other functionality. The
WM8320 is derived from the WM831x and therefore shares most of
the driver code with the WM831x.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2009-12-13 19:20:44 +01:00
Mark Brown
6f2ecaae72 gpiolib: Make WM831x GPIO count dynamic
This supports future devices with fewer GPIOs.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2009-12-13 19:20:43 +01:00
Srinidhi Kasagar
0c41839e98 mfd: add AB4500 driver
This adds core driver support for AB4500 mixed signal
multimedia & power management chip. This connects to U8500
on the SSP (pl022) and exports read/write functions for
the device to get access to this chip. This also registers
the client devices and sets the parent.

Signed-off-by: srinidhi kasagar <srinidhi.kasagar@stericsson.com>
Acked-by: Andrea Gallo <andrea.gallo@stericsson.com>
Reviewed-by: Mark Brown <broonie@sirena.org.uk>
Reviewed-by: Jean-Christophe <plagnioj@jcrosoft.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2009-12-13 19:20:38 +01:00
Haojian Zhuang
4107da2a28 mfd: Add 88PM8607 driver
This adds a core driver for 88PM8607 found in Marvell DKB development
platform. This driver is a proxy for all accesses to 88PM8607
sub-drivers which will be merged on top of this one, RTC, regulators,
battery and so on.

This chip is manufactured by Marvell.

Signed-off-by: Haojian Zhuang <haojian.zhuang@marvell.com>
Reviewed-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2009-12-13 19:20:37 +01:00
Magnus Damm
8051effcbc spi: SuperH MSIOF SPI Master driver V2
This patch is V2 of SPI Master support for the SuperH MSIOF.
Full duplex, spi mode 0-3, active high cs, 3-wire and lsb
first should all be supported, but the driver has so far
only been tested with "mmc_spi".

The MSIOF hardware comes with 32-bit FIFOs for receive and
transmit, and this driver simply breaks the SPI messages
into FIFO-sized chunks. The MSIOF hardware manages the pins
for clock, receive and transmit (sck/miso/mosi), but the chip
select pin is managed by software and must be configured as
a regular GPIO pin by the board code.

Performance wise there is still room for improvement, but
on a Ecovec board with the built-in sh7724 MSIOF0 this driver
gets Mini-sd read speeds of about half a megabyte per second.

Future work include better clock setup and merging of 8-bit
transfers into 32-bit words to reduce interrupt load and
improve throughput.

Signed-off-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2009-12-13 00:48:27 -07:00
Linus Torvalds
09cea96caa Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (151 commits)
  powerpc: Fix usage of 64-bit instruction in 32-bit altivec code
  MAINTAINERS: Add PowerPC patterns
  powerpc/pseries: Track previous CPPR values to correctly EOI interrupts
  powerpc/pseries: Correct pseries/dlpar.c build break without CONFIG_SMP
  powerpc: Make "intspec" pointers in irq_host->xlate() const
  powerpc/8xx: DTLB Miss cleanup
  powerpc/8xx: Remove DIRTY pte handling in DTLB Error.
  powerpc/8xx: Start using dcbX instructions in various copy routines
  powerpc/8xx: Restore _PAGE_WRITETHRU
  powerpc/8xx: Add missing Guarded setting in DTLB Error.
  powerpc/8xx: Fixup DAR from buggy dcbX instructions.
  powerpc/8xx: Tag DAR with 0x00f0 to catch buggy instructions.
  powerpc/8xx: Update TLB asm so it behaves as linux mm expects.
  powerpc/8xx: Invalidate non present TLBs
  powerpc/pseries: Serialize cpu hotplug operations during deactivate Vs deallocate
  pseries/pseries: Add code to online/offline CPUs of a DLPAR node
  powerpc: stop_this_cpu: remove the cpu from the online map.
  powerpc/pseries: Add kernel based CPU DLPAR handling
  sysfs/cpu: Add probe/release files
  powerpc/pseries: Kernel DLPAR Infrastructure
  ...
2009-12-12 14:27:24 -08:00
Linus Torvalds
702a7c7609 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (21 commits)
  sched: Remove forced2_migrations stats
  sched: Fix memory leak in two error corner cases
  sched: Fix build warning in get_update_sysctl_factor()
  sched: Update normalized values on user updates via proc
  sched: Make tunable scaling style configurable
  sched: Fix missing sched tunable recalculation on cpu add/remove
  sched: Fix task priority bug
  sched: cgroup: Implement different treatment for idle shares
  sched: Remove unnecessary RCU exclusion
  sched: Discard some old bits
  sched: Clean up check_preempt_wakeup()
  sched: Move update_curr() in check_preempt_wakeup() to avoid redundant call
  sched: Sanitize fork() handling
  sched: Clean up ttwu() rq locking
  sched: Remove rq->clock coupling from set_task_cpu()
  sched: Consolidate select_task_rq() callers
  sched: Remove sysctl.sched_features
  sched: Protect sched_rr_get_param() access to task->sched_class
  sched: Protect task->cpus_allowed access in sched_getaffinity()
  sched: Fix balance vs hotplug race
  ...

Fixed up conflicts in kernel/sysctl.c (due to sysctl cleanup)
2009-12-12 11:34:10 -08:00
Linus Torvalds
053fe57ac2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (75 commits)
  net: Handle NETREG_UNINITIALIZED devices correctly
  can: add the driver for Analog Devices Blackfin on-chip CAN controllers
  xfrm: Fix truncation length of authentication algorithms installed via PF_KEY
  net: use compat helper functions in compat_sys_recvmmsg
  net: fix compat_sys_recvmmsg parameter type
  cxgb3: Fixing EEH handlers
  cnic: Zero out status block and Event Queue indices.
  cnic: Send delete command when shutting down iSCSI ring.
  net: smc91x: Fix up type mismatch in smc_drv_resume().
  smc91x: fix unused flags warnings on UP systems
  MAINTAINERS: Transfering maintainership of cdc-ether
  net: Add missing TST_CFG_WRITE bits around sky2_pci_write
  net: Fix Yukon-2 Optima TCP offload setup
  net: niu uses crc32, so select CRC32
  wireless: update old static regulatory domain rules
  mac80211: Revert 'Use correct sign for mesh active path refresh'
  mac80211: Fixed bug in mesh portal paths
  net/mac80211: Correct size given to memset
  b43: Remove reset after fatal DMA error
  rtl8187: add radio led and fix warnings on suspend
  ...
2009-12-11 20:58:20 -08:00
Linus Torvalds
92340ee319 Merge branch 'compat-ioctl-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground
* 'compat-ioctl-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground:
  usbdevfs: move compat_ioctl handling to devio.c
  lp: move compat_ioctl handling into lp.c
  compat_ioctl: pass compat pointer directly to handlers
  compat_ioctl: simplify lookup table
  compat_ioctl: simplify calling of handlers
  compat_ioctl: inline all conversion handlers
  compat_ioctl: Remove BKL
  compat_ioctl: remove all VT ioctl handling
2009-12-11 20:57:46 -08:00
Linus Torvalds
3070f27d6e Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  itimer: Fix the itimer trace print format
  hrtimer: move timer stats helper functions to hrtimer.c
  hrtimer: Tune hrtimer_interrupt hang logic
2009-12-11 20:49:09 -08:00
Linus Torvalds
df7147b3c3 Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing: Remove comparing of NULL to va_list in trace_array_vprintk()
  tracing: Fix function graph trace_pipe to properly display failed entries
  tracing: Add full state to trace_seq
  tracing: Buffer the output of seq_file in case of filled buffer
  tracing: Only call pipe_close if pipe_close is defined
  tracing: Add pipe_close interface
2009-12-11 20:47:44 -08:00
Linus Torvalds
6f696eb17b Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (57 commits)
  x86, perf events: Check if we have APIC enabled
  perf_event: Fix variable initialization in other codepaths
  perf kmem: Fix unused argument build warning
  perf symbols: perf_header__read_build_ids() offset'n'size should be u64
  perf symbols: dsos__read_build_ids() should read both user and kernel buildids
  perf tools: Align long options which have no short forms
  perf kmem: Show usage if no option is specified
  sched: Mark sched_clock() as notrace
  perf sched: Add max delay time snapshot
  perf tools: Correct size given to memset
  perf_event: Fix perf_swevent_hrtimer() variable initialization
  perf sched: Fix for getting task's execution time
  tracing/kprobes: Fix field creation's bad error handling
  perf_event: Cleanup for cpu_clock_perf_event_update()
  perf_event: Allocate children's perf_event_ctxp at the right time
  perf_event: Clean up __perf_event_init_context()
  hw-breakpoints: Modify breakpoints without unregistering them
  perf probe: Update perf-probe document
  perf probe: Support --del option
  trace-kprobe: Support delete probe syntax
  ...
2009-12-11 20:47:30 -08:00
David S. Miller
501706565b Merge branch 'master' of /home/davem/src/GIT/linux-2.6/
Conflicts:
	include/net/tcp.h
2009-12-11 17:12:17 -08:00
Linus Torvalds
2fe77b81c7 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [ACPI/CPUFREQ] Introduce bios_limit per cpu cpufreq sysfs interface
  [CPUFREQ] make internal cpufreq_add_dev_* static
  [CPUFREQ] use an enum for speedstep processor identification
  [CPUFREQ] Document units for transition latency
  [CPUFREQ] Use global sysfs cpufreq structure for conservative governor tunings
  [CPUFREQ] Documentation: ABI: /sys/devices/system/cpu/cpu#/cpufreq/
  [CPUFREQ] powernow-k6: set transition latency value so ondemand governor can be used
  [CPUFREQ] cpumask: don't put a cpumask on the stack in x86...cpufreq/powernow-k8.c
2009-12-11 15:59:23 -08:00
Linus Torvalds
0f4974c439 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (58 commits)
  tty: split the lock up a bit further
  tty: Move the leader test in disassociate
  tty: Push the bkl down a bit in the hangup code
  tty: Push the lock down further into the ldisc code
  tty: push the BKL down into the handlers a bit
  tty: moxa: split open lock
  tty: moxa: Kill the use of lock_kernel
  tty: moxa: Fix modem op locking
  tty: moxa: Kill off the throttle method
  tty: moxa: Locking clean up
  tty: moxa: rework the locking a bit
  tty: moxa: Use more tty_port ops
  tty: isicom: fix deadlock on shutdown
  tty: mxser: Use the new locking rules to fix setserial properly
  tty: mxser: use the tty_port_open method
  tty: isicom: sort out the board init logic
  tty: isicom: switch to the new tty_port_open helper
  tty: tty_port: Add a kref object to the tty port
  tty: istallion: tty port open/close methods
  tty: stallion: Convert to the tty_port_open/close methods
  ...
2009-12-11 15:34:40 -08:00
Linus Torvalds
3126c136bc Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (21 commits)
  ext3: PTR_ERR return of wrong pointer in setup_new_group_blocks()
  ext3: Fix data / filesystem corruption when write fails to copy data
  ext4: Support for 64-bit quota format
  ext3: Support for vfsv1 quota format
  quota: Implement quota format with 64-bit space and inode limits
  quota: Move definition of QFMT_OCFS2 to linux/quota.h
  ext2: fix comment in ext2_find_entry about return values
  ext3: Unify log messages in ext3
  ext2: clear uptodate flag on super block I/O error
  ext2: Unify log messages in ext2
  ext3: make "norecovery" an alias for "noload"
  ext3: Don't update the superblock in ext3_statfs()
  ext3: journal all modifications in ext3_xattr_set_handle
  ext2: Explicitly assign values to on-disk enum of filetypes
  quota: Fix WARN_ON in lookup_one_len
  const: struct quota_format_ops
  ubifs: remove manual O_SYNC handling
  afs: remove manual O_SYNC handling
  kill wait_on_page_writeback_range
  vfs: Implement proper O_SYNC semantics
  ...
2009-12-11 15:31:13 -08:00
Linus Torvalds
f58df54a54 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (27 commits)
  Driver core: fix race in dev_driver_string
  Driver Core: Early platform driver buffer
  sysfs: sysfs_setattr remove unnecessary permission check.
  sysfs: Factor out sysfs_rename from sysfs_rename_dir and sysfs_move_dir
  sysfs: Propagate renames to the vfs on demand
  sysfs: Gut sysfs_addrm_start and sysfs_addrm_finish
  sysfs: In sysfs_chmod_file lazily propagate the mode change.
  sysfs: Implement sysfs_getattr & sysfs_permission
  sysfs: Nicely indent sysfs_symlink_inode_operations
  sysfs: Update s_iattr on link and unlink.
  sysfs: Fix locking and factor out sysfs_sd_setattr
  sysfs: Simplify iattr time assignments
  sysfs: Simplify sysfs_chmod_file semantics
  sysfs: Use dentry_ops instead of directly playing with the dcache
  sysfs: Rename sysfs_d_iput to sysfs_dentry_iput
  sysfs: Update sysfs_setxattr so it updates secdata under the sysfs_mutex
  debugfs: fix create mutex racy fops and private data
  Driver core: Don't remove kobjects in device_shutdown.
  firmware_class: make request_firmware_nowait more useful
  Driver-Core: devtmpfs - set root directory mode to 0755
  ...
2009-12-11 15:24:56 -08:00
Linus Torvalds
748e566b7e Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: (122 commits)
  USB: mos7840: add device IDs for B&B electronics devices
  USB: ftdi_sio: add USB device ID's for B&B Electronics line
  USB: musb: musb_host: fix sparse warning
  USB: musb: musb_gadget: fix sparse warning
  USB: musb: omap2430: fix sparse warning
  USB: core: message: fix sparse warning
  USB: core: hub: fix sparse warning
  USB: core: fix sparse warning for static function
  USB: Added USB_ETH_RNDIS to use instead of CONFIG_USB_ETH_RNDIS
  USB: Check bandwidth when switching alt settings.
  USB: Refactor code to find alternate interface settings.
  USB: xhci: Fix command completion after a drop endpoint.
  USB: xhci: Make reverting an alt setting "unfailable".
  USB: usbtmc: Use usb_clear_halt() instead of custom code.
  USB: xhci: Add correct email and files to MAINTAINERS entry.
  USB: ehci-omap.c: introduce missing kfree
  USB: xhci-mem.c: introduce missing kfree
  USB: add remove_id sysfs attr for usb drivers
  USB: g_multi kconfig: fix depends and help text
  USB: option: add pid for ZTE
  ...
2009-12-11 15:22:55 -08:00
Alan Cox
eeb89d918c tty: push the BKL down into the handlers a bit
Start trying to untangle the remaining BKL mess

Updated to fix missing unlock_kernel noted by Dan Carpenter

Signed-off-by: Alan "I must be out of my tree" Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:08 -08:00
Alan Cox
6ed847d8ef tty: isicom: sort out the board init logic
Split this into two flags - INIT meaning the board is set up and ACTIVE
meaning the board has ports open. Remove the broken HUPCL casing and push
the counts somewhere sensible.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:07 -08:00
Alan Cox
568aafc627 tty: tty_port: Add a kref object to the tty port
Users of tty port need a way to refcount ports when hotplugging is
involved.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:07 -08:00
Alan Cox
44e4909e45 tty: tty_port: Change the buffer allocator locking
We want to be able to do this without regard for the activate/own open
method being used which causes a problem using port->mutex. Add another
mutex for now. Once everything uses port_open to do buffer allocs we can
kill it back off

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:06 -08:00
Alan Cox
82fc594343 usb_serial: Kill port mutex
The tty port has a port mutex used for all the port related locking so we
don't need the one in the USB serial layer any more.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Oliver Neukum <oliver@neukum.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:04 -08:00