Commit Graph

420645 Commits

Author SHA1 Message Date
Vladimir Davydov
d644163770 memcg: rework memcg_update_kmem_limit synchronization
Currently we take both the memcg_create_mutex and the set_limit_mutex
when we enable kmem accounting for a memory cgroup, which makes kmem
activation events serialize with both memcg creations and other memcg
limit updates (memory.limit, memory.memsw.limit).  However, there is no
point in such strict synchronization rules there.

First, the set_limit_mutex was introduced to keep the memory.limit and
memory.memsw.limit values in sync.  Since memory.kmem.limit can be set
independently of them, it is better to introduce a separate mutex to
synchronize against concurrent kmem limit updates.

Second, we take the memcg_create_mutex in order to make sure all
children of this memcg will be kmem-active as well.  For achieving that,
it is enough to hold this mutex only while checking if
memcg_has_children() though.  This guarantees that if a child is added
after we checked that the memcg has no children, the newly added cgroup
will see its parent kmem-active (of course if the latter succeeded), and
call kmem activation for itself.

This patch simplifies the locking rules of memcg_update_kmem_limit()
according to these considerations.

[vdavydov@parallels.com: fix unintialized var warning]
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:51 -08:00
Vladimir Davydov
6de64beb34 memcg: remove KMEM_ACCOUNTED_ACTIVATED flag
Currently we have two state bits in mem_cgroup::kmem_account_flags
regarding kmem accounting activation, ACTIVATED and ACTIVE.  We start
kmem accounting only if both flags are set (memcg_can_account_kmem()),
plus throughout the code there are several places where we check only
the ACTIVE flag, but we never check the ACTIVATED flag alone.  These
flags are both set from memcg_update_kmem_limit() under the
set_limit_mutex, the ACTIVE flag always being set after ACTIVATED, and
they never get cleared.  That said checking if both flags are set is
equivalent to checking only for the ACTIVE flag, and since there is no
ACTIVATED flag checks, we can safely remove the ACTIVATED flag, and
nothing will change.

Let's try to understand what was the reason for introducing these flags.
The purpose of the ACTIVE flag is clear - it states that kmem should be
accounting to the cgroup.  The only requirement for it is that it should
be set after we have fully initialized kmem accounting bits for the
cgroup and patched all static branches relating to kmem accounting.
Since we always check if static branch is enabled before actually
considering if we should account (otherwise we wouldn't benefit from
static branching), this guarantees us that we won't skip a commit or
uncharge after a charge due to an unpatched static branch.

Now let's move on to the ACTIVATED bit.  As I proved in the beginning of
this message, it is absolutely useless, and removing it will change
nothing.  So what was the reason introducing it?

The ACTIVATED flag was introduced by commit a8964b9b84 ("memcg: use
static branches when code not in use") in order to guarantee that
static_key_slow_inc(&memcg_kmem_enabled_key) would be called only once
for each memory cgroup when its kmem accounting was activated.  The
point was that at that time the memcg_update_kmem_limit() function's
work-flow looked like this:

        bool must_inc_static_branch = false;

        cgroup_lock();
        mutex_lock(&set_limit_mutex);
        if (!memcg->kmem_account_flags && val != RESOURCE_MAX) {
                /* The kmem limit is set for the first time */
                ret = res_counter_set_limit(&memcg->kmem, val);

                memcg_kmem_set_activated(memcg);
                must_inc_static_branch = true;
        } else
                ret = res_counter_set_limit(&memcg->kmem, val);
        mutex_unlock(&set_limit_mutex);
        cgroup_unlock();

        if (must_inc_static_branch) {
                /* We can't do this under cgroup_lock */
                static_key_slow_inc(&memcg_kmem_enabled_key);
                memcg_kmem_set_active(memcg);
        }

So that without the ACTIVATED flag we could race with other threads
trying to set the limit and increment the static branching ref-counter
more than once.  Today we call the whole memcg_update_kmem_limit()
function under the set_limit_mutex and this race is impossible.

As now we understand why the ACTIVATED bit was introduced and why we
don't need it now, and know that removing it will change nothing anyway,
let's get rid of it.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:51 -08:00
Vladimir Davydov
f8570263ee memcg, slab: RCU protect memcg_params for root caches
We relocate root cache's memcg_params whenever we need to grow the
memcg_caches array to accommodate all kmem-active memory cgroups.
Currently on relocation we free the old version immediately, which can
lead to use-after-free, because the memcg_caches array is accessed
lock-free (see cache_from_memcg_idx()).  This patch fixes this by making
memcg_params RCU-protected for root caches.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:51 -08:00
Vladimir Davydov
f717eb3abb slab: do not panic if we fail to create memcg cache
There is no point in flooding logs with warnings or especially crashing
the system if we fail to create a cache for a memcg.  In this case we
will be accounting the memcg allocation to the root cgroup until we
succeed to create its own cache, but it isn't that critical.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:51 -08:00
Vladimir Davydov
842e287369 memcg: get rid of kmem_cache_dup()
kmem_cache_dup() is only called from memcg_create_kmem_cache().  The
latter, in fact, does nothing besides this, so let's fold
kmem_cache_dup() into memcg_create_kmem_cache().

This patch also makes the memcg_cache_mutex private to
memcg_create_kmem_cache(), because it is not used anywhere else.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:51 -08:00
Vladimir Davydov
2edefe1155 memcg, slab: fix races in per-memcg cache creation/destruction
We obtain a per-memcg cache from a root kmem_cache by dereferencing an
entry of the root cache's memcg_params::memcg_caches array.  If we find
no cache for a memcg there on allocation, we initiate the memcg cache
creation (see memcg_kmem_get_cache()).  The cache creation proceeds
asynchronously in memcg_create_kmem_cache() in order to avoid lock
clashes, so there can be several threads trying to create the same
kmem_cache concurrently, but only one of them may succeed.  However, due
to a race in the code, it is not always true.  The point is that the
memcg_caches array can be relocated when we activate kmem accounting for
a memcg (see memcg_update_all_caches(), memcg_update_cache_size()).  If
memcg_update_cache_size() and memcg_create_kmem_cache() proceed
concurrently as described below, we can leak a kmem_cache.

Asume two threads schedule creation of the same kmem_cache.  One of them
successfully creates it.  Another one should fail then, but if
memcg_create_kmem_cache() interleaves with memcg_update_cache_size() as
follows, it won't:

  memcg_create_kmem_cache()             memcg_update_cache_size()
  (called w/o mutexes held)             (called with slab_mutex,
                                         set_limit_mutex held)
  -------------------------             -------------------------

  mutex_lock(&memcg_cache_mutex)

                                        s->memcg_params=kzalloc(...)

  new_cachep=cache_from_memcg_idx(cachep,idx)
  // new_cachep==NULL => proceed to creation

                                        s->memcg_params->memcg_caches[i]
                                            =cur_params->memcg_caches[i]

  // kmem_cache_create_memcg takes slab_mutex
  // so we will hang around until
  // memcg_update_cache_size finishes, but
  // nothing will prevent it from succeeding so
  // memcg_caches[idx] will be overwritten in
  // memcg_register_cache!

  new_cachep = kmem_cache_create_memcg(...)
  mutex_unlock(&memcg_cache_mutex)

Let's fix this by moving the check for existence of the memcg cache to
kmem_cache_create_memcg() to be called under the slab_mutex and make it
return NULL if so.

A similar race is possible when destroying a memcg cache (see
kmem_cache_destroy()).  Since memcg_unregister_cache(), which clears the
pointer in the memcg_caches array, is called w/o protection, we can race
with memcg_update_cache_size() and omit clearing the pointer.  Therefore
memcg_unregister_cache() should be moved before we release the
slab_mutex.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:51 -08:00
Vladimir Davydov
96403da244 memcg: fix possible NULL deref while traversing memcg_slab_caches list
All caches of the same memory cgroup are linked in the memcg_slab_caches
list via kmem_cache::memcg_params::list.  This list is traversed, for
example, when we read memory.kmem.slabinfo.

Since the list actually consists of memcg_cache_params objects, we have
to convert an element of the list to a kmem_cache object using
memcg_params_to_cache(), which obtains the pointer to the cache from the
memcg_params::memcg_caches array of the corresponding root cache.  That
said the pointer to a kmem_cache in its parent's memcg_params must be
initialized before adding the cache to the list, and cleared only after
it has been unlinked.  Currently it is vice-versa, which can result in a
NULL ptr dereference while traversing the memcg_slab_caches list.  This
patch restores the correct order.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:51 -08:00
Vladimir Davydov
959c8963fc memcg, slab: fix barrier usage when accessing memcg_caches
Each root kmem_cache has pointers to per-memcg caches stored in its
memcg_params::memcg_caches array.  Whenever we want to allocate a slab
for a memcg, we access this array to get per-memcg cache to allocate
from (see memcg_kmem_get_cache()).  The access must be lock-free for
performance reasons, so we should use barriers to assert the kmem_cache
is up-to-date.

First, we should place a write barrier immediately before setting the
pointer to it in the memcg_caches array in order to make sure nobody
will see a partially initialized object.  Second, we should issue a read
barrier before dereferencing the pointer to conform to the write
barrier.

However, currently the barrier usage looks rather strange.  We have a
write barrier *after* setting the pointer and a read barrier *before*
reading the pointer, which is incorrect.  This patch fixes this.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:51 -08:00
Vladimir Davydov
1aa1325425 memcg, slab: clean up memcg cache initialization/destruction
Currently, we have rather a messy function set relating to per-memcg
kmem cache initialization/destruction.

Per-memcg caches are created in memcg_create_kmem_cache().  This
function calls kmem_cache_create_memcg() to allocate and initialize a
kmem cache and then "registers" the new cache in the
memcg_params::memcg_caches array of the parent cache.

During its work-flow, kmem_cache_create_memcg() executes the following
memcg-related functions:

 - memcg_alloc_cache_params(), to initialize memcg_params of the newly
   created cache;
 - memcg_cache_list_add(), to add the new cache to the memcg_slab_caches
   list.

On the other hand, kmem_cache_destroy() called on a cache destruction
only calls memcg_release_cache(), which does all the work: it cleans the
reference to the cache in its parent's memcg_params::memcg_caches,
removes the cache from the memcg_slab_caches list, and frees
memcg_params.

Such an inconsistency between destruction and initialization paths make
the code difficult to read, so let's clean this up a bit.

This patch moves all the code relating to registration of per-memcg
caches (adding to memcg list, setting the pointer to a cache from its
parent) to the newly created memcg_register_cache() and
memcg_unregister_cache() functions making the initialization and
destruction paths look symmetrical.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:51 -08:00
Vladimir Davydov
363a044f73 memcg, slab: kmem_cache_create_memcg(): fix memleak on fail path
We do not free the cache's memcg_params if __kmem_cache_create fails.
Fix this.

Plus, rename memcg_register_cache() to memcg_alloc_cache_params(),
because it actually does not register the cache anywhere, but simply
initialize kmem_cache::memcg_params.

[akpm@linux-foundation.org: fix build]
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:51 -08:00
Vladimir Davydov
3965fc3652 slab: clean up kmem_cache_create_memcg() error handling
Currently kmem_cache_create_memcg() backoffs on failure inside
conditionals, without using gotos.  This results in the rollback code
duplication, which makes the function look cumbersome even though on
error we should only free the allocated cache.  Since in the next patch
I am going to add yet another rollback function call on error path
there, let's employ labels instead of conditionals for undoing any
changes on failure to keep things clean.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:50 -08:00
Sasha Levin
309381feae mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE
Most of the VM_BUG_ON assertions are performed on a page.  Usually, when
one of these assertions fails we'll get a BUG_ON with a call stack and
the registers.

I've recently noticed based on the requests to add a small piece of code
that dumps the page to various VM_BUG_ON sites that the page dump is
quite useful to people debugging issues in mm.

This patch adds a VM_BUG_ON_PAGE(cond, page) which beyond doing what
VM_BUG_ON() does, also dumps the page before executing the actual
BUG_ON.

[akpm@linux-foundation.org: fix up includes]
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:50 -08:00
Naoya Horiguchi
e3bba3c3c9 fs/proc/page.c: add PageAnon check to surely detect thp
stable_page_flags() checks !PageHuge && PageTransCompound && PageLRU to
know that a specified page is thp or not.  But sometimes it's not enough
and we fail to detect thp when the thp is on pagevec.  This happens only
for a few seconds after LRU list operations, but it makes it difficult
to control our applications depending on this flag.

So this patch adds another check PageAnon to detect thps on pagevec.  It
might not give the future extensibility for thp pagecache, but it's OK
at least for now.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:50 -08:00
Vladimir Davydov
8ff69e2c85 memcg: do not use vmalloc for mem_cgroup allocations
The vmalloc was introduced by 3332794878 ("memcgroup: use vmalloc for
mem_cgroup allocation"), because at that time MAX_NUMNODES was used for
defining the per-node array in the mem_cgroup structure so that the
structure could be huge even if the system had the only NUMA node.

The situation was significantly improved by commit 45cf7ebd5a ("memcg:
reduce the size of struct memcg 244-fold"), which made the size of the
mem_cgroup structure calculated dynamically depending on the real number
of NUMA nodes installed on the system (nr_node_ids), so now there is no
point in using vmalloc here: the structure is allocated rarely and on
most systems its size is about 1K.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@openvz.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:50 -08:00
Vlastimil Babka
01cc2e5869 mm: munlock: fix potential race with THP page split
Since commit ff6a6da60b ("mm: accelerate munlock() treatment of THP
pages") munlock skips tail pages of a munlocked THP page.  There is some
attempt to prevent bad consequences of racing with a THP page split, but
code inspection indicates that there are two problems that may lead to a
non-fatal, yet wrong outcome.

First, __split_huge_page_refcount() copies flags including PageMlocked
from the head page to the tail pages.  Clearing PageMlocked by
munlock_vma_page() in the middle of this operation might result in part
of tail pages left with PageMlocked flag.  As the head page still
appears to be a THP page until all tail pages are processed,
munlock_vma_page() might think it munlocked the whole THP page and skip
all the former tail pages.  Before ff6a6da60, those pages would be
cleared in further iterations of munlock_vma_pages_range(), but NR_MLOCK
would still become undercounted (related the next point).

Second, NR_MLOCK accounting is based on call to hpage_nr_pages() after
the PageMlocked is cleared.  The accounting might also become
inconsistent due to race with __split_huge_page_refcount()

- undercount when HUGE_PMD_NR is subtracted, but some tail pages are
  left with PageMlocked set and counted again (only possible before
  ff6a6da60)

- overcount when hpage_nr_pages() sees a normal page (split has already
  finished), but the parallel split has meanwhile cleared PageMlocked from
  additional tail pages

This patch prevents both problems via extending the scope of lru_lock in
munlock_vma_page().  This is convenient because:

- __split_huge_page_refcount() takes lru_lock for its whole operation

- munlock_vma_page() typically takes lru_lock anyway for page isolation

As this becomes a second function where page isolation is done with
lru_lock already held, factor this out to a new
__munlock_isolate_lru_page() function and clean up the code around.

[akpm@linux-foundation.org: avoid a coding-style ugly]
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:50 -08:00
Dave Hansen
f0b791a34c mm: print more details for bad_page()
bad_page() is cool in that it prints out a bunch of data about the page.
But, I can never remember which page flags are good and which are bad,
or whether ->index or ->mapping is required to be NULL.

This patch allows bad/dump_page() callers to specify a string about why
they are dumping the page and adds explanation strings to a number of
places.  It also adds a 'bad_flags' argument to bad_page(), which it
then dumps out separately from the flags which are actually set.

This way, the messages will show specifically why the page was bad,
*specifically* which flags it is complaining about, if it was a page
flag combination which was the problem.

[akpm@linux-foundation.org: switch to pr_alert]
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:50 -08:00
Dan Streetman
12ab028be0 mm/zswap.c: change params from hidden to ro
The "compressor" and "enabled" params are currently hidden, this changes
them to read-only, so userspace can tell if zswap is enabled or not and
see what compressor is in use.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Cc: Vladimir Murzin <murzin.v@gmail.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Weijie Yang <weijie.yang@samsung.com>
Acked-by: Seth Jennings <sjennings@variantweb.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:50 -08:00
Dave Hansen
57ea8171d2 mm: documentation: remove hopelessly out-of-date locking doc
Documentation/vm/locking is a blast from the past.  In the entire git
history, it has had precisely Three modifications.  Two of those look to
be pure renames, and the third was from 2005.

The doc contains such gems as:

> The page_table_lock is grabbed while holding the
> kernel_lock spinning monitor.

> Page stealers hold kernel_lock to protect against a bunch of
> races.

Or this which talks about mmap_sem:

> 4. The exception to this rule is expand_stack, which just
>    takes the read lock and the page_table_lock, this is ok
>    because it doesn't really modify fields anybody relies on.

expand_stack() doesn't take any locks any more directly, and the
mmap_sem acquisition was long ago moved up in to the page fault code
itself.

It could be argued that we need to rewrite this, but it is dangerous to
leave it as-is.  It will confuse more people than it helps.

Signed-off-by: Dave Hansen <dave.hansen@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:50 -08:00
Michal Simek
372c7209d6 microblaze: extable: sort the exception table at build time
Sort the exception table at build-time rather than during boot.

Microblaze is the same case as AARCH64 that's why EM_MICROBLAZE
conditional check was added to allow cross-compilation on machines which
are not running the latest libc-dev.

Inspired by AARCH64 commit adace89562 ("arm64: extable: sort the
exception table at build time").

Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Acked-by: David Daney <david.daney@cavium.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:50 -08:00
Geert Uytterhoeven
3fdb38bd1f cris: provide {in,out}[wl]_p()
drivers/staging/comedi/drivers/das6402.c: In function 'intr_handler':
  drivers/staging/comedi/drivers/das6402.c:164:3: error: implicit declaration of function 'outw_p' [-Werror=implicit-function-declaration]
  drivers/staging/speakup/speakup_dtlk.c: In function 'synth_probe':
  drivers/staging/speakup/speakup_dtlk.c:362:2: error: implicit declaration of function 'inw_p' [-Werror=implicit-function-declaration]

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:50 -08:00
Linus Torvalds
90804ed61f Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull UDF & jbd fixes from Jan Kara:
 "A cleanup of JBD log messages and UDF fix of a lockdep warning"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: Fix lockdep warning from udf_symlink()
  jbd: Revise KERN_EMERG error messages
2014-01-23 10:49:23 -08:00
Stephen Hemminger
30b02c4b2a assoc_array: remove global variable
The associative array code creates unnecessary and potentially
problematic global variable 'status'.  Remove it since never used.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 09:25:10 -08:00
Linus Torvalds
5ee7a81a9f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse update from Miklos Szeredi:
 "This contains a fix for a potential use-after-module-unload bug
  noticed by Al and caching improvements for read-only fuse filesystems
  by Andrew Gallagher"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: support clients that don't implement 'open'
  fuse: don't invalidate attrs when not using atime
  fuse: fix SetPageUptodate() condition in STORE
  fuse: fix pipe_buf_operations
2014-01-23 09:22:58 -08:00
Linus Torvalds
0d90d63872 f2fs updates for v3.14
This patch-set includes the following major enhancement patches.
 o support inline_data
 o refactor bio operations such as merge operations and rw type assignment
 o enhance the direct IO path
 o enhance bio operations
 o truncate a node page when it becomes obsolete
 o add sysfs entries: small_discards, max_victim_search, and in-place-update
 o add a sysfs entry to control max_victim_search
 
 The other bug fixes are as follows.
 o fix a bug in truncate_partial_nodes
 o avoid warnings during sparse and build process
 o fix error handling flows
 o fix potential bit overflows
 
 And, there are a bunch of cleanups.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJS4HQfAAoJEEAUqH6CSFDSyyMP/iUXSMC9yw6eOmSjAh3boc6+
 C7e4zrhdovekGTuZgg41SLdr83cpbEohv11wcXAfxB+eYFEz0zrAVzt54zMi7uOL
 9JmFJ6XVL/T3omI5hpEwWHg6S6tOynN6mcjacsrvypEekgjHbbpLudSw6SCu3dKz
 Lpc3z6CxrWbhvX8Iyf1j8mCceWkTO6eRv7u2H4Njtsq4Tukw3BHiBsURXt6kGwpx
 CvRBgCFdQhv4GAtbDosmVjNWOUxvik7w2epHAPQGddFTgaCL9uS+gfweHK6H9EDp
 1e3BDhmn5r9IhiLY8KVXRc8+po9kQeO1jNQATBuWggfjJSGbEBmrEQX4MFE3uCi9
 q84hGV9+yaJxoT2A21qIeWgorF9gjqNbnrrENKHyKhOqXJSrh48u5LUV8KqIyz1Y
 Qw62cypEB+PQxWegN76vwX/OrHMCLYMQ6c78bYLSwkBKonOrF5sN2+kJW5+zEj6n
 q2cYi1PLMJe7LTcULUrxJTSPFLKM5yA2oYZq3LN4sUYBeN6USaouaIqcZBqRBTCO
 adqlTa3sWytkDMAHsTpwrHABKK7pwiZoPLDVwjo0TIJ6Us4JhDtTktp5pj24fQ7Y
 6lC9w4VbfAKtq8fMV17rZYD0lQFlmZk4uQRJ8XYicCRFx11kMPKYzdGmP5aVXWru
 wxcztktnABtCAXK0PFLf
 =gVDh
 -----END PGP SIGNATURE-----

Merge tag 'for-f2fs-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, a couple of sysfs entries were introduced to tune the
  f2fs at runtime.

  In addition, f2fs starts to support inline_data and improves the
  read/write performance in some workloads by refactoring bio-related
  flows.

  This patch-set includes the following major enhancement patches.
   - support inline_data
   - refactor bio operations such as merge operations and rw type
     assignment
   - enhance the direct IO path
   - enhance bio operations
   - truncate a node page when it becomes obsolete
   - add sysfs entries: small_discards, max_victim_search, and
     in-place-update
   - add a sysfs entry to control max_victim_search

  The other bug fixes are as follows.
   - fix a bug in truncate_partial_nodes
   - avoid warnings during sparse and build process
   - fix error handling flows
   - fix potential bit overflows

  And, there are a bunch of cleanups"

* tag 'for-f2fs-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (95 commits)
  f2fs: drop obsolete node page when it is truncated
  f2fs: introduce NODE_MAPPING for code consistency
  f2fs: remove the orphan block page array
  f2fs: add help function META_MAPPING
  f2fs: move a branch for code redability
  f2fs: call mark_inode_dirty to flush dirty pages
  f2fs: clean checkpatch warnings
  f2fs: missing REQ_META and REQ_PRIO when sync_meta_pages(META_FLUSH)
  f2fs: avoid f2fs_balance_fs call during pageout
  f2fs: add delimiter to seperate name and value in debug phrase
  f2fs: use spinlock rather than mutex for better speed
  f2fs: move alloc new orphan node out of lock protection region
  f2fs: move grabing orphan pages out of protection region
  f2fs: remove the needless parameter of f2fs_wait_on_page_writeback
  f2fs: update documents and a MAINTAINERS entry
  f2fs: add a sysfs entry to control max_victim_search
  f2fs: improve write performance under frequent fsync calls
  f2fs: avoid to read inline data except first page
  f2fs: avoid to left uninitialized data in page when read inline data
  f2fs: fix truncate_partial_nodes bug
  ...
2014-01-23 09:21:09 -08:00
Linus Torvalds
1d32bdafaa xfs: update for v3.14-rc1
For 3.14-rc1 there are fixes in the areas of remote attributes, discard,
 growfs, memory leaks in recovery, directory v2, quotas, the MAINTAINERS
 file, allocation alignment, extent list locking, and in
 xfs_bmapi_allocate.  There are cleanups in xfs_setsize_buftarg, removing
 unused macros, quotas, setattr, and freeing of inode clusters.  The
 in-memory and on-disk log format have been decoupled, a common helper to
 calculate the number of blocks in an inode cluster has been added, and
 handling of i_version has been pulled into the filesystems that use it.
 
 - cleanup in xfs_setsize_buftarg
 - removal of remaining unused flags for vop toss/flush/flushinval
 - fix for memory corruption in xfs_attrlist_by_handle
 - fix for out-of-date comment in xfs_trans_dqlockedjoin
 - fix for discard if range length is less than one block
 - fix for overrun of agfl buffer using growfs on v4 superblock filesystems
 - pull i_version handling out into the filesystems that use it
 - don't leak recovery items on error
 - fix for memory leak in xfs_dir2_node_removename
 - several cleanups for quotas
 - fix bad assertion in xfs_qm_vop_create_dqattach
 - cleanup for xfs_setattr_mode, and add xfs_setattr_time
 - fix quota assert in xfs_setattr_nonsize
 - fix an infinite loop when turning off group/project quota before user
   quota
 - fix for temporary buffer allocation failure in xfs_dir2_block_to_sf
   with large directory block sizes
 - fix Dave's email address in MAINTAINERS
 - cleanup calculation of freed inode cluster blocks
 - fix alignment of initial file allocations to match filesystem geometry
 - decouple in-memory and on-disk log format
 - introduce a common helper to calculate the number of filesystem
   blocks in an inode cluster
 - fixes for extent list locking
 - fix for off-by-one in xfs_attr3_rmt_verify
 - fix for missing destroy_work_on_stack in xfs_bmapi_allocate
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJS4C2sAAoJENaLyazVq6ZOF/UP/A/FmscsEbgz+KHtsg1UXP+g
 EMkrD8WOOzLnm7/GhkfvmmFLQrWrrNPmiP54MPJ/rxpfDb7rV60HgS0iHm13U+IR
 0uPyk2NIQKG/Sj6FHCrgn4oh19B0tmer6lLE32UPrlNM7w/0NRm32Ms/jKz7WNvc
 9Tqw3qNdtmP7leHG8EyD1ISzqUz6FNSC+qGyOTuBQUqG24LqsmZ1qD2Nw/HEz0ir
 1YCqS+cjj8c3WaWMsdjEFdCvEIKS/sMYM+ZihATIl/5ggwtVobP7PH/4cG8Biby3
 5wvcl3/jM+xjgGDC1YMQQeSADieus6ER4aZTjCvGLn9RajQTOBjP0QZTu2OHntTI
 kD/52DfYErthGijS2qJcqXy11AaYtWQU2yTZXuMpaACIM1DDSD4NyGFP99BzbBaX
 D4BgnC+vmfpbXO37PmophkeZwAvj+9K2BBG7X+g4sLynj/BtmvFCxIyK+SLHE3hN
 kn+Pn03yTw0VGgvj0krXSllbYKE0LyLElaA6h0LejQgcOZM0W6Q8qaj+RODLitbw
 HkQNKYCOMCzbdNu8rKAfup9UPZloiMukyMdMaw1MeCgCQSQLkZVxTegmVW9gzpIh
 QJEARDaOnKB/G/CLCwbPZR09DBpg3yBt/eHjt0VnV+z2x+u9DTInAp6f013g7E/W
 gU+vnBBPY1AYI8iiDraU
 =t8nB
 -----END PGP SIGNATURE-----

Merge tag 'xfs-for-linus-v3.14-rc1' of git://oss.sgi.com/xfs/xfs

Pull xfs update from Ben Myers:
 "This is primarily bug fixes, many of which you already have.  New
  stuff includes a series to decouple the in-memory and on-disk log
  format, helpers in the area of inode clusters, and i_version handling.

  We decided to try to use more topic branches this release, so there
  are some merge commits in there on account of that.  I'm afraid I
  didn't do a good job of putting meaningful comments in the first
  couple of merges.  Sorry about that.  I think I have the hang of it
  now.

  For 3.14-rc1 there are fixes in the areas of remote attributes,
  discard, growfs, memory leaks in recovery, directory v2, quotas, the
  MAINTAINERS file, allocation alignment, extent list locking, and in
  xfs_bmapi_allocate.  There are cleanups in xfs_setsize_buftarg,
  removing unused macros, quotas, setattr, and freeing of inode
  clusters.  The in-memory and on-disk log format have been decoupled, a
  common helper to calculate the number of blocks in an inode cluster
  has been added, and handling of i_version has been pulled into the
  filesystems that use it.

   - cleanup in xfs_setsize_buftarg
   - removal of remaining unused flags for vop toss/flush/flushinval
   - fix for memory corruption in xfs_attrlist_by_handle
   - fix for out-of-date comment in xfs_trans_dqlockedjoin
   - fix for discard if range length is less than one block
   - fix for overrun of agfl buffer using growfs on v4 superblock
     filesystems
   - pull i_version handling out into the filesystems that use it
   - don't leak recovery items on error
   - fix for memory leak in xfs_dir2_node_removename
   - several cleanups for quotas
   - fix bad assertion in xfs_qm_vop_create_dqattach
   - cleanup for xfs_setattr_mode, and add xfs_setattr_time
   - fix quota assert in xfs_setattr_nonsize
   - fix an infinite loop when turning off group/project quota before
     user quota
   - fix for temporary buffer allocation failure in xfs_dir2_block_to_sf
     with large directory block sizes
   - fix Dave's email address in MAINTAINERS
   - cleanup calculation of freed inode cluster blocks
   - fix alignment of initial file allocations to match filesystem
     geometry
   - decouple in-memory and on-disk log format
   - introduce a common helper to calculate the number of filesystem
     blocks in an inode cluster
   - fixes for extent list locking
   - fix for off-by-one in xfs_attr3_rmt_verify
   - fix for missing destroy_work_on_stack in xfs_bmapi_allocate"

* tag 'xfs-for-linus-v3.14-rc1' of git://oss.sgi.com/xfs/xfs: (51 commits)
  xfs: Calling destroy_work_on_stack() to pair with INIT_WORK_ONSTACK()
  xfs: fix off-by-one error in xfs_attr3_rmt_verify
  xfs: assert that we hold the ilock for extent map access
  xfs: use xfs_ilock_attr_map_shared in xfs_attr_list_int
  xfs: use xfs_ilock_attr_map_shared in xfs_attr_get
  xfs: use xfs_ilock_data_map_shared in xfs_qm_dqiterate
  xfs: use xfs_ilock_data_map_shared in xfs_qm_dqtobp
  xfs: take the ilock around xfs_bmapi_read in xfs_zero_remaining_bytes
  xfs: reinstate the ilock in xfs_readdir
  xfs: add xfs_ilock_attr_map_shared
  xfs: rename xfs_ilock_map_shared
  xfs: remove xfs_iunlock_map_shared
  xfs: no need to lock the inode in xfs_find_handle
  xfs: use xfs_icluster_size_fsb in xfs_imap
  xfs: use xfs_icluster_size_fsb in xfs_ifree_cluster
  xfs: use xfs_icluster_size_fsb in xfs_ialloc_inode_init
  xfs: use xfs_icluster_size_fsb in xfs_bulkstat
  xfs: introduce a common helper xfs_icluster_size_fsb
  xfs: get rid of XFS_IALLOC_BLOCKS macros
  xfs: get rid of XFS_INODE_CLUSTER_SIZE macros
  ...
2014-01-23 09:16:20 -08:00
Olof Johansson
e41006c22e Split omap3 core padconf area into two as some of the registers in
the padconf area are not accessible and used for other devices.
 Also fix the interrupt number for omap2 RNG, and add basic support
 for sbc-3xxx with cm-t3730.
 
 Note that the minor merge conflicts for omap_hwmod_2xxx_ipblock_data.c
 can be solved by just dropping the legacy hwmod data for interrupts
 for v3.14.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQIcBAABAgAGBQJSzaM5AAoJEBvUPslcq6VzhSwQAKM4vtw0eeLTGXURMs7KdCfX
 B6Qo+/DW7dEkrIcFVY8n5CfKRpb48KZ4RyN7Yff5HqkPzB5+/rd5yrruzG3zPwNl
 q5WTxCwuJo/QGXGsEA5W0PNx22WxmC2hcburzaRb/CK8XWwAuyEQ5NU14Az+d/Lf
 icJAk98c4DX60Lyr743v3Tow/tJJYEisJSUxPlqoUU50oX73we7BAhGh19B40jC8
 2gUXUWxaLhDT+Kry/h9R0fJD81KdPZ185GWUHnp81DHq2wj8tzclmKmic7lB9ycr
 lyZXff0dYSLw1Ufupi8LaLd0ULi4uFrYSD/ZgaZmTJuduSG0TXOOn5MfrPcU7Upq
 +vAY3YehOdMx8kyCx6rybrH/joLKdoIPT0Pqb6JCCVvnoE77fXJqRcdmfHEeUV6Z
 f+keyZKdiVmn1dOB7slKqTco5f+bgNh1/sQ074RJKFggLIskSbfHtjxD2S4UyVFq
 fK2Ch90+fWIyVP7XhSec8SKdna3mZyigMBWimf2oMmTeq/zyNLKAWj5PTzX4wch2
 UAhJDTA+A/PXuriD1Hl1k6hgV+Udsndy4r0hJfwSDHgF+MP7FU1e/bceA9s0KJ5z
 21DcIkSqfOdxZvM61S8uVhebBPPPrJDCk18YsZS0KyndIaI2MZdJXn8J/E7HkK1l
 9JG5BwDb75nrcTrgCj/9
 =sPMZ
 -----END PGP SIGNATURE-----

Merge tag 'omap-for-v3.14/dt-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into next/boards

Split omap3 core padconf area into two as some of the registers in
the padconf area are not accessible and used for other devices.
Also fix the interrupt number for omap2 RNG, and add basic support
for sbc-3xxx with cm-t3730.

Note that the minor merge conflicts for omap_hwmod_2xxx_ipblock_data.c
can be solved by just dropping the legacy hwmod data for interrupts
for v3.14.

* tag 'omap-for-v3.14/dt-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: dts: OMAP2: fix interrupt number for rng
  ARM: dts: Split omap3 pinmux core device
  ARM: dts: Add omap specific pinctrl defines to use padconf addresses
  ARM: dts: Add support for sbc-3xxx with cm-t3730

Signed-off-by: Olof Johansson <olof@lixom.net>
2014-01-23 08:51:37 -08:00
Dmitry Torokhov
55df811f20 Merge branch 'next' into for-linus
First round of input updates for 3.14.
2014-01-23 08:10:44 -08:00
Heiko Stübner
e48ca29d30 dt-bindings: add rockchip vendor prefix
It seems I forgot to add the vendor prefix for rockchip to the vendor-prefix
list. Therefore add it now.

Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Rob Herring <robh@kernel.org>
2014-01-23 09:15:23 -06:00
Tony Prisk
a7e0edde16 serial: vt8500: Add missing binding document for arch-vt8500 serial driver.
The binding document for the vt8500/wm8xxx SoC UART driver is missing.
This patch adds the binding document.

Signed-off-by: Tony Prisk <linux@prisktech.co.nz>
Signed-off-by: Rob Herring <robh@kernel.org>
2014-01-23 09:15:01 -06:00
Rob Herring
619d144013 Merge remote-tracking branch 'grant/devicetree/next' into for-3.14 2014-01-23 08:23:04 -06:00
Peter Zijlstra
5e3c1afd45 sched/x86/tsc: Initialize multiplier to 0
Since we keep the clock value linearly continuous on frequency change,
make sure the initial multiplier is 0, such that our initial value is 0.
Without this we compute the initial value at whatever the TSC has
managed to reach since power-on.

Reported-and-Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Fixes: 20d1c86a57 ("sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs")
Cc: lenb@kernel.org
Cc: rjw@rjwysocki.net
Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Cc: rui.zhang@intel.com
Cc: jacob.jun.pan@linux.intel.com
Cc: Mike Galbraith <bitbucket@online.de>
Cc: hpa@zytor.com
Cc: paulmck@linux.vnet.ibm.com
Cc: John Stultz <john.stultz@linaro.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: dyoung@redhat.com
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140123094804.GP30183@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-23 14:48:36 +01:00
Peter Zijlstra
d375b4e0fa sched/clock: Fixup early initialization
The code would assume sched_clock_stable() and switch to !stable
later, this switch brings a discontinuity in time.

The discontinuity on switching from stable to unstable was always
present, but previously we would set stable/unstable before
initializing TSC and usually stick to the one we start out with.

So the static_key bits brought an extra switch where there previously
wasn't one.

Things are further complicated by the fact that we cannot use
static_key as early as we usually call set_sched_clock_stable().

Fix things by tracking the stable state in a regular variable and only
set the static_key to the right state on sched_clock_init(), which is
ran right after late_time_init->tsc_init().

Before this we would not be using the TSC anyway.

Reported-and-Tested-by: Sasha Levin <sasha.levin@oracle.com>
Reported-by: dyoung@redhat.com
Fixes: 35af99e646 ("sched/clock, x86: Use a static_key for sched_clock_stable")
Cc: jacob.jun.pan@linux.intel.com
Cc: Mike Galbraith <bitbucket@online.de>
Cc: hpa@zytor.com
Cc: paulmck@linux.vnet.ibm.com
Cc: John Stultz <john.stultz@linaro.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: lenb@kernel.org
Cc: rjw@rjwysocki.net
Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Cc: rui.zhang@intel.com
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140122115918.GG3694@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-23 14:48:36 +01:00
Peter Zijlstra
215393bc1f sched/preempt/x86: Fix voluntary preempt for x86
The #ifdef CONFIG_PREEMPT is both not needed and wrong.

Its not required because asm/preempt.h should provide
{set,clear}_preempt_need_resched() regardless and its wrong because
for voluntary preempt we still rely on PREEMPT_NEED_RESCHED.

Reported-and-Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Fixes: 8cb75e0c4e ("sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding")
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20140122102435.GH31570@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-23 14:48:35 +01:00
Vincent Guittot
9390675af0 Revert "sched: Fix sleep time double accounting in enqueue entity"
This reverts commit 282cf499f0.

With the current implementation, the load average statistics of a sched entity
change according to other activity on the CPU even if this activity is done
between the running window of the sched entity and have no influence on the
running duration of the task.

When a task wakes up on the same CPU, we currently update last_runnable_update
with the return  of __synchronize_entity_decay without updating the
runnable_avg_sum and runnable_avg_period accordingly. In fact, we have to sync
the load_contrib of the se with the rq's blocked_load_contrib before removing
it from the latter (with __synchronize_entity_decay) but we must keep
last_runnable_update unchanged for updating runnable_avg_sum/period during the
next update_entity_load_avg.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Ben Segall <bsegall@google.com>
Cc: pjt@google.com
Cc: alex.shi@linaro.org
Link: http://lkml.kernel.org/r/1390376734-6800-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-23 14:48:34 +01:00
Roland Dreier
fb1b5034e4 Merge branch 'ip-roce' into for-next
Conflicts:
	drivers/infiniband/hw/mlx4/main.c
2014-01-22 23:24:21 -08:00
Roland Dreier
8f399921ea Merge branches 'cma', 'cxgb4', 'flowsteer', 'ipoib', 'misc', 'mlx4', 'mlx5', 'ocrdma', 'qib', 'srp' and 'usnic' into for-next 2014-01-22 23:24:13 -08:00
Eli Cohen
57761d8df8 IB/mlx5: Verify reserved fields are cleared
Verify that reserved fields in struct mlx5_ib_resize_cq are cleared
before continuing execution of the verb. This is required to allow
making use of this area in future revisions.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:23:54 -08:00
Eli Cohen
8c8a49148b IB/mlx5: Remove old field for create mkey mailbox
Match firmware specification.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:23:54 -08:00
Eli Cohen
1bde6e301c IB/mlx5: Abort driver cleanup if teardown hca fails
Do not continue with cleanup flow. If this ever happens we can check which
resources remained open.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:23:53 -08:00
Eli Cohen
9e9c47d07d IB/mlx5: Allow creation of QPs with zero-length work queues
The current code attmepts to call ib_umem_get() even if the length is
zero, which causes a failure. Since the spec allows zero length work
queues, change the code so we don't call ib_umem_get() in those cases.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:23:53 -08:00
Eli Cohen
05bdb2ab6b mlx5_core: Fix PowerPC support
1. Fix derivation of sub-page index from the dma address in free_4k.
2. Fix the DMA address passed to dma_unmap_page by masking it properly.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:23:52 -08:00
Eli Cohen
db81a5c374 mlx5_core: Improve debugfs readability
Use strings to display transport service or state of QPs.  Use numeric
value for MTU of a QP.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:23:50 -08:00
Eli Cohen
bde51583f4 IB/mlx5: Add support for resize CQ
Implement resize CQ which is a mandatory verb in mlx5.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:23:50 -08:00
Eli Cohen
3bdb31f688 IB/mlx5: Implement modify CQ
Modify CQ is used by ULPs like IPoIB to change moderation parameters.  This
patch adds support in mlx5.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:23:49 -08:00
Eli Cohen
ada388f7af IB/mlx5: Make sure doorbell record is visible before doorbell
Put a wmb() to make sure the doorbell record is visible to the HCA before we
hit doorbell.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:23:49 -08:00
Eli Cohen
042b9adae8 mlx5_core: Use mlx5 core style warning
Use mlx5_core_warn(), which is the standard warning emitter function, instead
of pr_warn().

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:23:47 -08:00
Eli Cohen
0b6e81b910 IB/mlx5: Clear out struct before create QP command
Output structs are expected by firmware to be cleared when a command is called.
Clear the "out" struct instead of "dout" which is used only later.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:23:46 -08:00
Haggai Eran
e08a8761d8 mlx5_core: Fix out arg size in access_register command
The output size should be the sum of the core access reg output struct
plus the size of the specific register data provided by the caller.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:23:44 -08:00
Ding Tianhong
79adc5321e RDMA/nes: Slight optimization of Ethernet address compare
Use the possibly more efficient ether_addr_equal() instead of memcmp().

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:22:26 -08:00
Ira Weiny
6e0ea9e6cb IB/qib: Fix QP check when looping back to/from QP1
The GSI QP type is compatible with and should be allowed to send data
to/from any UD QP.  This was found when testing ibacm on the same node
as an SA.

Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:16:47 -08:00