Commit Graph

36471 Commits

Author SHA1 Message Date
Tejun Heo
bbf1bb3eee cpu_stop: add dummy implementation for UP
When !CONFIG_SMP, cpu_stop functions weren't defined at all which
could lead to build failures if UP code uses cpu_stop facility.  Add
dummy cpu_stop implementation for UP.  The waiting variants execute
the work function directly with preempt disabled and
stop_one_cpu_nowait() schedules a workqueue work.

Makefile and ifdefs around stop_machine implementation are updated to
accomodate CONFIG_SMP && !CONFIG_STOP_MACHINE case.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Ingo Molnar <mingo@elte.hu>
2010-05-08 17:12:33 +02:00
Tejun Heo
969c79215a sched: replace migration_thread with cpu_stop
Currently migration_thread is serving three purposes - migration
pusher, context to execute active_load_balance() and forced context
switcher for expedited RCU synchronize_sched.  All three roles are
hardcoded into migration_thread() and determining which job is
scheduled is slightly messy.

This patch kills migration_thread and replaces all three uses with
cpu_stop.  The three different roles of migration_thread() are
splitted into three separate cpu_stop callbacks -
migration_cpu_stop(), active_load_balance_cpu_stop() and
synchronize_sched_expedited_cpu_stop() - and each use case now simply
asks cpu_stop to execute the callback as necessary.

synchronize_sched_expedited() was implemented with private
preallocated resources and custom multi-cpu queueing and waiting
logic, both of which are provided by cpu_stop.
synchronize_sched_expedited_count is made atomic and all other shared
resources along with the mutex are dropped.

synchronize_sched_expedited() also implemented a check to detect cases
where not all the callback got executed on their assigned cpus and
fall back to synchronize_sched().  If called with cpu hotplug blocked,
cpu_stop already guarantees that and the condition cannot happen;
otherwise, stop_machine() would break.  However, this patch preserves
the paranoid check using a cpumask to record on which cpus the stopper
ran so that it can serve as a bisection point if something actually
goes wrong theree.

Because the internal execution state is no longer visible,
rcu_expedited_torture_stats() is removed.

This patch also renames cpu_stop threads to from "stopper/%d" to
"migration/%d".  The names of these threads ultimately don't matter
and there's no reason to make unnecessary userland visible changes.

With this patch applied, stop_machine() and sched now share the same
resources.  stop_machine() is faster without wasting any resources and
sched migration users are much cleaner.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: Josh Triplett <josh@freedesktop.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
2010-05-06 18:49:21 +02:00
Tejun Heo
3fc1f1e27a stop_machine: reimplement using cpu_stop
Reimplement stop_machine using cpu_stop.  As cpu stoppers are
guaranteed to be available for all online cpus,
stop_machine_create/destroy() are no longer necessary and removed.

With resource management and synchronization handled by cpu_stop, the
new implementation is much simpler.  Asking the cpu_stop to execute
the stop_cpu() state machine on all online cpus with cpu hotplug
disabled is enough.

stop_machine itself doesn't need to manage any global resources
anymore, so all per-instance information is rolled into struct
stop_machine_data and the mutex and all static data variables are
removed.

The previous implementation created and destroyed RT workqueues as
necessary which made stop_machine() calls highly expensive on very
large machines.  According to Dimitri Sivanich, preventing the dynamic
creation/destruction makes booting faster more than twice on very
large machines.  cpu_stop resources are preallocated for all online
cpus and should have the same effect.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
2010-05-06 18:49:20 +02:00
Tejun Heo
1142d81029 cpu_stop: implement stop_cpu[s]()
Implement a simplistic per-cpu maximum priority cpu monopolization
mechanism.  A non-sleeping callback can be scheduled to run on one or
multiple cpus with maximum priority monopolozing those cpus.  This is
primarily to replace and unify RT workqueue usage in stop_machine and
scheduler migration_thread which currently is serving multiple
purposes.

Four functions are provided - stop_one_cpu(), stop_one_cpu_nowait(),
stop_cpus() and try_stop_cpus().

This is to allow clean sharing of resources among stop_cpu and all the
migration thread users.  One stopper thread per cpu is created which
is currently named "stopper/CPU".  This will eventually replace the
migration thread and take on its name.

* This facility was originally named cpuhog and lived in separate
  files but Peter Zijlstra nacked the name and thus got renamed to
  cpu_stop and moved into stop_machine.c.

* Better reporting of preemption leak as per Peter's suggestion.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
2010-05-06 18:49:20 +02:00
Peter Zijlstra
669c55e9f9 sched: Pre-compute cpumask_weight(sched_domain_span(sd))
Dave reported that his large SPARC machines spend lots of time in
hweight64(), try and optimize some of those needless cpumask_weight()
invocations (esp. with the large offstack cpumasks these are very
expensive indeed).

Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-23 11:02:02 +02:00
Ingo Molnar
b257c14ceb Merge branch 'linus' into sched/core
Merge reason: merge the latest fixes, update to -rc4.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-15 09:36:16 +02:00
Linus Torvalds
0fdfe5ad28 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFSv4: fix delegated locking
  NFS: Ensure that the WRITE and COMMIT RPC calls are always uninterruptible
  NFS: Fix a race with the new commit code
  NFS: Ensure that writeback_single_inode() calls write_inode() when syncing
  NFS: Fix the mode calculation in nfs_find_open_context
  NFSv4: Fall back to ordinary lookup if nfs4_atomic_open() returns EISDIR
2010-04-13 15:10:16 -07:00
Trond Myklebust
0df5dd4aae NFSv4: fix delegated locking
Arnaud Giersch reports that NFSv4 locking is broken when we hold a
delegation since commit 8e469ebd6d (NFSv4:
Don't allow posix locking against servers that don't support it).

According to Arnaud, the lock succeeds the first time he opens the file
(since we cannot do a delegated open) but then fails after we start using
delegated opens.

The following patch fixes it by ensuring that locking behaviour is
governed by a per-filesystem capability flag that is initially set, but
gets cleared if the server ever returns an OPEN without the
NFS4_OPEN_RESULT_LOCKTYPE_POSIX flag being set.

Reported-by: Arnaud Giersch <arnaud.giersch@iut-bm.univ-fcomte.fr>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-04-12 07:55:15 -04:00
David S. Miller
4a1032faac Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ 2010-04-11 02:44:30 -07:00
Linus Torvalds
2f4084209a Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: (34 commits)
  cfq-iosched: Fix the incorrect timeslice accounting with forced_dispatch
  loop: Update mtime when writing using aops
  block: expose the statistics in blkio.time and blkio.sectors for the root cgroup
  backing-dev: Handle class_create() failure
  Block: Fix block/elevator.c elevator_get() off-by-one error
  drbd: lc_element_by_index() never returns NULL
  cciss: unlock on error path
  cfq-iosched: Do not merge queues of BE and IDLE classes
  cfq-iosched: Add additional blktrace log messages in CFQ for easier debugging
  i2o: Remove the dangerous kobj_to_i2o_device macro
  block: remove 16 bytes of padding from struct request on 64bits
  cfq-iosched: fix a kbuild regression
  block: make CONFIG_BLK_CGROUP visible
  Remove GENHD_FL_DRIVERFS
  block: Export max number of segments and max segment size in sysfs
  block: Finalize conversion of block limits functions
  block: Fix overrun in lcm() and move it to lib
  vfs: improve writeback_inodes_wb()
  paride: fix off-by-one test
  drbd: fix al-to-on-disk-bitmap for 4k logical_block_size
  ...
2010-04-09 11:50:29 -07:00
David Howells
ce82653d6c radix_tree_tag_get() is not as safe as the docs make out [ver #2]
radix_tree_tag_get() is not safe to use concurrently with radix_tree_tag_set()
or radix_tree_tag_clear().  The problem is that the double tag_get() in
radix_tree_tag_get():

		if (!tag_get(node, tag, offset))
			saw_unset_tag = 1;
		if (height == 1) {
			int ret = tag_get(node, tag, offset);

may see the value change due to the action of set/clear.  RCU is no protection
against this as no pointers are being changed, no nodes are being replaced
according to a COW protocol - set/clear alter the node directly.

The documentation in linux/radix-tree.h, however, says that
radix_tree_tag_get() is an exception to the rule that "any function modifying
the tree or tags (...) must exclude other modifications, and exclude any
functions reading the tree".

The problem is that the next statement in radix_tree_tag_get() checks that the
tag doesn't vary over time:

			BUG_ON(ret && saw_unset_tag);

This has been seen happening in FS-Cache:

	https://www.redhat.com/archives/linux-cachefs/2010-April/msg00013.html

To this end, remove the BUG_ON() from radix_tree_tag_get() and note in various
comments that the value of the tag may change whilst the RCU read lock is held,
and thus that the return value of radix_tree_tag_get() may not be relied upon
unless radix_tree_tag_set/clear() and radix_tree_delete() are excluded from
running concurrently with it.

Reported-by: Romain DEGEZ <romain.degez@smartjog.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-09 10:12:03 -07:00
Pekka Enberg
fc1c183353 slab: Generify kernel pointer validation
As suggested by Linus, introduce a kern_ptr_validate() helper that does some
sanity checks to make sure a pointer is a valid kernel pointer.  This is a
preparational step for fixing SLUB kmem_ptr_validate().

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-09 10:09:50 -07:00
Mark Lord
45c4d015a9 libata: Fix accesses at LBA28 boundary (old bug, but nasty) (v2)
Most drives from Seagate, Hitachi, and possibly other brands,
do not allow LBA28 access to sector number 0x0fffffff (2^28 - 1).
So instead use LBA48 for such accesses.

This bug could bite a lot of systems, especially when the user has
taken care to align partitions to 4KB boundaries. On misaligned systems,
it is less likely to be encountered, since a 4KB read would end at
0x10000000 rather than at 0x0fffffff.

Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-04-08 12:53:57 -04:00
Linus Torvalds
cf90bfe2eb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-2.6:
  ide: Fix IDE taskfile with cfq scheduler
  ide: Must hold queue lock when requeueing
  ide: Requeue request after DMA timeout
2010-04-08 07:45:36 -07:00
John Hughes
f5eb917b86 x25: Patch to fix bug 15678 - x25 accesses fields beyond end of packet.
Here is a patch to stop X.25 examining fields beyond the end of the packet.

For example, when a simple CALL ACCEPTED was received:

	10 10 0f

x25_parse_facilities was attempting to decode the FACILITIES field, but this
packet contains no facilities field.

Signed-off-by: John Hughes <john@calva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-07 21:29:25 -07:00
Michael S. Tsirkin
b7a413015d virtio: disable multiport console support.
Move MULTIPORT feature and related config changes
out of exported headers, and disable the feature
at runtime.

At this point, it seems less risky to keep code around
until we can enable it than rip it out completely.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-04-08 09:46:19 +09:30
Linus Torvalds
fb1ae63577 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
  x86: Fix double enable_IR_x2apic() call on SMP kernel on !SMP boards
  x86: Increase CONFIG_NODES_SHIFT max to 10
  ibft, x86: Change reserve_ibft_region() to find_ibft_region()
  x86, hpet: Fix bug in RTC emulation
  x86, hpet: Erratum workaround for read after write of HPET comparator
  bootmem, x86: Fix 32bit numa system without RAM on node 0
  nobootmem, x86: Fix 32bit numa system without RAM on node 0
  x86: Handle overlapping mptables
  x86: Make e820_remove_range to handle all covered case
  x86-32, resume: do a global tlb flush in S4 resume
2010-04-07 11:02:23 -07:00
Linus Torvalds
84db18bbeb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: mixart: range checking proc file
  ALSA: hda - Fix a wrong array range check in patch_realtek.c
  ALSA: ASoC: move dma_data from snd_soc_dai to snd_soc_pcm_stream
  ALSA: hda - Enable amplifiers on Acer Inspire 6530G
  ASoC: Only do WM8994 bias off transition from standby
  ASoC: Don't use DCS_DATAPATH_BUSY for WM hubs devices
  ASoC: Don't do runtime wm_hubs DC servo updates if using offset correction
  ASoC: Support second DC servo readback method for wm_hubs
  ASoC: Avoid wraparound in wm_hubs DC servo correction
  ALSA: echoaudio - Eliminate use after free
  ALSA: i2c: cleanup: change parameter to pointer
  ALSA: hda - Add MSI blacklist for Aopen MZ915-M
  ASoC: OMAP: Fix capture pointer handling for OMAP1510 to work correctly with recent ALSA PCM code
  ALSA: hda - Update document about MSI and interrupts
  ALSA: hda: Fix 0 dB offset for Lenovo Thinkpad models using AD1981
  ALSA: hda - Add missing printk argument in previous patch
  ASoC: Fix passing platform_data to ac97 bus users and fix a leak
  ALSA: hda - Fix ADC/MUX assignment of ALC269 codec
  ALSA: hda - Fix invalid bit values passed to snd_hda_codec_amp_stereo()
  ASoC: wm8994: playback => capture
2010-04-07 08:42:25 -07:00
KAMEZAWA Hiroyuki
8725d54162 memcg: fix race in file_mapped accounting
Presently, memcg's FILE_MAPPED accounting has following race with
move_account (happens at rmdir()).

    increment page->mapcount (rmap.c)
    mem_cgroup_update_file_mapped()           move_account()
					      lock_page_cgroup()
					      check page_mapped() if
					      page_mapped(page)>1 {
						FILE_MAPPED -1 from old memcg
						FILE_MAPPED +1 to old memcg
					      }
					      .....
					      overwrite pc->mem_cgroup
					      unlock_page_cgroup()
    lock_page_cgroup()
    FILE_MAPPED + 1 to pc->mem_cgroup
    unlock_page_cgroup()

Then,
	old memcg (-1 file mapped)
	new memcg (+2 file mapped)

This happens because move_account see page_mapped() which is not guarded
by lock_page_cgroup().  This patch adds FILE_MAPPED flag to page_cgroup
and move account information based on it.  Now, all checks are synchronous
with lock_page_cgroup().

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Balbir Singh <balbir@in.ibm.com>
Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Andrea Righi <arighi@develer.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-07 08:38:05 -07:00
Naoya Horiguchi
116354d177 pagemap: fix pfn calculation for hugepage
When we look into pagemap using page-types with option -p, the value of
pfn for hugepages looks wrong (see below.) This is because pte was
evaluated only once for one vma although it should be updated for each
hugepage.  This patch fixes it.

  $ page-types -p 3277 -Nl -b huge
  voffset   offset  len     flags
  7f21e8a00 11e400  1       ___U___________H_G________________
  7f21e8a01 11e401  1ff     ________________TG________________
               ^^^
  7f21e8c00 11e400  1       ___U___________H_G________________
  7f21e8c01 11e401  1ff     ________________TG________________
               ^^^

One hugepage contains 1 head page and 511 tail pages in x86_64 and each
two lines represent each hugepage.  Voffset and offset mean virtual
address and physical address in the page unit, respectively.  The
different hugepages should not have the same offset value.

With this patch applied:

  $ page-types -p 3386 -Nl -b huge
  voffset   offset   len    flags
  7fec7a600 112c00   1      ___UD__________H_G________________
  7fec7a601 112c01   1ff    ________________TG________________
               ^^^
  7fec7a800 113200   1      ___UD__________H_G________________
  7fec7a801 113201   1ff    ________________TG________________
               ^^^
               OK

More info:

- This patch modifies walk_page_range()'s hugepage walker.  But the
  change only affects pagemap_read(), which is the only caller of hugepage
  callback.

- Without this patch, hugetlb_entry() callback is called per vma, that
  doesn't match the natural expectation from its name.

- With this patch, hugetlb_entry() is called per hugepte entry and the
  callback can become much simpler.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-07 08:38:04 -07:00
Yong Zhang
bb1dc0bacb kernel.h: fix wrong usage of __ratelimit()
When __ratelimit() returns 1 this means that we can go ahead.

Signed-off-by: Yong Zhang <yong.zhang@windriver.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-07 08:38:04 -07:00
Andrew Morton
b1dd3b2843 vfs: rename block_fsync() to blkdev_fsync()
Requested by hch, for consistency now it is exported.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Anton Blanchard <anton@samba.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-07 08:38:04 -07:00
Anton Blanchard
55ab3a1ff8 raw: fsync method is now required
Commit 148f948ba8 (vfs: Introduce new
helpers for syncing after writing to O_SYNC file or IS_SYNC inode) broke
the raw driver.

We now call through generic_file_aio_write -> generic_write_sync ->
vfs_fsync_range.  vfs_fsync_range has:

        if (!fop || !fop->fsync) {
                ret = -EINVAL;
                goto out;
        }

But drivers/char/raw.c doesn't set an fsync method.

We have two options: fix it or remove the raw driver completely.  I'm
happy to do either, the fact this has been broken for so long suggests it
is rarely used.

The patch below adds an fsync method to the raw driver.  My knowledge of
the block layer is pretty sketchy so this could do with a once over.

If we instead decide to remove the raw driver, this patch might still be
useful as a backport to 2.6.33 and 2.6.32.

Signed-off-by: Anton Blanchard <anton@samba.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <jens.axboe@oracle.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Tested-by: Jeff Moyer <jmoyer@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-07 08:38:04 -07:00
David Härdeman
530cd330dc include/linux/kfifo.h: fix INIT_KFIFO()
DECLARE_KFIFO creates a union with a struct kfifo and a buffer array with
size [size + sizeof(struct kfifo)].

INIT_KFIFO then sets the buffer pointer in struct kfifo to point to the
beginning of the buffer array which means that the first call to kfifo_in
will overwrite members of the struct kfifo.

Signed-off-by: David Härdeman <david@hardeman.nu>
Acked-by: Stefani Seibold <stefani@seibold.net>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-07 08:38:02 -07:00
Andrew Morton
b01d0942c2 bitops: remove temporary for_each_bit()
Migration has been completed so remove this now.  There's one straggler in
linux-next's drivers/mtd/sm_ftl.c.  A patch has been sent.

Cc: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-07 08:38:01 -07:00
Takashi Iwai
7445c995b0 Merge branch 'fix/asoc' into for-linus 2010-04-07 09:54:41 +02:00
Linus Torvalds
ab195c58b8 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  libata: unlock HPA if device shrunk
  libata: disable NCQ on Crucial C300 SSD
  libata: don't whine on spurious IRQ
2010-04-06 08:36:31 -07:00
Tejun Heo
445d211b0d libata: unlock HPA if device shrunk
Some BIOSes don't configure HPA during boot but do so while resuming.
This causes harddrives to shrink during resume making libata detach
and reattach them.  This can be worked around by unlocking HPA if old
size equals native size.

Add ATA_DFLAG_UNLOCK_HPA so that HPA unlocking can be controlled
per-device and update ata_dev_revalidate() such that it sets
ATA_DFLAG_UNLOCK_HPA and fails with -EIO when the above condition is
detected.

This patch fixes the following bug.

  https://bugzilla.kernel.org/show_bug.cgi?id=15396

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Oleksandr Yermolenko <yaa.bta@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-04-06 10:55:33 -04:00
Nick Piggin
5fbfb18d7a Fix up possibly racy module refcounting
Module refcounting is implemented with a per-cpu counter for speed.
However there is a race when tallying the counter where a reference may
be taken by one CPU and released by another.  Reference count summation
may then see the decrement without having seen the previous increment,
leading to lower than expected count.  A module which never has its
actual reference drop below 1 may return a reference count of 0 due to
this race.

Module removal generally runs under stop_machine, which prevents this
race causing bugs due to removal of in-use modules.  However there are
other real bugs in module.c code and driver code (module_refcount is
exported) where the callers do not run under stop_machine.

Fix this by maintaining running per-cpu counters for the number of
module refcount increments and the number of refcount decrements.  The
increments are tallied after the decrements, so any decrement seen will
always have its corresponding increment counted.  The final refcount is
the difference of the total increments and decrements, preventing a
low-refcount from being returned.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-05 19:50:02 -07:00
Linus Torvalds
749d229761 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9p: saving negative to unsigned char
  9p: return on mutex_lock_interruptible()
  9p: Creating files with names too long should fail with ENAMETOOLONG.
  9p: Make sure we are able to clunk the cached fid on umount
  9p: drop nlink remove
  fs/9p: Clunk the fid resulting from partial walk of the name
  9p: documentation update
  9p: Fix setting of protocol flags in v9fs_session_info structure.
2010-04-05 13:42:54 -07:00
Daniel Mack
5f712b2b73 ALSA: ASoC: move dma_data from snd_soc_dai to snd_soc_pcm_stream
This fixes a memory corruption when ASoC devices are used in
full-duplex mode. Specifically for pxa-ssp code, where this pointer
is dynamically allocated for each direction and destroyed upon each
stream start.

All other platforms are fixed blindly, I couldn't even compile-test
them. Sorry for any breakage I may have caused.

[Note that this is a backported version for 2.6.34.
 Upstream commit is fd23b7dee]

Signed-off-by: Daniel Mack <daniel@caiaq.de>
Reported-by: Sven Neumann <s.neumann@raumfeld.com>
Reported-by: Michael Hirsch <m.hirsch@raumfeld.com>
Acked-by: Liam Girdwood <lrg@slimlogic.co.uk>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2010-04-05 19:14:11 +01:00
Linus Torvalds
b66696e3c0 Merge branch 'slabh' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc
* 'slabh' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc:
  eeepc-wmi: include slab.h
  staging/otus: include slab.h from usbdrv.h
  percpu: don't implicitly include slab.h from percpu.h
  kmemcheck: Fix build errors due to missing slab.h
  include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
  iwlwifi: don't include iwl-dev.h from iwl-devtrace.h
  x86: don't include slab.h from arch/x86/include/asm/pgtable_32.h

Fix up trivial conflicts in include/linux/percpu.h due to
is_kernel_percpu_address() having been introduced since the slab.h
cleanup with the percpu_up.c splitup.
2010-04-05 09:39:11 -07:00
Linus Torvalds
9e74e7c81a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  module: add stub for is_module_percpu_address
  percpu, module: implement and use is_kernel/module_percpu_address()
  module: encapsulate percpu handling better and record percpu_size
2010-04-05 09:16:37 -07:00
Aneesh Kumar K.V
6d96d3ab7a 9p: Make sure we are able to clunk the cached fid on umount
dcache prune happen on umount. So we cannot mark the client
satus disconnect. That will prevent a 9p call to the server

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-04-05 10:37:36 -05:00
Tejun Heo
336f5899d2 Merge branch 'master' into export-slabh 2010-04-05 11:37:28 +09:00
Linus Torvalds
8ce42c8b7f Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf: Always build the powerpc perf_arch_fetch_caller_regs version
  perf: Always build the stub perf_arch_fetch_caller_regs version
  perf, probe-finder: Build fix on Debian
  perf/scripts: Tuple was set from long in both branches in python_process_event()
  perf: Fix 'perf sched record' deadlock
  perf, x86: Fix callgraphs of 32-bit processes on 64-bit kernels
  perf, x86: Fix AMD hotplug & constraint initialization
  x86: Move notify_cpu_starting() callback to a later stage
  x86,kgdb: Always initialize the hw breakpoint attribute
  perf: Use hot regs with software sched switch/migrate events
  perf: Correctly align perf event tracing buffer
2010-04-04 12:13:10 -07:00
Dan Carpenter
f11947c7c5 ALSA: i2c: cleanup: change parameter to pointer
We actually pass an array of 7 chars not 5.
This silences a smatch warning.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2010-04-04 12:21:39 +02:00
Linus Torvalds
5e11611a5d Merge master.kernel.org:/home/rmk/linux-2.6-arm
* master.kernel.org:/home/rmk/linux-2.6-arm:
  ARM: 5965/1: Fix soft lockup in at91 udc driver
  ARM: 6006/1: ARM: Use the correct NOP size in memmove for Thumb-2 kernel builds
  ARM: 6005/1: arm: kprobes: fix register corruption with jprobes
  ARM: 6003/1: removing compilation warning from pl061.h
  ARM: 6001/1: removing compilation warning comming from clkdev.h
  ARM: 6000/1: removing compilation warning comming from <asm/irq.h>
  ARM: 5999/1: Including device.h and resource.h header files in linux/amba/bus.h
  ARM: 5997/1: ARM: Correct the VFPv3 detection
  ARM: 5996/1: ARM: Change the mandatory barriers implementation (4/4)
  ARM: 5995/1: ARM: Add L2x0 outer_sync() support (3/4)
  ARM: 5994/1: ARM: Add outer_cache_fns.sync function pointer (2/4)
  ARM: 5993/1: ARM: Move the outer_cache definitions into a separate file (1/4)
2010-04-02 19:50:11 -07:00
Linus Torvalds
24b99d1576 Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6
* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
  Freezer: Fix buggy resume test for tasks frozen with cgroup freezer
  Freezer: Only show the state of tasks refusing to freeze
2010-04-02 19:44:42 -07:00
Peter Zijlstra
371fd7e7a5 sched: Add enqueue/dequeue flags
In order to reduce the dependency on TASK_WAKING rework the enqueue
interface to support a proper flags field.

Replace the int wakeup, bool head arguments with an int flags argument
and create the following flags:

  ENQUEUE_WAKEUP - the enqueue is a wakeup of a sleeping task,
  ENQUEUE_WAKING - the enqueue has relative vruntime due to
                   having sched_class::task_waking() called,
  ENQUEUE_HEAD - the waking task should be places on the head
                 of the priority queue (where appropriate).

For symmetry also convert sched_class::dequeue() to a flags scheme.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02 20:12:05 +02:00
Peter Zijlstra
0017d73509 sched: Fix TASK_WAKING vs fork deadlock
Oleg noticed a few races with the TASK_WAKING usage on fork.

 - since TASK_WAKING is basically a spinlock, it should be IRQ safe
 - since we set TASK_WAKING (*) without holding rq->lock it could
   be there still is a rq->lock holder, thereby not actually
   providing full serialization.

(*) in fact we clear PF_STARTING, which in effect enables TASK_WAKING.

Cure the second issue by not setting TASK_WAKING in sched_fork(), but
only temporarily in wake_up_new_task() while calling select_task_rq().

Cure the first by holding rq->lock around the select_task_rq() call,
this will disable IRQs, this however requires that we push down the
rq->lock release into select_task_rq_fair()'s cgroup stuff.

Because select_task_rq_fair() still needs to drop the rq->lock we
cannot fully get rid of TASK_WAKING.

Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02 20:12:03 +02:00
Oleg Nesterov
9084bb8246 sched: Make select_fallback_rq() cpuset friendly
Introduce cpuset_cpus_allowed_fallback() helper to fix the cpuset problems
with select_fallback_rq(). It can be called from any context and can't use
any cpuset locks including task_lock(). It is called when the task doesn't
have online cpus in ->cpus_allowed but ttwu/etc must be able to find a
suitable cpu.

I am not proud of this patch. Everything which needs such a fat comment
can't be good even if correct. But I'd prefer to not change the locking
rules in the code I hardly understand, and in any case I believe this
simple change make the code much more correct compared to deadlocks we
currently have.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100315091027.GA9155@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02 20:12:03 +02:00
Oleg Nesterov
6a1bdc1b57 sched: _cpu_down(): Don't play with current->cpus_allowed
_cpu_down() changes the current task's affinity and then recovers it at
the end. The problems are well known: we can't restore old_allowed if it
was bound to the now-dead-cpu, and we can race with the userspace which
can change cpu-affinity during unplug.

_cpu_down() should not play with current->cpus_allowed at all. Instead,
take_cpu_down() can migrate the caller of _cpu_down() after __cpu_disable()
removes the dying cpu from cpu_online_mask.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100315091023.GA9148@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02 20:12:03 +02:00
Oleg Nesterov
897f0b3c3f sched: Kill the broken and deadlockable cpuset_lock/cpuset_cpus_allowed_locked code
This patch just states the fact the cpusets/cpuhotplug interaction is
broken and removes the deadlockable code which only pretends to work.

- cpuset_lock() doesn't really work. It is needed for
  cpuset_cpus_allowed_locked() but we can't take this lock in
  try_to_wake_up()->select_fallback_rq() path.

- cpuset_lock() is deadlockable. Suppose that a task T bound to CPU takes
  callback_mutex. If cpu_down(CPU) happens before T drops callback_mutex
  stop_machine() preempts T, then migration_call(CPU_DEAD) tries to take
  cpuset_lock() and hangs forever because CPU is already dead and thus
  T can't be scheduled.

- cpuset_cpus_allowed_locked() is deadlockable too. It takes task_lock()
  which is not irq-safe, but try_to_wake_up() can be called from irq.

Kill them, and change select_fallback_rq() to use cpu_possible_mask, like
we currently do without CONFIG_CPUSETS.

Also, with or without this patch, with or without CONFIG_CPUSETS, the
callers of select_fallback_rq() can race with each other or with
set_cpus_allowed() pathes.

The subsequent patches try to to fix these problems.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100315091003.GA9123@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02 20:12:01 +02:00
Ingo Molnar
c9494727cf Merge branch 'linus' into sched/core
Merge reason: update to latest upstream

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02 20:03:08 +02:00
Ingo Molnar
50d11d190a Merge branch 'perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/urgent 2010-04-02 19:29:17 +02:00
Yinghai Lu
042be38e61 ibft, x86: Change reserve_ibft_region() to find_ibft_region()
This allows arch code could decide the way to reserve the ibft.

And we should reserve ibft as early as possible, instead of BOOTMEM
stage, in case the table is in RAM range and is not reserved by BIOS
(this will often be the case.)

Move to just after find_smp_config().

Also when CONFIG_NO_BOOTMEM=y, We will not have reserve_bootmem() anymore.

-v2: fix typo about ibft pointed by Konrad Rzeszutek Wilk <konrad@darnok.org>

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4BB510FB.80601@kernel.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Peter Jones <pjones@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
CC: Jan Beulich <jbeulich@novell.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-01 16:12:48 -07:00
Linus Torvalds
42be79e37e Merge branch 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (76 commits)
  drm/radeon/kms: enable ACPI powermanagement mode on radeon gpus.
  drm/radeon/kms: rs400/480 should set common registers.
  drm/radeon/kms: add sanity check to wptr.
  drm/radeon/kms/evergreen: get DP working
  drm/radeon/kms: add hw_i2c module option
  drm/radeon/kms: use new pre/post_xfer i2c bit algo hooks
  drm/radeon/kms: disable MSI on IGP chips
  drm/radeon/kms: display watermark updates (v2)
  drm/radeon/kms/dp: disable training pattern on the sink at the end of link training
  drm/radeon/kms: minor fixes for eDP with LCD* device tags (v2)
  drm/radeon/kms/dp: remove extraneous training complete call
  drm/radeon/kms/atom: minor fixes to transmitter setup
  drm/radeon/kms: Only restrict BO to visible VRAM size when pinning to VRAM.
  drm: fix build error when SYSRQ is disabled
  drm/radeon/kms: fix macbookpro connector quirk
  drm/radeon/r6xx/r7xx: further safe reg clean up
  drm/radeon: bump the UMS driver version for r6xx/r7xx const buffer support
  drm/radeon/kms: bump the version for r6xx/r7xx const buffer support
  drm/radeon/r6xx/r7xx: CS parser fixes
  drm/radeon/kms: fix some typos in r6xx/r7xx hpd setup
  ...

Fix up MSI-related conflicts in drivers/gpu/drm/radeon/radeon_irq_kms.c
2010-04-01 09:19:42 -07:00
Herbert Xu
6072f7491f ide: Requeue request after DMA timeout
I noticed that my KVM virtual machines were experiencing IDE
issues resulting in processes stuck on waiting for buffers to
complete.

The root cause is of course race conditions in the ancient qemu
backend that I'm using.  However, the fact that the guest isn't
recovering is a bug.

I've tracked it down to the change made last year to dequeue
requests at the start rather than at the end in the IDE layer.

commit 8f6205cd57
Author: Tejun Heo <tj@kernel.org>
Date:   Fri May 8 11:53:59 2009 +0900

    ide: dequeue in-flight request

The problem is that the function ide_dma_timeout_retry does not
requeue the current request, causing one request to be lost for
each DMA timeout.

This patch fixes this by requeueing the request.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-01 01:31:13 -07:00
Frederic Weisbecker
e49a5bd381 perf: Use hot regs with software sched switch/migrate events
Scheduler's task migration events don't work because they always
pass NULL regs perf_sw_event(). The event hence gets filtered
in perf_swevent_add().

Scheduler's context switches events use task_pt_regs() to get
the context when the event occured which is a wrong thing to
do as this won't give us the place in the kernel where we went
to sleep but the place where we left userspace. The result is
even more wrong if we switch from a kernel thread.

Use the hot regs snapshot for both events as they belong to the
non-interrupt/exception based events family. Unlike page faults
or so that provide the regs matching the exact origin of the event,
we need to save the current context.

This makes the task migration event working and fix the context
switch callchains and origin ip.

Example: perf record -a -e cs

Before:

    10.91%      ksoftirqd/0                  0  [k] 0000000000000000
                |
                --- (nil)
                    perf_callchain
                    perf_prepare_sample
                    __perf_event_overflow
                    perf_swevent_overflow
                    perf_swevent_add
                    perf_swevent_ctx_event
                    do_perf_sw_event
                    __perf_sw_event
                    perf_event_task_sched_out
                    schedule
                    run_ksoftirqd
                    kthread
                    kernel_thread_helper

After:

    23.77%  hald-addon-stor  [kernel.kallsyms]  [k] schedule
            |
            --- schedule
               |
               |--60.00%-- schedule_timeout
               |          wait_for_common
               |          wait_for_completion
               |          blk_execute_rq
               |          scsi_execute
               |          scsi_execute_req
               |          sr_test_unit_ready
               |          |
               |          |--66.67%-- sr_media_change
               |          |          media_changed
               |          |          cdrom_media_changed
               |          |          sr_block_media_changed
               |          |          check_disk_change
               |          |          cdrom_open

v2: Always build perf_arch_fetch_caller_regs() now that software
events need that too. They don't need it from modules, unlike trace
events, so we keep the EXPORT_SYMBOL in trace_event_perf.c

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
2010-04-01 08:26:31 +02:00