Commit Graph

26290 Commits

Author SHA1 Message Date
Stephen Warren
a5818a8bd0 pinctrl: get_group_pins() const fixes
get_group_pins() "returns" a pointer to an array of const objects, through
a pointer parameter. Fix the prototype so what's pointed at by the returned
pointer is const, rather than the function parameter being const.

This also allows the removal of a cast in each of the two current pinmux
drivers.

Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2011-10-20 11:41:49 +02:00
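
The const placement that a5818a8bd0 describes can be illustrated with a small standalone sketch. The names demo_pins and demo_get_group_pins() are hypothetical and are not the pinctrl API; the only point is that the const qualifies the objects the returned pointer refers to, so a driver can hand back its const table without a cast.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pin table owned by a driver; callers must not modify it. */
static const unsigned demo_pins[] = { 4, 5, 6 };

/*
 * "Returns" the table through an out-parameter. The const applies to the
 * pointed-at objects (const unsigned), not to the parameter itself, so the
 * assignment below needs no cast.
 */
int demo_get_group_pins(const unsigned **pins, unsigned *num_pins)
{
	assert(pins != NULL && num_pins != NULL);
	*pins = demo_pins;
	*num_pins = sizeof(demo_pins) / sizeof(demo_pins[0]);
	return 0;
}
```

Had the parameter been declared as unsigned ** const pins instead, the assignment *pins = demo_pins would discard the const qualifier and would need a cast, which is presumably the cast the commit removes from the two drivers.
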
Ian Campbell
30d3c128ea mm: add a "struct page_frag" type containing a page, offset and length
A few network drivers currently use skb_frag_struct for this purpose but I have
patches which add additional fields and semantics there which these other uses
do not want.

A structure for referencing sub-page regions seems like a generally useful thing,
so add one instead of adding a network-subsystem-specific structure.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jens Axboe <jaxboe@fusionio.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-20 04:58:32 -04:00
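
A rough userspace sketch of the idea in 30d3c128ea follows: one structure holding a page, an offset and a length. The field types, the DEMO_PAGE_SIZE value and the validity helper are assumptions made for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u /* assumed page size */

struct demo_page; /* opaque stand-in for struct page */

/* A region inside one page: which page, where it starts, how long it is. */
struct demo_page_frag {
	struct demo_page *page;
	unsigned int offset;
	unsigned int size;
};

/* In this sketch a fragment is valid only if it stays within its page. */
int demo_page_frag_valid(const struct demo_page_frag *frag)
{
	assert(frag != NULL);
	return frag->offset <= DEMO_PAGE_SIZE &&
	       frag->size <= DEMO_PAGE_SIZE - frag->offset;
}
```
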
Ian Campbell
a0bec1cd8f net: do not take an additional reference in skb_frag_set_page
I audited all of the callers in the tree and only one of them (pktgen) expects
it to do so. Taking this reference is pretty obviously confusing and error
prone.

In particular I looked at the following commits which switched callers of
(__)skb_frag_set_page to the skb paged fragment api:

6a930b9f16 cxgb3: convert to SKB paged frag API.
5dc3e196ea myri10ge: convert to SKB paged frag API.
0e0634d20d vmxnet3: convert to SKB paged frag API.
86ee8130a4 virtionet: convert to SKB paged frag API.
4a22c4c919 sfc: convert to SKB paged frag API.
18324d690d cassini: convert to SKB paged frag API.
b061b39e3a benet: convert to SKB paged frag API.
b7b6a688d2 bnx2: convert to SKB paged frag API.
804cf14ea5 net: xfrm: convert to SKB frag APIs
ea2ab69379 net: convert core to skb paged frag APIs

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 19:40:39 -04:00
Dan Carpenter
4f25af2782 filter: use unsigned int to silence static checker warning
This is just a cleanup.

My testing version of Smatch warns about this:
net/core/filter.c +380 check_load_and_stores(6)
	warn: check 'flen' for negative values

flen comes from the user.  We try to clamp the values here between 1
and BPF_MAXINSNS but the clamp doesn't work because it could be
negative.  This is a bug, but it's not exploitable.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 19:35:51 -04:00
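
The reason the clamp in 4f25af2782 needs an unsigned type can be shown in isolation. This is a sketch, not the code in net/core/filter.c: the function names are hypothetical, the shape of the check is an assumption, and DEMO_MAXINSNS stands in for BPF_MAXINSNS.

```c
#include <assert.h>

#define DEMO_MAXINSNS 4096 /* stand-in for BPF_MAXINSNS */

/* Signed length: a negative value slips past both comparisons. */
int demo_len_ok_signed(int flen)
{
	return !(flen == 0 || flen > DEMO_MAXINSNS);
}

/* Unsigned length: a "negative" input becomes a huge value and is rejected. */
int demo_len_ok_unsigned(unsigned int flen)
{
	return !(flen == 0 || flen > DEMO_MAXINSNS);
}
```

With these definitions, demo_len_ok_signed(-1) accepts the length while demo_len_ok_unsigned((unsigned int)-1) rejects it.
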
Eric W. Biederman
672d82c18d class: Implement support for class attrs in tagged sysfs directories.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 19:24:15 -04:00
Eric W. Biederman
487505c257 sysfs: Implement support for tagged files in sysfs.
Looking up files in sysfs is hard to understand and analyze because we
currently allow placing untagged files in tagged directories.  In the
implementation of that we have two subtly different meanings of NULL.
NULL meaning there is no tag on a directory entry and NULL meaning
we don't care which namespace the lookup is performed for.  These
multiple uses of NULL have resulted in subtle bugs (since fixed)
in the code.

Currently it is only the bonding driver that needs to have an untagged
file in a tagged directory.

To untangle this mess I am adding support for tagged files to sysfs.
Modifying the bonding driver to implement bonding_masters as a tagged
file.  Registering bonding_masters once for each network namespace.
Then I am removing support for untagged entries in tagged sysfs
directories.

Resulting in code that is much easier to reason about.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 19:24:14 -04:00
Richard Cochran
4dc360c5e7 net: validate HWTSTAMP ioctl parameters
This patch adds a sanity check on the values provided by user space for
the hardware time stamping configuration. If the values lie outside of
the absolute limits, then the ioctl request will be denied.

Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 17:00:35 -04:00
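
The kind of sanity check 4dc360c5e7 describes might look like the sketch below. The enum values, the struct layout and the choice of -ERANGE are assumptions for illustration; the real limits come from the kernel's hardware time stamping definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical value ranges; the real enums live in the kernel headers. */
enum { DEMO_TX_OFF, DEMO_TX_ON, DEMO_TX_MAX = DEMO_TX_ON };
enum { DEMO_FILTER_NONE, DEMO_FILTER_ALL, DEMO_FILTER_MAX = DEMO_FILTER_ALL };

struct demo_hwtstamp_config {
	int tx_type;
	int rx_filter;
};

/* Reject any request whose fields fall outside the known ranges. */
int demo_validate_hwtstamp(const struct demo_hwtstamp_config *cfg)
{
	assert(cfg != NULL);
	if (cfg->tx_type < DEMO_TX_OFF || cfg->tx_type > DEMO_TX_MAX)
		return -ERANGE;
	if (cfg->rx_filter < DEMO_FILTER_NONE || cfg->rx_filter > DEMO_FILTER_MAX)
		return -ERANGE;
	return 0;
}
```
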
Trond Myklebust
fba730050d NFS: Don't rely on PageError in nfs_readpage_release_partial
Don't rely on the PageError flag to tell us if one of the partial reads of
the page failed. Instead, replace that with a dedicated flag in the
struct nfs_page.

Then clean out redundant uses of the PageError flag: the VM no longer
checks it for reads.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-19 13:58:38 -07:00
Trond Myklebust
b8ef70639b NFS: Get rid of the unused nfs_write_data->flags field
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-19 13:37:34 -07:00
Trond Myklebust
a1940805d0 NFS: Get rid of the unused nfs_read_data->flags field
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-19 13:37:34 -07:00
Andy Fleming
3d153a7c8b net: Allow skb_recycle_check to be done in stages
skb_recycle_check resets the skb if it's eligible for recycling.
However, there are times when a driver might want to optionally
manipulate the skb data with the skb before resetting the skb,
but after it has determined eligibility.  We do this by splitting the
eligibility check from the skb reset, creating two inline functions to
accomplish that task.

Signed-off-by: Andy Fleming <afleming@freescale.com>
Acked-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 15:59:45 -04:00
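
The staging described in 3d153a7c8b, an eligibility check separated from the reset, can be sketched with a plain buffer structure. struct demo_buf and the three functions below are hypothetical and only mirror the shape of the split.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer; stands in for an skb in this sketch. */
struct demo_buf {
	unsigned int capacity;
	unsigned int len;
	int shared;
};

/* Stage 1: decide eligibility without modifying the buffer. */
int demo_is_recycleable(const struct demo_buf *buf, unsigned int needed)
{
	assert(buf != NULL);
	return !buf->shared && buf->capacity >= needed;
}

/* Stage 2: reset the buffer so it can be reused. */
void demo_recycle(struct demo_buf *buf)
{
	assert(buf != NULL);
	buf->len = 0;
}

/* The one-shot behaviour, rebuilt from the two stages. */
int demo_recycle_check(struct demo_buf *buf, unsigned int needed)
{
	if (!demo_is_recycleable(buf, needed))
		return 0;
	demo_recycle(buf);
	return 1;
}
```

A driver that wants to touch the data between the two stages would call demo_is_recycleable(), do its work, and then call demo_recycle() itself.
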
J. Bruce Fields
8b289b2c23 nfsd4: implement new 4.1 open reclaim types
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-10-19 11:52:12 -04:00
Yevgeny Petrilin
f3a9d1f25d mlx4_en: Controlling FCS header removal
Canceling FCS removal where FW allows for better alignment
of incoming data.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 03:42:26 -04:00
Eric Dumazet
9e903e0852 net: add skb frag size accessors
To ease skb->truesize sanitization, it's better to be able to localize
all references to skb frags size.

Define accessors: skb_frag_size() to fetch frag size, and
skb_frag_size_{set|add|sub}() to manipulate it.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 03:10:46 -04:00
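
The accessor pattern from 9e903e0852 can be sketched as follows. struct demo_frag is a hypothetical stand-in for the kernel's fragment type, and the demo_ prefixed helpers only imitate the skb_frag_size() and skb_frag_size_{set|add|sub}() names mentioned above; their signatures are assumptions.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's fragment descriptor. */
struct demo_frag {
	unsigned int page_offset;
	unsigned int size;
};

/* Every read of the size goes through one place. */
static inline unsigned int demo_frag_size(const struct demo_frag *frag)
{
	return frag->size;
}

static inline void demo_frag_size_set(struct demo_frag *frag, unsigned int size)
{
	frag->size = size;
}

static inline void demo_frag_size_add(struct demo_frag *frag, int delta)
{
	frag->size += delta;
}

/* Shrinking below zero is treated as a caller bug in this sketch. */
static inline void demo_frag_size_sub(struct demo_frag *frag, int delta)
{
	assert(delta >= 0 && (unsigned int)delta <= frag->size);
	frag->size -= delta;
}
```

Once all callers use the accessors, a later change to how the size is stored or accounted only has to touch these four functions.
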
Trond Myklebust
0c2e53f11a NFS: Remove the unused "lookupfh()" version of nfs4_proc_lookup()
...and also remove the associated nfs_v4_clientops entry.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 16:13:51 -07:00
Jiri Slaby
fa90e1c935 TTY: make tty_add_file non-failing
If tty_add_file fails at the point it is now, we have to revert all
the changes we did to the tty. It means either decrease all refcounts
if this was a tty reopen or delete the tty if it was newly allocated.

There was an attempt to fix this in v3.0-rc2 using tty_release in 0259894c7
(TTY: fix fail path in tty_open). But instead it introduced a NULL
dereference. It's because tty_release dereferences
filp->private_data, but that one is set even in our tty_add_file. And
when tty_add_file fails, it's still NULL/garbage. Hence tty_release
cannot be called there.

To circumvent the original leak (and the current NULL deref) we split
tty_add_file into two functions, making the latter non-failing. In
that case we may do the former early in open, where handling failures
is easy. The latter stays as it is now. So there is no change in
functionality.

The original bug (leak) was introduced by f573bd176 (tty: Remove
__GFP_NOFAIL from tty_add_file()). Thanks Dan for reporting this.

Later, we may split tty_release into more functions and call only some
of them in this fail path instead. (If at all possible.)

Introduced-in: v2.6.37-rc2
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable <stable@vger.kernel.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-18 14:22:37 -07:00
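
The split described in fa90e1c935 follows a common pattern: do the step that can fail early, and keep the later step free of failure paths. The sketch below uses hypothetical demo_ names and userspace malloc(); it does not reproduce the tty code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct demo_priv {
	void *tty;
	void *file;
};

struct demo_file {
	struct demo_priv *private_data;
};

/* Stage 1: the only step that can fail; run it before touching the tty. */
int demo_alloc_file(struct demo_file *file)
{
	struct demo_priv *priv = malloc(sizeof(*priv));

	if (!priv)
		return -ENOMEM;
	priv->tty = NULL;
	priv->file = NULL;
	file->private_data = priv;
	return 0;
}

/* Stage 2: bookkeeping on memory that already exists; cannot fail. */
void demo_add_file(void *tty, struct demo_file *file)
{
	struct demo_priv *priv = file->private_data;

	assert(priv != NULL);
	priv->tty = tty;
	priv->file = file;
}

/* Undo stage 1 on an early error path. */
void demo_free_file(struct demo_file *file)
{
	free(file->private_data);
	file->private_data = NULL;
}
```

Because stage 1 runs before any reference counts are taken, an allocation failure needs no tty cleanup at all.
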
Thomas Meyer
8193c42906 tty: Support compat_ioctl get/set termios_locked
When running a Fedora 15 (x86) on an x86_64 kernel, in the boot process
plymouthd complains about those two missing ioctls:
[    2.581783] ioctl32(plymouthd:186): Unknown cmd fd(10) cmd(00005457){t:'T';sz:0} arg(ffb6a5d0) on /dev/tty1
[    2.581803] ioctl32(plymouthd:186): Unknown cmd fd(10) cmd(00005456){t:'T';sz:0} arg(ffb6a680) on /dev/tty1

Both ioctl functions work on 'struct termios' resp. 'struct termios2',
which have the same size (36 bytes resp. 44 bytes) on x86 and x86_64,
so it's just a matter of converting the pointer from userland.

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-18 14:17:11 -07:00
Jason Baron
07613b0b5e dynamic_debug: consolidate repetitive struct _ddebug descriptor definitions
Replace the repetitive struct _ddebug descriptor definitions with a new
DECLARE_DYNAMIC_DEBUG_META_DATA(name, fmt) macro.

[akpm@linux-foundation.org: s/DECLARE/DEFINE/]
Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-18 11:22:00 -07:00
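
The consolidation in 07613b0b5e replaces hand-written descriptor definitions with one macro. A simplified sketch of that idea is below; struct demo_ddebug, DEMO_DEFINE_DDEBUG() and demo_emit() are hypothetical names, and the field list is an assumption rather than the layout of struct _ddebug.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-callsite descriptor. */
struct demo_ddebug {
	const char *function;
	const char *filename;
	const char *format;
	unsigned int lineno;
	unsigned int enabled;
};

/* One macro instead of a repeated initializer; must be used inside a function. */
#define DEMO_DEFINE_DDEBUG(name, fmt)			\
	static struct demo_ddebug name = {		\
		.function = __func__,			\
		.filename = __FILE__,			\
		.format = fmt,				\
		.lineno = __LINE__,			\
		.enabled = 0,				\
	}

/* Print the callsite only when it has been switched on; returns 1 if printed. */
int demo_emit(const struct demo_ddebug *d)
{
	assert(d != NULL);
	if (!d->enabled)
		return 0;
	printf("%s:%u [%s] %s", d->filename, d->lineno, d->function, d->format);
	return 1;
}
```
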
Kai Jiang
27a90700a4 uio: Support physical addresses >32 bits on 32-bit systems
To support >32-bit physical addresses for UIO_MEM_PHYS type we need to
extend the width of 'addr' in struct uio_mem.  Numerous platforms like
embedded PPC, ARM, and X86 have support for systems with larger physical
address than logical.

Since 'addr' may contain a physical, logical, or virtual address the
easiest solution is to just change the type to 'phys_addr_t' which
should always be greater than or equal to the sizeof(void *) such that
it can properly hold any of the address types.

For physical address we can support up to a 44-bit physical address on a
typical 32-bit system as we utilize remap_pfn_range() for the mapping of
the memory region and pfn's are represented by shifting the address by
the page size (typically 4k).

Signed-off-by: Kai Jiang <Kai.Jiang@freescale.com>
Signed-off-by: Minghuan Lian <Minghuan.Lian@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Hans J. Koch <hjk@hansjkoch.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-18 11:18:57 -07:00
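
The 44-bit figure in 27a90700a4 is simple arithmetic: a 32-bit page frame number combined with a 12-bit page offset (4k pages) covers 32 + 12 = 44 address bits. A small sketch, with hypothetical helper names and an assumed DEMO_PAGE_SHIFT of 12:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12 /* 4 KiB pages, the "typical" case above */

/* Widest physical address (in bits) reachable with a pfn of pfn_bits bits. */
unsigned int demo_max_phys_bits(unsigned int pfn_bits)
{
	assert(pfn_bits <= 64 - DEMO_PAGE_SHIFT);
	return pfn_bits + DEMO_PAGE_SHIFT;
}

/* Convert a 64-bit physical address into a page frame number. */
uint64_t demo_phys_to_pfn(uint64_t phys)
{
	return phys >> DEMO_PAGE_SHIFT;
}
```

The highest 44-bit address maps to pfn 0xFFFFFFFF, which still fits in 32 bits; one byte further and the pfn no longer fits.
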
Trond Myklebust
a9a4a87a59 NFS: Use the inode->i_version to cache NFSv4 change attribute information
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:14:34 -07:00
Trond Myklebust
d77385f238 SUNRPC: Fix rpc_sockaddr2uaddr
rpc_sockaddr2uaddr is only used by net/sunrpc/rpcb_clnt.c, where
it is used in a non-blockable context in at least one case.

Add non-blocking capability by adding a gfp_t argument.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:13:32 -07:00
Peng Tao
c1225158a8 SUNRPC/NFS: make rpc pipe upcall generic
The same function is used by idmap, gss and blocklayout code. Make it
generic.

Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:08:12 -07:00
Marc Kleine-Budde
f861c2b80c can: remove references to berlios mailinglist
The BerliOS project, which currently hosts our mailinglist, will
close with the end of the year. Now take the chance and remove all
occurrences of the mailinglist address from the source files.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-17 19:22:46 -04:00
John W. Linville
41ebe9cde7 Merge branch 'master' of git://git.infradead.org/users/linville/wireless-next into for-davem
2011-10-17 15:05:26 -04:00
Ian Campbell
9bab0b7fba genirq: Add IRQF_RESUME_EARLY and resume such IRQs earlier
This adds a mechanism to resume selected IRQs during syscore_resume
instead of dpm_resume_noirq.

Under Xen we need to resume IRQs associated with IPIs early enough
that the resched IPI is unmasked and we can therefore schedule
ourselves out of the stop_machine where the suspend/resume takes
place.

This issue was introduced by 676dc3cf5b "xen: Use IRQF_FORCE_RESUME".

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Jeremy Fitzhardinge <Jeremy.Fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/1318713254.11016.52.camel@dagon.hellion.org.uk
Cc: stable@kernel.org (at least to 2.6.32.y)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-10-17 11:42:49 +02:00
Rafael J. Wysocki
2aede851dd PM / Hibernate: Freeze kernel threads after preallocating memory
There is a problem with the current ordering of hibernate code which
leads to deadlocks in some filesystems' memory shrinkers.  Namely,
some filesystems use freezable kernel threads that are inactive when
the hibernate memory preallocation is carried out.  Those same
filesystems use memory shrinkers that may be triggered by the
hibernate memory preallocation.  If those memory shrinkers wait for
the frozen kernel threads, the hibernate process deadlocks (this
happens with XFS, for one example).

Apparently, it is not technically viable to redesign the filesystems
in question to avoid the situation described above, so the only
possible solution of this issue is to defer the freezing of kernel
threads until the hibernate memory preallocation is done, which is
implemented by this change.

Unfortunately, this requires the memory preallocation to be done
before the "prepare" stage of device freeze, so after this change the
only way drivers can allocate additional memory for their freeze
routines in a clean way is to use PM notifiers.

Reported-by: Christoph <cr2005@u-club.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-16 23:28:52 +02:00
H Hartley Sweeten
37cce26b32 PM / VT: Cleanup #if defined uglyness and fix compile error
Introduce the config option CONFIG_VT_CONSOLE_SLEEP in order to cleanup
the #if defined ugliness for the vt suspend support functions. Note that
CONFIG_VT_CONSOLE is already dependent on CONFIG_VT.

The function pm_set_vt_switch is actually dependent on CONFIG_VT and not
CONFIG_PM_SLEEP. This fixes a compile error when CONFIG_PM_SLEEP is
not set:

drivers/tty/vt/vt_ioctl.c:1794: error: redefinition of 'pm_set_vt_switch'
include/linux/suspend.h:17: error: previous definition of 'pm_set_vt_switch' was here

Also, remove the incorrect path from the comment in console.c.

[rjw: Replaced #if defined() with #ifdef in suspend.h.]

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-16 23:28:51 +02:00
Martin Schwidefsky
85055dd805 PM / Hibernate: Include storage keys in hibernation image on s390
For s390 there is one additional byte associated with each page,
the storage key. This byte contains the referenced and changed
bits and needs to be included into the hibernation image.
If the storage keys are not restored to their previous state all
original pages would appear to be dirty. This can cause
inconsistencies e.g. with read-only filesystems.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-16 23:27:46 +02:00
ShuoX Liu
2a77c46de1 PM / Suspend: Add statistics debugfs file for suspend to RAM
Record the number of S3 failures for each failure reason, and the names
of the two most recently failed devices, as S3 progresses.
We can check it through 'suspend_stats' entry in debugfs.

The motivation of the patch:

We are enabling power features on Medfield. Comparing with PC/notebook,
a mobile enters/exits suspend-2-ram (we call it s3 on Medfield) far
more frequently. If it can't enter suspend-2-ram in time, the power
might be used up soon.

We often find that a device suspend fails. Then, system retries
s3 over and over again. As display is off, testers and developers
don't know what happens.

Some testers and developers complain they don't know if system
tries suspend-2-ram, and what device fails to suspend. They need
such info for a quick check. The patch adds suspend_stats under
debugfs for users to check suspend to RAM statistics quickly.

If not using this patch, we have other methods to get info about
what device fails. One is to turn on CONFIG_PM_DEBUG, but users
would get too much info and testers need to recompile the system.

In addition, dynamic debug is another good tool to dump debug info.
But it still doesn't match our utilization scenario closely.
1) users need to write a user space parser to process the syslog output;
2) Our testing scenario is we leave the mobile for at least hours.
   Then, check its status. No serial console available during the
   testing. One is because console would be suspended, and the other
   is serial console connecting with spi or HSU devices would consume
   power. These devices are powered off at suspend-2-ram.

Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-16 23:27:45 +02:00
Greg Rose
5f8444a3fa if_link: Add additional parameter to IFLA_VF_INFO for spoof checking
Add configuration setting for drivers to turn spoof checking on or off
for discrete VFs.

v2 - Fix indentation problem, wrap the ifla_vf_info structure in
     #ifdef __KERNEL__ to prevent user space from accessing it, and
     change the function parameter for the spoof check setting netdev
     op from u8 to bool.
v3 - Preset spoof check setting to -1 so that user space tools such
     as ip can detect that the driver didn't report a spoofcheck
     setting.  Prevents incorrect display of spoof check settings
     for drivers that don't report it.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-10-16 13:15:38 -07:00
Helmut Schaa
bb6e753e95 nl80211: Add sta_flags to the station info
Reuse the already existing struct nl80211_sta_flag_update to specify
both, a flag mask and the flag set itself. This means
nl80211_sta_flag_update is now used for setting station flags and also
for getting station flags.

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-10-14 14:48:23 -04:00
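
The mask/set pairing that bb6e753e95 reuses can be read in both directions, as the sketch below shows. struct demo_flag_update and the two helpers are hypothetical; they only model a mask saying which flag bits carry information and a set word giving their values.

```c
#include <assert.h>
#include <stdint.h>

/* Models the mask/set pairing described above; names are assumptions. */
struct demo_flag_update {
	uint32_t mask; /* which flag bits carry information */
	uint32_t set;  /* value of those bits */
};

/* Setter direction: change only the bits named in mask. */
uint32_t demo_apply_flags(uint32_t current, struct demo_flag_update u)
{
	return (current & ~u.mask) | (u.set & u.mask);
}

/* Getter direction: 1 if set, 0 if clear, -1 if the flag was not reported. */
int demo_flag_state(struct demo_flag_update u, unsigned int bit)
{
	uint32_t b;

	assert(bit < 32);
	b = UINT32_C(1) << bit;
	if (!(u.mask & b))
		return -1;
	return (u.set & b) ? 1 : 0;
}
```

The mask is what lets a reader tell "flag is off" apart from "flag was not reported".
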
Eric Dumazet
87fb4b7b53 net: more accurate skb truesize
skb truesize currently accounts for sk_buff struct and part of skb head.
kmalloc() roundings are also ignored.

Considering that skb_shared_info is larger than sk_buff, it's time to
take it into account for better memory accounting.

This patch introduces SKB_TRUESIZE(X) macro to centralize various
assumptions into a single place.

At skb alloc phase, we put skb_shared_info struct at the exact end of
skb head, to allow a better use of memory (lowering number of
reallocations), since kmalloc() gives us power-of-two memory blocks.

Unless SLUB/SLUB debug is active, both skb->head and skb_shared_info are
aligned to cache lines, as before.

Note: This patch might trigger performance regressions because of
misconfigured protocol stacks, hitting per socket or global memory
limits that were previously not reached. But it's a necessary step for a
more accurate memory accounting.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Andi Kleen <ak@linux.intel.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-13 16:05:07 -04:00
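
The accounting idea behind SKB_TRUESIZE(X) in 87fb4b7b53 is that the true cost of a buffer is the payload plus the cache-aligned size of both bookkeeping structures. The sketch below uses hypothetical DEMO_ macros, an assumed 64-byte cache line and invented struct sizes, so the numbers are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_CACHE_BYTES 64 /* assumed cache line size */

/* Round x up to the next multiple of the cache line size. */
#define DEMO_ALIGN(x) \
	(((x) + (DEMO_CACHE_BYTES - 1)) & ~((size_t)DEMO_CACHE_BYTES - 1))

/* Placeholder structs; their sizes are invented for the example. */
struct demo_skb { char pad[200]; };
struct demo_shared_info { char pad[300]; };

/* Payload plus the cache-aligned overhead of both bookkeeping structs. */
#define DEMO_TRUESIZE(x) \
	((x) + DEMO_ALIGN(sizeof(struct demo_skb)) + \
	 DEMO_ALIGN(sizeof(struct demo_shared_info)))
```

With these invented sizes, a 1000-byte payload is charged 1000 + 256 + 320 = 1576 bytes.
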
Neil Zhang
dde34cc501 usb: gadget: mv_udc: refine the driver structure
This patch does the following things:

1. Add header and Copyright for marvell usb driver.
2. Add mv_usb.h in include/linux/platform_data, so that the driver
   fits all the Marvell platforms using the same ChipIdea USB IP.
3. Some SoCs may have multiple clock sources, so define them
   in mv_usb_platform_data and add two helper functions named
   udc_clock_enable/udc_clock_disable to deal with the clocks.
4. Different SoCs differ somewhat in PHY initialization,
   so remove the file mv_udc_phy.c and add two functions in
   mv_usb_platform_data, letting the platform-specific driver implement them.
5. Rewrite the probe function according to the modifications listed above. We
   found that the kernel panics when probe fails. The root cause is as follows:
	When probe fails, the error handling may call device_unregister(),
	which in turn will call gadget_release. In the current code,
	gadget_release has two issues:
		1: the_controller is a NULL pointer.
		2: if we free udc here, then the following code in probe
		   will access a NULL pointer.

Signed-off-by: Neil Zhang <zhangwm@marvell.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
2011-10-13 20:41:56 +03:00
Kuninori Morimoto
f427eb64f4 usb: gadget: renesas_usbhs: support otg pin control
Some renesas_usbhs devices support an OTG external device interface.
On those devices, it is necessary to control PWEN/EXTLP on DVSTCTR.
This patch adds support for it.
But the renesas_usbhs driver doesn't have OTG support for now.

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
2011-10-13 20:41:47 +03:00
Kuninori Morimoto
258485d990 usb: gadget: renesas_usbhs: add bus control functions
This patch adds DVSTCTR control functions for HOST support.

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
2011-10-13 20:41:38 +03:00
Kuninori Morimoto
11935de557 usb: gadget: renesas_usbhs: change usbhsc_bus_ctrl() to usbsc_set_buswait()
renesas_usbhs will have register DVSTCTR control function for HOST support.
This patch changes usbhsc_bus_ctrl() to usbsc_set_buswait(),
to remove DVSTCTR access from it.

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
2011-10-13 20:41:37 +03:00
Felipe Balbi
089b837a39 usb: gadget: fix typo for default U1/U2 exit latencies
s/DEFULT/DEFAULT/, no functional changes.

Signed-off-by: Felipe Balbi <balbi@ti.com>
2011-10-13 20:39:59 +03:00
Yoshihiro Shimoda
b8a56e17e1 usb: gadget: r8a66597-udc: add support for SUDMAC
SH7757 has a USB function with internal DMA controller (SUDMAC).
This patch supports the SUDMAC. The SUDMAC is incompatible with
general-purpose DMAC. So, it doesn't use dmaengine.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
2011-10-13 20:38:39 +03:00
Linus Walleij
2744e8afb3 drivers: create a pin control subsystem
This creates a subsystem for handling of pin control devices.
These are devices that control different aspects of package
pins.

Currently it handles pinmuxing, i.e. assigning electronic
functions to groups of pins on primarily PGA and BGA type of
chip packages which are common in embedded systems.

The plan is to also handle other I/O pin control aspects
such as biasing, driving, input properties such as
schmitt-triggering, load capacitance etc within this
subsystem, to remove a lot of ARM arch code as well as
feature-creepy GPIO drivers which are implementing the same
thing over and over again.

This is being done to depopulate the arch/arm/* directory
of such custom drivers and try to abstract the infrastructure
they all need. See the Documentation/pinctrl.txt file that is
part of this patch for more details.

ChangeLog v1->v2:

- Various minor fixes from Joe's and Stephen's review comments
- Added a pinmux_config() that can invoke custom configuration
  with arbitrary data passed in or out to/from the pinmux driver

ChangeLog v2->v3:

- Renamed subsystem folder to "pinctrl" since we will likely
  want to keep other pin control such as biasing in this
  subsystem too, so let us keep to something generic even though
  we're mainly doing pinmux now.
- As a consequence, register pins as an abstract entity separate
  from the pinmux. The muxing functions will claim pins out of the
  pin pool and make sure they do not collide. Pins can now be
  named by the pinctrl core.
- Converted the pin lookup from a static array into a radix tree,
  I agreed with Grant Likely to try to avoid any static allocation
  (which is crap for device tree stuff) so I just rewrote this
  to be dynamic, just like irq number descriptors. The
  platform-wide definition of number of pins goes away - this is
  now just the sum total of the pins registered to the subsystem.
- Make sure mappings with only a function name and no device
  works properly.

ChangeLog v3->v4:

- Define a number space per controller instead of globally,
  Stephen and Grant requested the same thing so now maps need to
  define target controller, and the radix tree of pin descriptors
  is a property on each pin controller device.
- Add a compulsory pinctrl device entry to the pinctrl mapping
  table. This must match the pinctrl device, like "pinctrl.0"
- Split the file core.c in two: core.c and pinmux.c where the
  latter carries all pinmux stuff, the core is for generic pin
  control, and use local headers to access functionality between
  files. It is now possible to implement a "blank" pin controller
  without pinmux capabilities. This split will make new additions
  like pindrive.c, pinbias.c etc possible for combined drivers
  and chunks of functionality which is a GoodThing(TM).
- Rewrite the interaction with the GPIO subsystem - the pin
  controller descriptor now handles this by defining an offset
  into the GPIO numberspace for its handled pin range. This is
  used to look up the appropriate pin controller for a GPIO pin.
  Then that specific GPIO range is matched 1-1 for the target
  controller instance.
- Fixed a number of review comments from Joe Perches.
- Broke out a header file pinctrl.h for the core pin handling
  stuff that will be reused by other stuff than pinmux.
- Fixed some erroneous EXPORT() stuff.
- Remove mispatched U300 Kconfig and Makefile entries
- Fixed a number of review comments from Stephen Warren, not all
  of them - still WIP. But I think the new mapping that will
  specify which function goes to which pin mux controller addresses
  50% of your concerns (else beat me up).

ChangeLog v4->v5:

- Defined a "position" for each function, so the pin controller now
  tracks a function in a certain position, and the pinmux maps define
  what position you want the function in. (Feedback from Stephen
  Warren and Sascha Hauer).
- Since we now need to request a combined function+position from
  the machine mapping table that connect mux settings to drivers,
  it was extended with a position field and a name field. The
  name field is now used if you e.g. need to switch between two
  mux map settings at runtime.
- Switched from a class device to using struct bus_type for this
  subsystem. Verified sysfs functionality: seems to work fine.
  (Feedback from Arnd Bergmann and Greg Kroah-Hartman)
- Define a per pincontroller list of GPIO ranges from the GPIO
  pin space that can be handled by the pin controller. These can
  be added one by one at runtime. (Feedback from Barry Song)
- Expanded documentation of regulator_[get|enable|disable|put]
  semantics.
- Fixed a number of review comments from Barry Song. (Thanks!)

ChangeLog v5->v6:

- Create an abstract pin group concept that can sort pins into
  named and enumerated groups no matter what the use of these
  groups may be, one possible usecase is a group of pins being
  muxed in or so. The intention is however to also use these
  groups for other pin control activities.
- Make it compulsory for pinmux functions to associate with
  at least one group, so the abstract pin group concept is used
  to define the groups of pins affected by a pinmux function.
  The pinmux driver interface has been altered so as to enforce
  a function to list applicable groups per function.
- Provide an optional .group entry in the pinmux machine map
  so the map can select between different available groups
  to be used with a certain function.
- Consequent changes all over the place so that e.g. debugfs
  presents reasonable information about the world.
- Drop the per-pin mux (*config) function in the pinmux_ops
  struct - I was afraid that this would start to be used for
  things totally unrelated to muxing, we can introduce that to
  the generic struct pinctrl_ops if needed. I want to keep
  muxing orthogonal to other pin control subjects and not mix
  these things up.

ChangeLog v6->v7:

- Make it possible to have several map entries matching the
  same device, pin controller and function, but using
  a different group, and alter the semantics so that
  pinmux_get() will pick all matching map entries, and
  store the associated groups in a list. The list will
  then be iterated over at pinmux_enable()/pinmux_disable()
  and corresponding driver functions called for each
  defined group. Notice that you're only allowed to map
  multiple *groups* to the same
  { device, pin controller, function } triplet, attempts
  to map the same device to multiple pin controllers will
  for example fail. This is hopefully the crucial feature
  requested by Stephen Warren.
- Add a pinmux hogging field to the pinmux mapping entries,
  and enable the pinmux core to hog pinmux map entries.
  This currently only works for pinmuxes without assigned
  devices as it looks now, but with device trees we can
  look up the corresponding struct device * entries when
  we register the pinmux driver, and have it hog each
  pinmux map in turn, for a simple approach to
  non-dynamic pin muxing. This addresses an issue from
  Grant Likely that the machine should take care of as
  much of the pinmux setup as possible, not the devices.
  By supplying a list of hogs, it can now instruct the
  core to take care of any static mappings.
- Switch pinmux group retrieval function to grab an
  array of strings representing the groups rather than an
  array of unsigned and rewrite accordingly.
- Alter debugfs to show the grouplist handled by each
  pinmux. Also add a list of hogs.
- Dynamically allocate a struct pinmux at pinmux_get() and
  free it at pinmux_put(), then add these to the global
  list of pinmuxes active as we go along.
- Go over the list of pinmux maps at pinmux_get() time
  and repeatedly apply matches.
- Retrieve applicable groups per function from the driver
  as a string array rather than an unsigned array, then
  lookup the enumerators.
- Make the device to pinmux map a singleton - only allow the
  mapping table to be registered once and even tag the
  registration function with __init so it surely won't be
  abused.
- Create a separate debugfs file to view the pinmux map at
  runtime.
- Introduce a spin lock to the pin descriptor struct, lock it
  when modifying pin status entries. Reported by Stijn Devriendt.
- Fix up the documentation after review from Stephen Warren.
- Let the GPIO ranges give names as const char * instead of some
  fixed-length string.
- add a function to unregister GPIO ranges to mirror the
  registration function.
- Privatized the struct pinctrl_device and removed it from the
  <linux/pinctrl/pinctrl.h> API, the drivers do not need to know
  the members of this struct. It is now in the local header
  "core.h".
- Rename the concept of "anonymous" mux maps to "system" muxes
  and add convenience macros and documentation.

ChangeLog v7->v8:

- Delete the leftover pinmux_config() function from the
 <linux/pinctrl/pinmux.h> header.
- Fix a race condition found by Stijn Devriendt in pin_request()

ChangeLog v8->v9:

- Drop the bus_type and the sysfs attributes and all, we're not
  clear about how this should be used for e.g. userspace
  interfaces so let us save this for the future.
- Use the right name in MAINTAINERS, PIN CONTROL rather than
  PINMUX
- Don't kfree() the device state holder, let the .remove() callback
  handle this.
- Fix up numerous kerneldoc headers to have one line for the function
  description and more verbose documentation below the parameters

ChangeLog v9->v10:
- pinctrl: EXPORT_SYMBOL needs export.h, folded in a patch
  from Stephen Rothwell
- fix pinctrl_register error handling, folded in a patch from
  Axel Lin
- Various fixes to documentation text so that it's consistent.
- Removed pointless comment from drivers/Kconfig
- Removed dependency on SYSFS since we removed the bus in
  v9.
- Renamed hopelessly abbreviated pctldev_* functions to the
  more verbose pinctrl_dev_*
- Drop mutex properly when looking up GPIO ranges
- Return NULL instead of ERR_PTR() errors on registration of
  pin controllers, using cast pointers is fragile. We can
  live without the detailed error codes for sure.

Cc: Stijn Devriendt <highguy@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Russell King <linux@arm.linux.org.uk>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Stephen Warren <swarren@nvidia.com>
Tested-by: Barry Song <21cnbao@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2011-10-13 12:49:17 +02:00
Murali Raja
3ceca74966 net-netlink: Add a new attribute to expose TOS values via netlink
This patch exposes the tos value for the TCP sockets when the TOS flag
is requested in the ext_flags for the inet_diag request. This would mainly be
used to expose TOS values for both TCP and UDP sockets. Currently it is
supported for TCP. When netlink support for UDP is added, support for
exposing the TOS values would also be added. For IPv4 the tos value is exposed
and for IPv6 the tclass value is exposed.

Signed-off-by: Murali Raja <muralira@google.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-12 19:09:18 -04:00
Ingo Molnar
910e94dd0c Merge branch 'tip/perf/core' of git://github.com/rostedt/linux into perf/core 2011-10-12 17:14:47 +02:00
John W. Linville
094daf7db7 Merge branch 'master' of git://git.infradead.org/users/linville/wireless-next into for-davem
Conflicts:
	Documentation/feature-removal-schedule.txt
2011-10-11 15:35:42 -04:00
Greg Kroah-Hartman
15b80d6417 hv: remove struct hv_device_info from hyperv.h
This is only used/needed by the vmbus core code, so move it out of the
hyperv.h file and into the .c file that uses it.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-11 09:51:22 -06:00
Greg Kroah-Hartman
9f3e28e375 hv: remove free_channel() from hyperv.h
This function is only used in the file it is declared in
(channel_mgmt.c) so make it static and remove it from the hyperv.h file.

Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-11 09:51:22 -06:00
Greg Kroah-Hartman
2726f95e0b hv: hyperv.h: remove unneeded forward declarations of structures
This file was created by mushing different .h files together and it
shows.  This change removes some unneeded forward declarations.

Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-11 09:51:22 -06:00
Greg Kroah-Hartman
7a4ba88cc1 hv: hyperv.h: remove unused module macros
I have no idea what these were ever for, but they aren't used, so delete
them.

Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-11 09:51:22 -06:00
Greg Kroah-Hartman
5557e8a605 hv: remove unused LOWORD and HIWORD macros from hyperv.h
They aren't used anywhere anymore now that the debugging macros are
gone, so remove them from hyperv.h as well.

Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-11 09:51:22 -06:00
Greg Kroah-Hartman
815166b95d Staging: hv: remove vmbus_loglevel as it is not used at all anymore
As there is no user of this variable, it's time to delete it.  For
dynamic debugging of the hyperv code, use the standard dynamic debug
kernel interface.

Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-11 09:51:21 -06:00
Greg Kroah-Hartman
1a2643012f Staging: hv: remove last user of DPRINT() macro
This also removed the unused function hv_dump_ring_info().

Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-11 09:51:21 -06:00
Greg Kroah-Hartman
d181daa06d Staging: hv: storvsc: remove last usage of DPRINT_WARN
Used the correct dev_warn() call instead.

Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-11 09:51:21 -06:00
Greg Kroah-Hartman
a832a1eba9 hv: remove a bunch of unused debug macros from hyperv.h
These aren't used by anyone anymore, so remove them before someone tries
to use them again.

Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-11 08:49:19 -06:00
Greg Kroah-Hartman
da0e96315c hv: rename prep_negotiate_resp() to vmbus_prep_negotiate_resp()
It's a global symbol, so properly prefix it and use the proper EXPORT
value as well.

Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-11 08:49:19 -06:00
Greg Kroah-Hartman
407dd16443 Staging: hv: remove unneeded asm include file in hyperv.h
No one outside of the hyperv core needs to include the asm/hyperv.h
file, so don't put it in the "global" include/linux/hyperv.h file.

Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-11 08:49:19 -06:00
Stephen Rothwell
540f41edc1 llist: Add back llist_add_batch() and llist_del_first() prototypes
Commit 1230db8e15 ("llist: Make some llist functions inline")
has deleted the definitions, causing problems for (not upstream yet)
code that tries to make use of them.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: David Miller <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20111005172528.0d0a8afc65acef7ace22a24e@canb.auug.org.au
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-11 12:51:22 +02:00
Greg Kroah-Hartman
46a9719136 Staging: hv: move hyperv code out of staging directory
After many years wandering the desert, it is finally time for the
Microsoft HyperV code to move out of the staging directory.  Or at least
the core hyperv bus code, and the utility driver, the rest still have
some review to get through by the various subsystem maintainers.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
2011-10-10 22:52:55 -06:00
Harro Haan
276532ba96 USB: fix ehci alignment error
The Kirkwood gave an unaligned memory access error on
line 742 of drivers/usb/host/ehci-hcd.c:
"ehci->last_periodic_enable = ktime_get_real();"

Signed-off-by: Harro Haan <hrhaan@gmail.com>
Cc: stable <stable@kernel.org>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-10 16:43:53 -07:00
Rafael J. Wysocki
7811ac276b Merge branch 'pm-devfreq' into pm-for-linus
* pm-devfreq:
  PM / devfreq: Add basic governors
  PM / devfreq: Add common sysfs interfaces
  PM: Introduce devfreq: generic DVFS framework with device-specific OPPs
  PM / OPP: Add OPP availability change notifier.
2011-10-07 23:17:18 +02:00
Rafael J. Wysocki
9696cc9007 Merge branch 'pm-qos' into pm-for-linus
* pm-qos:
  PM / QoS: Update Documentation for the pm_qos and dev_pm_qos frameworks
  PM / QoS: Add function dev_pm_qos_read_value() (v3)
  PM QoS: Add global notification mechanism for device constraints
  PM QoS: Implement per-device PM QoS constraints
  PM QoS: Generalize and export constraints management code
  PM QoS: Reorganize data structs
  PM QoS: Code reorganization
  PM QoS: Minor clean-ups
  PM QoS: Move and rename the implementation files
2011-10-07 23:17:07 +02:00
Rafael J. Wysocki
c28b56b1d4 Merge branch 'pm-domains' into pm-for-linus
* pm-domains:
  PM / Domains: Split device PM domain data into base and need_restore
  ARM: mach-shmobile: sh7372 sleep warning fixes
  ARM: mach-shmobile: sh7372 A3SM support
  ARM: mach-shmobile: sh7372 generic suspend/resume support
  PM / Domains: Preliminary support for devices with power.irq_safe set
  PM: Move clock-related definitions and headers to separate file
  PM / Domains: Use power.sybsys_data to reduce overhead
  PM: Reference counting of power.subsys_data
  PM: Introduce struct pm_subsys_data
  ARM / shmobile: Make A3RV be a subdomain of A4LC on SH7372
  PM / Domains: Rename argument of pm_genpd_add_subdomain()
  PM / Domains: Rename GPD_STATE_WAIT_PARENT to GPD_STATE_WAIT_MASTER
  PM / Domains: Allow generic PM domains to have multiple masters
  PM / Domains: Add "wait for parent" status for generic PM domains
  PM / Domains: Make pm_genpd_poweron() always survive parent removal
  PM / Domains: Do not take parent locks to modify subdomain counters
  PM / Domains: Implement subdomain counters as atomic fields
2011-10-07 23:17:02 +02:00
Rafael J. Wysocki
d727b60659 Merge branch 'pm-runtime' into pm-for-linus
* pm-runtime:
  PM / Tracing: build rpm-traces.c only if CONFIG_PM_RUNTIME is set
  PM / Runtime: Replace dev_dbg() with trace_rpm_*()
  PM / Runtime: Introduce trace points for tracing rpm_* functions
  PM / Runtime: Don't run callbacks under lock for power.irq_safe set
  USB: Add wakeup info to debugging messages
  PM / Runtime: pm_runtime_idle() can be called in atomic context
  PM / Runtime: Add macro to test for runtime PM events
  PM / Runtime: Add might_sleep() to runtime PM functions
2011-10-07 23:16:55 +02:00
David S. Miller
88c5100c28 Merge branch 'master' of github.com:davem330/net
Conflicts:
	net/batman-adv/soft-interface.c
2011-10-07 13:38:43 -04:00
Linus Torvalds
6367f1775e Merge branch 'for-linus' of http://people.redhat.com/agk/git/linux-dm
* 'for-linus' of http://people.redhat.com/agk/git/linux-dm:
  dm crypt: always disable discard_zeroes_data
  dm: raid fix write_mostly arg validation
  dm table: avoid crash if integrity profile changes
  dm: flakey fix corrupt_bio_byte error path
2011-10-06 08:31:47 -07:00
Joerg Roedel
a240f76165 perf, core: Introduce attrs to count in either host or guest mode
The two new attributes exclude_guest and exclude_host can
be used by user-space to tell the kernel to set up a
performance counter to count only while the CPU is in
guest or in host mode.

An additional check is also introduced to make sure
user-space does not try to exclude guest and host mode from
counting.
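
The additional check can be pictured with a minimal user-space sketch.
The struct and function names below are hypothetical stand-ins, not the
real struct perf_event_attr or the kernel's validation code; only the
two flag names come from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in carrying only the two new flags. */
struct sketch_count_attr {
	bool exclude_host;
	bool exclude_guest;
};

/*
 * Return 0 if the combination is usable, -1 if both host and guest
 * mode are excluded, since such a counter could never count anything.
 */
int sketch_check_attr(const struct sketch_count_attr *attr)
{
	assert(attr != NULL);
	if (attr->exclude_host && attr->exclude_guest)
		return -1;
	return 0;
}
```

Setting only exclude_host yields a guest-only counter, setting only
exclude_guest yields a host-only counter, and setting neither counts in
both modes.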

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1317816084-18026-2-git-send-email-gleb@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-06 13:00:28 +02:00
Ingo Molnar
9243a169ac Merge commit 'v3.1-rc9' into sched/core
Merge reason: pick up latest fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-06 12:43:35 +02:00
Jamie Iles
a1330228f9 dw_apb_timer: constify clocksource name
The clocksource name should be const for correctness.

Cc: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Jamie Iles <jamie@jamieiles.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2011-10-04 13:08:18 -07:00
Rafael J. Wysocki
1a9a91525d PM / QoS: Add function dev_pm_qos_read_value() (v3)
To read the current PM QoS value for a given device we need to
make sure that the device's power.constraints object won't be
removed while we're doing that.  For this reason, put the
operation under dev->power.lock and acquire the lock
around the initialization and removal of power.constraints.

Moreover, since we're using the value of power.constraints to
determine whether or not the object is present, the
power.constraints_state field isn't necessary any more and may be
removed.  However, dev_pm_qos_add_request() needs to check if the
device is being removed from the system before allocating a new
PM QoS constraints object for it, so make it use the
power.power_state field of struct device for this purpose.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-04 21:54:26 +02:00
Linus Torvalds
8a04b45367 Merge git://github.com/davem330/net
* git://github.com/davem330/net:
  pch_gbe: Fixed the issue on which a network freezes
  pch_gbe: Fixed the issue on which PC was frozen when link was downed.
  make PACKET_STATISTICS getsockopt report consistently between ring and non-ring
  net: xen-netback: correctly restart Tx after a VM restore/migrate
  bonding: properly stop queuing work when requested
  can bcm: fix incomplete tx_setup fix
  RDSRDMA: Fix cleanup of rds_iw_mr_pool
  net: Documentation: Fix type of variables
  ibmveth: Fix oops on request_irq failure
  ipv6: nullify ipv6_ac_list and ipv6_fl_list when creating new socket
  cxgb4: Fix EEH on IBM P7IOC
  can bcm: fix tx_setup off-by-one errors
  MAINTAINERS: tehuti: Alexander Indenbaum's address bounces
  dp83640: reduce driver noise
  ptp: fix L2 event message recognition
2011-10-04 10:37:06 -07:00
Jon Mason
5f39e6705f PCI: Disable MPS configuration by default
Add the ability to disable PCI-E MPS tuning and use the BIOS
configured MPS defaults.  Due to the number of issues recently
discovered on some x86 chipsets, make this the default behavior.

Also, add the option for peer to peer DMA MPS configuration.  Peer to
peer DMA is outside the scope of this patch, but MPS configuration could
prevent it from working by having the MPS on one root port different
than the MPS on another.  To work around this, simply make the system
wide MPS the smallest possible value (128B).
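
As a rough illustration of the two choices described above, the sketch
below works on the PCI Express Max_Payload_Size encoding, where a code
of n means 128 << n bytes. The enum and function names are hypothetical
and are not the identifiers used by the patch.

```c
#include <assert.h>

/* Hypothetical mode names for this sketch only. */
enum sketch_mps_mode {
	SKETCH_MPS_BIOS_DEFAULT,	/* leave whatever the BIOS configured */
	SKETCH_MPS_PEER2PEER		/* force the smallest value everywhere */
};

/* Convert a Max_Payload_Size code (0..5) to bytes: 128, 256, ... 4096. */
unsigned int sketch_mps_code_to_bytes(unsigned int code)
{
	assert(code <= 5);
	return 128u << code;
}

/*
 * Pick the code to program. In peer to peer mode every device gets
 * code 0 (128B) so that no two root ports can disagree.
 */
unsigned int sketch_choose_mps_code(enum sketch_mps_mode mode,
				    unsigned int bios_code)
{
	assert(bios_code <= 5);
	return mode == SKETCH_MPS_PEER2PEER ? 0 : bios_code;
}
```

With a BIOS-configured code of 2 (512B), the default mode keeps 512B
while the peer to peer mode drops to 128B.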

Signed-off-by: Jon Mason <mason@myri.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-04 09:52:28 -07:00
Peter Zijlstra
f0f1d32f93 llist: Remove cpu_relax() usage in cmpxchg loops
Initial benchmarks show they're a net loss:

 $ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance > $i; done
 $ echo 4096 32000 64 128 > /proc/sys/kernel/sem
 $ ./sembench -t 2048 -w 1900 -o 0

Pre:

 run time 30 seconds 778936 worker burns per second
 run time 30 seconds 912190 worker burns per second
 run time 30 seconds 817506 worker burns per second
 run time 30 seconds 830870 worker burns per second
 run time 30 seconds 845056 worker burns per second

Post:

 run time 30 seconds 905920 worker burns per second
 run time 30 seconds 849046 worker burns per second
 run time 30 seconds 886286 worker burns per second
 run time 30 seconds 822320 worker burns per second
 run time 30 seconds 900283 worker burns per second

So about 4% faster. (!)

cpu_relax() stalls the pipeline, therefore, when used in a tight loop
it has the following benefits:

 - allows SMT siblings to have a go;
 - reduces pressure on the CPU interconnect.

However, cmpxchg loops are unfair and thus have unbounded completion
time, therefore we should avoid getting in such heavily contended
situations where the above benefits make any difference.

A typical cmpxchg loop should not go round more than a handful of
times at worst, therefore adding extra delays just slows things down.

Since the llist primitives are new, there aren't any bad users yet,
and we should avoid growing them. Heavily contended sites should
generally be better off using the ticket locks for serialization since
they provide bounded completion times (fifo-fair over the cpus).

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1315836358.26517.43.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-04 12:44:03 +02:00
Peter Zijlstra
fa14ff4acc sched: Convert to struct llist
Use the generic llist primitives.

We had a private lockless list implementation in the scheduler in the wake-list
code, now that we have a generic llist implementation that provides all required
operations, switch to it.

This patch is not expected to change any behavior.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1315836353.26517.42.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-04 12:43:58 +02:00
Peter Zijlstra
924f8f5af3 llist: Add llist_next()
So we don't have to expose the struct llist_node member.

Cc: Huang Ying <ying.huang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1315836348.26517.41.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-04 12:43:53 +02:00
Huang Ying
38aaf8090d irq_work: Use llist in the struct irq_work logic
Use llist in irq_work instead of the lock-less linked list
implementation in irq_work to avoid the code duplication.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1315461646-1379-6-git-send-email-ying.huang@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-04 12:43:49 +02:00
Huang Ying
781f7fd916 llist: Return whether list is empty before adding in llist_add()
Extend the llist_add*() functions to return a success indicator, this
allows us in the scheduler code to send an IPI if the queue was empty.

( There's no effect on existing users, because the llist_add_xxx() functions
  are inline, thus this will be optimized out by the compiler if not used
  by callers. )
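
A minimal user-space sketch of the idea, written with C11 atomics
rather than the kernel's cmpxchg(). The sketch_* names are hypothetical
and this is not the kernel's llist implementation; it only shows how
the add operation can report whether the list was empty beforehand.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_node {
	struct sketch_node *next;
};

struct sketch_list {
	_Atomic(struct sketch_node *) first;
};

void sketch_list_init(struct sketch_list *head)
{
	atomic_init(&head->first, NULL);
}

/*
 * Push n at the front. Returns true if the list was empty before the
 * push, which is the point where a caller might decide to send an IPI.
 */
bool sketch_list_add(struct sketch_list *head, struct sketch_node *n)
{
	struct sketch_node *old = atomic_load(&head->first);

	assert(n != NULL);
	do {
		/* On failure 'old' is refreshed with the current first node. */
		n->next = old;
	} while (!atomic_compare_exchange_weak(&head->first, &old, n));

	return old == NULL;
}
```

A caller that ignores the return value pays nothing extra once the
function is inlined, which matches the note above about existing users.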

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1315461646-1379-5-git-send-email-ying.huang@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-04 12:43:44 +02:00
Huang Ying
a3127336b7 llist: Move cpu_relax() to after the cmpxchg()
If in llist_add()/etc. functions the first cmpxchg() call succeeds, it is
not necessary to use cpu_relax() before the cmpxchg(). So cpu_relax() in
a busy loop involving cmpxchg() should go after cmpxchg() instead of before
that.

This patch fixes this for all involved llist functions.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1315461646-1379-4-git-send-email-ying.huang@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-04 12:43:39 +02:00
Ingo Molnar
2c30245c65 llist: Remove the platform-dependent NMI checks
Remove the in_nmi() checks spread around the code. in_nmi() is not available
on every architecture and it's a pretty obscure and ugly check in any case.

Cc: Huang Ying <ying.huang@intel.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1315461646-1379-3-git-send-email-ying.huang@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-04 12:43:11 +02:00
Huang Ying
1230db8e15 llist: Make some llist functions inline
Because llist code will be used in performance critical scheduler
code path, make llist_add() and llist_del_all() inline to avoid
function calling overhead and related 'glue' overhead.
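
For illustration, a header-style sketch of an inlined "delete all"
operation using C11 atomics. The sketch_* names are hypothetical and
this is not the kernel's llist code; it only shows why such a helper is
a natural candidate for inlining, since the whole body is one atomic
exchange.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct sketch_node {
	struct sketch_node *next;
};

struct sketch_list {
	_Atomic(struct sketch_node *) first;
};

/*
 * Detach the whole chain in one step and return its old first node.
 * The caller then owns the chain and can walk it without atomics.
 */
static inline struct sketch_node *sketch_del_all(struct sketch_list *head)
{
	assert(head != NULL);
	return atomic_exchange(&head->first, (struct sketch_node *)NULL);
}
```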

Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1315461646-1379-2-git-send-email-ying.huang@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-04 11:30:53 +02:00
Ingo Molnar
22f92bacbe Merge branch 'linus' into sched/core
Merge reason: pick up the latest fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-04 11:09:08 +02:00
Sangwook Lee
e209c5a7ed net:rfkill: add a gpio setup function into GPIO rfkill
Add a gpio setup function which gives a chance to set up
platform specific configuration such as pin multiplexing,
input/output direction at the runtime or booting time.

Signed-off-by: Sangwook Lee <sangwook.lee@linaro.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-10-03 15:19:19 -04:00
Vasily Averin
349d2895cc ipv4: NET_IPV4_ROUTE_GC_INTERVAL removal
removing obsoleted sysctl,
ip_rt_gc_interval variable no longer used since 2.6.38

Signed-off-by: Vasily Averin <vvs@sw.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-03 14:13:01 -04:00
Jiří Župka
96c131842a Repair wrong named definition aligned_u64
This fixes a problem with compiling a userspace library (libnl).

Signed-off-by: Jiří Župka <jzupka@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-03 14:03:48 -04:00
Eric Dumazet
b5c5693bb7 tcp: report ECN_SEEN in tcp_info
Allows ss command (iproute2) to display "ecnseen" if at least one packet
with ECT(0) or ECT(1) or ECN was received by this socket.

"ecn" means ECN was negotiated at session establishment (TCP level)

"ecnseen" means we received at least one packet with ECT fields set (IP
level)

ss -i
...
ESTAB      0      0   192.168.20.110:22  192.168.20.144:38016
ino:5950 sk:f178e400
	 mem:(r0,w0,f0,t0) ts sack ecn ecnseen bic wscale:7,8 rto:210
rtt:12.5/7.5 cwnd:10 send 9.3Mbps rcv_space:14480

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-03 14:01:21 -04:00
Marc Zyngier
1e7c5fd294 genirq: percpu: allow interrupt type to be set at enable time
As request_percpu_irq() doesn't allow for a percpu interrupt to have
its type configured (it is generally impossible to configure it on all
CPUs at once), add a 'type' argument to enable_percpu_irq().

This allows some low-level, board specific init code to be switched to
a generic API.

[ tglx: Added WARN_ON argument ]

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-10-03 15:35:27 +02:00
Marc Zyngier
31d9d9b6d8 genirq: Add support for per-cpu dev_id interrupts
The ARM GIC interrupt controller offers per CPU interrupts (PPIs),
which are usually used to connect local timers to each core. Each CPU
has its own private interface to the GIC, and only sees the PPIs that
are directly connected to it.

While these timers are separate devices and have a separate interrupt
line to a core, they all use the same IRQ number.

For these devices, request_irq() is not the right API as it assumes
that an IRQ number is visible by a number of CPUs (through the
affinity setting), but makes it very awkward to express that an IRQ
number can be handled by all CPUs, and yet be a different interrupt
line on each CPU, requiring a different dev_id cookie to be passed
back to the handler.

The *_percpu_irq() functions are designed to overcome these
limitations, by providing a per-cpu dev_id vector:

int request_percpu_irq(unsigned int irq, irq_handler_t handler,
		   const char *devname, void __percpu *percpu_dev_id);
void free_percpu_irq(unsigned int, void __percpu *);
int setup_percpu_irq(unsigned int irq, struct irqaction *new);
void remove_percpu_irq(unsigned int irq, struct irqaction *act);
void enable_percpu_irq(unsigned int irq);
void disable_percpu_irq(unsigned int irq);

The API has a number of limitations:
- no interrupt sharing
- no threading
- common handler across all the CPUs

Once the interrupt is requested using setup_percpu_irq() or
request_percpu_irq(), it must be enabled by each core that wishes its
local interrupt to be delivered.
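
The model described above (one handler, one dev_id cookie per CPU, and a
per-CPU enable) can be pictured with a small user-space simulation. All
sketch_* names and SKETCH_NR_CPUS are hypothetical; none of this is the
genirq implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 4

typedef int (*sketch_handler_t)(int irq, void *dev_id);

/* One common handler, one dev_id cookie and enable flag per CPU. */
struct sketch_percpu_irq {
	sketch_handler_t handler;
	void *dev_id[SKETCH_NR_CPUS];
	bool enabled[SKETCH_NR_CPUS];
};

/* Example handler: the cookie points at that CPU's tick counter. */
int sketch_count_tick(int irq, void *dev_id)
{
	(void)irq;
	++*(unsigned int *)dev_id;
	return 1;
}

/*
 * Deliver the interrupt on one CPU. Nothing happens unless that CPU
 * enabled its local interrupt; otherwise the handler gets that CPU's
 * own cookie. Returns the handler result, or 0 if not delivered.
 */
int sketch_deliver(struct sketch_percpu_irq *p, int irq, int cpu)
{
	assert(p != NULL && cpu >= 0 && cpu < SKETCH_NR_CPUS);
	if (!p->enabled[cpu])
		return 0;
	return p->handler(irq, p->dev_id[cpu]);
}
```

The same IRQ number is used for every CPU, yet each delivery touches
only the state of the CPU it arrived on.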

Based on an initial patch by Thomas Gleixner.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1316793788-14500-2-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-10-03 15:35:26 +02:00
MyungJoo Ham
ce26c5bb95 PM / devfreq: Add basic governors
Four cpufreq-like governors are provided as examples.

powersave: use the lowest frequency possible. The user (device) should
set the polling_ms as 0 because polling is useless for this governor.

performance: use the highest frequency possible. The user (device)
should set the polling_ms as 0 because polling is useless for this
governor.

userspace: use the user specified frequency stored at
devfreq.user_set_freq. With sysfs support in the following patch, a user
may set the value with the sysfs interface.

simple_ondemand: simplified version of cpufreq's ondemand governor.
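
The three "static" governors above reduce to a very small decision,
sketched here in user-space C. The enum, the function name and the
plain frequency array are assumptions of this sketch, not the devfreq
governor interface; simple_ondemand is left out because its load-based
logic is not described here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical governor identifiers for this sketch. */
enum sketch_governor {
	SKETCH_POWERSAVE,
	SKETCH_PERFORMANCE,
	SKETCH_USERSPACE
};

/*
 * Return the frequency a static governor would ask for, given the
 * device's available frequencies (in any order) and the value the
 * user stored for the userspace governor.
 */
unsigned long sketch_governor_target(enum sketch_governor gov,
				     const unsigned long *freqs, size_t n,
				     unsigned long user_set_freq)
{
	unsigned long lo, hi;
	size_t i;

	assert(freqs != NULL && n > 0);
	lo = hi = freqs[0];
	for (i = 1; i < n; i++) {
		if (freqs[i] < lo)
			lo = freqs[i];
		if (freqs[i] > hi)
			hi = freqs[i];
	}

	switch (gov) {
	case SKETCH_POWERSAVE:
		return lo;
	case SKETCH_PERFORMANCE:
		return hi;
	case SKETCH_USERSPACE:
		break;
	}
	return user_set_freq;
}
```

None of the three results depends on device load, which is why polling
is useless for them.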

When a user updates OPP entries (enable/disable/add), OPP framework
automatically notifies devfreq to update operating frequency
accordingly. Thus, devfreq users (device drivers) do not need to update
devfreq manually with OPP entry updates or set polling_ms for powersave,
performance, userspace, or any other "static" governors.

Note that these are given only as basic examples for governors and any
devices with devfreq may implement their own governors with the drivers
and use them.

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Mike Turquette <mturquette@ti.com>
Acked-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-02 00:19:34 +02:00
MyungJoo Ham
a3c98b8b2e PM: Introduce devfreq: generic DVFS framework with device-specific OPPs
With OPPs, a device may have multiple operable frequency and voltage
sets. However, there can be multiple possible operable sets and a system
will need to choose one from them. In order to reduce the power
consumption (by reducing frequency and voltage) without affecting the
performance too much, a Dynamic Voltage and Frequency Scaling (DVFS)
scheme may be used.

This patch introduces the DVFS capability to non-CPU devices with OPPs.
DVFS is a technique whereby the frequency and supplied voltage of a
device are adjusted on-the-fly. DVFS usually sets the frequency as low
as possible with given conditions (such as QoS assurance) and adjusts
voltage according to the chosen frequency in order to reduce power
consumption and heat dissipation.

The generic DVFS for devices, devfreq, may appear quite similar to
/drivers/cpufreq.  However, cpufreq does not allow multiple devices to
be registered and is not suitable for multiple heterogeneous
devices with different (but simple) governors.

Normally, DVFS mechanism controls frequency based on the demand for
the device, and then, chooses voltage based on the chosen frequency.
devfreq also controls the frequency based on the governor's frequency
recommendation and lets OPP pick the pair of frequency and voltage
based on the recommended frequency. Then, the chosen OPP is passed to
device driver's "target" callback.
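
The selection step can be sketched in user-space C as follows. The
struct, the function name and the "lowest enabled entry at or above the
recommendation" policy are assumptions of this sketch; the text above
only says that an OPP is picked based on the recommended frequency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for an operating point; not the kernel's struct opp. */
struct sketch_opp {
	unsigned long freq_hz;
	unsigned long volt_uv;
	bool enabled;
};

/*
 * Return the enabled entry with the lowest frequency that is still
 * >= wanted_hz, or NULL if no enabled entry is fast enough. The
 * returned pair is what would be handed to the driver's "target"
 * callback.
 */
const struct sketch_opp *sketch_pick_opp(const struct sketch_opp *table,
					 size_t n, unsigned long wanted_hz)
{
	const struct sketch_opp *best = NULL;
	size_t i;

	assert(table != NULL);
	for (i = 0; i < n; i++) {
		if (!table[i].enabled || table[i].freq_hz < wanted_hz)
			continue;
		if (best == NULL || table[i].freq_hz < best->freq_hz)
			best = &table[i];
	}
	return best;
}
```

Disabled entries are skipped, so disabling an OPP (for example from a
PM QoS notifier) changes which pair is chosen on the next update.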

When PM QoS is going to be used with the devfreq device, the device
driver should enable OPPs that are appropriate with the current PM QoS
requests. In order to do so, the device driver may call opp_enable and
opp_disable at the notifier callback of PM QoS so that PM QoS's
update_target() call enables the appropriate OPPs. Note that at least
one of OPPs should be enabled at any time; be careful when there is a
transition.

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Mike Turquette <mturquette@ti.com>
Acked-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-02 00:19:15 +02:00
Linus Torvalds
f72a209a3e Merge branches 'irq-urgent-for-linus', 'x86-urgent-for-linus' and 'sched-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip
* 'irq-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
  irq: Fix check for already initialized irq_domain in irq_domain_add
  irq: Add declaration of irq_domain_simple_ops to irqdomain.h

* 'x86-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
  x86/rtc: Don't recursively acquire rtc_lock

* 'sched-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
  posix-cpu-timers: Cure SMP wobbles
  sched: Fix up wchan borkage
  sched/rt: Migrate equal priority tasks to available CPUs
2011-10-01 08:37:25 -07:00
MyungJoo Ham
03ca370fbf PM / OPP: Add OPP availability change notifier.
The patch makes it possible to register a notifier_block for an OPP-device in order
to get notified for any changes in the availability of OPPs of the
device. For example, if a new OPP is inserted or enable/disable status
of an OPP is changed, the notifier is executed.

This enables the usage of opp_add, opp_enable, and opp_disable to
directly take effect with any connected entities such as cpufreq or
devfreq.

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Mike Turquette <mturquette@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-09-30 22:35:12 +02:00
Arik Nemtsov
07ba55d7f1 nl80211/mac80211: allow adding TDLS peers as stations
When adding a TDLS peer STA, mark it with a new flag in both nl80211 and
mac80211. Before adding a peer, make sure the wiphy supports TDLS and
our operating mode is appropriate (managed).

In addition, make sure all peers are removed on disassociation.

A TDLS peer is first added just before link setup is initiated. In later
setup stages we have more info about peer supported rates, capabilities,
etc. This info is reported via nl80211_set_station().

Signed-off-by: Arik Nemtsov <arik@wizery.com>
Cc: Kalyan C Gaddam <chakkal@iit.edu>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-09-30 15:57:08 -04:00
Arik Nemtsov
dfe018bf99 mac80211: handle TDLS high-level commands and frames
Register and implement the TDLS cfg80211 callback functions.

Internally prepare and send TDLS management frames. We incorporate
local STA capabilities and supported rates with extra IEs given by
usermode. The resulting packet is either encapsulated in a data frame,
or assembled as an action frame. It is transmitted either directly or
through the AP, as mandated by the TDLS specification.

Declare support for the TDLS external setup wiphy capability. This
tells usermode to handle link setup and discovery on its own, and use the
kernel driver for sending TDLS mgmt packets.

Signed-off-by: Arik Nemtsov <arik@wizery.com>
Cc: Kalyan C Gaddam <chakkal@iit.edu>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-09-30 15:57:07 -04:00
Arik Nemtsov
109086ce0b nl80211: support sending TDLS commands/frames
Add support for sending high-level TDLS commands and TDLS frames via
NL80211_CMD_TDLS_OPER and NL80211_CMD_TDLS_MGMT, respectively. Add
appropriate cfg80211 callbacks for lower level drivers.

Add wiphy capability flags for TDLS support and advertise them via
nl80211.

Signed-off-by: Arik Nemtsov <arik@wizery.com>
Cc: Kalyan C Gaddam <chakkal@iit.edu>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-09-30 15:57:05 -04:00
John W. Linville
8e00f5fbb4 Merge branch 'master' of git://git.infradead.org/users/linville/wireless-next into for-davem
Conflicts:
	drivers/net/wireless/iwlwifi/iwl-pci.c
	drivers/net/wireless/wl12xx/main.c
2011-09-30 14:52:29 -04:00
Dimitris Papastamos
6eb0f5e015 regmap: Implement regcache_cache_bypass helper function
Ensure we've got a function so users can enable/disable the
cache bypass option.

Signed-off-by: Dimitris Papastamos <dp@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2011-09-30 13:57:47 +01:00
Peter Zijlstra
d670ec1317 posix-cpu-timers: Cure SMP wobbles
David reported:

  Attached below is a watered-down version of rt/tst-cpuclock2.c from
  GLIBC.  Just build it with "gcc -o test test.c -lpthread -lrt" or
  similar.

  Run it several times, and you will see cases where the main thread
  will measure a process clock difference before and after the nanosleep
  which is smaller than the cpu-burner thread's individual thread clock
  difference.  This doesn't make any sense since the cpu-burner thread
  is part of the top-level process's thread group.

  I've reproduced this on both x86-64 and sparc64 (using both 32-bit and
  64-bit binaries).

  For example:

  [davem@boricha build-x86_64-linux]$ ./test
  process: before(0.001221967) after(0.498624371) diff(497402404)
  thread:  before(0.000081692) after(0.498316431) diff(498234739)
  self:    before(0.001223521) after(0.001240219) diff(16698)
  [davem@boricha build-x86_64-linux]$ 

  The diff of 'process' should always be >= the diff of 'thread'.

  I make sure to wrap the 'thread' clock measurements the most tightly
  around the nanosleep() call, and that the 'process' clock measurements
  are the outer-most ones.

  ---
  #include <unistd.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>
  #include <fcntl.h>
  #include <string.h>
  #include <errno.h>
  #include <pthread.h>

  static pthread_barrier_t barrier;

  static void *chew_cpu(void *arg)
  {
	  pthread_barrier_wait(&barrier);
	  while (1)
		  __asm__ __volatile__("" : : : "memory");
	  return NULL;
  }

  int main(void)
  {
	  clockid_t process_clock, my_thread_clock, th_clock;
	  struct timespec process_before, process_after;
	  struct timespec me_before, me_after;
	  struct timespec th_before, th_after;
	  struct timespec sleeptime;
	  unsigned long diff;
	  pthread_t th;
	  int err;

	  err = clock_getcpuclockid(0, &process_clock);
	  if (err)
		  return 1;

	  err = pthread_getcpuclockid(pthread_self(), &my_thread_clock);
	  if (err)
		  return 1;

	  pthread_barrier_init(&barrier, NULL, 2);
	  err = pthread_create(&th, NULL, chew_cpu, NULL);
	  if (err)
		  return 1;

	  err = pthread_getcpuclockid(th, &th_clock);
	  if (err)
		  return 1;

	  pthread_barrier_wait(&barrier);

	  err = clock_gettime(process_clock, &process_before);
	  if (err)
		  return 1;

	  err = clock_gettime(my_thread_clock, &me_before);
	  if (err)
		  return 1;

	  err = clock_gettime(th_clock, &th_before);
	  if (err)
		  return 1;

	  sleeptime.tv_sec = 0;
	  sleeptime.tv_nsec = 500000000;
	  nanosleep(&sleeptime, NULL);

	  err = clock_gettime(th_clock, &th_after);
	  if (err)
		  return 1;

	  err = clock_gettime(my_thread_clock, &me_after);
	  if (err)
		  return 1;

	  err = clock_gettime(process_clock, &process_after);
	  if (err)
		  return 1;

	  diff = process_after.tv_nsec - process_before.tv_nsec;
	  printf("process: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
		 process_before.tv_sec, process_before.tv_nsec,
		 process_after.tv_sec, process_after.tv_nsec, diff);
	  diff = th_after.tv_nsec - th_before.tv_nsec;
	  printf("thread:  before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
		 th_before.tv_sec, th_before.tv_nsec,
		 th_after.tv_sec, th_after.tv_nsec, diff);
	  diff = me_after.tv_nsec - me_before.tv_nsec;
	  printf("self:    before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
		 me_before.tv_sec, me_before.tv_nsec,
		 me_after.tv_sec, me_after.tv_nsec, diff);

	  return 0;
  }

This is due to us using p->se.sum_exec_runtime in
thread_group_cputime() where we iterate the thread group and sum all
data. This does not take time since the last schedule operation (tick
or otherwise) into account. We can cure this by using
task_sched_runtime() at the cost of having to take locks.

This also means we can (and must) do away with
thread_group_sched_runtime() since the modified thread_group_cputime()
is now more accurate and would deadlock when called from
thread_group_sched_runtime().

Aside from that, it makes the function safe on 32 bit systems. The old
code added t->se.sum_exec_runtime unprotected. sum_exec_runtime is a
64bit value and could be changed on another cpu at the same time.
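
A side note on the reproducer quoted above: it subtracts only tv_nsec, which is enough for the 0.5 s window shown but would give a wrong result across a second boundary. A minimal sketch of a fuller comparison follows; timespec_diff_ns() and process_covers_thread() are hypothetical helper names, not part of the original test.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: (after - before) in nanoseconds, taking both
 * tv_sec and tv_nsec into account. */
static int64_t timespec_diff_ns(const struct timespec *before,
				const struct timespec *after)
{
	int64_t sec = (int64_t)after->tv_sec - (int64_t)before->tv_sec;
	int64_t nsec = (int64_t)after->tv_nsec - (int64_t)before->tv_nsec;

	return sec * INT64_C(1000000000) + nsec;
}

/* The invariant from the report: the process clock covers every thread
 * in the group, so its delta must not be smaller than one thread's.
 * Returns 1 if the invariant holds, 0 if it is violated. */
static int process_covers_thread(int64_t process_diff_ns,
				 int64_t thread_diff_ns)
{
	assert(process_diff_ns >= 0 && thread_diff_ns >= 0);
	return process_diff_ns >= thread_diff_ns;
}
```

Fed with the diffs from the sample run (497402404 for the process, 498234739 for the thread), process_covers_thread() returns 0, which is the wobble described in the report.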

Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/1314874459.7945.22.camel@twins
Tested-by: David Miller <davem@davemloft.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-09-30 14:07:06 +02:00
Serge Hallyn
d178bc3a70 user namespace: usb: make usb urbs user namespace aware (v2)
Add to the dev_state and alloc_async structures the user namespace
corresponding to the uid and euid.  Pass these to kill_pid_info_as_uid(),
which can then implement a proper, user-namespace-aware uid check.

Changelog:
Sep 20: Per Oleg's suggestion: Instead of caching and passing user namespace,
	uid, and euid each separately, pass a struct cred.
Sep 26: Address Alan Stern's comments: don't define a struct cred at
	usbdev_open(), and take and put a cred at async_completed() to
	ensure it lasts for the duration of kill_pid_info_as_cred().

Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-09-29 13:13:08 -07:00
Paul E. McKenney
82e78d80fc rcu: Simplify unboosting checks
Commit 7765be (Fix RCU_BOOST race handling current->rcu_read_unlock_special)
introduced a new ->rcu_boosted field in the task structure.  This is
redundant because the existing ->rcu_boost_mutex will be non-NULL at
any time that ->rcu_boosted is nonzero.  Therefore, this commit removes
->rcu_boosted and tests ->rcu_boost_mutex instead.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-09-28 21:38:39 -07:00
Paul E. McKenney
6206ab9bab rcu: Move __rcu_read_unlock()'s barrier() within if-statement
We only need to constrain the compiler if we are actually exiting
the top-level RCU read-side critical section.  This commit therefore
moves the first barrier() call in __rcu_read_unlock() to inside the
"if" statement, thus avoiding needless register flushes for inner
rcu_read_unlock() calls.
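
The idea can be sketched in userspace as follows. This is a simplified analog and not the kernel function: the sketch_* names and the plain nesting counter are assumptions made for illustration, and the real code does more work on the outermost exit.

```c
#include <assert.h>

/* Compiler-only barrier (GCC/Clang syntax): emits no instructions but
 * stops the compiler from moving memory accesses across it. */
#define compiler_barrier() __asm__ __volatile__("" : : : "memory")

/* Hypothetical nesting depth of read-side critical sections. */
static int read_lock_nesting;

static void sketch_read_lock(void)
{
	++read_lock_nesting;
	compiler_barrier();	/* critical section stays after the increment */
}

static void sketch_read_unlock(void)
{
	assert(read_lock_nesting > 0);
	if (read_lock_nesting != 1) {
		/* Inner unlock: still inside the outer critical section,
		 * so the compiler need not be constrained here. */
		--read_lock_nesting;
	} else {
		/* Outermost unlock: keep the critical section's accesses
		 * ahead of the exit. */
		compiler_barrier();
		read_lock_nesting = 0;
	}
}
```

With the barrier inside the else branch, only the unlock that actually leaves the outermost critical section pays for it.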

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-09-28 21:38:35 -07:00
Paul E. McKenney
6846c0c540 rcu: Improve rcu_assign_pointer() and RCU_INIT_POINTER() documentation
The differences between rcu_assign_pointer() and RCU_INIT_POINTER() are
subtle, and it is easy to use the cheaper RCU_INIT_POINTER() when
the more-expensive rcu_assign_pointer() should have been used instead.
The consequences of this mistake are quite severe.

This commit therefore carefully lays out the situations in which it is
permissible to use RCU_INIT_POINTER().
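
As a rough userspace analogy, using C11 atomics in place of the kernel primitives: the ordered store below plays the role of rcu_assign_pointer() and the unordered one the role of RCU_INIT_POINTER(). The struct and function names are hypothetical, and acquire ordering on the reader side is a simplification chosen for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct config {
	int threshold;
};

/* Pointer shared with readers; stands in for an RCU-protected pointer. */
static _Atomic(struct config *) shared_cfg;

/* Analog of rcu_assign_pointer(): release ordering makes the
 * initialisation of *cfg visible before the pointer itself. */
static void publish_config(struct config *cfg)
{
	atomic_store_explicit(&shared_cfg, cfg, memory_order_release);
}

/* Analog of RCU_INIT_POINTER(): no ordering. Acceptable here only
 * because NULL has no pointed-to contents that a reader could observe
 * half-initialised. */
static void clear_config(void)
{
	atomic_store_explicit(&shared_cfg, NULL, memory_order_relaxed);
}

/* Reader side: load the pointer before touching what it points to. */
static struct config *read_config(void)
{
	return atomic_load_explicit(&shared_cfg, memory_order_acquire);
}
```

Using the unordered store to publish a freshly initialised structure would be the mistake the commit message warns about: a concurrent reader could see the new pointer before the structure's contents.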

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-09-28 21:38:34 -07:00
Eric Dumazet
d322f45cee rcu: Make rcu_assign_pointer() unconditionally insert a memory barrier
Recent changes to gcc give warning messages on rcu_assign_pointer()'s
checks that allow it to determine when it is OK to omit the memory
barrier.  Stephen Hemminger tried a number of gcc tricks to silence
this warning, but #pragmas and CPP macros do not work together in the
way that would be required to make this work.

However, we now have RCU_INIT_POINTER(), which already omits this
memory barrier, and which therefore may be used when assigning NULL to
an RCU-protected pointer that is accessible to readers.  This commit
therefore makes rcu_assign_pointer() unconditionally emit the memory
barrier.

Reported-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-09-28 21:38:33 -07:00
Shi, Alex
fc0763f53e nohz: Remove nohz_cpu_mask
RCU no longer uses this global variable, nor does anyone else.  This
commit therefore removes this variable.  This reduces memory footprint
and also removes some atomic instructions and memory barriers from
the dyntick-idle path.

Signed-off-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-09-28 21:38:29 -07:00
Paul E. McKenney
22507ed9b9 rcu: Remove unused and redundant interfaces
The rcu_dereference_bh_protected() and rcu_dereference_sched_protected()
macros are synonyms for rcu_dereference_protected() and are not used
anywhere in mainline.  This commit therefore removes them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-09-28 21:38:26 -07:00