Commit Graph

45246 Commits

Author SHA1 Message Date
Rob Herring
6d274309d0 irq: support domains with non-zero hwirq base
Interrupt controllers can have non-zero starting value for h/w irq numbers.
Adding support in irq_domain allows the domain hwirq numbering to match
the interrupt controllers' numbering.

As this makes looping over irqs for a domain more complicated, add loop
iterators to iterate over all hwirqs and irqs for a domain.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Reviewed-by: Jamie Iles <jamie@jamieiles.com>
Tested-by: Thomas Abraham <thomas.abraham@linaro.org>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
2011-10-31 14:03:23 +01:00
Rob Herring
c71a54b082 of/irq: introduce of_irq_init
of_irq_init will scan the devicetree for matching interrupt controller
nodes. Then it calls an initialization function for each found controller
in the proper order with parent nodes initialized before child nodes.

Based on initial pseudo code from Grant Likely.

Changes in v4:
- Drop unnecessary empty list check
- Be more verbose on errors
- Simplify "if (!desc) WARN_ON(1)" to "if (WARN_ON(!desc))"

Changes in v3:
- add missing kfree's found by Jamie
- Implement Grant's comments to simplify the init loop
- fix function comments

Changes in v2:
- Complete re-write of list searching code from Grant Likely

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Reviewed-by: Jamie Iles <jamie@jamieiles.com>
Tested-by: Thomas Abraham <thomas.abraham@linaro.org>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
2011-10-31 14:03:22 +01:00
Rob Herring
3a82543642 Merge remote-tracking branch 'rmk/devel-stable' into HEAD 2011-10-24 14:02:37 -05:00
Russell King
34471a9168 Merge branch 'ppi-irq-core-for-rmk' of git://github.com/mzyngier/arm-platforms into devel-stable 2011-10-23 14:42:30 +01:00
Linus Torvalds
8a04b45367 Merge git://github.com/davem330/net
* git://github.com/davem330/net:
  pch_gbe: Fixed the issue on which a network freezes
  pch_gbe: Fixed the issue on which PC was frozen when link was downed.
  make PACKET_STATISTICS getsockopt report consistently between ring and non-ring
  net: xen-netback: correctly restart Tx after a VM restore/migrate
  bonding: properly stop queuing work when requested
  can bcm: fix incomplete tx_setup fix
  RDSRDMA: Fix cleanup of rds_iw_mr_pool
  net: Documentation: Fix type of variables
  ibmveth: Fix oops on request_irq failure
  ipv6: nullify ipv6_ac_list and ipv6_fl_list when creating new socket
  cxgb4: Fix EEH on IBM P7IOC
  can bcm: fix tx_setup off-by-one errors
  MAINTAINERS: tehuti: Alexander Indenbaum's address bounces
  dp83640: reduce driver noise
  ptp: fix L2 event message recognition
2011-10-04 10:37:06 -07:00
Jon Mason
5f39e6705f PCI: Disable MPS configuration by default
Add the ability to disable PCI-E MPS turning and using the BIOS
configured MPS defaults.  Due to the number of issues recently
discovered on some x86 chipsets, make this the default behavior.

Also, add the option for peer to peer DMA MPS configuration.  Peer to
peer DMA is outside the scope of this patch, but MPS configuration could
prevent it from working by having the MPS on one root port different
than the MPS on another.  To work around this, simply make the system
wide MPS the smallest possible value (128B).

Signed-off-by: Jon Mason <mason@myri.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-04 09:52:28 -07:00
Marc Zyngier
1e7c5fd294 genirq: percpu: allow interrupt type to be set at enable time
As request_percpu_irq() doesn't allow for a percpu interrupt to have
its type configured (it is generally impossible to configure it on all
CPUs at once), add a 'type' argument to enable_percpu_irq().

This allows some low-level, board specific init code to be switched to
a generic API.

[ tglx: Added WARN_ON argument ]

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-10-03 15:35:27 +02:00
Marc Zyngier
31d9d9b6d8 genirq: Add support for per-cpu dev_id interrupts
The ARM GIC interrupt controller offers per CPU interrupts (PPIs),
which are usually used to connect local timers to each core. Each CPU
has its own private interface to the GIC, and only sees the PPIs that
are directly connect to it.

While these timers are separate devices and have a separate interrupt
line to a core, they all use the same IRQ number.

For these devices, request_irq() is not the right API as it assumes
that an IRQ number is visible by a number of CPUs (through the
affinity setting), but makes it very awkward to express that an IRQ
number can be handled by all CPUs, and yet be a different interrupt
line on each CPU, requiring a different dev_id cookie to be passed
back to the handler.

The *_percpu_irq() functions is designed to overcome these
limitations, by providing a per-cpu dev_id vector:

int request_percpu_irq(unsigned int irq, irq_handler_t handler,
		   const char *devname, void __percpu *percpu_dev_id);
void free_percpu_irq(unsigned int, void __percpu *);
int setup_percpu_irq(unsigned int irq, struct irqaction *new);
void remove_percpu_irq(unsigned int irq, struct irqaction *act);
void enable_percpu_irq(unsigned int irq);
void disable_percpu_irq(unsigned int irq);

The API has a number of limitations:
- no interrupt sharing
- no threading
- common handler across all the CPUs

Once the interrupt is requested using setup_percpu_irq() or
request_percpu_irq(), it must be enabled by each core that wishes its
local interrupt to be delivered.

Based on an initial patch by Thomas Gleixner.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1316793788-14500-2-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-10-03 15:35:26 +02:00
Linus Torvalds
f72a209a3e Merge branches 'irq-urgent-for-linus', 'x86-urgent-for-linus' and 'sched-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip
* 'irq-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
  irq: Fix check for already initialized irq_domain in irq_domain_add
  irq: Add declaration of irq_domain_simple_ops to irqdomain.h

* 'x86-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
  x86/rtc: Don't recursively acquire rtc_lock

* 'sched-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
  posix-cpu-timers: Cure SMP wobbles
  sched: Fix up wchan borkage
  sched/rt: Migrate equal priority tasks to available CPUs
2011-10-01 08:37:25 -07:00
Peter Zijlstra
d670ec1317 posix-cpu-timers: Cure SMP wobbles
David reported:

  Attached below is a watered-down version of rt/tst-cpuclock2.c from
  GLIBC.  Just build it with "gcc -o test test.c -lpthread -lrt" or
  similar.

  Run it several times, and you will see cases where the main thread
  will measure a process clock difference before and after the nanosleep
  which is smaller than the cpu-burner thread's individual thread clock
  difference.  This doesn't make any sense since the cpu-burner thread
  is part of the top-level process's thread group.

  I've reproduced this on both x86-64 and sparc64 (using both 32-bit and
  64-bit binaries).

  For example:

  [davem@boricha build-x86_64-linux]$ ./test
  process: before(0.001221967) after(0.498624371) diff(497402404)
  thread:  before(0.000081692) after(0.498316431) diff(498234739)
  self:    before(0.001223521) after(0.001240219) diff(16698)
  [davem@boricha build-x86_64-linux]$ 

  The diff of 'process' should always be >= the diff of 'thread'.

  I make sure to wrap the 'thread' clock measurements the most tightly
  around the nanosleep() call, and that the 'process' clock measurements
  are the outer-most ones.

  ---
  #include <unistd.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>
  #include <fcntl.h>
  #include <string.h>
  #include <errno.h>
  #include <pthread.h>

  static pthread_barrier_t barrier;

  static void *chew_cpu(void *arg)
  {
	  pthread_barrier_wait(&barrier);
	  while (1)
		  __asm__ __volatile__("" : : : "memory");
	  return NULL;
  }

  int main(void)
  {
	  clockid_t process_clock, my_thread_clock, th_clock;
	  struct timespec process_before, process_after;
	  struct timespec me_before, me_after;
	  struct timespec th_before, th_after;
	  struct timespec sleeptime;
	  unsigned long diff;
	  pthread_t th;
	  int err;

	  err = clock_getcpuclockid(0, &process_clock);
	  if (err)
		  return 1;

	  err = pthread_getcpuclockid(pthread_self(), &my_thread_clock);
	  if (err)
		  return 1;

	  pthread_barrier_init(&barrier, NULL, 2);
	  err = pthread_create(&th, NULL, chew_cpu, NULL);
	  if (err)
		  return 1;

	  err = pthread_getcpuclockid(th, &th_clock);
	  if (err)
		  return 1;

	  pthread_barrier_wait(&barrier);

	  err = clock_gettime(process_clock, &process_before);
	  if (err)
		  return 1;

	  err = clock_gettime(my_thread_clock, &me_before);
	  if (err)
		  return 1;

	  err = clock_gettime(th_clock, &th_before);
	  if (err)
		  return 1;

	  sleeptime.tv_sec = 0;
	  sleeptime.tv_nsec = 500000000;
	  nanosleep(&sleeptime, NULL);

	  err = clock_gettime(th_clock, &th_after);
	  if (err)
		  return 1;

	  err = clock_gettime(my_thread_clock, &me_after);
	  if (err)
		  return 1;

	  err = clock_gettime(process_clock, &process_after);
	  if (err)
		  return 1;

	  diff = process_after.tv_nsec - process_before.tv_nsec;
	  printf("process: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
		 process_before.tv_sec, process_before.tv_nsec,
		 process_after.tv_sec, process_after.tv_nsec, diff);
	  diff = th_after.tv_nsec - th_before.tv_nsec;
	  printf("thread:  before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
		 th_before.tv_sec, th_before.tv_nsec,
		 th_after.tv_sec, th_after.tv_nsec, diff);
	  diff = me_after.tv_nsec - me_before.tv_nsec;
	  printf("self:    before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
		 me_before.tv_sec, me_before.tv_nsec,
		 me_after.tv_sec, me_after.tv_nsec, diff);

	  return 0;
  }

This is due to us using p->se.sum_exec_runtime in
thread_group_cputime() where we iterate the thread group and sum all
data. This does not take time since the last schedule operation (tick
or otherwise) into account. We can cure this by using
task_sched_runtime() at the cost of having to take locks.

This also means we can (and must) do away with
thread_group_sched_runtime() since the modified thread_group_cputime()
is now more accurate and would deadlock when called from
thread_group_sched_runtime().

Aside of that it makes the function safe on 32 bit systems. The old
code added t->se.sum_exec_runtime unprotected. sum_exec_runtime is a
64bit value and could be changed on another cpu at the same time.

Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/1314874459.7945.22.camel@twins
Tested-by: David Miller <davem@davemloft.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-09-30 14:07:06 +02:00
Richard Cochran
f75159e993 ptp: fix L2 event message recognition
The IEEE 1588 standard defines two kinds of messages, event and general
messages. Event messages require time stamping, and general do not. When
using UDP transport, two separate ports are used for the two message
types.

The BPF designed to recognize event messages incorrectly classifies L2
general messages as event messages. This commit fixes the issue by
extending the filter to check the message type field for L2 PTP packets.
Event messages are be distinguished from general messages by testing
the "general" bit.

Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Cc: <stable@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-29 00:32:03 -04:00
Linus Torvalds
e689ec8057 Merge branch 'writeback-for-linus' of git://github.com/fengguang/linux
* 'writeback-for-linus' of git://github.com/fengguang/linux:
  writeback: show raw dirtied_when in trace writeback_single_inode
2011-09-28 08:01:05 -07:00
Linus Torvalds
b6c8069d35 vfs: remove LOOKUP_NO_AUTOMOUNT flag
That flag no longer makes sense, since we don't look up automount points
as eagerly any more.  Additionally, it turns out that the NO_AUTOMOUNT
handling was buggy to begin with: it would avoid automounting even for
cases where we really *needed* to do the automount handling, and could
return ENOENT for autofs entries that hadn't been instantiated yet.

With our new non-eager automount semantics, one discussion has been
about adding a AT_AUTOMOUNT flag to vfs_fstatat (and thus the
newfstatat() and fstatat64() system calls), but it's probably not worth
it: you can always force at least directory automounting by simply
adding the final '/' to the filename, which works for *all* of the stat
family system calls, old and new.

So AT_NO_AUTOMOUNT (and thus LOOKUP_NO_AUTOMOUNT) really were just a
result of our bad default behavior.

Acked-by: Ian Kent <raven@themaw.net>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-27 08:12:33 -07:00
Linus Torvalds
d94c177bee vfs pathname lookup: Add LOOKUP_AUTOMOUNT flag
Since we've now turned around and made LOOKUP_FOLLOW *not* force an
automount, we want to add the ability to force an automount event on
lookup even if we don't happen to have one of the other flags that force
it implicitly (LOOKUP_OPEN, LOOKUP_DIRECTORY, LOOKUP_PARENT..)

Most cases will never want to use this, since you'd normally want to
delay automounting as long as possible, which usually implies
LOOKUP_OPEN (when we open a file or directory, we really cannot avoid
the automount any more).

But Trond argued sufficiently forcefully that at a minimum bind mounting
a file and quotactl will want to force the automount lookup.  Some other
cases (like nfs_follow_remote_path()) could use it too, although
LOOKUP_DIRECTORY would work there as well.

This commit just adds the flag and logic, no users yet, though.  It also
doesn't actually touch the LOOKUP_NO_AUTOMOUNT flag that is related, and
was made irrelevant by the same change that made us not follow on
LOOKUP_FOLLOW.

Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Ian Kent <raven@themaw.net>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Greg KH <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-26 17:44:55 -07:00
Colin Cross
ab10023e00 cpu_pm: Add cpu power management notifiers
During some CPU power modes entered during idle, hotplug and
suspend, peripherals located in the CPU power domain, such as
the GIC, localtimers, and VFP, may be powered down.  Add a
notifier chain that allows drivers for those peripherals to
be notified before and after they may be reset.

Notified drivers can include VFP co-processor, interrupt controller
and it's PM extensions, local CPU timers context save/restore which
shouldn't be interrupted. Hence CPU PM event APIs  must be called
with interrupts disabled.

Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Tested-and-Acked-by: Shawn Guo <shawn.guo@linaro.org>
Tested-by: Kevin Hilman <khilman@ti.com>
Tested-by: Vishwanath BS <vishwanath.bs@ti.com>
2011-09-23 12:05:29 +05:30
Linus Torvalds
fae3f6f2ee Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
  [S390] kvm: extension capability for new address space layout
  [S390] kvm: fix address mode switching
2011-09-22 09:32:21 -07:00
Linus Torvalds
fed678dc8a Merge branch 'for-linus' of git://git.kernel.dk/linux-block
* 'for-linus' of git://git.kernel.dk/linux-block:
  floppy: use del_timer_sync() in init cleanup
  blk-cgroup: be able to remove the record of unplugged device
  block: Don't check QUEUE_FLAG_SAME_COMP in __blk_complete_request
  mm: Add comment explaining task state setting in bdi_forker_thread()
  mm: Cleanup clearing of BDI_pending bit in bdi_forker_thread()
  block: simplify force plug flush code a little bit
  block: change force plug flush call order
  block: Fix queue_flag update when rq_affinity goes from 2 to 1
  block: separate priority boosting from REQ_META
  block: remove READ_META and WRITE_META
  xen-blkback: fixed indentation and comments
  xen-blkback: Don't disconnect backend until state switched to XenbusStateClosed.
2011-09-21 13:20:21 -07:00
Christian Borntraeger
b6cf8788a3 [S390] kvm: extension capability for new address space layout
598841ca99 ([S390] use gmap address
spaces for kvm guest images) changed kvm on s390 to use a separate
address space for kvm guests. We can now put KVM guests anywhere
in the user address mode with a size up to 8PB - as long as the
memory is 1MB-aligned. This change was done without KVM extension
capability bit.
The change was added after 3.0, but we still have a chance to add
a feature bit before 3.1 (keeping the releases in a sane state).
We use number 71 to avoid collisions with other pending kvm patches
as requested by Alexander Graf.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Avi Kivity <avi@redhat.com>
Cc: Alexander Graf <agraf@suse.de>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2011-09-20 17:07:34 +02:00
Rob Herring
5bd078dda4 irq: Add declaration of irq_domain_simple_ops to irqdomain.h
irq_domain_simple_ops is exported, but is not declared in irqdomain.h,
so add it.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: marc.zyngier@arm.com
Cc: thomas.abraham@linaro.org
Cc: jamie@jamieiles.com
Cc: b-cousson@ti.com
Cc: shawn.guo@linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: devicetree-discuss@lists.ozlabs.org
Link: http://lkml.kernel.org/r/1316017900-19918-2-git-send-email-robherring2@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-09-20 12:16:22 +02:00
Linus Torvalds
1f0772fe27 Merge git://github.com/davem330/net
* git://github.com/davem330/net:
  tcp: fix validation of D-SACK
  tcp: fix build error if !CONFIG_SYN_COOKIES
2011-09-19 07:48:04 -07:00
Eric Dumazet
e05c82d366 tcp: fix build error if !CONFIG_SYN_COOKIES
commit 946cedccbd (tcp: Change possible SYN flooding messages)
added a build error if CONFIG_SYN_COOKIES=n

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-18 21:48:01 -04:00
Linus Torvalds
6bf3b0dc32 Merge branch 'for-linus' of git://git.infradead.org/users/sameo/mfd-2.6
* 'for-linus' of git://git.infradead.org/users/sameo/mfd-2.6:
  mfd: Fix omap-usb-host build failure
  mfd: Make omap-usb-host TLL mode work again
  mfd: Set MAX8997 irq pointer
  mfd: Fix initialisation of tps65910 interrupts
  mfd: Check for twl4030-madc NULL pointer
  mfd: Copy the device pointer to the twl4030-madc structure
  mfd: Rename wm8350 static gpio_set_debounce()
  mfd: Fix value of WM8994_CONFIGURE_GPIO
2011-09-18 18:18:55 -07:00
Linus Torvalds
b0e7031ac0 Merge git://github.com/davem330/net
* git://github.com/davem330/net: (62 commits)
  ipv6: don't use inetpeer to store metrics for routes.
  can: ti_hecc: include linux/io.h
  IRDA: Fix global type conflicts in net/irda/irsysctl.c v2
  net: Handle different key sizes between address families in flow cache
  net: Align AF-specific flowi structs to long
  ipv4: Fix fib_info->fib_metrics leak
  caif: fix a potential NULL dereference
  sctp: deal with multiple COOKIE_ECHO chunks
  ibmveth: Fix checksum offload failure handling
  ibmveth: Checksum offload is always disabled
  ibmveth: Fix issue with DMA mapping failure
  ibmveth: Fix DMA unmap error
  pch_gbe: support ML7831 IOH
  pch_gbe: added the process of FIFO over run error
  pch_gbe: fixed the issue which receives an unnecessary packet.
  sfc: Use 64-bit writes for TX push where possible
  Revert "sfc: Use write-combining to reduce TX latency" and follow-ups
  bnx2x: Fix ethtool advertisement
  bnx2x: Fix 578xx link LED
  bnx2x: Fix XMAC loopback test
  ...
2011-09-18 11:02:26 -07:00
dpward
aa1c366e4f net: Handle different key sizes between address families in flow cache
With the conversion of struct flowi to a union of AF-specific structs, some
operations on the flow cache need to account for the exact size of the key.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-16 17:47:28 -04:00
David Ward
728871bc05 net: Align AF-specific flowi structs to long
AF-specific flowi structs are now passed to flow_key_compare, which must
also be aligned to a long.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-16 17:45:07 -04:00
Max Matveev
d5ccd49660 sctp: deal with multiple COOKIE_ECHO chunks
Attempt to reduce the number of IP packets emitted in response to single
SCTP packet (2e3216cd) introduced a complication - if a packet contains
two COOKIE_ECHO chunks and nothing else then SCTP state machine corks the
socket while processing first COOKIE_ECHO and then loses the association
and forgets to uncork the socket. To deal with the issue add new SCTP
command which can be used to set association explictly. Use this new
command when processing second COOKIE_ECHO chunk to restore the context
for SCTP state machine.

Signed-off-by: Max Matveev <makc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-16 17:17:22 -04:00
David S. Miller
52b9aca7ae Merge branch 'master' of ../netdev/ 2011-09-16 01:09:02 -04:00
Michael S. Tsirkin
48c830120f net: copy userspace buffers on device forwarding
dev_forward_skb loops an skb back into host networking
stack which might hang on the memory indefinitely.
In particular, this can happen in macvtap in bridged mode.
Copy the userspace fragments to avoid blocking the
sender in that case.

As this patch makes skb_copy_ubufs extern now,
I also added some documentation and made it clear
the SKBTX_DEV_ZEROCOPY flag automatically instead
of doing it in all callers. This can be made into a separate
patch if people feel it's worth it.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-15 14:49:44 -04:00
Eric Dumazet
946cedccbd tcp: Change possible SYN flooding messages
"Possible SYN flooding on port xxxx " messages can fill logs on servers.

Change logic to log the message only once per listener, and add two new
SNMP counters to track :

TCPReqQFullDoCookies : number of times a SYNCOOKIE was replied to client

TCPReqQFullDrop : number of times a SYN request was dropped because
syncookies were not enabled.

Based on a prior patch from Tom Herbert, and suggestions from David.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-15 14:49:43 -04:00
Russell King
4f5b04800a drivers/gpio/gpio-generic.c: fix build errors
Building a kernel with hotplug disabled results in a link failure:

  `bgpio_remove' referenced in section `___ksymtab_gpl+bgpio_remove' of drivers/built-in.o: defined in discarded section `.devexit.text' of drivers/built-in.o

This is because of bgpio_remove() is exported.  It is illegal to export
symbols which are discarded either at link time or as part of an
init/exit section.

Fix this by dropping the __devexit attributation from bgpio_remove().
Also drop the __devinit attributation from bgpio_init().

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-14 18:09:38 -07:00
Johannes Weiner
185efc0f9a memcg: Revert "memcg: add memory.vmscan_stat"
Revert the post-3.0 commit 82f9d486e5 ("memcg: add
memory.vmscan_stat").

The implementation of per-memcg reclaim statistics violates how memcg
hierarchies usually behave: hierarchically.

The reclaim statistics are accounted to child memcgs and the parent
hitting the limit, but not to hierarchy levels in between.  Usually,
hierarchical statistics are perfectly recursive, with each level
representing the sum of itself and all its children.

Since this exports statistics to userspace, this may lead to confusion
and problems with changing things after the release, so revert it now,
we can try again later.

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Ying Han <yinghan@google.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-14 18:09:38 -07:00
Santosh Shilimkar
60f96b41f7 genirq: Add IRQCHIP_SKIP_SET_WAKE flag
Some irq chips need the irq_set_wake() functionality, but do not
require a irq_set_wake() callback. Instead of forcing an empty
callback to be implemented add a flag which notes this fact. Check for
the flag in set_irq_wake_real() and return success when set.

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
2011-09-12 09:52:49 +02:00
Randy Dunlap
bff747c58c regulator: fix kernel-doc warning in consumer.h
Fix kernel-doc warning about internal/private data by marking it
as "private:" so that kernel-doc will ignore it.

  Warning(include/linux/regulator/consumer.h:128): No description found for parameter 'ret'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-08 14:43:03 -07:00
Randy Dunlap
d2f152878d wireless: fix kernel-doc warning in net/cfg80211.h
Fix kernel-doc warning in net/cfg80211.h:

  Warning(include/net/cfg80211.h:1884): No description found for parameter 'registered'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-08 14:43:03 -07:00
Linus Torvalds
b0fb422281 Merge branch 'perf-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip
* 'perf-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
  x86, perf: Check that current->mm is alive before getting user callchain
  perf_event: Fix broken calc_timer_values()
  perf events: Fix slow and broken cgroup context switch code
2011-09-07 13:00:11 -07:00
Mark Brown
8efcc57ded mfd: Fix value of WM8994_CONFIGURE_GPIO
This needs to be an out of band value for the register and on this device
registers are 16 bit so we must shift left one to the 17th bit.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: stable@kernel.org
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-09-06 16:37:58 +02:00
Jim Garlick
51b8b4fb32 fs/9p: Use protocol-defined value for lock/getlock 'type' field.
Signed-off-by: Jim Garlick <garlick@llnl.gov>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2011-09-06 08:17:16 -05:00
Aneesh Kumar K.V
f88657ce3f fs/9p: Add OS dependent open flags in 9p protocol
Some of the flags are OS/arch dependent we add a 9p
protocol value which maps to asm-generic/fcntl.h values in Linux
Based on the original patch from Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2011-09-06 08:17:15 -05:00
Wu Fengguang
c8ad620638 writeback: show raw dirtied_when in trace writeback_single_inode
Save inode->dirtied_when in the raw trace output for reliable scripting,
and to also show in formatted output the relative age in seconds for
easy human reading.

CC: Jan Kara <jack@suse.cz>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-08-31 08:48:15 +08:00
Maciej Żenczykowski
ec0506dbe4 net: relax PKTINFO non local ipv6 udp xmit check
Allow transparent sockets to be less restrictive about
the source ip of ipv6 udp packets being sent.

Google-Bug-Id: 5018138
Signed-off-by: Maciej Żenczykowski <maze@google.com>
CC: "Erik Kline" <ek@google.com>
CC: "Lorenzo Colitti" <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-30 17:39:01 -04:00
Linus Torvalds
0f43dd546d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (42 commits)
  netpoll: fix incorrect access to skb data in __netpoll_rx
  cassini: init before use in cas_interruptN.
  can: ti_hecc: Fix uninitialized spinlock in probe
  can: ti_hecc: Fix unintialized variable
  net: sh_eth: fix the compile error
  net/phy: fix DP83865 phy interrupt handler
  sendmmsg/sendmsg: fix unsafe user pointer access
  ibmveth: Fix leak when recycling skb and hypervisor returns error
  arp: fix rcu lockdep splat in arp_process()
  bridge: fix a possible use after free
  bridge: Pseudo-header required for the checksum of ICMPv6
  mcast: Fix source address selection for multicast listener report
  MAINTAINERS: Update GIT trees for network development
  ath9k: Fix PS wrappers in ath9k_set_coverage_class
  carl9170: Fix mismatch in carl9170_op_set_key mutex lock-unlock
  wl12xx: add max_sched_scan_ssids value to the hw description
  wl12xx: Fix validation of pm_runtime_get_sync return value
  wl12xx: Remove obsolete testmode NVS push command
  bcma: add uevent to the bus, to autoload drivers
  ath9k_hw: Fix STA (AR9485) bringup issue due to incorrect MAC address
  ...
2011-08-29 13:38:29 -07:00
Stephane Eranian
a8d757ef07 perf events: Fix slow and broken cgroup context switch code
The current cgroup context switch code was incorrect leading
to bogus counts. Furthermore, as soon as there was an active
cgroup event on a CPU, the context switch cost on that CPU
would increase by a significant amount as demonstrated by a
simple ping/pong example:

 $ ./pong
 Both processes pinned to CPU1, running for 10s
 10684.51 ctxsw/s

Now start a cgroup perf stat:
 $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 100

$ ./pong
 Both processes pinned to CPU1, running for 10s
 6674.61 ctxsw/s

That's a 37% penalty.

Note that pong is not even in the monitored cgroup.

The results shown by perf stat are bogus:
 $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 100

 Performance counter stats for 'sleep 100':

 CPU1 <not counted> cycles   test
 CPU1 16,984,189,138 cycles  #    0.000 GHz

The second 'cycles' event should report a count @ CPU clock
(here 2.4GHz) as it is counting across all cgroups.

The patch below fixes the bogus accounting and bypasses any
cgroup switches in case the outgoing and incoming tasks are
in the same cgroup.

With this patch the same test now yields:
 $ ./pong
 Both processes pinned to CPU1, running for 10s
 10775.30 ctxsw/s

Start perf stat with cgroup:

 $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10

Run pong outside the cgroup:
 $ /pong
 Both processes pinned to CPU1, running for 10s
 10687.80 ctxsw/s

The penalty is now less than 2%.

And the results for perf stat are correct:

$ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10

 Performance counter stats for 'sleep 10':

 CPU1 <not counted> cycles test #    0.000 GHz
 CPU1 23,933,981,448 cycles      #    0.000 GHz

Now perf stat reports the correct counts for
for the non cgroup event.

If we run pong inside the cgroup, then we also get the
correct counts:

$ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10

 Performance counter stats for 'sleep 10':

 CPU1 22,297,726,205 cycles test #    0.000 GHz
 CPU1 23,933,981,448 cycles      #    0.000 GHz

      10.001457237 seconds time elapsed

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110825135803.GA4697@quad
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-08-29 12:28:33 +02:00
NeilBrown
f5b9409973 All Arch: remove linkage for sys_nfsservctl system call
The nfsservctl system call is now gone, so we should remove all
linkage for it.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-26 15:09:58 -07:00
Linus Torvalds
efe45ab1ee Merge branch 'tty-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6
* 'tty-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6:
  omap-serial: Allow IXON and IXOFF to be disabled.
  TTY: serial, document ignoring of uart->ops->startup error
  TTY: pty, fix pty counting
  8250: Fix race condition in serial8250_backup_timeout().
  serial/8250_pci: delete duplicate data definition
  8250_pci: add support for Rosewill RC-305 4x serial port card
  tty: Add "spi:" prefix for spi modalias
  atmel_serial: fix atmel_default_console_device
  serial: 8250_pnp: add Intermec CV60 touchscreen device
  drivers/serial/ucc_uart.c: Fix compiler warning
  pch_uart: Set PCIe bus number using probe parameter
  serial: samsung: Fix build error
2011-08-26 13:06:06 -07:00
Linus Torvalds
3ab47029d9 Merge branch 'driver-core-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6
* 'driver-core-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
  drivers:misc: ti-st: fix unexpected UART close
  drivers:misc: ti-st: free skb on firmware download
  drivers:misc: ti-st: wait for completion at fail
  drivers:misc: ti-st: reinit completion before send
  drivers:misc: ti-st: fail-safe on wrong pkt type
  drivers:misc: ti-st: reinit completion on ver read
  drivers:misc:ti-st: platform hooks for chip states
  drivers:misc: ti-st: avoid a misleading dbg msg
  base/devres.c: quiet sparse noise about context imbalance
  pti: add missing CONFIG_PCI dependency
  drivers/base/devtmpfs.c: correct annotation of `setup_done'
  driver core: fix kernel-doc warning in platform.c
  firmware: fix google/gsmi.c build warning
2011-08-26 13:05:09 -07:00
John W. Linville
e2e6be56df Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2011-08-26 10:33:51 -04:00
Dilan Lee
cc7993f643 backlight: add a callback 'notify_after' for backlight control
We need a callback to do some things after pwm_enable, pwm_disable
and pwm_config.

Signed-off-by: Dilan Lee <dilee@nvidia.com>
Reviewed-by: Robert Morell <rmorell@nvidia.com>
Reviewed-by: Arun Murthy <arun.murthy@stericsson.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-25 16:25:34 -07:00
Alexandre Bounine
284fb68d00 rapidio: fix use of non-compatible registers
Replace/remove use of RIO v.1.2 registers/bits that are not
forward-compatible with newer versions of RapidIO specification.

RapidIO specification v.1.3 removed Write Port CSR, Doorbell CSR,
Mailbox CSR and Mailbox and Doorbell bits of the PEF CAR.

Use of removed (since RIO v.1.3) register bits affects users of
currently available 1.3 and 2.x compliant devices who may use not so
recent kernel versions.

Removing checks for unsupported bits makes corresponding routines
compatible with all versions of RapidIO specification.  Therefore,
backporting makes stable kernel versions compliant with RIO v.1.3 and
later as well.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Cc: Chul Kim <chul.kim@idt.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-25 16:25:34 -07:00
Evgeniy Polyakov
a801876638 MAINTAINERS: Evgeniy has moved
Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-25 16:25:33 -07:00
Josh Boyer
e096d0c7e2 lockdep: Add helper function for dir vs file i_mutex annotation
Purely in-memory filesystems do not use the inode hash as the dcache
tells us if an entry already exists.  As a result, they do not call
unlock_new_inode, and thus directory inodes do not get put into a
different lockdep class for i_sem.

We need the different lockdep classes, because the locking order for
i_mutex is different for directory inodes and regular inodes.  Directory
inodes can do "readdir()", which takes i_mutex *before* possibly taking
mm->mmap_sem (due to a page fault while copying the directory entry to
user space).

In contrast, regular inodes can be mmap'ed, which takes mm->mmap_sem
before accessing i_mutex.

The two cases can never happen for the same inode, so no real deadlock
can occur, but without the different lockdep classes, lockdep cannot
understand that.  As a result, if CONFIG_DEBUG_LOCK_ALLOC is set, this
can lead to false positives from lockdep like below:

    find/645 is trying to acquire lock:
     (&mm->mmap_sem){++++++}, at: [<ffffffff81109514>] might_fault+0x5c/0xac

    but task is already holding lock:
     (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffff81149f34>]
    vfs_readdir+0x5b/0xb4

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&sb->s_type->i_mutex_key#15){+.+.+.}:
          [<ffffffff8108ac26>] lock_acquire+0xbf/0x103
          [<ffffffff814db822>] __mutex_lock_common+0x4c/0x361
          [<ffffffff814dbc46>] mutex_lock_nested+0x40/0x45
          [<ffffffff811daa87>] hugetlbfs_file_mmap+0x82/0x110
          [<ffffffff81111557>] mmap_region+0x258/0x432
          [<ffffffff811119dd>] do_mmap_pgoff+0x2ac/0x306
          [<ffffffff81111b4f>] sys_mmap_pgoff+0x118/0x16a
          [<ffffffff8100c858>] sys_mmap+0x22/0x24
          [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b

    -> #0 (&mm->mmap_sem){++++++}:
          [<ffffffff8108a4bc>] __lock_acquire+0xa1a/0xcf7
          [<ffffffff8108ac26>] lock_acquire+0xbf/0x103
          [<ffffffff81109541>] might_fault+0x89/0xac
          [<ffffffff81149cff>] filldir+0x6f/0xc7
          [<ffffffff811586ea>] dcache_readdir+0x67/0x205
          [<ffffffff81149f54>] vfs_readdir+0x7b/0xb4
          [<ffffffff8114a073>] sys_getdents+0x7e/0xd1
          [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b

This patch moves the directory vs file lockdep annotation into a helper
function that can be called by in-memory filesystems and has hugetlbfs
call it.

Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-25 10:50:18 -07:00