rt->rt_iif is only ever inspected on input routes, for example DCCP
uses this to populate a route lookup flow key when generating replies
to another packet.
Therefore, setting it to anything other than zero on output routes
makes no sense.
Signed-off-by: David S. Miller <davem@davemloft.net>
We burn a lot of useless cycles, cpu store buffer traffic, and
memory operations memset()'ing the on-stack flow used to perform
output route lookups in __ip_route_output_key().
Only the first half of the flow object members even matter for
output route lookups in this context, specifically:
FIB rules matching cares about:
dst, src, tos, iif, oif, mark
FIB trie lookup cares about:
dst
FIB semantic match cares about:
tos, scope, oif
Therefore only initialize these specific members and elide the
memset entirely.
On Niagara2 this kills about ~300 cycles from the output route
lookup path.
Likely, we can take things further, since all callers of output
route lookups essentially throw away the on-stack flow they use.
So they don't care if we use it as a scratch-pad to compute the
final flow key.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Pass down the correct node for a transparent hugepage allocation. Most
callers continue to use the current node, however the hugepaged daemon
now uses the previous node of the first to be collapsed page instead.
This ensures that khugepaged does not mess up local memory for an
existing process which uses local policy.
The choice of node is somewhat primitive currently: it just uses the
node of the first page in the pmd range. An alternative would be to
look at multiple pages and use the most popular node. I used the
simplest variant for now which should work well enough for the case of
all pages being on the same node.
[akpm@linux-foundation.org: coding-style fixes]
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This makes a difference for LOCAL policy, where the node cannot be
determined from the policy itself, but has to be gotten from the original
page.
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a alloc_page_vma_node that allows passing the "local" node in. Used
in a followon patch.
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently alloc_pages_vma() always uses the local node as policy node for
the LOCAL policy. Pass this node down as an argument instead.
No behaviour change from this patch, but will be needed for followons.
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add maintainer of Samsung Mobile machine support. Currently, Aquila,
Goni, Universal (C210), and Nuri board are supported.
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Joe Perches <joe@perches.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This driver causes hard lockups, when the active clock soure is jiffies.
The reason is that it loops with interrupts disabled waiting for a
timestamp to be reached by polling getnstimeofday(). Though with a
jiffies clocksource, when that code runs on the same CPU which is
responsible for updating jiffies, then we loop in circles for ever
simply because the timer interrupt cannot update jiffies. So both UP
and SMP can be affected.
There is no easy fix for that problem so make it depend on BROKEN for
now.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Gordeev <lasaine@lvk.cs.msu.su>
Cc: Rodolfo Giometti <giometti@linux.it>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The device table is required to load modules based on modaliases.
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Cc: Shubhrajyoti D <shubhrajyoti@ti.com>
Cc: Christoph Mair <christoph.mair@gmail.com>
Cc: Jonathan Cameron <jic23@cam.ac.uk>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix s3c_rtc_setaie() prototype to eliminate the following compile
warning:
drivers/rtc/rtc-s3c.c:383: warning: initialization from incompatible pointer type
(akpm: the rtc_class_ops.alarm_irq_enable() handler is being passed two
arguments where it expects just one, presumably with undesired effects)
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/vapier/blackfin:
Blackfin: iflush: update anomaly 05000491 workaround
Blackfin: outs[lwb]: make sure count is greater than 0
* 'sh-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
sh: Change __nosave_XXX symbols to long
sh: Flush executable pages in copy_user_highpage
sh: Ensure ST40-300 BogoMIPS value is consistent
sh: sh7750: Fix incompatible pointer type
sh: sh7750: move machtypes.h to include/generated
The "bad_page()" page allocator sanity check was reported recently (call
chain as follows):
bad_page+0x69/0x91
free_hot_cold_page+0x81/0x144
skb_release_data+0x5f/0x98
__kfree_skb+0x11/0x1a
tcp_ack+0x6a3/0x1868
tcp_rcv_established+0x7a6/0x8b9
tcp_v4_do_rcv+0x2a/0x2fa
tcp_v4_rcv+0x9a2/0x9f6
do_timer+0x2df/0x52c
ip_local_deliver+0x19d/0x263
ip_rcv+0x539/0x57c
netif_receive_skb+0x470/0x49f
:virtio_net:virtnet_poll+0x46b/0x5c5
net_rx_action+0xac/0x1b3
__do_softirq+0x89/0x133
call_softirq+0x1c/0x28
do_softirq+0x2c/0x7d
do_IRQ+0xec/0xf5
default_idle+0x0/0x50
ret_from_intr+0x0/0xa
default_idle+0x29/0x50
cpu_idle+0x95/0xb8
start_kernel+0x220/0x225
_sinittext+0x22f/0x236
It occurs because an skb with a fraglist was freed from the tcp
retransmit queue when it was acked, but a page on that fraglist had
PG_Slab set (indicating it was allocated from the Slab allocator (which
means the free path above can't safely free it via put_page.
We tracked this back to an nfsv4 setacl operation, in which the nfs code
attempted to fill convert the passed in buffer to an array of pages in
__nfs4_proc_set_acl, which gets used by the skb->frags list in
xs_sendpages. __nfs4_proc_set_acl just converts each page in the buffer
to a page struct via virt_to_page, but the vfs allocates the buffer via
kmalloc, meaning the PG_slab bit is set. We can't create a buffer with
kmalloc and free it later in the tcp ack path with put_page, so we need
to either:
1) ensure that when we create the list of pages, no page struct has
PG_Slab set
or
2) not use a page list to send this data
Given that these buffers can be multiple pages and arbitrarily sized, I
think (1) is the right way to go. I've written the below patch to
allocate a page from the buddy allocator directly and copy the data over
to it. This ensures that we have a put_page free-able page for every
entry that winds up on an skb frag list, so it can be safely freed when
the frame is acked. We do a put page on each entry after the
rpc_call_sync call so as to drop our own reference count to the page,
leaving only the ref count taken by tcp_sendpages. This way the data
will be properly freed when the ack comes in
Successfully tested by myself to solve the above oops.
Note, as this is the result of a setacl operation that exceeded a page
of data, I think this amounts to a local DOS triggerable by an
uprivlidged user, so I'm CCing security on this as well.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
CC: security@kernel.org
CC: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David noticed :
------------------
Eric, I was profiling the non-routing-cache case and something that
stuck out is the case of calling inet_getpeer() with create==0.
If an entry is not found, we have to redo the lookup under a spinlock
to make certain that a concurrent writer rebalancing the tree does
not "hide" an existing entry from us.
This makes the case of a create==0 lookup for a not-present entry
really expensive. It is on the order of 600 cpu cycles on my
Niagara2.
I added a hack to not do the relookup under the lock when create==0
and it now costs less than 300 cycles.
This is now a pretty common operation with the way we handle COW'd
metrics, so I think it's definitely worth optimizing.
-----------------
One solution is to use a seqlock instead of a spinlock to protect struct
inet_peer_base.
After a failed avl tree lookup, we can easily detect if a writer did
some changes during our lookup. Taking the lock and redo the lookup is
only necessary in this case.
Note: Add one private rcu_deref_locked() macro to place in one spot the
access to spinlock included in seqlock.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The standby logic used to be pretty dependent on the work requeueing
behavior that changed when we switched to WQ_NON_REENTRANT. It was also
very fragile.
Restructure things so that:
- We clear WRITE_PENDING when we set STANDBY. This ensures we will
requeue work when we wake up later.
- con_work backs off if STANDBY is set. There is nothing to do if we are
in standby.
- clear_standby() helper is called by both con_send() and con_keepalive(),
the two actions that can wake us up again. Move the connect_seq++
logic here.
Signed-off-by: Sage Weil <sage@newdream.net>
With commit f363e45f we replaced a bunch of hacky workqueue mutual
exclusion logic with the WQ_NON_REENTRANT flag. One pieces of fallout is
that the exponential backoff breaks in certain cases:
* con_work attempts to connect.
* we get an immediate failure, and the socket state change handler queues
immediate work.
* con_work calls con_fault, we decide to back off, but can't queue delayed
work.
In this case, we add a BACKOFF bit to make con_work reschedule delayed work
next time it runs (which should be immediately).
Signed-off-by: Sage Weil <sage@newdream.net>
Usually H/W generate statistics notify once per about 100ms, but
sometimes we can receive notify in shorter time, even 2 ms.
This can be problem for plcp health and ack health checking.
I.e. with 2 plcp errors happens randomly in 2 ms duration, we
exceed plcp delta threshold equal to 100 (2*100/2).
Also checking ack's in short time, can results not necessary false
positive and firmware reset, for example when channel is noised and
we do not receive ACKs frames or when remote device does not send
ACKs at the moment.
Patch change code to do statistic check and possible recovery only
if 99ms elapsed from last check. Forced delay should assure we have
good statistic data to estimate hardware state.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Make iwl_good_plcp_health code easiest to read.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Put generic rx_handlers (except iwlagn_rx_reply_compressed_ba) to
iwl-rx.c . Make functions static and change prefix from iwlagn_ to
iwl_ . Beautify iwl_setup_rx_handlers and do some other minor coding
style changes.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
mac80211 does the same afterwards anyway. Hence, just drop
this redundant code.
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Acked-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
No use printing addresses of pointers, just print the
pointers themselves.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Now that all accesses to the data_queue structures is done via the specialized
rt2x00queue_get_tx_queue function or via direct accesses, there is no
need for the rt2x00queue_get_queue function anymore, so remove it.
Signed-off-by: Gertjan van Wingerde <gwingerde@gmail.com>
Acked-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
In the spirit of optimizing the code to get the queue structure of TX queues,
also optimize the code to get beacon queues. We can simply use the bcn queue
field of the rt2x00_dev structure instead of using the rt2x00queue_get_queue
function.
Signed-off-by: Gertjan van Wingerde <gwingerde@gmail.com>
Acked-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The ATIM queue is considered to be a TX queue by the drivers that support
the queue. Therefore include support for the ATIM queue to the
rt2x00queue_get_tx_queue function so that the drivers that support the ATIM
queue can also use that function.
Add the support in such a way that drivers that do not support the ATIM
queue are not penalized in their efficiency.
Signed-off-by: Gertjan van Wingerde <gwingerde@gmail.com>
Acked-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Current code for the atim queue is strange, as it is considered in the
rt2x00_dev structure as a second beacon queue.
Normalize this by letting the atim queue have its own struct data_queue
pointer in the rt2x00_dev structure.
Signed-off-by: Gertjan van Wingerde <gwingerde@gmail.com>
Acked-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
We don't use interrupt threads anymore. Fix the comment.
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The PCI device irqmask is locked by a spin_lock. Currently
spin_lock_irqsave is used everywhere. To reduce the locking overhead
replace spin_lock_irqsave in hard irq context with spin_lock and in
soft irq context with spin_lock_irq.
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When setting up multiple BSSIDs in AP mode on an rt2800pci device we
previously used the STAs AID to select an appropriate key slot. But
since the AID is per VIF we can end up with two STAs having the same AID
and thus using the same key index. This resulted in one STA overwriting
the key information of another STA.
Fix this by simply searching for the next unused entry in the pairwise
key table.
Also bring the key table init in sync with deleting keys by initializing
the key table entries to 0 instead of 1.
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This makes the code less error-prone.
Acked-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
ieee80211_get_tx_rate is not valid for HT rates. Hence, restructure the
TX desciptor creation to be aware of MCS rates. The generic TX desciptor
creation now cares about the rate_mode (CCK, OFDM, MCS, GF).
As a result, ieee80211_get_tx_rate gets only called for legacy rates.
Acked-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
"ifs" is only used by no-HT devices. Move it into the plcp substruct and
fill in the value only for no-HT devices.
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Acked-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Some fields only need to be u8 and for ifs and txop we can use the
already available enums.
Acked-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
HT and no-HT rt2x00 devices use a partly different TX descriptor.
Optimize the tx desciptor memory layout by putting the PLCP and HT
substructs into a union and introduce a new driver flag to decide which
TX desciptor format is used by the device.
This saves us the expensive PLCP calculation fOr HT devices and the HT
descriptor setup on no-HT devices.
Acked-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Newer devices like rt2800* own a hardware sequence counter and thus
don't need to use a software sequence counter at all. Add a new driver
flag to shortcut the software sequence number generation on devices that
don't need it.
rt61pci, rt73usb and rt2800* seem to make use of a hw sequence counter
while rt2400pci and rt2500* need to do it in software.
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Acked-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
rt2x00queue_write_tx_frame is unlikely to fail. Tell the compiler.
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This special case shouldn't happen very often. Only if a frame that
is not intended to be aggregated ends up in an AMPDU _and_ was intended
to be sent at a different MCS rate as the aggregate. Hence, using
unlikely is justified.
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Since tx_info->control.vif was already accessed before it cant't be NULL
here.
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
These conditions are unlikely to happen, tell the compiler.
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>