Commit Graph

5484 Commits

Author SHA1 Message Date
Andrei Emeltchenko
b078b56429 Bluetooth: Add HCI logical link cmds definitions
Add a few definitions to hci.h

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-09-27 17:09:29 -03:00
John W. Linville
791ef39cd1 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2012-09-24 14:39:16 -04:00
John W. Linville
9b4e9e7565 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2012-09-24 14:34:40 -04:00
Johan Hedberg
92a25256f1 Bluetooth: mgmt: Implement support for passkey notification
This patch adds support for Secure Simple Pairing with devices that have
KeyboardOnly as their IO capability. Such devices will cause a passkey
notification on our side and optionally also keypress notifications.
Without this patch some keyboards cannot be paired using the mgmt
interface.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Cc: stable@vger.kernel.org
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-09-18 22:27:29 -03:00
Johannes Berg
e548c49e6d mac80211: add key flag for management keys
Mark keys that might be used to receive management
frames so drivers can fall back on software crypto
for them if they don't support hardware offload.
As the new flag is only set correctly for RX keys
and the existing IEEE80211_KEY_FLAG_SW_MGMT flag
can only affect TX, also rename the latter to
IEEE80211_KEY_FLAG_SW_MGMT_TX.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-09-10 11:29:17 +02:00
Andrei Emeltchenko
376261ae36 Bluetooth: debug: Print refcnt for hci_dev
Add debug output for HCI kref.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-09-08 18:06:10 -03:00
Andrei Emeltchenko
9472007c62 Bluetooth: trivial: Make hci_chan_del return void
Return code is not needed in hci_chan_del

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-09-08 17:27:18 -03:00
Andrei Emeltchenko
6b536b5e5e Bluetooth: Remove unneeded zero init
hdev is allocated with kzalloc so zero initialization is not needed.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-09-08 16:53:48 -03:00
John W. Linville
fac805f8c1 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless 2012-09-07 15:07:55 -04:00
John W. Linville
4a3e12fd7a Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2012-09-07 14:49:46 -04:00
Johannes Berg
00b14825ee Merge remote-tracking branch 'wireless-next/master' into mac80211-next 2012-09-06 17:05:28 +02:00
Johannes Berg
944b9e375d Merge remote-tracking branch 'mac80211/master' into mac80211-next
Pull in mac80211.git to let the next patch apply
without conflicts, also resolving a hwsim conflict.

Conflicts:
	drivers/net/wireless/mac80211_hwsim.c

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-09-06 15:56:02 +02:00
Robert P. J. Day
c9a0a30252 cfg80211: add kerneldoc entry for "vht_cap"
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-09-05 15:54:07 +02:00
Vinicius Costa Gomes
cc110922da Bluetooth: Change signature of smp_conn_security()
To make it clear that it may be called from contexts that may not have
any knowledge of L2CAP, we change the connection parameter, to receive
a hci_conn.

This also makes it clear that it is checking the security of the link.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-27 08:07:18 -07:00
John W. Linville
01e17dacd4 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Conflicts:
	drivers/net/wireless/mac80211_hwsim.c
2012-08-21 16:00:21 -04:00
Syam Sidhardhan
144ad33020 Bluetooth: Use kref for l2cap channel reference counting
This patch changes the struct l2cap_chan and associated code to use
kref api for object refcounting and freeing.

Suggested-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Jaganath Kanakkassery <jaganath.k@samsung.com>
Signed-off-by: Syam Sidhardhan <s.syam@samsung.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-21 14:54:41 -03:00
Mikel Astiz
f0d6a0ea33 Bluetooth: mgmt: Add device disconnect reason
MGMT_EV_DEVICE_DISCONNECTED will now expose the disconnection reason to
userland, distinguishing four possible values:

	0x00	Reason not known or unspecified
	0x01	Connection timeout
	0x02	Connection terminated by local host
	0x03	Connection terminated by remote host

Note that the local/remote distinction just determines which side
terminated the low-level connection, regardless of the disconnection of
the higher-level profiles.

This can sometimes be misleading and thus must be used with care. For
example, some hardware combinations would report a locally initiated
disconnection even if the user turned Bluetooth off in the remote side.

Signed-off-by: Mikel Astiz <mikel.astiz@bmw-carit.de>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-21 14:54:40 -03:00
Mikel Astiz
cdcba7c650 Bluetooth: Add more HCI error codes
Add more HCI error codes as defined in the specification.

Signed-off-by: Mikel Astiz <mikel.astiz@bmw-carit.de>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-21 14:54:39 -03:00
Johannes Berg
6d71117a27 mac80211: add IEEE80211_HW_P2P_DEV_ADDR_FOR_INTF
Some devices like the current iwlwifi implementation
require that the P2P interface address match the P2P
Device address (only one P2P interface is supported.)
Add the HW flag IEEE80211_HW_P2P_DEV_ADDR_FOR_INTF
that allows drivers to request that P2P Interfaces
added while a P2P Device is active get the same MAC
address by default.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:58:23 +02:00
Johannes Berg
98104fdeda cfg80211: add P2P Device abstraction
In order to support using a different MAC address
for the P2P Device address we must first have a
P2P Device abstraction that can be assigned a MAC
address.

This abstraction will also be useful to support
offloading P2P operations to the device, e.g.
periodic listen for discoverability.

Currently, the driver is responsible for assigning
a MAC address to the P2P Device, but this could be
changed by allowing a MAC address to be given to
the NEW_INTERFACE command.

As it has no associated netdev, a P2P Device can
only be identified by its wdev identifier but the
previous patches allowed using the wdev identifier
in various APIs, e.g. remain-on-channel.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:58:21 +02:00
Johannes Berg
4c29867790 mac80211: support A-MPDU status reporting
Support getting A-MPDU status information from the
drivers and reporting it to userspace via radiotap
in the standard fields.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:53:22 +02:00
Johannes Berg
48613ece3d wireless: add radiotap A-MPDU status field
Define the A-MPDU status field in radiotap, also
update the radiotap parser for it and the MCS field
that was apparently missed last time.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:53:09 +02:00
Antonio Quartulli
e687f61eed mac80211: add supported rates change notification in IBSS
In IBSS it is possible that the supported rates set for a station changes over
time (e.g. it gets first initialised as an empty set because of no available
information about rates and updated later). In this case the driver has to be
notified about the change in order to update its internal table accordingly (if
needed).

This behaviour is needed by all those drivers that handle rc internally but
leave stations management to mac80211

Reported-by: Gui Iribarren <gui@altermundi.net>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
[Johannes - add docs, validate IBSS mode only, fix compilation]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:31:43 +02:00
Vinicius Costa Gomes
57f5d0d1d9 Bluetooth: Remove some functions from being exported
Some connection related functions are only used inside hci_conn.c
so no need to have them exported.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-15 00:53:11 -03:00
John W. Linville
57f784fed3 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2012-08-10 15:13:12 -04:00
Andre Guedes
b9b343d254 Bluetooth: Fix hci_le_conn_complete_evt
We need to check the 'Role' parameter from the LE Connection
Complete Event in order to properly set 'out' and 'link_mode'
fields from hci_conn structure.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06 15:05:10 -03:00
Masatake YAMATO
256a06c8a8 Bluetooth: /proc/net/ entries for bluetooth protocols
lsof command can tell the type of socket processes are using.
Internal lsof uses inode numbers on socket fs to resolve the type of
sockets. Files under /proc/net/, such as tcp, udp, unix, etc provides
such inode information.

Unfortunately bluetooth related protocols don't provide such inode
information. This patch series introduces /proc/net files for the protocols.

This patch against af_bluetooth.c provides facility to the implementation
of protocols. This patch extends bt_sock_list and introduces two exported
function bt_procfs_init, bt_procfs_cleanup.

The type bt_sock_list is already used in some of implementation of
protocols. bt_procfs_init prepare seq_operations which converts
protocol own bt_sock_list data to protocol own proc entry when the
entry is accessed.

What I, lsof user, need is just inode number of bluetooth
socket. However, people may want more information. The bt_procfs_init
takes a function pointer for customizing the show handler of
seq_operations.

In v4 patch, __acquires and __releases attributes are added to suppress
sparse warning. Suggested by Andrei Emeltchenko.

In v5 patch, linux/proc_fs.h is included to use PDE. Build error is
reported by Fengguang Wu.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06 15:02:58 -03:00
Jaganath Kanakkassery
4af66c691f Bluetooth: Free the l2cap channel list only when refcount is zero
Move the l2cap channel list chan->global_l under the refcnt
protection and free it based on the refcnt.

Signed-off-by: Jaganath Kanakkassery <jaganath.k@samsung.com>
Signed-off-by: Syam Sidhardhan <s.syam@samsung.com>
Reviewed-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06 15:02:58 -03:00
Jaganath Kanakkassery
3064837289 Bluetooth: Move l2cap_chan_hold/put to l2cap_core.c
Refactor the code in order to use the l2cap_chan_destroy()
from l2cap_chan_put() under the refcnt protection.

Signed-off-by: Jaganath Kanakkassery <jaganath.k@samsung.com>
Signed-off-by: Syam Sidhardhan <s.syam@samsung.com>
Reviewed-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06 15:02:58 -03:00
Andrei Emeltchenko
9e66463127 Bluetooth: Make connect / disconnect cfm functions return void
Return values are never used because callers hci_proto_connect_cfm
and hci_proto_disconn_cfm return void.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06 15:02:58 -03:00
Andrei Emeltchenko
ab846ec4ea Bluetooth: Define AMP controller statuses
AMP status codes copied from Bluez patch sent by Peter Krystad
<pkrystad@codeaurora.org>.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06 15:02:55 -03:00
Andrei Emeltchenko
b93a68295f Bluetooth: trivial: Fix mixing spaces and tabs in smp
Change spaces to tabs in smp code

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06 15:02:55 -03:00
Andrei Emeltchenko
71becf0cea Bluetooth: debug: Fix printing refcnt for hci_conn
Use the same style for refcnt printing through all Bluetooth code
taking the reference the l2cap_chan refcnt printing.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06 15:02:55 -03:00
Andrei Emeltchenko
bb4b2a9ae3 Bluetooth: mgmt: Managing only BR/EDR HCI controllers
Add check that HCI controller is BR/EDR. AMP controller shall not be
managed by mgmt interface and consequently user space.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06 15:02:54 -03:00
Andre Guedes
3f1732462c Bluetooth: Remove missing code
This patch removes the struct adv_entry since it is not used anymore.
This struct should have been removed in commit 479453d (Bluetooth:
Remove advertising cache).

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06 15:02:54 -03:00
John W. Linville
1a26904eb6 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2012-08-02 13:49:38 -04:00
Seth Forshee
03f6b0843a cfg80211: add channel flag to prohibit OFDM operation
Currently the only way for wireless drivers to tell whether or not OFDM
is allowed on the current channel is to check the regulatory
information. However, this requires hodling cfg80211_mutex, which is not
visible to the drivers.

Other regulatory restrictions are provided as flags in the channel
definition, so let's do similarly with OFDM. This patch adds a new flag,
IEEE80211_CHAN_NO_OFDM, to tell drivers that OFDM on a channel is not
allowed. This flag is set on any channels for which regulatory indicates
that OFDM is prohibited.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Tested-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-02 15:30:49 +02:00
Fan Du
e3c0d04750 Fix unexpected SA hard expiration after changing date
After SA is setup, one timer is armed to detect soft/hard expiration,
however the timer handler uses xtime to do the math. This makes hard
expiration occurs first before soft expiration after setting new date
with big interval. As a result new child SA is deleted before rekeying
the new one.

Signed-off-by: Fan Du <fdu@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-02 00:19:17 -07:00
Ben Hutchings
1485348d24 tcp: Apply device TSO segment limit earlier
Cache the device gso_max_segs in sock::sk_gso_max_segs and use it to
limit the size of TSO skbs.  This avoids the need to fall back to
software GSO for local TCP senders.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-02 00:19:17 -07:00
Linus Torvalds
ac694dbdbc Merge branch 'akpm' (Andrew's patch-bomb)
Merge Andrew's second set of patches:
 - MM
 - a few random fixes
 - a couple of RTC leftovers

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (120 commits)
  rtc/rtc-88pm80x: remove unneed devm_kfree
  rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails
  mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables
  tmpfs: distribute interleave better across nodes
  mm: remove redundant initialization
  mm: warn if pg_data_t isn't initialized with zero
  mips: zero out pg_data_t when it's allocated
  memcg: gix memory accounting scalability in shrink_page_list
  mm/sparse: remove index_init_lock
  mm/sparse: more checks on mem_section number
  mm/sparse: optimize sparse_index_alloc
  memcg: add mem_cgroup_from_css() helper
  memcg: further prevent OOM with too many dirty pages
  memcg: prevent OOM with too many dirty pages
  mm: mmu_notifier: fix freed page still mapped in secondary MMU
  mm: memcg: only check anon swapin page charges for swap cache
  mm: memcg: only check swap cache pages for repeated charging
  mm: memcg: split swapin charge function into private and public part
  mm: memcg: remove needless !mm fixup to init_mm when charging
  mm: memcg: remove unneeded shmem charge type
  ...
2012-07-31 19:25:39 -07:00
Mel Gorman
c76562b670 netvm: prevent a stream-specific deadlock
This patch series is based on top of "Swap-over-NBD without deadlocking
v15" as it depends on the same reservation of PF_MEMALLOC reserves logic.

When a user or administrator requires swap for their application, they
create a swap partition and file, format it with mkswap and activate it
with swapon.  In diskless systems this is not an option so if swap if
required then swapping over the network is considered.  The two likely
scenarios are when blade servers are used as part of a cluster where the
form factor or maintenance costs do not allow the use of disks and thin
clients.

The Linux Terminal Server Project recommends the use of the Network Block
Device (NBD) for swap but this is not always an option.  There is no
guarantee that the network attached storage (NAS) device is running Linux
or supports NBD.  However, it is likely that it supports NFS so there are
users that want support for swapping over NFS despite any performance
concern.  Some distributions currently carry patches that support swapping
over NFS but it would be preferable to support it in the mainline kernel.

Patch 1 avoids a stream-specific deadlock that potentially affects TCP.

Patch 2 is a small modification to SELinux to avoid using PFMEMALLOC
	reserves.

Patch 3 adds three helpers for filesystems to handle swap cache pages.
	For example, page_file_mapping() returns page->mapping for
	file-backed pages and the address_space of the underlying
	swap file for swap cache pages.

Patch 4 adds two address_space_operations to allow a filesystem
	to pin all metadata relevant to a swapfile in memory. Upon
	successful activation, the swapfile is marked SWP_FILE and
	the address space operation ->direct_IO is used for writing
	and ->readpage for reading in swap pages.

Patch 5 notes that patch 3 is bolting
	filesystem-specific-swapfile-support onto the side and that
	the default handlers have different information to what
	is available to the filesystem. This patch refactors the
	code so that there are generic handlers for each of the new
	address_space operations.

Patch 6 adds an API to allow a vector of kernel addresses to be
	translated to struct pages and pinned for IO.

Patch 7 adds support for using highmem pages for swap by kmapping
	the pages before calling the direct_IO handler.

Patch 8 updates NFS to use the helpers from patch 3 where necessary.

Patch 9 avoids setting PF_private on PG_swapcache pages within NFS.

Patch 10 implements the new swapfile-related address_space operations
	for NFS and teaches the direct IO handler how to manage
	kernel addresses.

Patch 11 prevents page allocator recursions in NFS by using GFP_NOIO
	where appropriate.

Patch 12 fixes a NULL pointer dereference that occurs when using
	swap-over-NFS.

With the patches applied, it is possible to mount a swapfile that is on an
NFS filesystem.  Swap performance is not great with a swap stress test
taking roughly twice as long to complete than if the swap device was
backed by NBD.

This patch: netvm: prevent a stream-specific deadlock

It could happen that all !SOCK_MEMALLOC sockets have buffered so much data
that we're over the global rmem limit.  This will prevent SOCK_MEMALLOC
buffers from receiving data, which will prevent userspace from running,
which is needed to reduce the buffered data.

Fix this by exempting the SOCK_MEMALLOC sockets from the rmem limit.  Once
this change it applied, it is important that sockets that set
SOCK_MEMALLOC do not clear the flag until the socket is being torn down.
If this happens, a warning is generated and the tokens reclaimed to avoid
accounting errors until the bug is fixed.

[davem@davemloft.net: Warning about clearing SOCK_MEMALLOC]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:47 -07:00
Mel Gorman
b4b9e35585 netvm: set PF_MEMALLOC as appropriate during SKB processing
In order to make sure pfmemalloc packets receive all memory needed to
proceed, ensure processing of pfmemalloc SKBs happens under PF_MEMALLOC.
This is limited to a subset of protocols that are expected to be used for
writing to swap.  Taps are not allowed to use PF_MEMALLOC as these are
expected to communicate with userspace processes which could be paged out.

[a.p.zijlstra@chello.nl: Ideas taken from various patches]
[jslaby@suse.cz: Lock imbalance fix]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:46 -07:00
Mel Gorman
c93bdd0e03 netvm: allow skb allocation to use PFMEMALLOC reserves
Change the skb allocation API to indicate RX usage and use this to fall
back to the PFMEMALLOC reserve when needed.  SKBs allocated from the
reserve are tagged in skb->pfmemalloc.  If an SKB is allocated from the
reserve and the socket is later found to be unrelated to page reclaim, the
packet is dropped so that the memory remains available for page reclaim.
Network protocols are expected to recover from this packet loss.

[a.p.zijlstra@chello.nl: Ideas taken from various patches]
[davem@davemloft.net: Use static branches, coding style corrections]
[sebastian@breakpoint.cc: Avoid unnecessary cast, fix !CONFIG_NET build]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:46 -07:00
Mel Gorman
7cb0240492 netvm: allow the use of __GFP_MEMALLOC by specific sockets
Allow specific sockets to be tagged SOCK_MEMALLOC and use __GFP_MEMALLOC
for their allocations.  These sockets will be able to go below watermarks
and allocate from the emergency reserve.  Such sockets are to be used to
service the VM (iow.  to swap over).  They must be handled kernel side,
exposing such a socket to user-space is a bug.

There is a risk that the reserves be depleted so for now, the
administrator is responsible for increasing min_free_kbytes as necessary
to prevent deadlock for their workloads.

[a.p.zijlstra@chello.nl: Original patches]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:46 -07:00
Mel Gorman
99a1dec70d net: introduce sk_gfp_atomic() to allow addition of GFP flags depending on the individual socket
Introduce sk_gfp_atomic(), this function allows to inject sock specific
flags to each sock related allocation.  It is only used on allocation
paths that may be required for writing pages back to network storage.

[davem@davemloft.net: Use sk_gfp_atomic only when necessary]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:46 -07:00
Andrew Morton
c255a45805 memcg: rename config variables
Sanity:

CONFIG_CGROUP_MEM_RES_CTLR -> CONFIG_MEMCG
CONFIG_CGROUP_MEM_RES_CTLR_SWAP -> CONFIG_MEMCG_SWAP
CONFIG_CGROUP_MEM_RES_CTLR_SWAP_ENABLED -> CONFIG_MEMCG_SWAP_ENABLED
CONFIG_CGROUP_MEM_RES_CTLR_KMEM -> CONFIG_MEMCG_KMEM

[mhocko@suse.cz: fix missed bits]
Cc: Glauber Costa <glommer@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:43 -07:00
David S. Miller
caacf05e5a ipv4: Properly purge netdev references on uncached routes.
When a device is unregistered, we have to purge all of the
references to it that may exist in the entire system.

If a route is uncached, we currently have no way of accomplishing
this.

So create a global list that is scanned when a network device goes
down.  This mirrors the logic in net/core/dst.c's dst_ifdown().

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-31 15:06:50 -07:00
David S. Miller
c5038a8327 ipv4: Cache routes in nexthop exception entries.
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-31 15:02:02 -07:00
Eric Dumazet
d26b3a7c4b ipv4: percpu nh_rth_output cache
Input path is mostly run under RCU and doesnt touch dst refcnt

But output path on forwarding or UDP workloads hits
badly dst refcount, and we have lot of false sharing, for example
in ipv4_mtu() when reading rt->rt_pmtu

Using a percpu cache for nh_rth_output gives a nice performance
increase at a small cost.

24 udpflood test on my 24 cpu machine (dummy0 output device)
(each process sends 1.000.000 udp frames, 24 processes are started)

before : 5.24 s
after : 2.06 s
For reference, time on linux-3.5 : 6.60 s

Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-31 14:41:39 -07:00
Eric Dumazet
54764bb647 ipv4: Restore old dst_free() behavior.
commit 404e0a8b6a (net: ipv4: fix RCU races on dst refcounts) tried
to solve a race but added a problem at device/fib dismantle time :

We really want to call dst_free() as soon as possible, even if sockets
still have dst in their cache.
dst_release() calls in free_fib_info_rcu() are not welcomed.

Root of the problem was that now we also cache output routes (in
nh_rth_output), we must use call_rcu() instead of call_rcu_bh() in
rt_free(), because output route lookups are done in process context.

Based on feedback and initial patch from David Miller (adding another
call_rcu_bh() call in fib, but it appears it was not the right fix)

I left the inet_sk_rx_dst_set() helper and added __rcu attributes
to nh_rth_output and nh_rth_input to better document what is going on in
this code.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-31 14:41:38 -07:00