linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2025-02-05 13:45:21 +07:00

Author	SHA1	Message	Date
Fred Zhou	1f4ffde845	mac80211: improve default WMM parameter setting Move the default setting for WMM parameters outside the for loop to avoid redundant assignment multiple times. Signed-off-by: Fred Zhou <fred.zy@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-10-01 12:24:29 +02:00
Michal Kazior	0cfcefef19	mac80211: support reporting A-MSDU subframes individually Some devices may not be able to report A-MSDUs in single buffers. Drivers for such devices were forced to re-assemble A-MSDUs which would then be eventually disassembled by mac80211. This could lead to CPU cache thrashing and poor performance. Since A-MSDU has a single sequence number all subframes share it. This was in conflict with retransmission/duplication recovery (IEEE802.11-2012: 9.3.2.10). Patch introduces a new flag that is meant to be set for all individually reported A-MSDU subframes except the last one. This ensures the last_seq_ctrl is updated after the last subframe is processed. If an A-MSDU is actually a duplicate transmission all reported subframes will be properly discarded. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> [johannes: add braces that were missing even before] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-10-01 12:22:03 +02:00
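The duplicate-detection rule described in the commit above can be sketched in plain C. This is a minimal userspace illustration, not the mac80211 code: the struct, the function name, and the `amsdu_more` parameter are hypothetical stand-ins for the per-station state and the new flag the commit message mentions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-station receive state (illustration only). */
struct sta_rx_state {
    uint16_t last_seq_ctrl;
};

/*
 * Returns true if the frame should be dropped as a duplicate.
 *
 * amsdu_more stands in for the new flag: it is set on every individually
 * reported A-MSDU subframe except the last one, so last_seq_ctrl is only
 * updated once the whole A-MSDU has been processed.
 */
static bool rx_should_drop_duplicate(struct sta_rx_state *sta,
                                     uint16_t seq_ctrl,
                                     bool retry, bool amsdu_more)
{
    assert(sta != NULL);

    /* A retransmission of the last fully processed frame is a duplicate. */
    if (retry && sta->last_seq_ctrl == seq_ctrl)
        return true;

    /* Defer the update until the last subframe has been seen. */
    if (!amsdu_more)
        sta->last_seq_ctrl = seq_ctrl;

    return false;
}
```

Because the stored sequence number changes only after the final subframe, every subframe of a retransmitted A-MSDU compares equal to it and is discarded, while the subframes of the first transmission all pass.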
Fred Zhou	15e230abaa	mac80211: use exact-size allocation for authentication frame The authentication frame has a fixed size of 30 bytes (including header, algo num, trans seq num, and status) followed by a variable challenge text. Allocate using exact size, instead of over-allocation by sizeof(ieee80211_mgmt). Signed-off-by: Fred Zhou <fred.zy@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-10-01 12:20:38 +02:00
Janusz Dziedzic	f0823475d5	cfg80211: parse dfs region for internal regdb option Add support for parsing and setting the dfs region (ETSI, FCC, JP) when the internal regulatory database is used. Before this the DFS region was being ignored even if present on the used db.txt Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com> Reviewed-by: Luis R. Rodriguez <mcgrof@do-not-panic.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-10-01 12:18:36 +02:00
Johannes Berg	55fff50113	mac80211: add explicit IBSS driver operations This can be useful for drivers if they have any failure cases when joining an IBSS. Also move setting the queue parameters to before this new call, in case the new driver op needs them already. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-10-01 12:17:45 +02:00
Eliad Peller	5eb7906b47	ieee80211: fix vht cap definitions VHT_CAP_BEAMFORMER_ANTENNAS cap is actually defined in the draft as VHT_CAP_BEAMFORMEE_STS_MAX, and its size is 3 bits long. VHT_CAP_SOUNDING_DIMENSIONS is also 3 bits long. Fix the definitions and change the cap masking accordingly. Signed-off-by: Eliad Peller <eliad@wizery.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-10-01 12:17:08 +02:00
Eliad Peller	f364ef99a8	mac80211: fix some snprintf misuses In some debugfs related functions snprintf was used while scnprintf should have been used instead. (blindly adding the return value of snprintf and supplying it to the next snprintf might result in buffer overflow when the input is too big) Signed-off-by: Eliad Peller <eliad@wizery.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-10-01 12:16:51 +02:00
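The snprintf pitfall mentioned in the commit above can be reproduced in userspace. The sketch below assumes a small stand-in, `scnprintf_sketch()`, that mimics the kernel's scnprintf() contract (return the number of characters actually written); the helper names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Userspace stand-in for scnprintf(): returns the number of characters
 * actually stored in buf (excluding the trailing NUL), never the length
 * that would have been needed.
 */
static int scnprintf_sketch(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    int n;

    if (size == 0)
        return 0;

    va_start(args, fmt);
    n = vsnprintf(buf, size, fmt, args);
    va_end(args);

    if (n < 0)
        return 0;
    if ((size_t)n >= size)
        return (int)(size - 1);   /* output was truncated */
    return n;
}

/* Append two strings to buf and return the final write offset. */
static size_t append_twice(char *buf, size_t size,
                           const char *a, const char *b)
{
    size_t pos = 0;

    assert(buf != NULL);
    pos += (size_t)scnprintf_sketch(buf + pos, size - pos, "%s", a);
    pos += (size_t)scnprintf_sketch(buf + pos, size - pos, "%s", b);
    return pos;
}
```

With plain snprintf(), the first call on an 8-byte buffer and a 10-character input returns 10, so the next call would be handed `buf + 10` and a wrapped-around `size - pos`. With the scnprintf-style return value the offset can never move past the end of the buffer.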
John W. Linville	15214c2f6c	Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211	2013-09-30 16:14:27 -04:00
Jouni Malinen	22c4ceed01	mac80211: Run deferred scan if last roc_list item is not started mac80211 scan processing could get stuck if roc work was pending, but not started, when a scan request was deferred due to such a roc item. Normally the deferred scan would be started from ieee80211_start_next_roc(), but ieee80211_sw_roc_work() calls that only if the finished ROC was started. Fix this by calling ieee80211_run_deferred_scan() in the case the last ROC was not actually started. This issue was hit relatively easily in P2P find operations where Listen state (remain-on-channel) and Search state (scan) are repeated in a loop. Signed-off-by: Jouni Malinen <j@w1.fi> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-09-30 12:36:56 +02:00
Felix Fietkau	0c5b93290b	mac80211: update sta->last_rx on acked tx frames When clients are idle for too long, hostapd sends nullfunc frames for probing. When those are acked by the client, the idle time needs to be updated. To make this work (and to avoid unnecessary probing), update sta->last_rx whenever an ACK was received for a tx packet. Only do this if the flag IEEE80211_HW_REPORTS_TX_ACK_STATUS is set. Cc: stable@vger.kernel.org Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-09-30 12:34:09 +02:00
Felix Fietkau	03bb7f4276	mac80211: use sta_info_get_bss() for nl80211 tx and client probing This allows calls for clients in AP_VLANs (e.g. for 4-addr) to succeed Cc: stable@vger.kernel.org Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-09-30 11:30:57 +02:00
Gustavo Padovan	1025c04cec	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Conflicts: net/bluetooth/hci_core.c	2013-09-27 11:56:14 -03:00
Johannes Berg	aa5f66d5a1	cfg80211: fix sysfs registration race My locking rework/race fixes caused a regression in the registration, causing uevent notifications for wireless devices before the device is really fully registered and available in nl80211. Fix this by moving the device_add() under rtnl and move the rfkill to afterwards (it can't be under rtnl.) Reported-and-tested-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-09-26 20:03:45 +02:00
Chun-Yeow Yeoh	cc63ec766b	mac80211: fix the setting of extended supported rate IE The patch "mac80211: select and adjust bitrates according to channel mode" causes a regression and breaks the extended supported rate IE setting. Since "i" starts at 8, it is not necessary to introduce "skip" here. Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@cozybit.com> Signed-off-by: Colleen Twitty <colleen@cozybit.com> Reviewed-by: Jason Abele <jason@cozybit.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-09-26 19:56:59 +02:00
Felix Fietkau	6329b8d917	mac80211: drop spoofed packets in ad-hoc mode If an Ad-Hoc node receives packets with the Cell ID or its own MAC address as source address, it hits a WARN_ON in sta_info_insert_check(). With many packets, this can massively spam the logs. One way that this can easily happen is through having Cisco APs in the area with rogue AP detection and countermeasures enabled. Such Cisco APs will regularly send fake beacons, disassoc and deauth packets that trigger these warnings. To fix this issue, drop such spoofed packets early in the rx path. Cc: stable@vger.kernel.org Reported-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de> Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-09-26 19:56:06 +02:00
John W. Linville	7c6a4acc64	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth	2013-09-26 13:47:05 -04:00
Bruno Randolf	f478f33a93	cfg80211: fix warning when using WEXT for IBSS Fix kernel warning when using WEXT for configuring ad-hoc mode, e.g. "iwconfig wlan0 essid test channel 1" WARNING: at net/wireless/chan.c:373 cfg80211_chandef_usable+0x50/0x21c [cfg80211]() The warning is caused by an uninitialized variable center_freq1. Cc: stable@vger.kernel.org Signed-off-by: Bruno Randolf <br1@einfach.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-09-26 19:43:14 +02:00
Simon Wunderlich	ee4bc9e758	nl80211: enable IBSS support for channel switch announcements Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-09-26 13:27:15 +02:00
Simon Wunderlich	9449410f3b	mac80211: send a CSA action frame when changing channel IBSS members may not immediately be able to send out their beacon when performing CSA, therefore also send a CSA action frame. Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-09-26 13:27:15 +02:00
Simon Wunderlich	cd7760e62c	mac80211: add support for CSA in IBSS mode This function adds the channel switch announcement implementation for the IBSS code. It is triggered by userspace (mac80211/cfg) or by external channel switch announcement, which have to be adopted. Both CSAs in beacons and action frames are supported. As for AP mode, the channel switch is applied after some time. However in IBSS mode, the channel switch IEs are generated in the kernel. Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-09-26 13:27:14 +02:00
Simon Wunderlich	871a4180b8	mac80211: split off ibss disconnect IBSS CSA will require to disconnect if a channel switch fails, but mac80211 should search and re-connect after this disconnect. To allow such usage, split off the ibss disconnect process in a separate function which only performs the disconnect without overwriting nl80211-supplied parameters. Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-09-26 13:27:14 +02:00
Simon Wunderlich	e6b7cde4d3	mac80211: split off channel switch parsing function The channel switch parsing function can be re-used for the IBSS code, put the common part into an extra function. Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de> [also move/rename chandef_downgrade] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-09-26 13:27:13 +02:00
Simon Wunderlich	774f073461	cfg80211: export cfg80211_chandef_dfs_required It will be used later by the IBSS CSA implementation of mac80211. Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-09-26 13:27:13 +02:00
Lorenzo Bianconi	37feb7e2fb	mac80211: do not override fixed_rate_idx in minstrel_ht_update_stats Do not override max_tp_rate, max_tp_rate2 and max_prob_rate configured according to fixed_rate in minstrel_ht_update_stats throughput computation Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com> Acked-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-09-26 13:27:12 +02:00
Lorenzo Bianconi	45966aebad	mac80211: add fixed_rate management to minstrel rc Add the capability to use a fixed modulation rate to minstrel rate controller Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com> Acked-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-09-26 13:27:12 +02:00
Stanislaw Gruszka	392b9ffb05	mac80211: change beacon/connection polling Since we do active AP probing (using a nullfunc frame or probe request) when we detect beacon loss, there is no need to have beacon polling. The IEEE80211_STA_BEACON_POLL flag seems to be used just for historical reasons. This change also ensures that, after we start a connection poll due to beacon loss, the next received beacon will abort the poll. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-09-26 13:27:11 +02:00
Luciano Coelho	180032973e	cfg80211: use the correct macro to check for active monitor support Use MONITOR_FLAG_ACTIVE, which is a flag mask, instead of NL80211_MNTR_FLAG_ACTIVE, which is a flag index, when checking if the hardware supports active monitoring. Cc: stable@vger.kernel.org Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-09-26 13:22:45 +02:00
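The difference between a flag index and a flag mask, which is the bug class fixed in the commit above, can be shown with a small sketch. The constant names and the index value 6 below are hypothetical and chosen only for illustration; they are not copied from the nl80211 or cfg80211 headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag index: the bit position of the "active" flag. */
enum { SKETCH_MNTR_FLAG_ACTIVE_IDX = 6 };

/* The matching mask is derived from the index by shifting. */
#define SKETCH_MONITOR_FLAG_ACTIVE (1u << SKETCH_MNTR_FLAG_ACTIVE_IDX)

/* Correct check: test the bit selected by the mask. */
static bool supports_active_monitor(uint32_t flags)
{
    return (flags & SKETCH_MONITOR_FLAG_ACTIVE) != 0;
}

/* Buggy check: ANDs with the index (6 == 0b110), testing bits 1 and 2. */
static bool supports_active_monitor_buggy(uint32_t flags)
{
    return (flags & SKETCH_MNTR_FLAG_ACTIVE_IDX) != 0;
}
```

The buggy variant both misses devices that really set bit 6 and reports support for devices that happen to set bit 1 or bit 2.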
Sergey Ryazanov	a6ececf4ee	mac80211: Remove superfluous is_multicast_ether_addr() call Remove superfluous call and use locally stored previous result. Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-09-26 13:21:39 +02:00
Johannes Berg	c5dc164df6	mac80211: use ERR_CAST() No need for ERR_PTR(PTR_ERR()) since there's ERR_CAST, use it. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-09-26 13:21:38 +02:00
Johannes Berg	c7c71066c2	mac80211: add ieee80211_iterate_active_interfaces_rtnl() If it is needed to disconnect multiple virtual interfaces after (WoWLAN-) suspend, the most obvious approach would be to iterate all interfaces by calling ieee80211_iterate_active_interfaces() and then call ieee80211_resume_disconnect() for each one. This is what the iwlmvm driver does. Unfortunately, this causes a locking dependency from mac80211's iflist_mtx to the key_mtx. This is problematic as the former is intentionally never held while calling any driver operation to allow drivers to iterate with their own locks held. The key_mtx is held while installing a key into the driver though, so this new lock dependency means drivers implementing the logic above can no longer hold their own lock while iterating. To fix this, add a new ieee80211_iterate_active_interfaces_rtnl() function that iterates while the RTNL is already held. This is true during suspend/resume, so that then the locking dependency isn't introduced. While at it, also refactor the various interface iterators and keep only a single implementation called by the various cases. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-09-26 13:21:37 +02:00
Johan Hedberg	4375f1037d	Bluetooth: Add new mgmt_set_advertising command This patch adds a new mgmt command for enabling and disabling LE advertising. The command depends on the LE setting being enabled first and will return a "rejected" response otherwise. The patch also adds safeguards so that there will ever only be one set_le or set_advertising command pending per adapter. The response handling and new_settings event sending is done in an asynchronous request callback, meaning raw HCI access from user space to enable advertising (e.g. hciconfig leadv) will not trigger the new_settings event. This is intentional since trying to support mixed raw HCI and mgmt access would mean adding extra state tracking or new helper functions, essentially negating the benefit of using the asynchronous request framework. The HCI_LE_ENABLED and HCI_LE_PERIPHERAL flags however are updated correctly even with raw HCI access so this will not completely break subsequent access over mgmt. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-25 14:30:11 -03:00
Johan Hedberg	eeca6f8913	Bluetooth: Add new mgmt setting for LE advertising This patch adds a new mgmt setting for LE advertising and hooks up the necessary places in the mgmt code to operate on the HCI_LE_PERIPHERAL flag (which corresponds to this setting). This patch does not yet add any new command for enabling the setting - that is left for a subsequent patch. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-25 14:30:11 -03:00
Johan Hedberg	416a4ae56b	Bluetooth: Use async request for LE enable/disable This patch updates the code to use an asynchronous request for handling the enabling and disabling of LE support. This refactoring is necessary as a preparation for adding advertising support, since when LE is disabled we should also disable advertising, and the cleanest way to do this is to perform the two respective HCI commands in the same asynchronous request. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-25 14:30:11 -03:00
Johan Hedberg	bd99abdd5b	Bluetooth: Move mgmt response convenience functions to a better location The settings_rsp and cmd_status_rsp functions can be useful for all mgmt command handlers when asynchronous request callbacks are used. They will e.g. be used by subsequent patches to change set_le to use an async request as well as a new set_advertising command. Therefore, move them higher up in the mgmt.c file to avoid unnecessary forward declarations or mixing this trivial change with other patches. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-25 14:30:10 -03:00
Johan Hedberg	87b95ba64e	Bluetooth: Fix busy return for mgmt_set_powered in some cases We should return a "busy" error always when there is another mgmt_set_powered operation in progress. Previously when powering on while the auto off timer was still set the code could have let two or more pending power on commands to be queued. This patch fixes the issue by moving the check for duplicate commands to an earlier point in the set_powered handler. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-25 14:30:10 -03:00
Johan Hedberg	970871bc9c	Bluetooth: Clean up socket locking in l2cap_sock_recvmsg This patch cleans up the locking logic in l2cap_sock_recvmsg by pairing up each lock_sock call with a release_sock call. The function already has a "done" label that handles releasing the socket and returning from the function so the fix is rather simple. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-25 14:30:10 -03:00
Johan Hedberg	0fba96f97b	Bluetooth: Add clarifying comment to bt_sock_wait_state() The bt_sock_wait_state requires the sk lock to be held (through lock_sock) so document it clearly in the code. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-25 14:30:10 -03:00
Eric Lapuyade	2bed278517	NFC: NCI: Modify NCI SPI to implement CS/INT handshake per the spec The NFC Forum NCI specification defines both a hardware and software protocol when using a SPI physical transport to connect an NFC NCI Chipset. The hardware requirement is that, after having raised the chip select line, the SPI driver must wait for an INT line from the NFC chipset to raise before it sends the data. The chip select must be raised first though, because this is the signal that the NFC chipset will detect to wake up and then raise its INT line. If the INT line doesn't raise in a timely fashion, the SPI driver should abort operation. When data is transferred from Device host (DH) to NFC Controller (NFCC), the signaling sequence is the following: Data Transfer from DH to NFCC • 1-Master asserts SPI_CSN • 2-Slave asserts SPI_INT • 3-Master sends NCI-over-SPI protocol header and payload data • 4-Slave deasserts SPI_INT • 5-Master deasserts SPI_CSN When data must be transferred from NFCC to DH, things are a little bit different. Data Transfer from NFCC to DH • 1-Slave asserts SPI_INT -> NFC chipset irq handler called -> process reading from SPI • 2-Master asserts SPI_CSN • 3-Master send 2-octet NCI-over-SPI protocol header • 4-Slave sends 2-octet NCI-over-SPI protocol payload length • 5-Slave sends NCI-over-SPI protocol payload • 6-Master deasserts SPI_CSN In this case, SPI driver should function normally as it does today. Note that the INT line can and will be lowered anytime between beginning of step 3 and end of step 5. A low INT is therefore valid after chip select has been raised. This would be easily implemented in a single driver. Unfortunately, we don't write the SPI driver and I had to imagine some workaround trick to get the SPI and NFC drivers to work in a synchronized fashion. The trick is the following: - send an empty spi message: this will raise the chip select line, and send nothing. 
We expect the /CS line will stay arisen because we asked for it in the spi_transfer cs_change field - wait for a completion, that will be completed by the NFC driver IRQ handler when it knows we are in the process of sending data (NFC spec says that we use SPI in a half duplex mode, so we are either sending or receiving). - when completed, proceed with the normal data send. This has been tested and verified to work very consistently on a Nexus 10 (spi-s3c64xx driver). It may not work the same with other spi drivers. The previously defined nci_spi_ops{} whose intended purpose were to address this problem are not used anymore and therefore totally removed. The nci_spi_send() takes a new optional write_handshake_completion completion pointer. If non NULL, the nci spi layer will run the above trick when sending data to the NFC Chip. If NULL, the data is sent normally all at once and it is then the NFC driver responsibility to know what it's doing. Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-09-25 14:59:56 +02:00
Eric Lapuyade	22d4aae589	NFC: NCI: nci_spi_recv_frame() now returns (not forward) the read frame Previously, nci_spi_recv_frame() would directly transmit incoming frames to the NCI Core. However, it turns out that some NFC NCI Chips will add additional proprietary headers that must be handled/removed before NCI Core gets a chance to handle the frame. With this modification, the chip phy or driver are now responsible to transmit incoming frames to NCI Core after proper treatment, and NCI SPI becomes a driver helper instead of sitting between the NFC driver and NCI Core. As a general rule in NFC, *_recv_frame() APIs are used to deliver an incoming frame to an upper layer. To better suit the actual purpose of nci_spi_recv_frame(), and go along with its nci_spi_send() counterpart, the function is renamed to nci_spi_read() The skb is returned as the function result Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-09-25 14:25:41 +02:00
Eric Lapuyade	a4ada6cadb	NFC: NCI: zero struct spi_transfer variables before usage Using ARM compiler, and without zero-ing spi_transfer, spi-s3c64xx driver would issue abnormal errors due to bpw field value being set to unexpected value. This structure MUST be set to all zeros except for those fields specifically used. Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-09-25 14:21:09 +02:00
Samuel Ortiz	5ce3f32b52	NFC: netlink: SE API implementation Implementation of the NFC_CMD_SE_IO command for sending ISO7816 APDUs to NFC embedded secure elements. The reply is forwarded to user space through NFC_CMD_SE_IO as well. Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-09-25 02:35:05 +02:00
Thierry Escande	13292c9a1e	NFC: digital: Fix sens_res endianness handling This was triggered by the following sparse warning: net/nfc/digital_technology.c:272:20: sparse: cast to restricted __be16 The SENS_RES response must be treated as __le16 with the first byte received as LSB and the second one as MSB. This is the way neard handles it in the sens_res field of the nfc_target structure which is treated as u16 in cpu endianness. So le16_to_cpu() is used on the received SENS_RES instead of memcpy'ing it. SENS_RES test macros have also been fixed accordingly. Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-09-25 02:02:42 +02:00
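The byte-order rule stated in the commit above (first received byte is the LSB, second is the MSB) can be illustrated with a portable userspace sketch. The function name is hypothetical; in the kernel the conversion is done with le16_to_cpu(), whereas this sketch assembles the value from the individual bytes so that it behaves the same on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Interpret the two SENS_RES bytes in wire order as a little-endian
 * 16-bit value: wire[0] is the least significant byte.
 */
static uint16_t sens_res_from_wire(const uint8_t wire[2])
{
    assert(wire != NULL);
    return (uint16_t)(wire[0] | ((uint16_t)wire[1] << 8));
}
```

For example, the wire bytes 0x44 0x03 yield the value 0x0344 regardless of host endianness, which is what a memcpy() into a u16 only produces on little-endian machines.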
Thierry Escande	4cf7e03296	NFC: rawsock: Fix a memory leak In the rawsock data exchange callback, the sk_buff is not freed on error. Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-09-25 02:02:40 +02:00
Fengguang Wu	180106bd07	NFC: digital: digital_tg_send_sensf_res() can be static Fixes sparse hint: net/nfc/digital_technology.c:640:5: sparse: symbol 'digital_tg_send_sensf_res' was not declared. Should it be static? Cc: Thierry Escande <thierry.escande@linux.intel.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-09-25 02:02:35 +02:00
Samuel Ortiz	260425308d	NFC: digital: Add newline to pr_* calls We do not add the newline to the pr_fmt macro, in order to give more flexibility to the caller and to keep the logging style consistent with the rest of the NFC and kernel code. Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-09-25 02:02:34 +02:00
Samuel Ortiz	c5da0e4a35	NFC: digital: Remove PR_ERR and PR_DBG macros They can be replaced by the standard pr_err and pr_debug one after defining the right pr_fmt macro. Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-09-25 02:02:32 +02:00
Eric Lapuyade	645d5087bd	NFC: NCI: Store the spi device pointer from the spi instance Storing the spi device was forgotten in the original implementation, which would pretty obviously cause some kind of serious crash when actually trying to send something through that device. Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-09-25 02:02:30 +02:00
Thierry Escande	1c7a4c24fb	NFC Digital: Add target NFC-DEP support This adds support for NFC-DEP target mode for NFC-A and NFC-F technologies. If the driver provides it, the stack uses an automatic mode for technology detection and automatic anti-collision. Otherwise the stack tries to use non-automatic synchronization and listens for SENS_REQ and SENSF_REQ commands. The detection, activation, and data exchange procedures work exactly the same way as in initiator mode, as described in the previous commits, except that the digital stack waits for commands and sends responses back to the peer device. Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-09-25 02:02:28 +02:00
Thierry Escande	7d0911c02f	NFC Digital: Add initiator NFC-DEP support This adds support for NFC-DEP protocol in initiator mode for NFC-A and NFC-F technologies. When a target is detected, the process flow is as follows: For NFC-A technology: 1 - The digital stack receives a SEL_RES as the reply of the SEL_REQ command. 2 - If b7 of SEL_RES is set, the peer device is configured for NFC-DEP protocol. NFC core is notified through nfc_targets_found(). Execution continues at step 4. 3 - Otherwise, it's a tag and the NFC core is notified. Detection ends. 4 - The digital stack sends an ATR_REQ command containing a randomly generated NFCID3 and the general bytes obtained from the LLCP layer of NFC core. For NFC-F technology: 1 - The digital stack receives a SENSF_RES as the reply of the SENSF_REQ command. 2 - If B1 and B2 of NFCID2 are 0x01 and 0xFE respectively, the peer device is configured for NFC-DEP protocol. NFC core is notified through nfc_targets_found(). Execution continues at step 4. 3 - Otherwise it's a type 3 tag. NFC core is notified. Detection ends. 4 - The digital stack sends an ATR_REQ command containing the NFC-F NFCID2 as NFCID3 and the general bytes obtained from the LLCP layer of NFC core. For both technologies: 5 - The digital stack receives the ATR_RES response containing the NFCID3 and the general bytes of the peer device. 6 - The digital stack notifies NFC core that the DEP link is up through nfc_dep_link_up(). 7 - The NFC core performs data exchange through tm_transceive(). 8 - The digital stack sends a DEP_REQ command containing an I PDU with the data from NFC core. 9 - The digital stack receives a DEP_RES response. 10 - If the DEP_RES response contains a supervisor PDU with timeout extension request (RTOX) the digital stack sends a DEP_REQ command containing a supervisor PDU acknowledging the RTOX request. The execution continues at step 9.
11 - If the DEP_RES response contains an I PDU, the response data is passed back to NFC core through the response callback. The execution continues at step 8. Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-09-25 02:02:27 +02:00
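The two NFC-DEP detection checks from step 2 of the flows above can be sketched as small predicates. This is an illustration only: the function names are hypothetical, and the sketch assumes the 1-based bit numbering (b1 to b8) commonly used in NFC specifications, under which b7 corresponds to the mask 0x40.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: bits are numbered b1..b8, so b7 is mask 0x40. */
#define SKETCH_SEL_RES_NFC_DEP_BIT 0x40u

/* NFC-A: the peer supports NFC-DEP if b7 of SEL_RES is set. */
static bool nfca_peer_is_nfc_dep(uint8_t sel_res)
{
    return (sel_res & SKETCH_SEL_RES_NFC_DEP_BIT) != 0;
}

/* NFC-F: the peer supports NFC-DEP if NFCID2 starts with 0x01 0xFE. */
static bool nfcf_peer_is_nfc_dep(const uint8_t *nfcid2, size_t len)
{
    assert(nfcid2 != NULL);
    return len >= 2 && nfcid2[0] == 0x01 && nfcid2[1] == 0xFE;
}
```

When either predicate is false, the flows above treat the peer as a tag and detection ends; when it is true, the stack goes on to send ATR_REQ.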
Thierry Escande	8c0695e499	NFC Digital: Add NFC-F technology support This adds polling support for NFC-F technology at 212 kbits/s and 424 kbits/s. A user space application like neard can send type 3 tag commands through the NFC core. Process flow for NFC-F detection is as follows: 1 - The digital stack sends the SENSF_REQ command to the NFC device. 2 - A peer device replies with a SENSF_RES response. 3 - The digital stack notifies the NFC core of the presence of a target in the operation field and passes the target NFCID2. This also adds support for CRC calculation of type CRC-F. The CRC calculation is handled by the digital stack if the NFC device doesn't support it. Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-09-25 02:02:25 +02:00
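A software CRC of the kind this commit describes could look like the sketch below. The commit message does not give the CRC-F parameters, so the sketch assumes them: generator polynomial x^16 + x^12 + x^5 + 1 (0x1021), initial value 0x0000, bits processed MSB first, and the CRC appended high byte first. The function names are hypothetical and this is not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC-16, polynomial 0x1021, initial value 0, MSB first.
 * These parameters are an assumption about CRC-F, see the text above.
 */
static uint16_t crc_f_sketch(const uint8_t *data, size_t len)
{
    uint16_t crc = 0x0000;

    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u)
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/*
 * Append the CRC to a frame, high byte first. Returns the new frame
 * length, or 0 if the buffer has no room for the two CRC bytes.
 */
static size_t frame_add_crc_f(uint8_t *frame, size_t len, size_t cap)
{
    uint16_t crc;

    if (len + 2 > cap)
        return 0;

    crc = crc_f_sketch(frame, len);
    frame[len] = (uint8_t)(crc >> 8);
    frame[len + 1] = (uint8_t)(crc & 0xFFu);
    return len + 2;
}
```

With these parameters a receiver can verify a frame by running the same CRC over the payload plus the two appended bytes and checking that the result is zero.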
Thierry Escande	2c66daecc4	NFC Digital: Add NFC-A technology support This adds support for NFC-A technology at 106 kbits/s. The stack can detect tags of type 1 and 2. There is no support for collision detection. Tags can be read and written by using a user space application or a daemon like neard. The flow of polling operations for NFC-A detection is as follows: 1 - The digital stack sends the SENS_REQ command to the NFC device. 2 - The NFC device receives a SENS_RES response from a peer device and passes it to the digital stack. 3 - If the SENS_RES response identifies a type 1 tag, detection ends. NFC core is notified through nfc_targets_found(). 4 - Otherwise, the digital stack sets the cascade level of NFCID1 to CL1 and sends the SDD_REQ command. 5 - The digital stack selects SEL_CMD and SEL_PAR according to the cascade level and sends the SDD_REQ command. 6 - The digital stack receives a SDD_RES response for the cascade level passed in the SDD_REQ command. 7 - The digital stack analyses (part of) NFCID1 and verifies BCC. 8 - The digital stack sends the SEL_REQ command with the NFCID1 received in the SDD_RES. 9 - The peer device replies with a SEL_RES response. 10 - Detection ends if NFCID1 is complete. NFC core is notified of the new target by nfc_targets_found(). 11 - If NFCID1 is not complete, the cascade level is incremented (up to and including CL3) and the execution continues at step 5 to get the remaining bytes of NFCID1. Once target detection is done, type 1 and 2 tag commands must be handled by a user space application (e.g. neard) through the NFC core. Responses for type 1 tag are returned directly to user space via NFC core. Responses of type 2 commands are handled differently. The digital stack doesn't analyse the type of commands sent through im_transceive() and must differentiate valid responses from error ones. The response process flow is as follows: 1 - If the response length is 16 bytes, it is a valid response of a READ command.
The packet is returned to the NFC core through the callback passed to im_transceive(). Processing stops. 2 - If the response is 1 byte long and is an ACK byte (0x0A), it is a valid response of a WRITE command for example. First packet byte is set to 0 for no-error and passed back to the NFC core. Processing stops. 3 - Any other response is treated as an error and -EIO error code is returned to the NFC core through the response callback. Moreover, since the driver can't differentiate success response from a NACK response, the digital stack has to handle CRC calculation. Thus, this patch also adds support for CRC calculation. If the driver doesn't handle it, the digital stack will calculate CRC and will add it to sent frames. CRC will also be checked and removed from received frames. Pointers to the correct CRC calculation functions are stored in the digital stack device structure when a target is detected. This avoids the need to check the current target type for every call to im_transceive() and for every response received from a peer device. Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-09-25 02:02:23 +02:00
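The three-way classification of type 2 tag responses described above can be sketched as a single function. The function and macro names are hypothetical, and the sketch takes the lengths exactly as stated in the commit message (16 bytes for a READ response, 1 byte for an ACK); whether CRC bytes are counted at this point is not specified there and is not modelled here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_T2T_READ_RESP_LEN 16u
#define SKETCH_T2T_ACK           0x0Au

/*
 * Classify a type 2 tag response.
 * Returns 0 for a valid response and -EIO for anything else.
 * For an ACK, the first byte is rewritten to 0 to signal "no error".
 */
static int t2t_check_response(uint8_t *resp, size_t len)
{
    assert(resp != NULL);

    /* Case 1: a 16-byte response is a valid READ response. */
    if (len == SKETCH_T2T_READ_RESP_LEN)
        return 0;

    /* Case 2: a single ACK byte, for example after a WRITE command. */
    if (len == 1 && resp[0] == SKETCH_T2T_ACK) {
        resp[0] = 0;
        return 0;
    }

    /* Case 3: everything else, including a NACK byte, is an error. */
    return -EIO;
}
```

A caller would pass the payload on to its response callback when the function returns 0 and report the error code otherwise.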
Thierry Escande	59ee2361c9	NFC Digital: Implement driver commands mechanism This implements the mechanism used to send commands to the driver in initiator mode through in_send_cmd(). Commands are serialized and sent to the driver by using a work item on the system workqueue. Responses are handled asynchronously by another work item. Once the digital stack receives the response through the command_complete callback, the next command is sent to the driver. This also implements the polling mechanism. It's handled by a work item cycling on all supported protocols. The start poll command for a given protocol is sent to the driver using the mechanism described above. The process continues until a peer is discovered or stop_poll is called. This patch implements the poll function for NFC-A that sends a SENS_REQ command and waits for the SENS_RES response. Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-09-25 02:02:07 +02:00
Thierry Escande	4b10884eb4	NFC: Digital Protocol stack implementation This is the initial commit of the NFC Digital Protocol stack implementation. It offers an interface for devices that don't have an embedded NFC Digital protocol stack. The driver instantiates the digital stack by calling nfc_digital_allocate_device(). Within the nfc_digital_ops structure, the driver specifies a set of function pointers for driver operations. These functions must be implemented by the driver and are: in_configure_hw: Hardware configuration for RF technology and communication framing in initiator mode. This is a synchronous function. in_send_cmd: Initiator mode data exchange using RF technology and framing previously set with in_configure_hw. The peer response is returned through callback cb. If an io error occurs or the peer didn't reply within the specified timeout (ms), the error code is passed back through the resp pointer. This is an asynchronous function. tg_configure_hw: Hardware configuration for RF technology and communication framing in target mode. This is a synchronous function. tg_send_cmd: Target mode data exchange using RF technology and framing previously set with tg_configure_hw. The peer next command is returned through callback cb. If an io error occurs or the peer didn't reply within the specified timeout (ms), the error code is passed back through the resp pointer. This is an asynchronous function. tg_listen: Put the device in listen mode waiting for data from the peer device. This is an asynchronous function. tg_listen_mdaa: If supported, put the device in automatic listen mode with mode detection and automatic anti-collision. In this mode, the device automatically detects the RF technology and executes the anti-collision detection using the command responses specified in mdaa_params. The mdaa_params structure contains SENS_RES, NFCID1, and SEL_RES for 106A RF tech. NFCID2 and system code (sc) for 212F and 424F. 
The driver returns the NFC-DEP ATR_REQ command through cb. The digital stack deduces the RF tech by analyzing the SoD of the frame containing the ATR_REQ command. This is an asynchronous function. switch_rf: Turns device radio on or off. The stack does not explicitly call switch_rf to turn the radio on. A call to in\|tg_configure_hw must turn the device radio on. abort_cmd: Discard the last sent command. Then the driver registers itself against the digital stack by using nfc_digital_register_device() which in turn registers the digital stack against the NFC core layer. The digital stack implements common NFC operations like dev_up(), dev_down(), start_poll(), stop_poll(), etc. This patch is only a skeleton and NFC operations are just stubs. Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-09-25 01:35:42 +02:00
Samuel Ortiz	e29a9e2ae1	NFC: Set active target upon DEP up event reception As we can potentially get DEP up events without having sent a netlink command, we need to set the active target properly from dep_link_is_up. Spontaneous DEP up events can come from devices that detected an active p2p target. In that case there is no need to call the netlink DEP up command as the link is already up and running. Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-09-25 01:35:41 +02:00
Eric Lapuyade	fa544fff62	NFC: NCI: Simplify NCI SPI to become a simple framing/checking layer NCI SPI layer should not manage the nci dev, this is the job of the nci chipset driver. This layer should be limited to frame/deframe nci packets, and optionally check integrity (crc) and manage the ack/nak protocol. The NCI SPI must not be mixed up with an NCI dev. spi_[dev\|device] are therefore renamed to a simple spi for more clarity. The header and crc sizes are moved to nci.h so that drivers can use them to reserve space in outgoing skbs. nci_spi_send() is exported to be accessible by drivers. Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-09-25 01:35:41 +02:00
Eric Lapuyade	d593751129	NFC: NCI: Rename spi ndev -> nsdev and nci_dev -> ndev for consistency An hci dev is an hdev. An nci dev is an ndev. Calling an nci spi dev an ndev is misleading since it's not the same thing. The nci dev contained in the nci spi dev is also named inconsistently. Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-09-25 01:35:40 +02:00
Eric Lapuyade	079797c3b7	NFC: NCI: Fix wrong allocation size in nci_spi_allocate_device() Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-09-25 01:35:40 +02:00
Arron Wang	d8eb18eeca	NFC: Export nfc_find_se() This will be needed by all NFC drivers implementing the SE ops. Signed-off-by: Arron Wang <arron.wang@intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-09-25 01:35:39 +02:00
Peter Senna Tschudin	941247f910	Bluetooth: Fix assignment of 0/1 to bool variables Convert 0 to false and 1 to true when assigning values to bool variables. Inspired by commit `3db1cd5c05`. The simplified semantic patch that finds this problem is as follows (http://coccinelle.lip6.fr/): @@ bool b; @@ ( -b = 0 +b = false \| -b = 1 +b = true ) Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-22 16:59:14 -05:00
Gianluca Anzolin	29cd718beb	Bluetooth: don't release the port in rfcomm_dev_state_change() When the dlc is closed, rfcomm_dev_state_change() tries to release the port in the case it cannot get a reference to the tty. However this is racy and not even needed. In fact, as Peter Hurley points out: 1. Only consider dlcs that are 'stolen' from a connected socket, ie. reused. Allocated dlcs cannot have been closed prior to port activate and so for these dlcs a tty reference will always be avail in rfcomm_dev_state_change() -- except for the conditions covered by #2b below. 2. If a tty was at some point previously created for this rfcomm, then either (a) the tty reference is still avail, so rfcomm_dev_state_change() will perform a hangup. So nothing to do, or, (b) the tty reference is no longer avail, and the tty_port will be destroyed by the last tty_port_put() in rfcomm_tty_cleanup. Again, no action required. 3. Prior to obtaining the dlc lock in rfcomm_dev_add(), rfcomm_dev_state_change() will not 'see' a rfcomm_dev so nothing to do here. 4. After releasing the dlc lock in rfcomm_dev_add(), rfcomm_dev_state_change() will 'see' an incomplete rfcomm_dev if a tty reference could not be obtained. Again, the best thing to do here is nothing. Any future attempted open() will block on rfcomm_dev_carrier_raised(). The unconnected device will exist until released by ioctl(RFCOMMRELEASEDEV). The patch removes the aforementioned code and uses the tty_port_tty_hangup() helper to hang up the tty. Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-20 14:17:54 -05:00
Linus Torvalds	b75ff5e84b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) If the local_df boolean is set on an SKB we have to allocate a unique ID even if IP_DF is set in the ipv4 headers, from Ansis Atteka. 2) Some fixups for the new chipset support that went into the sfc driver, from Ben Hutchings. 3) Because SCTP bypasses a good chunk of, and actually duplicates, the logic of the ipv6 output path, some IPSEC things don't get done properly. Integrate SCTP better into the ipv6 output path so that these problems are fixed and such issues don't get missed in the future either. From Daniel Borkmann. 4) Fix skge regressions added by the DMA mapping error return checking added in v3.10, from Mikulas Patocka. 5) Kill some more IRQF_DISABLED references, from Michael Opdenacker. 6) Fix races and deadlocks in the bridging code, from Hong Zhiguo. 7) Fix error handling in tun_set_iff(), in particular don't leak resources. From Jason Wang. 8) Prevent format-string injection into xen-netback driver, from Kees Cook. 9) Fix regression added to netpoll ARP packet handling, in particular check for the right ETH_P_ARP protocol code. From Sonic Zhang. 10) Try to deal with AMD IOMMU errors when using r8169 chips, from Francois Romieu. 11) Cure freezes due to recent changes in the rt2x00 wireless driver, from Stanislaw Gruszka. 12) Don't do SPI transfers (which can sleep) in interrupt context in cw1200 driver, from Solomon Peachy. 13) Fix LEDs handling bug in 5720 tg3 chips already handled for 5719. From Nithin Sujir. 14) Make xen_netbk_count_skb_slots() count the actual number of slots that will be used, taking into consideration packing and other issues that the transmit path will run into. From David Vrabel. 15) Use the correct maximum age when calculating the bridge message_age_timer, from Chris Healy. 16) Get rid of memory leaks in mcs7780 IRDA driver, from Alexey Khoroshilov. 
17) Netfilter conntrack extensions were converted to RCU but are not always freed properly using kfree_rcu(). Fix from Michal Kubecek. 18) VF reset recovery not being done correctly in qlcnic driver, from Manish Chopra. 19) Fix inverted test in ATM nicstar driver, from Andy Shevchenko. 20) Missing workqueue destroy in cxgb4 error handling, from Wei Yang. 21) Internal switch not initialized properly in bgmac driver, from Rafał Miłecki. 22) Netlink messages report wrong local and remote addresses in IPv6 tunneling, from Ding Zhi. 23) ICMP redirects should not generate socket errors in DCCP and SCTP. We're still working out how this should be handled for RAW and UDP sockets. From Daniel Borkmann and Duan Jiong. 24) We've had several bugs wherein the network namespace's loopback device gets accessed after it is free'd, NULL it out so that we can catch these problems more readily. From Eric W Biederman. 25) Fix regression in TCP RTO calculations, from Neal Cardwell. 26) Fix too early free of xen-netback network device when VIFs still exist. From Paul Durrant. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits) netconsole: fix a deadlock with rtnl and netconsole's mutex netpoll: fix NULL pointer dereference in netpoll_cleanup skge: fix broken driver ip: generate unique IP identificator if local fragmentation is allowed ip: use ip_hdr() in __ip_make_skb() to retrieve IP header xen-netback: Don't destroy the netdev until the vif is shut down net:dccp: do not report ICMP redirects to user space cnic: Fix crash in cnic_bnx2x_service_kcq() bnx2x, cnic, bnx2i, bnx2fc: Fix bnx2i and bnx2fc regressions. 
vxlan: Avoid creating fdb entry with NULL destination tcp: fix RTO calculated from cached RTT drivers: net: phy: cicada.c: clears warning Use #include <linux/io.h> instead of <asm/io.h> net loopback: Set loopback_dev to NULL when freed batman-adv: set the TAG flag for the vid passed to BLA netfilter: nfnetlink_queue: use network skb for sequence adjustment net: sctp: rfc4443: do not report ICMP redirects to user space net: usb: cdc_ether: use usb.h macros whenever possible net: usb: cdc_ether: fix checkpatch errors and warnings net: usb: cdc_ether: Use wwan interface for Telit modules ip6_tunnels: raddr and laddr are inverted in nl msg ...	2013-09-19 13:57:28 -05:00
Nikolay Aleksandrov	d0fe8c888b	netpoll: fix NULL pointer dereference in netpoll_cleanup I've been hitting a NULL ptr deref while using netconsole because the np->dev check and the pointer manipulation in netpoll_cleanup are done without rtnl and the following sequence happens when having a netconsole over a vlan and we remove the vlan while disabling the netconsole: [CPU 2] removes vlan and calls the notifier; [CPU 1] enters store_enabled(), calls netdev_cleanup which checks np->dev and then waits for rtnl; [CPU 2] executes the netconsole netdev release notifier making np->dev == NULL and releases rtnl; [CPU 1] continues to dereference a member of np->dev which at this point is == NULL. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-19 14:15:53 -04:00
Ansis Atteka	703133de33	ip: generate unique IP identificator if local fragmentation is allowed If local fragmentation is allowed, then ip_select_ident() and ip_select_ident_more() need to generate unique IDs to ensure correct defragmentation on the peer. For example, if IPsec (tunnel mode) has to encrypt large skbs that have local_df bit set, then all IP fragments that belonged to different ESP datagrams would have used the same identificator. If one of these IP fragments would get lost or reordered, then peer could possibly stitch together wrong IP fragments that did not belong to the same datagram. This would lead to a packet loss or data corruption. Signed-off-by: Ansis Atteka <aatteka@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-19 14:11:15 -04:00
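The ID selection rule from the row above can be illustrated with a small user-space sketch. It is a simplification under stated assumptions, not the kernel's ip_select_ident(): `struct pkt`, `select_ident` and the global counter are hypothetical stand-ins, and returning 0 for packets that can never be fragmented is an assumption made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant skb / IPv4 header state. */
struct pkt {
	bool df;        /* IP_DF is set in the IPv4 header */
	bool local_df;  /* local fragmentation is allowed anyway */
	uint16_t id;    /* IP identification field */
};

static uint16_t next_id = 1; /* hypothetical ID source, never hands out 0 */

/*
 * A packet that can never be fragmented does not need a unique ID.
 * As soon as local fragmentation is allowed, every datagram must get
 * its own ID so the peer cannot mix fragments of different datagrams.
 */
static void select_ident(struct pkt *p)
{
	if (p->df && !p->local_df) {
		p->id = 0;
	} else {
		p->id = next_id++;
		if (next_id == 0)   /* skip 0 when the counter wraps */
			next_id = 1;
	}
}
```

With this rule, two large ESP datagrams that both have local_df set end up with different IDs, which is the property the fix is after.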
Ansis Atteka	749154aa56	ip: use ip_hdr() in __ip_make_skb() to retrieve IP header skb->data already points to IP header, but for the sake of consistency we can also use ip_hdr() to retrieve it. Signed-off-by: Ansis Atteka <aatteka@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-19 14:11:15 -04:00
Linus Torvalds	e9ff04dd94	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph fixes from Sage Weil: "These fix several bugs with RBD from 3.11 that didn't get tested in time for the merge window: some error handling, a use-after-free, and a sequencing issue when unmapping an image races with a notify operation. There is also a patch fixing a problem with the new ceph + fscache code that just went in" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: fscache: check consistency does not decrement refcount rbd: fix error handling from rbd_snap_name() rbd: ignore unmapped snapshots that no longer exist rbd: fix use-after free of rbd_dev->disk rbd: make rbd_obj_notify_ack() synchronous rbd: complete notifies before cleaning up osd_client and rbd_dev libceph: add function to ensure notifies are complete	2013-09-19 12:50:37 -05:00
Johan Hedberg	d62e6d67a7	Bluetooth: Add event mask page 2 setting support For those controllers that support the HCI_Set_Event_Mask_Page_2 command we should include it in the init sequence. This patch implements sending of the command and enables the events in it based on supported features (currently only CSB is checked). Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-19 10:21:44 -05:00
Johan Hedberg	5d4e7e8db0	Bluetooth: Add synchronization train parameters reading support This patch adds support for reading the synchronization train parameters for controllers that support the feature. Since the feature is detectable through the local features page 2, which is retrieved only in stage 3 of the HCI init sequence, there is no other option than to add a fourth stage to the init sequence. For now the patch doesn't yet add storing of the parameters, but it is nevertheless convenient to have around to see what kind of parameters various controllers use by default (analyzable e.g. with the btmon user space tool). Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-19 10:20:07 -05:00
Johan Hedberg	e793dcf082	Bluetooth: Fix waiting for clearing of BT_SK_SUSPEND flag In the case of blocking sockets we should not proceed with sendmsg() if the socket has the BT_SK_SUSPEND flag set. So far the code was only ensuring that POLLOUT doesn't get set for non-blocking sockets using poll() but there was no code in place to ensure that blocking sockets do the right thing when writing to them. This patch adds a new bt_sock_wait_ready helper function to sleep in the sendmsg call if the BT_SK_SUSPEND flag is set, and wake up as soon as it is unset. It also updates the L2CAP and RFCOMM sendmsg callbacks to take advantage of this new helper function. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-18 17:02:59 -05:00
Johan Hedberg	69c4e4e8b4	Bluetooth: Fix responding to invalid L2CAP signaling commands When we have an LE link we should not respond to any data on the BR/EDR L2CAP signaling channel (0x0001) and vice-versa when we have a BR/EDR link we should not respond to LE L2CAP (CID 0x0005) signaling commands. This patch fixes this issue by checking for a valid link type and ignores data if it is wrong. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-18 16:50:53 -05:00
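The link-type check described in the row above can be sketched as follows. This is a user-space illustration only: the enum and the helper `sig_channel_valid` are hypothetical names, while the two channel IDs are the ones quoted in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

#define CID_SIGNALING    0x0001 /* BR/EDR signaling channel, per the commit text */
#define CID_LE_SIGNALING 0x0005 /* LE signaling channel, per the commit text */

/* Hypothetical link type tag; the kernel uses its own constants. */
enum link_type { LINK_BREDR, LINK_LE };

/*
 * Return true if data on signaling channel `cid` should be processed
 * for a connection of the given link type, false if it must be ignored.
 */
static bool sig_channel_valid(enum link_type link, uint16_t cid)
{
	switch (cid) {
	case CID_SIGNALING:
		return link == LINK_BREDR;
	case CID_LE_SIGNALING:
		return link == LINK_LE;
	default:
		/* Not a signaling channel; outside the scope of this check. */
		return true;
	}
}
```

A receive path would call such a check before dispatching the signaling command and silently drop the data when it returns false.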
Johan Hedberg	9245e73758	Bluetooth: Fix sending responses to identified L2CAP response packets When L2CAP packets return a non-zero error and the value is passed onwards by l2cap_bredr_sig_cmd this will trigger a command reject packet to be sent. However, the core specification (page 1416 in core 4.0) says the following: "Command Reject packets should not be sent in response to an identified Response packet.". This patch ensures that a command reject packet is not sent for any identified response packet by ignoring the error return value from the response handler functions. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-18 16:48:32 -05:00
Johan Hedberg	7c2005d6f9	Bluetooth: Fix L2CAP command reject reason There are several possible reason codes that can be sent in the command reject L2CAP packet. Before this patch the code has used a hard-coded single response code ("command not understood"). This patch adds a helper function to map the return value of an L2CAP handler function to the correct command reject reason. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-18 16:45:28 -05:00
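A helper that maps a handler's return value to a Command Reject reason, as described in the row above, might look like the sketch below. The function and macro names are hypothetical. The reason code values follow the Bluetooth Core specification, the use of -EBADSLT for a failed CID lookup comes from a neighbouring commit in this log, and the -EMSGSIZE mapping is an assumption made for the example. EBADSLT is a Linux errno value, so the sketch assumes a Linux libc.

```c
#include <errno.h>
#include <stdint.h>

/* L2CAP Command Reject reason codes (Bluetooth Core specification). */
#define REJ_NOT_UNDERSTOOD 0x0000
#define REJ_MTU_EXCEEDED   0x0001
#define REJ_INVALID_CID    0x0002

/*
 * Hypothetical mapping helper: translate the negative errno returned by
 * a signaling command handler into a reject reason.
 */
static uint16_t err_to_reason(int err)
{
	switch (err) {
	case -EBADSLT:
		/* Failed channel lookup on the given CID. */
		return REJ_INVALID_CID;
	case -EMSGSIZE:
		/* Assumed here to mean "signaling MTU exceeded". */
		return REJ_MTU_EXCEEDED;
	default:
		/* The previously hard-coded answer stays the fallback. */
		return REJ_NOT_UNDERSTOOD;
	}
}
```

Keeping "command not understood" as the default preserves the old behaviour for every error the helper does not know about.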
Johan Hedberg	c4ea249f5f	Bluetooth: Fix L2CAP Disconnect response for unknown CID If we receive an L2CAP Disconnect Request for an unknown CID we should not just silently drop it but reply with a proper Command Reject response. This patch fixes this by ensuring that the disconnect handler returns a proper error instead of 0 and will cause the function caller to send the right response. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-18 16:44:32 -05:00
Johan Hedberg	21870b523e	Bluetooth: Fix L2CAP error return used for failed channel lookups The EFAULT error should only be used for memory address related errors and ENOENT might be needed for other purposes than invalid CID errors. This patch fixes the l2cap_config_req, l2cap_connect_create_rsp and l2cap_create_channel_req handlers to use the unique EBADSLT error to indicate failed lookups on a given CID. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-18 16:43:40 -05:00
Johan Hedberg	dc280801da	Bluetooth: Fix double error response for l2cap_create_chan_req When an L2CAP request handler returns non-zero the calling code will send a command reject response. The l2cap_create_chan_req function will in some cases send its own response but then still return a -EFAULT error which would cause two responses to be sent. This patch fixes this by making the function return 0 after sending its own response. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-18 16:41:07 -05:00
Johan Hedberg	bf5430360e	Bluetooth: Fix rfkill functionality during the HCI setup stage We need to let the setup stage complete cleanly even when the HCI device is rfkilled. Otherwise the HCI device will stay in an undefined state and never get notified to user space through mgmt (even when it gets unblocked through rfkill). This patch makes sure that hci_dev_open() can be called in the HCI_SETUP stage, that blocking the device doesn't abort the setup stage, and that the device gets properly powered down as soon as the setup stage completes in case it was blocked meanwhile. The bug that this patch fixes can be very easily reproduced using e.g. the rfkill command line tool. By running "rfkill block all" before inserting a Bluetooth dongle the resulting HCI device goes into a state where it is never announced over mgmt, not even when "rfkill unblock all" is run. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Cc: stable@vger.kernel.org Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-18 12:39:23 -05:00
Johan Hedberg	5e130367d4	Bluetooth: Introduce a new HCI_RFKILLED flag This makes it more convenient to check for rfkill (no need to check for dev->rfkill before calling rfkill_blocked()) and also avoids potential races if the RFKILL state needs to be checked from within the rfkill callback. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Cc: stable@vger.kernel.org Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-18 12:37:27 -05:00
Duan Jiong	bd784a1407	net:dccp: do not report ICMP redirects to user space DCCP shouldn't be setting sk_err on redirects as it isn't an error condition. It should be doing exactly what tcp is doing and leaving the error handler without touching the socket. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-18 12:33:44 -04:00
David S. Miller	4bdc944729	Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge Included change: - fix the Bridge Loop Avoidance component by marking the variables containing the VLAN ID with the HAS_TAG flag when needed.	2013-09-18 12:22:17 -04:00
David S. Miller	61c5923a2f	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== The following patchset contains Netfilter fixes for your net tree, mostly targeted to ipset, they are: * Fix ICMPv6 NAT due to wrong comparison, code instead of type, from Phil Oester. * Fix RCU race in conntrack extensions release path, from Michal Kubecek. * Fix missing inversion in the userspace ipset test command match if the nomatch option is specified, from Jozsef Kadlecsik. * Skip layer 4 protocol matching in ipset in case of IPv6 fragments, also from Jozsef Kadlecsik. * Fix sequence adjustment in nfnetlink_queue due to using the netlink skb instead of the network skb, from Gao feng. * Make sure we cannot swap sets with different layer 3 family in ipset, from Jozsef Kadlecsik. * Fix possible bogus matching in ipset if hash sets with net elements are used, from Oliver Smith. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-17 20:22:53 -04:00
Neal Cardwell	269aa759b4	tcp: fix RTO calculated from cached RTT Commit `1b7fdd2ab5` ("tcp: do not use cached RTT for RTT estimation") did not correctly account for the fact that crtt is the RTT shifted left 3 bits. Fix the calculation to consistently reflect this fact. Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-By: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-17 19:08:08 -04:00
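The scaling issue from the row above comes down to simple arithmetic: a value stored as RTT << 3 has to be shifted back before it enters the RTO formula. The sketch below is an illustration, not the kernel code. The helper names are hypothetical, and the formula rtt + max(2 * rtt, rto_min) is an assumption about how an initial RTO is derived from a single RTT sample.

```c
#include <stdint.h>

static uint32_t max_u32(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

/*
 * Hypothetical helper: derive an RTO from a cached RTT.
 *   crtt_scaled: cached RTT stored as (RTT << 3), as the commit text says
 *   rto_min:     lower bound for the variance term, same unit as the RTT
 */
static uint32_t rto_from_cached_rtt(uint32_t crtt_scaled, uint32_t rto_min)
{
	uint32_t rtt = crtt_scaled >> 3; /* undo the scaling first */

	return rtt + max_u32(2 * rtt, rto_min);
}
```

For a cached value of 800 (an RTT of 100 units) and an rto_min of 200, this gives 100 + max(200, 200) = 300. Feeding the unshifted 800 into the same formula would give 800 + 1600 = 2400, eight times too large.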
Antonio Quartulli	4c18c425b2	batman-adv: set the TAG flag for the vid passed to BLA When receiving or sending a packet on a VLAN, the vid has to be marked with the TAG flag in order to make any component in batman-adv understand that the packet is coming from a really tagged network. This fixes the Bridge Loop Avoidance behaviour which was not able to send announces over VLAN interfaces. Introduced by 0b1da1765fdb00ca5d53bc95c9abc70dfc9aae5b ("batman-adv: change VID semantic in the BLA code") Signed-off-by: Antonio Quartulli <antonio@open-mesh.org> Acked-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2013-09-17 21:15:16 +02:00
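Marking a VLAN ID with a "has tag" flag, as the row above describes, can be sketched like this. The macro and function names are hypothetical, and the choice of bit 15 for the flag is an assumption made for the example; a 12-bit VID leaves the upper bits of a 16-bit value free for such a marker.

```c
#include <stdbool.h>
#include <stdint.h>

#define VLAN_VID_MASK 0x0fffu     /* 12-bit VLAN ID */
#define VLAN_HAS_TAG  (1u << 15)  /* assumed flag bit, for illustration only */

/* Return the vid with the flag set, telling consumers the frame was tagged. */
static uint16_t vid_mark_tagged(uint16_t vid)
{
	return (uint16_t)((vid & VLAN_VID_MASK) | VLAN_HAS_TAG);
}

/* True if the value carries the flag, i.e. it came from a tagged network. */
static bool vid_is_tagged(uint16_t vid)
{
	return (vid & VLAN_HAS_TAG) != 0;
}
```

A component that receives a bare VID without the flag cannot tell it apart from an untagged frame, which matches the failure described in the commit message.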
Gao feng	0a0d80eb39	netfilter: nfnetlink_queue: use network skb for sequence adjustment Instead of the netlink skb. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-09-17 13:05:12 +02:00
Daniel Borkmann	3f96a53211	net: sctp: rfc4443: do not report ICMP redirects to user space Adapt the same behaviour for SCTP as present in TCP for ICMP redirect messages. For IPv6, RFC4443, section 2.4. says: ... (e) An ICMPv6 error message MUST NOT be originated as a result of receiving the following: ... (e.2) An ICMPv6 redirect message [IPv6-DISC]. ... Therefore, do not report an error to user space, just invoke dst's redirect callback and leave, same for IPv4 as done in TCP as well. The implication w/o having this patch could be that the reception of such packets would generate a poll notification and in worst case it could even tear down the whole connection. Therefore, stop updating sk_err on redirects. Reported-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Suggested-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-16 21:40:15 -04:00
Ding Zhi	0d2ede929f	ip6_tunnels: raddr and laddr are inverted in nl msg IFLA_IPTUN_LOCAL and IFLA_IPTUN_REMOTE were inverted. Introduced by `c075b13098` (ip6tnl: advertise tunnel param via rtnl). Signed-off-by: Ding Zhi <zhi.ding@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-16 21:36:12 -04:00
Oliver Smith	2cf55125c6	netfilter: ipset: Fix serious failure in CIDR tracking This fixes a serious bug affecting all hash types with a net element - specifically, if a CIDR value is deleted such that none of the same size exist any more, all larger (less-specific) values will then fail to match. Adding back any prefix with a CIDR equal to or more specific than the one deleted will fix it. Steps to reproduce: ipset -N test hash:net ipset -A test 1.1.0.0/16 ipset -A test 2.2.2.0/24 ipset -T test 1.1.1.1 #1.1.1.1 IS in set ipset -D test 2.2.2.0/24 ipset -T test 1.1.1.1 #1.1.1.1 IS NOT in set This is due to the fact that the nets counter was unconditionally decremented prior to the iteration that shifts up the entries. Now, we first check if there is a proceeding entry and if not, decrement it and return. Otherwise, we proceed to iterate and then zero the last element, which, in most cases, will already be zero. Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-16 20:36:09 +02:00
Jozsef Kadlecsik	169faa2e19	netfilter: ipset: Validate the set family and not the set type family at swapping This closes netfilter bugzilla #843, reported by Quentin Armitage. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-16 20:36:05 +02:00
Jozsef Kadlecsik	0f1799ba1a	netfilter: ipset: Consistent userspace testing with nomatch flag The "nomatch" commandline flag should invert the matching at testing, similarly to the --return-nomatch flag of the "set" match of iptables. Until now it worked with the elements with "nomatch" flag only. From now on it works with elements without the flag too, i.e: # ipset n test hash:net # ipset a test 10.0.0.0/24 nomatch # ipset t test 10.0.0.1 10.0.0.1 is NOT in set test. # ipset t test 10.0.0.1 nomatch 10.0.0.1 is in set test. # ipset a test 192.168.0.0/24 # ipset t test 192.168.0.1 192.168.0.1 is in set test. # ipset t test 192.168.0.1 nomatch 192.168.0.1 is NOT in set test. Before the patch the results were ... # ipset t test 192.168.0.1 192.168.0.1 is in set test. # ipset t test 192.168.0.1 nomatch 192.168.0.1 is in set test. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-16 20:35:55 +02:00
Jozsef Kadlecsik	55524c219a	netfilter: ipset: Skip really non-first fragments for IPv6 when getting port/protocol Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-16 20:33:44 +02:00
Syam Sidhardhan	330b6c1521	Bluetooth: Fix ACL alive for long in case of non-pairable devices For certain devices (e.g. HID mouse), support for authentication, pairing and bonding is optional. For such devices, the ACL stays alive for too long after the L2CAP disconnection. To avoid this, reset the ACL disconnect timeout back to HCI_DISCONN_TIMEOUT during L2CAP connect. This issue might have been introduced while merging commit a9ea3ed9b71cc3271dd59e76f65748adcaa76422. Hcidump info: sh-4.1# /opt/hcidump -Xt 2013-08-05 16:49:00.894129 < ACL data: handle 12 flags 0x00 dlen 12 L2CAP(s): Disconn req: dcid 0x004a scid 0x0041 2013-08-05 16:49:00.894195 < HCI Command: Exit Sniff Mode (0x02\|0x0004) plen 2 handle 12 2013-08-05 16:49:00.894269 < ACL data: handle 12 flags 0x00 dlen 12 L2CAP(s): Disconn req: dcid 0x0049 scid 0x0040 2013-08-05 16:49:00.895645 > HCI Event: Command Status (0x0f) plen 4 Exit Sniff Mode (0x02\|0x0004) status 0x00 ncmd 1 2013-08-05 16:49:00.934391 > HCI Event: Mode Change (0x14) plen 6 status 0x00 handle 12 mode 0x00 interval 0 Mode: Active 2013-08-05 16:49:00.936592 > HCI Event: Number of Completed Packets (0x13) plen 5 handle 12 packets 2 2013-08-05 16:49:00.951577 > ACL data: handle 12 flags 0x02 dlen 12 L2CAP(s): Disconn rsp: dcid 0x004a scid 0x0041 2013-08-05 16:49:00.952820 > ACL data: handle 12 flags 0x02 dlen 12 L2CAP(s): Disconn rsp: dcid 0x0049 scid 0x0040 2013-08-05 16:49:00.969165 > HCI Event: Mode Change (0x14) plen 6 status 0x00 handle 12 mode 0x02 interval 50 Mode: Sniff 2013-08-05 16:49:48.175533 > HCI Event: Mode Change (0x14) plen 6 status 0x00 handle 12 mode 0x00 interval 0 Mode: Active 2013-08-05 16:49:48.219045 > HCI Event: Mode Change (0x14) plen 6 status 0x00 handle 12 mode 0x02 interval 108 Mode: Sniff 2013-08-05 16:51:00.968209 < HCI Command: Disconnect (0x01\|0x0006) plen 3 handle 12 reason 0x13 Reason: Remote User Terminated Connection 2013-08-05 16:51:00.969056 > HCI Event: 
Command Status (0x0f) plen 4 Disconnect (0x01\|0x0006) status 0x00 ncmd 1 2013-08-05 16:51:01.013495 > HCI Event: Mode Change (0x14) plen 6 status 0x00 handle 12 mode 0x00 interval 0 Mode: Active 2013-08-05 16:51:01.073777 > HCI Event: Disconn Complete (0x05) plen 4 status 0x00 handle 12 reason 0x16 Reason: Connection Terminated by Local Host ============================ After fix ================================ 2013-08-05 16:57:35.986648 < ACL data: handle 11 flags 0x00 dlen 12 L2CAP(s): Disconn req: dcid 0x004c scid 0x0041 2013-08-05 16:57:35.986713 < HCI Command: Exit Sniff Mode (0x02\|0x0004) plen 2 handle 11 2013-08-05 16:57:35.986785 < ACL data: handle 11 flags 0x00 dlen 12 L2CAP(s): Disconn req: dcid 0x004b scid 0x0040 2013-08-05 16:57:35.988110 > HCI Event: Command Status (0x0f) plen 4 Exit Sniff Mode (0x02\|0x0004) status 0x00 ncmd 1 2013-08-05 16:57:36.030714 > HCI Event: Mode Change (0x14) plen 6 status 0x00 handle 11 mode 0x00 interval 0 Mode: Active 2013-08-05 16:57:36.032950 > HCI Event: Number of Completed Packets (0x13) plen 5 handle 11 packets 2 2013-08-05 16:57:36.047926 > ACL data: handle 11 flags 0x02 dlen 12 L2CAP(s): Disconn rsp: dcid 0x004c scid 0x0041 2013-08-05 16:57:36.049200 > ACL data: handle 11 flags 0x02 dlen 12 L2CAP(s): Disconn rsp: dcid 0x004b scid 0x0040 2013-08-05 16:57:36.065509 > HCI Event: Mode Change (0x14) plen 6 status 0x00 handle 11 mode 0x02 interval 50 Mode: Sniff 2013-08-05 16:57:40.052006 < HCI Command: Disconnect (0x01\|0x0006) plen 3 handle 11 reason 0x13 Reason: Remote User Terminated Connection 2013-08-05 16:57:40.052869 > HCI Event: Command Status (0x0f) plen 4 Disconnect (0x01\|0x0006) status 0x00 ncmd 1 2013-08-05 16:57:40.104731 > HCI Event: Mode Change (0x14) plen 6 status 0x00 handle 11 mode 0x00 interval 0 Mode: Active 2013-08-05 16:57:40.146935 > HCI Event: Disconn Complete (0x05) plen 4 status 0x00 handle 11 reason 0x16 Reason: Connection Terminated by Local Host Signed-off-by: Sang-Ki Park 
<sangki79.park@samsung.com> Signed-off-by: Chan-yeol Park <chanyeol.park@samsung.com> Signed-off-by: Jaganath Kanakkassery <jaganath.k@samsung.com> Signed-off-by: Szymon Janc <szymon.janc@tieto.com> Signed-off-by: Syam Sidhardhan <s.syam@samsung.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-16 14:41:02 -03:00
Andre Guedes	89cbb4da0a	Bluetooth: Fix encryption key size for peripheral role This patch fixes the connection encryption key size information when the host is playing the peripheral role. We should set conn->enc_key_size in hci_le_ltk_request_evt, otherwise it is left uninitialized. Cc: Stable <stable@vger.kernel.org> Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-16 14:36:56 -03:00
Andre Guedes	f8776218e8	Bluetooth: Fix security level for peripheral role While playing the peripheral role, the host gets a LE Long Term Key Request Event from the controller when a connection is established with a bonded device. The host then informs the LTK which should be used for the connection. Once the link is encrypted, the host gets an Encryption Change Event. Therefore we should set conn->pending_sec_level instead of conn->sec_level in hci_le_ltk_request_evt. This way, conn->sec_level is properly updated in hci_encrypt_change_evt. Moreover, since we have a LTK associated to the device, we have at least BT_SECURITY_MEDIUM security level. Cc: Stable <stable@vger.kernel.org> Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-16 14:36:55 -03:00
Marcel Holtmann	52de599e04	Bluetooth: Only schedule raw queue when user channel is active When the user channel is set and a user application has full control over the device, do not bother trying to schedule any queues except the raw queue. This is an optimization since with user channel, only the raw queue is in use. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Acked-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-16 14:35:56 -03:00
Marcel Holtmann	a675d7f1a0	Bluetooth: Use GFP_KERNEL when cloning SKB in a workqueue There is no need to use GFP_ATOMIC with skb_clone() when the code is executed in a workqueue. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Acked-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-16 14:35:56 -03:00
Marcel Holtmann	af750e942e	Bluetooth: Disable upper layer connections when user channel is active When the device has the user channel flag set, it means it is driven by a user application. In that case do not allow any connections from L2CAP or SCO sockets. This is the same situation as when the device has the raw flag set and it will then return EHOSTUNREACH. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Acked-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-16 14:35:56 -03:00
Marcel Holtmann	23500189d7	Bluetooth: Introduce new HCI socket channel for user operation This patch introduces a new HCI socket channel that allows user applications to take control over a specific HCI device. The application gains exclusive access to this device and forces the kernel to stay away and not manage it. In case of the management interface it will actually hide the device. Such operation is useful for security testing tools that need to operate underneath the Bluetooth stack and need full control over a device. The advantage here is that the kernel still provides the service of hardware abstraction and HCI level access. The use of Bluetooth drivers for hardware access also means that sniffing tools like btmon or hcidump are still working and the whole set of transactions can be traced with existing tools. With the new channel it is possible to send HCI commands, ACL and SCO data packets and receive HCI events, ACL and SCO packets from the device. The format follows the well established H:4 protocol. The new HCI user channel can only be established when a device has been through its setup routine and is currently powered down. This is enforced to not cause any problems with current operations. In addition only one user channel per HCI device is allowed. It is exclusive access for one user application. Access to this channel is limited to processes with the CAP_NET_RAW capability. Using this new facility does not require any external library or special ioctl or socket filters. Just create the socket and bind it. After that the file descriptor is ready to speak H:4 protocol. struct sockaddr_hci addr; int fd; fd = socket(AF_BLUETOOTH, SOCK_RAW, BTPROTO_HCI); memset(&addr, 0, sizeof(addr)); addr.hci_family = AF_BLUETOOTH; addr.hci_dev = 0; addr.hci_channel = HCI_CHANNEL_USER; bind(fd, (struct sockaddr *) &addr, sizeof(addr)); The example shows how to create a user channel for the hci0 device. Error handling has been left out of the example. However with the limitations mentioned above it is advised to handle errors. Binding of the user channel socket can fail for various reasons. Specifically if the device is currently activated by BlueZ or if the access permissions are not present. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-16 14:35:55 -03:00
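Commit 23500189d7 advises handling errors when binding the user channel. A minimal sketch of that advice follows, assuming a Linux userspace build. The function name `open_hci_user_channel` is hypothetical, and the `struct sockaddr_hci` layout and the `BTPROTO_HCI` / `HCI_CHANNEL_USER` values are declared locally to mirror the kernel and BlueZ headers so the block compiles without them; treat those local definitions as assumptions rather than the canonical declarations.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Local stand-ins (assumptions) for definitions normally taken from
 * <bluetooth/bluetooth.h> and <bluetooth/hci.h>. */
#ifndef AF_BLUETOOTH
#define AF_BLUETOOTH 31
#endif
#define BTPROTO_HCI      1
#define HCI_CHANNEL_USER 1

struct sockaddr_hci {
    sa_family_t    hci_family;
    unsigned short hci_dev;
    unsigned short hci_channel;
};

/* Hypothetical helper: returns a bound file descriptor, or -1 with errno set. */
int open_hci_user_channel(uint16_t dev_index)
{
    struct sockaddr_hci addr;
    int saved_errno;
    int fd;

    fd = socket(AF_BLUETOOTH, SOCK_RAW, BTPROTO_HCI);
    if (fd < 0) {
        saved_errno = errno;
        perror("socket(AF_BLUETOOTH)");
        errno = saved_errno;
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.hci_family = AF_BLUETOOTH;
    addr.hci_dev = dev_index;            /* 0 selects hci0 */
    addr.hci_channel = HCI_CHANNEL_USER;

    /* bind() can fail, for example when the device is powered up and
     * managed by BlueZ, or when the caller lacks CAP_NET_RAW. */
    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
        saved_errno = errno;
        perror("bind(HCI_CHANNEL_USER)");
        close(fd);
        errno = saved_errno;
        return -1;
    }

    return fd;
}
```

A caller would check for `-1`, inspect `errno` to tell a busy device from a permissions problem, and `close()` the descriptor when done to hand the device back to the kernel.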
Marcel Holtmann	0736cfa8e5	Bluetooth: Introduce user channel flag for HCI devices This patch introduces a new user channel flag that allows giving full control of an HCI device to a user application. The kernel will stay away from the device and does not allow any further modifications of the device states. The existing raw flag is not used since it has a bit of unclear meaning due to its legacy. Using a new flag makes the code clearer. A device with the user channel flag set can still be enumerated using the legacy API, but it no longer enumerates using the new management interface used by BlueZ 5 and beyond. This is intentional to not confuse users of modern systems. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-16 14:35:55 -03:00
Marcel Holtmann	c1c4f95670	Bluetooth: Restrict ioctls to HCI raw channel sockets The various legacy ioctls used with HCI sockets are limited to raw channel only. They are not used on the other channels and also have no meaning there. So return an error if tried to use them. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-16 14:35:55 -03:00
Marcel Holtmann	c2371e80b3	Bluetooth: Fix error handling for HCI socket options The HCI sockets for monitor and control do not support any HCI specific socket options and if tried, an error will be returned. However the error used is EINVAL and that is not really descriptive. To make it clear that these sockets are not handling HCI socket options, return EBADFD instead. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-16 14:35:55 -03:00
Marcel Holtmann	808a049e26	Bluetooth: Report error for HCI reset ioctl when device is down Even if this is legacy API, there is no reason to not report a proper error when trying to reset a HCI device that is down. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-16 14:35:55 -03:00
Marcel Holtmann	9d4b68b239	Bluetooth: Fix handling of getsockname() for HCI sockets The hci_dev check is not protected and so move it into the socket lock. In addition return the HCI channel identifier instead of always 0 channel. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-16 14:35:54 -03:00
Marcel Holtmann	06f43cbc4d	Bluetooth: Fix handling of getpeername() for HCI sockets The HCI sockets do not have a peer associated with it and so make sure that getpeername() returns EOPNOTSUPP since this operation is actually not supported on HCI sockets. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-16 14:35:54 -03:00
Marcel Holtmann	f81fe64f3d	Bluetooth: Refactor raw socket filter into more readable code The handling of the raw socket filter is rather obscure code and it gets in the way of future extensions. Instead of inline filtering in the raw socket packet routine, refactor it into its own function. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-16 14:35:54 -03:00
Hong Zhiguo	716ec052d2	bridge: fix NULL pointer deref of br_port_get_rcu The NULL deref happens when br_handle_frame is called between these 2 lines of del_nbp: dev->priv_flags &= ~IFF_BRIDGE_PORT; /* --> br_handle_frame is called at this time */ netdev_rx_handler_unregister(dev); In br_handle_frame the return of br_port_get_rcu(dev) is dereferenced without check but br_port_get_rcu(dev) returns NULL if: !(dev->priv_flags & IFF_BRIDGE_PORT) Eric Dumazet pointed out the testing of IFF_BRIDGE_PORT is not necessary here since we're in rcu_read_lock and we have synchronize_net() in netdev_rx_handler_unregister. So remove the testing of IFF_BRIDGE_PORT and by the previous patch, make sure br_port_get_rcu is called in bridging code. Signed-off-by: Hong Zhiguo <zhiguohong@tencent.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-15 22:03:33 -04:00
Hong Zhiguo	1fb1754a8c	bridge: use br_port_get_rtnl within rtnl lock current br_port_get_rcu is problematic in bridging path (NULL deref). Change these calls in netlink path first. Signed-off-by: Hong Zhiguo <zhiguohong@tencent.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-15 22:03:33 -04:00
Linus Torvalds	9bf12df31f	Merge git://git.kvack.org/~bcrl/aio-next Pull aio changes from Ben LaHaise: "First off, sorry for this pull request being late in the merge window. Al had raised a couple of concerns about 2 items in the series below. I addressed the first issue (the race introduced by Gu's use of mm_populate()), but he has not provided any further details on how he wants to rework the anon_inode.c changes (which were sent out months ago but have yet to be commented on). The bulk of the changes have been sitting in the -next tree for a few months, with all the issues raised being addressed" * git://git.kvack.org/~bcrl/aio-next: (22 commits) aio: rcu_read_lock protection for new rcu_dereference calls aio: fix race in ring buffer page lookup introduced by page migration support aio: fix rcu sparse warnings introduced by ioctx table lookup patch aio: remove unnecessary debugging from aio_free_ring() aio: table lookup: verify ctx pointer staging/lustre: kiocb->ki_left is removed aio: fix error handling and rcu usage in "convert the ioctx list to table lookup v3" aio: be defensive to ensure request batching is non-zero instead of BUG_ON() aio: convert the ioctx list to table lookup v3 aio: double aio_max_nr in calculations aio: Kill ki_dtor aio: Kill ki_users aio: Kill unneeded kiocb members aio: Kill aio_rw_vect_retry() aio: Don't use ctx->tail unnecessarily aio: io_cancel() no longer returns the io_event aio: percpu ioctx refcount aio: percpu reqs_available aio: reqs_active -> reqs_available aio: fix build when migration is disabled ...	2013-09-13 10:55:58 -07:00
Martin Schwidefsky	0244ad004a	Remove GENERIC_HARDIRQ config option After the last architecture switched to generic hard irqs the config options HAVE_GENERIC_HARDIRQS & GENERIC_HARDIRQS and the related code for !CONFIG_GENERIC_HARDIRQS can be removed. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2013-09-13 15:09:52 +02:00
Phil Oester	d830f0fa1d	netfilter: nf_nat_proto_icmpv6:: fix wrong comparison in icmpv6_manip_pkt In commit `58a317f1` (netfilter: ipv6: add IPv6 NAT support), icmpv6_manip_pkt was added with an incorrect comparison of ICMP codes to types. This causes problems when using NAT rules with the --random option. Correct the comparison. This closes netfilter bugzilla #851, reported by Alexander Neumann. Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-09-13 11:58:48 +02:00
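The mistake described in commit d830f0fa1d, comparing the ICMPv6 code field against type constants, can be illustrated in userspace. This is a sketch only: it uses the glibc `<netinet/icmp6.h>` names (`struct icmp6_hdr`, `ICMP6_ECHO_REQUEST`, `ICMP6_ECHO_REPLY`) rather than the kernel's, the two function names are hypothetical, and the exact shape of the kernel condition is an assumption drawn from the commit message.

```c
#include <netinet/icmp6.h>

/* Assumed shape of the faulty check: echo *type* constants are compared
 * against the *code* field, so real echo packets (code 0) never match. */
int is_echo_buggy(const struct icmp6_hdr *hdr)
{
    return hdr->icmp6_code == ICMP6_ECHO_REQUEST ||
           hdr->icmp6_code == ICMP6_ECHO_REPLY;
}

/* Corrected check: the message type decides whether this is an echo. */
int is_echo_fixed(const struct icmp6_hdr *hdr)
{
    return hdr->icmp6_type == ICMP6_ECHO_REQUEST ||
           hdr->icmp6_type == ICMP6_ECHO_REPLY;
}
```

An echo request carries type 128 and code 0, so only the second function recognises it, which is consistent with NAT handling of echo identifiers going wrong under the old comparison.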
Herbert Xu	be4f154d5e	bridge: Clamp forward_delay when enabling STP At some point limits were added to forward_delay. However, the limits are only enforced when STP is enabled. This created a scenario where you could have a value outside the allowed range while STP is disabled, which then stuck around even after STP is enabled. This patch fixes this by clamping the value when we enable STP. I had to move the locking around a bit to ensure that there is no window where someone could insert a value outside the range while we're in the middle of enabling STP. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-12 23:32:14 -04:00
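The clamping behaviour from commit be4f154d5e can be modelled in a few lines. This is a simplified userspace sketch, not the bridge code: `struct bridge_model`, both function names, and the limits of 2 and 30 (in seconds) are assumptions chosen for illustration, and the locking the commit mentions is left out.

```c
#include <errno.h>

/* Illustrative limits in seconds (assumed values for this sketch). */
#define MIN_FORWARD_DELAY 2u
#define MAX_FORWARD_DELAY 30u

/* Hypothetical model of the relevant bridge state. */
struct bridge_model {
    int stp_enabled;
    unsigned int forward_delay;
};

/* The range is only enforced while STP is enabled, as the commit describes. */
int set_forward_delay(struct bridge_model *br, unsigned int delay)
{
    if (br->stp_enabled &&
        (delay < MIN_FORWARD_DELAY || delay > MAX_FORWARD_DELAY))
        return -ERANGE;

    br->forward_delay = delay;
    return 0;
}

/* The fix: clamp any out-of-range value left over from the STP-off period. */
void enable_stp(struct bridge_model *br)
{
    if (br->forward_delay < MIN_FORWARD_DELAY)
        br->forward_delay = MIN_FORWARD_DELAY;
    else if (br->forward_delay > MAX_FORWARD_DELAY)
        br->forward_delay = MAX_FORWARD_DELAY;

    br->stp_enabled = 1;
}
```

Without the clamp in `enable_stp()`, a delay of 1 set while STP was off would survive into the STP-enabled state even though `set_forward_delay()` would now reject it.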
Chris Healy	9a0620133c	resubmit bridge: fix message_age_timer calculation This changes the message_age_timer calculation to use the BPDU's max age as opposed to the local bridge's max age. This is in accordance with section 8.6.2.3.2 Step 2 of the 802.1D-1998 specification. With the current implementation, when running with very large bridge diameters, convergence will not always occur even if a root bridge is configured to have a longer max age. Tested successfully on bridge diameters of ~200. Signed-off-by: Chris Healy <cphealy@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-12 23:30:37 -04:00
Linus Torvalds	ac4de9543a	Merge branch 'akpm' (patches from Andrew Morton) Merge more patches from Andrew Morton: "The rest of MM. Plus one misc cleanup" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (35 commits) mm/Kconfig: add MMU dependency for MIGRATION. kernel: replace strict_strto() with kstrto() mm, thp: count thp_fault_fallback anytime thp fault fails thp: consolidate code between handle_mm_fault() and do_huge_pmd_anonymous_page() thp: do_huge_pmd_anonymous_page() cleanup thp: move maybe_pmd_mkwrite() out of mk_huge_pmd() mm: cleanup add_to_page_cache_locked() thp: account anon transparent huge pages into NR_ANON_PAGES truncate: drop 'oldsize' truncate_pagecache() parameter mm: make lru_add_drain_all() selective memcg: document cgroup dirty/writeback memory statistics memcg: add per cgroup writeback pages accounting memcg: check for proper lock held in mem_cgroup_update_page_stat memcg: remove MEMCG_NR_FILE_MAPPED memcg: reduce function dereference memcg: avoid overflow caused by PAGE_ALIGN memcg: rename RESOURCE_MAX to RES_COUNTER_MAX memcg: correct RESOURCE_MAX to ULLONG_MAX mm: memcg: do not trap chargers with full callstack on OOM mm: memcg: rework and document OOM waiting and wakeup ...	2013-09-12 15:44:27 -07:00
Sha Zhengju	6de5a8bfca	memcg: rename RESOURCE_MAX to RES_COUNTER_MAX RESOURCE_MAX is far too general a name, change it to RES_COUNTER_MAX. Signed-off-by: Sha Zhengju <handai.szj@taobao.com> Signed-off-by: Qiang Huang <h.huangqiang@huawei.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Jeff Liu <jeff.liu@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-09-12 15:38:02 -07:00
Linus Torvalds	26935fb06e	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile 4 from Al Viro: "list_lru pile, mostly" This came out of Andrew's pile, Al ended up doing the merge work so that Andrew didn't have to. Additionally, a few fixes. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (42 commits) super: fix for destroy lrus list_lru: dynamically adjust node arrays shrinker: Kill old ->shrink API. shrinker: convert remaining shrinkers to count/scan API staging/lustre/libcfs: cleanup linux-mem.h staging/lustre/ptlrpc: convert to new shrinker API staging/lustre/obdclass: convert lu_object shrinker to count/scan API staging/lustre/ldlm: convert to shrinkers to count/scan API hugepage: convert huge zero page shrinker to new shrinker API i915: bail out earlier when shrinker cannot acquire mutex drivers: convert shrinkers to new count/scan API fs: convert fs shrinkers to new scan/count API xfs: fix dquot isolation hang xfs-convert-dquot-cache-lru-to-list_lru-fix xfs: convert dquot cache lru to list_lru xfs: rework buffer dispose list tracking xfs-convert-buftarg-lru-to-generic-code-fix xfs: convert buftarg LRU to generic code fs: convert inode and dentry shrinking to be node aware vmscan: per-node deferred work ...	2013-09-12 15:01:38 -07:00
Daniel Borkmann	95ee62083c	net: sctp: fix ipv6 ipsec encryption bug in sctp_v6_xmit Alan Chester reported an issue with IPv6 on SCTP that IPsec traffic is not being encrypted, whereas on IPv4 it is. Setting up an AH + ESP transport does not seem to have the desired effect: SCTP + IPv4: 22:14:20.809645 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto AH (51), length 116) 192.168.0.2 > 192.168.0.5: AH(spi=0x00000042,sumlen=16,seq=0x1): ESP(spi=0x00000044,seq=0x1), length 72 22:14:20.813270 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto AH (51), length 340) 192.168.0.5 > 192.168.0.2: AH(spi=0x00000043,sumlen=16,seq=0x1): SCTP + IPv6: 22:31:19.215029 IP6 (class 0x02, hlim 64, next-header SCTP (132) payload length: 364) fe80::222:15ff:fe87:7fc.3333 > fe80::92e6:baff:fe0d:5a54.36767: sctp 1) [INIT ACK] [init tag: 747759530] [rwnd: 62464] [OS: 10] [MIS: 10] Moreover, Alan says: This problem was seen with both Racoon and Racoon2. Other people have seen this with OpenSwan. When IPsec is configured to encrypt all upper layer protocols the SCTP connection does not initialize. After using Wireshark to follow packets, this is because the SCTP packet leaves Box A unencrypted and Box B believes all upper layer protocols are to be encrypted so it drops this packet, causing the SCTP connection to fail to initialize. When IPsec is configured to encrypt just SCTP, the SCTP packets are observed unencrypted. In fact, using `socat sctp6-listen:3333 -` on one end and transferring "plaintext" string on the other end, results in cleartext on the wire where SCTP eventually does not report any errors, thus in the latter case that Alan reports, the non-paranoid user might think he's communicating over an encrypted transport on SCTP although he's not (tcpdump ... -X): ... 0x0030: 5d70 8e1a 0003 001a 177d eb6c 0000 0000 ]p.......}.l.... 0x0040: 0000 0000 706c 6169 6e74 6578 740a 0000 ....plaintext... 
Only in /proc/net/xfrm_stat we can see XfrmInTmplMismatch increasing on the receiver side. Initial follow-up analysis from Alan's bug report was done by Alexey Dobriyan. Also thanks to Vlad Yasevich for feedback on this. SCTP has its own implementation of sctp_v6_xmit() not calling inet6_csk_xmit(). This has the implication that it probably never really got updated along with changes in inet6_csk_xmit() and therefore does not seem to invoke xfrm handlers. SCTP's IPv4 xmit however, properly calls ip_queue_xmit() to do the work. Since a call to inet6_csk_xmit() would solve this problem, but result in unnecessary route lookups, let us just use the cached flowi6 instead that we got through sctp_v6_get_dst(). Since all SCTP packets are being sent through sctp_packet_transmit(), we do the route lookup / flow caching in sctp_transport_route(), hold it in tp->dst and skb_dst_set() right after that. If we would alter fl6->daddr in sctp_v6_xmit() to np->opt->srcrt, we possibly could run into the same effect of not having xfrm layer pick it up, hence, use fl6_update_dst() in sctp_v6_get_dst() instead to get the correct source routed dst entry, which we assign to the skb. Also source address routing example from `625034113` ("sctp: fix sctp to work with ipv6 source address routing") still works with this patch! Nevertheless, in RFC5095 it is actually 'recommended' to not use that anyway due to traffic amplification [1]. So it seems we're not supposed to do that anyway in sctp_v6_xmit(). Moreover, if we overwrite the flow destination here, the lower IPv6 layer will be unable to put the correct destination address into IP header, as routing header is added in ipv6_push_nfrag_opts() but then probably with wrong final destination. 
Things aside, result of this patch is that we do not have any XfrmInTmplMismatch increase plus on the wire with this patch it now looks like: SCTP + IPv6: 08:17:47.074080 IP6 2620:52:0:102f:7a2b:cbff:fe27:1b0a > 2620:52:0:102f:213:72ff:fe32:7eba: AH(spi=0x00005fb4,seq=0x1): ESP(spi=0x00005fb5,seq=0x1), length 72 08:17:47.074264 IP6 2620:52:0:102f:213:72ff:fe32:7eba > 2620:52:0:102f:7a2b:cbff:fe27:1b0a: AH(spi=0x00003d54,seq=0x1): ESP(spi=0x00003d55,seq=0x1), length 296 This fixes Kernel Bugzilla 24412. This security issue seems to be present since 2.6.18 kernels. Let's just hope some big passive adversary in the wild didn't have its fun with that. lksctp-tools IPv6 regression test suite passes as well with this patch. [1] http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf Reported-by: Alan Chester <alan.chester@tekelec.com> Reported-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-12 17:24:43 -04:00
Sonic Zhang	b0dd663b60	netpoll: Should handle ETH_P_ARP other than ETH_P_IP in netpoll_neigh_reply The received ARP request type in the Ethernet packet header is ETH_P_ARP rather than ETH_P_IP. [ Bug introduced by commit `b7394d2429` ("netpoll: prepare for ipv6") ] Signed-off-by: Sonic Zhang <sonic.zhang@analog.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-12 17:19:14 -04:00
Linus Torvalds	1d7b24ff33	Merge tag 'nfs-for-3.12-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client bugfixes (part 2) from Trond Myklebust: "Bugfixes: - Fix a few credential reference leaks resulting from the SP4_MACH_CRED NFSv4.1 state protection code. - Fix the SUNRPC bloatometer footprint: convert a 256K hashtable into the intended 64 byte structure. 
- Fix a long standing XDR issue with FREE_STATEID - Fix a potential WARN_ON spamming issue - Fix a missing dprintk() kuid conversion New features: - Enable the NFSv4.1 state protection support for the WRITE and COMMIT operations" * tag 'nfs-for-3.12-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: No, I did not intend to create a 256KiB hashtable sunrpc: Add missing kuids conversion for printing NFSv4.1: sp4_mach_cred: WARN_ON -> WARN_ON_ONCE NFSv4.1: sp4_mach_cred: no need to ref count creds NFSv4.1: fix SECINFO* use of put_rpccred NFSv4.1: sp4_mach_cred: ask for WRITE and COMMIT NFSv4.1 fix decode_free_stateid	2013-09-12 13:39:34 -07:00
Trond Myklebust	23c323af03	SUNRPC: No, I did not intend to create a 256KiB hashtable Fix the declaration of the gss_auth_hash_table so that it creates a 16 bucket hashtable, as I had intended. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-12 10:16:31 -04:00
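The size mix-up behind commit 23c323af03 follows from the kernel hashtable macros taking a bit count rather than a bucket count. The arithmetic can be sketched in userspace as below. `struct hlist_head_model`, `DECLARE_HASHTABLE_MODEL`, and the helper functions are hypothetical stand-ins modelled on `include/linux/hashtable.h`, and the 4-byte pointer size used to reproduce the 256 KiB and 64 byte figures is an assumption (a 32-bit build).

```c
#include <stddef.h>

/* Stand-in for the kernel's struct hlist_head: one pointer per bucket. */
struct hlist_head_model {
    void *first;
};

/* Modelled on the kernel macro: the second argument is a number of BITS,
 * so the table holds 1 << bits buckets. */
#define DECLARE_HASHTABLE_MODEL(name, bits) \
    struct hlist_head_model name[1 << (bits)]

/* A 16-bucket table, as the commit message says was intended. */
DECLARE_HASHTABLE_MODEL(small_table, 4);

size_t hashtable_buckets(unsigned int bits)
{
    return (size_t)1 << bits;
}

size_t hashtable_bytes(unsigned int bits, size_t pointer_size)
{
    return hashtable_buckets(bits) * pointer_size;
}
```

Passing 16 where 16 buckets were wanted yields 65536 buckets, or 256 KiB with 4-byte pointers; passing 4 yields the 16 buckets and 64 bytes mentioned in the pull request text.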
Geert Uytterhoeven	134293059b	sunrpc: Add missing kuids conversion for printing m68k/allmodconfig: net/sunrpc/auth_generic.c: In function ‘generic_key_timeout’: net/sunrpc/auth_generic.c:241: warning: format ‘%d’ expects type ‘int’, but argument 2 has type ‘kuid_t’ commit `cdba321e29` ("sunrpc: Convert kuids and kgids to uids and gids for printing") forgot to convert one instance. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-12 10:16:06 -04:00
Linus Torvalds	c2d95729e3	Merge branch 'akpm' (patches from Andrew Morton) Merge first patch-bomb from Andrew Morton: - Some pidns/fork/exec tweaks - OCFS2 updates - Most of MM - there remain quite a few memcg parts which depend on pending core cgroups changes. Which might have been already merged - I'll check tomorrow... - Various misc stuff all over the place - A few block bits which I never got around to sending to Jens - relatively minor things. - MAINTAINERS maintenance - A small number of lib/ updates - checkpatch updates - epoll - firmware/dmi-scan - Some kprobes work for S390 - drivers/rtc updates - hfsplus feature work - vmcore feature work - rbtree upgrades - AOE updates - pktcdvd cleanups - PPS - memstick - w1 - New "inittmpfs" feature, which does the obvious - More IPC work from Davidlohr. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (303 commits) lz4: fix compression/decompression signedness mismatch ipc: drop ipc_lock_check ipc, shm: drop shm_lock_check ipc: drop ipc_lock_by_ptr ipc, shm: guard against non-existant vma in shmdt(2) ipc: document general ipc locking scheme ipc,msg: drop msg_unlock ipc: rename ids->rw_mutex ipc,shm: shorten critical region for shmat ipc,shm: cleanup do_shmat pasta ipc,shm: shorten critical region for shmctl ipc,shm: make shmctl_nolock lockless ipc,shm: introduce shmctl_nolock ipc: drop ipcctl_pre_down ipc,shm: shorten critical region in shmctl_down ipc,shm: introduce lockless functions to obtain the ipc object initmpfs: use initramfs if rootfstype= or root= specified initmpfs: make rootfs use tmpfs when CONFIG_TMPFS enabled initmpfs: move rootfs code from fs/ramfs/ to init/ initmpfs: move bdi setup from init_rootfs to init_ramfs ...	2013-09-11 16:08:54 -07:00
Mathieu Desnoyers	3ddc5b46a8	kernel-wide: fix missing validations on __get/__put/__copy_to/__copy_from_user() I found the following pattern that leads to interesting findings: grep -r "ret.\|=.__put_user" * grep -r "ret.\|=.__get_user" * grep -r "ret.\|=.__copy" * The __put_user() calls are in compat_ioctl.c, ptrace compat, and signal compat; since those appear in compat code, we could probably expect the kernel addresses not to be reachable in the lower 32-bit range, so I think they might not be exploitable. For the "__get_user" cases, I don't think those are exploitable: the worst that can happen is that the kernel will copy kernel memory into in-kernel buffers, and will fail immediately afterward. The alpha csum_partial_copy_from_user() seems to be missing the access_ok() check entirely. The fix is inspired from x86. This could lead to information leak on alpha. I also noticed that many architectures map csum_partial_copy_from_user() to csum_partial_copy_generic(), but I wonder if the latter is performing the access checks on every architecture. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-09-11 15:58:18 -07:00
Linus Torvalds	bbda1baeeb	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Brown paper bag fix in HTB scheduler, class options set incorrectly due to a typo. Fix from Vimalkumar. 2) It's possible for the ipv6 FIB garbage collector to run before all the necessary data structures are set up during init, defer the notifier registry to avoid this problem. Fix from Michal Kubecek. 3) New i40e ethernet driver from the Intel folks. 4) Add new qmi wwan device IDs, from Bjørn Mork. 5) Doorbell lock in bnx2x driver is not initialized properly in some configurations, fix from Ariel Elior. 6) Revert an ipv6 packet option padding change that broke standardized ipv6 implementation test suites. From Jiri Pirko. 7) Fix synchronization of ARP information in bonding layer, from Nikolay Aleksandrov. 8) Fix missing error return resulting in illegal memory accesses in openvswitch, from Daniel Borkmann. 9) SCTP doesn't signal poll events properly due to mistaken operator precedence, fix also from Daniel Borkmann. 10) __netdev_pick_tx() passes wrong index to sk_tx_queue_set() which essentially disables caching of TX queue in sockets :-/ Fix from Eric Dumazet. 
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (29 commits) net_sched: htb: fix a typo in htb_change_class() net: qmi_wwan: add new Qualcomm devices ipv6: don't call fib6_run_gc() until routing is ready net: tilegx driver: avoid compiler warning fib6_rules: fix indentation irda: vlsi_ir: Remove casting the return value which is a void pointer irda: donauboe: Remove casting the return value which is a void pointer net: fix multiqueue selection net: sctp: fix smatch warning in sctp_send_asconf_del_ip net: sctp: fix bug in sctp_poll for SOCK_SELECT_ERR_QUEUE net: fib: fib6_add: fix potential NULL pointer dereference net: ovs: flow: fix potential illegal memory access in __parse_flow_nlattrs bcm63xx_enet: remove deprecated IRQF_DISABLED net: korina: remove deprecated IRQF_DISABLED macvlan: Move skb_clone check closer to call qlcnic: Fix warning reported by kbuild test robot. bonding: fix bond_arp_rcv setting and arp validate desync state bonding: fix store_arp_validate race with mode change ipv6/exthdrs: accept tlv which includes only padding bnx2x: avoid atomic allocations during initialization ...	2013-09-11 14:33:16 -07:00
Vimalkumar	f3ad857e3d	net_sched: htb: fix a typo in htb_change_class() Fix a typo added in commit `56b765b79` ("htb: improved accuracy at high rates") cbuffer should not be a copy of buffer. Signed-off-by: Vimalkumar <j.vimal@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Jiri Pirko <jpirko@redhat.com> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-11 17:16:22 -04:00
Michal Kubeček	2c861cc65e	ipv6: don't call fib6_run_gc() until routing is ready When loading the ipv6 module, ndisc_init() is called before ip6_route_init(). As the former registers a handler calling fib6_run_gc(), this opens a window to run the garbage collector before necessary data structures are initialized. If a network device is initialized in this window, adding MAC address to it triggers a NETDEV_CHANGEADDR event, leading to a crash in fib6_clean_all(). Take the event handler registration out of ndisc_init() into a separate function ndisc_late_init() and move it after ip6_route_init(). Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-11 17:04:09 -04:00
Stefan Tomanek	04f0888da2	fib6_rules: fix indentation This change just removes two tabs from the source file. Signed-off-by: Stefan Tomanek <stefan.tomanek@wertarbyte.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-11 16:16:29 -04:00
Eric Dumazet	50d1784ee4	net: fix multiqueue selection commit `416186fbf8` ("net: Split core bits of netdev_pick_tx into __netdev_pick_tx") added a bug that disables caching of queue index in the socket. This is the source of packet reorders for TCP flows, and again this is happening more often when using FQ pacing. Old code was doing if (queue_index != old_index) sk_tx_queue_set(sk, queue_index); Alexander renamed the variables but forgot to change sk_tx_queue_set() 2nd parameter. if (queue_index != new_index) sk_tx_queue_set(sk, queue_index); This means we store -1 over and over in sk->sk_tx_queue_mapping Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-11 16:10:00 -04:00
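The caching bug described in the entry above can be illustrated with a small user-space model. This is only a sketch, not the kernel code: `struct sock_model`, `sk_tx_queue_set_model()`, `pick_tx_buggy()` and `pick_tx_fixed()` are hypothetical names, and the real `__netdev_pick_tx()` handles more conditions than shown here.

```c
/* Simplified user-space model of the TX queue caching logic.
 * All names here are hypothetical stand-ins, not kernel symbols. */
struct sock_model {
    int tx_queue_mapping;   /* cached TX queue index, -1 when unset */
};

static void sk_tx_queue_set_model(struct sock_model *sk, int queue)
{
    sk->tx_queue_mapping = queue;
}

/* Buggy variant: the comparison uses new_index, but the stale
 * queue_index (-1 here) is what gets written back to the socket. */
static int pick_tx_buggy(struct sock_model *sk, int new_index)
{
    int queue_index = sk->tx_queue_mapping;

    if (queue_index < 0) {
        if (queue_index != new_index)
            sk_tx_queue_set_model(sk, queue_index);  /* stores -1 again */
        queue_index = new_index;
    }
    return queue_index;
}

/* Fixed variant: the freshly selected index is cached. */
static int pick_tx_fixed(struct sock_model *sk, int new_index)
{
    int queue_index = sk->tx_queue_mapping;

    if (queue_index < 0) {
        if (queue_index != new_index)
            sk_tx_queue_set_model(sk, new_index);    /* cache the new queue */
        queue_index = new_index;
    }
    return queue_index;
}
```

In this model both variants return the same queue for the current packet, which is why the bug is easy to miss: only the value cached for later packets differs.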
Daniel Borkmann	88362ad8f9	net: sctp: fix smatch warning in sctp_send_asconf_del_ip This was originally reported in [1] and posted by Neil Horman [2], he said: Fix up a missed null pointer check in the asconf code. If we don't find a local address, but we pass in an address length of more than 1, we may dereference a NULL laddr pointer. Currently this can't happen, as the only users of the function pass in the value 1 as the addrcnt parameter, but it's not hot path, and it doesn't hurt to check for NULL should that ever be the case. The callpath from sctp_asconf_mgmt() looks okay. But this could be triggered from sctp_setsockopt_bindx() call with SCTP_BINDX_REM_ADDR and addrcnt > 1 while passing all possible addresses from the bind list to SCTP_BINDX_REM_ADDR so that we do not find a single address in the association's bind address list that is not in the packed array of addresses. If this happens when we have an established association with ASCONF-capable peers, then we could get a NULL pointer dereference as we only check for laddr == NULL && addrcnt == 1 and call later sctp_make_asconf_update_ip() with NULL laddr. BUT: this actually won't happen as sctp_bindx_rem() will catch such a case and return with an error earlier. As this is incredibly unintuitive and error prone, add a check to catch at least future bugs here. As Neil says, it's not hot path. Introduced by `8a07eb0a5` ("sctp: Add ASCONF operation on the single-homed host"). [1] http://www.spinics.net/lists/linux-sctp/msg02132.html [2] http://www.spinics.net/lists/linux-sctp/msg02133.html Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Michio Honda <micchie@sfc.wide.ad.jp> Acked-By: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-11 16:10:00 -04:00
Daniel Borkmann	a0fb05d1ae	net: sctp: fix bug in sctp_poll for SOCK_SELECT_ERR_QUEUE If we do not add braces around ... mask |= POLLERR | sock_flag(sk, SOCK_SELECT_ERR_QUEUE) ? POLLPRI : 0; ... then this condition always evaluates to true as POLLERR is defined as 8 and binary or'd with whatever result comes out of sock_flag(). Hence instead of (X | Y) ? A : B, transform it into X | (Y ? A : B). Unfortunately, commit `8facd5fb73` ("net: fix smatch warnings inside datagram_poll") forgot about SCTP. :-( Introduced by `7d4c04fc17` ("net: add option to enable error queue packets waking select"). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-11 16:09:59 -04:00
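The precedence problem in the entry above can be reproduced in plain C. The following is a minimal sketch: `err_mask_as_parsed()` and `err_mask_fixed()` are hypothetical helpers, and the `select_err_queue` argument stands in for the result of `sock_flag(sk, SOCK_SELECT_ERR_QUEUE)`. The first function spells out the grouping the compiler applies to the unparenthesized expression, since bitwise OR binds tighter than the conditional operator.

```c
#include <poll.h>

/* How the unparenthesized expression is grouped by the compiler:
 * (POLLERR | flag) ? POLLPRI : 0. POLLERR is nonzero, so the
 * condition is always true and POLLERR itself is never reported. */
static int err_mask_as_parsed(int select_err_queue)
{
    int mask = 0;

    mask |= (POLLERR | select_err_queue) ? POLLPRI : 0;
    return mask;
}

/* Intended grouping: always report POLLERR, and add POLLPRI only
 * when the flag is set. */
static int err_mask_fixed(int select_err_queue)
{
    int mask = 0;

    mask |= POLLERR | (select_err_queue ? POLLPRI : 0);
    return mask;
}
```

With the first grouping the result is `POLLPRI` regardless of the flag; with the second, `POLLERR` is always present and `POLLPRI` depends on the flag.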
Daniel Borkmann	ae7b4e1f21	net: fib: fib6_add: fix potential NULL pointer dereference When the kernel is compiled with CONFIG_IPV6_SUBTREES, and we return with an error in fn = fib6_add_1(), then error codes are encoded into the return pointer e.g. ERR_PTR(-ENOENT). In such an error case, we write the error code into err and jump to out, hence enter the if(err) condition. Now, if CONFIG_IPV6_SUBTREES is enabled, we check for: if (pn != fn && pn->leaf == rt) ... if (pn != fn && !pn->leaf && !(pn->fn_flags & RTN_RTINFO)) ... Since pn is NULL and fn is f.e. ERR_PTR(-ENOENT), then pn != fn evaluates to true and causes a NULL-pointer dereference on further checks on pn. Fix it, by setting both NULL in error case, so that pn != fn already evaluates to false and no further dereference takes place. This was first correctly implemented in `4a287eba2` ("IPv6 routing, NLM_F_* flag support: REPLACE and EXCL flags support, warn about missing CREATE flag"), but the bug got later on introduced by `188c517a0` ("ipv6: return errno pointers consistently for fib6_add_1()"). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Lin Ming <mlin@ss.pku.edu.cn> Cc: Matti Vaittinen <matti.vaittinen@nsn.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Matti Vaittinen <matti.vaittinen@nsn.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-11 16:09:59 -04:00
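The pointer comparison described in the entry above can be sketched in user space. This is an illustration under assumptions: `err_ptr()` and `is_err()` are simplified stand-ins for the kernel's `ERR_PTR()` and `IS_ERR()`, and `struct node_model` and `error_path_touches_pn()` are hypothetical names that only model the `pn != fn` test, not the real `fib6_add()` cleanup code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct node_model {
    void *leaf;
};

/* Simplified stand-ins for ERR_PTR()/IS_ERR(): an errno value is
 * encoded in the top 4095 values of the pointer range. */
static void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

static int is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-4095;
}

/* Returns nonzero when the cleanup path would go on to dereference pn,
 * i.e. when "pn != fn" is true although pn is NULL. */
static int error_path_touches_pn(int apply_fix)
{
    struct node_model *pn = NULL;
    struct node_model *fn = err_ptr(-ENOENT);

    if (is_err(fn) && apply_fix)
        pn = fn = NULL;     /* the fix: make both pointers compare equal */

    return pn != fn;        /* a true result leads to pn->leaf on NULL */
}
```

Without the fix, a NULL `pn` and an error-encoded `fn` compare unequal, so the guarded code runs; with both set to NULL the guard short-circuits.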
Daniel Borkmann	3bf4b5b11d	net: ovs: flow: fix potential illegal memory access in __parse_flow_nlattrs In function __parse_flow_nlattrs(), we check for condition (type > OVS_KEY_ATTR_MAX) and if true, print an error, but we do not return from this function as in other checks. It seems this has been forgotten, as otherwise, we could access beyond the memory of ovs_key_lens, which is of ovs_key_lens[OVS_KEY_ATTR_MAX + 1]. Hence, a maliciously prepared nla_type from user space could access beyond this upper limit. Introduced by `03f0d916a` ("openvswitch: Mega flow implementation"). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Andy Zhou <azhou@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-11 16:09:58 -04:00
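The missing early return described in the entry above follows a common pattern: validate an index supplied from outside before using it on a fixed-size table. A minimal sketch, assuming a hypothetical table `key_lens_model` with made-up lengths and a hypothetical limit `KEY_ATTR_MAX_MODEL` (the real `ovs_key_lens` contents are not reproduced here):

```c
/* Hypothetical attribute limit and length table for illustration. */
enum { KEY_ATTR_MAX_MODEL = 3 };

static const int key_lens_model[KEY_ATTR_MAX_MODEL + 1] = { 0, 4, 2, 8 };

/* Returns the expected length for an attribute type, or -1 when the
 * type is out of range. Without the early return, the lookup below
 * would read past the end of the table. */
static int attr_expected_len(unsigned int type)
{
    if (type > KEY_ATTR_MAX_MODEL)
        return -1;          /* reject before indexing the table */

    return key_lens_model[type];
}
```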
Jiri Pirko	8112b1fe07	ipv6/exthdrs: accept tlv which includes only padding In rfc4942 and rfc2460 I cannot find anything which would implicate to drop packets which have only padding in tlv. Current behaviour breaks TAHI Test v6LC.1.2.6. Problem was introduced in: `9b905fe684` "ipv6/exthdrs: strict Pad1 and PadN check" Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-11 15:52:27 -04:00
Linus Torvalds	2b76db6a0f	Merge tag 'for-linus-3.12-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs Pull 9p updates from Eric Van Hensbergen: "Minor 9p fixes and tweaks for 3.12 merge window The first fixes namespace issues which cause a kernel NULL pointer dereference, the second fixes uevent handling to work better with udev, and the third switches some code to use strlcpy instead of strncpy in order to be safer.
All changes have been baking in for-next for at least 2 weeks" * tag 'for-linus-3.12-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: fs/9p: avoid accessing utsname after namespace has been torn down 9p: send uevent after adding/removing mount_tag attribute fs: 9p: use strlcpy instead of strncpy	2013-09-11 12:34:13 -07:00
Linus Torvalds	cf596766fc	Merge branch 'nfsd-next' of git://linux-nfs.org/~bfields/linux Pull nfsd updates from Bruce Fields: "This was a very quiet cycle! Just a few bugfixes and some cleanup" * 'nfsd-next' of git://linux-nfs.org/~bfields/linux: rpc: let xdr layer allocate gssproxy receieve pages rpc: fix huge kmalloc's in gss-proxy rpc: comment on linux_cred encoding, treat all as unsigned rpc: clean up decoding of gssproxy linux creds svcrpc: remove unused rq_resused nfsd4: nfsd4_create_clid_dir prints uninitialized data nfsd4: fix leak of inode reference on delegation failure Revert "nfsd: nfs4_file_get_access: need to be more careful with O_RDWR" sunrpc: prepare NFS for 2038 nfsd4: fix setlease error return nfsd: nfs4_file_get_access: need to be more careful with O_RDWR	2013-09-10 20:04:59 -07:00
Dave Chinner	70534a739c	shrinker: convert remaining shrinkers to count/scan API Convert the remaining couple of random shrinkers in the tree to the new API. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Glauber Costa <glommer@openvz.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-09-10 18:56:32 -04:00
Josh Durgin	dd935f44a4	libceph: add function to ensure notifies are complete Without a way to flush the osd client's notify workqueue, a watch event that is unregistered could continue receiving callbacks indefinitely. Unregistering the event simply means no new notifies are added to the queue, but there may still be events in the queue that will call the watch callback for the event. If the queue is flushed after the event is unregistered, the caller can be sure no more watch callbacks will occur for the canceled watch. Signed-off-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2013-09-09 11:15:49 -07:00
Linus Torvalds	bf97293eb8	Merge tag 'nfs-for-3.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "Highlights include: - Fix NFSv4 recovery so that it doesn't recover lost locks in cases such as lease loss due to a network partition, where doing so may result in data corruption. Add a kernel parameter to control choice of legacy behaviour or not. - Performance improvements when 2 processes are writing to the same file. - Flush data to disk when an RPCSEC_GSS session timeout is imminent. - Implement NFSv4.1 SP4_MACH_CRED state protection to prevent other NFS clients from being able to manipulate our lease and file locking state. - Allow sharing of RPCSEC_GSS caches between different rpc clients. - Fix the broken NFSv4 security auto-negotiation between client and server. - Fix rmdir() to wait for outstanding sillyrename unlinks to complete - Add a tracepoint framework for debugging NFSv4 state recovery issues. - Add tracing to the generic NFS layer. - Add tracing for the SUNRPC socket connection state. - Clean up the rpc_pipefs mount/umount event management.
- Merge more patches from Chuck in preparation for NFSv4 migration support" * tag 'nfs-for-3.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (107 commits) NFSv4: use mach cred for SECINFO_NO_NAME w/ integrity NFS: nfs_compare_super shouldn't check the auth flavour unless 'sec=' was set NFSv4: Allow security autonegotiation for submounts NFSv4: Disallow security negotiation for lookups when 'sec=' is specified NFSv4: Fix security auto-negotiation NFS: Clean up nfs_parse_security_flavors() NFS: Clean up the auth flavour array mess NFSv4.1 Use MDS auth flavor for data server connection NFS: Don't check lock owner compatability unless file is locked (part 2) NFS: Don't check lock owner compatibility in writes unless file is locked nfs4: Map NFS4ERR_WRONG_CRED to EPERM nfs4.1: Add SP4_MACH_CRED write and commit support nfs4.1: Add SP4_MACH_CRED stateid support nfs4.1: Add SP4_MACH_CRED secinfo support nfs4.1: Add SP4_MACH_CRED cleanup support nfs4.1: Add state protection handler nfs4.1: Minimal SP4_MACH_CRED implementation SUNRPC: Replace pointer values with task->tk_pid and rpc_clnt->cl_clid SUNRPC: Add an identifier for struct rpc_clnt SUNRPC: Ensure rpc_task->tk_pid is available for tracepoints ...	2013-09-09 09:19:15 -07:00
Linus Torvalds	6cccc7d301	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph updates from Sage Weil: "This includes both the first pile of Ceph patches (which I sent to torvalds@vger, sigh) and a few new patches that add support for fscache for Ceph. That includes a few fscache core fixes that David Howells asked go through the Ceph tree. (Thanks go to Milosz Tanski for putting this feature together) This first batch of patches (included here) had (has) several important RBD bug fixes, hole punch support, several different cleanups in the page cache interactions, improvements in the truncate code (new truncate mutex to avoid shenanigans with i_mutex), and a series of fixes in the synchronous striping read/write code. On top of that is a random collection of small fixes all across the tree (error code checks and error path cleanup, obsolete wq flags, etc)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (43 commits) ceph: use d_invalidate() to invalidate aliases ceph: remove ceph_lookup_inode() ceph: trivial buildbot warnings fix ceph: Do not do invalidate if the filesystem is mounted nofsc ceph: page still marked private_2 ceph: ceph_readpage_to_fscache didn't check if marked ceph: clean PgPrivate2 on returning from readpages ceph: use fscache as a local presisent cache fscache: Netfs function for cleanup post readpages FS-Cache: Fix heading in documentation CacheFiles: Implement interface to check cache consistency FS-Cache: Add interface to check consistency of a cached object rbd: fix null dereference in dout rbd: fix buffer size for writes to images with snapshots libceph: use pg_num_mask instead of pgp_num_mask for pg.seed calc rbd: fix I/O error propagation for reads ceph: use vfs __set_page_dirty_nobuffers interface instead of doing it inside filesystem ceph: allow sync_read/write return partial successed size of read/write. 
ceph: fix bugs about handling short-read for sync read mode. ceph: remove useless variable revoked_rdcache ...	2013-09-09 09:13:22 -07:00
Linus Torvalds	c7c4591db6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace changes from Eric Biederman: "This is an assorted mishmash of small cleanups, enhancements and bug fixes. The major theme is user namespace mount restrictions. nsown_capable is killed as it encourages not thinking about details that need to be considered. A very hard to hit pid namespace exiting bug was finally tracked and fixed. A couple of cleanups to the basic namespace infrastructure. Finally there is an enhancement that makes per user namespace capabilities usable as capabilities, and an enhancement that allows the per userns root to nice other processes in the user namespace" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: userns: Kill nsown_capable it makes the wrong thing easy capabilities: allow nice if we are privileged pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD userns: Allow PR_CAPBSET_DROP in a user namespace. namespaces: Simplify copy_namespaces so it is clear what is going on. pidns: Fix hang in zap_pid_ns_processes by sending a potentially extra wakeup sysfs: Restrict mounting sysfs userns: Better restrictions on when proc and sysfs can be mounted vfs: Don't copy mount bind mounts of /proc/<pid>/ns/mnt between namespaces kernel/nsproxy.c: Improving a snippet of code. proc: Restrict mounting the proc filesystem vfs: Lock in place mounts from more privileged users	2013-09-07 14:35:32 -07:00
Linus Torvalds	0ffb01d9de	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: "A quick set of fixes, some to deal with fallout from yesterday's net-next merge. 1) Fix compilation of bnx2x driver with CONFIG_BNX2X_SRIOV disabled, from Dmitry Kravkov. 2) Fix a bnx2x regression caused by one of Dave Jones's mistaken braces changes, from Eilon Greenstein. 3) Add some protective filtering in the netlink tap code, from Daniel Borkmann. 4) Fix TCP congestion window growth regression after timeouts, from Yuchung Cheng. 5) Correctly adjust TCP's rcv_ssthresh for out of order packets, from Eric Dumazet" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: tcp: properly increase rcv_ssthresh for ofo packets net: add documentation for BQL helpers mlx5: remove unused MLX5_DEBUG param in Kconfig bnx2x: Restore a call to config_init bnx2x: fix broken compilation with CONFIG_BNX2X_SRIOV is not set tcp: fix no cwnd growth after timeout net: netlink: filter particular protocols from analyzers	2013-09-07 14:27:46 -07:00
Eric Dumazet	4e4f1fc226	tcp: properly increase rcv_ssthresh for ofo packets TCP receive window handling is multi staged. A socket has a memory budget, static or dynamic, in sk_rcvbuf. Because we do not really know how this memory budget translates to a TCP window (payload), TCP announces a small initial window (about 20 MSS). When a packet is received, we increase TCP rcv_win depending on the payload/truesize ratio of this packet. Good citizen packets give a hint that it's reasonable to have rcv_win = sk_rcvbuf/2 This heuristic takes place in tcp_grow_window() Problem is: we currently call tcp_grow_window() only for in-order packets. This means that reorders or packet losses stop proper growth of rcv_win, and senders are unable to benefit from fast recovery, or proper reordering level detection. Really, a packet being stored in OFO queue is not a bad citizen. It should be part of the game as in-order packets. In our traces, we very often see sender is limited by linux small receive windows, even if linux hosts use autotuning (DRS) and should allow rcv_win to grow to ~3MB. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-06 14:43:49 -04:00
Yuchung Cheng	16edfe7ee0	tcp: fix no cwnd growth after timeout In commit `0f7cc9a3` "tcp: increase throughput when reordering is high", it only allows cwnd to increase in Open state. This mistakenly disables slow start after timeout (CA_Loss). Moreover cwnd won't grow if the state moves from Disorder to Open later in tcp_fastretrans_alert(). Therefore the correct logic should be to allow cwnd to grow as long as the data is received in order in Open, Loss, or even Disorder state. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-06 14:43:49 -04:00
Daniel Borkmann	5ffd5cddd4	net: netlink: filter particular protocols from analyzers Fix finer-grained control and let only a whitelist of allowed netlink protocols pass, in our case related to networking. If later on, other subsystems decide they want to add their protocol as well to the list of allowed protocols they shall simply add it. While at it, we also need to tell what protocol is in use otherwise BPF_S_ANC_PROTOCOL can not pick it up (as it's not filled out). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-06 14:43:48 -04:00
Milosz Tanski	cd0a2df681	Merge tag 'fscache-fixes-for-ceph' into wip-fscache Patches for Ceph FS-Cache support	2013-09-06 16:41:20 +00:00
Linus Torvalds	2e515bf096	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree from Jiri Kosina: "The usual trivial updates all over the tree -- mostly typo fixes and documentation updates" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (52 commits) doc: Documentation/cputopology.txt fix typo treewide: Convert retrun typos to return Fix comment typo for init_cma_reserved_pageblock Documentation/trace: Correcting and extending tracepoint documentation mm/hotplug: fix a typo in Documentation/memory-hotplug.txt power: Documentation: Update s2ram link doc: fix a typo in Documentation/00-INDEX Documentation/printk-formats.txt: No casts needed for u64/s64 doc: Fix typo "is is" in Documentations treewide: Fix printks with 0x%# zram: doc fixes Documentation/kmemcheck: update kmemcheck documentation doc: documentation/hwspinlock.txt fix typo PM / Hibernate: add section for resume options doc: filesystems : Fix typo in Documentations/filesystems scsi/megaraid fixed several typos in comments ppc: init_32: Fix error typo "CONFIG_START_KERNEL" treewide: Add __GFP_NOWARN to k.alloc calls with v.alloc fallbacks page_isolation: Fix a comment typo in test_pages_isolated() doc: fix a typo about irq affinity ...	2013-09-06 09:36:28 -07:00
Linus Torvalds	22e04f6b4b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid Pull HID updates from Jiri Kosina: "Highlights: - conversion of HID subsystem to use devm-based resource management, from Benjamin Tissoires - i2c-hid support for DT bindings, from Benjamin Tissoires - much improved support for Win8-multitouch devices, from Benjamin Tissoires - cleanup of core code using common hidinput_input_event(), from David Herrmann - fix for bug in implement() access to the bit stream (causing oops) that has been present in the code for ages, but devices that are able to trigger it have started to appear only now, from Jiri Kosina - fixes for CVE-2013-2899, CVE-2013-2898, CVE-2013-2896, CVE-2013-2892, CVE-2013-2888 (all triggerable only by specially crafted malicious HW devices plugged into the system), from Kees Cook - hidraw oops fix, from Manoj Chourasia - various smaller fixes here and there, support for a bunch of new devices by various contributors" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (53 commits) HID: MAINTAINERS: add roccat drivers HID: hid-sensor-hub: change kmalloc + memcpy by kmemdup HID: hid-sensor-hub: move to devm_kzalloc HID: hid-sensor-hub: fix indentation accross the code HID: move HID_REPORT_TYPES closer to the report-definitions HID: check for NULL field when setting values HID: picolcd_core: validate output report details HID: sensor-hub: validate feature report details HID: ntrig: validate feature report details HID: pantherlord: validate output report details HID: hid-wiimote: print small buffers via %*phC HID: uhid: improve uhid example client HID: Correct the USB IDs for the new Macbook Air 6 HID: wiimote: add support for Guitar-Hero guitars HID: wiimote: add support for Guitar-Hero drums Input: introduce BTN/ABS bits for drums and guitars HID: battery: don't do DMA from stack HID: roccat: add support for KonePureOptical v2 HID: picolcd: Prevent NULL pointer dereference on 
_remove() HID: usbhid: quirk for N-Trig DuoSense Touch Screen ...	2013-09-06 09:30:36 -07:00
J. Bruce Fields	d4a516560f	rpc: let xdr layer allocate gssproxy receieve pages In theory the linux cred in a gssproxy reply can include up to NGROUPS_MAX data, 256K of data. In the common case we expect it to be shorter. So do as the nfsv3 ACL code does and let the xdr code allocate the pages as they come in, instead of allocating a lot of pages that won't typically be used. Tested-by: Simo Sorce <simo@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2013-09-06 11:45:58 -04:00
J. Bruce Fields	9dfd87da1a	rpc: fix huge kmalloc's in gss-proxy The reply to a gssproxy can include up to NGROUPS_MAX gid's, which will take up more than a page. We therefore need to allocate an array of pages to hold the reply instead of trying to allocate a single huge buffer. Tested-by: Simo Sorce <simo@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2013-09-06 11:45:58 -04:00
J. Bruce Fields	6a36978e69	rpc: comment on linux_cred encoding, treat all as unsigned The encoding of linux creds is a bit confusing. Also: I think in practice it doesn't really matter whether we treat any of these things as signed or unsigned, but unsigned seems more straightforward: uid_t/gid_t are unsigned and it simplifies the ngroups overflow check. Tested-by: Simo Sorce <simo@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2013-09-06 11:45:57 -04:00
J. Bruce Fields	778e512bb1	rpc: clean up decoding of gssproxy linux creds We can use the normal coding infrastructure here. Two minor behavior changes: - we're assuming no wasted space at the end of the linux cred. That seems to match gss-proxy's behavior, and I can't see why it would need to do differently in the future. - NGROUPS_MAX check added: note groups_alloc doesn't do this, this is the caller's responsibility. Tested-by: Simo Sorce <simo@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2013-09-06 11:45:56 -04:00
Linus Torvalds	cc998ff881	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking changes from David Miller: "Noteworthy changes this time around: 1) Multicast rejoin support for team driver, from Jiri Pirko. 2) Centralize and simplify TCP RTT measurement handling in order to reduce the impact of bad RTO seeding from SYN/ACKs. Also, when both timestamps and local RTT measurements are available prefer the latter because there are broken middleware devices which scramble the timestamp. From Yuchung Cheng. 3) Add TCP_NOTSENT_LOWAT socket option to limit the amount of kernel memory consumed to queue up unsent user data. From Eric Dumazet. 4) Add a "physical port ID" abstraction for network devices, from Jiri Pirko. 5) Add a "suppress" operation to influence fib_rules lookups, from Stefan Tomanek. 6) Add a networking development FAQ, from Paul Gortmaker. 7) Extend the information provided by tcp_probe and add ipv6 support, from Daniel Borkmann. 8) Use RCU locking more extensively in openvswitch data paths, from Pravin B Shelar. 9) Add SCTP support to openvswitch, from Joe Stringer. 10) Add EF10 chip support to SFC driver, from Ben Hutchings. 11) Add new SYNPROXY netfilter target, from Patrick McHardy. 12) Compute a rate approximation for sending in TCP sockets, and use this to more intelligently coalesce TSO frames. Furthermore, add a new packet scheduler which takes advantage of this estimate when available. From Eric Dumazet. 13) Allow AF_PACKET fanouts with random selection, from Daniel Borkmann. 14) Add ipv6 support to vxlan driver, from Cong Wang" Resolved conflicts as per discussion. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1218 commits) openvswitch: Fix alignment of struct sw_flow_key.
netfilter: Fix build errors with xt_socket.c tcp: Add missing braces to do_tcp_setsockopt caif: Add missing braces to multiline if in cfctrl_linkup_request bnx2x: Add missing braces in bnx2x:bnx2x_link_initialize vxlan: Fix kernel panic on device delete. net: mvneta: implement ->ndo_do_ioctl() to support PHY ioctls net: mvneta: properly disable HW PHY polling and ensure adjust_link() works icplus: Use netif_running to determine device state ethernet/arc/arc_emac: Fix huge delays in large file copies tuntap: orphan frags before trying to set tx timestamp tuntap: purge socket error queue on detach qlcnic: use standard NAPI weights ipv6:introduce function to find route for redirect bnx2x: VF RSS support - VF side bnx2x: VF RSS support - PF side vxlan: Notify drivers for listening UDP port changes net: usbnet: update addr_assign_type if appropriate driver/net: enic: update enic maintainers and driver driver/net: enic: Exposing symbols for Cisco's low latency driver ...	2013-09-05 14:54:29 -07:00
Jesse Gross	0d40f75bda	openvswitch: Fix alignment of struct sw_flow_key. sw_flow_key alignment was declared as " __aligned(__alignof__(long))". However, this breaks on the m68k architecture where long is 32 bit in size but 16 bit aligned by default. This aligns to the size of a long to ensure that we can always do comparisons in full long-sized chunks. It also adds an additional build check to catch any reduction in alignment. CC: Andy Zhou <azhou@nicira.com> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-05 15:54:37 -04:00
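The alignment change described in this commit can be illustrated with a small userspace sketch. The struct names and field layout below are hypothetical stand-ins, not the real `struct sw_flow_key`, and the GCC/Clang `aligned` attribute is assumed; the point is only the difference between aligning to `__alignof__(long)` and aligning to `sizeof(long)`, plus a build-time check in the spirit of the one the commit mentions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: aligned only to the natural alignment of long,
 * which on m68k is 2 bytes even though long is 4 bytes wide. */
struct demo_key_natural {
	uint16_t a;
	uint8_t  b[6];
} __attribute__((aligned(__alignof__(long))));

/* Hypothetical stand-in: aligned to the full size of a long, so the key
 * can always be compared in long-sized chunks. */
struct demo_key_sized {
	uint16_t a;
	uint8_t  b[6];
} __attribute__((aligned(sizeof(long))));

/* Build-time check (assumed form): fail compilation if the alignment
 * ever drops below the size of a long. */
_Static_assert(__alignof__(struct demo_key_sized) % sizeof(long) == 0,
	       "key must be aligned to the size of a long");

size_t demo_natural_alignment(void)
{
	return __alignof__(struct demo_key_natural);
}

size_t demo_sized_alignment(void)
{
	return __alignof__(struct demo_key_sized);
}
```

On most 64-bit targets both functions return the same value; the two only differ on architectures such as m68k where `__alignof__(long)` is smaller than `sizeof(long)`.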
David S. Miller	06c54055be	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c net/bridge/br_multicast.c net/ipv6/sit.c The conflicts were minor: 1) sit.c changes overlap with change to ip_tunnel_xmit() signature. 2) br_multicast.c had an overlap between computing max_delay using msecs_to_jiffies and turning MLDV2_MRC() into an inline function with a name using lowercase instead of uppercase letters. 3) stmmac had two overlapping changes, one which conditionally allocated and hooked up a dma_cfg based upon the presence of the pbl OF property, and another one handling store-and-forward DMA made. The latter of which should not go into the new of_find_property() basic block. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-05 14:58:52 -04:00
David S. Miller	1a5bbfc3d6	netfilter: Fix build errors with xt_socket.c As reported by Randy Dunlap: ==================== when CONFIG_IPV6=m and CONFIG_NETFILTER_XT_MATCH_SOCKET=y: net/built-in.o: In function `socket_mt6_v1_v2': xt_socket.c:(.text+0x51b55): undefined reference to `udp6_lib_lookup' net/built-in.o: In function `socket_mt_init': xt_socket.c:(.init.text+0x1ef8): undefined reference to `nf_defrag_ipv6_enable' ==================== Like several other modules under net/netfilter/ we have to have a dependency "IPV6 disabled or set compatibly with this module" clause. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-05 14:38:03 -04:00
Dave Jones	e2e5c4c07c	tcp: Add missing braces to do_tcp_setsockopt Signed-off-by: Dave Jones <davej@fedoraproject.org> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-05 14:31:02 -04:00
Dave Jones	0c1db731bf	caif: Add missing braces to multiline if in cfctrl_linkup_request The indentation here implies this was meant to be a multi-line if. Introduced several years back in commit `c85c2951d4` ("caif: Handle dev_queue_xmit errors.") Signed-off-by: Dave Jones <davej@fedoraproject.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-05 14:31:02 -04:00
Duan Jiong	b55b76b221	ipv6:introduce function to find route for redirect RFC 4861 says that the IP source address of the Redirect is the same as the current first-hop router for the specified ICMP Destination Address, so the gateway should be taken into consideration when we find the route for redirect. There was once a check in commit `a6279458c5` ("NDISC: Search over all possible rules on receipt of redirect.") and the check went away in commit `b94f1c0904` ("ipv6: Use icmpv6_notify() to propagate redirect, instead of rt6_redirect()"). The bug is only "exploitable" on layer-2 because the source address of the redirect is checked to be a valid link-local address but it makes spoofing a lot easier in the same L2 domain nonetheless. Thanks very much for Hannes's help. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-05 12:44:31 -04:00
Linus Lüssing	3c3769e633	bridge: apply multicast snooping to IPv6 link-local, too The multicast snooping code should have matured enough to be safely applicable to IPv6 link-local multicast addresses (excluding the link-local all nodes address, ff02::1), too. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-05 12:35:53 -04:00
Linus Lüssing	8fad9c39f3	bridge: prevent flooding IPv6 packets that do not have a listener Currently if there is no listener for a certain group then IPv6 packets for that group are flooded on all ports, even though there might be no host and router interested in it on a port. With this commit they are only forwarded to ports with a multicast router. Just like commit `bd4265fe36` ("bridge: Only flood unregistered groups to routers") did for IPv4, let's do the same for IPv6 with the same reasoning. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-05 12:35:41 -04:00
Trond Myklebust	2f048db468	SUNRPC: Add an identifier for struct rpc_clnt Add an identifier in order to aid debugging. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-05 10:13:15 -04:00
Linus Torvalds	27703bb4a6	PTR_RET() is a weird name, and led to some confusing usage. We ended up with PTR_ERR_OR_ZERO(), and replacing or fixing all the usages. This has been sitting in linux-next for a whole cycle. Thanks, Rusty. Merge tag 'PTR_RET-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull PTR_RET() removal patches from Rusty Russell: "PTR_RET() is a weird name, and led to some confusing usage. We ended up with PTR_ERR_OR_ZERO(), and replacing or fixing all the usages. This has been sitting in linux-next for a whole cycle" [ There are still some PTR_RET users scattered about, with some of them possibly being new, but most of them existing in Rusty's tree too.
We have that #define PTR_RET(p) PTR_ERR_OR_ZERO(p) thing in <linux/err.h>, so they continue to work for now - Linus ] * tag 'PTR_RET-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: GFS2: Replace PTR_RET with PTR_ERR_OR_ZERO Btrfs: volume: Replace PTR_RET with PTR_ERR_OR_ZERO drm/cma: Replace PTR_RET with PTR_ERR_OR_ZERO sh_veu: Replace PTR_RET with PTR_ERR_OR_ZERO dma-buf: Replace PTR_RET with PTR_ERR_OR_ZERO drivers/rtc: Replace PTR_RET with PTR_ERR_OR_ZERO mm/oom_kill: remove weird use of ERR_PTR()/PTR_ERR(). staging/zcache: don't use PTR_RET(). remoteproc: don't use PTR_RET(). pinctrl: don't use PTR_RET(). acpi: Replace weird use of PTR_RET. s390: Replace weird use of PTR_RET. PTR_RET is now PTR_ERR_OR_ZERO(): Replace most. PTR_RET is now PTR_ERR_OR_ZERO	2013-09-04 17:31:11 -07:00
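For readers unfamiliar with the error-pointer idiom behind `PTR_ERR_OR_ZERO()`, here is a minimal userspace sketch. The `demo_` helpers are hypothetical re-creations written for illustration, not the kernel's `<linux/err.h>` definitions; the sketch assumes the usual convention that the last 4095 values of the address space encode negative errno codes.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095	/* assumed size of the reserved errno range */

/* Encode a negative errno value as a pointer. */
static inline void *demo_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True if the pointer falls in the range reserved for encoded errors. */
static inline bool demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Recover the negative errno value from an error pointer. */
static inline long demo_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Return the encoded error, or 0 when the pointer is a valid one. */
static inline int demo_ptr_err_or_zero(const void *ptr)
{
	return demo_is_err(ptr) ? (int)demo_ptr_err(ptr) : 0;
}
```

The name makes the two outcomes explicit, which is the readability argument given in the pull request: a caller gets either the error or zero, never the pointer itself.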
Daniel Borkmann	b4af8def5c	net: ipv6: mld: introduce mld_{gq, ifc, dad}_stop_timer functions We already have mld_{gq,ifc,dad}_start_timer() functions, so introduce mld_{gq,ifc,dad}_stop_timer() functions to reduce code size and make it more readable. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 14:53:21 -04:00
Daniel Borkmann	2b7c121f82	net: ipv6: mld: refactor query processing into v1/v2 functions Make igmp6_event_query() a bit easier to read by refactoring code parts into mld_process_v1() and mld_process_v2(). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 14:53:21 -04:00
Daniel Borkmann	cc7f7ab758	net: ipv6: mld: similarly to MLDv2 have min max_delay of 1 Similarly as we do in MLDv2 queries, set a forged MLDv1 query with 0 ms mld_maxdelay to minimum timer shot time of 1 jiffies. This is eventually done in igmp6_group_queried() anyway, so we can simplify a check there. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 14:53:21 -04:00
Daniel Borkmann	58c0ecfd8d	net: ipv6: mld: implement RFC3810 MLDv2 mode only RFC3810, 10. Security Considerations says under subsection 10.1. Query Message: A forged Version 1 Query message will put MLDv2 listeners on that link in MLDv1 Host Compatibility Mode. This scenario can be avoided by providing MLDv2 hosts with a configuration option to ignore Version 1 messages completely. Hence, implement a MLDv2-only mode that will ignore MLDv1 traffic: echo 2 > /proc/sys/net/ipv6/conf/ethX/force_mld_version or echo 2 > /proc/sys/net/ipv6/conf/all/force_mld_version Note that <all> device has a higher precedence as it was previously also the case in the macro MLD_V1_SEEN() that would "short-circuit" if condition on <all> case. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 14:53:20 -04:00
Daniel Borkmann	e3f5b17047	net: ipv6: mld: get rid of MLDV2_MRC and simplify calculation Get rid of MLDV2_MRC and use our new macros for mantissa and exponent to calculate Maximum Response Delay out of the Maximum Response Code. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 14:53:20 -04:00
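The mantissa/exponent calculation mentioned here follows the encoding in RFC 3810, section 5.1.3. The sketch below is a userspace illustration of that encoding only; the function name is hypothetical and it does not reproduce the kernel's macros, whose names the commit message does not give.

```c
#include <stdint.h>

/* Convert an MLDv2 Maximum Response Code (host byte order) into a
 * Maximum Response Delay in milliseconds, per RFC 3810 section 5.1.3:
 *   MRC <  32768: delay = MRC
 *   MRC >= 32768: delay = (mant | 0x1000) << (exp + 3)
 * where exp is bits 12..14 and mant is the low 12 bits. */
uint32_t demo_mld_mrc_to_delay_ms(uint16_t mrc)
{
	uint32_t exp, mant;

	if (mrc < 32768)
		return mrc;

	exp = (mrc >> 12) & 0x7;
	mant = mrc & 0x0fff;
	return (mant | 0x1000) << (exp + 3);
}
```

Small codes map directly to milliseconds, while the floating-point form lets a 16-bit field express delays of up to roughly 8387 seconds.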
Daniel Borkmann	6c567b78c8	net: ipv6: mld: clean up MLD_V1_SEEN macro Replace the macro with a function to make it more readable. GCC will eventually decide whether to inline this or not (also, that's not fast-path anyway). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 14:53:20 -04:00
Daniel Borkmann	89225d1ce6	net: ipv6: mld: fix v1/v2 switchback timeout to rfc3810, 9.12. i) RFC3810, 9.2. Query Interval [QI] says: The Query Interval variable denotes the interval between General Queries sent by the Querier. Default value: 125 seconds. [...] ii) RFC3810, 9.3. Query Response Interval [QRI] says: The Maximum Response Delay used to calculate the Maximum Response Code inserted into the periodic General Queries. Default value: 10000 (10 seconds) [...] The number of seconds represented by the [Query Response Interval] must be less than the [Query Interval]. iii) RFC3810, 9.12. Older Version Querier Present Timeout [OVQPT] says: The Older Version Querier Present Timeout is the time-out for transitioning a host back to MLDv2 Host Compatibility Mode. When an MLDv1 query is received, MLDv2 hosts set their Older Version Querier Present Timer to [Older Version Querier Present Timeout]. This value MUST be ([Robustness Variable] times (the [Query Interval] in the last Query received)) plus ([Query Response Interval]). Hence, on default the timeout results in: [RV] = 2, [QI] = 125sec, [QRI] = 10sec [OVQPT] = [RV] * [QI] + [QRI] = 260sec Having that said, we currently calculate [OVQPT] (here given as 'switchback' variable) as ... switchback = (idev->mc_qrv + 1) * max_delay RFC3810, 9.12. says "the [Query Interval] in the last Query received". In section "9.14. Configuring timers", it is said: This section is meant to provide advice to network administrators on how to tune these settings to their network. Ambitious router implementations might tune these settings dynamically based upon changing characteristics of the network. [...] iv) RFC3810, 9.14.2. Query Interval: The overall level of periodic MLD traffic is inversely proportional to the Query Interval. A longer Query Interval results in a lower overall level of MLD traffic.
The value of the Query Interval MUST be equal to or greater than the Maximum Response Delay used to calculate the Maximum Response Code inserted in General Query messages. I assume that was why switchback is calculated as is (3 * max_delay), although this setting seems to be meant for routers only to configure their [QI] interval for non-default intervals. So usage here like this is clearly wrong. Concluding, the current behaviour in IPv6's multicast code is not conform to the RFC as switch back is calculated wrongly. That is, it has a too small value, so MLDv2 hosts switch back again to MLDv2 way too early, i.e. ~30secs instead of ~260secs on default. Hence, introduce necessary helper functions and fix this up properly as it should be. Introduced in 06da92283 ("[IPV6]: Add MLDv2 support."). Credits to Hannes Frederic Sowa who also had a hand in this as well. Also thanks to Hangbin Liu who did initial testing. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: David Stevens <dlstevens@us.ibm.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 14:53:20 -04:00
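The arithmetic in this commit message can be checked with a short sketch. Both helpers are hypothetical and work in milliseconds for readability (the kernel code operates on jiffies); the first follows the RFC 3810 section 9.12 formula quoted above, the second mirrors the old `switchback = (idev->mc_qrv + 1) * max_delay` expression.

```c
/* RFC 3810, 9.12: [OVQPT] = [RV] * [QI] + [QRI], all times in ms. */
unsigned long demo_ovqpt_ms(unsigned int rv, unsigned long qi_ms,
			    unsigned long qri_ms)
{
	return rv * qi_ms + qri_ms;
}

/* The previous calculation quoted in the commit message:
 * switchback = (qrv + 1) * max_delay. */
unsigned long demo_old_switchback_ms(unsigned int qrv,
				     unsigned long max_delay_ms)
{
	return (qrv + 1) * max_delay_ms;
}
```

With the defaults from the message (RV = 2, QI = 125 s, QRI = 10 s), the RFC formula gives 260 s, while the old expression gives 30 s, which is the "~30secs instead of ~260secs" discrepancy the commit describes.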
Trond Myklebust	8d1018c774	SUNRPC: Ensure rpc_task->tk_pid is available for tracepoints Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-04 14:45:13 -04:00
Daniel Borkmann	3a1c756590	net: ipv6: tcp: fix potential use after free in tcp_v6_do_rcv In tcp_v6_do_rcv() code, when processing pkt options, we solely work on our skb clone opt_skb that we've created earlier before entering tcp_rcv_established() on our way. However, only in condition ... if (np->rxopt.bits.rxtclass) np->rcv_tclass = ipv6_get_dsfield(ipv6_hdr(skb)); ... we work on skb itself. As we extract every other information out of opt_skb in ipv6_pktoptions path, this seems wrong, since skb can already be released by tcp_rcv_established() earlier on. When we try to access it in ipv6_hdr(), we will dereference freed skb. [ Bug added by commit `4c507d2897` ("net: implement IP_RECVTOS for IP_PKTOPTIONS") ] Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 14:44:41 -04:00
Yuchung Cheng	52f20e655d	tcp: better comments for RTO initialization Commit 1b7fdd2ab585 ("tcp: do not use cached RTT for RTT estimation") removes important comments on how RTO is initialized and updated. Hopefully this patch puts that information back. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 14:41:55 -04:00
Thomas Graf	25a6e6b84f	ipv6: Don't depend on per socket memory for neighbour discovery messages Allocating skbs when sending out neighbour discovery messages currently uses sock_alloc_send_skb() based on a per net namespace socket and thus shares a socket wmem buffer space. If a netdevice is temporarily unable to transmit due to carrier loss or for other reasons, the queued up ndisc messages will consume all of the wmem space and will thus prevent any more skbs from being allocated, even for netdevices that are able to transmit packets. The number of neighbour discovery messages sent is very limited, so using alloc_skb() bypasses the socket wmem buffer size enforcement while the manual call to skb_set_owner_w() maintains the socket reference needed for the IPv6 output path. This patch was originally posted by Eric Dumazet in a modified form. Signed-off-by: Thomas Graf <tgraf@suug.ch> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Stephen Warren <swarren@wwwdotorg.org> Cc: Fabio Estevam <festevam@gmail.com> Tested-by: Fabio Estevam <fabio.estevam@freescale.com> Tested-by: Stephen Warren <swarren@nvidia.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 14:37:41 -04:00
Hannes Frederic Sowa	639739b5e6	ipv6: fix null pointer dereference in __ip6addrlbl_add Commit `b67bfe0d42` ("hlist: drop the node parameter from iterators") changed the behavior of hlist_for_each_entry_safe to leave the p argument NULL. Fix this up by tracking the last argument. Reported-by: Michele Baldessari <michele@acksyn.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Tested-by: Michele Baldessari <michele@acksyn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 14:14:53 -04:00
Alexander Sverdlin	c08751c851	net: sctp: Fix data chunk fragmentation for MTU values which are not multiple of 4 Initially the problem was observed with ipsec, but later it became clear that SCTP data chunk fragmentation algorithm has problems with MTU values which are not multiple of 4. Test program was used which just transmits 2000 bytes long packets to other host. tcpdump was used to observe re-fragmentation in IP layer after SCTP already fragmented data chunks. With MTU 1500: 12:54:34.082904 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 1500) 10.151.38.153.39303 > 10.151.24.91.54321: sctp (1) [DATA] (B) [TSN: 2366088589] [SID: 0] [SSEQ 1] [PPID 0x0] 12:54:34.082933 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 596) 10.151.38.153.39303 > 10.151.24.91.54321: sctp (1) [DATA] (E) [TSN: 2366088590] [SID: 0] [SSEQ 1] [PPID 0x0] 12:54:34.090576 IP (tos 0x2,ECT(0), ttl 63, id 0, offset 0, flags [DF], proto SCTP (132), length 48) 10.151.24.91.54321 > 10.151.38.153.39303: sctp (1) [SACK] [cum ack 2366088590] [a_rwnd 79920] [#gap acks 0] [#dup tsns 0] With MTU 1499: 13:02:49.955220 IP (tos 0x2,ECT(0), ttl 64, id 48215, offset 0, flags [+], proto SCTP (132), length 1492) 10.151.38.153.39084 > 10.151.24.91.54321: sctp[\|sctp] 13:02:49.955249 IP (tos 0x2,ECT(0), ttl 64, id 48215, offset 1472, flags [none], proto SCTP (132), length 28) 10.151.38.153 > 10.151.24.91: ip-proto-132 13:02:49.955262 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 600) 10.151.38.153.39084 > 10.151.24.91.54321: sctp (1) [DATA] (E) [TSN: 404355346] [SID: 0] [SSEQ 1] [PPID 0x0] 13:02:49.956770 IP (tos 0x2,ECT(0), ttl 63, id 0, offset 0, flags [DF], proto SCTP (132), length 48) 10.151.24.91.54321 > 10.151.38.153.39084: sctp (1) [SACK] [cum ack 404355346] [a_rwnd 79920] [#gap acks 0] [#dup tsns 0] Here problem in data
portion limit calculation leads to re-fragmentation in IP, which is sub-optimal. The problem is max_data initial value, which doesn't take into account the fact, that data chunk must be padded to 4-bytes boundary. It's enough to correct max_data, because all later adjustments are correctly aligned to 4-bytes boundary. After the fix is applied, everything is fragmented correctly for uneven MTUs: 15:16:27.083881 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 1496) 10.151.38.153.53417 > 10.151.24.91.54321: sctp (1) [DATA] (B) [TSN: 3077098183] [SID: 0] [SSEQ 1] [PPID 0x0] 15:16:27.083907 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 600) 10.151.38.153.53417 > 10.151.24.91.54321: sctp (1) [DATA] (E) [TSN: 3077098184] [SID: 0] [SSEQ 1] [PPID 0x0] 15:16:27.085640 IP (tos 0x2,ECT(0), ttl 63, id 0, offset 0, flags [DF], proto SCTP (132), length 48) 10.151.24.91.54321 > 10.151.38.153.53417: sctp (1) [SACK] [cum ack 3077098184] [a_rwnd 79920] [#gap acks 0] [#dup tsns 0] The bug was there for years already, but - is a performance issue, the packets are still transmitted - doesn't show up with default MTU 1500, but possibly with ipsec (MTU 1438) Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nsn.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 13:20:27 -04:00
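The padding issue described above can be sketched as follows. This is an illustration under stated assumptions, not the kernel patch: the helper names are hypothetical, an IPv4 header without options (20 bytes) is assumed, and the SCTP common header and DATA chunk header are taken as 12 and 16 bytes. Rounding the payload room down to a multiple of 4 keeps the padded chunk within the path MTU.

```c
#include <stddef.h>

enum {
	DEMO_IPV4_HDR_LEN        = 20,	/* assumed: IPv4, no options */
	DEMO_SCTP_COMMON_HDR_LEN = 12,
	DEMO_SCTP_DATA_HDR_LEN   = 16,
	DEMO_OVERHEAD = DEMO_IPV4_HDR_LEN + DEMO_SCTP_COMMON_HDR_LEN +
			DEMO_SCTP_DATA_HDR_LEN
};

/* Largest DATA payload per fragment such that the chunk, once padded to
 * a 4-byte boundary, still fits in one packet of size pmtu. */
size_t demo_max_data(size_t pmtu)
{
	size_t room = pmtu - DEMO_OVERHEAD;

	return room & ~(size_t)3;	/* round down to a multiple of 4 */
}

/* Total IP length of a full-sized first fragment. */
size_t demo_first_fragment_ip_len(size_t pmtu)
{
	return DEMO_OVERHEAD + demo_max_data(pmtu);
}
```

Under these assumptions an MTU of 1500 yields a 1500-byte first fragment and an MTU of 1499 yields 1496 bytes, matching the packet lengths in the tcpdump output before and after the fix.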
David S. Miller	48f8e0af86	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== The following batch contains: * Three fixes for the new synproxy target available in your net-next tree, from Jesper D. Brouer and Patrick McHardy. * One fix for TCPMSS to correctly handling the fragmentation case, from Phil Oester. I'll pass this one to -stable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 12:28:02 -04:00
Trond Myklebust	40b5ea0c25	SUNRPC: Add tracepoints to help debug socket connection issues Add client side debugging to help trace socket connection/disconnection and unexpected state change issues. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-04 12:26:31 -04:00
Phil Oester	1205e1fa61	netfilter: xt_TCPMSS: correct return value in tcpmss_mangle_packet In commit `b396966c4` (netfilter: xt_TCPMSS: Fix missing fragmentation handling), I attempted to add safe fragment handling to xt_TCPMSS. However, Andy Padavan of Project N56U correctly points out that returning XT_CONTINUE in this function does not work. The callers (tcpmss_tg[46]) expect to receive a value of 0 in order to return XT_CONTINUE. Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-09-04 14:20:03 +02:00
Jesper Dangaard Brouer	7cc9eb6ef7	netfilter: SYNPROXY: let unrelated packets continue Packets reaching SYNPROXY were default dropped, as they were most likely invalid (given the recommended state matching). This patch, changes SYNPROXY target to let packets, not consumed, continue being processed by the stack. This will be more in line other target modules. As it will allow more flexible configurations of handling, logging or matching on packets in INVALID states. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-09-04 11:44:23 +02:00
Patrick McHardy	f4de4c89d8	netfilter: synproxy_core: fix warning in __nf_ct_ext_add_length() With CONFIG_NETFILTER_DEBUG we get the following warning during SYNPROXY init: [ 80.558906] WARNING: CPU: 1 PID: 4833 at net/netfilter/nf_conntrack_extend.c:80 __nf_ct_ext_add_length+0x217/0x220 [nf_conntrack]() The reason is that the conntrack template is set to confirmed before adding the extension and it is invalid to add extensions to already confirmed conntracks. Fix by adding the extensions before setting the conntrack to confirmed. Reported-by: Jesper Dangaard Brouer <jesper.brouer@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-09-04 11:43:36 +02:00
Jesper Dangaard Brouer	775ada6d9f	netfilter: more strict TCP flag matching in SYNPROXY It seems Patrick missed incorporating some of my requested changes during review v2 of the SYNPROXY netfilter module. These were to keep SYN+ACK packets from entering the path meant for the ACK packet from the client (from the 3WHS). Further, there was a bug in ip6t_SYNPROXY.c: the matching of SYN packets didn't exclude the ACK flag. Go a step further with SYN packet/flag matching by excluding flags ACK+FIN+RST, in both IPv4 and IPv6 modules. The intended usage of SYNPROXY is as follows: (gracefully describing usage in commit) iptables -t raw -A PREROUTING -i eth0 -p tcp --dport 80 --syn -j NOTRACK iptables -A INPUT -i eth0 -p tcp --dport 80 -m state --state UNTRACKED,INVALID \ -j SYNPROXY --sack-perm --timestamp --mss 1480 --wscale 7 --ecn echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose This does filter SYN flags early, for packets in the UNTRACKED state, but packets in the INVALID state with other TCP flags could still reach the module, thus this stricter flag matching is still needed. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-09-04 11:43:11 +02:00
Jiri Kosina	efd15f5f4f	Merge branch 'master' into for-3.12/upstream Sync with Linus' tree to be able to apply fixup patch on top of `9d9a04ee75` ("HID: apple: Add support for the 2013 Macbook Air") Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2013-09-04 10:49:57 +02:00
Sage Weil	9542cf0bf9	libceph: use pg_num_mask instead of pgp_num_mask for pg.seed calc Fix a typo that used the wrong bitmask for the pg.seed calculation. This is normally unnoticed because in most cases pg_num == pgp_num. It is, however, a bug that is easily corrected. CC: stable@vger.kernel.org Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <alex.elder@linary.org>	2013-09-03 22:08:10 -07:00
Vijay Subramanian	c995ae2259	tcp: Change return value of tcp_rcv_established() tcp_rcv_established() returns only one value namely 0. We change the return value to void (as suggested by David Miller). After commit `0c24604b` (tcp: implement RFC 5961 4.2), we no longer send RSTs in response to SYNs. We can remove the check and processing on the return value of tcp_rcv_established(). We also fix jtcp_rcv_established() in tcp_probe.c to match that of tcp_rcv_established(). Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 00:27:28 -04:00
Daniel Borkmann	cc8c6c1b21	net: tcp_probe: adapt tbuf size for recent changes With recent changes in tcp_probe module (e.g. `f925d0a62d` ("net: tcp_probe: add IPv6 support")) we also need to take into account that tbuf needs to be updated as format string will be further expanded. tbuf sits on the stack in tcpprobe_read() function that is invoked when user space reads procfs file /proc/net/tcpprobe, hence not fast path as in jtcp_rcv_established(). Having a size similarly as in sctp_probe module of 256 bytes is fully sufficient for that, we need theoretical maximum of 252 bytes otherwise we could get truncated. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 00:27:28 -04:00
Dan Carpenter	80aa4e1096	x25: add a sanity check parsing X.25 facilities This was found with a manual audit and I don't have a reproducer. We limit ->calling_len and ->called_len when we get them from copy_from_user() in x25_ioctl() so when they come from skb->data then we should cap them there as well. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 00:27:27 -04:00
Veaceslav Falico	82476b3160	net: correctly interlink lower/upper devices Currently we're linking upper devices to lower ones, which results in upside-down relationship: upper devices seeing lower devices via its upper lists. Fix this by correctly linking lower devices to the upper ones. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Cong Wang <amwang@redhat.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 00:27:26 -04:00
Nicolas Dichtel	ea23192e8e	tunnels: harmonize cleanup done on skb on rx path The goal of this patch is to harmonize cleanup done on a skbuff on rx path. Before this patch, behaviors were different depending of the tunnel type. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 00:27:26 -04:00
Nicolas Dichtel	963a88b31d	tunnels: harmonize cleanup done on skb on xmit path The goal of this patch is to harmonize cleanup done on a skbuff on xmit path. Before this patch, behaviors were different depending of the tunnel type. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 00:27:25 -04:00
Nicolas Dichtel	8b27f27797	skb: allow skb_scrub_packet() to be used by tunnels This function was only used when a packet was sent to another netns. Now, it can also be used after tunnel encapsulation or decapsulation. Only skb_orphan() should not be done when a packet is not crossing netns. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 00:27:25 -04:00
Nicolas Dichtel	117961878c	vxlan: remove net arg from vxlan[6]_xmit_skb() This argument is not used, let's remove it. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 00:27:25 -04:00
Nicolas Dichtel	8b7ed2d91d	iptunnels: remove net arg from iptunnel_xmit() This argument is not used, let's remove it. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 00:27:25 -04:00
Joe Perches	1372a298ea	wireless: scan: Remove comment to compare_ether_addr This function is being removed, so remove the reference to it. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-03 22:34:48 -04:00
Joe Perches	c3923b7a3d	batman: Remove reference to compare_ether_addr This function is being removed, rename the reference. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-03 22:34:48 -04:00
Joe Perches	951fd874c3	llc: Use normal etherdevice.h tests Convert the llc_<foo> static inlines to the equivalents from etherdevice.h and remove the llc_<foo> static inline functions. llc_mac_null -> is_zero_ether_addr llc_mac_multicast -> is_multicast_ether_addr llc_mac_match -> ether_addr_equal Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-03 22:34:47 -04:00
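To show what the three replacement tests check, here is a userspace sketch of equivalent predicates. The `demo_` functions are hypothetical re-implementations for illustration; they are not the optimized kernel versions from etherdevice.h, only the same semantics expressed with `memcmp` and a bit test.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ETH_ALEN 6	/* length of an Ethernet MAC address in bytes */

/* Counterpart of is_zero_ether_addr(): all six bytes are zero. */
bool demo_is_zero_ether_addr(const uint8_t *addr)
{
	static const uint8_t zero[DEMO_ETH_ALEN];

	return memcmp(addr, zero, DEMO_ETH_ALEN) == 0;
}

/* Counterpart of is_multicast_ether_addr(): the group bit (least
 * significant bit of the first byte) is set. Broadcast also matches. */
bool demo_is_multicast_ether_addr(const uint8_t *addr)
{
	return (addr[0] & 0x01) != 0;
}

/* Counterpart of ether_addr_equal(): both addresses are identical. */
bool demo_ether_addr_equal(const uint8_t *a, const uint8_t *b)
{
	return memcmp(a, b, DEMO_ETH_ALEN) == 0;
}
```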
Petr Holasek	13c7bf0871	ipv6: ipv6_create_tempaddr cleanup This two-liner removes max_addresses variable which is now unnecessary related to patch [ipv6: remove max_addresses check from ipv6_create_tempaddr]. Signed-off-by: Petr Holasek <pholasek@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-03 22:11:44 -04:00
Jiri Bohac	61e76b178d	ICMPv6: treat dest unreachable codes 5 and 6 as EACCES, not EPROTO RFC 4443 has defined two additional codes for ICMPv6 type 1 (destination unreachable) messages: 5 - Source address failed ingress/egress policy 6 - Reject route to destination Now they are treated as protocol error and icmpv6_err_convert() converts them to EPROTO. RFC 4443 says: "Codes 5 and 6 are more informative subsets of code 1." Treat codes 5 and 6 as code 1 (EACCES) Btw, connect() returning -EPROTO confuses firefox, so that fallback to other/IPv4 addresses does not work: https://bugzilla.mozilla.org/show_bug.cgi?id=910773 Signed-off-by: Jiri Bohac <jbohac@suse.cz> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-03 22:11:44 -04:00
David S. Miller	c12a22428a	Merge branch 'for-davem' of git://gitorious.org/linux-can/linux-can-next Marc Kleine-Budde says: ==================== this is a pull request for net-next. There are two patches from Gerhard Sittig, which improve the clock handling on mpc5121. Oliver Hartkopp provides a patch that adds a per rule limitation of frame hops. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-03 21:54:02 -04:00
David S. Miller	e7abfe4092	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next John W. Linville says: ==================== Please accept this batch of updates intended for the 3.12 stream. For the mac80211 bits, Johannes says this: "This time I have various improvements all over the place: IBSS, mesh, testmode, AP client powersave handling, one of the rare rfkill patches and some code cleanup." Also for mac80211: "And I also have some more changes for -next, just a few small fixes and improvements, nothing really stands out." And for iwlwifi: "This time I have some powersave work (notably uAPSD support), CQM offloads, support for a new firmware API and various code cleanups." Regarding the Bluetooth bits, Gustavo says: "Patches to 3.12, here we have: * implementation of a proper tty_port for RFCOMM devices, this fixes some issues people were seeing lately in the kernel. * Add voice_setting option for SCO, it is used for SCO Codec selection * bugfixes, small improvements and clean ups" For the NFC bits, Samuel says: "With this one we have: - A few pn533 improvements and minor fixes. Testing our pn533 driver against Google's NCI stack triggered a few issues that we fixed now. We also added Tx fragmentation support to this driver. - More NFC secure element handling. We added a GET_SE netlink command for getting all the discovered secure elements, and we defined 2 additional secure element netlink events (transaction and connectivity). We also fixed a couple of typos and copy-paste bugs from the secure element handling code. - Firmware download support for the pn544 driver. This chipset can enter a special mode where it's waiting for firmware blobs to replace the already flashed one. We now support that mode." With respect to the ath tree, Kalle says: "New features in ath10k are rx/tx checksumming in hw and survey scan implemented by Michal.
Also he made fixes to different areas of the driver, most notable being fixing the case when using two streams and reducing the number of interface combinations to avoid firmware crashes. Bartosz did a cleanup related to how we handle SoC power save in PCI layer. For ath6kl Mohammed and Vasanth sent each a patch to fix two infrequent crashes." I also pulled the wireless tree into wireless-next to support a request from Johannes. On top of all that, there are the usual sort of driver updates. The mwifiex, brcmfmac, brcmsmac, ath9k, and rt2x00 drivers all get some attention, as does the bcma bus and a few other random bits here and there. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-03 21:45:31 -04:00
Daniel Borkmann	b1b72076b9	net: sctp: probe: allow more advanced ingress filtering by mark This is a follow-up commit for commit `b1dcdc68b1` ("net: tcp_probe: allow more advanced ingress filtering by mark") that allows for advanced SCTP probe module filtering based on skb mark (for a more detailed description and advantages using mark, refer to `b1dcdc68b1`). The current option to filter by a given port is still being preserved. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-03 21:44:11 -04:00
Tim Gardner	3e25c65ed0	net: neighbour: Remove CONFIG_ARPD This config option is superfluous in that it only guards a call to neigh_app_ns(). Enabling CONFIG_ARPD by default has no change in behavior. There will now be a call to __neigh_notify() for each ARP resolution, which has no impact unless there is a user space daemon waiting to receive the notification, i.e., the case for which CONFIG_ARPD was designed anyways. Suggested-by: Eric W. Biederman <ebiederm@xmission.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Gao feng <gaofeng@cn.fujitsu.com> Cc: Joe Perches <joe@perches.com> Cc: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-03 21:41:43 -04:00
Linus Torvalds	32dad03d16	Merge branch 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "A lot of activities on the cgroup front. Most changes aren't visible to userland at all at this point and are laying foundation for the planned unified hierarchy. - The biggest change is decoupling the lifetime management of css (cgroup_subsys_state) from that of cgroup's. Because controllers (cpu, memory, block and so on) will need to be dynamically enabled and disabled, css which is the association point between a cgroup and a controller may come and go dynamically across the lifetime of a cgroup. Till now, css's were created when the associated cgroup was created and stayed till the cgroup got destroyed. Assumptions around this tight coupling permeated through cgroup core and controllers. These assumptions are gradually removed, which constitutes the bulk of the patches, and css destruction path is completely decoupled from cgroup destruction path. Note that decoupling of creation path is relatively easy on top of these changes and the patchset is pending for the next window. - cgroup has its own event mechanism cgroup.event_control, which is only used by memcg. It is overly complex trying to achieve high flexibility whose benefits seem dubious at best. Going forward, new events will simply generate file modified event and the existing mechanism is being made specific to memcg. This pull request contains preparatory patches for such change. - Various fixes and cleanups" Fixed up conflict in kernel/cgroup.c as per Tejun.
* 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (69 commits) cgroup: fix cgroup_css() invocation in css_from_id() cgroup: make cgroup_write_event_control() use css_from_dir() instead of __d_cgrp() cgroup: make cgroup_event hold onto cgroup_subsys_state instead of cgroup cgroup: implement CFTYPE_NO_PREFIX cgroup: make cgroup_css() take cgroup_subsys * instead and allow NULL subsys cgroup: rename cgroup_css_from_dir() to css_from_dir() and update its syntax cgroup: fix cgroup_write_event_control() cgroup: fix subsystem file accesses on the root cgroup cgroup: change cgroup_from_id() to css_from_id() cgroup: use css_get() in cgroup_create() to check CSS_ROOT cpuset: remove an unncessary forward declaration cgroup: RCU protect each cgroup_subsys_state release cgroup: move subsys file removal to kill_css() cgroup: factor out kill_css() cgroup: decouple cgroup_subsys_state destruction from cgroup destruction cgroup: replace cgroup->css_kill_cnt with ->nr_css cgroup: bounce cgroup_subsys_state ref kill confirmation to a work item cgroup: move cgroup->subsys[] assignment to online_css() cgroup: reorganize css init / exit paths cgroup: add __rcu modifier to cgroup->subsys[] ...	2013-09-03 18:25:03 -07:00
Bjørn Mork	2fcc800583	net: dsa: inherit addr_assign_type along with dev_addr A device inheriting a random or set address should reflect this in its addr_assign_type. Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-03 20:57:49 -04:00
Bjørn Mork	6b93f4a1f2	net: vlan: inherit addr_assign_type along with dev_addr A device inheriting a random or set address should reflect this in its addr_assign_type. Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-03 20:57:49 -04:00
Andy Adamson	35fa5f7b35	SUNRPC refactor rpcauth_checkverf error returns Most of the time an error from the credops crvalidate function means the server has sent us a garbage verifier. The gss_validate function is the exception where there is an -EACCES case if the user GSS_context on the client has expired. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-03 15:25:09 -04:00
Andy Adamson	4de6caa270	SUNRPC new rpc_credops to test credential expiry This patch provides the RPC layer helper functions to allow NFS to manage data in the face of expired credentials - such as avoiding buffered WRITEs and COMMITs when the gss context will expire before the WRITEs are flushed and COMMITs are sent. These helper functions enable checking the expiration of an underlying credential key for a generic rpc credential, e.g. the gss_cred gss context gc_expiry which for Kerberos is set to the remaining TGT lifetime. A new rpc_authops key_timeout is only defined for the generic auth. A new rpc_credops crkey_to_expire is only defined for the generic cred. A new rpc_credops crkey_timeout is only defined for the gss cred. Set a credential key expiry watermark, RPC_KEY_EXPIRE_TIMEO set to 240 seconds as a default and can be set via a module parameter as we need to ensure there is time for any dirty data to be flushed. If key_timeout is called on a credential with an underlying credential key that will expire within watermark seconds, we set the RPC_CRED_KEY_EXPIRE_SOON flag in the generic_cred acred so that the NFS layer can clean up prior to key expiration. Checking a generic credential's underlying credential involves a cred lookup. To avoid this lookup in the normal case when the underlying credential has a key that is valid (before the watermark), a notify flag is set in the generic credential the first time the key_timeout is called. The generic credential then stops checking the underlying credential key expiry, and the underlying credential (gss_cred) match routine then checks the key expiration upon each normal use and sets a flag in the associated generic credential only when the key expiration is within the watermark. This in turn signals the generic credential key_timeout to perform the extra credential lookup thereafter. 
Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-03 15:25:08 -04:00
Andy Adamson	f1ff0c27fd	SUNRPC: don't map EKEYEXPIRED to EACCES in call_refreshresult The NFS layer needs to know when a key has expired. This change also returns -EKEYEXPIRED to the application, and the informative "Key has expired" error message is displayed. The user then knows that credential renewal is required. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-03 15:25:08 -04:00
Linus Torvalds	542a086ac7	Driver core patches for 3.12-rc1 Here's the big driver core pull request for 3.12-rc1. Lots of tiny changes here fixing up the way sysfs attributes are created, to try to make drivers simpler, and fix a whole class race conditions with creations of device attributes after the device was announced to userspace. All the various pieces are acked by the different subsystem maintainers. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.21 (GNU/Linux) iEYEABECAAYFAlIlIPcACgkQMUfUDdst+ynUMwCaAnITsxyDXYQ4DqEsz8EcOtMk 718AoLrgnUZs3B+70AT34DVktg4HSThk =USl9 -----END PGP SIGNATURE----- Merge tag 'driver-core-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core patches from Greg KH: "Here's the big driver core pull request for 3.12-rc1. Lots of tiny changes here fixing up the way sysfs attributes are created, to try to make drivers simpler, and fix a whole class race conditions with creations of device attributes after the device was announced to userspace. All the various pieces are acked by the different subsystem maintainers" * tag 'driver-core-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (119 commits) firmware loader: fix pending_fw_head list corruption drivers/base/memory.c: introduce help macro to_memory_block dynamic debug: line queries failing due to uninitialized local variable sysfs: sysfs_create_groups returns a value. debugfs: provide debugfs_create_x64() when disabled rbd: convert bus code to use bus_groups firmware: dcdbas: use binary attribute groups sysfs: add sysfs_create/remove_groups for when SYSFS is not enabled driver core: add #include <linux/sysfs.h> to core files. 
HID: convert bus code to use dev_groups Input: serio: convert bus code to use drv_groups Input: gameport: convert bus code to use drv_groups driver core: firmware: use __ATTR_RW() driver core: core: use DEVICE_ATTR_RO driver core: bus: use DRIVER_ATTR_WO() driver core: create write-only attribute macros for devices and drivers sysfs: create __ATTR_WO() driver-core: platform: convert bus code to use dev_groups workqueue: convert bus code to use dev_groups MEI: convert bus code to use dev_groups ...	2013-09-03 11:37:15 -07:00
Cong Wang	5a17a390de	net: make snmp_mib_free static inline Fengguang reported: net/built-in.o: In function `in6_dev_finish_destroy': (.text+0x4ca7d): undefined reference to `snmp_mib_free' This is because snmp_mib_free() is defined only when CONFIG_INET is enabled, but in6_dev_finish_destroy() is now moved to core kernel. I think snmp_mib_free() is small enough to be inlined, so just make it static inline. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-02 21:00:50 -07:00
Trond Myklebust	280ebcf97c	SUNRPC: rpcauth_create needs to know about rpc_clnt clone status Ensure that we set rpc_clnt->cl_parent before calling rpc_client_register so that rpcauth_create can find any existing RPCSEC_GSS caches for this transport. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-02 13:32:48 -04:00
Trond Myklebust	eb6dc19d8e	RPCSEC_GSS: Share all credential caches on a per-transport basis Ensure that all struct rpc_clnt for any given socket/rdma channel share the same RPCSEC_GSS/krb5,krb5i,krb5p caches. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-02 12:48:40 -04:00
Trond Myklebust	414a629598	RPCSEC_GSS: Share rpc_pipes when an rpc_clnt owns multiple rpcsec auth caches Ensure that if an rpc_clnt owns more than one RPCSEC_GSS-based authentication mechanism, then those caches will share the same 'gssd' upcall pipe. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-01 11:12:43 -04:00
Trond Myklebust	298fc3558b	SUNRPC: Add a helper to allow sharing of rpc_pipefs directory objects Add support for looking up existing objects and creating new ones if there is no match. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-01 11:12:43 -04:00
Trond Myklebust	c36dcfe1f7	SUNRPC: Remove the rpc_client->cl_dentry It is now redundant. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-01 11:12:42 -04:00
Trond Myklebust	5f42b016d7	SUNRPC: Remove the obsolete auth-only interface for pipefs dentry management Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-01 11:12:41 -04:00
Trond Myklebust	1917228435	RPCSEC_GSS: Switch auth_gss to use the new framework for pipefs dentries Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-01 11:12:41 -04:00
Cong Wang	eb3c0d83cc	net: unify skb_udp_tunnel_segment() and skb_udp6_tunnel_segment() As suggested by Pravin, we can unify the code in case of duplicated code. Cc: Pravin Shelar <pshelar@nicira.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-31 22:30:01 -04:00
Cong Wang	d949d826c0	ipv6: Add generic UDP Tunnel segmentation Similar to commit `7313626745` (tunneling: Add generic Tunnel segmentation), this patch adds generic tunneling offloading support for IPv6-UDP based tunnels. This can be used by tunneling protocols like VXLAN. Cc: Jesse Gross <jesse@nicira.com> Cc: Pravin B Shelar <pshelar@nicira.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-31 22:30:01 -04:00
Cong Wang	f564f45c45	vxlan: add ipv6 proxy support This patch adds the IPv6 version of "arp_reduce", ndisc_send_na() will be needed. Cc: David S. Miller <davem@davemloft.net> Cc: David Stevens <dlstevens@us.ibm.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-31 22:30:01 -04:00
Cong Wang	f39dc1023d	ipv6: move in6_dev_finish_destroy() into core kernel in6_dev_put() will be needed by vxlan module, so is in6_dev_finish_destroy(). Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-31 22:30:00 -04:00
Cong Wang	e15a00aafa	vxlan: add ipv6 route short circuit support route short circuit only has IPv4 part, this patch adds the IPv6 part. nd_tbl will be needed. Cc: David S. Miller <davem@davemloft.net> Cc: David Stevens <dlstevens@us.ibm.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-31 22:30:00 -04:00
Cong Wang	e4c7ed4153	vxlan: add ipv6 support This patch adds IPv6 support to vxlan device, as the new version RFC already mentions it: http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-03 Cc: David Stevens <dlstevens@us.ibm.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-31 22:30:00 -04:00
Cong Wang	caf92bc400	ipv6: do not call ndisc_send_rs() with write lock The vxlan module will call ip6_dst_lookup() in the TX path, which will hold the write lock. So we have to release this write lock before calling ndisc_send_rs(), otherwise it could deadlock. Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-31 22:30:00 -04:00
Cong Wang	034dfc5df9	ipv6: export in6addr_loopback to modules It is needed by vxlan module. Noticed by Mike. Cc: Mike Rapoport <mike.rapoport@ravellosystems.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-31 22:30:00 -04:00
Cong Wang	5f81bd2e5d	ipv6: export a stub for IPv6 symbols used by vxlan In case IPv6 is compiled as a module, introduce a stub for ipv6_sock_mc_join and ipv6_sock_mc_drop etc.. It will be used by vxlan module. Suggested by Ben. This is an ugly but easy solution for now. Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-31 22:30:00 -04:00
Cong Wang	788787b559	ipv6: move ip6_local_out into core kernel It will be used by the vxlan kernel module. Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-31 22:30:00 -04:00
Cong Wang	3ce9b35ff6	ipv6: move ip6_dst_hoplimit() into core kernel It will be used by vxlan, and may not be inlined. Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-31 22:29:59 -04:00
stephen hemminger	34aedd3f3b	qdisc: fix build with !CONFIG_NET_SCHED Multiqueue scheduler refers to default_qdisc_ops; therefore the variable definition needs to be moved to handle case where net scheduler API is not available. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-31 18:09:45 -04:00
stephen hemminger	d2a7f269f9	qdisc: make args to qdisc_create_default const Fixes warnings introduced by the qdisc default patch. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-31 18:09:45 -04:00
Eric W. Biederman	c7b96acf14	userns: Kill nsown_capable it makes the wrong thing easy nsown_capable is a special case of ns_capable essentially for just CAP_SETUID and CAP_SETGID. For the existing users it doesn't noticeably simplify things and from the suggested patches I have seen it encourages people to do the wrong thing. So remove nsown_capable. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2013-08-30 23:44:11 -07:00
stephen hemminger	6da7c8fcbc	qdisc: allow setting default queuing discipline Until now, the pfifo_fast queue discipline has been used by default for all devices. But we have better choices now. This patch allows setting the default queueing discipline with sysctl. This allows easy use of better queueing disciplines on all devices without having to use tc qdisc scripts. It is intended to allow an easy path for distributions to make fq_codel or sfq the default qdisc. This patch also makes pfifo_fast more of a first class qdisc, since it is now possible to manually override the default and explicitly use pfifo_fast. The behavior for systems that do not use the sysctl is unchanged; they still get pfifo_fast. Also removes leftover random # in sysctl net core. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-31 00:32:32 -04:00
Linus Torvalds	a8787645e1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) There was a simplification in the ipv6 ndisc packet sending attempted here, which avoided using memory accounting on the per-netns ndisc socket for sending NDISC packets. It did fix some important issues, but it causes regressions so it gets reverted here too. Specifically, the problem with this change is that the IPV6 output path really depends upon there being a valid skb->sk attached. The reason we want to do this change in some form when we figure out how to do it right, is that if a device goes down the ndisc_sk socket send queue will fill up and block NDISC packets that we want to send to other devices too. That's really bad behavior. Hopefully Thomas can come up with a better version of this change. 2) Fix a severe TCP performance regression by reverting a change made to dev_pick_tx() quite some time ago. From Eric Dumazet. 3) TIPC returns wrongly signed error codes, fix from Erik Hugne. 4) Fix OOPS when doing IPSEC over ipv4 tunnels due to orphaning the skb->sk too early. Fix from Li Hongjun. 5) RAW ipv4 sockets can use the wrong routing key during lookup, from Chris Clark. 6) Similar to #1 revert an older change that tried to use plain alloc_skb() for SYN/ACK TCP packets, this broke the netfilter owner mark which needs to see the skb->sk for such frames. From Phil Oester. 7) BNX2x driver bug fixes from Ariel Elior and Yuval Mintz, specifically in the handling of virtual functions. 8) IPSEC path error propagations to sockets is not done properly when we have v4 in v6, and v6 in v4 type rules. Fix from Hannes Frederic Sowa. 9) Fix missing channel context release in mac80211, from Johannes Berg. 10) Fix network namespace handling wrt. SCM_RIGHTS, from Andy Lutomirski. 11) Fix usage of bogus NAPI weight in jme, netxen, and ps3_gelic drivers. From Michal Schmidt.
12) Hopefully a complete and correct fix for the genetlink dump locking and module reference counting. From Pravin B Shelar. 13) sk_busy_loop() must do a cpu_relax(), from Eliezer Tamir. 14) Fix handling of timestamp offset when restoring a snapshotted TCP socket. From Andrew Vagin. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits) net: fec: fix time stamping logic after napi conversion net: bridge: convert MLDv2 Query MRC into msecs_to_jiffies for max_delay mISDN: return -EINVAL on error in dsp_control_req() net: revert `8728c544a9` ("net: dev_pick_tx() fix") Revert "ipv6: Don't depend on per socket memory for neighbour discovery messages" ipv4 tunnels: fix an oops when using ipip/sit with IPsec tipc: set sk_err correctly when connection fails tcp: tcp_make_synack() should use sock_wmalloc bridge: separate querier and query timer into IGMP/IPv4 and MLD/IPv6 ones ipv6: Don't depend on per socket memory for neighbour discovery messages ipv4: sendto/hdrincl: don't use destination address found in header tcp: don't apply tsoffset if rcv_tsecr is zero tcp: initialize rcv_tstamp for restored sockets net: xilinx: fix memleak net: usb: Add HP hs2434 device to ZLP exception table net: add cpu_relax to busy poll loop net: stmmac: fixed the pbl setting with DT genl: Hold reference on correct module while netlink-dump. genl: Fix genl dumpit() locking. xfrm: Fix potential null pointer dereference in xdst_queue_output ...	2013-08-30 17:43:17 -07:00
Daniel Borkmann	2d98c29b6f	net: bridge: convert MLDv2 Query MRC into msecs_to_jiffies for max_delay While looking into MLDv1/v2 code, I noticed that bridging code does not convert its max delay into jiffies for MLDv2 messages as we do in core IPv6 multicast code. RFC3810, 5.1.3. Maximum Response Code says: The Maximum Response Code field specifies the maximum time allowed before sending a responding Report. The actual time allowed, called the Maximum Response Delay, is represented in units of milliseconds, and is derived from the Maximum Response Code as follows: [...] As we update timers that work with jiffies, we need to convert it. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Linus Lüssing <linus.luessing@web.de> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-30 17:56:47 -04:00
Eric Dumazet	702821f4ea	net: revert `8728c544a9` ("net: dev_pick_tx() fix") commit `8728c544a9` ("net: dev_pick_tx() fix") and commit `b6fe83e952` ("bonding: refine IFF_XMIT_DST_RELEASE capability") are quite incompatible : Queue selection is disabled because skb dst was dropped before entering bonding device. This causes major performance regression, mainly because TCP packets for a given flow can be sent to multiple queues. This is particularly visible when using the new FQ packet scheduler with MQ + FQ setup on the slaves. We can safely revert the first commit now that `416186fbf8` ("net: Split core bits of netdev_pick_tx into __netdev_pick_tx") properly caps the queue_index. Reported-by: Xi Wang <xii@google.com> Diagnosed-by: Xi Wang <xii@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Denys Fedorysychenko <nuclearcat@nuclearcat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-30 17:48:04 -04:00
David S. Miller	25ad6117e7	Revert "ipv6: Don't depend on per socket memory for neighbour discovery messages" This reverts commit `1f324e3887`. It seems to cause regressions, and in particular the output path really depends upon there being a socket attached to skb->sk for checks such as sk_mc_loop(skb->sk) for example. See ip6_output_finish2(). Reported-by: Stephen Warren <swarren@wwwdotorg.org> Reported-by: Fabio Estevam <festevam@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-30 17:39:33 -04:00
Thomas Graf	816c5b5b01	ipv6: Remove redundant sk variable A sk variable initialized to ndisc_sk is already available outside of the branch. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-30 17:18:59 -04:00
Li Hongjun	737e828bdb	ipv4 tunnels: fix an oops when using ipip/sit with IPsec Since commit `3d7b46cd20` (ip_tunnel: push generic protocol handling to ip_tunnel module.), an Oops is triggered when an xfrm policy is configured on an IPv4 over IPv4 tunnel. xfrm4_policy_check() calls __xfrm_policy_check2(), which uses skb_dst(skb). But this field is NULL because iptunnel_pull_header() calls skb_dst_drop(skb). Signed-off-by: Li Hongjun <hongjun.li@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-30 17:13:28 -04:00
Erik Hugne	2c8d851823	tipc: set sk_err correctly when connection fails Should a connect fail, if the publication/server is unavailable or due to some other error, a positive value will be returned and errno is never set. If the application code checks for an explicit zero return from connect (success) or a negative return (failure), it will not catch the error and subsequent send() calls will fail as shown from the strace snippet below. socket(0x1e /* PF_??? */, SOCK_SEQPACKET, 0) = 3 connect(3, {sa_family=0x1e /* AF_??? */, sa_data="\2\1\322\4\0\0\322\4\0\0\0\0\0\0"}, 16) = 111 sendto(3, "test", 4, 0, NULL, 0) = -1 EPIPE (Broken pipe) The reason for this behaviour is that TIPC wrongly inverts error codes set in sk_err. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-30 16:06:57 -04:00
Phil Oester	eb8895debe	tcp: tcp_make_synack() should use sock_wmalloc In commit `90ba9b19` (tcp: tcp_make_synack() can use alloc_skb()), Eric changed the call to sock_wmalloc in tcp_make_synack to alloc_skb. In doing so, the netfilter owner match lost its ability to block the SYNACK packet on outbound listening sockets. Revert the change, restoring the owner match functionality. This closes netfilter bugzilla #847. Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-30 16:02:04 -04:00
Linus Lüssing	cc0fdd8028	bridge: separate querier and query timer into IGMP/IPv4 and MLD/IPv6 ones Currently we would still potentially suffer multicast packet loss if there is just either an IGMP or an MLD querier: For the former case, we would possibly drop IPv6 multicast packets, for the latter IPv4 ones. This is because we are currently assuming that if either an IGMP or MLD querier is present that the other one is present, too. This patch makes the behaviour and fix added in "bridge: disable snooping if there is no querier" (`b00589af3b`) to also work if there is either just an IGMP or an MLD querier on the link: It refines the deactivation of the snooping to be protocol specific by using separate timers for the snooped IGMP and MLD queries as well as separate timers for our internal IGMP and MLD queriers. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-30 15:24:37 -04:00
Yuchung Cheng	1b7fdd2ab5	tcp: do not use cached RTT for RTT estimation RTT cached in the TCP metrics are valuable for the initial timeout because SYN RTT usually does not account for serialization delays on low BW path. However using it to seed the RTT estimator may be disruptive because other components (e.g., pacing) require the smooth RTT to be obtained from actual connection. The solution is to use the higher cached RTT to set the first RTO conservatively like tcp_rtt_estimator(), but avoid seeding the other RTT estimator variables such as srtt. It is also a good idea to keep RTO conservative to obtain the first RTT sample, and the performance is ensured by TCP loss probe if SYN RTT is available. To keep the seeding formula consistent across SYN RTT and cached RTT, the rttvar is twice the cached RTT instead of cached RTTVAR value. The reason is that cached variation may be too small (near min RTO) which defeats the purpose of being conservative on first RTO. However the metrics still keep the RTT variations as they might be useful for user applications (through ip). Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-30 15:14:38 -04:00
Eric Dumazet	08f89b981b	pkt_sched: fq: prefetch() fix kbuild bot reported following m68k build error : net/sched/sch_fq.c: In function 'fq_dequeue': >> net/sched/sch_fq.c:491:2: error: implicit declaration of function 'prefetch' [-Werror=implicit-function-declaration] cc1: some warnings being treated as errors While we are fixing this, move this prefetch() call a bit earlier. Reported-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-30 14:51:59 -04:00
Trond Myklebust	6739ffb754	SUNRPC: Add a framework to clean up management of rpc_pipefs directories The current system requires everyone to set up notifiers, manage directory locking, etc. What we really want to do is have the rpc_client create its directory, and then create all the entries. This patch will allow the RPCSEC_GSS and NFS code to register all the objects that they want to have appear in the directory, and then have the sunrpc code call them back to actually create/destroy their pipefs dentries when the rpc_client creates/destroys the parent. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-08-30 09:19:38 -04:00
Trond Myklebust	6b2fddd3e7	RPCSEC_GSS: Fix an Oopsable condition when creating/destroying pipefs objects If an error condition occurs on rpc_pipefs creation, or the user mounts rpc_pipefs and then unmounts it, then the dentries in struct gss_auth need to be reset to NULL so that a second call to gss_pipes_dentries_destroy doesn't try to free them again. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-08-30 09:19:37 -04:00
Trond Myklebust	e726340ac9	RPCSEC_GSS: Further cleanups Don't pass the rpc_client as a parameter, when what we really want is the net namespace. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-08-30 09:19:37 -04:00
Trond Myklebust	c219066103	SUNRPC: Replace clnt->cl_principal The clnt->cl_principal is being used exclusively to store the service target name for RPCSEC_GSS/krb5 callbacks. Replace it with something that is stored only in the RPCSEC_GSS-specific code. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-08-30 09:19:36 -04:00
Trond Myklebust	bd4a3eb15b	RPCSEC_GSS: Clean up upcall message allocation Optimise away gss_encode_msg: we don't need to look up the pipe version a second time. Save the gss target name in struct gss_auth. It is a property of the auth cache itself, and doesn't really belong in the rpc_client. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-08-30 09:19:36 -04:00
Trond Myklebust	41b6b4d0b8	SUNRPC: Cleanup rpc_setup_pipedir The directory name is _always_ clnt->cl_program->pipe_dir_name. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-08-30 09:19:35 -04:00
Trond Myklebust	1dada8e1f9	SUNRPC: Remove unused struct rpc_clnt field cl_protname Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-08-30 09:19:35 -04:00
Trond Myklebust	55909f21a1	SUNRPC: Deprecate rpc_client->cl_protname It just duplicates the cl_program->name, and is not used in any fast paths where the extra dereference will cause a hit. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-08-30 09:19:34 -04:00
Eric Dumazet	afe4fd0624	pkt_sched: fq: Fair Queue packet scheduler - Uses perfect flow match (not stochastic hash like SFQ/FQ_codel) - Uses the new_flow/old_flow separation from FQ_codel - New flows get an initial credit allowing IW10 without added delay. - Special FIFO queue for high prio packets (no need for PRIO + FQ) - Uses a hash table of RB trees to locate the flows at enqueue() time - Smart on demand gc (at enqueue() time, RB tree lookup evicts old unused flows) - Dynamic memory allocations. - Designed to allow millions of concurrent flows per Qdisc. - Small memory footprint : ~8K per Qdisc, and 104 bytes per flow. - Single high resolution timer for throttled flows (if any). - One RB tree to link throttled flows. - Ability to have a max rate per flow. We might add a socket option to add per socket limitation. Attempts have been made to add TCP pacing in TCP stack, but this seems to add complex code to an already complex stack. TCP pacing is welcomed for flows having idle times, as the cwnd permits TCP stack to queue a possibly large number of packets. This removes the 'slow start after idle' choice, hitting badly large BDP flows, and applications delivering chunks of data as video streams. Nicely spaced packets : Here interface is 10Gbit, but flow bottleneck is ~20Mbit cwin is big, yet FQ avoids the typical bursts generated by TCP (as in netperf TCP_RR -- -r 100000,100000) 15:01:23.545279 IP A > B: . 78193:81089(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.545394 IP B > A: . ack 81089 win 3668 <nop,nop,timestamp 11597985 1115> 15:01:23.546488 IP A > B: . 81089:83985(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.546565 IP B > A: . ack 83985 win 3668 <nop,nop,timestamp 11597986 1115> 15:01:23.547713 IP A > B: . 83985:86881(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.547778 IP B > A: . ack 86881 win 3668 <nop,nop,timestamp 11597987 1115> 15:01:23.548911 IP A > B: . 
86881:89777(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.548949 IP B > A: . ack 89777 win 3668 <nop,nop,timestamp 11597988 1115> 15:01:23.550116 IP A > B: . 89777:92673(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.550182 IP B > A: . ack 92673 win 3668 <nop,nop,timestamp 11597989 1115> 15:01:23.551333 IP A > B: . 92673:95569(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.551406 IP B > A: . ack 95569 win 3668 <nop,nop,timestamp 11597991 1115> 15:01:23.552539 IP A > B: . 95569:98465(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.552576 IP B > A: . ack 98465 win 3668 <nop,nop,timestamp 11597992 1115> 15:01:23.553756 IP A > B: . 98465:99913(1448) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.554138 IP A > B: P 99913:100001(88) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.554204 IP B > A: . ack 100001 win 3668 <nop,nop,timestamp 11597993 1115> 15:01:23.554234 IP B > A: . 65248:68144(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115> 15:01:23.555620 IP B > A: . 68144:71040(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115> 15:01:23.557005 IP B > A: . 71040:73936(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115> 15:01:23.558390 IP B > A: . 73936:76832(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115> 15:01:23.559773 IP B > A: . 76832:79728(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115> 15:01:23.561158 IP B > A: . 79728:82624(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.562543 IP B > A: . 82624:85520(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.563928 IP B > A: . 85520:88416(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.565313 IP B > A: . 88416:91312(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.566698 IP B > A: . 91312:94208(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.568083 IP B > A: . 
94208:97104(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.569467 IP B > A: . 97104:100000(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.570852 IP B > A: . 100000:102896(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.572237 IP B > A: . 102896:105792(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.573639 IP B > A: . 105792:108688(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.575024 IP B > A: . 108688:111584(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.576408 IP B > A: . 111584:114480(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.577793 IP B > A: . 114480:117376(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> TCP timestamps show that most packets from B were queued in the same ms timeframe (TSval 1159799{3,4}), but FQ managed to send them right in time to avoid a big burst. In slow start or steady state, very few packets are throttled [1] FQ gets a bunch of tunables as : limit : max number of packets on whole Qdisc (default 10000) flow_limit : max number of packets per flow (default 100) quantum : the credit per RR round (default is 2 MTU) initial_quantum : initial credit for new flows (default is 10 MTU) maxrate : max per flow rate (default : unlimited) buckets : number of RB trees (default : 1024) in hash table. (consumes 8 bytes per bucket) [no]pacing : disable/enable pacing (default is enable) All of them can be changed on a live qdisc. $ tc qd add dev eth0 root fq help Usage: ... 
fq [ limit PACKETS ] [ flow_limit PACKETS ] [ quantum BYTES ] [ initial_quantum BYTES ] [ maxrate RATE ] [ buckets NUMBER ] [ [no]pacing ] $ tc -s -d qd qdisc fq 8002: dev eth0 root refcnt 32 limit 10000p flow_limit 100p buckets 256 quantum 3028 initial_quantum 15140 Sent 216532416 bytes 148395 pkt (dropped 0, overlimits 0 requeues 14) backlog 0b 0p requeues 14 511 flows, 511 inactive, 0 throttled 110 gc, 0 highprio, 0 retrans, 1143 throttled, 0 flows_plimit [1] Except if initial srtt is overestimated, as if using cached srtt in tcp metrics. We'll provide a fix for this issue. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 21:38:31 -04:00
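As a rough check on the trace in the FQ message above: pacing spaces packets so that each one leaves about `size * 8 / rate` seconds after the previous one. The sketch below is illustrative arithmetic only, not the sch_fq code. The function name `pacing_gap_ns` is hypothetical, and the 20 Mbit/s rate is the approximate bottleneck quoted in the message.

```python
NSEC_PER_SEC = 1_000_000_000


def pacing_gap_ns(packet_bytes: int, rate_bits_per_sec: int) -> int:
    """Time one packet occupies on the wire at the given rate, in nanoseconds.

    Hypothetical helper for illustration; integer arithmetic avoids
    floating-point rounding.
    """
    return packet_bytes * 8 * NSEC_PER_SEC // rate_bits_per_sec


# 2896-byte segments (as in the trace) at an assumed 20 Mbit/s bottleneck.
gap = pacing_gap_ns(2896, 20_000_000)
print(gap)  # 1158400 ns, i.e. about 1.16 ms
```

The A > B data segments in the trace are spaced roughly 1.2 ms apart, which is consistent with this estimate.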
Oliver Hartkopp	391ac1282d	can: gw: add a per rule limitation of frame hops Usually the received CAN frames can be processed/routed as many as 'max_hops' times (which is given at module load time of the can-gw module). Introduce a new configuration option to reduce the number of possible hops for a specific gateway rule to a value smaller than max_hops. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2013-08-29 22:58:24 +02:00
Daniel Borkmann	f55d112e52	net: packet: use reciprocal_divide in fanout_demux_hash Instead of hard-coding reciprocal_divide function, use the inline function from reciprocal_div.h. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 16:43:29 -04:00
Daniel Borkmann	5df0ddfbc9	net: packet: add randomized fanout scheduler We currently allow for different fanout scheduling policies in pf_packet such as scheduling by skb's rxhash, round-robin, by cpu, and rollover. Also allow for a random, equidistributed selection of the socket from the fanout process group. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 16:43:29 -04:00
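The two fanout commits above both need to map a 32-bit number (a flow hash, or a random value) onto one of `num` sockets. A minimal sketch follows, assuming the multiply-and-shift form `(value * num) >> 32`; that form is an assumption about what the helper computes, and the names `scale_u32` and `pick_random_member` are hypothetical, not kernel identifiers.

```python
import random

U32_MAX = 0xFFFFFFFF


def scale_u32(value: int, num: int) -> int:
    """Map a 32-bit value into the range [0, num) without a division.

    Computes (value * num) >> 32, so equal-sized slices of the 32-bit
    input space land on each index.
    """
    return ((value & U32_MAX) * num) >> 32


def pick_random_member(num: int, rng: random.Random) -> int:
    """Pick a member index uniformly by scaling a random 32-bit value."""
    return scale_u32(rng.getrandbits(32), num)


rng = random.Random(1)
print([pick_random_member(4, rng) for _ in range(8)])
```

The shift-based mapping never returns `num` itself, because the largest 32-bit input is still strictly less than 2^32.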
Veaceslav Falico	48311f4685	net: add netdev_upper_get_next_dev_rcu(dev, iter) This function returns the next dev in the dev->upper_dev_list after the struct list_head *iter position, and updates iter accordingly. Returns NULL if there are no devices left. Caller must hold RCU read lock. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Cong Wang <amwang@redhat.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 16:19:42 -04:00
Veaceslav Falico	620f3186ca	net: remove search_list from netdev_adjacent We already don't need it cause we see every upper/lower device in the list already. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Cong Wang <amwang@redhat.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 16:19:42 -04:00
Veaceslav Falico	5d261913ca	net: add lower_dev_list to net_device and make a full mesh This patch adds lower_dev_list list_head to net_device, which is the same as upper_dev_list, only for lower devices, and begins to use it in the same way as the upper list. It also changes the way the whole adjacent device lists work - now they contain all of upper/lower devices, not only the first level. The first level devices are distinguished by the bool neighbour field in netdev_adjacent, also added by this patch. There are cases when a device can be added several times to the adjacent list, the simplest would be: /---- eth0.10 ---\ eth0- --- bond0 \---- eth0.20 ---/ where both bond0 and eth0 'see' each other in the adjacent lists two times. To avoid duplication of netdev_adjacent structures ref_nr is being kept as the number of times the device was added to the list. The 'full view' is achieved by adding, on link creation, all of the upper_dev's upper_dev_list devices as upper devices to all of the lower_dev's lower_dev_list devices (and to the lower_dev itself), and vice versa. On unlink they are removed using the same logic. I've tested it with thousands of vlans/bonds/bridges, everything works ok and no observable lags even on a huge number of interfaces. Memory footprint for 128 devices interconnected with each other via both upper and lower (which is impossible, but for the comparison) lists would be: 128*128*2*sizeof(netdev_adjacent) = 1.5MB but in the real world we usually have at most several devices with slaves and a lot of vlans, so the footprint will be much lower. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Cong Wang <amwang@redhat.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 16:19:42 -04:00
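The worst-case footprint quoted in the full-mesh message above (128 devices, each listing every other device in both its upper and lower list) can be checked with a few lines of arithmetic. This is only a sketch: the 48-byte entry size is an assumption back-derived from the quoted 1.5MB total, not a value stated in the commit, and `full_mesh_entries` is a hypothetical helper name.

```python
def full_mesh_entries(devices: int) -> int:
    """Adjacency entries if every device lists every device, in both
    the upper and the lower list (the deliberately impossible worst case)."""
    return devices * devices * 2


# Assumed size of one adjacency entry in bytes (not taken from the source).
ASSUMED_ENTRY_BYTES = 48

entries = full_mesh_entries(128)
total_bytes = entries * ASSUMED_ENTRY_BYTES
print(entries, total_bytes, total_bytes / (1024 * 1024))
```

With these assumptions the total comes to 1.5 MiB, matching the figure in the message.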
Veaceslav Falico	aa9d85605f	net: rename netdev_upper to netdev_adjacent Rename the structure to reflect the upcoming addition of lower_dev_list. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Cong Wang <amwang@redhat.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 16:19:41 -04:00
David S. Miller	79f9ab7e0a	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== This pull request fixes some issues that arise when 6in4 or 4in6 tunnels are used in combination with IPsec, all from Hannes Frederic Sowa and a null pointer dereference when queueing packets to the policy hold queue. 1) We might access the local error handler of the wrong address family if 6in4 or 4in6 tunnel is protected by ipsec. Fix this by adding a pointer to the correct local_error to xfrm_state_afinet. 2) Add a helper function to always refer to the correct interpretation of skb->sk. 3) Call skb_reset_inner_headers to record the position of the inner headers when adding a new one in various ipv6 tunnels. This is needed to identify the addresses where to send back errors in the xfrm layer. 4) Dereference inner ipv6 header if encapsulated to always call the right error handler. 5) Choose protocol family by skb protocol to not call the wrong xfrm{4,6}_local_error handler in case an ipv6 socket is used in ipv4 mode. 6) Partly revert "xfrm: introduce helper for safe determination of mtu" because this introduced pmtu discovery problems. 7) Set skb->protocol on tcp, raw and ip6_append_data generated skbs. We need this to get the correct mtu information in xfrm. 8) Fix null pointer dereference in xdst_queue_output. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 16:05:30 -04:00
Thomas Graf	1f324e3887	ipv6: Don't depend on per socket memory for neighbour discovery messages Allocating skbs when sending out neighbour discovery messages currently uses sock_alloc_send_skb() based on a per net namespace socket and thus shares a socket wmem buffer space. If a netdevice is temporarily unable to transmit due to carrier loss or for other reasons, the queued up ndisc messages will consume all of the wmem space and will thus prevent any more skbs from being allocated even for netdevices that are able to transmit packets. The number of neighbour discovery messages sent is very limited, simply use alloc_skb() and don't depend on any socket wmem space any longer. This patch has originally been posted by Eric Dumazet in a modified form. Signed-off-by: Thomas Graf <tgraf@suug.ch> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 16:01:05 -04:00
Chris Clark	c27c9322d0	ipv4: sendto/hdrincl: don't use destination address found in header ipv4: raw_sendmsg: don't use header's destination address A sendto() regression was bisected and found to start with commit `f8126f1d51` (ipv4: Adjust semantics of rt->rt_gateway.) The problem is that it tries to ARP-lookup the constructed packet's destination address rather than the explicitly provided address. Fix this using FLOWI_FLAG_KNOWN_NH so that given nexthop is used. cf. commit `2ad5b9e4bd` Reported-by: Chris Clark <chris.clark@alcatel-lucent.com> Bisected-by: Chris Clark <chris.clark@alcatel-lucent.com> Tested-by: Chris Clark <chris.clark@alcatel-lucent.com> Suggested-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Chris Clark <chris.clark@alcatel-lucent.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 15:57:52 -04:00
Daniel Borkmann	7613f5fe11	net: sctp: sctp_verify_init: clean up mandatory checks and add comment Add a comment related to RFC4960 explaining why we do not check for initial TSN, and while at it, remove yoda notation checks and clean up code from checks of mandatory conditions. That's probably just really minor, but makes reviewing easier. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 15:54:48 -04:00
Eric Dumazet	95bd09eb27	tcp: TSO packets automatic sizing After hearing many people over past years complaining against TSO being bursty or even buggy, we are proud to present automatic sizing of TSO packets. One part of the problem is that tcp_tso_should_defer() uses an heuristic relying on upcoming ACKS instead of a timer, but more generally, having big TSO packets makes little sense for low rates, as it tends to create micro bursts on the network, and general consensus is to reduce the buffering amount. This patch introduces a per socket sk_pacing_rate, that approximates the current sending rate, and allows us to size the TSO packets so that we try to send one packet every ms. This field could be set by other transports. Patch has no impact for high speed flows, where having large TSO packets makes sense to reach line rate. For other flows, this helps better packet scheduling and ACK clocking. This patch increases performance of TCP flows in lossy environments. A new sysctl (tcp_min_tso_segs) is added, to specify the minimal size of a TSO packet (default being 2). A follow-up patch will provide a new packet scheduler (FQ), using sk_pacing_rate as an input to perform optional per flow pacing. This explains why we chose to set sk_pacing_rate to twice the current rate, allowing 'slow start' ramp up. sk_pacing_rate = 2 * cwnd * mss / srtt v2: Neal Cardwell reported a suspect deferring of last two segments on initial write of 10 MSS, I had to change tcp_tso_should_defer() to take into account tp->xmit_size_goal_segs Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Van Jacobson <vanj@google.com> Cc: Tom Herbert <therbert@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 15:50:06 -04:00
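The sizing rule in the TSO message above can be worked through numerically: the pacing rate is `2 * cwnd * mss / srtt`, and a TSO packet is sized to carry about one millisecond of data at that rate, with a floor of `tcp_min_tso_segs` (default 2). The sketch below is a simplified illustration, not the kernel code. The function names are hypothetical, and it ignores any upper limit on TSO packet size that a real stack would apply.

```python
def pacing_rate_bytes_per_sec(cwnd_segments: int, mss_bytes: int, srtt_ms: int) -> int:
    """sk_pacing_rate = 2 * cwnd * mss / srtt, with srtt given in milliseconds."""
    return 2 * cwnd_segments * mss_bytes * 1000 // srtt_ms


def tso_segments(rate_bytes_per_sec: int, mss_bytes: int, min_tso_segs: int = 2) -> int:
    """Segments per TSO packet: about 1 ms of data, never below the minimum.

    Simplification: no upper cap is applied here.
    """
    bytes_per_ms = rate_bytes_per_sec // 1000
    return max(min_tso_segs, bytes_per_ms // mss_bytes)


# Slow flow: cwnd of 10 segments, 1448-byte MSS, 100 ms smoothed RTT.
slow = pacing_rate_bytes_per_sec(10, 1448, 100)
# Faster flow: cwnd of 100 segments, same MSS, 10 ms smoothed RTT.
fast = pacing_rate_bytes_per_sec(100, 1448, 10)

print(slow, tso_segments(slow, 1448))
print(fast, tso_segments(fast, 1448))
```

For the slow flow, one millisecond of data is less than a single segment, so the minimum of two segments applies; the faster flow gets 20-segment packets under these assumptions.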
Hannes Frederic Sowa	b800c3b966	ipv6: drop fragmented ndisc packets by default (RFC 6980) This patch implements RFC6980: Drop fragmented ndisc packets by default. If a fragmented ndisc packet is received the user is informed that it is possible to disable the check. Cc: Fernando Gont <fernando@gont.com.ar> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 15:32:08 -04:00
Florian Fainelli	fd094808a0	bridge: inherit slave devices needed_headroom Some slave devices may have set a dev->needed_headroom value which is different than the default one, most likely in order to prepend a hardware descriptor in front of the Ethernet frame to send. Whenever a new slave is added to a bridge, ensure that we update the needed_headroom value accordingly to account for the slave needed_headroom value. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 15:17:09 -04:00
Andrew Vagin	e3e1202831	tcp: don't apply tsoffset if rcv_tsecr is zero The zero value means that tsecr is not valid, so it's a special case. tsoffset is used to customize tcp_time_stamp for one socket. tsoffset is usually zero, it's used when a socket was moved from one host to another host. Currently this issue affects logic of tcp_rcv_rtt_measure_ts. Due to incorrect value of rcv_tsecr, tcp_rcv_rtt_measure_ts sets rto to TCP_RTO_MAX. Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Reported-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 15:11:12 -04:00
Andrew Vagin	c7781a6e3c	tcp: initialize rcv_tstamp for restored sockets u32 rcv_tstamp; /* timestamp of last received ACK */ Its value is used in tcp_retransmit_timer, which closes socket if the last ack was received more than TCP_RTO_MAX ago. Currently rcv_tstamp is initialized to zero and if tcp_retransmit_timer is called before receiving a first ack, the connection is closed. This patch initializes rcv_tstamp to a timestamp, when a socket was restored. Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Reported-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 15:11:11 -04:00
John W. Linville	0d8165e9fc	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem Conflicts: drivers/net/wireless/iwlwifi/pcie/trans.c	2013-08-29 14:08:24 -04:00
Eric W. Biederman	7dc5dbc879	sysfs: Restrict mounting sysfs Don't allow mounting sysfs unless the caller has CAP_SYS_ADMIN rights over the net namespace. The principle here is if you create or have capabilities over it you can mount it, otherwise you get to live with what other people have mounted. Instead of testing this with a straight forward ns_capable call, perform this check the long and torturous way with kobject helpers, this keeps direct knowledge of namespaces out of sysfs, and preserves the existing sysfs abstractions. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2013-08-28 21:35:14 -07:00
Pravin B Shelar	33c6b1f6b1	genl: Hold reference on correct module while netlink-dump. netlink dump operations take module as parameter to hold reference for entire netlink dump duration. Currently it holds ref only on genl module which is not correct when we use ops registered to genl from another module. Following patch adds module pointer to genl_ops so that netlink can hold ref count on it. CC: Jesse Gross <jesse@nicira.com> CC: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-28 17:19:17 -04:00
Pravin B Shelar	9b96309c5b	genl: Fix genl dumpit() locking. In case of genl-family with parallel ops off, dumpit() callback is expected to run under genl_lock, but commit `def3117493` (genl: Allow concurrent genl callbacks.) changed this behaviour where only first dumpit() op was called under genl-lock. For subsequent dump, only nlk->cb_lock was taken. Following patch fixes it by defining locked dumpit() and done() callback which takes care of genl-locking. CC: Jesse Gross <jesse@nicira.com> CC: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-28 17:19:17 -04:00
Trond Myklebust	347e2233b7	SUNRPC: Fix memory corruption issue on 32-bit highmem systems Some architectures, such as ARM-32 do not return the same base address when you call kmap_atomic() twice on the same page. This causes problems for the memmove() call in the XDR helper routine "_shift_data_right_pages()", since it defeats the detection of overlapping memory ranges, and has been seen to corrupt memory. The fix is to distinguish between the case where we're doing an inter-page copy or not. In the former case, we know that the memory ranges cannot possibly overlap, so we can additionally micro-optimise by replacing memmove() with memcpy(). Reported-by: Mark Young <MYoung@nvidia.com> Reported-by: Matt Craighead <mcraighead@nvidia.com> Cc: Bruce Fields <bfields@fieldses.org> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Matt Craighead <mcraighead@nvidia.com>	2013-08-28 15:43:43 -04:00
John W. Linville	f3e979a52c	Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next	2013-08-28 13:51:40 -04:00
John W. Linville	cd80e107b7	Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211	2013-08-28 13:49:20 -04:00
John W. Linville	b35c809708	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless Conflicts: drivers/net/wireless/iwlwifi/pcie/trans.c net/mac80211/ibss.c	2013-08-28 10:36:09 -04:00
Antonio Quartulli	c6eaa3f067	batman-adv: send GW_DEL event when the gw client mode is deselected Whenever the GW client mode is deselected, a DEL event has to be sent in order to tell userspace that the current gateway has been lost. Send the uevent on state change only if a gateway was currently selected. Reported-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>	2013-08-28 11:33:00 +02:00
Simon Wunderlich	c00a072d3f	batman-adv: Start new development cycle Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2013-08-28 11:31:52 +02:00
Antonio Quartulli	791c2a2d3f	batman-adv: move enum definition at the top of the file Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2013-08-28 11:31:51 +02:00
Simon Wunderlich	c54f38c9aa	batman-adv: set skb priority according to content The skb priority field may help the wireless driver to choose the right queue (e.g. WMM queues). This should be set in batman-adv, as this information is only available here. This patch adds support for IPv4/IPv6 DS fields and VLAN PCP. Note that only VLAN PCP is used if a VLAN header is present. Also initially set TC_PRIO_CONTROL only for self-generated packets, and keep the priority set by higher layers. Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2013-08-28 11:31:50 +02:00
Steffen Klassert	302a50bc94	xfrm: Fix potential null pointer dereference in xdst_queue_output The net_device might be not set on the skb when we try refcounting. This leads to a null pointer dereference in xdst_queue_output(). It turned out that the refcount to the net_device is not needed after all. The dst_entry has a refcount to the net_device before we queue the skb, so it can't go away. Therefore we can remove the refcount on queueing to fix the null pointer dereference. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-08-28 08:47:14 +02:00
David S. Miller	5b2941b18d	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch Jesse Gross says: ==================== A number of significant new features and optimizations for net-next/3.12. Highlights are: * "Megaflows", an optimization that allows userspace to specify which flow fields were used to compute the results of the flow lookup. This allows for a major reduction in flow setups (the major performance bottleneck in Open vSwitch) without reducing flexibility. * Converting netlink dump operations to use RCU, allowing for additional parallelism in userspace. * Matching and modifying SCTP protocol fields. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-27 22:11:18 -04:00
Florian Westphal	b7e092c05b	netfilter: ctnetlink: fix uninitialized variable net/netfilter/nf_conntrack_netlink.c: In function 'ctnetlink_nfqueue_attach_expect': 'helper' may be used uninitialized in this function It was only initialized if the CTA_EXPECT_HELP_NAME attribute was present, it must be NULL otherwise. Problem added recently in `bd077937` (netfilter: nfnetlink_queue: allow to attach expectations to conntracks). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-08-28 00:28:19 +02:00
Patrick McHardy	4ad362282c	netfilter: add IPv6 SYNPROXY target Add an IPv6 version of the SYNPROXY target. The main differences to the IPv4 version is routing and IP header construction. Signed-off-by: Patrick McHardy <kaber@trash.net> Tested-by: Martin Topholm <mph@one.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-08-28 00:28:13 +02:00
Patrick McHardy	81eb6a1487	net: syncookies: export cookie_v6_init_sequence/cookie_v6_check Extract the local TCP stack independent parts of tcp_v6_init_sequence() and cookie_v6_check() and export them for use by the upcoming IPv6 SYNPROXY target. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: David S. Miller <davem@davemloft.net> Tested-by: Martin Topholm <mph@one.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-08-28 00:28:04 +02:00
Patrick McHardy	48b1de4c11	netfilter: add SYNPROXY core/target Add a SYNPROXY for netfilter. The code is split into two parts, the synproxy core with common functions and an address family specific target. The SYNPROXY receives the connection request from the client, responds with a SYN/ACK containing a SYN cookie and announcing a zero window and checks whether the final ACK from the client contains a valid cookie. It then establishes a connection to the original destination and, if successful, sends a window update to the client with the window size announced by the server. Support for timestamps, SACK, window scaling and MSS options can be statically configured as target parameters if the features of the server are known. If timestamps are used, the timestamp value sent back to the client in the SYN/ACK will be different from the real timestamp of the server. In order to not break PAWS, the timestamps are translated in the direction server->client. Signed-off-by: Patrick McHardy <kaber@trash.net> Tested-by: Martin Topholm <mph@one.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-08-28 00:27:54 +02:00
Patrick McHardy	0198230b77	net: syncookies: export cookie_v4_init_sequence/cookie_v4_check Extract the local TCP stack independent parts of tcp_v4_init_sequence() and cookie_v4_check() and export them for use by the upcoming SYNPROXY target. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: David S. Miller <davem@davemloft.net> Tested-by: Martin Topholm <mph@one.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-08-28 00:27:44 +02:00
Patrick McHardy	41d73ec053	netfilter: nf_conntrack: make sequence number adjustments usuable without NAT Split out sequence number adjustments from NAT and move them to the conntrack core to make them usable for SYN proxying. The sequence number adjustment information is moved to a separate extend. The extend is added to new conntracks when a NAT mapping is set up for a connection using a helper. As a side effect, this saves 24 bytes per connection with NAT in the common case that a connection does not have a helper assigned. Signed-off-by: Patrick McHardy <kaber@trash.net> Tested-by: Martin Topholm <mph@one.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-08-28 00:26:48 +02:00
Nathan Hintz	706f5151e3	netfilter: nf_defrag_ipv6.o included twice 'nf_defrag_ipv6' is built as a separate module; it shouldn't be included in the 'nf_conntrack_ipv6' module as well. Signed-off-by: Nathan Hintz <nlhintz@hotmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-08-28 00:13:41 +02:00
Phil Oester	affe759dba	netfilter: ip[6]t_REJECT: tcp-reset using wrong MAC source if bridged As reported by Casper Gripenberg, in a bridged setup, using ip[6]t_REJECT with the tcp-reset option sends out reset packets with the src MAC address of the local bridge interface, instead of the MAC address of the intended destination. This causes some routers/firewalls to drop the reset packet as it appears to be spoofed. Fix this by bypassing ip[6]_local_out and setting the MAC of the sender in the tcp reset packet. This closes netfilter bugzilla #531. Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-08-28 00:13:12 +02:00
Andy Zhou	5828cd9a68	openvswitch: optimize flow compare and mask functions Make sure the sw_flow_key structure and valid mask boundaries are always machine word aligned. Optimize the flow compare and mask operations using machine word size operations. This patch improves throughput on average by 15% when CPU is the bottleneck of forwarding packets. This patch is inspired by ideas and code from a patch submitted by Peter Klausler titled "replace memcmp() with specialized comparator". However, the original patch only optimizes for architectures that support unaligned machine word access. This patch optimizes for all architectures. Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-08-27 13:13:09 -07:00
David S. Miller	decf4f3f2d	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== This is one more set of fixes intended for the 3.11 stream... For the mac80211 bits, Johannes says: "I have three more patches for the 3.11 stream: Felix's fix for the fairly visible brcmsmac crash, a fix from Simon for an IBSS join bug I found and a fix for a channel context bug in IBSS I'd introduced." Along with those... Sujith Manoharan makes a minor change to not use a PLL hang workaround for AR9550. This one-liner fixes a couple of bugs reported in the Red Hat bugzilla. Helmut Schaa addresses an ath9k_htc bug that mangles frame headers during Tx. This fix is small, tested by the bug reporter and isolated to ath9k_htc. Stanislaw Gruszka reverts a recent iwl4965 change that broke rfkill notification to user space. Please let me know if there are problems! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-27 15:54:47 -04:00
Daniel Borkmann	b1dcdc68b1	net: tcp_probe: allow more advanced ingress filtering by mark Currently, the tcp_probe snooper can either filter packets by a given port (handed to the module via module parameter e.g. port=80) or lets all TCP traffic pass (port=0, default). When a port is specified, the port number is tested against the sk's source/destination port. Thus, if one of them matches, the information will be further processed for the log. As this is quite limited, allow for more advanced filtering possibilities which can facilitate debugging/analysis with the help of the tcp_probe snooper. Therefore, similarly as added to BPF machine in commit `7e75f93e` ("pkt_sched: ingress socket filter by mark"), add the possibility to use skb->mark as a filter. If the mark is not being used otherwise, this allows ingress filtering by flow (e.g. in order to track updates from only a single flow, or a subset of all flows for a given port) and other things such as dynamic logging and reconfiguration without removing/re-inserting the tcp_probe module, etc. Simple example: insmod net/ipv4/tcp_probe.ko fwmark=8888 full=1 ... iptables -A INPUT -i eth4 -t mangle -p tcp --dport 22 \ --sport 60952 -j MARK --set-mark 8888 [... sampling interval ...] iptables -D INPUT -i eth4 -t mangle -p tcp --dport 22 \ --sport 60952 -j MARK --set-mark 8888 The current option to filter by a given port is still being preserved. A similar approach could be done for the sctp_probe module as a follow-up. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-27 15:53:34 -04:00
Dan Carpenter	dbcae088fa	libceph: create_singlethread_workqueue() doesn't return ERR_PTRs create_singlethread_workqueue() returns NULL on error, and it doesn't return ERR_PTRs. I tweaked the error handling a little to be consistent with earlier in the function. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Sage Weil <sage@inktank.com>	2013-08-27 12:26:31 -07:00
Dan Carpenter	b72e19b922	libceph: potential NULL dereference in ceph_osdc_handle_map() There are two places where we read "nr_maps"; if both of them are set to zero, then we would hit a NULL dereference here. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Sage Weil <sage@inktank.com>	2013-08-27 12:26:30 -07:00
Dan Carpenter	1874119664	libceph: fix error handling in handle_reply() We've tried to fix the error paths in this function before, but there is still a hidden goto in the ceph_decode_need() macro which goes to the wrong place. We need to release the "req" and unlock a mutex before returning. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Sage Weil <sage@inktank.com>	2013-08-27 12:26:30 -07:00
Andy Lutomirski	d661684cf6	net: Check the correct namespace when spoofing pid over SCM_RIGHTS This is a security bug. The follow-up will fix nsproxy to discourage this type of issue from happening again. Cc: stable@vger.kernel.org Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-27 13:52:52 -04:00
Andy Zhou	02237373b1	openvswitch: Rename key_len to key_end Key_end is a better name describing the ending boundary than key_len. Rename those variables to make it less confusing. Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-08-26 14:03:14 -07:00
Joe Stringer	a175a72330	openvswitch: Add SCTP support This patch adds support for rewriting SCTP src,dst ports similar to the functionality already available for TCP/UDP. Rewriting SCTP ports is expensive due to double-recalculation of the SCTP checksums; this is performed to ensure that packets traversing OVS with invalid checksums will continue to the destination with any checksum corruption intact. Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-08-26 14:03:13 -07:00
David S. Miller	b05930f5d1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/wireless/iwlwifi/pcie/trans.c include/linux/inetdevice.h The inetdevice.h conflict involves moving the IPV4_DEVCONF values into a UAPI header, overlapping additions of some new entries. The iwlwifi conflict is a context overlap. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-26 16:37:08 -04:00
Will Deacon	50192abe02	fs/9p: avoid accessing utsname after namespace has been torn down During trinity fuzzing in a kvmtool guest, I stumbled across the following: Unable to handle kernel NULL pointer dereference at virtual address 00000004 PC is at v9fs_file_do_lock+0xc8/0x1a0 LR is at v9fs_file_do_lock+0x48/0x1a0 [<c01e2ed0>] (v9fs_file_do_lock+0xc8/0x1a0) from [<c0119154>] (locks_remove_flock+0x8c/0x124) [<c0119154>] (locks_remove_flock+0x8c/0x124) from [<c00d9bf0>] (__fput+0x58/0x1e4) [<c00d9bf0>] (__fput+0x58/0x1e4) from [<c0044340>] (task_work_run+0xac/0xe8) [<c0044340>] (task_work_run+0xac/0xe8) from [<c002e36c>] (do_exit+0x6bc/0x8d8) [<c002e36c>] (do_exit+0x6bc/0x8d8) from [<c002e674>] (do_group_exit+0x3c/0xb0) [<c002e674>] (do_group_exit+0x3c/0xb0) from [<c002e6f8>] (__wake_up_parent+0x0/0x18) I believe this is due to an attempt to access utsname()->nodename, after exit_task_namespaces() has been called, leaving current->nsproxy->uts_ns as NULL and causing the above dereference. A similar issue was fixed for lockd in `9a1b6bf818` ("LOCKD: Don't call utsname()->nodename from nlmclnt_setlockargs"), so this patch attempts something similar for 9pfs. Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2013-08-26 10:28:46 -05:00
Michael Marineau	e0d6cb9cd3	9p: send uevent after adding/removing mount_tag attribute This driver adds an attribute to the existing virtio device, so a CHANGE event is required in order for udev rules to make use of it. The ADD event happens before this driver is probed and, unlike a more typical driver like a block device, there isn't a higher level device to watch for. Signed-off-by: Michael Marineau <michael.marineau@coreos.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2013-08-26 10:28:22 -05:00
Hannes Frederic Sowa	9c9c9ad5fa	ipv6: set skb->protocol on tcp, raw and ip6_append_data generated skbs Currently we don't initialize skb->protocol when transmitting data via tcp, raw(with and without inclhdr) or udp+ufo or appending data directly to the socket transmit queue (via ip6_append_data). This needs to be done so that we can get the correct mtu in the xfrm layer. Setting of skb->protocol happens only in functions where we also have a transmitting socket and a new skb, so we don't overwrite old values. Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-08-26 12:46:24 +02:00
Hannes Frederic Sowa	5a25cf1e31	xfrm: revert ipv4 mtu determination to dst_mtu In commit `0ea9d5e3e0` ("xfrm: introduce helper for safe determination of mtu") I switched the determination of ipv4 mtus from dst_mtu to ip_skb_dst_mtu. This was an error because in case of IP_PMTUDISC_PROBE we fall back to the interface mtu, which is never correct for ipv4 ipsec. This patch partly reverts `0ea9d5e3e0` ("xfrm: introduce helper for safe determination of mtu"). Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-08-26 12:40:53 +02:00
Johannes Berg	a986553877	mac80211: fix change_interface queue assignments Jouni reported that with mac80211_hwsim, multicast TX was causing crashes due to invalid vif->cab_queue assignment. It turns out that this is caused by change_interface() getting invoked and not having the vif->type/vif->p2p assigned correctly before calling the queue check (ieee80211_check_queues). Fix this by passing the 'external' interface type to the function and adjusting it accordingly. While at it, also fix the error path in change_interface; it wasn't correctly resetting to the external type but using the internal one instead. Fortunately this affects only hwsim because all other drivers set the vif->type/vif->p2p variables when changing iftype. This shouldn't be needed, but almost all implementations actually do it for their own internal handling. Reported-by: Jouni Malinen <j@w1.fi> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-26 09:52:58 +02:00
Dan Carpenter	b4de77ade3	ipip: potential race in ip_tunnel_init_net() Eric Dumazet says that my previous fix for an ERR_PTR dereference (`ea857f28ab` 'ipip: dereferencing an ERR_PTR in ip_tunnel_init_net()') could be racy and suggests the following fix instead. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-25 18:39:59 -04:00
Andy Zhou	03f0d916aa	openvswitch: Mega flow implementation Add wildcarded flow support in kernel datapath. Wildcarded flows can improve OVS flow set up performance by avoiding sending matching new flows to the user space program. The exact performance boost will largely depend on the wildcarded flow hit rate. In case all new flows hit wildcard flows, the flow set up rate is within 5% of that of the linux bridge module. Pravin has made significant contributions to this patch, including API clean ups and bug fixes. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-08-23 16:43:07 -07:00
Cong Wang	3fa34de678	openvswitch: check CONFIG_OPENVSWITCH_GRE in makefile Cc: Jesse Gross <jesse@nicira.com> Cc: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-08-23 16:43:07 -07:00
Justin Pettit	2694838d60	openvswitch: Fix argument descriptions in vport.c. Signed-off-by: Justin Pettit <jpettit@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-08-23 16:38:00 -07:00
Jiri Pirko	2537b4dd0a	openvswitch: link upper device for port devices Link upper device properly. That will make IFLA_MASTER filled up. Set the master to port 0 of the datapath under which the port belongs. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-08-23 16:38:00 -07:00
Pravin B Shelar	76a66c7e7f	openvswitch: Use non rcu hlist_del() flow table entry. Flow table destroy is done in rcu call-back context. Therefore there is no need to use rcu variant of hlist_del(). Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-08-23 16:38:00 -07:00
Pravin B Shelar	59a35d60af	openvswitch: Use RCU lock for dp dump operation. RCUfy dp-dump operation which is already read-only. This makes all ovs dump operations lockless. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-08-23 16:37:59 -07:00
Pravin B Shelar	d57170b1b1	openvswitch: Use RCU lock for flow dump operation. Flow dump operation is a read-only operation. There is no need to take ovs-lock. The following patch uses rcu-lock for dumping flows. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-08-23 16:37:59 -07:00
John W. Linville	81ca2ff945	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem	2013-08-23 11:47:48 -04:00
Johannes Berg	d70b7616d9	mac80211: ignore (E)CSA in probe response frames Seth reports that some APs, notably the Netgear WNDAP360, send invalid ECSA IEs in probe response frames with the operating class and channel number both set to zero, even when no channel switch is being done. As a result, any scan while connected to such an AP results in the connection being dropped. Fix this by ignoring any channel switch announcement in probe response frames entirely, since we're connected to the AP we will be receiving a beacon (and maybe even an action frame) if a channel switch is done, which is sufficient. Cc: stable@vger.kernel.org # 3.10 Reported-by: Seth Forshee <seth.forshee@canonical.com> Tested-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-23 17:05:12 +02:00
Vladimir Kondratiev	19504cf5f3	cfg80211: add flags to cfg80211_rx_mgmt() Add flags intended to report various auxiliary information and introduce the NL80211_RXMGMT_FLAG_ANSWERED flag to report that the frame was already answered by the device. Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com> [REPLIED->ANSWERED, reword commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-23 16:06:03 +02:00
Bob Copeland	c4c205f3cd	mac80211: assign seqnums for group QoS frames According to 802.11-2012 9.3.2.10, paragraph 4, QoS data frames with a group address in the Address 1 field have sequence numbers allocated from the same counter as non-QoS data and management frames. Without this flag, some drivers may not assign sequence numbers, and in rare cases frames might get dropped. Set the control flag accordingly. Signed-off-by: Bob Copeland <bob@cozybit.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-23 15:43:38 +02:00
Chun-Yeow Yeoh	a4ef66a915	mac80211: only respond to probe request with mesh ID Previously, the mesh STA responds to probe request from legacy STA but now it will only respond to legacy STA if the legacy STA does include the specific mesh ID or wildcard mesh ID in the probe request. The iw patch "iw: scan using meshid" can be used either by legacy STA or by mesh STA to do active scanning by inserting the mesh ID in the probe request frame. Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@cozybit.com> Acked-by: Thomas Pedersen <thomas@cozybit.com> Acked-by: Javier Cardona <javier@cozybit.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-23 15:25:06 +02:00
Johannes Berg	1fb9026000	mac80211: move setting WIPHY_FLAG_SUPPORTS_SCHED_SCAN into drivers mac80211 currently sets WIPHY_FLAG_SUPPORTS_SCHED_SCAN based on whether the start_sched_scan operation is supported or not, but that will not be correct for all drivers, we're adding scheduled scan to the iwlmvm driver but it depends on firmware support. Therefore, move setting WIPHY_FLAG_SUPPORTS_SCHED_SCAN into the drivers so that they can control it regardless of implementing the operation. This currently only affects the TI drivers since they're the only ones implementing scheduled scan (in a mac80211 driver.) Acked-by: Luciano Coelho <luca@coelho.fi> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-23 12:02:26 +02:00
Daniel Borkmann	05f147ef7c	net: sctp_probe: simplify code by using %pISc format specifier We can simply use the %pISc format specifier that was recently added and thus remove some code that distinguishes between IPv4 and IPv6. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-22 22:07:06 -07:00
Duan Jiong	c92a59eca8	ipv6: handle Redirect ICMP Message with no Redirected Header option rfc 4861 says the Redirected Header option is optional, so the kernel should not drop the Redirect Message that has no Redirected Header option. In this patch, the function ip6_redirect_no_header() is introduced to deal with that condition. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>	2013-08-22 20:08:21 -07:00
Daniel Borkmann	f925d0a62d	net: tcp_probe: add IPv6 support The tcp_probe currently only supports analysis of IPv4 connections. Therefore, it would be nice to have IPv6 supported as well. Since we have the recently added %pISpc specifier that is IPv4/IPv6 generic, build related sockaddress structures from the flow information and pass this to our format string. Tested with SSH and HTTP sessions on IPv4 and IPv6. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-22 16:19:50 -07:00
Daniel Borkmann	d8cdeda6dd	net: tcp_probe: kprobes: adapt jtcp_rcv_established signature This patch fixes a rather unproblematic function signature mismatch as the const specifier was missing for the th variable; and next to that it adds a build-time assertion so that future function signature mismatches for kprobes will not end badly, similarly as commit `22222997` ("net: sctp: add build check for sctp_sf_eat_sack_6_2/jsctp_sf_eat_sack") did it for SCTP. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-22 16:19:50 -07:00
Daniel Borkmann	b4c1c1d038	net: tcp_probe: also include rcv_wnd next to snd_wnd It is helpful to sometimes know the TCP window sizes of an established socket e.g. to confirm that window scaling is working or to tweak the window size to improve high-latency connections, etc etc. Currently the TCP snooper only exports the send window size, but not the receive window size. Therefore, also add the receive window size to the end of the output line. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-22 16:19:50 -07:00
David S. Miller	baf3b3f227	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== 1) Some constifications, from Mathias Krause. 2) Catch bugs if a hold timer is still active when xfrm_policy_destroy() is called, from Fan Du. 3) Remove a redundant address family checking, from Fan Du. 4) Make xfrm_state timer monotonic to be independent of system clock changes, from Fan Du. 5) Remove an outdated comment on returning -EREMOTE in the xfrm_lookup(), from Rami Rosen. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-22 16:04:41 -07:00
Yuchung Cheng	0f7cc9a3c2	tcp: increase throughput when reordering is high The stack currently detects reordering and avoids spurious retransmission very well. However the throughput is sub-optimal under high reordering because cwnd is increased only if the data is delivered in order, i.e., the FLAG_DATA_ACKED check in tcp_ack(). The more packets are reordered, the worse the throughput is. Therefore when reordering is proven high, cwnd should advance whenever the data is delivered regardless of its ordering. If reordering is low, conservatively advance cwnd only on ordered deliveries in Open state, and retain cwnd in Disordered state (RFC5681). Using netperf on a qdisc setup of 20Mbps BW and random RTT from 45ms to 55ms (for reordering effect), this change increases TCP throughput by 20 - 25% to near bottleneck BW. A special case is the stretched ACK with new SACK and/or ECE mark. For example, a receiver may receive an out of order or ECN packet with unacked data buffered because of LRO or delayed ACK. The principle on such an ACK is to advance cwnd on the cumulative acked part first, then reduce cwnd in tcp_fastretrans_alert(). Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-22 14:39:46 -07:00
Johannes Berg	9d47b38056	Revert "genetlink: fix family dump race" This reverts commit `58ad436fcf`. It turns out that the change introduced a potential deadlock by causing a locking dependency with netlink's cb_mutex. I can't seem to find a way to resolve this without doing major changes to the locking, so revert this. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-22 13:24:02 -07:00
John W. Linville	69b307a48a	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next	2013-08-22 14:27:31 -04:00
John W. Linville	89b5f74a26	Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211	2013-08-22 11:35:22 -04:00
Johannes Berg	e133fae263	mac80211: minstrel_ht: don't use control.flags in TX status path Sujith reports that my commit `af61a16518` ("mac80211: add control port protocol TX control flag") broke ath9k (aggregation). The reason is that I made minstrel_ht use the flag in the TX status path, where it can have been overwritten by the driver. Since we have no more space in info->flags, revert that part of the change for now, until we can reshuffle the flags or so. Reported-by: Sujith Manoharan <c_manoha@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-22 08:37:08 +02:00
Frédéric Dalleau	2dea632f9a	Bluetooth: Add SCO connection fallback When initiating a transparent eSCO connection, make use of T2 settings at first try. T2 is the recommended setting from HFP 1.6 WideBand Speech. Upon connection failure, try T1 settings. When CVSD is requested and eSCO is supported, try to establish eSCO connection using S3 settings. If it fails, fall back in sequence to S2, S1, D1, D0 settings. To know which setting should be used, conn->attempt is used. It indicates the currently ongoing SCO connection attempt and can be used as the index for the fallback settings table. These settings and the fallback order are described in Bluetooth HFP 1.6 specification p. 101. Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:13 +02:00
Frédéric Dalleau	1a4c958cf9	Bluetooth: Handle specific error for SCO connection fallback Synchronous Connection Complete event can return error "Connection Rejected due to Limited Resources (0x0d)". Handling this error is required for SCO connection fallback. This error happens when the server tried to accept the connection but failed to negotiate settings. This error code has been verified experimentally by sending a T2 request to a T1 only SCO listener. Client dump follows : < HCI Command (0x01|0x0028) plen 17 [hci0] 3.696064 Handle: 12 Transmit bandwidth: 8000 Receive bandwidth: 8000 Max latency: 13 Setting: 0x0003 Retransmission effort: Optimize for link quality (0x02) Packet type: 0x0380 > HCI Event (0x0f) plen 4 [hci0] 3.697034 Setup Synchronous Connection (0x01|0x0028) ncmd 1 Status: Success (0x00) > HCI Event (0x2c) plen 17 [hci0] 3.736059 Status: Connection Rejected due to Limited Resources (0x0d) Handle: 0 Address: xx:xx:xx:xx:xx:AB (OUI 70-F3-95) Link type: eSCO (0x02) Transmission interval: 0x0c Retransmission window: 0x06 RX packet length: 60 TX packet length: 60 Air mode: Transparent (0x03) Server dump follows : > HCI Event (0x04) plen 10 [hci0] 4.741513 Address: xx:xx:xx:xx:xx:D9 (OUI 20-68-9D) Class: 0x620100 Major class: Computer (desktop, notebook, PDA, organizers) Minor class: Uncategorized, code for device not assigned Networking (LAN, Ad hoc) Audio (Speaker, Microphone, Headset) Telephony (Cordless telephony, Modem, Headset) Link type: eSCO (0x02) < HCI Command (0x01|0x0029) plen 21 [hci0] 4.743269 Address: xx:xx:xx:xx:xx:D9 (OUI 20-68-9D) Transmit bandwidth: 8000 Receive bandwidth: 8000 Max latency: 13 Setting: 0x0003 Retransmission effort: Optimize for link quality (0x02) Packet type: 0x03c1 > HCI Event (0x0f) plen 4 [hci0] 4.745517 Accept Synchronous Connection (0x01|0x0029) ncmd 1 Status: Success (0x00) > HCI Event (0x2c) plen 17 [hci0] 4.749508 Status: Connection Rejected due to Limited Resources (0x0d) Handle: 0 Address: xx:xx:xx:xx:xx:D9 (OUI 20-68-9D) Link type: eSCO (0x02) Transmission interval: 0x0c Retransmission window: 0x06 RX packet length: 60 TX packet length: 60 Air mode: Transparent (0x03) Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:13 +02:00
Frédéric Dalleau	79dc0087c3	Bluetooth: Prevent transparent SCO on older devices Older Bluetooth devices may not support Setup Synchronous Connection or SCO transparent data. This is indicated by the corresponding LMP feature bits. It is not possible to know if the adapter supports these features before setting the BT_VOICE option since the socket is not bound to an adapter. An adapter can also be added after the socket is created. The socket can be bound to an address before the adapter is plugged in. Thus, on such adapters, if the user requests BT_VOICE_TRANSPARENT, outgoing connections fail on connect() and return -EOPNOTSUPP. Incoming connections do not fail. However, they should only be allowed depending on what was specified in the Write_Voice_Settings command. EOPNOTSUPP is chosen because the connect() system call is failing after selecting route but before any connection attempt. Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:12 +02:00
Frédéric Dalleau	10c62ddc6f	Bluetooth: Parameters for outgoing SCO connections In order to establish a transparent SCO connection, the correct settings must be specified in the Setup Synchronous Connection request. For that, a setting field is added to ACL connection data to set up the desired parameters. The patch also removes usage of hdev->voice_setting in CVSD connection and makes use of T2 parameters for transparent data. Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:11 +02:00
Frédéric Dalleau	2f69a82acf	Bluetooth: Use voice setting in deferred SCO connection request When an incoming eSCO connection is requested, check the selected voice setting and reply appropriately. Voice setting should have been negotiated previously. For example, in case of HFP, the codec is negotiated using AT commands on the RFCOMM channel. This patch only changes replies for socket with deferred setup enabled. Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:11 +02:00
Frédéric Dalleau	ad10b1a487	Bluetooth: Add Bluetooth socket voice option This patch extends the current Bluetooth socket options with BT_VOICE. This is intended to choose voice data type at runtime. It only applies to SCO sockets. Incoming connections shall be setup during deferred setup. Outgoing connections shall be setup before connect(). The desired setting is stored in the SCO socket info. This patch declares needed members, modifies getsockopt() and setsockopt(). Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:09 +02:00
Frédéric Dalleau	33f2404823	Bluetooth: Remove unused mask parameter in sco_conn_defer_accept From Bluetooth Core v4.0 specification, 7.1.8 Accept Connection Request Command "When accepting synchronous connection request, the Role parameter is not used and will be ignored by the BR/EDR Controller." Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:09 +02:00
Frédéric Dalleau	e660ed6c70	Bluetooth: Use hci_connect_sco directly hci_connect is a super function for connecting hci protocols. But the voice_setting parameter (introduced in subsequent patches) is only needed by SCO and security requirements are not needed for SCO channels. Thus, it makes sense to have a separate function for SCO. Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:08 +02:00
Gianluca Anzolin	ffe6b68cc5	Bluetooth: Purge the dlc->tx_queue to avoid circular dependency In rfcomm_tty_cleanup we purge the dlc->tx_queue which may contain socket buffers referencing the tty_port and thus preventing the tty_port destruction. Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:08 +02:00
Gianluca Anzolin	ece3150dea	Bluetooth: Fix the reference counting of tty_port The tty_port can be released in two cases: when we get a HUP in the functions rfcomm_tty_hangup() and rfcomm_dev_state_change(), or when the user releases the device in rfcomm_release_dev(). In these cases we set the flag RFCOMM_TTY_RELEASED so that no other function can get a reference to the tty_port. The use of !test_and_set_bit(RFCOMM_TTY_RELEASED) ensures that the 'initial' tty_port reference is only dropped once. The rfcomm_dev_del function is removed because it isn't used anymore. Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:07 +02:00
Gianluca Anzolin	cad348a17e	Bluetooth: Implement .activate, .shutdown and .carrier_raised methods Implement .activate, .shutdown and .carrier_raised methods of tty_port to manage the dlc, moving the code from rfcomm_tty_install() and rfcomm_tty_cleanup() functions. At the same time the tty .open()/.close() and .hangup() methods are changed to use the tty_port helpers that properly call the aforementioned tty_port methods. Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:07 +02:00
Gianluca Anzolin	54b926a143	Bluetooth: Move the tty initialization and cleanup out of open/close Move the tty_struct initialization from rfcomm_tty_open() to rfcomm_tty_install() and do the same for the cleanup moving the code from rfcomm_tty_close() to rfcomm_tty_cleanup(). Add also extra error handling in rfcomm_tty_install() because, unlike .open()/.close(), .cleanup() is not called if .install() fails. Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:06 +02:00
Gianluca Anzolin	ebe937f74b	Bluetooth: Remove the device from the list in the destructor The current code removes the device from the device list in several places. Do it only in the destructor instead and in the error path of rfcomm_add_dev() if the device couldn't be initialized. Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:06 +02:00
Gianluca Anzolin	396dc223dd	Bluetooth: Take proper tty_struct references In net/bluetooth/rfcomm/tty.c the struct tty_struct is used without taking references. This may lead to a use-after-free of the rfcomm tty. Fix this by taking references properly, using the tty_port_* helpers when possible. The raw assignments of dev->port.tty in rfcomm_tty_open/close are addressed in the later commit 'rfcomm: Implement .activate, .shutdown and .carrier_raised methods'. Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:05 +02:00
Marcel Holtmann	c7882cbd11	Bluetooth: Set different event mask for LE-only controllers In case of a Low Energy only controller it makes no sense to configure the full BR/EDR event mask. It will just enable events that cannot be sent anyway and there is no guarantee that such a controller will accept this value. Use event mask 0x90 0xe8 0x04 0x02 0x00 0x80 0x00 0x20 for LE-only controllers, which enables the following events: Disconnection Complete; Encryption Change; Read Remote Version Information Complete; Command Complete; Command Status; Hardware Error; Number of Completed Packets; Data Buffer Overflow; Encryption Key Refresh Complete; LE Meta. This is according to Core Specification, Part E, Section 3. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:05 +02:00
Johan Hedberg	9d225d2208	Bluetooth: Fix getting SCO socket options in deferred state When a socket is in deferred state there does actually exist an underlying connection even though the connection state is not yet BT_CONNECTED. In the deferred state it should therefore be allowed to get socket options that usually depend on a connection, such as SCO_OPTIONS and SCO_CONNINFO. This patch fixes the behavior of some user space code that behaves as follows without it: $ sudo tools/btiotest -i 00:1B:DC:xx:xx:xx -d -s accept=2 reject=-1 discon=-1 defer=1 sec=0 update_sec=0 prio=0 voice=0x0000 Listening for SCO connections bt_io_get(OPT_DEST): getsockopt(SCO_OPTIONS): Transport endpoint is not connected (107) Accepting connection Successfully connected to 60:D8:19:xx:xx:xx. handle=43, class=000000 The conditions that the patch updates the if-statements to is taken from similar code in l2cap_sock.c which correctly handles the deferred state. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:04 +02:00
Simon Wunderlich	75a423f493	mac80211: ibss: fix ignored channel parameter my earlier patch "mac80211: change IBSS channel state to chandef" created a regression by ignoring the channel parameter in __ieee80211_sta_join_ibss, which breaks IBSS channel selection. This patch fixes this situation by using the right channel and adopting the selected bandwidth mode. Cc: stable@vger.kernel.org Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-21 15:33:08 +02:00
Felix Fietkau	2dfca312a9	mac80211: add a flag to indicate CCK support for HT clients brcm80211 cannot handle sending frames with CCK rates as part of an A-MPDU session. Other drivers may have issues too. Set the flag in all drivers that have been tested with CCK rates. This fixes a reported brcmsmac regression introduced in commit ef47a5e4f1aaf1d0e2e6875e34b2c9595897bef6 "mac80211/minstrel_ht: fix cck rate sampling" Cc: stable@vger.kernel.org # 3.10 Reported-by: Tom Gundersen <teg@jklm.no> Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-21 15:03:25 +02:00
Johannes Berg	2a3ba63c23	mac80211: add missing channel context release IBSS needs to release the channel context when leaving but I evidently missed that. Fix it. Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-21 12:04:48 +02:00
Daniel Borkmann	9fd0784164	net: ipv6: mcast: minor: use defines for rfc3810/8.1 lengths Instead of hard-coding length values, use a define to make it clear where those lengths come from. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 23:52:02 -07:00
Daniel Borkmann	c2cef4e888	net: ipv6: minor: *_start_timer: rather use unsigned long For the functions mld_gq_start_timer(), mld_ifc_start_timer(), and mld_dad_start_timer(), rather use unsigned long than int as we operate only on unsigned values anyway. This seems more appropriate as there is no good reason to do type conversions to int, that could lead to future errors. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 23:52:02 -07:00
Daniel Borkmann	846989635b	net: ipv6: igmp6_event_query: use msecs_to_jiffies Use proper API functions to calculate jiffies from milliseconds and not the crude method of dividing HZ by a value. This ensures more accurate values even in the case of strange HZ values. While at it, also simplify code in the mlh2 case by using max(). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 23:52:02 -07:00
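A minimal user-space sketch of why the conversion above matters: plain integer arithmetic on HZ truncates short delays to zero ticks, while a round-up conversion never does. The function names and the round-up formula are assumptions made for illustration; this is not the kernel's msecs_to_jiffies() implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the "crude" conversion: integer math truncates. */
uint64_t ticks_truncating(uint64_t ms, uint64_t hz)
{
    assert(hz > 0);
    return ms * hz / 1000;
}

/* Hypothetical model of a round-up conversion: any non-zero delay
 * maps to at least one tick, whatever the tick rate. */
uint64_t ticks_rounding_up(uint64_t ms, uint64_t hz)
{
    assert(hz > 0);
    return (ms * hz + 999) / 1000;
}
```

With a tick rate of 100, a 5 ms delay becomes 0 ticks under truncation but 1 tick when rounding up, so a timer armed with the second value cannot fire immediately by accident.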
Nicolas Dichtel	e837735ec4	ip6_tunnel: ensure to always have a link local address When an Xin6 tunnel is set up, we check other netdevices to inherit the link-local address. If none is available, the interface will not have any link-local address. RFC4862 expects that each interface has a link local address. Now that this kind of tunnel supports x-netns, it's easy to fall into this case (by creating the tunnel in a netns where ethernet interfaces stand and then moving it to another netns where no ethernet interface is available). RFC4291, Appendix A suggests two methods: the first is the one currently implemented, the second is to generate a unique identifier, so that we can always generate the link-local address. Let's use eth_random_addr() to generate this interface identifier. I completely remove the previous method, hence for the whole life of the interface, the link-local address remains the same (previously, it depended on which ethernet interfaces were up when the tunnel interface was set up). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 23:45:42 -07:00
David S. Miller	7eaa48a45c	Revert "ipv6: fix checkpatch errors in net/ipv6/addrconf.c" This reverts commit `df8372ca74`. These changes are buggy and make unintended semantic changes to ip6_tnl_add_linklocal(). Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 23:44:39 -07:00
Toshiaki Makita	ef40b7ef18	bridge: Use the correct bit length for bitmap functions in the VLAN code The VLAN code needs to know the length of the per-port VLAN bitmap to perform its most basic operations (retrieving VLAN information, removing VLANs, forwarding database manipulation, etc). Unfortunately, in the current implementation we are using a macro that indicates the bitmap size in longs in places where the size in bits is expected, which in some cases can cause what appear to be random failures. Use the correct macro. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 23:35:57 -07:00
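The longs-versus-bits mix-up described above can be sketched in user space as follows. All names here (VLAN_IDS, WORDS_FOR_BITS, first_set_bit, vlan_found) are hypothetical stand-ins, not the kernel's macros or bitmap helpers; the point is only that a search bounded by the word count stops long before the end of the bitmap.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define VLAN_IDS 4096 /* assumed bitmap length in bits */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define WORDS_FOR_BITS(n) (((n) + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Return the index of the first set bit below nbits, or nbits if none. */
size_t first_set_bit(const unsigned long *map, size_t nbits)
{
    size_t i;

    for (i = 0; i < nbits; i++)
        if (map[i / BITS_PER_WORD] & (1UL << (i % BITS_PER_WORD)))
            return i;
    return nbits;
}

/* Set one VLAN ID in an empty bitmap and report whether a search
 * limited to nbits can see it. */
int vlan_found(size_t vid, size_t nbits)
{
    unsigned long map[WORDS_FOR_BITS(VLAN_IDS)];

    assert(vid < VLAN_IDS);
    memset(map, 0, sizeof(map));
    map[vid / BITS_PER_WORD] |= 1UL << (vid % BITS_PER_WORD);
    return first_set_bit(map, nbits) == vid;
}
```

Passing the size in bits finds VLAN 1000; passing the size in words covers only the first 64 or 128 IDs, so high VLAN IDs silently go missing while low ones still work, which matches the "random failures" the commit describes.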
David S. Miller	5c751c9344	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== Regarding the iwlwifi bits, Johannes says: "We revert an rfkill bugfix that unfortunately caused more bugs, shuffle some code to avoid touching the PCIe device before it's enabled and disconnect if firmware fails to do our bidding. I also have Stanislaw's fix to not crash in some channel switch scenarios." As for the mac80211 bits, Johannes says: "This time, I have one fix from Dan Carpenter for users of nl80211hdr_put(), and one fix from myself fixing a regression with the libertas driver." Along with the above... Dan Carpenter fixes some incorrectly placed "address of" operators in hostap that caused copying of junk data. Jussi Kivilinna corrects zd1201 to use an allocated buffer rather than the stack for a URB operation. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 17:25:55 -07:00
Willem de Bruijn	8bcdeaff5e	packet: restore packet statistics tp_packets to include drops getsockopt PACKET_STATISTICS returns tp_packets + tp_drops. Commit `ee80fbf301` ("packet: account statistics only in tpacket_stats_u") cleaned up the getsockopt PACKET_STATISTICS code. This also changed semantics. Historically, tp_packets included tp_drops on return. The commit removed the line that adds tp_drops into tp_packets. This patch reinstates the old semantics. Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 17:23:58 -07:00
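As a rough illustration of the restored semantics, the reported packet count is the sum of delivered and dropped packets. The struct and function below are hypothetical and only mirror the field names mentioned in the commit message; they are not the kernel's tpacket statistics code.

```c
/* Hypothetical report structure using the field names from the message. */
struct stats_sketch {
    unsigned int tp_packets; /* as reported: delivered plus dropped */
    unsigned int tp_drops;   /* dropped only */
};

/* Build the user-visible report from internal counters, folding the
 * drops into tp_packets as the historical interface did. */
struct stats_sketch report_stats(unsigned int delivered, unsigned int dropped)
{
    struct stats_sketch st;

    st.tp_drops = dropped;
    st.tp_packets = delivered + dropped;
    return st;
}
```

A caller that wants the number of packets actually delivered would therefore compute tp_packets minus tp_drops.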
David S. Miller	cc666c53cc	Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge Included change: - Check if the skb has been correctly prepared before going on	2013-08-20 16:54:29 -07:00
Dan Carpenter	ea857f28ab	ipip: dereferencing an ERR_PTR in ip_tunnel_init_net() We need to move the dereference after the IS_ERR() check. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 15:12:15 -07:00
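The ordering problem fixed above can be modelled in user space. The helpers err_ptr(), is_err() and ptr_err() below are simplified, hypothetical imitations of the kernel's error-pointer convention, and tunnel_sketch, create_tunnel() and init_sketch() are invented for this example; the sketch only shows that the error check has to come before the first dereference.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO_SKETCH 4095 /* assumed range of encoded error codes */

/* Encode a negative errno value in a pointer (user-space imitation). */
void *err_ptr(long err) { return (void *)(intptr_t)err; }

/* True if the pointer value falls in the encoded-error range. */
int is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

long ptr_err(const void *p) { return (long)(intptr_t)p; }

struct tunnel_sketch { int mtu; };

/* Hypothetical constructor that returns an error pointer on failure. */
struct tunnel_sketch *create_tunnel(int fail)
{
    struct tunnel_sketch *t;

    if (fail)
        return err_ptr(-ENOMEM);
    t = malloc(sizeof(*t));
    if (!t)
        return err_ptr(-ENOMEM);
    t->mtu = 1500;
    return t;
}

long init_sketch(int fail)
{
    struct tunnel_sketch *t = create_tunnel(fail);

    if (is_err(t))          /* check first ... */
        return ptr_err(t);
    t->mtu = 1480;          /* ... dereference only afterwards */
    free(t);
    return 0;
}
```

Swapping the two steps in init_sketch() would write through an address near the top of the address space whenever creation fails.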
Eric Dumazet	734d2725db	ipv4: raise IP_MAX_MTU to theoretical limit As discussed last year [1], there is no compelling reason to limit IPv4 MTU to 0xFFF0, while real limit is 0xFFFF [1] : http://marc.info/?l=linux-netdev&m=135607247609434&w=2 Willem raised this issue again because some of our internal regression tests broke after lo mtu being set to 65536. IP_MTU reports 0xFFF0, and the test attempts to send a RAW datagram of mtu + 1 bytes, expecting the send() to fail, but it does not. Alexey raised interesting points about TCP MSS, that should be addressed in follow-up patches in TCP stack if needed, as someone could also set an odd mtu anyway. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 15:05:04 -07:00
Christoph Paasch	397b417463	tcp: trivial: Remove nocache argument from tcp_v4_send_synack The nocache-argument was used in tcp_v4_send_synack as an argument to inet_csk_route_req. However, since `ba3f7f04ef` (ipv4: Kill FLOWI_FLAG_RT_NOCACHE and associated code.) this is no longer used. This patch removes the unused argument from tcp_v4_send_synack. Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 15:05:04 -07:00
dingtianhong	df8372ca74	ipv6: fix checkpatch errors in net/ipv6/addrconf.c ERROR: code indent should use tabs where possible: fix 2. ERROR: do not use assignment in if condition: fix 5. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 15:05:03 -07:00
dingtianhong	ba3542e15c	ipv6: convert the uses of ADBG and remove the superfluous parentheses Just follow the Joe Perches's opinion, it is a better way to fix the style errors. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 15:05:03 -07:00
David S. Miller	89d5e23210	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Conflicts: net/netfilter/nf_conntrack_proto_tcp.c The conflict had to do with overlapping changes dealing with fixing the use of an "s32" to hold the value returned by NAT_OFFSET(). Pablo Neira Ayuso says: ==================== The following batch contains Netfilter/IPVS updates for your net-next tree. More specifically, they are: * Trivial typo fix in xt_addrtype, from Phil Oester. * Remove net_ratelimit in the conntrack logging for consistency with other logging subsystem, from Patrick McHardy. * Remove unneeded includes from the recently added xt_connlabel support, from Florian Westphal. * Allow to update conntracks via nfqueue, don't need NFQA_CFG_F_CONNTRACK for this, from Florian Westphal. * Remove tproxy core, now that we have socket early demux, from Florian Westphal. * A couple of patches to refactor conntrack event reporting to save a good bunch of lines, from Florian Westphal. * Fix missing locking in NAT sequence adjustment, it has not manifested in any known bug so far, from Patrick McHardy. * Change sequence number adjustment variable to 32 bits, to delay the possible early overflow in long standing connections, also from Patrick. * Cosmetic cleanups for IPVS, from Dragos Foianu. * Fix possible null dereference in IPVS in the SH scheduler, from Daniel Borkmann. * Allow to attach conntrack expectations via nfqueue. Before this patch, you had to use ctnetlink instead, thus, we save the conntrack lookup. * Export xt_rpfilter and xt_HMARK header files, from Nicolas Dichtel. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 13:30:54 -07:00
Alexander Aring	65d892c8ac	6lowpan: handle context based source address Handle context based address when an unspecified address is given. For other context based address we print a warning and drop the packet because we don't support it right now. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 13:23:12 -07:00
Alexander Aring	ce2463b283	6lowpan: lowpan_uncompress_addr with address_mode This patch drops the pre and postcount calculation from the lowpan_uncompress_addr function. We use instead a switch/case over address_mode value. The original implementation had several bugs in this function and it was hard to decipher how it works. To make it maintainable and fix these bugs this patch basically reimplements lowpan_uncompress_addr from scratch. A list of bugs we found in the current implementation: 1) Properly support uncompression of short-address based IPv6 addresses (instead of basically copying garbage) 2) Fix use and uncompression of long-addresses based IPv6 addresses 3) Add missing ff:fe00 in the case of SAM/DAM = 2 and M = 0 Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 13:23:12 -07:00
Alexander Aring	84c2e7bcf5	6lowpan: add function to uncompress multicast addr Add a function to uncompress a multicast address. This splits the multicast case out of the uncompress function into a separate function. According to RFC 6282, uncompressing a multicast address differs from uncompressing other, non-multicast addresses. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 13:23:12 -07:00
Alexander Aring	4666669fc3	6lowpan: introduce lowpan_fetch_skb function This patch adds a helper function to parse the ipv6 header to a 6lowpan header in stream. This function checks first if we can pull data with a specific length from a skb. If this seems to be okay, we copy skb data to a destination pointer and run skb_pull. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 13:23:11 -07:00
David Hauweele	31afe1f73e	6lowpan: Fix fragmentation with link-local compressed addresses When a new 6lowpan fragment is received, a skbuff is allocated for the reassembled packet. However when a 6lowpan packet compresses link-local addresses based on link-layer addresses, the processing function relies on the skb mac control block to find the related link-layer address. This patch copies the control block from the first fragment into the newly allocated skb to keep a trace of the link-layer addresses in case of a link-local compressed address. Edit: small changes on comment issue Signed-off-by: David Hauweele <david@hauweele.net> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 13:23:11 -07:00
Alexander Aring	84ce1ddfef	6lowpan: init ipv6hdr buffer to zero This patch simplifies the handling to set fields inside of struct ipv6hdr to zero. Instead of setting some memory regions with memset to zero we initialize the whole ipv6hdr to zero. This is a simplification for parsing the 6lowpan header for the upcoming patches. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 13:23:11 -07:00
Andrey Vagin	7ed5c5ae96	tcp: set timestamps for restored skb-s When the repair mode is turned off, the write queue seqs are updated so that the whole queue is considered to be 'already sent'. The "when" field must be set for such skbs. It's used in tcp_rearm_rto for example. If the "when" field isn't set, the retransmit timeout can be calculated incorrectly and a TCP connection can stall for two minutes (TCP_RTO_MAX). Acked-by: Pavel Emelyanov <xemul@parallels.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 13:07:15 -07:00
Joe Perches	8be04b9374	treewide: Add __GFP_NOWARN to k.alloc calls with v.alloc fallbacks Don't emit OOM warnings when k.alloc calls fail when there is a v.alloc immediately afterwards. Converted a kmalloc/vmalloc with memset to kzalloc/vzalloc. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2013-08-20 13:06:40 +02:00
Pravin B Shelar	58264848a5	openvswitch: Add vxlan tunneling support. Following patch adds vxlan vport type for openvswitch using vxlan api. So now there is vxlan dependency for openvswitch. CC: Jesse Gross <jesse@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 00:15:44 -07:00
Hannes Frederic Sowa	f46078cfcd	ipv6: drop packets with multiple fragmentation headers It is not allowed for an ipv6 packet to contain multiple fragmentation headers. So discard packets which were already reassembled by fragmentation logic and send back a parameter problem icmp. The updates for RFC 6980 will come in later, I have to do a bit more research here. Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 00:11:24 -07:00
Hannes Frederic Sowa	4b08a8f1bd	ipv6: remove max_addresses check from ipv6_create_tempaddr Because of the max_addresses check attackers were able to disable privacy extensions on an interface by creating enough autoconfigured addresses: <http://seclists.org/oss-sec/2012/q4/292> But the check is not actually needed: max_addresses protects the kernel from installing too many ipv6 addresses on an interface and stops addrconf_prefix_rcv from installing further addresses as soon as this limit is reached. We only generate temporary addresses in direct response to a new address showing up. As soon as we have filled up the maximum number of addresses of an interface, we stop installing more addresses and thus also stop generating more temp addresses. Even if the attacker tries to generate a lot of temporary addresses by announcing a prefix and removing it again (lifetime == 0) we won't install more temp addresses, because the temporary addresses do count towards the maximum number of addresses, thus we would stop installing new autoconfigured addresses when the limit is reached. This patch fixes CVE-2013-0343 (but other layer-2 attacks are still possible). Thanks to Ding Tianhong for bringing this topic up again. Cc: Ding Tianhong <dingtianhong@huawei.com> Cc: George Kargiotakis <kargig@void.gr> Cc: P J P <ppandit@redhat.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 00:11:24 -07:00
John W. Linville	22f0d2d1e7	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem	2013-08-19 14:24:45 -04:00
Rami Rosen	e3fec5a1c5	xfrm: remove irrelevant comment in xfrm_input(). This patch removes a comment in xfrm_input() which became irrelevant due to commit `2774c13`, "xfrm: Handle blackhole route creation via afinfo". That commit removed returning -EREMOTE in the xfrm_lookup() method when the packet should be discarded and also removed the corresponding -EREMOTE handlers. This was replaced by calling the make_blackhole() method. Therefore the comment about -EREMOTE is not relevant anymore. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-08-19 12:45:16 +02:00
Hannes Frederic Sowa	844d48746e	xfrm: choose protocol family by skb protocol We need to choose the protocol family by skb->protocol. Otherwise we call the wrong xfrm{4,6}_local_error handler in case an ipv6 socket is used in ipv4 mode, in which case we should call down to xfrm4_local_error (ip6 sockets are a superset of ip4 ones). We are called before the ip_output functions, so skb->protocol is not reset. Cc: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-08-19 09:39:04 +02:00
Hannes Frederic Sowa	5d0ff542d0	ipv6: xfrm: dereference inner ipv6 header if encapsulated In xfrm6_local_error use inner_header if the packet was encapsulated. Cc: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-08-19 09:38:25 +02:00
Hannes Frederic Sowa	3d483058c8	ipv6: wire up skb->encapsulation When pushing a new header before current one call skb_reset_inner_headers to record the position of the inner headers in the various ipv6 tunnel protocols. We later need this to correctly identify the addresses needed to send back an error in the xfrm layer. This change is safe, because skb->protocol is always checked before dereferencing data from the inner protocol. Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-08-19 09:37:46 +02:00
Linus Lüssing	50fa3b31f4	batman-adv: check return type of unicast packet preparations batadv_unicast(_4addr)_prepare_skb might reallocate the skb's data. And if it tries to do so then this can potentially fail. We shouldn't continue working on this skb in such a case. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Acked-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2013-08-17 20:02:32 +02:00
David S. Miller	2ff1cf12c9	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2013-08-16 15:37:26 -07:00
John W. Linville	d074666366	Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next	2013-08-16 14:24:51 -04:00
Linus Torvalds	ddea368c78	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Fix SKB leak in 8139cp, from Dave Jones. 2) Fix use of _PAGES interfaces with mlx5 firmware, from Moshe Lazar. 3) RCU conversion of macvtap introduced two races, fixes by Eric Dumazet 4) Synchronize statistic flows in bnx2x driver to prevent corruption, from Dmitry Kravkov 5) Undo optimization in IP tunneling, we were using the inner IP header in some cases to inherit the IP ID, but that isn't correct in some circumstances. From Pravin B Shelar 6) Use correct struct size when parsing netlink attributes in rtnl_bridge_getlink(). From Asbjoern Sloth Toennesen 7) Length verifications in tun_get_user() are bogus, from Weiping Pan and Dan Carpenter 8) Fix bad merge resolution during 3.11 networking development in openvswitch, albeit a harmless one which added some unreachable code. From Jesse Gross 9) Wrong size used in flexible array allocation in openvswitch, from Pravin B Shelar 10) Clear out firmware capability flags the be2net driver isn't ready to handle yet, from Sarveshwar Bandi 11) Revert DMA mapping error checking addition to cxgb3 driver, it's buggy. From Alexey Kardashevskiy 12) Fix regression in packet scheduler rate limiting when working with a link layer of ATM. From Jesper Dangaard Brouer 13) Fix several errors in TCP Cubic congestion control, in particular overflow errors in timestamp calculations. From Eric Dumazet and Van Jacobson 14) In ipv6 routing lookups, we need to backtrack if subtree traversal don't result in a match. From Hannes Frederic Sowa 15) ipgre_header() returns incorrect packet offset. Fix from Timo Teräs 16) Get "low latency" out of the new MIB counter names. 
From Eliezer Tamir 17) State check in ndo_dflt_fdb_del() is inverted, from Sridhar Samudrala 18) Handle TCP Fast Open properly in netfilter conntrack, from Yuchung Cheng 19) Wrong memcpy length in pcan_usb driver, from Stephane Grosjean 20) Fix deadlock in TIPC, from Wang Weidong and Ding Tianhong 21) call_rcu() call to destroy SCTP transport is done too early and might result in an oops. From Daniel Borkmann 22) Fix races in genetlink family dumps, from Johannes Berg 23) Flags passed into macvlan by the user need to be validated properly, from Michael S Tsirkin 24) Fix skge build on 32-bit, from Stephen Hemminger 25) Handle malformed TCP headers properly in xt_TCPMSS, from Pablo Neira Ayuso 26) Fix handling of stacked vlans in vlan_dev_real_dev(), from Nikolay Aleksandrov 27) Eliminate MTU calculation overflows in esp{4,6}, from Daniel Borkmann 28) neigh_parms need to be setup before calling the ->ndo_neigh_setup() method. From Veaceslav Falico 29) Kill out-of-bounds prefetch in fib_trie, from Eric Dumazet 30) Don't dereference MLD query message if the length isn't valid in the bridge multicast code, from Linus Lüssing 31) Fix VXLAN IGMP join regression due to an inverted check, from Cong Wang git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (70 commits) net/mlx5_core: Support MANAGE_PAGES and QUERY_PAGES firmware command changes tun: signedness bug in tun_get_user() qlcnic: Fix diagnostic interrupt test for 83xx adapters qlcnic: Fix beacon state return status handling qlcnic: Fix set driver version command net: tg3: fix NULL pointer dereference in tg3_io_error_detected and tg3_io_slot_reset net_sched: restore "linklayer atm" handling drivers/net/ethernet/via/via-velocity.c: update napi implementation Revert "cxgb3: Check and handle the dma mapping errors" be2net: Clear any capability flags that driver is not interested in. openvswitch: Reset tunnel key between input and output. openvswitch: Use correct type while allocating flex array. 
openvswitch: Fix bad merge resolution. tun: compare with 0 instead of total_len rtnetlink: rtnl_bridge_getlink: Call nlmsg_find_attr() with ifinfomsg header ethernet/arc/arc_emac - fix NAPI "work > weight" warning ip_tunnel: Do not use inner ip-header-id for tunnel ip-header-id. bnx2x: prevent crash in shutdown flow with CNIC bnx2x: fix PTE write access error bnx2x: fix memory leak in VF ...	2013-08-16 09:35:29 -07:00
Johannes Berg	27b3eb9c06	mac80211: add APIs to allow keeping connections after WoWLAN In order to be able to (securely) keep connections alive after the system was suspended for WoWLAN, we need some additional APIs. We already have API (ieee80211_gtk_rekey_notify) to tell wpa_supplicant about the new replay counter if GTK rekeying was done by the device while the host was asleep, but that's not sufficient. If GTK rekeying wasn't done, we need to tell the host about sequence counters for the GTK (and PTK regardless of rekeying) that was used while asleep, add ieee80211_set_key_rx_seq() for that. If GTK rekeying was done, then we need to be able to disable the old keys (with ieee80211_remove_key()) and allocate the new GTK key(s) in mac80211 (with ieee80211_gtk_rekey_add()). If protocol offload (e.g. ARP) is implemented, then also the TX sequence counter for the PTK must be updated, using the new ieee80211_set_key_tx_seq() function. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-16 12:58:43 +02:00
Simon Wunderlich	d51b70ff51	mac80211: move ibss presp generation in own function Channel Switch will later require to generate beacons without setting them immediately. Therefore split the presp generation in an own function. Splitting the original very long function might be a good idea anyway. Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-16 12:25:34 +02:00
Johan Almbladh	86c228a762	mac80211: perform power save processing before decryption This patch decouples the power save processing from the frame decryption by running the decrypt rx handler after sta_process. In the case where the decryption failed for some reason, the stack used to not process the PM and MOREDATA bits for that frame. The stack now always performs power save processing regardless of the decryption result. That means that encrypted data frames and NULLFUNC frames are now handled in the same way regarding power save processing, making the stack more robust. Signed-off-by: Johan Almbladh <ja@anyfi.net> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-16 12:19:16 +02:00
Fan Du	99565a6c47	xfrm: Make xfrm_state timer monotonic xfrm_state timer should be independent of system clock change, so switch to CLOCK_BOOTTIME base which is not only monotonic but also counting suspend time. Thus issue reported in commit: `9e0d57fd6d` ("xfrm: SAD entries do not expire correctly after suspend-resume") could ALSO be avoided. v2: Use CLOCK_BOOTTIME to count suspend time, but still monotonic. Signed-off-by: Fan Du <fan.du@windriver.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-08-16 06:53:28 +02:00
Pravin B Shelar	16b304f340	netlink: Eliminate kmalloc in netlink dump operation. Following patch stores struct netlink_callback in netlink_sock to avoid allocating and freeing it on every netlink dump msg. Only one dump operation is allowed for a given socket at a time therefore we can safely convert cb pointer to cb struct inside netlink_sock. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-15 15:51:20 -07:00
Francesco Fusco	d14c5ab6be	net: proc_fs: trivial: print UIDs as unsigned int UIDs are printed in the proc_fs as signed int, whereas they are unsigned int. Signed-off-by: Francesco Fusco <ffusco@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-15 14:37:46 -07:00
John W. Linville	48c3e37135	Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211	2013-08-15 15:36:55 -04:00
Li Wang	ad7a60de88	ceph: punch hole support This patch implements fallocate and punch hole support for Ceph kernel client. Signed-off-by: Li Wang <liwang@ubuntukylin.com> Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>	2013-08-15 11:12:17 -07:00
Sage Weil	ee3e542fec	Merge remote-tracking branch 'linus/master' into testing	2013-08-15 11:11:45 -07:00
Luis Henriques	dee08ab83d	net: rfkill: Do not ignore errors from regulator_enable() Function regulator_enable() may return an error that has to be checked. This patch changes function rfkill_regulator_set_block() so that it checks for the return code. Also, rfkill_data->reg_enabled is set to 'true' only if there is no error. This fixes the following compilation warning: net/rfkill/rfkill-regulator.c:43:20: warning: ignoring return value of 'regulator_enable', declared with attribute warn_unused_result [-Wunused-result] Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-15 18:17:05 +02:00
Jesper Dangaard Brouer	8a8e3d84b1	net_sched: restore "linklayer atm" handling commit `56b765b79` ("htb: improved accuracy at high rates") broke the "linklayer atm" handling. tc class add ... htb rate X ceil Y linklayer atm The linklayer setting is implemented by modifying the rate table which is sent to the kernel. No direct parameter was transferred to the kernel indicating the linklayer setting. The commit `56b765b79` ("htb: improved accuracy at high rates") removed the use of the rate table system. To keep compatible with older iproute2 utils, this patch detects the linklayer by parsing the rate table. It also supports future versions of iproute2 to send this linklayer parameter to the kernel directly. This is done by using the __reserved field in struct tc_ratespec, to convey the chosen linklayer option, but only using the lower 4 bits of this field. Linklayer detection is limited to speeds below 100Mbit/s, because at high rates the rtab gets too inaccurate, so bad that several fields contain the same values, thus resembling the ATM detect. Fields even start to contain "0" time to send, e.g. at 1000Mbit/s sending a 96-byte packet costs "0", thus the rtab has been more broken than we first realized. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-15 01:43:08 -07:00
David S. Miller	09a8f03197	Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch Jesse Gross says: ==================== Three bug fixes that are fairly small either way but resolve obviously incorrect code. For net/3.11. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-15 01:41:10 -07:00
Dan Carpenter	fce9b9be89	rtnetlink: remove an unneeded test We know that "dev" is a valid pointer at this point, so we can remove the test and clean up a little. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-15 01:29:02 -07:00
Nicolas Dichtel	0bd8762824	ip6tnl: add x-netns support This patch allows to switch the netns when packet is encapsulated or decapsulated. In other word, the encapsulated packet is received in a netns, where the lookup is done to find the tunnel. Once the tunnel is found, the packet is decapsulated and injecting into the corresponding interface which stands to another netns. When one of the two netns is removed, the tunnel is destroyed. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-15 01:00:20 -07:00
Nicolas Dichtel	6c742e714d	ipip: add x-netns support This patch allows to switch the netns when packet is encapsulated or decapsulated. In other word, the encapsulated packet is received in a netns, where the lookup is done to find the tunnel. Once the tunnel is found, the packet is decapsulated and injecting into the corresponding interface which stands to another netns. When one of the two netns is removed, the tunnel is destroyed. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-15 01:00:20 -07:00
Nicolas Dichtel	fc8f999daa	ipv4 tunnels: use net_eq() helper to check netns It's better to use available helpers for these tests. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-15 01:00:20 -07:00
Nicolas Dichtel	64261f230a	dev: move skb_scrub_packet() after eth_type_trans() skb_scrub_packet() was called before eth_type_trans() to let eth_type_trans() set pkt_type. In fact, we should force pkt_type to PACKET_HOST, so move the call after eth_type_trans(). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-15 01:00:20 -07:00
Jesse Gross	36bf5cc66d	openvswitch: Reset tunnel key between input and output. It doesn't make sense to output a tunnel packet using the same parameters that it was received with since that will generally just result in the packet going back to us. As a result, userspace assumes that the tunnel key is cleared when transitioning through the switch. In the majority of cases this doesn't matter since a packet is either going to a tunnel port (in which the key is overwritten with new values) or to a non-tunnel port (in which case the key is ignored). However, it's theoretically possible that userspace could rely on the documented behavior, so this corrects it. Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-08-14 15:50:36 -07:00
Pravin B Shelar	42415c90ce	openvswitch: Use correct type while allocating flex array. Flex array is used to allocate hash buckets which is type struct hlist_head, but we use `struct hlist_head *` to calculate array size. Since hlist_head is of size pointer it works fine. Following patch use correct type. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-08-14 15:48:17 -07:00
Jesse Gross	30444e981b	openvswitch: Fix bad merge resolution. git silently included an extra hunk in vport_cmd_set() during automatic merging. This code is unreachable so it does not actually introduce a problem but it is clearly incorrect. Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-08-14 15:48:02 -07:00
Johannes Berg	dee8a9732e	cfg80211: don't request disconnect if not connected Neil Brown reports that with libertas, my recent cfg80211 SME changes in commit `ceca7b7121` ("cfg80211: separate internal SME implementation") broke libertas suspend because we now asked it to disconnect while already disconnected. The problematic change is in cfg80211_disconnect() as it previously checked the SME state and now calls the driver disconnect operation unconditionally. Fix this by checking if there's a current_bss indicating a connection, and do nothing if not. Reported-and-tested-by: Neil Brown <neilb@suse.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-14 14:00:19 +02:00

29846 Commits