Commit Graph

441190 Commits

Author SHA1 Message Date
Linus Torvalds
dafe344d22 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull bmc2835 crypto fix from Herbert Xu:
 "This fixes a potential boot crash on bcm2835 due to the recent change
  that now causes hardware RNGs to be accessed on registration"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  hwrng: bcm2835 - fix oops when rng h/w is accessed during registration
2014-04-14 16:04:14 -07:00
Mikulas Patocka
e79323bd87 user namespace: fix incorrect memory barriers
smp_read_barrier_depends() can be used if there is data dependency between
the readers - i.e. if the read operation after the barrier uses address
that was obtained from the read operation before the barrier.

In this file, there is only control dependency, no data dependecy, so the
use of smp_read_barrier_depends() is incorrect. The code could fail in the
following way:
* the cpu predicts that idx < entries is true and starts executing the
  body of the for loop
* the cpu fetches map->extent[0].first and map->extent[0].count
* the cpu fetches map->nr_extents
* the cpu verifies that idx < extents is true, so it commits the
  instructions in the body of the for loop

The problem is that in this scenario, the cpu read map->extent[0].first
and map->nr_extents in the wrong order. We need a full read memory barrier
to prevent it.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-14 16:03:02 -07:00
David S. Miller
00cbc3dcd1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains three Netfilter fixes for your net tree,
they are:

* Fix missing generation sequence initialization which results in a splat
  if lockdep is enabled, it was introduced in the recent works to improve
  nf_conntrack scalability, from Andrey Vagin.

* Don't flush the GRE keymap list in nf_conntrack when the pptp helper is
  disabled otherwise this crashes due to a double release, from Andrey
  Vagin.

* Fix nf_tables cmp fast in big endian, from Patrick McHardy.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14 19:00:10 -04:00
Vlad Yasevich
1e785f48d2 net: Start with correct mac_len in skb_network_protocol
Sometimes, when the packet arrives at skb_mac_gso_segment()
its skb->mac_len already accounts for some of the mac lenght
headers in the packet.  This seems to happen when forwarding
through and OpenSSL tunnel.

When we start looking for any vlan headers in skb_network_protocol()
we seem to ignore any of the already known mac headers and start
with an ETH_HLEN.  This results in an incorrect offset, dropped
TSO frames and general slowness of the connection.

We can start counting from the known skb->mac_len
and return at least that much if all mac level headers
are known and accounted for.

Fixes: 53d6471cef (net: Account for all vlan headers in skb_mac_gso_segment)
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Daniel Borkman <dborkman@redhat.com>
Tested-by: Martin Filip <nexus+kernel@smoula.net>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14 18:58:58 -04:00
Marcelo Tosatti
b351c39cc9 KVM: x86: remove WARN_ON from get_kernel_ns()
Function and callers can be preempted.

https://bugzilla.kernel.org/show_bug.cgi?id=73721

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2014-04-14 17:50:43 -03:00
Feng Wu
66386ade2a KVM: Rename variable smep to cr4_smep
Rename variable smep to cr4_smep, which can better reflect the
meaning of the variable.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-14 17:50:40 -03:00
Feng Wu
de935ae15b KVM: expose SMAP feature to guest
This patch exposes SMAP feature to guest

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-14 17:50:37 -03:00
Feng Wu
e1e746b3c5 KVM: Disable SMAP for guests in EPT realmode and EPT unpaging mode
SMAP is disabled if CPU is in non-paging mode in hardware.
However KVM always uses paging mode to emulate guest non-paging
mode with TDP. To emulate this behavior, SMAP needs to be
manually disabled when guest switches to non-paging mode.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-14 17:50:35 -03:00
Feng Wu
97ec8c067d KVM: Add SMAP support when setting CR4
This patch adds SMAP handling logic when setting CR4 for guests

Thanks a lot to Paolo Bonzini for his suggestion to use the branchless
way to detect SMAP violation.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-14 17:50:34 -03:00
Feng Wu
56d6efc2de KVM: Remove SMAP bit from CR4_RESERVED_BITS
This patch removes SMAP bit from CR4_RESERVED_BITS.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-14 17:50:33 -03:00
Daniel Borkmann
362d52040c Revert "net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer"
This reverts commit ef2820a735 ("net: sctp: Fix a_rwnd/rwnd management
to reflect real state of the receiver's buffer") as it introduced a
serious performance regression on SCTP over IPv4 and IPv6, though a not
as dramatic on the latter. Measurements are on 10Gbit/s with ixgbe NICs.

Current state:

[root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
iperf version 3.0.1 (10 January 2014)
Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
Time: Fri, 11 Apr 2014 17:56:21 GMT
Connecting to host 192.168.241.3, port 5201
      Cookie: Lab200slot2.1397238981.812898.548918
[  4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201
Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.09   sec  20.8 MBytes   161 Mbits/sec
[  4]   1.09-2.13   sec  10.8 MBytes  86.8 Mbits/sec
[  4]   2.13-3.15   sec  3.57 MBytes  29.5 Mbits/sec
[  4]   3.15-4.16   sec  4.33 MBytes  35.7 Mbits/sec
[  4]   4.16-6.21   sec  10.4 MBytes  42.7 Mbits/sec
[  4]   6.21-6.21   sec  0.00 Bytes    0.00 bits/sec
[  4]   6.21-7.35   sec  34.6 MBytes   253 Mbits/sec
[  4]   7.35-11.45  sec  22.0 MBytes  45.0 Mbits/sec
[  4]  11.45-11.45  sec  0.00 Bytes    0.00 bits/sec
[  4]  11.45-11.45  sec  0.00 Bytes    0.00 bits/sec
[  4]  11.45-11.45  sec  0.00 Bytes    0.00 bits/sec
[  4]  11.45-12.51  sec  16.0 MBytes   126 Mbits/sec
[  4]  12.51-13.59  sec  20.3 MBytes   158 Mbits/sec
[  4]  13.59-14.65  sec  13.4 MBytes   107 Mbits/sec
[  4]  14.65-16.79  sec  33.3 MBytes   130 Mbits/sec
[  4]  16.79-16.79  sec  0.00 Bytes    0.00 bits/sec
[  4]  16.79-17.82  sec  5.94 MBytes  48.7 Mbits/sec
(etc)

[root@Lab200slot2 ~]#  iperf3 --sctp -6 -c 2001:db8:0:f101::1 -V -l 1400 -t 60
iperf version 3.0.1 (10 January 2014)
Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
Time: Fri, 11 Apr 2014 19:08:41 GMT
Connecting to host 2001:db8:0:f101::1, port 5201
      Cookie: Lab200slot2.1397243321.714295.2b3f7c
[  4] local 2001:db8:0:f101::2 port 55804 connected to 2001:db8:0:f101::1 port 5201
Starting Test: protocol: SCTP, 1 streams, 1400 byte blocks, omitting 0 seconds, 60 second test
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   169 MBytes  1.42 Gbits/sec
[  4]   1.00-2.00   sec   201 MBytes  1.69 Gbits/sec
[  4]   2.00-3.00   sec   188 MBytes  1.58 Gbits/sec
[  4]   3.00-4.00   sec   174 MBytes  1.46 Gbits/sec
[  4]   4.00-5.00   sec   165 MBytes  1.39 Gbits/sec
[  4]   5.00-6.00   sec   199 MBytes  1.67 Gbits/sec
[  4]   6.00-7.00   sec   163 MBytes  1.36 Gbits/sec
[  4]   7.00-8.00   sec   174 MBytes  1.46 Gbits/sec
[  4]   8.00-9.00   sec   193 MBytes  1.62 Gbits/sec
[  4]   9.00-10.00  sec   196 MBytes  1.65 Gbits/sec
[  4]  10.00-11.00  sec   157 MBytes  1.31 Gbits/sec
[  4]  11.00-12.00  sec   175 MBytes  1.47 Gbits/sec
[  4]  12.00-13.00  sec   192 MBytes  1.61 Gbits/sec
[  4]  13.00-14.00  sec   199 MBytes  1.67 Gbits/sec
(etc)

After patch:

[root@Lab200slot2 ~]#  iperf3 --sctp -4 -c 192.168.240.3 -V -l 1452 -t 60
iperf version 3.0.1 (10 January 2014)
Linux Lab200slot2 3.14.0+ #1 SMP Mon Apr 14 12:06:40 EDT 2014 x86_64
Time: Mon, 14 Apr 2014 16:40:48 GMT
Connecting to host 192.168.240.3, port 5201
      Cookie: Lab200slot2.1397493648.413274.65e131
[  4] local 192.168.240.2 port 50548 connected to 192.168.240.3 port 5201
Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   240 MBytes  2.02 Gbits/sec
[  4]   1.00-2.00   sec   239 MBytes  2.01 Gbits/sec
[  4]   2.00-3.00   sec   240 MBytes  2.01 Gbits/sec
[  4]   3.00-4.00   sec   239 MBytes  2.00 Gbits/sec
[  4]   4.00-5.00   sec   245 MBytes  2.05 Gbits/sec
[  4]   5.00-6.00   sec   240 MBytes  2.01 Gbits/sec
[  4]   6.00-7.00   sec   240 MBytes  2.02 Gbits/sec
[  4]   7.00-8.00   sec   239 MBytes  2.01 Gbits/sec

With the reverted patch applied, the SCTP/IPv4 performance is back
to normal on latest upstream for IPv4 and IPv6 and has same throughput
as 3.4.2 test kernel, steady and interval reports are smooth again.

Fixes: ef2820a735 ("net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer")
Reported-by: Peter Butler <pbutler@sonusnet.com>
Reported-by: Dongsheng Song <dongsheng.song@gmail.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Tested-by: Peter Butler <pbutler@sonusnet.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com>
Cc: Alexander Sverdlin <alexander.sverdlin@nsn.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14 16:26:48 -04:00
Steve Wise
bfae232499 cxgb4: Save the correct mac addr for hw-loopback connections in the L2T
Hardware needs the local device mac address to support hw loopback for
rdma loopback connections.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14 16:26:48 -04:00
Daniel Borkmann
8c482cdc35 net: filter: seccomp: fix wrong decoding of BPF_S_ANC_SECCOMP_LD_W
While reviewing seccomp code, we found that BPF_S_ANC_SECCOMP_LD_W has
been wrongly decoded by commit a8fc927780 ("sk-filter: Add ability to
get socket filter program (v2)") into the opcode BPF_LD|BPF_B|BPF_ABS
although it should have been decoded as BPF_LD|BPF_W|BPF_ABS.

In practice, this should not have much side-effect though, as such
conversion is/was being done through prctl(2) PR_SET_SECCOMP. Reverse
operation PR_GET_SECCOMP will only return the current seccomp mode, but
not the filter itself. Since the transition to the new BPF infrastructure,
it's also not used anymore, so we can simply remove this as it's
unreachable.

Fixes: a8fc927780 ("sk-filter: Add ability to get socket filter program (v2)")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14 16:26:47 -04:00
Daniel Borkmann
2eac764832 seccomp: fix populating a0-a5 syscall args in 32-bit x86 BPF
Linus reports that on 32-bit x86 Chromium throws the following seccomp
resp. audit log messages:

  audit: type=1326 audit(1397359304.356:28108): auid=500 uid=500
gid=500 ses=2 subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023
pid=3677 comm="chrome" exe="/opt/google/chrome/chrome" sig=0
syscall=172 compat=0 ip=0xb2dd9852 code=0x30000

  audit: type=1326 audit(1397359304.356:28109): auid=500 uid=500
gid=500 ses=2 subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023
pid=3677 comm="chrome" exe="/opt/google/chrome/chrome" sig=0 syscall=5
compat=0 ip=0xb2dd9852 code=0x50000

These audit messages are being triggered via audit_seccomp() through
__secure_computing() in seccomp mode (BPF) filter with seccomp return
codes 0x30000 (== SECCOMP_RET_TRAP) and 0x50000 (== SECCOMP_RET_ERRNO)
during filter runtime. Moreover, Linus reports that x86_64 Chromium
seems fine.

The underlying issue that explains this is that the implementation of
populate_seccomp_data() is wrong. Our seccomp data structure sd that
is being shared with user ABI is:

  struct seccomp_data {
    int nr;
    __u32 arch;
    __u64 instruction_pointer;
    __u64 args[6];
  };

Therefore, a simple cast to 'unsigned long *' for storing the value of
the syscall argument via syscall_get_arguments() is just wrong as on
32-bit x86 (or any other 32bit arch), it would result in storing a0-a5
at wrong offsets in args[] member, and thus i) could leak stack memory
to user space and ii) tampers with the logic of seccomp BPF programs
that read out and check for syscall arguments:

  syscall_get_arguments(task, regs, 0, 1, (unsigned long *) &sd->args[0]);

Tested on 32-bit x86 with Google Chrome, unfortunately only via remote
test machine through slow ssh X forwarding, but it fixes the issue on
my side. So fix it up by storing args in type correct variables, gcc
is clever and optimizes the copy away in other cases, e.g. x86_64.

Fixes: bd4cf0ed33 ("net: filter: rework/optimize internal BPF interpreter's instruction set")
Reported-and-bisected-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14 16:26:47 -04:00
Eliad Peller
c0da71ff4d wl18xx: align event mailbox with current fw
Some fields are missing from the event mailbox
struct definitions, which cause issues when
trying to handle some events.

Add the missing fields in order to align the
struct size (without adding actual support
for the new fields).

Reported-and-tested-by: Imre Kaloz <kaloz@openwrt.org>
Cc: stable@vger.kernel.org # 3.14+
Fixes: 028e724 ("wl18xx: move to new firmware (wl18xx-fw-3.bin)")
Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-04-14 14:31:42 -04:00
Christian Engelmayer
61698b7e22 rsi: Fix a potential memory leak in rsi_send_auto_rate_request()
Fix a potential memory leak in the error path of function
rsi_send_auto_rate_request(). In case memory allocation for array
'selected_rates' fails, the error path exits and leaves the previously
allocated skb in place. Detected by Coverity: CID 1195575.

Signed-off-by: Christian Engelmayer <cengelma@gmx.at>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-04-14 14:31:42 -04:00
Frederic Danis
2004dabaac cw1200: Fix cw1200_debug_link_id
This array is used in debug string to display cw1200_link_status
defined in drivers/net/wireless/cw1200/cw1200.h.

Add missing strings for CW1200_LINK_RESET and CW1200_LINK_RESET_REMAP.

Signed-off-by: Frederic Danis <frederic.danis@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-04-14 14:31:42 -04:00
Luciano Coelho
69aa167583 wlcore: ignore dummy packet events in PLT mode
Sometimes the firmware sends a dummy packet event while we are in PLT
mode.  This doesn't make sense, it's a firmware bug.  Fix this by
ignoring dummy packet events when we're PLT mode.

Reported-by: Yegor Yefremov <yegorslists@googlemail.com>
Reported-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Luciano Coelho <luca@coelho.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-04-14 14:31:41 -04:00
Christian Engelmayer
98ddcbe033 rsi: Fix a potential memory leak in rsi_set_channel()
Fix a potential memory leak in function rsi_set_channel() that is used to
program channel changes. The channel check block for the frequency bands
directly exits the function in case of an error, thus leaving an already
allocated skb unreferenced. Move the checks above allocating the skb.
Detected by Coverity: CID 1195576.

Signed-off-by: Christian Engelmayer <cengelma@gmx.at>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-04-14 14:31:41 -04:00
Geert Uytterhoeven
af64dc7474 rsi: Add missing initialization of ii
drivers/net/wireless/rsi/rsi_91x_core.c: In function ‘rsi_core_determine_hal_queue’:
drivers/net/wireless/rsi/rsi_91x_core.c:91: warning: ‘ii’ may be used uninitialized in this function

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-04-14 14:31:41 -04:00
John W. Linville
0215f4cf72 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes 2014-04-14 14:21:07 -04:00
John W. Linville
08e22e193a Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2014-04-14 13:47:01 -04:00
David S. Miller
14ed4a5bcb Merge branch 'qlcnic'
Shahed Shaikh says:

====================
qlcnic: Bug fixes

This patch series contains following bug fixes -

* Send INIT_NIC_FUNC mailbox command as first mailbox
* Fix a panic because of uninitialized delayed_work.
* Fix inconsistent calculation of max rings count.
* Fix PVID configuration issue. Driver needs to clear older
  PVID before adding new one.
* Fix QLogic application/driver interface by packing vNIC information
  array.
* Fix a crash when user tries to disable SR-IOV while VFs are
  still assigned to VMs.

Please apply to net.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14 13:43:58 -04:00
Manish Chopra
696f1943a1 qlcnic: Do not disable SR-IOV when VFs are assigned to VMs
o While disabling SR-IOV when VFs are assigned to VMs causes host crash
  so return -EPERM when user request to disable SR-IOV using pci sysfs in
  case of VFs are assigned to VMs.

Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14 13:43:53 -04:00
Jitendra Kalsaria
4f03022777 qlcnic: Fix QLogic application/driver interface for virtual NIC configuration
o Application expect vNIC number as the array index but driver interface
return configuration in array index form.

o Pack the vNIC information array in the buffer such that application can
access it using vNIC number as the array index.

Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14 13:43:52 -04:00
Jitendra Kalsaria
a78b6da89f qlcnic: Fix PVID configuration on eSwitch port.
Clear older PVID before adding a newer PVID to the eSwicth port

Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14 13:43:52 -04:00
Shahed Shaikh
7b546842b1 qlcnic: Fix max ring count calculation
Do not read max rings count from qlcnic_get_nic_info(). Use driver defined
values for 82xx adapters. In case of 83xx adapters, use minimum of firmware
provided and driver defined values.

Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14 13:43:52 -04:00
Sucheta Chakraborty
4d52e1e8d1 qlcnic: Fix to send INIT_NIC_FUNC as first mailbox.
o INIT_NIC_FUNC should be first mailbox sent. Sending DCB capability and
  parameter query commands after that command.

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14 13:43:52 -04:00
Sucheta Chakraborty
463518a0cb qlcnic: Fix panic due to uninitialzed delayed_work struct in use.
o AEN event was being received before initializing delayed_work struct
  and handlers for it. This was resulting in crash. This patch fixes it.

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14 13:43:52 -04:00
David S. Miller
677df2f4ae Merge branch 'be2net'
Sathya Perla says:

====================
be2net: patch set

Patch 1/2 is a v2 of a patch that was submitted earlier (as a part of a
different patch-set). v2 incorporates a suggestion given by David Laight
for how long to poll for pending TX completions while disabling a device.

Patch 2/2 fixes a crash in be_remove()->be_close()
path after be2net has aborted an EEH error recovery
due to a permanant failure.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14 13:41:42 -04:00
Kalesh AP
e1ad8e33d2 be2net: Fix invocation of be_close() after be_clear()
In the EEH error recovery path, when a permanent failure occurs,
we clean up adapter structure (i.e. destroy queues etc) by calling
be_clear() and return PCI_ERS_RESULT_DISCONNECT.
After this the stack tries to remove device from bus and calls
be_remove() which invokes netdev_unregister()->be_close().
be_close() operating on destroyed queues results in a
NULL dereference.

This patch fixes this problem by introducing a flag to keep track
of the setup state.

Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14 13:41:37 -04:00
Vasundhara Volam
1a3d0717f6 be2net: Fix to reap TX compls till HW doesn't respond for some time
be_close() currently waits for a max of 200ms to receive all pending
TX compls. This timeout value was roughly calculated based on 10G
transmission speeds and the TX queue depth. This timeout may not be
enough when the link is operating at lower speeds or in multi-channel/SR-IOV
configs with TX-rate limiting setting.

It is hard to calculate a "proper timeout value" that works in all
configurations.  This patch solves this problem by continuing to reap
TX completions till the HW is completely silent for a period of 10ms or
a HW error is detected.

v2: implements the new scheme (as suggested by David Laight) instead of
just waiting longer than 200ms for reaping all completions.

Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14 13:41:37 -04:00
Amir Vadai
e1a5ddc506 net/mlx4_core: Defer VF initialization till PF is fully initialized
Fix in commit [1] is not sufficient since a deferred VF initialization
could happen after pci_enable_sriov() is finished, but before the PF is
fully initialized.
Need to prevent VFs from initializing till the PF is fully ready and
comm channel is operational.

[1] - 9798935 "net/mlx4_core: mlx4_init_slave() shouldn't access comm
      channel before PF is ready"

CC: Stuart Hayes <Stuart_Hayes@Dell.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14 13:24:42 -04:00
Daniel J Blueman
77d149c4eb bnx2: Don't build unused suspend/resume functions not enabled
When CONFIG_PM_SLEEP isn't enabled, bnx2_suspend/resume are unused; don't
build them when they aren't used.

Signed-off-by: Daniel J Blueman <daniel@quora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14 12:40:00 -04:00
Eric Dumazet
30f78d8ebf ipv6: Limit mtu to 65575 bytes
Francois reported that setting big mtu on loopback device could prevent
tcp sessions making progress.

We do not support (yet ?) IPv6 Jumbograms and cook corrupted packets.

We must limit the IPv6 MTU to (65535 + 40) bytes in theory.

Tested:

ifconfig lo mtu 70000
netperf -H ::1

Before patch : Throughput :   0.05 Mbits

After patch : Throughput : 35484 Mbits

Reported-by: Francois WELLENREITER <f.wellenreiter@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14 12:39:59 -04:00
Patrick McHardy
b855d416dc netfilter: nf_tables: fix nft_cmp_fast failure on big endian for size < 4
nft_cmp_fast is used for equality comparisions of size <= 4. For
comparisions of size < 4 byte a mask is calculated that is applied to
both the data from userspace (during initialization) and the register
value (during runtime). Both values are stored using (in effect) memcpy
to a memory area that is then interpreted as u32 by nft_cmp_fast.

This works fine on little endian since smaller types have the same base
address, however on big endian this is not true and the smaller types
are interpreted as a big number with trailing zero bytes.

The mask therefore must not include the lower bytes, but the higher bytes
on big endian. Add a helper function that does a cpu_to_le32 to switch
the bytes on big endian. Since we're dealing with a mask of just consequitive
bits, this works out fine.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-04-14 10:38:02 +02:00
Andrey Vagin
ee214d54bf netfilter: nf_conntrack: initialize net.ct.generation
[  251.920788] INFO: trying to register non-static key.
[  251.921386] the code is fine but needs lockdep annotation.
[  251.921386] turning off the locking correctness validator.
[  251.921386] CPU: 2 PID: 15715 Comm: socket_listen Not tainted 3.14.0+ #294
[  251.921386] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  251.921386]  0000000000000000 000000009d18c210 ffff880075f039b8 ffffffff816b7ecd
[  251.921386]  ffffffff822c3b10 ffff880075f039c8 ffffffff816b36f4 ffff880075f03aa0
[  251.921386]  ffffffff810c65ff ffffffff810c4a85 00000000fffffe01 ffffffffa0075172
[  251.921386] Call Trace:
[  251.921386]  [<ffffffff816b7ecd>] dump_stack+0x45/0x56
[  251.921386]  [<ffffffff816b36f4>] register_lock_class.part.24+0x38/0x3c
[  251.921386]  [<ffffffff810c65ff>] __lock_acquire+0x168f/0x1b40
[  251.921386]  [<ffffffff810c4a85>] ? trace_hardirqs_on_caller+0x105/0x1d0
[  251.921386]  [<ffffffffa0075172>] ? nf_nat_setup_info+0x252/0x3a0 [nf_nat]
[  251.921386]  [<ffffffff816c1215>] ? _raw_spin_unlock_bh+0x35/0x40
[  251.921386]  [<ffffffffa0075172>] ? nf_nat_setup_info+0x252/0x3a0 [nf_nat]
[  251.921386]  [<ffffffff810c7272>] lock_acquire+0xa2/0x120
[  251.921386]  [<ffffffffa008ab90>] ? ipv4_confirm+0x90/0xf0 [nf_conntrack_ipv4]
[  251.921386]  [<ffffffffa0055989>] __nf_conntrack_confirm+0x129/0x410 [nf_conntrack]
[  251.921386]  [<ffffffffa008ab90>] ? ipv4_confirm+0x90/0xf0 [nf_conntrack_ipv4]
[  251.921386]  [<ffffffffa008ab90>] ipv4_confirm+0x90/0xf0 [nf_conntrack_ipv4]
[  251.921386]  [<ffffffff815e7b00>] ? ip_fragment+0x9f0/0x9f0
[  251.921386]  [<ffffffff815d8c5a>] nf_iterate+0xaa/0xc0
[  251.921386]  [<ffffffff815e7b00>] ? ip_fragment+0x9f0/0x9f0
[  251.921386]  [<ffffffff815d8d14>] nf_hook_slow+0xa4/0x190
[  251.921386]  [<ffffffff815e7b00>] ? ip_fragment+0x9f0/0x9f0
[  251.921386]  [<ffffffff815e98f2>] ip_output+0x92/0x100
[  251.921386]  [<ffffffff815e8df9>] ip_local_out+0x29/0x90
[  251.921386]  [<ffffffff815e9240>] ip_queue_xmit+0x170/0x4c0
[  251.921386]  [<ffffffff815e90d5>] ? ip_queue_xmit+0x5/0x4c0
[  251.921386]  [<ffffffff81601208>] tcp_transmit_skb+0x498/0x960
[  251.921386]  [<ffffffff81602d82>] tcp_connect+0x812/0x960
[  251.921386]  [<ffffffff810e3dc5>] ? ktime_get_real+0x25/0x70
[  251.921386]  [<ffffffff8159ea2a>] ? secure_tcp_sequence_number+0x6a/0xc0
[  251.921386]  [<ffffffff81606f57>] tcp_v4_connect+0x317/0x470
[  251.921386]  [<ffffffff8161f645>] __inet_stream_connect+0xb5/0x330
[  251.921386]  [<ffffffff8158dfc3>] ? lock_sock_nested+0x33/0xa0
[  251.921386]  [<ffffffff810c4b5d>] ? trace_hardirqs_on+0xd/0x10
[  251.921386]  [<ffffffff81078885>] ? __local_bh_enable_ip+0x75/0xe0
[  251.921386]  [<ffffffff8161f8f8>] inet_stream_connect+0x38/0x50
[  251.921386]  [<ffffffff8158b157>] SYSC_connect+0xe7/0x120
[  251.921386]  [<ffffffff810e3789>] ? current_kernel_time+0x69/0xd0
[  251.921386]  [<ffffffff810c4a85>] ? trace_hardirqs_on_caller+0x105/0x1d0
[  251.921386]  [<ffffffff810c4b5d>] ? trace_hardirqs_on+0xd/0x10
[  251.921386]  [<ffffffff8158c36e>] SyS_connect+0xe/0x10
[  251.921386]  [<ffffffff816caf69>] system_call_fastpath+0x16/0x1b
[  312.014104] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 0, t=60003 jiffies, g=42359, c=42358, q=333)
[  312.015097] INFO: Stall ended before state dump start

Fixes: 93bb0ceb75 ("netfilter: conntrack: remove central spinlock nf_conntrack_lock")
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-04-14 10:35:28 +02:00
Mathias Krause
05ab8f2647 filter: prevent nla extensions to peek beyond the end of the message
The BPF_S_ANC_NLATTR and BPF_S_ANC_NLATTR_NEST extensions fail to check
for a minimal message length before testing the supplied offset to be
within the bounds of the message. This allows the subtraction of the nla
header to underflow and therefore -- as the data type is unsigned --
allowing far to big offset and length values for the search of the
netlink attribute.

The remainder calculation for the BPF_S_ANC_NLATTR_NEST extension is
also wrong. It has the minuend and subtrahend mixed up, therefore
calculates a huge length value, allowing to overrun the end of the
message while looking for the netlink attribute.

The following three BPF snippets will trigger the bugs when attached to
a UNIX datagram socket and parsing a message with length 1, 2 or 3.

 ,-[ PoC for missing size check in BPF_S_ANC_NLATTR ]--
 | ld	#0x87654321
 | ldx	#42
 | ld	#nla
 | ret	a
 `---

 ,-[ PoC for the same bug in BPF_S_ANC_NLATTR_NEST ]--
 | ld	#0x87654321
 | ldx	#42
 | ld	#nlan
 | ret	a
 `---

 ,-[ PoC for wrong remainder calculation in BPF_S_ANC_NLATTR_NEST ]--
 | ; (needs a fake netlink header at offset 0)
 | ld	#0
 | ldx	#42
 | ld	#nlan
 | ret	a
 `---

Fix the first issue by ensuring the message length fulfills the minimal
size constrains of a nla header. Fix the second bug by getting the math
for the remainder calculation right.

Fixes: 4738c1db15 ("[SKFILTER]: Add SKF_ADF_NLATTR instruction")
Fixes: d214c7537b ("filter: add SKF_AD_NLATTR_NEST to look for nested..")
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-13 23:31:55 -04:00
Julian Anastasov
91146153da ipv4: return valid RTA_IIF on ip route get
Extend commit 13378cad02
("ipv4: Change rt->rt_iif encoding.") from 3.6 to return valid
RTA_IIF on 'ip route get ... iif DEVICE' instead of rt_iif 0
which is displayed as 'iif *'.

inet_iif is not appropriate to use because skb_iif is not set.
Use the skb->dev->ifindex instead.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-13 23:23:41 -04:00
Thomas Petazzoni
cc6ca3023f Revert "net: mvneta: fix usage as a module on RGMII configurations"
This reverts commit e3a8786c10. While
this commit allows to use the mvneta driver as a module on some
configurations, it breaks other configurations even if mvneta is used
built-in.

This breakage is due to the fact that on some RGMII platforms, the PCS
bit has to be set, and on some other platforms, it has to be
cleared. At the moment, we lack informations to know exactly the
significance of this bit (the datasheet only says "enables PCS"), and
so we can't produce a patch that will work on all platforms at this
point. And since this change is breaking the network completely for
many users, it's much better to revert it for now. We'll come back
later with a proper fix that takes into account all platforms.

Basically:

 * Armada XP GP is configured as RGMII-ID, and needs the PCS bit to be
   set.
 * Armada 370 Mirabox is configured as RGMII-ID, and needs the PCS bit
   to be cleared.

And at the moment, we don't know how to make the distinction between
those two cases. One hint is that the Armada XP GP appears in fact to
be using a QSGMII connection with the PHY (Quad-SGMII), but
configuring it as SGMII doesn't work, while RGMII-ID works. This needs
more investigation, but in the mean time, let's unbreak the network
for all those users.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Reported-by: Arnaud Ebalard <arno@natisbad.org>
Reported-by: Alexander Reuter <Alexander.Reuter@gmx.net>
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=73401
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-13 23:13:13 -04:00
Wei Yang
befdf8978a net/mlx4_core: Preserve pci_dev_data after __mlx4_remove_one()
pci_match_id() just match the static pci_device_id, which may return NULL if
someone binds the driver to a device manually using
/sys/bus/pci/drivers/.../new_id.

This patch wrap up a helper function __mlx4_remove_one() which does the tear
down function but preserve the drv_data. Functions like
mlx4_pci_err_detected() and mlx4_restart_one() will call this one with out
releasing drvdata.

Fixes: 97a5221 "net/mlx4_core: pass pci_device_id.driver_data to __mlx4_init_one during reset".

CC: Bjorn Helgaas <bhelgaas@google.com>
CC: Amir Vadai <amirv@mellanox.com>
CC: Jack Morgenstein <jackm@dev.mellanox.co.il>
CC: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-13 23:12:15 -04:00
Wang, Xiaoming
b04c461902 net: ipv4: current group_info should be put after using.
Plug a group_info refcount leak in ping_init.
group_info is only needed during initialization and
the code failed to release the reference on exit.
While here move grabbing the reference to a place
where it is actually needed.

Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com>
Signed-off-by: Zhang Dongxing <dongxing.zhang@intel.com>
Signed-off-by: xiaoming wang <xiaoming.wang@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-13 22:52:58 -04:00
Linus Torvalds
c9eaa447e7 Linux 3.15-rc1 2014-04-13 14:18:35 -07:00
Geert Uytterhoeven
f7c1d07420 mm: Initialize error in shmem_file_aio_read()
Some versions of gcc even warn about it:

  mm/shmem.c: In function ‘shmem_file_aio_read’:
  mm/shmem.c:1414: warning: ‘error’ may be used uninitialized in this function

If the loop is aborted during the first iteration by one of the two
first break statements, error will be uninitialized.

Introduced by commit 6e58e79db8 ("introduce copy_page_to_iter, kill
loop over iovec in generic_file_aio_read()").

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-13 14:10:26 -07:00
Geert Uytterhoeven
e686bd8dc5 cifs: Use min_t() when comparing "size_t" and "unsigned long"
On 32 bit, size_t is "unsigned int", not "unsigned long", causing the
following warning when comparing with PAGE_SIZE, which is always "unsigned
long":

  fs/cifs/file.c: In function ‘cifs_readdata_to_iov’:
  fs/cifs/file.c:2757: warning: comparison of distinct pointer types lacks a cast

Introduced by commit 7f25bba819 ("cifs_iovec_read: keep iov_iter
between the calls of cifs_readdata_to_iov()"), which changed the
signedness of "remaining" and the code from min_t() to min().

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-13 14:10:26 -07:00
Linus Torvalds
bf3a340738 Merge branch 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux
Pull slab changes from Pekka Enberg:
 "The biggest change is byte-sized freelist indices which reduces slab
  freelist memory usage:

    https://lkml.org/lkml/2013/12/2/64"

* 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
  mm: slab/slub: use page->list consistently instead of page->lru
  mm/slab.c: cleanup outdated comments and unify variables naming
  slab: fix wrongly used macro
  slub: fix high order page allocation problem with __GFP_NOFAIL
  slab: Make allocations with GFP_ZERO slightly more efficient
  slab: make more slab management structure off the slab
  slab: introduce byte sized index for the freelist of a slab
  slab: restrict the number of objects in a slab
  slab: introduce helper functions to get/set free object
  slab: factor out calculate nr objects in cache_estimate
2014-04-13 13:28:13 -07:00
Emmanuel Grumbach
e03bbb62cf iwlwifi: don't disable SCD chain extension on newer devices
7000 device series have a fix for this hardware feature.
Stop disabling it, and get an improvement in Tx throughput.
This feature allows the scheduler to fetch more frames on
the fly while an A-MPDU is being built - which means that
we can get larger A-MPDU. This, of course, give an
improvement in the Tx throughput.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2014-04-13 22:23:20 +03:00
Avri Altman
c6e37a686e iwlwifi: mvm: Re-factor enabling uAPSD logic
The driver can enable uAPSD and specify some of its related parameters.
This patch organizes this logic in a separate function.

Signed-off-by: Avri Altman <avri.altman@intel.com>
Reviewed-by: Alexander Bondar <alexander.bondar@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2014-04-13 22:23:19 +03:00
Emmanuel Grumbach
3a58d98e8d iwlwifi: mvm: replace leading spaces by tabs
Somehow I added spaces instead of tabs to a few lines in
debugfs.c.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2014-04-13 22:23:19 +03:00
Eliad Peller
d15a747fc8 iwlwifi: mvm: don't use d3 fw if d0i3 is used
bail out from the suspend/resume callbacks if
d0i3 is used.

declare support for ANY wowlan trigger (i.e.
normal operation).

On resume, we shouldn't execute the d0i3 exit
flow (which might disconnect stations, etc.)
until mac80211 was resumed.
Add new flags to indicate we are in suspend,
and call the pending exit work on resume.

Since the resume flow can take some time, add
a new EXIT_WORK reference type to prevent going
back to d0i3 at this stage.

Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2014-04-13 22:23:18 +03:00