Commit Graph

482521 Commits

Author SHA1 Message Date
WANG Cong
d7480fd3b1 neigh: remove dynamic neigh table registration support
Currently there are only three neigh tables in the whole kernel:
arp table, ndisc table and decnet neigh table. What's more,
we don't support registering multiple tables per family.
Therefore we can just make these tables statically built-in.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 15:23:54 -05:00
Daniel Borkmann
4184b2a79a net: sctp: fix memory leak in auth key management
A very minimal and simple user space application allocating an SCTP
socket, setting SCTP_AUTH_KEY setsockopt(2) on it and then closing
the socket again will leak the memory containing the authentication
key from user space:

unreferenced object 0xffff8800837047c0 (size 16):
  comm "a.out", pid 2789, jiffies 4296954322 (age 192.258s)
  hex dump (first 16 bytes):
    01 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff816d7e8e>] kmemleak_alloc+0x4e/0xb0
    [<ffffffff811c88d8>] __kmalloc+0xe8/0x270
    [<ffffffffa0870c23>] sctp_auth_create_key+0x23/0x50 [sctp]
    [<ffffffffa08718b1>] sctp_auth_set_key+0xa1/0x140 [sctp]
    [<ffffffffa086b383>] sctp_setsockopt+0xd03/0x1180 [sctp]
    [<ffffffff815bfd94>] sock_common_setsockopt+0x14/0x20
    [<ffffffff815beb61>] SyS_setsockopt+0x71/0xd0
    [<ffffffff816e58a9>] system_call_fastpath+0x12/0x17
    [<ffffffffffffffff>] 0xffffffffffffffff

This is bad for two reasons: we can bring down a machine from
user space when auth_enable=1, and we also leave security-sensitive
keying material in memory without clearing it after use. The issue is
that sctp_auth_create_key() already sets the refcount to 1, but after
allocation sctp_auth_set_key() takes an additional reference on it,
thus leaving it around when we free the socket.
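
A minimal user-space reproducer along these lines might look like the
following sketch (assumes the lksctp-tools <netinet/sctp.h> header,
net.sctp.auth_enable=1, and kmemleak to observe the leak; error handling
omitted):

  #include <netinet/in.h>
  #include <netinet/sctp.h>   /* struct sctp_authkey, SCTP_AUTH_KEY */
  #include <stdlib.h>
  #include <string.h>
  #include <sys/socket.h>
  #include <unistd.h>

  int main(void)
  {
          int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
          size_t len = sizeof(struct sctp_authkey) + 4;
          struct sctp_authkey *key = calloc(1, len);

          key->sca_keynumber = 1;
          key->sca_keylength = 4;
          memset(key->sca_key, 0xaa, 4);  /* 4 bytes of key material */

          setsockopt(fd, IPPROTO_SCTP, SCTP_AUTH_KEY, key, len);
          close(fd);                      /* kernel copy of the key leaks */
          free(key);
          return 0;
  }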

Fixes: 65b07e5d0d ("[SCTP]: API updates to suport SCTP-AUTH extensions.")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 15:19:11 -05:00
Daniel Borkmann
e40607cbe2 net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet
An SCTP server doing ASCONF will panic on malformed INIT ping-of-death
in the form of:

  ------------ INIT[PARAM: SET_PRIMARY_IP] ------------>

While the INIT chunk parameter verification dissects through many things
in order to detect malformed input, it fails to actually check parameters
nested inside of parameters. E.g. RFC 5061, section 4.2.4 proposes a
'set primary IP address' parameter in ASCONF, which carries an address
parameter as a subparameter.

So an attacker may send a parameter type other than SCTP_PARAM_IPV4_ADDRESS
or SCTP_PARAM_IPV6_ADDRESS; param_type2af() will subsequently return 0,
sctp_get_af_specific() therefore returns NULL as well, and we then happily
dereference it unconditionally through af->from_addr_param().

The trace for the log:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
IP: [<ffffffffa01e9c62>] sctp_process_init+0x492/0x990 [sctp]
PGD 0
Oops: 0000 [#1] SMP
[...]
Pid: 0, comm: swapper Not tainted 2.6.32-504.el6.x86_64 #1 Bochs Bochs
RIP: 0010:[<ffffffffa01e9c62>]  [<ffffffffa01e9c62>] sctp_process_init+0x492/0x990 [sctp]
[...]
Call Trace:
 <IRQ>
 [<ffffffffa01f2add>] ? sctp_bind_addr_copy+0x5d/0xe0 [sctp]
 [<ffffffffa01e1fcb>] sctp_sf_do_5_1B_init+0x21b/0x340 [sctp]
 [<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp]
 [<ffffffffa01e5c09>] ? sctp_endpoint_lookup_assoc+0xc9/0xf0 [sctp]
 [<ffffffffa01e61f6>] sctp_endpoint_bh_rcv+0x116/0x230 [sctp]
 [<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp]
 [<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp]
 [<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
 [<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0
 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
 [<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120
 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
[...]

A minimal way to address this is to check for NULL as we do on all
other such occasions where we know sctp_get_af_specific() could
possibly return with NULL.
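
The shape of such a check, as a sketch (variable names illustrative, not
necessarily the exact hunk):

  /* while handling the SET_PRIMARY_IP sub-parameter: */
  af = sctp_get_af_specific(param_type2af(addr_param->p.type));
  if (!af)
          break;  /* unknown/forged address type: skip instead of dereferencing */

  af->from_addr_param(&addr, addr_param, htons(asoc->peer.port), 0);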

Fixes: d6de309759 ("[SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 15:19:10 -05:00
Takashi Iwai
5748eb8f8e net: ppp: Don't call bpf_prog_create() in ppp_lock
In ppp_ioctl(), bpf_prog_create() is called while ppp_lock is held; it
eventually calls vmalloc() and hits the BUG_ON() in vmalloc.c.  This patch
works around the problem by moving the allocation outside the lock.

The bug was revealed by the recent change in net/core/filter.c, as it
allocates via vmalloc() instead of kmalloc() now.
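
The general shape of the fix is the usual allocate-outside-the-lock
pattern; a sketch (lock helpers and field names are illustrative, not
necessarily the driver's exact code):

  struct bpf_prog *filter = NULL, *old;
  int err;

  /* May sleep / vmalloc(): do it with no spinlocks held. */
  err = bpf_prog_create(&filter, &fprog);
  if (err)
          return err;

  ppp_lock(ppp);
  old = ppp->pass_filter;
  ppp->pass_filter = filter;      /* publish the new filter under the lock */
  ppp_unlock(ppp);

  if (old)
          bpf_prog_destroy(old);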

Reported-and-tested-by: Stefan Seyfried <stefan.seyfried@googlemail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 15:15:03 -05:00
Andy Shevchenko
b2e2f0c779 stmmac: split to core library and probe drivers
Instead of registering the platform and PCI drivers in one module, let's
move the necessary bits to where they belong. During this procedure we
convert the module registration part to use the module_*_driver() macros,
which makes the code simpler.

From now on the driver consists of three parts: the core library, PCI, and
platform drivers.
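
For illustration, the registration boilerplate in each probe driver then
collapses to something like this (sketch; driver struct names assumed):

  /* stmmac_pci.c */
  module_pci_driver(stmmac_pci_driver);

  /* stmmac_platform.c */
  module_platform_driver(stmmac_pltfr_driver);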

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 14:34:39 -05:00
Miklos Szeredi
799b601451 audit: keep inode pinned
Audit rules disappear when an inode they watch is evicted from the cache.
This is likely not what we want.

The guilty commit is "fsnotify: allow marks to not pin inodes in core",
which didn't take into account that audit_tree adds watches with a zero
mask.

Adding any mask should fix this.

Fixes: 90b1e7a578 ("fsnotify: allow marks to not pin inodes in core")
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: stable@vger.kernel.org # 2.6.36+
Signed-off-by: Paul Moore <pmoore@redhat.com>
2014-11-11 14:20:22 -05:00
Joe Perches
ba7a46f16d net: Convert LIMIT_NETDEBUG to net_dbg_ratelimited
Use the more common dynamic_debug capable net_dbg_ratelimited
and remove the LIMIT_NETDEBUG macro.

All messages are still ratelimited.

Some KERN_<LEVEL> uses are changed to KERN_DEBUG.

This may have some negative impact on messages that were previously
emitted at KERN_INFO: they are now not enabled at all unless DEBUG is
defined or dynamic_debug is enabled.  In other words, these messages
are now _not_ emitted by default.

This also eliminates the use of the net_msg_warn sysctl
"/proc/sys/net/core/warnings".  For backward compatibility,
the sysctl is not removed, but it no longer has any function.  The extern
declaration of net_msg_warn is removed from sock.h and the variable is
made static in net/core/sysctl_net_core.c.

Miscellanea:

o Update the sysctl documentation
o Remove the embedded uses of pr_fmt
o Coalesce format fragments
o Realign arguments
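
As an illustration of the conversion (the message text is made up, not a
specific hunk):

  /* before */
  LIMIT_NETDEBUG(KERN_DEBUG "%s: dropping packet\n", dev->name);

  /* after: still ratelimited, only emitted with DEBUG or dynamic_debug */
  net_dbg_ratelimited("%s: dropping packet\n", dev->name);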

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 14:10:31 -05:00
Denis Kirjanov
5b61c4db49 PPC: bpf_jit_comp: add SKF_AD_HATYPE instruction
Add the BPF extension SKF_AD_HATYPE to the ppc JIT to check
the hardware type of the interface.

Before:
[   57.723666] test_bpf: #20 LD_HATYPE
[   57.723675] BPF filter opcode 0020 (@0) unsupported
[   57.724168] 48 48 PASS

After:
[  103.053184] test_bpf: #20 LD_HATYPE 7 6 PASS

CC: Alexei Starovoitov <alexei.starovoitov@gmail.com>
CC: Daniel Borkmann <dborkman@redhat.com>
CC: Philippe Bergheaud <felix@linux.vnet.ibm.com>
Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>

v2: address Alexei's comments
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 13:39:47 -05:00
Aravind Gopalakrishnan
0bd5294158 hwmon: (fam15h_power) Fix NB device ID for F16h M30h
The F3 device ID is wrongly included in fam15h_power_id_table
for F16h M30h; it should be the F4 device ID. Fix this.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2014-11-11 10:39:45 -08:00
Kamil Debski
48b9d5b4f4 hwmon: (pwm-fan) Fix suspend/resume behavior
The state of a PWM output is not clearly defined after resume. Some PWM
drivers do not restore the duty cycle upon resume, thus it is necessary to
manually restore the correct value.
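
A sketch of the resume path (the context field names and the MAX_PWM
scale are illustrative assumptions, not necessarily the driver's code):

  static int pwm_fan_resume(struct device *dev)
  {
          struct pwm_fan_ctx *ctx = dev_get_drvdata(dev);
          unsigned long period = pwm_get_period(ctx->pwm);
          int ret;

          if (ctx->pwm_value == 0)        /* fan was off, nothing to restore */
                  return 0;

          /* Re-program the duty cycle explicitly; it may be lost across suspend. */
          ret = pwm_config(ctx->pwm, ctx->pwm_value * period / MAX_PWM, period);
          if (ret)
                  return ret;
          return pwm_enable(ctx->pwm);
  }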

Signed-off-by: Kamil Debski <k.debski@samsung.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2014-11-11 10:39:45 -08:00
Michael Ellerman
aab18da44f hwmon: (ibmpowernv) Quieten when probing finds no device
Because we build kernels with drivers built in for many platforms, it's
normal for the ibmpowernv driver to be loaded on systems that don't have
the appropriate hardware.

Currently the driver spams the log with:

  ibmpowernv ibmpowernv.0: Opal node 'sensors' not found
  ibmpowernv: Platfrom driver probe failed

But there is no real error; the machine is simply not a powernv and doesn't
have the hardware. So change the sensors message to dev_dbg(), and only
print an error about the probe failing if the error is not ENODEV.

Also fix the spelling of "Platfrom" and print the actual error value.
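
Roughly, the two changes take this shape (sketch; the node path and the
symbol names are assumptions):

  /* In the probe routine: not finding the node is not an error. */
  sensors = of_find_node_by_path("/ibm,opal/sensors");
  if (!sensors) {
          dev_dbg(&pdev->dev, "Opal 'sensors' node not found\n");
          return -ENODEV;
  }

  /* In the module init path: only complain about real failures. */
  err = platform_driver_probe(&ibmpowernv_driver, ibmpowernv_probe);
  if (err && err != -ENODEV)
          pr_err("Platform driver probe failed (%d)\n", err);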

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2014-11-11 10:39:45 -08:00
David S. Miller
4083c8056d Merge branch 'net_next_ovs' of git://git.kernel.org/pub/scm/linux/kernel/git/pshelar/openvswitch
Pravin B Shelar says:

====================
Open vSwitch

The following batch of patches brings feature parity between the upstream
ovs module and the out-of-tree ovs module.

Two features are added. The first adds support for exporting egress
tunnel information for a packet; this is used to improve visibility
into network traffic. The second allows the userspace vswitchd process
to probe ovs module features. The other patches are optimizations and
code cleanups.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 13:32:25 -05:00
Joe Perches
a2ae6007a4 dsa: Use netdev_<level> instead of printk
Neaten and standardize the logging output.

Other miscellanea:

o Use pr_notice_once instead of a guard flag.
o Convert existing pr_<level> uses too.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 13:30:57 -05:00
Or Gerlitz
f4a1edd561 net/mlx4_en: Advertize encapsulation offloads features only when VXLAN tunnel is set
Currently we only support Large-Send and TX checksum offloads for
encapsulated traffic of type VXLAN. We must make sure to advertise
these offloads up to the stack only when a VXLAN tunnel is set.

Failing to do so would mislead the networking stack into assuming
that the driver can offload the internal TX checksum for GRE packets
and other such schemes, which would be buggy.

Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 13:24:45 -05:00
David S. Miller
008e81656c Merge branch 'mlx4-next'
Or Gerlitz says:

====================
mlx4: Add CHECKSUM_COMPLETE support

These patches from Shani, Matan and myself add support for
CHECKSUM_COMPLETE reporting on non TCP/UDP packets such as
GRE and ICMP. I'd like to deeply thank Jerry Chu for his
innovation and support in that effort.

Based on the feedback from Eric and Ido Shamay, in V2 we dropped
the patch which removed the calls to napi_gro_frags() and added
a patch which makes the RX code go through that path
regardless of the checksum status.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 13:20:07 -05:00
Shani Michaeli
f8c6455bb0 net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE
When processing received traffic, pass the CHECKSUM_COMPLETE status to
the stack, with a calculated checksum for non-TCP/UDP packets (such
as GRE or ICMP).

Although the stack expects a checksum which doesn't include the pseudo
header, the HW adds it. To address that, we subtract the pseudo-header
checksum from the checksum value provided by the HW.

In the IPv6 case, we also compute/add the IP header checksum which
is not added by the HW for such packets.
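
A sketch of the IPv4 case (not the driver's exact code; the helper name
is illustrative):

  /* hw_checksum covers the pseudo-header too; the stack wants it without. */
  static __wsum strip_pseudo_hdr_ipv4(const struct iphdr *iph, __wsum hw_checksum)
  {
          __wsum pseudo = csum_tcpudp_nofold(iph->saddr, iph->daddr,
                                             ntohs(iph->tot_len) - (iph->ihl << 2),
                                             iph->protocol, 0);
          return csum_sub(hw_checksum, pseudo);
  }

  /* in the RX path: */
  skb->csum = strip_pseudo_hdr_ipv4(iph, hw_checksum);
  skb->ip_summed = CHECKSUM_COMPLETE;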

Cc: Jerry Chu <hkchu@google.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 13:20:02 -05:00
Shani Michaeli
dd65beac48 net/mlx4_en: Extend usage of napi_gro_frags
We can call napi_gro_frags for all the received traffic regardless
of the checksum status. Specifically, received packets whose status
is CHECKSUM_NONE (and soon to be added CHECKSUM_COMPLETE)
are eligible for napi_gro_frags as well.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 13:20:02 -05:00
David S. Miller
b00394c007 Merge branch 'so_incoming_cpu'
Eric Dumazet says:

====================
net: SO_INCOMING_CPU support

The SO_INCOMING_CPU socket option (read via getsockopt()) provides
an alternative to RPS/RFS for high-performance servers using
multi-queue NICs.

TCP should use sk_mark_napi_id() for established sockets only.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 13:00:11 -05:00
Eric Dumazet
2c8c56e15d net: introduce SO_INCOMING_CPU
An alternative to RPS/RFS is to use hardware support for multiple
queues.

Then split a set of millions of sockets across worker threads, each
one using epoll() to manage events on its own socket pool.

Ideally, we want one thread per RX/TX queue/cpu, but we have no way to
know, after accept() or connect(), on which queue/cpu a socket is managed.

We normally use one cpu per RX queue (IRQ smp_affinity being properly
set), so remembering on the socket structure which cpu delivered the last
packet is enough to solve the problem.

After accept(), connect(), or even file descriptor passing between
processes, applications can use:

 int cpu;
 socklen_t len = sizeof(cpu);

 getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len);

And use this information to put the socket into the right silo
for optimal performance, as the whole networking stack then runs
on the appropriate cpu, without the need to send IPIs (RPS/RFS).
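
For example, a server could dispatch each accepted socket to the worker
pinned to the reported CPU (sketch; SO_INCOMING_CPU is 49 on asm-generic
ABIs if the libc headers don't define it yet, and hand_off_to_worker() is
a hypothetical dispatch helper):

  #ifndef SO_INCOMING_CPU
  #define SO_INCOMING_CPU 49
  #endif

  int fd = accept(listen_fd, NULL, NULL);
  int cpu = 0;
  socklen_t len = sizeof(cpu);

  if (getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len) < 0)
          cpu = 0;                      /* fall back to a default worker */
  hand_off_to_worker(cpu, fd);          /* e.g. queue fd to that CPU's epoll set */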

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 13:00:06 -05:00
Eric Dumazet
3d97379a67 tcp: move sk_mark_napi_id() at the right place
sk_mark_napi_id() is used to record, for a flow, the napi id of incoming
packets for busy-polling's sake.
We should do this only on established flows, not on listeners.

This was 'working' by virtue of the socket cloning, but doing
this on SYN packets is unnecessary cache line dirtying.

Even if we move sk_napi_id into the same cache line as sk_lock,
we are working to make SYN processing lockless, so it is desirable
to set sk_napi_id only for established flows.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 13:00:05 -05:00
Takashi Iwai
1a290581de ALSA: usb-audio: Fix memory leak in FTU quirk
The M-Audio FastTrack Ultra quirk doesn't release its kzalloc'ed memory.
This patch adds the private_free callback to release it properly.

Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2014-11-11 18:04:41 +01:00
Don Skidmore
d1b849b9e9 ixgbe: add helper function for setting RSS key in preparation of X550
Split off the setting of the RSS key into its own function.  This
will help when we add support for X550 which can have different
RSS keys per pool.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-11-11 06:43:23 -08:00
Don Skidmore
9a75a1ac77 ixgbe: Add new support for X550 MAC's
This patch will add in the new MAC defines and fit it into the switch
cases throughout the driver.  New functionality and enablement support will
be added in following patches.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-11-11 06:18:56 -08:00
Don Skidmore
8d697e7e54 ixgbe: cleanup move setting PFQDE.HIDE_VLAN to support function.
Move the setting of drop enable to a support function.  This not only makes
the code more readable but also prepares for following patches that add
additional MAC support.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-11-11 06:18:49 -08:00
Don Skidmore
2b509c0cd2 ixgbe: cleanup ixgbe_ndo_set_vf_vlan
Clean up functionality in ixgbe_ndo_set_vf_vlan that will simplify later
patches.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-11-11 06:18:36 -08:00
Don Skidmore
71bde60191 ixgbe: fix X540 Completion timeout
On topologies including a few levels of PCIe switching, X540 can run into an
unexpected completion error.  We get around this by waiting, after enabling
loopback, a sufficient amount of time until the Tx Data Fetch is sent.  We then
poll the pending transaction bit to ensure we received the completion.  Only
then do we go on to clear the buffers.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-11-11 06:05:27 -08:00
Mitch Williams
cc0529271f i40evf: don't use more queues than CPUs
It's kind of silly to configure and attempt to use a bunch of queue
pairs when you're running on a single (virtual) CPU. Instead of
unconditionally configuring all of the queues that the PF gives us,
clamp the number of queue pairs to the number of CPUs.
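
The clamping itself is a one-liner of this shape (sketch; the adapter
field names are assumptions):

  /* Never configure more queue pairs than we have CPUs to service them. */
  pairs = min_t(int, adapter->vsi_res->num_queue_pairs, num_online_cpus());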

Change-ID: I321714c9e15072ee76de8f95ab9a81f86ed347d1
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Patrick Lu <patrick.lu@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-11-11 06:02:00 -08:00
Mitch Williams
f8d4db35e8 i40evf: make early init processing more robust
In early init, if we get an unexpected message from the PF (such as link
status), we just kick an error back to the init task, causing it to
restart its state machine and delaying initialization.

Make the early init AQ message receive code more robust by handling
messages in a loop, and ignoring those that we aren't interested in.
This also gets rid of some scary log messages that really didn't
indicate a problem.

Change-ID: I620e8c72e49c49c665ef33eeab2425dd10e721cf
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Patrick Lu <patrick.lu@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-11-11 06:01:54 -08:00
Jesse Brandeburg
79442d38b3 i40e: clean up throttle rate code
The interrupt throttle rate minimum is actually 2us, so
fix that define and while we are there, remove some unused defines.

Change some strings in the function to be a bit less wrappy, and
express the correct limits.

Change-ID: I96829bbc77935e0b57c6f0fc1439fb4152b2960a
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Patrick Lu <patrick.lu@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-11-11 06:01:48 -08:00
Shannon Nelson
215367171b i40e: don't do link_status or stats collection on every ARQ
The ARQ events cause a service_task execution, and we do a link_status
check and full stats gathering for each service_task.  However, when
there are a lot of ARQ events, such as when doing an NVM update, we end up
doing 10's if not 100's of these per second, thereby heavily abusing the
PCI bus and especially the Firmware.  This patch adds a check to keep the
service_task from running these periodic tasks more than once per second,
while still allowing quick action to service the events.
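
The check is essentially a jiffies-based rate limit; a sketch (field and
helper names are illustrative):

  /* Run the periodic work at most once per second. */
  if (time_before(jiffies, pf->service_timer_previous + HZ))
          return;
  pf->service_timer_previous = jiffies;

  i40e_check_link_status(pf);     /* hypothetical helpers for the periodic work */
  i40e_update_stats(pf);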

Change-ID: Iec7670c37bfae9791c43fec26df48aea7f70b33e
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Patrick Lu <patrick.lu@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-11-11 05:52:46 -08:00
Kamil Krawczyk
0db4e162e6 i40e: poll firmware slower
The code was polling the firmware tail register for completion every
10 microseconds, which is way faster than the firmware can respond.
This changes the poll interval to 1ms, which reduces both polling CPU
utilization and the number of times we loop.

The maximum delay is still 100ms.
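
A sketch of the polling loop after the change (the completion test and
timeout count are illustrative):

  total_delay = 0;
  do {
          if (admin_queue_cmd_completed(hw))      /* hypothetical completion test */
                  break;
          usleep_range(1000, 2000);               /* ~1ms instead of 10us */
  } while (++total_delay < 100);                  /* 100 x 1ms ~= 100ms max */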

Change-ID: I4bbfa6b66d802890baf8b4154061e55942b90958
Signed-off-by: Kamil Krawczyk <kamil.krawczyk@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-11-11 05:44:16 -08:00
Daniel Thompson
3438cf549d param: fix crash on bad kernel arguments
Currently if the user passes an invalid value on the kernel command line
then the kernel will crash during argument parsing. On most systems this
is very hard to debug because the console hasn't been initialized yet.

This is a regression due to commit 51e158c12a ("param: hand arguments
after -- straight to init") which, in response to the systemd debug
controversy, made it possible to explicitly pass arguments to init. To
achieve this parse_args() was extended from simply returning an error
code to returning a pointer. Regretably the new init args logic does not
perform a proper validity check on the pointer resulting in a crash.

This patch fixes the validity check. Should the check fail then no arguments
will be passed to init. This is reasonable and matches how the kernel treats
its own arguments (i.e. no error recovery).
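
The essence of the fix is to treat an ERR_PTR() return like "no init
arguments" instead of dereferencing it; a sketch of the call site (the
surrounding argument details are only illustrative):

  after_dashes = parse_args("Booting kernel", static_command_line,
                            __start___param, __stop___param - __start___param,
                            -1, -1, &unknown_bootoption);
  /* parse_args() can return an ERR_PTR(), not just NULL or a valid string. */
  if (!IS_ERR_OR_NULL(after_dashes))
          parse_args("Setting init args", after_dashes, NULL, 0, -1, -1,
                     set_init_arg);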

Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Cc: stable@vger.kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-11-11 17:03:19 +10:30
Eric Dumazet
2e1af7d74f mlx4: restore conditional call to napi_complete_done()
After commit 1a28817282 ("mlx4: use napi_complete_done()") we ended up
calling napi_complete_done() even in the case where the NAPI poll consumed
all of its budget.

This added extra interrupt pressure; this patch restores the proper
behavior.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 1a28817282 ("mlx4: use napi_complete_done()")
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10 21:09:03 -05:00
David S. Miller
d21385fada Merge branch 'sunvnet-next'
Sowmini Varadhan says:

====================
sunvnet: edge-case/race-conditions bug fixes

This patch series contains fixes for race conditions in sunvnet
that can be encountered when there is a difference in latency between
producer and consumer.

Patch 1 addresses a case when the STOPPED LDC ack from a peer is
processed before vnet_start_xmit can finish updating the dr->prod
state.

Patch 2 fixes the edge-case when outgoing data and incoming
stopped-ack cross each other in flight.

Patch 3 adds a missing rcu_read_unlock(), found by code-inspection.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10 21:05:43 -05:00
Sowmini Varadhan
df20286ab1 sunvnet: Add missing rcu_read_unlock() in vnet_start_xmit
The out_dropped label will only do rcu_read_unlock for a non-null port.
So add the missing rcu_read_unlock() when bailing out with a null port.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10 21:05:36 -05:00
Sowmini Varadhan
777362d721 sunvnet: vnet_ack() should check if !start_cons to send a missed trigger
As per the comments in vnet_start_xmit(), for the edge case
when outgoing vnet_start_xmit() data and an incoming STOPPED
ACK cross each other in flight, we may need to send the missed
START trigger from maybe_tx_wakeup() after checking for a
false value of start_cons.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10 21:05:36 -05:00
Sowmini Varadhan
b0cffed543 sunvnet: Fix race between vnet_start_xmit() and vnet_ack()
When vnet_start_xmit() is concurrent with vnet_ack(), we may
have a race that looks like:

    thread 1                              thread 2
    vnet_start_xmit                       vnet_event_napi -> vnet_rx

__vnet_tx_trigger for some desc X
at this point dr->prod == X
                                        peer sends back a stopped ack for X
                                        we process X, but X == dr->prod
                                        so we bail out in vnet_ack with
                                        !idx_is_pending
update dr->prod

As a result of the fact that we never processed the stopped ack for X,
the Tx path is led to incorrectly believe that the peer is still
"started" and reading, when in fact the peer has stopped reading; this
will ultimately end in flow-control assertions.

The fix is to synchronize the above two paths on the netif_tx_lock.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10 21:05:36 -05:00
Rabin Vincent
07906da788 tracing: Do not risk busy looping in buffer splice
If the read loop in trace_buffers_splice_read() keeps failing due to
memory allocation failures without reading even a single page, then this
function will keep busy looping.

Remove that risk by exiting the function if memory allocation
failures are seen.

Link: http://lkml.kernel.org/r/1415309167-2373-2-git-send-email-rabin@rab.in

Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-10 16:47:31 -05:00
Rabin Vincent
e30f53aad2 tracing: Do not busy wait in buffer splice
On a !PREEMPT kernel, attempting to use trace-cmd results in a soft
lockup:

 # trace-cmd record -e raw_syscalls:* -F false
 NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [trace-cmd:61]
 ...
 Call Trace:
  [<ffffffff8105b580>] ? __wake_up_common+0x90/0x90
  [<ffffffff81092e25>] wait_on_pipe+0x35/0x40
  [<ffffffff810936e3>] tracing_buffers_splice_read+0x2e3/0x3c0
  [<ffffffff81093300>] ? tracing_stats_read+0x2a0/0x2a0
  [<ffffffff812d10ab>] ? _raw_spin_unlock+0x2b/0x40
  [<ffffffff810dc87b>] ? do_read_fault+0x21b/0x290
  [<ffffffff810de56a>] ? handle_mm_fault+0x2ba/0xbd0
  [<ffffffff81095c80>] ? trace_event_buffer_lock_reserve+0x40/0x80
  [<ffffffff810951e2>] ? trace_buffer_lock_reserve+0x22/0x60
  [<ffffffff81095c80>] ? trace_event_buffer_lock_reserve+0x40/0x80
  [<ffffffff8112415d>] do_splice_to+0x6d/0x90
  [<ffffffff81126971>] SyS_splice+0x7c1/0x800
  [<ffffffff812d1edd>] tracesys_phase2+0xd3/0xd8

The problem is this: tracing_buffers_splice_read() calls
ring_buffer_wait() to wait for data in the ring buffers.  The buffers
are not empty so ring_buffer_wait() returns immediately.  But
tracing_buffers_splice_read() calls ring_buffer_read_page() with full=1,
meaning it only wants to read a full page.  When the full page is not
available, tracing_buffers_splice_read() tries to wait again with
ring_buffer_wait(), which again returns immediately, and so on.

Fix this by adding a "full" argument to ring_buffer_wait() which will
make ring_buffer_wait() wait until the writer has left the reader's
page, i.e.  until full-page reads will succeed.
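
After the change the waiter's prototype gains the extra parameter,
roughly (sketch):

  /* full = true: block until the writer has left the reader page, so a
   * subsequent full-page ring_buffer_read_page() can make progress. */
  int ring_buffer_wait(struct ring_buffer *buffer, int cpu, bool full);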

Link: http://lkml.kernel.org/r/1415645194-25379-1-git-send-email-rabin@rab.in

Cc: stable@vger.kernel.org # 3.16+
Fixes: b1169cc69b ("tracing: Remove mock up poll wait function")
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-10 16:45:43 -05:00
Alban Bedel
6f6e741f6d 8139too: Allow using the largest possible MTU
This driver allows an MTU of up to 1518 bytes, which is not enough to run
batman-adv. Simply raise the maximum packet size up to the maximum
allowed by the transmit descriptor, 1792 bytes, giving a maximum MTU
of 1774 bytes.
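(Presumably: 1792 - 14 (Ethernet header) - 4 (FCS) = 1774 bytes of MTU.)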

Signed-off-by: Alban Bedel <albeu@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10 15:30:02 -05:00
Alban Bedel
ef786f106f 8139too: Allow setting MTU larger than 1500
Replace the default ndo_change_mtu callback with one that allows
setting any MTU that the driver can handle.
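
Such a callback is only a few lines; a sketch (the function name and the
68/1774 bounds follow the surrounding description, not necessarily the
driver's exact constants):

  static int rtl8139_change_mtu(struct net_device *dev, int new_mtu)
  {
          if (new_mtu < 68 || new_mtu > 1774)   /* min IPv4 MTU .. descriptor limit */
                  return -EINVAL;
          dev->mtu = new_mtu;
          return 0;
  }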

Signed-off-by: Alban Bedel <albeu@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10 15:30:02 -05:00
Joe Thornber
9b460d3699 dm btree: fix a recursion depth bug in btree walking code
The walk code was using a 'ro_spine' to hold its locked btree nodes.
But this data structure is designed for the rolling lock scheme, and
as such automatically unlocks blocks that are two steps up the call
chain.  This is not suitable for the simple recursive walk algorithm,
which retraces its steps.

This code is only used by the persistent array code, which in turn is
only used by dm-cache.  In order to trigger it you need to have a
mapping tree that is more than 2 levels deep, which equates to 8-16
million cache blocks.  For instance a 4T ssd with a very small block
size of 32k only just triggers this bug.

The fix just places the locked blocks on the stack, and stops using
the ro_spine altogether.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2014-11-10 15:23:58 -05:00
Anish Bhatt
a815286b94 cxgb4 : Fix bug in DCB app deletion
Unlike CEE, IEEE has a bespoke app delete call and does not rely on priority
for app deletion.
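
So in IEEE mode the deletion should go through dcb_ieee_delapp() rather
than a priority-based dcb_setapp(); a sketch (the selector/protocol
variables are illustrative):

  struct dcb_app app = {
          .selector = app_selector,
          .protocol = app_protoid,
          /* .priority left 0: IEEE delete keys on selector/protocol */
  };

  err = dcb_ieee_delapp(dev, &app);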

Fixes: 2376c879b8 ('cxgb4 : Improve handling of DCB negotiation or loss thereof')

Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10 15:13:53 -05:00
Jesse Gross
cfdf1e1ba5 udptunnel: Add SKB_GSO_UDP_TUNNEL during gro_complete.
When doing GRO processing for UDP tunnels, we never add
SKB_GSO_UDP_TUNNEL to gso_type - only the type of the inner protocol
is added (such as SKB_GSO_TCPV4). The result is that if the packet is
later resegmented we will do GSO but not treat it as a tunnel. This
results in UDP fragmentation of the outer header instead of, e.g., TCP
segmentation of the inner header as was originally on the wire.
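
The fix amounts to tagging the skb in the UDP gro_complete path, along
these lines (sketch):

  /* Remember that this flow was UDP-encapsulated so later GSO rebuilds the
   * tunnel instead of fragmenting the outer UDP header. */
  skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL;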

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10 15:09:45 -05:00
David S. Miller
b92172661e Merge tag 'master-2014-11-04' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
pull request: wireless-next 2014-11-07

Please pull this batch of updates intended for the 3.19 stream!

For the mac80211 bits, Johannes says:

"This relatively large batch of changes is comprised of the following:
 * large mac80211-hwsim changes from Ben, Jukka and a bit myself
 * OCB/WAVE/11p support from Rostislav on behalf of the Czech Technical
   University in Prague and Volkswagen Group Research
 * minstrel VHT work from Karl
 * more CSA work from Luca
 * WMM admission control support in mac80211 (myself)
 * various smaller fixes, spelling corrections, and minor API additions"

For the Bluetooth bits, Johan says:

"Here's the first bluetooth-next pull request for 3.19. The vast majority
of patches are for ieee802154 from Alexander Aring with various fixes
and cleanups. There are also several LE/SMP fixes as well as improved
support for handling LE devices that have lost their pairing information
(the patches from Alfonso). Jukka provides a couple of stability fixes
for 6lowpan and Szymon conformance fixes for RFCOMM. For the HCI drivers
we have one new USB ID for an Acer controller as well as a reset
handling fix for H5."

For the Atheros bits, Kalle says:

"Major changes are:

o ethtool support (Ben)

o print dev string prefix with debug hex buffers dump (Michal)

o debugfs file to read calibration data from the firmware, for
  verification purposes (me)

o fix fw_stats debugfs file, now results are more reliable (Michal)

o firmware crash counters via debugfs (Ben&me)

o various tracing points to debug firmware (Rajkumar)

o make it possible to provide firmware calibration data via a file (me)

And we have quite a lot of smaller fixes and clean up."

For the iwlwifi bits, Emmanuel says:

"The big new thing here is netdetect which allows the
firmware to wake up the platform when a specific network
is detected. Along with that I have fixes for d3 operation.
The usual amount of rate scaling stuff - we now support STBC.
The other commit that stands out is Johannes's work on
devcoredump. He basically starts to use the standard
infrastructure he built."

Along with that are the usual sort of updates and such for ath9k,
brcmfmac, wil6210, and a handful of other bits here and there...

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10 14:34:59 -05:00
David S. Miller
e344458fc0 Merge branch 'raw_probe_proto_opt'
Herbert Xu says:

====================
ipv4: Simplify raw_probe_proto_opt and avoid reading user iov twice

This series rewrites the function raw_probe_proto_opt in a more
readable fashion, and then fixes the long-standing bug where we
read the probed bytes twice, which means that what we're using to
probe may in fact be invalid.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10 14:25:49 -05:00
Herbert Xu
c008ba5bdc ipv4: Avoid reading user iov twice after raw_probe_proto_opt
Ever since raw_probe_proto_opt was added it had the problem of
causing the user iov to be read twice, once during the probe for
the protocol header and once again in ip_append_data.

This is a potential security problem since it means that whatever
we're probing may be invalid.  This patch plugs the hole by
firstly advancing the iov so we don't read the same spot again,
and secondly saving what we read the first time around for use
by ip_append_data.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10 14:25:35 -05:00
Herbert Xu
32b5913a93 ipv4: Use standard iovec primitive in raw_probe_proto_opt
The function raw_probe_proto_opt tries to extract the first two
bytes from the user input in order to seed the IPsec lookup for
ICMP packets.  In doing so it processes the iovec by hand and
overcomplicates things.

This patch replaces the manual iovec processing with a call to
memcpy_fromiovecend.
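
Conceptually the probe becomes a single bounded copy (sketch, not the
exact patch):

  /* Only the ICMP type/code (the first two bytes) are needed for the lookup. */
  struct icmphdr ih;
  int err;

  err = memcpy_fromiovecend((unsigned char *)&ih, msg->msg_iov, 0, 2);
  if (err)
          return err;

  fl4->fl4_icmp_type = ih.type;
  fl4->fl4_icmp_code = ih.code;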

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10 14:25:35 -05:00
David S. Miller
10b450cbbc Merge branch 'cxgb4-net'
Hariprasad Shenai says:

====================
cxgb4/cxgb4vf: Misc. fixes for cxgb4vf

For T5, use Packing and Padding Boundaries for SGE DMA transfers, and move
fl_starve_thres to the adapter structure, since it is different for each
adapter. The cxgb4vf driver's Free List Starvation Threshold needs to be larger
than the SGE's Egress Congestion Threshold or we'll end up in a mutual stall
where the driver waits for Ingress Packets to drive replacing Free List
Pointers and the SGE waits for Free List Pointers before pushing Ingress
Packets to the host.

The patch series is created against the 'net' tree,
and includes patches for the cxgb4 and cxgb4vf drivers.

We have included all the maintainers of the respective drivers. Kindly review
the changes and let us know in case of any review comments.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10 14:15:09 -05:00
Hariprasad Shenai
50d21a662d cxgb4vf: FL Starvation Threshold needs to be larger than the SGE's Egress Congestion Threshold
Free List Starvation Threshold needs to be larger than the SGE's Egress
Congestion Threshold or we'll end up in a mutual stall where the driver waits
for Ingress Packets to drive replacing Free List Pointers and the SGE waits for
Free List Pointers before pushing Ingress Packets to the host.

Based on original work by Casey Leedom <leedom@chelsio.com>

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10 14:15:03 -05:00