When considering whether to set RTNH_F_OFFLOAD flag on an IPv6 route,
mlxsw_sp_fib6_entry_offload_set() looks up the mlxsw_sp_nexthop
corresponding to a given route, and decides based on whether the next
hop's offloaded flag was set. When looking for the matching next hop, it
also takes into account the device of the route, which must match next
hop's RIF.
IPIP next hops however hitherto didn't set the RIF. As a result, IPv6
routes forwarding traffic to IP-in-IP netdevices are never marked as
offloaded, even when they actually are.
Thus track RIF of IPIP next hops the same way as that of ETHERNET next
hops.
Fixes: 8f28a30976 ("mlxsw: spectrum_router: Support IPv6 overlay encap")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When creating a new RIF, bumping RIF count of the containing VR is the
last thing to be done. Symmetrically, when destroying a RIF, RIF count
is first dropped and only then the rest of the cleanup proceeds.
That's a problem for loopback RIFs. Those hold two VR references: one
for overlay and one for underlay. mlxsw_sp_rif_destroy() releases the
overlay one, and the deconfigure() callback the underlay one. But if
both overlay and underlay are the same, and if there are no other
artifacts holding the VR alive, this put actually destroys the VR. Later
on, when mlxsw_sp_rif_destroy() calls mlxsw_sp_vr_put() for the same VR,
the VR will already have been released and the kernel crashes with NULL
pointer dereference.
The underlying problem is that the RIF under destruction ends up
referencing the overlay VR much longer than it claims: all the way until
the call to mlxsw_sp_vr_put(). So line up the reference counting
properly to reflect this. Make corresponding changes in
mlxsw_sp_rif_create() as well for symmetry.
Fixes: 6ddb7426a7 ("mlxsw: spectrum_router: Introduce loopback RIFs")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The usx2y driver allocates the stream read/write buffers in continuous
pages depending on the stream setup, and this may spew the kernel
warning messages with a stack trace like:
WARNING: CPU: 1 PID: 1846 at mm/page_alloc.c:3883
__alloc_pages_slowpath+0x1ef2/0x2d70
Modules linked in:
CPU: 1 PID: 1846 Comm: kworker/1:2 Not tainted
....
It may confuse user as if it were any serious error, although this is
no fatal error and the driver handles the error case gracefully.
Since the driver has already some sanity check of the given size (128
and 256 pages), it can't pass any crazy value. So it's merely page
fragmentation.
This patch adds __GFP_NOWARN to each caller for suppressing such
kernel warnings. The original issue was spotted by syzkaller.
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
previous commit 5d37ca14 "ceph: send LSSNAP request to auth mds
of directory inode" is buggy. It makes __choose_mds() choose mds
base on hash of '.snap' dentry.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
commit 3ae0bebc "ceph: queue cap snap only when snap realm's
context changes" introduced a regression: we may not call
queue_realm_cap_snaps() for newly created snap realm. This
regression allows unflushed snapshot data to be overwritten.
Link: http://tracker.ceph.com/issues/21483
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Currently data_abort_decode() dumps the ISS field as a decimal value
with a '0x' prefix, which is somewhat misleading.
Fix it to print as hexadecimal, as was intended.
Fixes: 1f9b8936f3 ("arm64: Decode information from ESR upon mem faults")
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This reverts commit 275353bb68 to fix a regression which can abort
'alsactl' program in alsa-utils due to assertion in alsa-lib.
alsactl: control.c:2513: snd_ctl_elem_value_get_integer: Assertion `idx < sizeof(obj->value.integer.value) / sizeof(obj->value.integer.value[0])' failed.
alsactl: control.c:2976: snd_ctl_elem_value_get_integer: Assertion `idx < ARRAY_SIZE(obj->value.integer.value)' failed.
This commit is a band-aid. In a point of usage of ALSA control interface,
the drivers still bring an issue that they prevent userspace applications
to have a consistent way to parse each levels of the dimension information
via ALSA control interface.
Let me investigate this issue. Current implementation of the drivers
have three control element sets with dimension information:
* 'Monitor Mixer Volume' (type: integer)
* 'VMixer Volume' (type: integer)
* 'VU-meters' (type: boolean)
Although the number of elements named as 'Monitor Mixer Volume' differs
depending on drivers in this group, it can be calculated by macros
defined by each driver (= (BX_NUM - BX_ANALOG_IN) * BX_ANALOG_IN). Each
of the elements has one member for value and has dimension information
with 2 levels (= BX_ANALOG_IN * (BX_NUM - BX_ANALOG_IN)). For these
elements, userspace applications are expected to handle the dimension
information so that all of the elements construct a matrix where the
number of rows and columns are represented by the dimension information.
The same way is applied to elements named as 'VMixer Volume'. The number
of these elements can also be calculated by macros defined by each
drivers (= PX_ANALOG_IN * BX_ANALOG_IN). Each of the element has one
member for value and has dimension information with 2 levels
(= BX_ANALOG_IN * PX_ANALOG_IN). All of the elements construct a matrix
with the dimension information.
An element named as 'VU-meters' gets a different way in a point of
dimension information. The element includes 96 members for value. The
element has dimension information with 3 levels (= 3 or 2 * 16 * 2). For
this element, userspace applications are expected to handle the dimension
information so that all of the members for value construct a matrix
where the number of rows and columns are represented by the dimension
information. This is different from the way for the former.
As a summary, the drivers were not designed to produce a consistent way to
parse the dimension information. This makes it hard for general userspace
applications such as amixer to parse the information by a consistent way,
and actually no userspace applications except for 'echomixer' utilize the
dimension information. Additionally, no drivers excluding this group use
the information.
The reverted commit was written based on the latter way. A commit
860c1994a7 ('ALSA: control: add dimension validator for userspace
elements') is written based on the latter way, too. The patch should be
reconsider too in the same time to re-define a consistent way to parse the
dimension information.
Reported-by: Mark Hills <mark@xwax.org>
Reported-by: S. Christian Collins <s.chriscollins@gmail.com>
Fixes: 275353bb68 ('ALSA: echoaudio: purge contradictions between dimension matrix members and total number of members')
Cc: <stable@vger.kernel.org> # v4.8+
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Add descriptive info to prompt string so that someone can know what
a Retrode is. Drop an unneeded blank line.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Bastien Nocera <hadess@hadess.net>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
There is a missing comment in struct hid_ll_driver. So, add it.
Signed-off-by: Jaejoong Kim <climbbb.kim@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
This reverts commit fcaa4a07d2.
As noted by Masaki [1], 0x120A + trackpoint will not be used in mass
production machines, so remove the ID accordingly.
[1] http://www.spinics.net/lists/linux-input/msg53222.html
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
We should not try to bring HID device out of full power state before
calling hid_hw_close(), so that transport driver operates on powered up
device (making this inverse of the opening sequence).
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Benson Leung <bleung@chromium.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The wacom_get_hdev_data function is used to find and return a reference to
the "other half" of a Wacom device (i.e., the touch device associated with
a pen, or vice-versa). To ensure these references are properly accounted
for, the function is supposed to automatically increment the refcount before
returning. This was not done, however, for devices which have pen & touch
on different interfaces of the same USB device. This can lead to a WARNING
("refcount_t: underflow; use-after-free") when removing the module or device
as we call kref_put() more times than kref_get(). Triggering an "actual" use-
after-free would be difficult since both devices will disappear nearly-
simultaneously. To silence this warning and prevent the potential error, we
need to increment the refcount for all cases within wacom_get_hdev_data.
Fixes: 41372d5d40 ("HID: wacom: Augment 'oVid' and 'oPid' with heuristics for HID_GENERIC")
Cc: <stable@vger.kernel.org> # v4.9+
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Reviewed-by: Ping Cheng <ping.cheng@wacom.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The driver strength selection is missed and required when selecting
hs400es. So, It is added here.
Fixes: 81ac2af657 ("mmc: core: implement enhanced strobe support")
Cc: stable@vger.kernel.org
Signed-off-by: Hankyung Yu <hankyung.yu@lge.com>
Signed-off-by: Chanho Min <chanho.min@lge.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
If this sanity check fails, we must free 'rss_indir'. Otherwise there is a
memory leak.
'goto err' as done in the other error handling paths to fix it.
Fixes: 46a3df9f97 ("net: hns3: Fix for setting rss_size incorrectly")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
On Armada 7K/8K we need to explicitly enable the bus clock. The bus clock
is optional because not all the SoCs need them but at least for Armada
7K/8K it is actually mandatory.
The binding documentation is updating accordingly.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This linksys dongle by default comes up in cdc_ether mode.
This patch allows r8152 to claim the device:
Bus 002 Device 002: ID 13b1:0041 Linksys
Signed-off-by: Grant Grundler <grundler@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The l2tp_eth module crashes if its netlink callbacks are run when the
pernet data aren't initialised.
We should normally register_pernet_device() before the genl callbacks.
However, the pernet data only maintain a list of l2tpeth interfaces,
and this list is never used. So let's just drop pernet handling
instead.
Fixes: d9e31d17ce ("l2tp: Add L2TP ethernet pseudowire support")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long says:
====================
ip_gre: a bunch of fixes for erspan
This patchset is to fix some issues that could cause 0 or low
performance, and even unexpected truncated packets on erspan.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch 'ip_gre: ipgre_tap device should keep dst' fixed
the issue ipgre_tap dev mtu couldn't be updated in tx path.
The same fix is needed for erspan as well.
Fixes: 84e54fe0a5 ("gre: introduce native tunnel support for ERSPAN")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to __gre_tunnel_init, tunnel->hlen should be set as the
headers' length between inner packet and outer iphdr.
It would be used especially to calculate a proper mtu when updating
mtu in tnl_update_pmtu. Now without setting it, a bigger mtu value
than expected would be updated, which hurts performance a lot.
This patch is to fix it by setting tunnel->hlen with:
tunnel->tun_hlen + tunnel->encap_hlen + sizeof(struct erspanhdr)
Fixes: 84e54fe0a5 ("gre: introduce native tunnel support for ERSPAN")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As a ARPHRD_ETHER device, skb->len in erspan_xmit is the length
of the whole ether packet. So before checking if a packet size
exceeds the mtu, skb->len should subtract dev->hard_header_len.
Otherwise, all packets with max size according to mtu would be
trimmed to be truncated packet.
Fixes: 84e54fe0a5 ("gre: introduce native tunnel support for ERSPAN")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
erspan only uses the first 10 bits of session_id as the key to look
up the tunnel. But in erspan_rcv, it missed 'session_id & ID_MASK'
when getting the key from session_id.
If any other flag is also set in session_id in a packet, it would
fail to find the tunnel due to incorrect key in erspan_rcv.
This patch is to add 'session_id & ID_MASK' there and also remove
the unnecessary variable session_id.
Fixes: 84e54fe0a5 ("gre: introduce native tunnel support for ERSPAN")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sock is being initialized and then being almost immediately updated
hence the initialized value is not being used and is redundant. Remove
the initialization. Cleans up clang warning:
warning: Value stored to 'sock' during its initialization is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Michael Sterrett reports a NULL pointer dereference on NFSv3 mounts when
CONFIG_NFS_V4 is not set because the NFS UOC rpc_wait_queue has not been
initialized. Move the initialization of the queue out of the CONFIG_NFS_V4
conditional setion.
Fixes: 7d6ddf88c4 ("NFS: Add an iocounter wait function for async RPC tasks")
Cc: stable@vger.kernel.org # 4.11+
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
nfs_idmap_get_desc() can't actually return zero. But if it did then
we would return ERR_PTR(0) which is NULL and the caller,
nfs_idmap_get_key(), doesn't expect that so it leads to a NULL pointer
dereference.
I've cleaned this up by changing the "<=" to "<" so it's more clear that
we don't return ERR_PTR(0).
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
The units of RPC_MAX_AUTH_SIZE is bytes, not 4-byte words. This causes
the client to request a larger-than-necessary session replay slot size.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Pull x86 fixes from Thomas Gleixner:
"This contains the following fixes and improvements:
- Avoid dereferencing an unprotected VMA pointer in the fault signal
generation code
- Fix inline asm call constraints for GCC 4.4
- Use existing register variable to retrieve the stack pointer
instead of forcing the compiler to create another indirect access
which results in excessive extra 'mov %rsp, %<dst>' instructions
- Disable branch profiling for the memory encryption code to prevent
an early boot crash
- Fix a sparse warning caused by casting the __user annotation in
__get_user_asm_u64() away
- Fix an off by one error in the loop termination of the error patch
in the x86 sysfs init code
- Add missing CPU IDs to various Intel specific drivers to enable the
functionality on recent hardware
- More (init) constification in the numachip code"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm: Use register variable to get stack pointer value
x86/mm: Disable branch profiling in mem_encrypt.c
x86/asm: Fix inline asm call constraints for GCC 4.4
perf/x86/intel/uncore: Correct num_boxes for IIO and IRP
perf/x86/intel/rapl: Add missing CPU IDs
perf/x86/msr: Add missing CPU IDs
perf/x86/intel/cstate: Add missing CPU IDs
x86: Don't cast away the __user in __get_user_asm_u64()
x86/sysfs: Fix off-by-one error in loop termination
x86/mm: Fix fault error path using unsafe vma pointer
x86/numachip: Add const and __initconst to numachip2_clockevent
Pull timer fixes from Thomas Gleixner:
"This adds a new timer wheel function which is required for the
conversion of the timer callback function from the 'unsigned long
data' argument to 'struct timer_list *timer'. This conversion has two
benefits:
1) It makes struct timer_list smaller
2) Many callers hand in a pointer to the timer or to the structure
containing the timer, which happens via type casting both at setup
and in the callback. This change gets rid of the typecasts.
Once the conversion is complete, which is planned for 4.15, the old
setup function and the intermediate typecast in the new setup function
go away along with the data field in struct timer_list.
Merging this now into mainline allows a smooth queueing of the actual
conversion in the affected maintainer trees without creating
dependencies"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
um/time: Fixup namespace collision
timer: Prepare to change timer callback argument type
Pull smp/hotplug fixes from Thomas Gleixner:
"This addresses the fallout of the new lockdep mechanism which covers
completions in the CPU hotplug code.
The lockdep splats are false positives, but there is no way to
annotate that reliably. The solution is to split the completions for
CPU up and down, which requires some reshuffling of the failure
rollback handling as well"
* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
smp/hotplug: Hotplug state fail injection
smp/hotplug: Differentiate the AP completion between up and down
smp/hotplug: Differentiate the AP-work lockdep class between up and down
smp/hotplug: Callback vs state-machine consistency
smp/hotplug: Rewrite AP state machine core
smp/hotplug: Allow external multi-instance rollback
smp/hotplug: Add state diagram
Pull scheduler fixes from Thomas Gleixner:
"The scheduler pull request comes with the following updates:
- Prevent a divide by zero issue by validating the input value of
sysctl_sched_time_avg
- Make task state printing consistent all over the place and have
explicit state characters for IDLE and PARKED so they wont be
displayed as 'D' state which confuses tools"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/sysctl: Check user input value of sysctl_sched_time_avg
sched/debug: Add explicit TASK_PARKED printing
sched/debug: Ignore TASK_IDLE for SysRq-W
sched/debug: Add explicit TASK_IDLE printing
sched/tracing: Use common task-state helpers
sched/tracing: Fix trace_sched_switch task-state printing
sched/debug: Remove unused variable
sched/debug: Convert TASK_state to hex
sched/debug: Implement consistent task-state printing
Pull perf fixes from Thomas Gleixner:
- Prevent a division by zero in the perf aux buffer handling
- Sync kernel headers with perf tool headers
- Fix a build failure in the syscalltbl code
- Make the debug messages of perf report --call-graph work correctly
- Make sure that all required perf files are in the MANIFEST for
container builds
- Fix the atrr.exclude kernel handling so it respects the
perf_event_paranoid and the user permissions
- Make perf test on s390x work correctly
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/aux: Only update ->aux_wakeup in non-overwrite mode
perf test: Fix vmlinux failure on s390x part 2
perf test: Fix vmlinux failure on s390x
perf tools: Fix syscalltbl build failure
perf report: Fix debug messages with --call-graph option
perf evsel: Fix attr.exclude_kernel setting for default cycles:p
tools include: Sync kernel ABI headers with tooling headers
perf tools: Get all of tools/{arch,include}/ in the MANIFEST
Pull locking fixes from Thomas Gleixner:
"Two fixes for locking:
- Plug a hole the pi_stat->owner serialization which was changed
recently and failed to fixup two usage sites.
- Prevent reordering of the rwsem_has_spinner() check vs the
decrement of rwsem count in up_write() which causes a missed
wakeup"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rwsem-xadd: Fix missed wakeup due to reordering of load
futex: Fix pi_state->owner serialization
Pull irq fixes from Thomas Gleixner:
- Add a missing NULL pointer check in free_irq()
- Fix a memory leak/memory corruption in the generic irq chip
- Add missing rcu annotations for radix tree access
- Use ffs instead of fls when extracting data from a chip register in
the MIPS GIC irq driver
- Fix the unmasking of IPI interrupts in the MIPS GIC driver so they
end up at the target CPU and not at CPU0
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irq/generic-chip: Don't replace domain's name
irqdomain: Add __rcu annotations to radix tree accessors
irqchip/mips-gic: Use effective affinity to unmask
irqchip/mips-gic: Fix shifts to extract register fields
genirq: Check __free_irq() return value for NULL
Pull objtool fixes from Thomas Gleixner:
"Two small fixes for objtool:
- Support frame pointer setup via 'lea (%rsp), %rbp' which was not
yet supported and caused build warnings
- Disable unreacahble warnings for GCC4.4 and older to avoid false
positives caused by the compiler itself"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Support unoptimized frame pointer setup
objtool: Skip unreachable warnings for GCC 4.4 and older
Commit 2ca492e22c has moved the call to 'kfifo_alloc()' from after the
main 'if' statement to before it.
But it has not updated the error handling paths accordingly.
Fix all that:
- if 'kfifo_alloc()' fails we can return directly
- direct returns after 'kfifo_alloc()' must now go to 'out_mbox_free'
- 'goto out_mbox_free' must be replaced by 'goto out', otherwise the
'[pcc_]mbox_free_channel()' call will be missed.
Fixes: 2ca492e22c ("hwmon: (xgene) Fix crash when alarm occurs before driver probe")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
"uuid" must be invisible if both ns->uuid and ns->nguid are unset,
not if either one is.
Fixes: d934f9848a "nvme: provide UUID value to userspace"
Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
In commit e3a77561e7 ("tipc: split up function tipc_msg_eval()"),
we have updated the function tipc_msg_lookup_dest() to set the error
codes to negative values at destination lookup failures. Thus when
the function sets the error code to -TIPC_ERR_NO_NAME, its inserted
into the 4 bit error field of the message header as 0xf instead of
TIPC_ERR_NO_NAME (1). The value 0xf is an unknown error code.
In this commit, we set only positive error code.
Fixes: e3a77561e7 ("tipc: split up function tipc_msg_eval()")
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni says:
====================
udp: fix early demux for mcast packets
Currently the early demux callbacks do not perform source address validation.
This is not an issue for TCP or UDP unicast, where the early demux
is only allowed for connected sockets and the source address is validated
for the first packet and never change.
The UDP protocol currently allows early demux also for unconnected multicast
sockets, and we are not currently doing any validation for them, after that
the first packet lands on the socket: beyond ignoring the rp_filter - if
enabled - any kind of martian sources are also allowed.
This series addresses the issue allowing the early demux callback to return an
error code, and performing the proper checks for unconnected UDP multicast
sockets before leveraging the rx dst cache.
Alternatively we could disable the early demux for unconnected mcast sockets,
but that would cause relevant performance regression - around 50% - while with
this series, with full rp_filter in place, we keep the regression to a more
moderate level.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The UDP early demux can leverate the rx dst cache even for
multicast unconnected sockets.
In such scenario the ipv4 source address is validated only on
the first packet in the given flow. After that, when we fetch
the dst entry from the socket rx cache, we stop enforcing
the rp_filter and we even start accepting any kind of martian
addresses.
Disabling the dst cache for unconnected multicast socket will
cause large performace regression, nearly reducing by half the
max ingress tput.
Instead we factor out a route helper to completely validate an
skb source address for multicast packets and we call it from
the UDP early demux for mcast packets landing on unconnected
sockets, after successful fetching the related cached dst entry.
This still gives a measurable, but limited performance
regression:
rp_filter = 0 rp_filter = 1
edmux disabled: 1182 Kpps 1127 Kpps
edmux before: 2238 Kpps 2238 Kpps
edmux after: 2037 Kpps 2019 Kpps
The above figures are on top of current net tree.
Applying the net-next commit 6e617de84e ("net: avoid a full
fib lookup when rp_filter is disabled.") the delta with
rp_filter == 0 will decrease even more.
Fixes: 421b3885bf ("udp: ipv4: Add udp early demux")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently no error is emitted, but this infrastructure will
used by the next patch to allow source address validation
for mcast sockets.
Since early demux can do a route lookup and an ipv4 route
lookup can return an error code this is consistent with the
current ipv4 route infrastructure.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now when updating mtu in tx path, it doesn't consider ARPHRD_ETHER tunnel
device, like ip6gre_tap tunnel, for which it should also subtract ether
header to get the correct mtu.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch 'ip_gre: ipgre_tap device should keep dst' fixed
a issue that ipgre_tap mtu couldn't be updated in tx path.
The same fix is needed for ip6gre_tap as well.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without keeping dst, the tunnel will not update any mtu/pmtu info,
since it does not have a dst on the skb.
Reproducer:
client(ipgre_tap1 - eth1) <-----> (eth1 - ipgre_tap1)server
After reducing eth1's mtu on client, then perforamnce became 0.
This patch is to netif_keep_dst in gre_tap_init, as ipgre does.
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eight mostly minor fixes for recently discovered issues in drivers.
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJZz2OGAAoJEAVr7HOZEZN4NTQQAL1avebL1Abau5ODtGTciJrZ
I4QqUuJ0PGo9AKWIA/s34cY7kdPt2qhIoUJY0bLKdv8QtR5MDqfJ6c+2ianKenWm
EkDKYBQ2csBqzH9VN0tEIPQhvLr7oxvCgEDjfvusxWoUX4AgSv2bMaxIpgs4GUxS
hOH4fg/6A0HuhP/HjtYfd89DBdhhKOPU6oRx6YLF4ctlfZxAcALsvVjNAGaJzZX8
Bpv2K/xWOwb/UghjJvxv1wYZ+AL0BHAFVZilBFpyX+ClRhmU9d9SVYs43CbwSu+Y
41qIAYLJXrls6CSWJMTEo/+mhbPMQBS6Q3wSewOaNZXkx4comjJHpx/k2HsiVk7K
ObN9eIXNzyby132pIIHc9ZRSpJRTr0/jjb9FetneZlLHaNabtIf67JA0j4fIoCEh
Qb73uR030mCNvQ+xvK8DxF9UE1zQXnXiJWoIH9NrGtFtknQOmFlfFfmo/gMDI8w6
aYkxuHdqn2oFjKDuZOQ4zYa8ptcZAAml64gj2YiNiwgNosEpt9UMJ5WRknrGJ7po
jHohzKyKkSykjOxgOL1Noh5d7AoWPtJ1Qbg+Leyg1WLnRGHTgAYYmv1LabOdMhbM
SIMhNRBFQunCBKTyes/RB2Nykl1OXipnxIaRgSrRhsEiTOVutmyy7jr6BbZSB2vY
XBpadk38SGSsZva+Sfk0
=v2G5
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Eight mostly minor fixes for recently discovered issues in drivers"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ILLEGAL REQUEST + ASC==27 => target failure
scsi: aacraid: Add a small delay after IOP reset
scsi: scsi_transport_fc: Also check for NOTPRESENT in fc_remote_port_add()
scsi: scsi_transport_fc: set scsi_target_id upon rescan
scsi: scsi_transport_iscsi: fix the issue that iscsi_if_rx doesn't parse nlmsg properly
scsi: aacraid: error: testing array offset 'bus' after use
scsi: lpfc: Don't return internal MBXERR_ERROR code from probe function
scsi: aacraid: Fix 2T+ drives on SmartIOC-2000
Drivers that use the start method for netlink dumping rely on dumpit not
being called if start fails. For example, ila_xlat.c allocates memory
and assigns it to cb->args[0] in its start() function. It might fail to
do that and return -ENOMEM instead. However, even when returning an
error, dumpit will be called, which, in the example above, quickly
dereferences the memory in cb->args[0], which will OOPS the kernel. This
is but one example of how this goes wrong.
Since start() has always been a function with an int return type, it
therefore makes sense to use it properly, rather than ignoring it. This
patch thus returns early and does not call dumpit() when start() fails.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Newly discovered species of fujitsu laptops break some assumptions about
ACPI device pairings.
fujitsu-laptop:
- Don't oops when FUJ02E3 is not presnt
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJZzv7uAAoJEKbMaAwKp364tqUH/iniXL3ZeObv+l0zK/QaPI0y
TR9fEHm73/jtScVokoM79Py+G7c/xP9OEz4yHxD2kQphA+ah3Z4O6jtxDB2pMZlP
W5JQrIwcuWgvh3xEMGSJ4gmvYU7vi5AKm9PGHEydwlQdhDPfIwQhVOqDNnpVIR4E
dU3ydgLk9+nIRhgsfNJi2UyzrwDlKgtOXHX9l4EXP5tQbxxVw2ET3BFtoq2gUkBs
u6/WV0FH9jkxb56Wn+Mr2vAy4M2X2Mxe+88vA8DvE8H1gzMPGxFIxP4/036o+2M0
+df4txjOVy5J4w+SuAcKAv7Kp3m50nnQh4LyKjvfWmnMKCCzkXobNbZwnxIWvKo=
=tsgS
-----END PGP SIGNATURE-----
Merge tag 'platform-drivers-x86-v4.14-2' of git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform drivers fix from Darren Hart:
"Newly discovered species of fujitsu laptops break some assumptions
about ACPI device pairings.
fujitsu-laptop: Don't oops when FUJ02E3 is not present"
* tag 'platform-drivers-x86-v4.14-2' of git://git.infradead.org/linux-platform-drivers-x86:
platform/x86: fujitsu-laptop: Don't oops when FUJ02E3 is not presnt