This fixes a bug in madvise() where soft offlining a hugepage via
madvise() would, while walking the address range, use the wrong page
offset. The walk attempted to get the compound order of a page that was
formerly, but is no longer, a compound page, because the huge page had
been dissolved (since commit c3114a84f7: "mm: hugetlb: soft-offline:
dissolve source hugepage after successful migration").
As a result I ended up with all my free pages except one being offlined.
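The effect on the address walk can be modelled in userspace. This is a
minimal sketch, not the kernel code: struct toy_page, toy_soft_offline()
and the two walk helpers are hypothetical names, and a 2 MiB huge page
(order 9) with 4 KiB base pages is assumed.

```c
#define TOY_PAGE_SIZE 4096UL
#define TOY_HPAGE_ORDER 9U /* assumption: 2 MiB huge page = 512 base pages */

/* Toy stand-in for a compound page; not the kernel's struct page. */
struct toy_page {
	unsigned int order; /* drops to 0 once the huge page is dissolved */
};

/* Models a successful soft offline, which dissolves the huge page. */
static void toy_soft_offline(struct toy_page *p)
{
	p->order = 0;
}

/* Buggy walk: the step uses the order read after the page was dissolved. */
static unsigned long toy_walk_buggy(struct toy_page *p, unsigned long start,
				    unsigned long end)
{
	unsigned long calls = 0;

	for (; start < end; start += TOY_PAGE_SIZE << p->order) {
		toy_soft_offline(p);
		calls++;
	}
	return calls;
}

/* Fixed walk: the order is sampled before the page can be dissolved. */
static unsigned long toy_walk_fixed(struct toy_page *p, unsigned long start,
				    unsigned long end)
{
	unsigned long calls = 0;
	unsigned int order = 0;

	for (; start < end; start += TOY_PAGE_SIZE << order) {
		order = p->order;
		toy_soft_offline(p);
		calls++;
	}
	return calls;
}
```

In this model the buggy walk issues one soft offline request per base
page of the range, while the fixed walk issues a single request for the
whole huge page.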
Link: http://lkml.kernel.org/r/20170912204306.GA12053@gmail.com
Fixes: c3114a84f7 ("mm: hugetlb: soft-offline: dissolve source hugepage after successful migration")
Signed-off-by: Alexandru Moise <00moses.alexander00@gmail.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
At this point mm is unlocked, so the vmas or their list may change. Take
mmap_sem for reading (down_read) to protect them from modification.
Link: http://lkml.kernel.org/r/150512788393.10691.8868381099691121308.stgit@localhost.localdomain
Fixes: e86c59b1b1 ("mm/ksm: improve deduplication of zero pages with colouring")
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: zhong jiang <zhongjiang@huawei.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There's a typo in the recent change to the VM_MPX definition. We want it
to be VM_HIGH_ARCH_4, not VM_HIGH_ARCH_BIT_4.
This bug does cause visible regressions. In arch_vma_name the vmflags
are tested against VM_MPX. With the incorrect value of VM_MPX, a number
of vmas (such as the stack) test positive and end up being marked as
"[mpx]" in /proc/N/maps instead of their correct names.
This confuses tools like rr which expect to be able to find familiar
vmas.
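The difference between the two macros is that one is a bit index and the
other a bit mask. The sketch below uses local TOY_* copies of the flag
values; the numeric values (bit index 36, VM_EXEC 0x4, VM_MAYWRITE 0x20)
are assumptions taken from memory of include/linux/mm.h and the helper
names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the flag values, for illustration only (assumed). */
#define TOY_VM_READ      0x00000001ULL
#define TOY_VM_WRITE     0x00000002ULL
#define TOY_VM_EXEC      0x00000004ULL
#define TOY_VM_MAYWRITE  0x00000020ULL

#define TOY_VM_HIGH_ARCH_BIT_4 36                               /* bit index */
#define TOY_VM_HIGH_ARCH_4     (1ULL << TOY_VM_HIGH_ARCH_BIT_4) /* bit mask */

/* Buggy test: masks with the index 36 (0x24), i.e. EXEC | MAYWRITE. */
static bool toy_is_mpx_buggy(uint64_t vm_flags)
{
	return (vm_flags & TOY_VM_HIGH_ARCH_BIT_4) != 0;
}

/* Fixed test: masks with the single high bit that is actually meant. */
static bool toy_is_mpx_fixed(uint64_t vm_flags)
{
	return (vm_flags & TOY_VM_HIGH_ARCH_4) != 0;
}
```

Under these assumptions any vma with VM_EXEC or VM_MAYWRITE set, such as
a stack-like mapping, passes the buggy test even though it is not an MPX
vma.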
Fixes: df3735c5b4 ("x86,mpx: make mpx depend on x86-64 to free up VMA flag")
Link: http://lkml.kernel.org/r/20170918140253.36856-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kyle Huey <me@kylehuey.com>
Cc: <stable@vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Here are some more of the spelling mistakes and typos that I've found
while fixing up spelling mistakes in kernel error message text over the
past eight weeks.
[akpm@linux-foundation.org: s/|/||/, per Joe]
Link: http://lkml.kernel.org/r/20170919090818.5989-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Joe Perches <joe@perches.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This parameter is named kp, so the documentation should use that.
Fixes: 9b473de872 ("param: Fix duplicate module prefixes")
Link: http://lkml.kernel.org/r/20170919142656.64aea59e@endymion
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The alpha allmodconfig build fails with this error:
arch/alpha/include/asm/mmu_context.h: In function 'ev5_switch_mm':
arch/alpha/include/asm/mmu_context.h:160:2: error:
implicit declaration of function 'task_thread_info';
did you mean 'init_thread_info'? [-Werror=implicit-function-declaration]
The file 'mmu_context.h' needed an extra header file.
Link: http://lkml.kernel.org/r/1505668810-7497-1-git-send-email-sudipm.mukherjee@gmail.com
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marcelo Ricardo Leitner says:
====================
Introduce SCTP Stream Schedulers
This patchset introduces the SCTP Stream Schedulers as defined by
https://tools.ietf.org/html/draft-ietf-tsvwg-sctp-ndata-13
It provides 3 schedulers at the moment: FCFS, Priority and Round Robin.
The other 3, Round Robin per packet, Fair Capacity and Weighted Fair
Capacity will be added later. More specifically, WFQ is required by
WebRTC Datachannels.
The draft also defines the idata chunk, allowing a usermsg to be
interrupted by another piece of idata from another stream. This patchset
*doesn't* include it. It will be posted later by Xin Long. Its
integration with this patchset is very simple and it basically only
requires a tweak in sctp_sched_dequeue_done(), to ignore datamsg
boundaries.
The first 5 patches are a preparation for the next ones. The most
relevant patches are the 4th and 6th ones. More details are available on
each patch.
v2: changelog update on patch 3
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces RFC Draft ndata section 3.2 Round Robin
Scheduler (SCTP_SS_RR).
Works by maintaining a list of enqueued streams and tracking the last
one used to send data. When the datamsg is done, it switches to the next
stream.
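The selection step can be sketched as follows. This is a simplified
model and not the kernel implementation: it scans a fixed array instead
of maintaining a list of enqueued streams, and struct toy_rr and
toy_rr_next() are hypothetical names.

```c
#define TOY_NSTREAMS 4

struct toy_rr {
	unsigned int queued[TOY_NSTREAMS]; /* datamsgs waiting per stream */
	int last;                          /* stream used last, -1 if none */
};

/*
 * Pick the next stream after the last one used that still has data,
 * consume one datamsg from it and remember it as the last one used.
 * Returns the stream id, or -1 when nothing is queued.
 */
static int toy_rr_next(struct toy_rr *s)
{
	for (int i = 1; i <= TOY_NSTREAMS; i++) {
		int sid = (s->last + i) % TOY_NSTREAMS;

		if (s->queued[sid]) {
			s->queued[sid]--;
			s->last = sid;
			return sid;
		}
	}
	return -1;
}
```

With two datamsgs queued on stream 0 and one on stream 2, successive
calls alternate between the two streams until both are drained.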
See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-sctp-ndata-13
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces RFC Draft ndata section 3.4 Priority Based
Scheduler (SCTP_SS_PRIO).
It works by having a struct sctp_stream_priority for each priority
configured. This struct is then enlisted on a queue ordered per priority
if, and only if, there is a stream with data queued, so that dequeueing
is very straightforward: either finish current datamsg or simply dequeue
from the highest priority queued, which is the next stream pointed, and
that's it.
If there are multiple streams assigned with the same priority and with
data queued, it will do round robin amongst them while respecting
datamsgs boundaries (when not using idata chunks), to be reasonably
fair.
We intentionally don't maintain a list of priorities nor a list of all
streams with the same priority to save memory. The first would mean at
least 2 other pointers per priority (which, for 1000 priorities, that
can mean 16kB) and the second would also mean 2 other pointers but per
stream. As SCTP supports up to 65535 streams on a given asoc, that's
1MB. This impacts when giving a priority to some stream, as we have to
find out if the new priority is already being used and if we can free
the old one, and also when tearing down.
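The memory figures above follow from two extra pointers per entry. A
small worked check, assuming 8-byte pointers (a 64-bit build);
toy_list_overhead() is a hypothetical helper:

```c
#include <stddef.h>

/* Bytes needed for two extra list pointers on each of `entries` objects. */
static size_t toy_list_overhead(size_t entries, size_t ptr_size)
{
	return entries * 2 * ptr_size;
}
```

For 1000 priorities this gives 16000 bytes (about 16kB), and for 65535
streams it gives 1048560 bytes (about 1MB).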
The new fields in struct sctp_stream_out_ext and sctp_stream are added
under a union because that memory is to be shared with other schedulers.
See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-sctp-ndata-13
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As defined per RFC Draft ndata Section 4.3.3, named as
SCTP_STREAM_SCHEDULER_VALUE.
See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-sctp-ndata-13
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As defined per RFC Draft ndata Section 4.3.2, named as
SCTP_STREAM_SCHEDULER.
See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-sctp-ndata-13
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces the hooks necessary to do stream scheduling, as
per RFC Draft ndata. It also introduces the first scheduler, which is
what we do today but now factored out: first come first served (FCFS).
With stream scheduling we now have to track which chunk was enqueued on
which stream and be able to select a chunk other than the one at the
front of the main outqueue. So we introduce a list on the
sctp_stream_out_ext structure for this purpose.
We reuse sctp_chunk->transmitted_list space for the list above, as the
chunk cannot belong to the two lists at the same time. By using the
union in there, we can have distinct names for these moments.
sctp_sched_ops are the operations expected to be implemented by each
scheduler. The dequeueing is a bit particular to this implementation but
it is to match how we dequeue packets today. We first dequeue and then
check if it fits the packet and if not, we requeue it at head. This is why
we don't have a peek operation but have dequeue_done instead, which is
called once the chunk can be safely considered as transmitted.
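The dequeue, check, requeue-at-head pattern can be sketched like this.
It is a toy model under stated assumptions: chunks are reduced to their
sizes, the queue is a fixed array, and all toy_* names are hypothetical
rather than the sctp_sched_ops interface itself.

```c
#define TOY_QLEN 8

struct toy_queue {
	int sizes[TOY_QLEN]; /* chunk sizes in queue order */
	int head;            /* index of the next chunk to dequeue */
	int tail;            /* one past the last queued chunk */
};

/* Remove and return the chunk at the head, or -1 if the queue is empty. */
static int toy_dequeue(struct toy_queue *q)
{
	return q->head < q->tail ? q->sizes[q->head++] : -1;
}

/* Put a chunk that was just dequeued back at the head. */
static void toy_requeue_head(struct toy_queue *q, int size)
{
	q->sizes[--q->head] = size;
}

/* Fill one packet with `room` bytes; returns the number of chunks placed. */
static int toy_fill_packet(struct toy_queue *q, int room)
{
	int placed = 0;
	int size;

	while ((size = toy_dequeue(q)) >= 0) {
		if (size > room) {
			toy_requeue_head(q, size); /* does not fit: undo */
			break;
		}
		room -= size;
		placed++; /* a dequeue_done-style hook would run here */
	}
	return placed;
}
```

Because the chunk is only confirmed after it is known to fit, a separate
peek operation is not needed in this model.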
The check removed from sctp_outq_flush is now performed by
sctp_stream_outq_migrate, which is only called during assoc setup.
(sctp_sendmsg() also checks for it)
The only operation that is foreseen but not yet added here is a way to
signal that a new packet is starting or that the packet is done, for
the round robin per packet scheduler, but it is intentionally left to the
patch that actually implements it.
Support for I-DATA chunks, also described in this RFC, with user message
interleaving is straightforward as it just requires the schedulers to
probe for the feature and ignore datamsg boundaries when dequeueing.
See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-sctp-ndata-13
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper to fetch the stream number from a given chunk.
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the stream schedulers, sctp_stream_out will become too big to be
allocated by kmalloc and as we need to allocate with BH disabled, we
cannot use __vmalloc in sctp_stream_init().
This patch moves out the stats from sctp_stream_out to
sctp_stream_out_ext, which will be allocated only when the application
tries to sendmsg something on it.
Just the introduction of sctp_stream_out_ext would already fix the issue
described above by splitting the allocation in two. Moving the stats
to it also reduces the pressure on the allocator as we will ask for less
memory atomically when creating the socket and we will use GFP_KERNEL
later.
Then, for stream schedulers, we will just use sctp_stream_out_ext.
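The lazy allocation described above can be sketched in userspace as
follows. The struct layouts and toy_stream_init_ext() are hypothetical
and calloc() stands in for a GFP_KERNEL allocation; only the idea of
allocating the extension on first use comes from the text.

```c
#include <errno.h>
#include <stdlib.h>

/* Toy extension holding the per-stream stats (fields are illustrative). */
struct toy_out_ext {
	unsigned long abandoned_sent;
	unsigned long abandoned_unsent;
};

/* Toy per-stream state kept small so the initial array stays small. */
struct toy_stream_out {
	unsigned short ssn;
	struct toy_out_ext *ext; /* NULL until the stream is first used */
};

/* Allocate the extension on first use; later calls reuse it. */
static int toy_stream_init_ext(struct toy_stream_out *so)
{
	if (so->ext)
		return 0;
	so->ext = calloc(1, sizeof(*so->ext));
	return so->ext ? 0 : -ENOMEM;
}
```

A send path would call the helper before touching the stats, so streams
that are never used never pay for the extension.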
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is 1 place allocating it and another reallocating. Move such
procedures to a common function.
v2: updated changelog
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is 1 place allocating it and 2 others reallocating. Move such
procedures to a common function.
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As SCTP supports up to 65535 streams, that can lead to very large
allocations in sctp_stream_init(). As Xin Long noticed, systems with
small amounts of memory are more prone to not have enough memory and
dump warnings on dmesg initiated by user actions. Thus, silence them.
Also, if the reallocation of stream->out is not necessary, skip it and
keep the memory we already have.
Reported-by: Xin Long <lucien.xin@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
100GbE Intel Wired LAN Driver Updates 2017-10-03
This series contains updates to fm10k only.
Jake provides the majority of the changes in this series, starting with using
fm10k_prepare_for_reset() if we lose PCIe link. Before we would detach
the device and close the netdev, which left a lot of items still active,
such as the Tx/Rx resources. This could cause problems where register
reads would return potentially invalid values and would result in unknown
driver behavior, so call fm10k_prepare_for_reset() much like we do for
suspend/resume cycles. This will attempt to shutdown as much as possible
to prevent possible issues. Then replaced the PCI specific legacy power
management hooks with the new generic power management hooks for both
suspend and hibernate. Introduced a workqueue item which monitors a
queue of MAC and VLAN requests since a large number of MAC address or
VLAN updates at once can overload the mailbox with too many messages at
once. Fixed a cppcheck warning by properly declaring the min_rate and
max_rate variables in the declaration and definition for .ndo_set_vf_bw,
rather than using "unused" for the minimum rates.
Joe Perches fixes the backward logic when using net_ratelimit().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
- bpf prog_array just like all other types of bpf array accepts 32-bit index.
Clarify that in the comment.
- fix x64 JIT of bpf_tail_call which was incorrectly loading 8 instead of 4 bytes
- tighten corresponding check in the interpreter to stay consistent
The JIT bug can be triggered after introduction of BPF_F_NUMA_NODE flag
in commit 96eabe7a40 in 4.14. Before that the map_flags would stay zero and
though JIT code is wrong it will check bounds correctly.
Hence two fixes tags. All other JITs don't have this problem.
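Why an 8-byte load breaks the bounds check once map_flags is non-zero
can be modelled in plain C. This is a sketch under the assumption, implied
by the text above, that map_flags sits directly after max_entries;
struct toy_map, TOY_F_NUMA_NODE and the helpers are hypothetical
stand-ins, not the JIT code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy layout: two adjacent 32-bit fields (assumed, see lead-in). */
struct toy_map {
	uint32_t max_entries;
	uint32_t map_flags;
};

#define TOY_F_NUMA_NODE (1U << 2) /* stand-in for BPF_F_NUMA_NODE */

/* Buggy check: loads 8 bytes, so map_flags ends up in the bound. */
static bool toy_index_ok_buggy(const struct toy_map *m, uint32_t index)
{
	uint64_t bound;

	memcpy(&bound, (const char *)m + offsetof(struct toy_map, max_entries),
	       sizeof(bound));
	return index < bound;
}

/* Fixed check: compares against the 4-byte max_entries only. */
static bool toy_index_ok_fixed(const struct toy_map *m, uint32_t index)
{
	return index < m->max_entries;
}
```

With max_entries of 4 and a non-zero flag word, the buggy check accepts
index 10 while the fixed check rejects it.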
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Fixes: 96eabe7a40 ("bpf: Allow selecting numa node during map creation")
Fixes: b52f00e6a7 ("x86: bpf_jit: implement bpf_tail_call() helper")
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Device alias can be set by either rtnetlink (rtnl is held) or sysfs.
rtnetlink holds the rtnl mutex, sysfs acquires it for this purpose.
Add an extra mutex for it and use rcu to protect concurrent accesses.
This allows the sysfs path to not take rtnl and would later allow
to not hold it when dumping ifalias.
Based on suggestion from Eric Dumazet.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add constants and callback functions for the dwmac on rk3128 soc.
As can be seen, the base structure is the same, only registers
and the bits in them moved slightly.
Signed-off-by: David Wu <david.wu@rock-chips.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some NIC drivers don't have correct speed/duplex settings at the
time they send the NETDEV_UP notification, and that messes up the
bonding state, especially in 802.3ad mode, which is very sensitive
to these settings. In the current implementation we invoke
bond_update_speed_duplex() when we receive NETDEV_UP but
ignore the return value. If the values we get are invalid
(UNKNOWN), the slave gets removed from the aggregator with
speed and duplex set to UNKNOWN while the link is still marked as UP.
This patch fixes this scenario. Since 802.3ad mode is sensitive to
these conditions while other modes are not, the patch makes sure that
the behavior of other modes doesn't change.
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Treat the ef/04/01 interface class/subclass/protocol combination used
by the Novatel Verizon USB730L (1410:9030) as a possible RNDIS
interface.
T: Bus=01 Lev=02 Prnt=02 Port=01 Cnt=02 Dev#= 17 Spd=480 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 3
P: Vendor=1410 ProdID=9030 Rev=03.10
S: Manufacturer=Novatel Wireless
S: Product=MiFi USB730L
S: SerialNumber=0123456789ABCDEF
C: #Ifs= 3 Cfg#= 1 Atr=80 MxPwr=500mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=ef(misc ) Sub=04 Prot=01 Driver=rndis_host
I: If#= 1 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=rndis_host
I: If#= 2 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=00 Prot=00 Driver=usbhid
Once the network interface is brought up, the user just needs to run a
DHCP client to get IP address and routing setup.
As a side note, other Novatel Verizon USB730L models with the same
vid:pid end up exposing a standard ECM interface which doesn't require
any other kernel update to make it work.
Signed-off-by: Aleksander Morgado <aleksander@aleksander.es>
Reviewed-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull cgroup fix from Tejun Heo:
"The recent migration code updates assumed that migrations always
execute from the top to the bottom once and didn't clean up internal
states after each migration round; however, cgroup_transfer_tasks()
repeats the inner steps multiple times and the garbage internal states
from the previous iteration led to OOPS.
Waiman fixed the bug by reinitializing the relevant states at the end
of each migration round"
* 'for-4.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: Reinit cgroup_taskset structure before cgroup_migrate_execute() returns
We accidentally return success if the kmalloc_array() call fails.
Fixes: 0e14c7777a ("mlxsw: spectrum: Add the multicast routing hardware logic")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw_afa_block_create() doesn't return error pointers, it returns NULL
on error.
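The mismatch between the two error conventions can be shown with a
userspace model of IS_ERR(). The helper names are hypothetical and
toy_block_create() merely stands in for a constructor that returns NULL
on failure.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095

/* Model of IS_ERR(): true only for the top TOY_MAX_ERRNO addresses. */
static bool toy_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

/* Stand-in for a constructor that reports failure by returning NULL. */
static void *toy_block_create(bool fail)
{
	static int block;

	return fail ? NULL : &block;
}

/* Buggy caller: an IS_ERR-style test never sees a NULL failure. */
static bool toy_failure_detected_buggy(bool fail)
{
	return toy_is_err(toy_block_create(fail));
}

/* Fixed caller: tests for NULL, matching what the constructor returns. */
static bool toy_failure_detected_fixed(bool fail)
{
	return toy_block_create(fail) == NULL;
}
```

Since NULL is not in the error-pointer range, the buggy caller treats a
failed allocation as success.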
Fixes: 0e14c7777a ("mlxsw: spectrum: Add the multicast routing hardware logic")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function mt7530_phy_write is local to the source and does not need to
be in global scope, so make it static.
Cleans up sparse warnings:
symbol 'mt7530_phy_write' was not declared. Should it be static?
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The functions lan9303_mdio_phy_write and lan9303_mdio_phy_read are local
to the source and do not need to be in global scope, so make them static.
Cleans up sparse warnings:
symbol 'lan9303_mdio_phy_write' was not declared. Should it be static?
symbol 'lan9303_mdio_phy_read' was not declared. Should it be static?
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
When RTM_GETSTATS was added the fields of its header struct were not all
initialized when returning the result thus leaking 4 bytes of information
to user-space per rtnl_fill_statsinfo call, so initialize them now. Thanks
to Alexander Potapenko for the detailed report and bisection.
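The leak can be modelled by pre-filling a header with a marker pattern
and checking what survives. This sketch assumes the four uninitialized
bytes are a family byte plus three bytes of padding at the start of the
header; struct toy_stats_hdr and the helpers are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Modeled on the RTM_GETSTATS header; the field layout is an assumption. */
struct toy_stats_hdr {
	uint8_t  family;
	uint8_t  pad1;
	uint16_t pad2;
	uint32_t ifindex;
	uint32_t filter_mask;
};

/* Buggy fill: leaves family and padding with whatever was in memory. */
static void toy_fill_buggy(struct toy_stats_hdr *h, uint32_t ifindex,
			   uint32_t mask)
{
	h->ifindex = ifindex;
	h->filter_mask = mask;
}

/* Fixed fill: every field of the header is written. */
static void toy_fill_fixed(struct toy_stats_hdr *h, uint32_t ifindex,
			   uint32_t mask)
{
	h->family = 0;
	h->pad1 = 0;
	h->pad2 = 0;
	h->ifindex = ifindex;
	h->filter_mask = mask;
}

/* Count bytes still holding the 0xAA "stale memory" marker. */
static unsigned int toy_stale_bytes(const struct toy_stats_hdr *h)
{
	const unsigned char *b = (const unsigned char *)h;
	unsigned int n = 0;

	for (size_t i = 0; i < sizeof(*h); i++)
		if (b[i] == 0xAA)
			n++;
	return n;
}
```

After a memset() to 0xAA, the buggy fill leaves four marker bytes in the
header and the fixed fill leaves none.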
Reported-by: Alexander Potapenko <glider@google.com>
Fixes: 10c9ead9f3 ("rtnetlink: add new RTM_GETSTATS message to dump link stats")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko says:
====================
mlxsw: Add support for partial multicast route offload
Yotam says:
Previous patchset introduced support for offloading multicast MFC routes to
the Spectrum hardware. As described in that patchset, no partial offloading
is supported, i.e if a route has one output interface which is not a valid
offloadable device (e.g. pimreg device, dummy device, management NIC), the
route is trapped to the CPU and the forwarding is done in slow-path.
Add support for partial offloading of multicast routes, by letting the
hardware forward the packet to all the in-hardware devices, while the
kernel ipmr module will continue forwarding to all other interfaces.
Similarly to the bridge, the kernel ipmr module will forward a marked
packet to an interface only if the interface has a different parent ID than
the packet's ingress interfaces.
The first patch introduces the offload_mr_fwd_mark skb field, which can be
used by offloading drivers to indicate that a packet had already gone
through multicast forwarding in hardware, similarly to the offload_fwd_mark
field that indicates that a packet had already gone through L2 forwarding
in hardware.
Patches 2 and 3 change the ipmr module to not forward packets that had
already been forwarded by the hardware, i.e. packets that are marked with
offload_mr_fwd_mark and the ingress VIF shares the same parent ID with the
egress VIF.
Patches 4, 5, 6 and 7 add the support in the mlxsw Spectrum driver for trap
and forward routes, while marking the trapped packets with the
offload_mr_fwd_mark.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the support of trap-and-forward route action in the multicast routing
offloading logic. A route will be set to trap-and-forward action if one (or
more) of its output interfaces is not offload-able, i.e. does not have a
valid Spectrum RIF.
This way, a route with mixed output VIFs list, which contains both
offload-able and un-offload-able devices can go through partial offloading
in hardware, and the rest will be done in the kernel ipmr module.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In addition to the current multicast route actions, which include trap
route action and a forward route action, add the trap-and-forward multicast
route action, and implement it in the multicast routing hardware logic.
To implement that, add a trap-and-forward ACL action as the last action in
the route flexible action set. The used trap is the ACL2 trap, which marks
the packets with offload_mr_fwd_mark, to prevent the packet from being
forwarded again by the kernel.
Note: At that stage the offloading logic does not support trap-and-forward
multicast routes. This patch adds the support only in the hardware logic.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a multicast route is configured with trap-and-forward action, the
packets should be marked with skb->offload_mr_fwd_mark, in order to prevent
the packets from being forwarded again by the kernel ipmr module.
Due to this, it is not possible to use the already existing multicast trap
(MLXSW_TRAP_ID_ACL1) as the packet should be marked differently. Add the
MLXSW_TRAP_ID_ACL2 which is for trap-and-forward multicast routes, and set
the offload_mr_fwd_mark skb field in its handler.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use trap/discard flex action to implement trap and forward. The action will
later be used for multicast routing, as the multicast routing mechanism is
done using ACL flexible actions in Spectrum hardware. Using that action, it
will be possible to implement a trap-and-forward route.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the ipmr module to not forward packets if:
- The packet is marked with the offload_mr_fwd_mark, and
- Both input interface and output interface share the same parent ID.
This way, a packet can go through partial multicast forwarding in the
hardware, where it will be forwarded only to the devices that share the
same parent ID (AKA, reside inside the same hardware). The kernel will
forward the packet to all other interfaces.
To do this, add the ipmr_offload_forward helper, which per skb, ingress VIF
and egress VIF, returns whether the forwarding was offloaded to hardware.
The ipmr_queue_xmit frees the skb and does not forward it if the result is
a true value.
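The decision made by the helper can be sketched as below. This is a
userspace model, not the ipmr code: struct toy_vif and
toy_forward_offloaded() are hypothetical, and treating a zero-length
parent ID as "not offloaded" is an assumption.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_ID_LEN 32

/* Toy VIF carrying only the cached device parent ID. */
struct toy_vif {
	unsigned char parent_id[TOY_MAX_ID_LEN];
	unsigned char parent_id_len; /* 0: device has no parent ID */
};

/*
 * Returns true when the hardware already forwarded the packet to the
 * egress VIF: the packet is marked and both VIFs share a parent ID.
 */
static bool toy_forward_offloaded(bool offload_mr_fwd_mark,
				  const struct toy_vif *in,
				  const struct toy_vif *out)
{
	if (!offload_mr_fwd_mark)
		return false;
	if (!in->parent_id_len || !out->parent_id_len)
		return false;
	if (in->parent_id_len != out->parent_id_len)
		return false;
	return memcmp(in->parent_id, out->parent_id, in->parent_id_len) == 0;
}
```

A transmit path would drop the packet when this returns true and forward
it in software otherwise, for example towards a management NIC or a
dummy device.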
All the forwarding path code compiles out when the CONFIG_NET_SWITCHDEV is
not set.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to allow the ipmr module to do partial multicast forwarding
according to the device parent ID, add the device parent ID field to the
VIF struct. This way, the forwarding path can use the parent ID field
without invoking switchdev calls, which requires the RTNL lock.
When a new VIF is added, set the device parent ID field in it by invoking
the switchdev_port_attr_get call.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similarly to the offload_fwd_mark field, the offload_mr_fwd_mark field is
used to allow partial offloading of MFC multicast routes.
Switchdev drivers can offload MFC multicast routes to the hardware by
registering to the FIB notification chain. When one of the route output
interfaces is not offload-able, i.e. has different parent ID, the route
cannot be fully offloaded by the hardware. Examples of non-offload-able
devices are a management NIC, dummy device, pimreg device, etc.
Similar problem exists in the bridge module, as one bridge can hold
interfaces with different parent IDs. At the bridge, the problem is solved
by the offload_fwd_mark skb field.
Currently, when a route cannot go through full offload, the only solution
for a switchdev driver is not to offload it at all and let the packet go
through slow path.
Using the offload_mr_fwd_mark field, a driver can indicate that a packet
was already forwarded by hardware to all the devices with the same parent
ID as the input device. Further patches in this patch-set are going to
enhance ipmr to skip multicast forwarding to devices with the same parent
ID if a packet is marked with that field.
The reason why the already existing "offload_fwd_mark" bit cannot be used
is that a switchdev driver would want to make the distinction between a
packet that has already gone through L2 forwarding but did not go through
multicast forwarding, and a packet that has already gone through both L2
and multicast forwarding.
For example: when a packet is ingressing from a switchport enslaved to a
bridge, which is configured with multicast forwarding, the following
scenarios are possible:
- The packet can be trapped to the CPU due to an exception during multicast
forwarding (for example, MTU error). In that case, it had already gone
through L2 forwarding in the hardware, thus a switchdev driver would
want to set the skb->offload_fwd_mark and not the
skb->offload_mr_fwd_mark.
- The packet can also be trapped due to a pimreg/dummy device used as one
of the output interfaces. In that case, it can go through both L2 and
(partial) multicast forwarding inside the hardware, thus a switchdev
driver would want to set both the skb->offload_fwd_mark and
skb->offload_mr_fwd_mark.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellaox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull percpu fixes from Tejun Heo:
"Rather important fixes this time.
- The new percpu area allocator had a subtle bug in how it iterates
the memory regions and could skip viable areas, which led to
allocation failures for module static percpu variables. Dennis
fixed the bug and another non-critical one in stat calculation.
- Mark noticed that the generic implementations of percpu local
atomic reads aren't properly protected against irqs and there's a
(slim) chance for split reads on some 32bit systems. Generic
implementations are updated to disable irq when read size is larger
than ulong size. This may have made some 32bit archs which can do
atomic local 64bit accesses generate sub-optimal code. We need to
find them out and implement arch-specific overrides"
* 'for-4.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
percpu: fix iteration to prevent skipping over block
percpu: fix starting offset for chunk statistics traversal
percpu: make this_cpu_generic_read() atomic w.r.t. interrupts
We lost the comment on the minimum MTU value set for the netdevice with
commit d894be57ca ("ethernet: use net core MTU range checking in
more drivers"). Updating it accordingly.
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull libata fixes from Tejun Heo:
"Nothing too interesting.
Arnd's gcc-7 warning fixes that slipped through the cracks for two
release cycles (my bad), and two minor low level driver updates"
* 'for-4.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ahci: don't ignore result code of ahci_reset_controller()
ata_piix: Add Fujitsu-Siemens Lifebook S6120 to short cable IDs
ata: avoid gcc-7 warning in ata_timing_quantize
Here are a number of USB fixes for 4.14-rc4 to resolve reported issues.
There's a bunch of stuff in here based on the great work Andrey
Konovalov is doing in fuzzing the USB stack. Lots of bug fixes when
dealing with corrupted USB descriptors that we've never seen in "normal"
operation, but is now ensuring the stack is much more hardened overall.
There's also the usual XHCI and gadget driver fixes as well, and a build
error fix, and a few other minor things, full details in the shortlog.
All of these have been in linux-next with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWdN/yw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yl6pQCdGY+nPJhzj9EIeFj5QUpSuS4b1pYAoKrbNn+V
CMpg4iG1oXUtVL8jBbKa
=fVpl
-----END PGP SIGNATURE-----
Merge tag 'usb-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a number of USB fixes for 4.14-rc4 to resolve reported
issues.
There's a bunch of stuff in here based on the great work Andrey
Konovalov is doing in fuzzing the USB stack. Lots of bug fixes when
dealing with corrupted USB descriptors that we've never seen in
"normal" operation, but is now ensuring the stack is much more
hardened overall.
There's also the usual XHCI and gadget driver fixes as well, and a
build error fix, and a few other minor things, full details in the
shortlog.
All of these have been in linux-next with no reported issues"
* tag 'usb-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (38 commits)
usb: dwc3: of-simple: Add compatible for Spreadtrum SC9860 platform
usb: gadget: udc: atmel: set vbus irqflags explicitly
usb: gadget: ffs: handle I/O completion in-order
usb: renesas_usbhs: fix usbhsf_fifo_clear() for RX direction
usb: renesas_usbhs: fix the BCLR setting condition for non-DCP pipe
usb: gadget: udc: renesas_usb3: Fix return value of usb3_write_pipe()
usb: gadget: udc: renesas_usb3: fix Pn_RAMMAP.Pn_MPKT value
usb: gadget: udc: renesas_usb3: fix for no-data control transfer
USB: dummy-hcd: Fix erroneous synchronization change
USB: dummy-hcd: fix infinite-loop resubmission bug
USB: dummy-hcd: fix connection failures (wrong speed)
USB: cdc-wdm: ignore -EPIPE from GetEncapsulatedResponse
USB: devio: Don't corrupt user memory
USB: devio: Prevent integer overflow in proc_do_submiturb()
USB: g_mass_storage: Fix deadlock when driver is unbound
USB: gadgetfs: Fix crash caused by inadequate synchronization
USB: gadgetfs: fix copy_to_user while holding spinlock
USB: uas: fix bug in handling of alternate settings
usb-storage: unusual_devs entry to fix write-access regression for Seagate external drives
usb-storage: fix bogus hardware error messages for ATA pass-thru devices
...
Here are a small number (5) of patches for some reported TTY and serial
issues. Nothing major, a documentation update, timing fix, error
handling fix, name reporting fix, and a timeout issue resolved.
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge tag 'tty-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are a small number (5) of patches for some reported TTY and
serial issues. Nothing major, a documentation update, timing fix,
error handling fix, name reporting fix, and a timeout issue resolved.
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: sccnxp: Fix error handling in sccnxp_probe()
tty: serial: lpuart: avoid report NULL interrupt
serial: bcm63xx: fix timing issue.
mxser: fix timeout calculation for low rates
serial: sh-sci: document R8A77970 bindings
Here are some small staging/IIO driver fixes for 4.14-rc4.
Most of these have been in my tree for a while due to travels, sorry for
the delay. They resolve a number of small issues reported by people,
mostly for the iio drivers. Nothing major in here, full details are in
the shortlog.
All have been in linux-next for a few weeks with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge tag 'staging-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/IIO fixes from Greg KH:
"Here are some small staging/IIO driver fixes for 4.14-rc4.
Most of these have been in my tree for a while due to travels, sorry
for the delay. They resolve a number of small issues reported by
people, mostly for the iio drivers. Nothing major in here, full
details are in the shortlog.
All have been in linux-next for a few weeks with no reported issues"
* tag 'staging-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (23 commits)
staging: iio: ad7192: Fix - use the dedicated reset function avoiding dma from stack.
iio: core: Return error for failed read_reg
iio: ad7793: Fix the serial interface reset
iio: ad_sigma_delta: Implement a dedicated reset function
IIO: BME280: Updates to Humidity readings need ctrl_reg write!
iio: adc: mcp320x: Fix readout of negative voltages
iio: adc: mcp320x: Fix oops on module unload
iio: adc: stm32: fix bad error check on max_channels
iio: trigger: stm32-timer: fix a corner case to write preset
iio: trigger: stm32-timer: preset shouldn't be buffered
iio: adc: twl4030: Return an error if we can not enable the vusb3v1 regulator in 'twl4030_madc_probe()'
iio: adc: twl4030: Disable the vusb3v1 rugulator in the error handling path of 'twl4030_madc_probe()'
iio: adc: twl4030: Fix an error handling path in 'twl4030_madc_probe()'
staging: rtl8723bs: avoid null pointer dereference on pmlmepriv
staging: rtl8723bs: add missing range check on id
staging: vchiq_2835_arm: Fix NULL ptr dereference in free_pagelist
staging: speakup: fix speakup-r empty line lockup
staging: pi433: Move limit check to switch default to kill warning
staging: r8822be: fix null pointer dereferences with a null driver_adapter
staging: mt29f_spinand: Enable the read ECC before program the page
...
We've had support for setting both a minimum and maximum bandwidth via
.ndo_set_vf_bw since commit 883a9ccbae ("fm10k: Add support for SR-IOV
to driver", 2014-09-20).
Likely because we do not support minimum rates, the declaration
mis-ordered the "unused" parameter, which causes warnings when analyzed
with cppcheck.
Fix this warning by properly declaring the min_rate and max_rate
variables in the declaration and definition (rather than using
"unused"). Also rename "rate" to max_rate so as to clarify that we only
support setting the maximum rate.
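As a rough illustration of the declaration fix described above, here is a
minimal userspace sketch, not the driver code. All names (struct sketch_iov,
sketch_set_vf_bw) and the rate limits are hypothetical. The point is only that
the declaration and the definition name min_rate and max_rate in the same
order, and that the unsupported minimum is ignored explicitly.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_NUM_VFS   4       /* hypothetical number of VFs */
#define SKETCH_RATE_MIN  10      /* hypothetical lower bound */
#define SKETCH_RATE_MAX  100000  /* hypothetical upper bound */

/* Hypothetical per-device state: one maximum rate per VF, 0 means no limit. */
struct sketch_iov {
	int max_rate[SKETCH_NUM_VFS];
};

/* Declaration: both parameters are named, in the order callers pass them. */
int sketch_set_vf_bw(struct sketch_iov *iov, int vf_idx,
		     int min_rate, int max_rate);

/* Definition: same parameter names and order as the declaration. */
int sketch_set_vf_bw(struct sketch_iov *iov, int vf_idx,
		     int min_rate, int max_rate)
{
	assert(iov != NULL);

	/* Minimum rates are not supported in this sketch; ignore the value. */
	(void)min_rate;

	if (vf_idx < 0 || vf_idx >= SKETCH_NUM_VFS)
		return -EINVAL;

	/* A max_rate of 0 clears the limit; otherwise it must be in range. */
	if (max_rate &&
	    (max_rate < SKETCH_RATE_MIN || max_rate > SKETCH_RATE_MAX))
		return -EINVAL;

	iov->max_rate[vf_idx] = max_rate;
	return 0;
}
```

With matching names on both sides, a static analyzer has nothing to flag, and
a reader can see at the call site which argument is the maximum.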
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Here are a few small fixes for 4.14-rc4.
The removal of DRIVER_ATTR() was almost completed by 4.14-rc1, but one
straggler made it in through some other tree (odds are, one of mine...)
So there's a simple removal of the last user, and then finally the macro
is removed from the tree.
There's a fix for old crazy udev instances that insist on reloading a
module when it is removed from the kernel due to the new uevents for
bind/unbind. This fixes the reported regression. Hopefully, some year in
the future, we can drop the workaround once users update to the latest
version, but I'm not holding my breath.
And then there's a build fix for a linker warning, and a buffer overflow
fix to match the PCI fixes you took through the PCI tree in the same
area.
All of these have been in linux-next for a few weeks while I've been
traveling, sorry for the delay.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge tag 'driver-core-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are a few small fixes for 4.14-rc4.
The removal of DRIVER_ATTR() was almost completed by 4.14-rc1, but one
straggler made it in through some other tree (odds are, one of
mine...) So there's a simple removal of the last user, and then
finally the macro is removed from the tree.
There's a fix for old crazy udev instances that insist on reloading a
module when it is removed from the kernel due to the new uevents for
bind/unbind. This fixes the reported regression. Hopefully, some year
in the future, we can drop the workaround once users update to the
latest version, but I'm not holding my breath.
And then there's a build fix for a linker warning, and a buffer
overflow fix to match the PCI fixes you took through the PCI tree in
the same area.
All of these have been in linux-next for a few weeks while I've been
traveling, sorry for the delay"
* tag 'driver-core-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
driver core: remove DRIVER_ATTR
fpga: altera-cvp: remove DRIVER_ATTR() usage
driver core: platform: Don't read past the end of "driver_override" buffer
base: arch_topology: fix section mismatch build warnings
driver core: suppress sending MODALIAS in UNBIND uevents
Don't hard code the function names in the diagnostic output when these
reset related routines fail. Instead, use %s and __func__ so that future
refactors don't need to change the print outs.
Additionally, while we are here, add missing function header comments
for the new reset_prepare and reset_done function handlers.
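The %s and __func__ pattern described above can be sketched in plain C as
follows. This is a userspace illustration only: sketch_reset_prepare and its
message buffer are hypothetical stand-ins, and snprintf stands in for whatever
logging helper the driver actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/**
 * sketch_reset_prepare - hypothetical stand-in for a reset handler
 * @msg: buffer that receives the diagnostic text on failure
 * @len: size of @msg in bytes
 * @err: error code to simulate, or 0 for success
 *
 * On failure the function name comes from __func__, so renaming the
 * function in a later refactor updates the message automatically.
 */
int sketch_reset_prepare(char *msg, size_t len, int err)
{
	assert(msg != NULL && len > 0);

	msg[0] = '\0';
	if (err)
		snprintf(msg, len, "%s failed: %d", __func__, err);

	return err;
}
```

Calling sketch_reset_prepare(buf, sizeof(buf), -5) leaves the text
"sketch_reset_prepare failed: -5" in buf without the name appearing anywhere
in the format string.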
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Correct the backward logic using !net_ratelimit()
Miscellanea:
o Add a blank line before the error return label
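To make the inverted check concrete: the kernel's net_ratelimit() returns
nonzero when the caller is still allowed to print, so a message guarded by
!net_ratelimit() is emitted only once the limit has been exceeded. The sketch
below uses a hypothetical userspace limiter (sketch_ratelimit_ok) with the
same return convention; it is an illustration, not the patched driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical limiter: allows at most 'burst' messages in total. */
struct sketch_ratelimit {
	int burst;
	int used;
};

/* Same convention as net_ratelimit(): true means "OK to print". */
bool sketch_ratelimit_ok(struct sketch_ratelimit *rl)
{
	assert(rl != NULL);

	if (rl->used >= rl->burst)
		return false;
	rl->used++;
	return true;
}

/* Report nr_errors failures and return how many messages were printed. */
int sketch_log_errors(struct sketch_ratelimit *rl, int nr_errors)
{
	int printed = 0;

	for (int i = 0; i < nr_errors; i++) {
		/* Correct sense: print while allowed, not when !allowed. */
		if (sketch_ratelimit_ok(rl)) {
			fprintf(stderr, "sketch: error %d\n", i);
			printed++;
		}
	}

	return printed;
}
```

With a burst of 3 and ten errors, three messages are printed and the rest are
dropped. With the negated check the first three would be swallowed and the
remaining seven printed, which defeats the purpose of the limiter.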
Signed-off-by: Joe Perches <joe@perches.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Now that we have a working MAC/VLAN queue for handling MAC/VLAN messages
from the netdev, replace the default handler for the VF<->PF messages.
This new handler is very similar to the default code, but uses the
MAC/VLAN queue instead of sending the message directly. Unfortunately we
can't easily re-use the default code, so we'll just replace the entire
function.
This ensures that a VF requesting a large number of VLANs or MAC
addresses does not start a reset cycle, as explained in the commit which
introduced the message queue.
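The idea of queueing requests instead of sending each one directly can be
sketched as below. This is a generic bounded FIFO under stated assumptions,
not the driver's MAC/VLAN queue: every name here is hypothetical, and the
per-pass drain budget is an assumption made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_req_type { SKETCH_REQ_MAC, SKETCH_REQ_VLAN };

/* Hypothetical request record for one MAC or VLAN update from a VF. */
struct sketch_req {
	enum sketch_req_type type;
	unsigned int vf_idx;
	unsigned int value;
};

#define SKETCH_QUEUE_LEN 8

/* Fixed-size ring buffer holding pending requests. */
struct sketch_queue {
	struct sketch_req slots[SKETCH_QUEUE_LEN];
	size_t head;
	size_t count;
};

/* Handler side: enqueue the request rather than sending it right away. */
bool sketch_queue_request(struct sketch_queue *q, struct sketch_req req)
{
	assert(q != NULL);

	if (q->count == SKETCH_QUEUE_LEN)
		return false;

	q->slots[(q->head + q->count) % SKETCH_QUEUE_LEN] = req;
	q->count++;
	return true;
}

/* Worker side: send at most 'budget' requests per pass, in FIFO order. */
size_t sketch_queue_drain(struct sketch_queue *q, size_t budget,
			  void (*send)(const struct sketch_req *req))
{
	size_t sent = 0;

	assert(q != NULL);

	while (q->count > 0 && sent < budget) {
		if (send)
			send(&q->slots[q->head]);
		q->head = (q->head + 1) % SKETCH_QUEUE_LEN;
		q->count--;
		sent++;
	}

	return sent;
}
```

A VF that asks for many addresses at once then only fills the queue; the rate
at which messages actually go out is set by the drain side.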
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Ngai-mint Kwan <ngai-mint.kwan@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>