Changing API ndo_xdp_xmit to take a struct xdp_frame instead of struct
xdp_buff. This brings xdp_return_frame and ndp_xdp_xmit in sync.
This builds towards changing the API further to become a bulk API,
because xdp_buff is not a queue-able object while xdp_frame is.
V4: Adjust for commit 59655a5b6c ("tuntap: XDP_TX can use native XDP")
V7: Adjust for commit d9314c474d ("i40e: add support for XDP_REDIRECT")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Changing API xdp_return_frame() to take struct xdp_frame as argument,
seems like a natural choice. But there are some subtle performance
details here that needs extra care, which is a deliberate choice.
When de-referencing xdp_frame on a remote CPU during DMA-TX
completion, result in the cache-line is change to "Shared"
state. Later when the page is reused for RX, then this xdp_frame
cache-line is written, which change the state to "Modified".
This situation already happens (naturally) for, virtio_net, tun and
cpumap as the xdp_frame pointer is the queued object. In tun and
cpumap, the ptr_ring is used for efficiently transferring cache-lines
(with pointers) between CPUs. Thus, the only option is to
de-referencing xdp_frame.
It is only the ixgbe driver that had an optimization, in which it can
avoid doing the de-reference of xdp_frame. The driver already have
TX-ring queue, which (in case of remote DMA-TX completion) have to be
transferred between CPUs anyhow. In this data area, we stored a
struct xdp_mem_info and a data pointer, which allowed us to avoid
de-referencing xdp_frame.
To compensate for this, a prefetchw is used for telling the cache
coherency protocol about our access pattern. My benchmarks show that
this prefetchw is enough to compensate the ixgbe driver.
V7: Adjust for commit d9314c474d ("i40e: add support for XDP_REDIRECT")
V8: Adjust for commit bd658dda42 ("net/mlx5e: Separate dma base address
and offset in dma_sync call")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch shows how it is possible to have both the driver local page
cache, which uses elevated refcnt for "catching"/avoiding SKB
put_page returns the page through the page allocator. And at the
same time, have pages getting returned to the page_pool from
ndp_xdp_xmit DMA completion.
The performance improvement for XDP_REDIRECT in this patch is really
good. Especially considering that (currently) the xdp_return_frame
API and page_pool_put_page() does per frame operations of both
rhashtable ID-lookup and locked return into (page_pool) ptr_ring.
(It is the plan to remove these per frame operation in a followup
patchset).
The benchmark performed was RX on mlx5 and XDP_REDIRECT out ixgbe,
with xdp_redirect_map (using devmap) . And the target/maximum
capability of ixgbe is 13Mpps (on this HW setup).
Before this patch for mlx5, XDP redirected frames were returned via
the page allocator. The single flow performance was 6Mpps, and if I
started two flows the collective performance drop to 4Mpps, because we
hit the page allocator lock (further negative scaling occurs).
Two test scenarios need to be covered, for xdp_return_frame API, which
is DMA-TX completion running on same-CPU or cross-CPU free/return.
Results were same-CPU=10Mpps, and cross-CPU=12Mpps. This is very
close to our 13Mpps max target.
The reason max target isn't reached in cross-CPU test, is likely due
to RX-ring DMA unmap/map overhead (which doesn't occur in ixgbe to
ixgbe testing). It is also planned to remove this unnecessary DMA
unmap in a later patchset
V2: Adjustments requested by Tariq
- Changed page_pool_create return codes not return NULL, only
ERR_PTR, as this simplifies err handling in drivers.
- Save a branch in mlx5e_page_release
- Correct page_pool size calc for MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ
V5: Updated patch desc
V8: Adjust for b0cedc844c ("net/mlx5e: Remove rq_headroom field from params")
V9:
- Adjust for 121e892754 ("net/mlx5e: Refactor RQ XDP_TX indication")
- Adjust for 73281b78a3 ("net/mlx5e: Derive Striding RQ size from MTU")
- Correct handling if page_pool_create fail for MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ
V10: Req from Tariq
- Change pool_size calc for MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Need a fast page recycle mechanism for ndo_xdp_xmit API for returning
pages on DMA-TX completion time, which have good cross CPU
performance, given DMA-TX completion time can happen on a remote CPU.
Refurbish my page_pool code, that was presented[1] at MM-summit 2016.
Adapted page_pool code to not depend the page allocator and
integration into struct page. The DMA mapping feature is kept,
even-though it will not be activated/used in this patchset.
[1] http://people.netfilter.org/hawk/presentations/MM-summit2016/generic_page_pool_mm_summit2016.pdf
V2: Adjustments requested by Tariq
- Changed page_pool_create return codes, don't return NULL, only
ERR_PTR, as this simplifies err handling in drivers.
V4: many small improvements and cleanups
- Add DOC comment section, that can be used by kernel-doc
- Improve fallback mode, to work better with refcnt based recycling
e.g. remove a WARN as pointed out by Tariq
e.g. quicker fallback if ptr_ring is empty.
V5: Fixed SPDX license as pointed out by Alexei
V6: Adjustments requested by Eric Dumazet
- Adjust ____cacheline_aligned_in_smp usage/placement
- Move rcu_head in struct page_pool
- Free pages quicker on destroy, minimize resources delayed an RCU period
- Remove code for forward/backward compat ABI interface
V8: Issues found by kbuild test robot
- Address sparse should be static warnings
- Only compile+link when a driver use/select page_pool,
mlx5 selects CONFIG_PAGE_POOL, although its first used in two patches
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the IDA infrastructure for getting a cyclic increasing ID number,
that is used for keeping track of each registered allocator per
RX-queue xdp_rxq_info. Instead of using the IDR infrastructure, which
uses a radix tree, use a dynamic rhashtable, for creating ID to
pointer lookup table, because this is faster.
The problem that is being solved here is that, the xdp_rxq_info
pointer (stored in xdp_buff) cannot be used directly, as the
guaranteed lifetime is too short. The info is needed on a
(potentially) remote CPU during DMA-TX completion time . In an
xdp_frame the xdp_mem_info is stored, when it got converted from an
xdp_buff, which is sufficient for the simple page refcnt based recycle
schemes.
For more advanced allocators there is a need to store a pointer to the
registered allocator. Thus, there is a need to guard the lifetime or
validity of the allocator pointer, which is done through this
rhashtable ID map to pointer. The removal and validity of of the
allocator and helper struct xdp_mem_allocator is guarded by RCU. The
allocator will be created by the driver, and registered with
xdp_rxq_info_reg_mem_model().
It is up-to debate who is responsible for freeing the allocator
pointer or invoking the allocator destructor function. In any case,
this must happen via RCU freeing.
Use the IDA infrastructure for getting a cyclic increasing ID number,
that is used for keeping track of each registered allocator per
RX-queue xdp_rxq_info.
V4: Per req of Jason Wang
- Use xdp_rxq_info_reg_mem_model() in all drivers implementing
XDP_REDIRECT, even-though it's not strictly necessary when
allocator==NULL for type MEM_TYPE_PAGE_SHARED (given it's zero).
V6: Per req of Alex Duyck
- Introduce rhashtable_lookup() call in later patch
V8: Address sparse should be static warnings (from kbuild test robot)
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now all the users of ndo_xdp_xmit have been converted to use xdp_return_frame.
This enable a different memory model, thus activating another code path
in the xdp_return_frame API.
V2: Fixed issues pointed out by Tariq.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also convert driver i40e, which very recently got XDP_REDIRECT support
in commit d9314c474d ("i40e: add support for XDP_REDIRECT").
V7: This patch got added in V7 of this patchset.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend struct ixgbe_tx_buffer to store the xdp_mem_info.
Notice that this could be optimized further by putting this into
a union in the struct ixgbe_tx_buffer, but this patchset
works towards removing this again. Thus, this is not done.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This implements basic XDP redirect support in mlx5 driver.
Notice that the ndo_xdp_xmit() is NOT implemented, because that API
need some changes that this patchset is working towards.
The main purpose of this patch is have different drivers doing
XDP_REDIRECT to show how different memory models behave in a cross
driver world.
Update(pre-RFCv2 Tariq): Need to DMA unmap page before xdp_do_redirect,
as the return API does not exist yet to to keep this mapped.
Update(pre-RFCv3 Saeed): Don't mix XDP_TX and XDP_REDIRECT flushing,
introduce xdpsq.db.redirect_flush boolian.
V9: Adjust for commit 121e892754 ("net/mlx5e: Refactor RQ XDP_TX indication")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1. Added red_drops stats. Inbound packets dropped by RED, buffer exhaustion
2. Included fcs_err, jabber_err, l2_err and frame_err errors under
rx_errors
3. Included fifo_err, dmac_drop, red_drops, fw_err_pko, fw_err_link and
fw_err_drop under rx_dropped
4. Included max_collision_fail, max_deferral_fail, total_collisions,
fw_err_pko, fw_err_link, fw_err_drop and fw_err_pki under tx_dropped
5. Counting dma mapping errors
6. Added some firmware stats description and removed for some
Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Acked-by: Satanand Burla <satananda.burla@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace magic number "0x5 << MAX_READ_REQUEST_SHIFT" with the
appropriate constant as defined in PCI core.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch stmmac_mode_ops to generic Hardware Interface Helpers instead of
using hard-coded callbacks. This makes the code more readable and more
flexible.
No functional change.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch stmmac_hwtimestamp to generic Hardware Interface Helpers instead
of using hard-coded callbacks. This makes the code more readable and
more flexible.
No functional change.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch stmmac_ops to generic Hardware Interface Helpers instead of using
hard-coded callbacks. This makes the code more readable and more
flexible.
No functional change.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch stmmac_dma_ops to generic Hardware Interface Helpers instead of
using hard-coded callbacks. This makes the code more readable and more
flexible.
No functional change.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch stmmac_desc_ops to generic Hardware Interface Helpers instead of
using hard-coded callbacks. This makes the code more readable and more
flexible.
No functional change.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the interface is down, head/tail of the descriptor
ring address is set to 0 in netsec_netdev_stop().
But netsec hardware still keeps the previous descriptor
ring address, so there is inconsistency between driver
and hardware after interface is up at a later time.
To address this inconsistency, add netsec_reset_hardware()
when the interface is down.
In addition, to minimize the reset process,
add flag to decide whether driver loads the netsec microcode.
Even if driver resets the netsec hardware, netsec microcode
keeps resident on RAM, so it is ok we only load the microcode
at initialization.
This patch is critical for installation over network.
Signed-off-by: Masahisa KOJIMA <masahisa.kojima@linaro.org>
Fixes: 533dd11a12 ("net: socionext: Add Synquacer NetSec driver")
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable TX-irq as well during ndo_open() as we can not count upon
RX to arrive early enough to trigger the napi. This patch is critical
for installation over network.
Fixes: 533dd11a12 ("net: socionext: Add Synquacer NetSec driver")
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The usage of of_device_get_match_data() reduce the code size a bit.
Also, the only way to call mtk_probe() is to match an entry in
of_mtk_match[], so match cannot be NULL.
Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) In ip_gre tunnel, handle the conflict between TUNNEL_{SEQ,CSUM} and
GSO/LLTX properly. From Sabrina Dubroca.
2) Stop properly on error in lan78xx_read_otp(), from Phil Elwell.
3) Don't uncompress in slip before rstate is initialized, from Tejaswi
Tanikella.
4) When using 1.x firmware on aquantia, issue a deinit before we
hardware reset the chip, otherwise we break dirty wake WOL. From
Igor Russkikh.
5) Correct log check in vhost_vq_access_ok(), from Stefan Hajnoczi.
6) Fix ethtool -x crashes in bnxt_en, from Michael Chan.
7) Fix races in l2tp tunnel creation and duplicate tunnel detection,
from Guillaume Nault.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (22 commits)
l2tp: fix race in duplicate tunnel detection
l2tp: fix races in tunnel creation
tun: send netlink notification when the device is modified
tun: set the flags before registering the netdevice
lan78xx: Don't reset the interface on open
bnxt_en: Fix NULL pointer dereference at bnxt_free_irq().
bnxt_en: Need to include RDMA rings in bnxt_check_rings().
bnxt_en: Support max-mtu with VF-reps
bnxt_en: Ignore src port field in decap filter nodes
bnxt_en: do not allow wildcard matches for L2 flows
bnxt_en: Fix ethtool -x crash when device is down.
vhost: return bool from *_access_ok() functions
vhost: fix vhost_vq_access_ok() log check
vhost: Fix vhost_copy_to_user()
net: aquantia: oops when shutdown on already stopped device
net: aquantia: Regression on reset with 1.x firmware
cdc_ether: flag the Cinterion AHS8 modem by gemalto as WWAN
slip: Check if rstate is initialized before uncompressing
lan78xx: Avoid spurious kevent 4 "error"
lan78xx: Correctly indicate invalid OTP
...
- Rework the idle loop in order to prevent CPUs from spending too
much time in shallow idle states by making it stop the scheduler
tick before putting the CPU into an idle state only if the idle
duration predicted by the idle governor is long enough. That
required the code to be reordered to invoke the idle governor
before stopping the tick, among other things (Rafael Wysocki,
Frederic Weisbecker, Arnd Bergmann).
- Add the missing description of the residency sysfs attribute to
the cpuidle documentation (Prashanth Prakash).
- Finalize the cpufreq cleanup moving frequency table validation
from drivers to the core (Viresh Kumar).
- Fix a clock leak regression in the armada-37xx cpufreq driver
(Gregory Clement).
- Fix the initialization of the CPU performance data structures
for shared policies in the CPPC cpufreq driver (Shunyong Yang).
- Clean up the ti-cpufreq, intel_pstate and CPPC cpufreq drivers
a bit (Viresh Kumar, Rafael Wysocki).
- Mark the expected switch fall-throughs in the PM QoS core (Gustavo
Silva).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJazfv7AAoJEILEb/54YlRx/kYP+gPOX5O5cFF22Y2xvDHPMWjm
D/3Nc2aRo+5DuHHECSIJ3ZVQzVoamN5zQ1KbsBRV0bJgwim4fw4M199Jr/0I2nES
1pkByuxLrAtwb83uX3uBIQnwgKOAwRftOTeVaFaMoXgIbyUqK7ZFkGq0xQTnKqor
6+J+78O7wMaIZ0YXQP98BC6g96vs/f+ICrh7qqY85r4NtO/thTA1IKevBmlFeIWR
yVhEYgwSFBaWehKK8KgbshmBBEk3qzDOYfwZF/JprPhiN/6madgHgYjHC8Seok5c
QUUTRlyO1ULTQe4JulyJUKobx7HE9u/FXC0RjbBiKPnYR4tb9Hd8OpajPRZo96AT
8IQCdzL2Iw/ZyQsmQZsWeO1HwPTwVlF/TO2gf6VdQtH221izuHG025p8/RcZe6zb
fTTFhh6/tmBvmOlbKMwxaLbGbwcj/5W5GvQXlXAtaElLobwwNEcEyVfF4jo4Zx/U
DQc7agaAps67lcgFAqNDy0PoU6bxV7yoiAIlTJHO9uyPkDNyIfb0ZPlmdIi3xYZd
tUD7C+VBezrNCkw7JWL1xXLFfJ5X7K6x5bi9I7TBj1l928Hak0dwzs7KlcNBtF1Y
SwnJsNa3kxunGsPajya8dy5gdO0aFeB9Bse0G429+ugk2IJO/Q9M9nQUArJiC9Xl
Gw1bw5Ynv6lx+r5EqxHa
=Pnk4
-----END PGP SIGNATURE-----
Merge tag 'pm-4.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
"These include one big-ticket item which is the rework of the idle loop
in order to prevent CPUs from spending too much time in shallow idle
states. It reduces idle power on some systems by 10% or more and may
improve performance of workloads in which the idle loop overhead
matters. This has been in the works for several weeks and it has been
tested and reviewed quite thoroughly.
Also included are changes that finalize the cpufreq cleanup moving
frequency table validation from drivers to the core, a few fixes and
cleanups of cpufreq drivers, a cpuidle documentation update and a PM
QoS core update to mark the expected switch fall-throughs in it.
Specifics:
- Rework the idle loop in order to prevent CPUs from spending too
much time in shallow idle states by making it stop the scheduler
tick before putting the CPU into an idle state only if the idle
duration predicted by the idle governor is long enough.
That required the code to be reordered to invoke the idle governor
before stopping the tick, among other things (Rafael Wysocki,
Frederic Weisbecker, Arnd Bergmann).
- Add the missing description of the residency sysfs attribute to the
cpuidle documentation (Prashanth Prakash).
- Finalize the cpufreq cleanup moving frequency table validation from
drivers to the core (Viresh Kumar).
- Fix a clock leak regression in the armada-37xx cpufreq driver
(Gregory Clement).
- Fix the initialization of the CPU performance data structures for
shared policies in the CPPC cpufreq driver (Shunyong Yang).
- Clean up the ti-cpufreq, intel_pstate and CPPC cpufreq drivers a
bit (Viresh Kumar, Rafael Wysocki).
- Mark the expected switch fall-throughs in the PM QoS core (Gustavo
Silva)"
* tag 'pm-4.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits)
tick-sched: avoid a maybe-uninitialized warning
cpufreq: Drop cpufreq_table_validate_and_show()
cpufreq: SCMI: Don't validate the frequency table twice
cpufreq: CPPC: Initialize shared perf capabilities of CPUs
cpufreq: armada-37xx: Fix clock leak
cpufreq: CPPC: Don't set transition_latency
cpufreq: ti-cpufreq: Use builtin_platform_driver()
cpufreq: intel_pstate: Do not include debugfs.h
PM / QoS: mark expected switch fall-throughs
cpuidle: Add definition of residency to sysfs documentation
time: hrtimer: Use timerqueue_iterate_next() to get to the next timer
nohz: Avoid duplication of code related to got_idle_tick
nohz: Gather tick_sched booleans under a common flag field
cpuidle: menu: Avoid selecting shallow states with stopped tick
cpuidle: menu: Refine idle state selection for running tick
sched: idle: Select idle state before stopping the tick
time: hrtimer: Introduce hrtimer_next_event_without()
time: tick-sched: Split tick_nohz_stop_sched_tick()
cpuidle: Return nohz hint from cpuidle_select()
jiffies: Introduce USER_TICK_USEC and redefine TICK_USEC
...
When open fails during ethtool -L ring change, for example, the driver
may crash at bnxt_free_irq() because bp->bnapi is NULL.
If we fail to allocate all the new rings, bnxt_open_nic() will free
all the memory including bp->bnapi. Subsequent call to bnxt_close_nic()
will try to dereference bp->bnapi in bnxt_free_irq().
Fix it by checking for !bp->bnapi in bnxt_free_irq().
Fixes: e5811b8c09 ("bnxt_en: Add IRQ remapping logic.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With recent changes to reserve both L2 and RDMA rings, we need to include
the RDMA rings in bnxt_check_rings(). Otherwise we will under-estimate
the rings we need during ethtool -L and may lead to failure.
Fixes: fbcfc8e467 ("bnxt_en: Reserve completion rings and MSIX for bnxt_re RDMA driver.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While a VF is configured with a bigger mtu (> 1500), any packets that
are punted to the VF-rep (slow-path) get dropped by OVS kernel-datapath
with the following message: "dropped over-mtu packet". Fix this by
returning the max-mtu value for a VF-rep derived from its corresponding VF.
VF-rep's mtu can be changed using 'ip' command as shown in this example:
$ ip link set bnxt0_pf0vf0 mtu 9000
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver currently uses src port field (along with other fields) in the
decap tunnel key, while looking up and adding tunnel nodes. This leads to
redundant cfa_decap_filter_alloc() requests to the FW and flow-miss in the
flow engine. Fix this by ignoring the src port field in decap tunnel nodes.
Fixes: f484f6782e ("bnxt_en: add hwrm FW cmds for cfa_encap_record and decap_filter")
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before this patch the following commands would succeed as far as the
user was concerned:
$ tc qdisc add dev p1p1 ingress
$ tc filter add dev p1p1 parent ffff: protocol all \
flower skip_sw action drop
$ tc filter add dev p1p1 parent ffff: protocol ipv4 \
flower skip_sw src_mac 00:02:00:00:00:01/44 action drop
The current flow offload infrastructure used does not support wildcard
matching for ethernet headers, so do not allow the second or third
commands to succeed. If a user wants to drop traffic on that interface
the protocol and MAC addresses need to be specified explicitly:
$ tc qdisc add dev p1p1 ingress
$ tc filter add dev p1p1 parent ffff: protocol arp \
flower skip_sw action drop
$ tc filter add dev p1p1 parent ffff: protocol ipv4 \
flower skip_sw action drop
...
$ tc filter add dev p1p1 parent ffff: protocol ipv4 \
flower skip_sw src_mac 00:02:00:00:00:01 action drop
$ tc filter add dev p1p1 parent ffff: protocol ipv4 \
flower skip_sw src_mac 00:02:00:00:00:02 action drop
...
There are also checks for VLAN parameters in this patch as other callers
may wildcard those parameters even if tc does not. Using different
flow infrastructure could allow this to work in the future for L2 flows,
but for now it does not.
Fixes: 2ae7408fed ("bnxt_en: bnxt: add TC flower filter offload support")
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix ethtool .get_rxfh() crash by checking for valid indirection table
address before copying the data.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case netdev is closed at the moment of pci shutdown, aq_nic_stop
gets called second time. napi_disable in that case hangs indefinitely.
In other case, if device was never opened at all, we get oops because
of null pointer access.
We should invoke aq_nic_stop conditionally, only if device is running
at the moment of shutdown.
Reported-by: David Arcari <darcari@redhat.com>
Fixes: 90869ddfef ("net: aquantia: Implement pci shutdown callback")
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On ASUS XG-C100C with 1.5.44 firmware a special mode called "dirty wake"
is active. With this mode when motherboard gets powered (but no poweron
happens yet), NIC automatically enables powersave link and watches
for WOL packet.
This normally allows to powerup the PC after AC power failures.
Not all motherboards or bios settings gives power to PCI slots,
so this mode is not enabled on all the hardware.
4.16 linux driver introduced full hardware reset sequence
This is required since before that we had no NIC hardware
reset implemented and there were side effects of "not clean start".
But this full reset is incompatible with "dirty wake" WOL feature
it keeps the PHY link in a special mode forever. As a consequence,
driver sees no link and no traffic.
To fix this we forcibly change FW state to idle state before doing
the full reset. This makes FW to restore link state.
Fixes: c8c82eb net: aquantia: Introduce global AQC hardware reset sequence
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* pm-cpuidle:
tick-sched: avoid a maybe-uninitialized warning
cpuidle: Add definition of residency to sysfs documentation
time: hrtimer: Use timerqueue_iterate_next() to get to the next timer
nohz: Avoid duplication of code related to got_idle_tick
nohz: Gather tick_sched booleans under a common flag field
cpuidle: menu: Avoid selecting shallow states with stopped tick
cpuidle: menu: Refine idle state selection for running tick
sched: idle: Select idle state before stopping the tick
time: hrtimer: Introduce hrtimer_next_event_without()
time: tick-sched: Split tick_nohz_stop_sched_tick()
cpuidle: Return nohz hint from cpuidle_select()
jiffies: Introduce USER_TICK_USEC and redefine TICK_USEC
sched: idle: Do not stop the tick before cpuidle_idle_call()
sched: idle: Do not stop the tick upfront in the idle loop
time: tick-sched: Reorganize idle tick management code
* pm-qos:
PM / QoS: mark expected switch fall-throughs
Pull networking fixes from David Miller:
1) The sockmap code has to free socket memory on close if there is
corked data, from John Fastabend.
2) Tunnel names coming from userspace need to be length validated. From
Eric Dumazet.
3) arp_filter() has to take VRFs properly into account, from Miguel
Fadon Perlines.
4) Fix oops in error path of tcf_bpf_init(), from Davide Caratti.
5) Missing idr_remove() in u32_delete_key(), from Cong Wang.
6) More syzbot stuff. Several use of uninitialized value fixes all
over, from Eric Dumazet.
7) Do not leak kernel memory to userspace in sctp, also from Eric
Dumazet.
8) Discard frames from unused ports in DSA, from Andrew Lunn.
9) Fix DMA mapping and reset/failover problems in ibmvnic, from Thomas
Falcon.
10) Do not access dp83640 PHY registers prematurely after reset, from
Esben Haabendal.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits)
vhost-net: set packet weight of tx polling to 2 * vq size
net: thunderx: rework mac addresses list to u64 array
inetpeer: fix uninit-value in inet_getpeer
dp83640: Ensure against premature access to PHY registers after reset
devlink: convert occ_get op to separate registration
ARM: dts: ls1021a: Specify TBIPA register address
net/fsl_pq_mdio: Allow explicit speficition of TBIPA address
ibmvnic: Do not reset CRQ for Mobility driver resets
ibmvnic: Fix failover case for non-redundant configuration
ibmvnic: Fix reset scheduler error handling
ibmvnic: Zero used TX descriptor counter on reset
ibmvnic: Fix DMA mapping mistakes
tipc: use the right skb in tipc_sk_fill_sock_diag()
sctp: sctp_sockaddr_af must check minimal addr length for AF_INET6
net: dsa: Discard frames from unused ports
sctp: do not leak kernel memory to user space
soreuseport: initialise timewait reuseport field
ipv4: fix uninit-value in ip_route_output_key_hash_rcu()
dccp: initialize ireq->ir_mark
net: fix uninit-value in __hw_addr_add_ex()
...
It is too expensive to pass u64 values via linked list, instead
allocate array for them by overall number of mac addresses from netdev.
This eventually removes multiple kmalloc() calls, aviod memory
fragmentation and allow to put single null check on kmalloc
return value in order to prevent a potential null pointer dereference.
Addresses-Coverity-ID: 1467429 ("Dereference null return value")
Fixes: 37c3347eb2 ("net: thunderx: add ndo_set_rx_mode callback implementation for VF")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This resolves race during initialization where the resources with
ops are registered before driver and the structures used by occ_get
op is initialized. So keep occ_get callbacks registered only when
all structs are initialized.
The example flows, as it is in mlxsw:
1) driver load/asic probe:
mlxsw_core
-> mlxsw_sp_resources_register
-> mlxsw_sp_kvdl_resources_register
-> devlink_resource_register IDX
mlxsw_spectrum
-> mlxsw_sp_kvdl_init
-> mlxsw_sp_kvdl_parts_init
-> mlxsw_sp_kvdl_part_init
-> devlink_resource_size_get IDX (to get the current setup
size from devlink)
-> devlink_resource_occ_get_register IDX (register current
occupancy getter)
2) reload triggered by devlink command:
-> mlxsw_devlink_core_bus_device_reload
-> mlxsw_sp_fini
-> mlxsw_sp_kvdl_fini
-> devlink_resource_occ_get_unregister IDX
(struct mlxsw_sp *mlxsw_sp is freed at this point, call to occ get
which is using mlxsw_sp would cause use-after free)
-> mlxsw_sp_init
-> mlxsw_sp_kvdl_init
-> mlxsw_sp_kvdl_parts_init
-> mlxsw_sp_kvdl_part_init
-> devlink_resource_size_get IDX (to get the current setup
size from devlink)
-> devlink_resource_occ_get_register IDX (register current
occupancy getter)
Fixes: d9f9b9a4d0 ("devlink: Add support for resource abstraction")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This introduces a simpler and generic method for for finding (and mapping)
the TBIPA register.
Instead of relying of complicated logic for finding the TBIPA register
address based on the MDIO or MII register block base
address, which even in some cases relies on undocumented shadow registers,
a second "reg" entry for the mdio bus devicetree node specifies the TBIPA
register.
Backwards compatibility is kept, as the existing logic is applied when
only a single "reg" mapping is specified.
Signed-off-by: Esben Haabendal <eha@deif.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When resetting the ibmvnic driver after a partition migration occurs
there is no requirement to do a reset of the main CRQ. The current
driver code does the required re-enable of the main CRQ, then does
a reset of the main CRQ later.
What we should be doing for a driver reset after a migration is to
re-enable the main CRQ, release all the sub-CRQs, and then allocate
new sub-CRQs after capability negotiation.
This patch updates the handling of mobility resets to do the proper
work and not reset the main CRQ. To do this the initialization/reset
of the main CRQ had to be moved out of the ibmvnic_init routine
and in to the ibmvnic_probe and do_reset routines.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a failover case for a non-redundant pseries VNIC
configuration that was not being handled properly. The current
implementation assumes that the driver will always have a redandant
device to communicate with following a failover notification. There
are cases, however, when a non-redundant configuration can receive
a failover request. If that happens, the driver should wait until
it receives a signal that the device is ready for operation.
The driver is agnostic of its backing hardware configuration,
so this fix necessarily affects all device failover management.
The driver needs to wait until it receives a signal that the device
is ready for resetting. A flag is introduced to track this intermediary
state where the driver is waiting for an active device.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some cases, if the driver is waiting for a reset following
a device parameter change, failure to schedule a reset can result
in a hang since a completion signal is never sent.
If the device configuration is being altered by a tool such
as ethtool or ifconfig, it could cause the console to hang
if the reset request does not get scheduled. Add some additional
error handling code to exit the wait_for_completion if there is
one in progress.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The counter that tracks used TX descriptors pending completion
needs to be zeroed as part of a device reset. This change fixes
a bug causing transmit queues to be stopped unnecessarily and in
some cases a transmit queue stall and timeout reset. If the counter
is not reset, the remaining descriptors will not be "removed",
effectively reducing queue capacity. If the queue is over half full,
it will cause the queue to stall if stopped.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix some mistakes caught by the DMA debugger. The first change
fixes a unnecessary unmap that should have been removed in an
earlier update. The next hunk fixes another bad unmap by zeroing
the bit checked to determine that an unmap is needed. The final
change fixes some buffers that are unmapped with the wrong
direction specified.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches noted that we have a few source files that for some
inexplicable reason (read: I'm too lazy to even go look at the history)
are marked executable:
drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
drivers/net/ethernet/cadence/macb_ptp.c
A simple git command line to show executable C/asm/header files is this:
git ls-files -s '*.[chsS]' | grep '^100755'
and then you can fix them up with scripting by just feeding that output
into:
| cut -f2 | xargs chmod -x
and commit it.
Which is exactly what this commit does.
Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAlrHeY8UHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vxhLRAAndV/0NDyWZU0eZNM6twri2SEFnF7
E4ar+YthxDxxJG4TLJbIA12jc5NgHZy4WuttDa6Jb99KreBXIHJFlNi/V/tme6zf
+yXUuxWae7wJzBiaay57VqLGSc80gt/LTgjLa1siwQqjTbO3wSXR6JJXNaE9FtQ4
/jL61t8bD1Peb5cWTpt9p0hrnKI0/pHwASdReyFS4F/HDKdvpof7BxE/OU3HSxxA
XKC2v6RjY4S93vkzvApDXQ+vhKquVRK7/ojyTXQUO/GIzcARprO7H4k62N4ar0x/
qbXLkR8IMkwA8ecsNmcL92ftb/cXoHfd+wdK8WpijqzF4kW4SdteVWbIhUzI0gbr
0gjDYIzjplvH3pZGv/qvx+8sFtAP95OdPjuAAW2qJ9TCVfmiS8naNFCvcxg87RhD
gjyQD3If1X7F8wy309lhq7VNyRexTHgIMgTXHyFvuZMzn/Qe1huL2XCwDcEAg/OX
AvU2iuSE5tWAh7gIUMF/aWi3uoeJUyyoru5ZR//gqdFfx9YxpSimO1UDXnpPi8SR
Iz/jzHJc0aWGYdQ9l6HiSbJF3P/QQcWYs9igt0A7BRGB05SPdWCh7sSO70FJa8ME
f4WID5/qEiaH26kiSRX4cUqpc8Amk8bT0DXw2OT57qy3JM0ZdV5ENQX11pSpr9hv
uLEf0DU7AEmdvzQ=
=T++R
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
- move pci_uevent_ers() out of pci.h (Michael Ellerman)
- skip ASPM common clock warning if BIOS already configured it (Sinan
Kaya)
- fix ASPM Coverity warning about threshold_ns (Gustavo A. R. Silva)
- remove last user of pci_get_bus_and_slot() and the function itself
(Sinan Kaya)
- add decoding for 16 GT/s link speed (Jay Fang)
- add interfaces to get max link speed and width (Tal Gilboa)
- add pcie_bandwidth_capable() to compute max supported link bandwidth
(Tal Gilboa)
- add pcie_bandwidth_available() to compute bandwidth available to
device (Tal Gilboa)
- add pcie_print_link_status() to log link speed and whether it's
limited (Tal Gilboa)
- use PCI core interfaces to report when device performance may be
limited by its slot instead of doing it in each driver (Tal Gilboa)
- fix possible cpqphp NULL pointer dereference (Shawn Lin)
- rescan more of the hierarchy on ACPI hotplug to fix Thunderbolt/xHCI
hotplug (Mika Westerberg)
- add support for PCI I/O port space that's neither directly accessible
via CPU in/out instructions nor directly mapped into CPU physical
memory space. This is fairly intrusive and includes minor changes to
interfaces used for I/O space on most platforms (Zhichang Yuan, John
Garry)
- add support for HiSilicon Hip06/Hip07 LPC I/O space (Zhichang Yuan,
John Garry)
- use PCI_EXP_DEVCTL2_COMP_TIMEOUT in rapidio/tsi721 (Bjorn Helgaas)
- remove possible NULL pointer dereference in of_pci_bus_find_domain_nr()
(Shawn Lin)
- report quirk timings with dev_info (Bjorn Helgaas)
- report quirks that take longer than 10ms (Bjorn Helgaas)
- add and use Altera Vendor ID (Johannes Thumshirn)
- tidy Makefiles and comments (Bjorn Helgaas)
- don't set up INTx if MSI or MSI-X is enabled to align cris, frv,
ia64, and mn10300 with x86 (Bjorn Helgaas)
- move pcieport_if.h to drivers/pci/pcie/ to encapsulate it (Frederick
Lawler)
- merge pcieport_if.h into portdrv.h (Bjorn Helgaas)
- move workaround for BIOS PME issue from portdrv to PCI core (Bjorn
Helgaas)
- completely disable portdrv with "pcie_ports=compat" (Bjorn Helgaas)
- remove portdrv link order dependency (Bjorn Helgaas)
- remove support for unused VC portdrv service (Bjorn Helgaas)
- simplify portdrv feature permission checking (Bjorn Helgaas)
- remove "pcie_hp=nomsi" parameter (use "pci=nomsi" instead) (Bjorn
Helgaas)
- remove unnecessary "pcie_ports=auto" parameter (Bjorn Helgaas)
- use cached AER capability offset (Frederick Lawler)
- don't enable DPC if BIOS hasn't granted AER control (Mika Westerberg)
- rename pcie-dpc.c to dpc.c (Bjorn Helgaas)
- use generic pci_mmap_resource_range() instead of powerpc and xtensa
arch-specific versions (David Woodhouse)
- support arbitrary PCI host bridge offsets on sparc (Yinghai Lu)
- remove System and Video ROM reservations on sparc (Bjorn Helgaas)
- probe for device reset support during enumeration instead of runtime
(Bjorn Helgaas)
- add ACS quirk for Ampere (née APM) root ports (Feng Kan)
- add function 1 DMA alias quirk for Marvell 88SE9220 (Thomas
Vincent-Cross)
- protect device restore with device lock (Sinan Kaya)
- handle failure of FLR gracefully (Sinan Kaya)
- handle CRS (config retry status) after device resets (Sinan Kaya)
- skip various config reads for SR-IOV VFs as an optimization
(KarimAllah Ahmed)
- consolidate VPD code in vpd.c (Bjorn Helgaas)
- add Tegra dependency on PCI_MSI_IRQ_DOMAIN (Arnd Bergmann)
- add DT support for R-Car r8a7743 (Biju Das)
- fix a PCI_EJECT vs PCI_BUS_RELATIONS race condition in Hyper-V host
bridge driver that causes a general protection fault (Dexuan Cui)
- fix Hyper-V host bridge hang in MSI setup on 1-vCPU VMs with SR-IOV
(Dexuan Cui)
- fix Hyper-V host bridge hang when ejecting a VF before setting up MSI
(Dexuan Cui)
- make several structures static (Fengguang Wu)
- increase number of MSI IRQs supported by Synopsys DesignWare bridges
from 32 to 256 (Gustavo Pimentel)
- implemented multiplexed IRQ domain API and remove obsolete MSI IRQ
API from DesignWare drivers (Gustavo Pimentel)
- add Tegra power management support (Manikanta Maddireddy)
- add Tegra loadable module support (Manikanta Maddireddy)
- handle 64-bit BARs correctly in endpoint support (Niklas Cassel)
- support optional regulator for HiSilicon STB (Shawn Guo)
- use regulator bulk API for Qualcomm apq8064 (Srinivas Kandagatla)
- support power supplies for Qualcomm msm8996 (Srinivas Kandagatla)
* tag 'pci-v4.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (123 commits)
MAINTAINERS: Add John Garry as maintainer for HiSilicon LPC driver
HISI LPC: Add ACPI support
ACPI / scan: Do not enumerate Indirect IO host children
ACPI / scan: Rename acpi_is_serial_bus_slave() for more general use
HISI LPC: Support the LPC host on Hip06/Hip07 with DT bindings
of: Add missing I/O range exception for indirect-IO devices
PCI: Apply the new generic I/O management on PCI IO hosts
PCI: Add fwnode handler as input param of pci_register_io_range()
PCI: Remove __weak tag from pci_register_io_range()
MAINTAINERS: Add missing /drivers/pci/cadence directory entry
fm10k: Report PCIe link properties with pcie_print_link_status()
net/mlx5e: Use pcie_bandwidth_available() to compute bandwidth
net/mlx5: Report PCIe link properties with pcie_print_link_status()
net/mlx4_core: Report PCIe link properties with pcie_print_link_status()
PCI: Add pcie_print_link_status() to log link speed and whether it's limited
PCI: Add pcie_bandwidth_available() to compute bandwidth available to device
misc: pci_endpoint_test: Handle 64-bit BARs properly
PCI: designware-ep: Make dw_pcie_ep_reset_bar() handle 64-bit BARs properly
PCI: endpoint: Make sure that BAR_5 does not have 64-bit flag set when clearing
PCI: endpoint: Make epc->ops->clear_bar()/pci_epc_clear_bar() take struct *epf_bar
...
- Fix RDMA uapi headers to actually compile in userspace and be more
complete
- Three shared with netdev pull requests from Mellanox:
* 7 patches, mostly to net with 1 IB related one at the back). This
series addresses an IRQ performance issue (patch 1), cleanups related to
the fix for the IRQ performance problem (patches 2-6), and then extends
the fragmented completion queue support that already exists in the net
side of the driver to the ib side of the driver (patch 7).
* Mostly IB, with 5 patches to net that are needed to support the remaining
10 patches to the IB subsystem. This series extends the current
'representor' framework when the mlx5 driver is in switchdev mode from
being a netdev only construct to being a netdev/IB dev construct. The IB
dev is limited to raw Eth queue pairs only, but by having an IB dev of
this type attached to the representor for a switchdev port, it enables
DPDK to work on the switchdev device.
* All net related, but needed as infrastructure for the rdma driver
- Updates for the hns, i40iw, bnxt_re, cxgb3, cxgb4, hns drivers
- SRP performance updates
- IB uverbs write path cleanup patch series from Leon
- Add RDMA_CM support to ib_srpt. This is disabled by default. Users need to
set the port for ib_srpt to listen on in configfs in order for it to be
enabled (/sys/kernel/config/target/srpt/discovery_auth/rdma_cm_port)
- TSO and Scatter FCS support in mlx4
- Refactor of modify_qp routine to resolve problems seen while working on new
code that is forthcoming
- More refactoring and updates of RDMA CM for containers support from Parav
- mlx5 'fine grained packet pacing', 'ipsec offload' and 'device memory'
user API features
- Infrastructure updates for the new IOCTL interface, based on increased usage
- ABI compatibility bug fixes to fully support 32 bit userspace on 64 bit
kernel as was originally intended. See the commit messages for
extensive details
- Syzkaller bugs and code cleanups motivated by them
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCgAGBQJax5Z0AAoJEDht9xV+IJsacCwQAJBIgmLCvVp5fBu2kJcXMMVI
y3l2YNzAUJvDDKv1r5yTC9ugBXEkDtgzi/W/C2/5es2yUG/QeT/zzQ3YPrtsnN68
5FkiXQ35Tt7+PBHMr0cacGRmF4M3Td3MeW0X5aJaBKhqlNKwA+aF18pjGWBmpVYx
URYCwLb5BZBKVh4+1Leebsk4i0/7jSauAqE5M+9notuAUfBCoY1/Eve3DipEIBBp
EyrEnMDIdujYRsg4KHlxFKKJ1EFGItknLQbNL1+SEa0Oe0SnEl5Bd53Yxfz7ekNP
oOWQe5csTcs3Yr4Ob0TC+69CzI71zKbz6qPDILTwXmsPFZJ9ipJs4S8D6F7ra8tb
D5aT1EdRzh/vAORPC9T3DQ3VsHdvhwpUMG7knnKrVT9X/g7E+gSji1BqaQaTr/xs
i40GepHT7lM/TWEuee/6LRpqdhuOhud7vfaRFwn2JGRX9suqTcvwhkBkPUDGV5XX
5RkHcWOb/7KvmpG7S1gaRGK5kO208LgmAZi7REaJFoZB74FqSneMR6NHIH07ha41
Zou7rnxV68CT2bgu27m+72EsprgmBkVDeEzXgKxVI/+PZ1oadUFpgcZ3pRLOPWVx
rEqjHu65rlA/YPog4iXQaMfSwt/oRD3cVJS/n8EdJKXi4Qt2RDDGdyOmt74w4prM
QuLEdvJIFmwrND1KDoqn
=Ku8g
-----END PGP SIGNATURE-----
Merge tag 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"Doug and I are at a conference next week so if another PR is sent I
expect it to only be bug fixes. Parav noted yesterday that there are
some fringe case behavior changes in his work that he would like to
fix, and I see that Intel has a number of rc looking patches for HFI1
they posted yesterday.
Parav is again the biggest contributor by patch count with his ongoing
work to enable container support in the RDMA stack, followed by Leon
doing syzkaller inspired cleanups, though most of the actual fixing
went to RC.
There is one uncomfortable series here fixing the user ABI to actually
work as intended in 32 bit mode. There are lots of notes in the commit
messages, but the basic summary is we don't think there is an actual
32 bit kernel user of drivers/infiniband for several good reasons.
However we are seeing people want to use a 32 bit user space with 64
bit kernel, which didn't completely work today. So in fixing it we
required a 32 bit rxe user to upgrade their userspace. rxe users are
still already quite rare and we think a 32 bit one is non-existing.
- Fix RDMA uapi headers to actually compile in userspace and be more
complete
- Three shared with netdev pull requests from Mellanox:
* 7 patches, mostly to net with 1 IB related one at the back).
This series addresses an IRQ performance issue (patch 1),
cleanups related to the fix for the IRQ performance problem
(patches 2-6), and then extends the fragmented completion queue
support that already exists in the net side of the driver to the
ib side of the driver (patch 7).
* Mostly IB, with 5 patches to net that are needed to support the
remaining 10 patches to the IB subsystem. This series extends
the current 'representor' framework when the mlx5 driver is in
switchdev mode from being a netdev only construct to being a
netdev/IB dev construct. The IB dev is limited to raw Eth queue
pairs only, but by having an IB dev of this type attached to the
representor for a switchdev port, it enables DPDK to work on the
switchdev device.
* All net related, but needed as infrastructure for the rdma
driver
- Updates for the hns, i40iw, bnxt_re, cxgb3, cxgb4, hns drivers
- SRP performance updates
- IB uverbs write path cleanup patch series from Leon
- Add RDMA_CM support to ib_srpt. This is disabled by default. Users
need to set the port for ib_srpt to listen on in configfs in order
for it to be enabled
(/sys/kernel/config/target/srpt/discovery_auth/rdma_cm_port)
- TSO and Scatter FCS support in mlx4
- Refactor of modify_qp routine to resolve problems seen while
working on new code that is forthcoming
- More refactoring and updates of RDMA CM for containers support from
Parav
- mlx5 'fine grained packet pacing', 'ipsec offload' and 'device
memory' user API features
- Infrastructure updates for the new IOCTL interface, based on
increased usage
- ABI compatibility bug fixes to fully support 32 bit userspace on 64
bit kernel as was originally intended. See the commit messages for
extensive details
- Syzkaller bugs and code cleanups motivated by them"
* tag 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (199 commits)
IB/rxe: Fix for oops in rxe_register_device on ppc64le arch
IB/mlx5: Device memory mr registration support
net/mlx5: Mkey creation command adjustments
IB/mlx5: Device memory support in mlx5_ib
net/mlx5: Query device memory capabilities
IB/uverbs: Add device memory registration ioctl support
IB/uverbs: Add alloc/free dm uverbs ioctl support
IB/uverbs: Add device memory capabilities reporting
IB/uverbs: Expose device memory capabilities to user
RDMA/qedr: Fix wmb usage in qedr
IB/rxe: Removed GID add/del dummy routines
RDMA/qedr: Zero stack memory before copying to user space
IB/mlx5: Add ability to hash by IPSEC_SPI when creating a TIR
IB/mlx5: Add information for querying IPsec capabilities
IB/mlx5: Add IPsec support for egress and ingress
{net,IB}/mlx5: Add ipsec helper
IB/mlx5: Add modify_flow_action_esp verb
IB/mlx5: Add implementation for create and destroy action_xfrm
IB/uverbs: Introduce ESP steering match filter
IB/uverbs: Add modify ESP flow_action
...
Merge updates from Andrew Morton:
- a few misc things
- ocfs2 updates
- the v9fs maintainers have been missing for a long time. I've taken
over v9fs patch slinging.
- most of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (116 commits)
mm,oom_reaper: check for MMF_OOM_SKIP before complaining
mm/ksm: fix interaction with THP
mm/memblock.c: cast constant ULLONG_MAX to phys_addr_t
headers: untangle kmemleak.h from mm.h
include/linux/mmdebug.h: make VM_WARN* non-rvals
mm/page_isolation.c: make start_isolate_page_range() fail if already isolated
mm: change return type to vm_fault_t
mm, oom: remove 3% bonus for CAP_SYS_ADMIN processes
mm, page_alloc: wakeup kcompactd even if kswapd cannot free more memory
kernel/fork.c: detect early free of a live mm
mm: make counting of list_lru_one::nr_items lockless
mm/swap_state.c: make bool enable_vma_readahead and swap_vma_readahead() static
block_invalidatepage(): only release page if the full page was invalidated
mm: kernel-doc: add missing parameter descriptions
mm/swap.c: remove @cold parameter description for release_pages()
mm/nommu: remove description of alloc_vm_area
zram: drop max_zpage_size and use zs_huge_class_size()
zsmalloc: introduce zs_huge_class_size()
mm: fix races between swapoff and flush dcache
fs/direct-io.c: minor cleanups in do_blockdev_direct_IO
...
Core:
* Remove support for asynchronous erase (not implemented by any of
the existing drivers anyway)
* Remove Cyrille from the list of SPI NOR and MTD maintainers
* Fix kernel doc headers
* Allow users to define the partitions parsers they want to test
through a DT property (compatible of the partitions subnode)
* Remove the bfin-async-flash driver (the only architecture using
it has been removed)
* Fix pagetest test
* Add extra checks in mtd_erase()
* Simplify the MTD partition creation logic and get rid of
mtd_add_device_partitions()
Drivers:
* Add endianness information to the physmap DT binding
* Add Eon EN29LV400A IDs to JEDEC probe logic
* Use %*ph where appropriate
SPI NOR changes:
Drivers:
* Make fsl-quaspi assign different names to MTD devices connected
to the same QSPI controller
* Remove an unneeded driver.bus assigned in the fsl-qspi driver
NAND changes:
Core:
* Prepare arrival of the SPI NAND subsystem by implementing a
generic (interface-agnostic) layer to ease manipulation of NAND
devices
* Move onenand code base to the drivers/mtd/nand/ dir
* Rework timing mode selection
* Provide a generic way for NAND chip drivers to flag a specific
GET/SET FEATURE operation as supported/unsupported
* Stop embedding ONFI/JEDEC param page in nand_chip
Drivers:
* Rework/cleanup of the mxc driver
* Various cleanups in the vf610 driver
* Migrate the fsmc and vf610 to ->exec_op()
* Get rid of the pxa driver (replaced by marvell_nand)
* Support ->setup_data_interface() in the GPMI driver
* Fix probe error path in several drivers
* Remove support for unused hw_syndrome mode in sunxi_nand
* Various minor improvements
-----BEGIN PGP SIGNATURE-----
iQI5BAABCAAjBQJaxTXfHBxib3Jpcy5icmV6aWxsb25AYm9vdGxpbi5jb20ACgkQ
Ze02AX4ItwDeFhAAs+OmGip0LHk+D7gFCBONtypOGRi0nmGsi6PkhKtUZBob6/Y2
JoJhTrHx/LsRpqcuh3Y8KSCp0CG5WlZX2m1RU1yrcqiWvTfiXHlv6aQGfiE/esRb
Ei4QNiND8hg+hyzc6I5wRrAuJ7jPP5BanX+n9TyvygaA1Ic5pcur0gssoYKeJTia
18pUV+RLe67wfP02uT0GJvmXd5ecIpu0OVhmJPye2UQqnfl+eiJ2zI4TJAZN3zRW
tD9XiX3/GP8oCnvC+1kZw48a2c/qiHs/DKCdDcH6SDTKzKha4zCgaX4220X3AqPK
rScrFTKxiCa1utZPn9EplnW6ZW3Ud46GqReWGRQZuyphu/ntdkNim7nuJMUZfEFL
RFtMLhXDs1aXUaYJ6F3YQeajHxVU6Ugl34UQCraCXbfmzqCkaKfgXm3EVb9d2duY
rCkrS5N4gSQXp5SMvc0aiu2TnsRX2OCthGg0NBdHG9AaVhwuj0neeSFQ/XFPxOeh
5jQKo4umR7tr2Md7osSx0PVxVo8uO5smqcl90SsdwrjxQxL7z1u/SJBjRf6pJDzQ
cR1o+H9jBXecqD6m+dN3H/h0sMwHzefsASkxYNxl2OeRGYsJvBpuoQF8yUD9jnRS
AzHhWhUNo8g3FxDe7kKgsMXoqnr/pKopMCRxVbeEkhDFNsTvBde10/Xo09U=
=EG7L
-----END PGP SIGNATURE-----
Merge tag 'mtd/for-4.17' of git://git.infradead.org/linux-mtd
Pull MTD updates from Boris Brezillon:
"MTD Core:
- Remove support for asynchronous erase (not implemented by any of
the existing drivers anyway)
- Remove Cyrille from the list of SPI NOR and MTD maintainers
- Fix kernel doc headers
- Allow users to define the partitions parsers they want to test
through a DT property (compatible of the partitions subnode)
- Remove the bfin-async-flash driver (the only architecture using it
has been removed)
- Fix pagetest test
- Add extra checks in mtd_erase()
- Simplify the MTD partition creation logic and get rid of
mtd_add_device_partitions()
MTD Drivers:
- Add endianness information to the physmap DT binding
- Add Eon EN29LV400A IDs to JEDEC probe logic
- Use %*ph where appropriate
SPI NOR Drivers:
- Make fsl-quaspi assign different names to MTD devices connected to
the same QSPI controller
- Remove an unneeded driver.bus assigned in the fsl-qspi driver
NAND Core:
- Prepare arrival of the SPI NAND subsystem by implementing a generic
(interface-agnostic) layer to ease manipulation of NAND devices
- Move onenand code base to the drivers/mtd/nand/ dir
- Rework timing mode selection
- Provide a generic way for NAND chip drivers to flag a specific
GET/SET FEATURE operation as supported/unsupported
- Stop embedding ONFI/JEDEC param page in nand_chip
NAND Drivers:
- Rework/cleanup of the mxc driver
- Various cleanups in the vf610 driver
- Migrate the fsmc and vf610 to ->exec_op()
- Get rid of the pxa driver (replaced by marvell_nand)
- Support ->setup_data_interface() in the GPMI driver
- Fix probe error path in several drivers
- Remove support for unused hw_syndrome mode in sunxi_nand
- Various minor improvements"
* tag 'mtd/for-4.17' of git://git.infradead.org/linux-mtd: (89 commits)
dt-bindings: fsl-quadspi: Add the example of two SPI NOR
mtd: fsl-quadspi: Distinguish the mtd device names
mtd: nand: Fix some function description mismatches in core.c
mtd: fsl-quadspi: Remove unneeded driver.bus assignment
mtd: rawnand: marvell: Rename ->ecc_clk into ->core_clk
mtd: rawnand: s3c2410: enhance the probe function error path
mtd: rawnand: tango: fix probe function error path
mtd: rawnand: sh_flctl: fix the probe function error path
mtd: rawnand: omap2: fix the probe function error path
mtd: rawnand: mxc: fix probe function error path
mtd: rawnand: denali: fix probe function error path
mtd: rawnand: davinci: fix probe function error path
mtd: rawnand: cafe: fix probe function error path
mtd: rawnand: brcmnand: fix probe function error path
mtd: rawnand: sunxi: Stop supporting ECC_HW_SYNDROME mode
mtd: rawnand: marvell: Fix clock resource by adding a register clock
mtd: ftl: Use DIV_ROUND_UP()
mtd: Fix some function description mismatches in mtdcore.c
mtd: physmap_of: update struct map_info's swap as per map requirement
dt-bindings: mtd-physmap: Add endianness supports
...
1) Return correct size from ice_get_regs_len.
2) Fix incorrect use of ARRAY_SIZE in ice_get_regs.
Fixes: fcea6f3da5 (ice: Add stats and ethtool support)
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Fix to return error code ICE_ERR_NO_MEMORY from the alloc error
handling case instead of 0, as done elsewhere in this function.
Fixes: dc49c77236 ("ice: Get MAC/PHY/link info and scheduler topology")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Since the subsequent changes will need a TICK_USEC definition
analogous to TICK_NSEC, rename the existing TICK_USEC as
USER_TICK_USEC, update its users and redefine TICK_USEC
accordingly.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Currently <linux/slab.h> #includes <linux/kmemleak.h> for no obvious
reason. It looks like it's only a convenience, so remove kmemleak.h
from slab.h and add <linux/kmemleak.h> to any users of kmemleak_* that
don't already #include it. Also remove <linux/kmemleak.h> from source
files that do not use it.
This is tested on i386 allmodconfig and x86_64 allmodconfig. It would
be good to run it through the 0day bot for other $ARCHes. I have
neither the horsepower nor the storage space for the other $ARCHes.
Update: This patch has been extensively build-tested by both the 0day
bot & kisskb/ozlabs build farms. Both of them reported 2 build failures
for which patches are included here (in v2).
[ slab.h is the second most used header file after module.h; kernel.h is
right there with slab.h. There could be some minor error in the
counting due to some #includes having comments after them and I didn't
combine all of those. ]
[akpm@linux-foundation.org: security/keys/big_key.c needs vmalloc.h, per sfr]
Link: http://lkml.kernel.org/r/e4309f98-3749-93e1-4bb7-d9501a39d015@infradead.org
Link: http://kisskb.ellerman.id.au/kisskb/head/13396/
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reported-by: Michael Ellerman <mpe@ellerman.id.au> [2 build failures]
Reported-by: Fengguang Wu <fengguang.wu@intel.com> [2 build failures]
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Wei Yongjun <weiyongjun1@huawei.com>
Cc: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Boundary check in mvpp2_prs_init_from_hw must be done according to the
passed "tid" parameter, not the mvpp2_prs_entry index, which is not yet
initialized at the time of the check.
Fixes: 47e0e14eb1 ("net: mvpp2: Make mvpp2_prs_hw_read a parser entry init function")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change updates the mlx5 interface to create mkey
on the device.
The updates in the command mailbox include increasing the
access mode type field to 5 bits in order to support additional
types such as MLX5_MKC_ACCESS_MODE_MEMIC which represents device
memory access type and will be used when registering MR on allocated
device memory.
All the places that use the old access mode format are adjusted as
well.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>