VT-d posted interrupts, DAX/ZONE_DEVICE,
module unload/reload.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJdyrEsAAoJEL/70l94x66DIOkH/Asqrh4o4pwfRHWE+9rnM6PI
j8oFi7Q4eRXJnP4zEMnMbb6xD/BfSH1tWEcPcYgIxD/t0DFx8F92/xsETAJ/Qc5n
CWpmnhMkJqERlV+GSRuBqnheMo0CEH1Ab1QZKhh5U3//pK3OtGY9WyydJHWcquTh
bGh2pnxwVZOtIIEmclUUfKjyR2Fu8hJLnQwzWgYZ27UK7J2pLmiiTX0vwQG359Iq
sDn9ND33pCBW5e/D2mzccRjOJEvzwrumewM1sRDsoAYLJzUjg9+xD83vZDa1d7R6
gajCDFWVJbPoLvUY+DgsZBwMMlogElimJMT/Zft3ERbCsYJbFvcmwp4JzyxDxQ4=
=J6KN
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"Fix unwinding of KVM_CREATE_VM failure, VT-d posted interrupts,
DAX/ZONE_DEVICE, and module unload/reload"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: MMU: Do not treat ZONE_DEVICE pages as being reserved
KVM: VMX: Introduce pi_is_pir_empty() helper
KVM: VMX: Do not change PID.NDST when loading a blocked vCPU
KVM: VMX: Consider PID.PIR to determine if vCPU has pending interrupts
KVM: VMX: Fix comment to specify PID.ON instead of PIR.ON
KVM: X86: Fix initialization of MSR lists
KVM: fix placement of refcount initialization
KVM: Fix NULL-ptr deref after kvm_create_vm fails
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEEmvEkXzgOfc881GuFWsYho5HknSAFAl3J1ekTHG1rbEBwZW5n
dXRyb25peC5kZQAKCRBaxiGjkeSdILUKB/9mYalSK5f4n/AalnsEoGy3IWVy/xwG
KvAdZAueH2uFIhu6MfqmWbs9xp94kGsI9RP1o3UDzwLxBaol8cG97oi0T1Fn5670
LSzBkR5WXEVaerw7noGwVEvi/IKkSCO7pOI3PszwITHU+P/w/EChVLyeWnzDoTaf
bzSXbtXQTOCINepIic+f1COffyJA4o84uiEFwpkuTJPbiihxr43bGrwj0xb1cWC9
Zh464vGuA7s2/Xqhsmg/Z/BU5ihWIz3/wEAqKUW99dj1rUpgdaChTFEO1femXInX
WkOi6r4EZGsAB0Cbbp6hTkovK9UF94asVMOW74Yz9UK9dxEakcwmR7XK
=S7hM
-----END PGP SIGNATURE-----
Merge tag 'linux-can-next-for-5.5-20191111' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2019-10-07
this is a pull request for net-next/master consisting of 32 patches.
The first patch is by Gustavo A. R. Silva and removes unused code in the
generic CAN infrastructure.
The next three patches target the mcp251x driver. The one by Andy
Shevchenko removes the legacy platform data support from the driver. The
other two are by Timo Schlüßler and reset the device only when needed,
to prevent glitches on the output when GPIO support is added.
I'm contributing two patches fixing checkpatch warnings in the
c_can_platform and peak_canfd driver.
Stephane Grosjean's patch for the peak_canfd driver adds hw timestamps
support in rx skbs.
The next three patches target the xilinx_can driver. One patch by me to
fix checkpatch warnings, one patch by Anssi Hannula to avoid non
requested bus error frames, and a patch by YueHaibing that switches the
driver to devm_platform_ioremap_resource().
Pankaj Sharma contributes two patches for the m_can driver, the first
one adds support for one shot mode, the other support for handling
arbitration errors.
Followed by four patches by YueHaibing, switching the grcan, ifi, rcar,
and sun4i drivers to devm_platform_ioremap_resource()
I'm contributing cleanup patches for the rx-offload helper, while Joakim
Zhang's patch prepares the rx-offload helper for CAN-FD support. The rx
offload users flexcan and ti_hecc are converted accordingly.
The remaining twelve patches target the flexcan driver. First Joakim
Zhang switches the driver to devm_platform_ioremap_resource(). The
remaining eleven patch are by me and clean up the abstract the access of
the iflag1 and iflag2 register both for RX and TX mailboxes. This is a
preparation for the upcoming CAN-FD support.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The sfc driver can drop packets processed with XDP, notably when running
out of buffer space on XDP_TX, or returning an unknown XDP action.
This increments the rx_xdp_bad_drops ethtool counter.
Call trace_xdp_exception everywhere rx_xdp_bad_drops is incremented,
except for fragmented RX packets as the XDP program hasn't run yet.
This allows it to easily be monitored from userspace.
This mirrors the behavior of other drivers.
Signed-off-by: Arthur Fabre <afabre@cloudflare.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If an SMC socket is immediately terminated after a non-blocking connect()
has been called, a memory leak is possible.
Due to the sock_hold move in
commit 301428ea37 ("net/smc: fix refcounting for non-blocking connect()")
an extra sock_put() is needed in smc_connect_work(), if the internal
TCP socket is aborted and cancels the sk_stream_wait_connect() of the
connect worker.
Reported-by: syzbot+4b73ad6fc767e576e275@syzkaller.appspotmail.com
Fixes: 301428ea37 ("net/smc: fix refcounting for non-blocking connect()")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When do randbuilding, we got this warning:
WARNING: unmet direct dependencies detected for PTP_1588_CLOCK
Depends on [n]: NET [=y] && POSIX_TIMERS [=n]
Selected by [y]:
- PTP_1588_CLOCK_IDTCM [=y]
Make PTP_1588_CLOCK_IDTCM depends on PTP_1588_CLOCK to fix this.
Fixes: 3a6ba7dc77 ("ptp: Add a ptp clock driver for IDT ClockMatrix.")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Vincent Cheng <vincent.cheng.xh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
after commit 4097e9d250 ("net: sched: don't use tc_action->order during
action dump"), 'act->order' is initialized but then it's no more read, so
we can just remove this member of struct tc_action.
CC: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since CNP it's possible for rawclk to have two different values, 19.2
and 24 MHz. If the value indicated by SFUSE_STRAP register is different
from the power on default for PCH_RAWCLK_FREQ, we'll end up having a
mismatch between the rawclk hardware and software states after
suspend/resume. On previous platforms this used to work by accident,
because the power on defaults worked just fine.
Update the rawclk also on resume. The natural place to do this would be
intel_modeset_init_hw(), however VLV/CHV need it done before
intel_power_domains_init_hw(). Thus put it there even if it feels
slightly out of place.
v2: Call intel_update_rawclck() in intel_power_domains_init_hw() for all
platforms (Ville).
Reported-by: Shawn Lee <shawn.c.lee@intel.com>
Cc: Shawn Lee <shawn.c.lee@intel.com>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Shawn Lee <shawn.c.lee@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191101142024.13877-1-jani.nikula@intel.com
(cherry picked from commit 59ed05ccdd)
Cc: <stable@vger.kernel.org> # v4.15+
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
There is a stray semicolon in an if statement that will cause a dev_err
message to be printed unconditionally. Fix this by removing the stray
semicolon.
Addresses-Coverity: ("Stay semicolon")
Fixes: f0942e00a1 ("net: dsa: mv88e6xxx: Add support for port mirroring")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Aya Levin says:
====================
Update devlink binary output
This series changes the devlink binary interface:
-The first patch forces binary values to be enclosed in an array. In
addition, devlink_fmsg_binary_pair_put breaks the binary value into
chunks to comply with devlink's restriction for value length.
-The second patch removes redundant code and uses the fixed devlink
interface (devlink_fmsg_binary_pair_put).
-The third patch make self test to use the updated devlink
interface.
-The fourth, adds a verification of dumping a very large binary
content. This test verifies breaking the data into chunks in a valid
JSON output.
Series was generated against net-next commit:
ca22d6977b Merge branch 'stmmac-next'
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a test of 2 PAGEs size (exceeds devlink previous length limitation)
of binary data on a 'devlink health dump show' command. Set binary length
to 8192, issue a dump show command and clear it.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update dummy reporter's output to use updated devlink interface of
binary fmsg pair.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove redundant code from fw_fatal reporter's dump callback. Use
updated devlink interface of binary fmsg pair which breaks the output
into chunks internally.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Devlink supports pair output of name and value. When the value is
binary, it must be presented in an array. If the length of the binary
value exceeds fmsg limitation, break the value into chunks internally.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When building with SFP disabled, the stub for sfp_bus_add_upstream()
missed "inline". Add it.
Fixes: 727b3668b7 ("net: sfp: rework upstream interface")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix sparse warnings:
drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_mqprio.c:242:6: warning: symbol 'cxgb4_mqprio_free_hw_resources' was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 2d0cb84dd9 ("cxgb4: add ETHOFLD hardware queue support")
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
zhengbin says:
====================
net: atlantic: make some symbol & function static
v1->v2: add Fixes tag
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix sparse warnings:
drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c:706:5: warning: symbol 'aq_ethtool_get_priv_flags' was not declared. Should it be static?
drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c:713:5: warning: symbol 'aq_ethtool_set_priv_flags' was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: ea4b4d7fc1 ("net: atlantic: loopback tests via private flags")
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix sparse warnings:
drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c:426:25: warning: symbol 'aq_pm_ops' was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 8aaa112a57 ("net: atlantic: refactoring pm logic")
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel says:
====================
mlxsw: Add extended ACK for EMADs
Shalom says:
Ethernet Management Datagrams (EMADs) are Ethernet packets sent between
the driver and device's firmware. They are used to pass various
configurations to the device, but also to get events (e.g., port up)
from it. After the Ethernet header, these packets are built in a TLV
format.
Up until now, whenever the driver issued an erroneous register access it
only got an error code indicating a bad parameter was used. This patch
set adds a new TLV (string TLV) that can be used by the firmware to
encode a 128 character string describing the error. The new TLV is
allocated by the driver and set to zeros. In case of error, the driver
will check the length of the string in the response and report it using
devlink hwerr tracepoint.
Example:
$ perf record -a -q -e devlink:devlink_hwerr &
$ pkill -2 perf
$ perf script -F trace:event,trace | grep hwerr
devlink:devlink_hwerr: bus_name=pci dev_name=0000:03:00.0 driver_name=mlxsw_spectrum err=7 (tid=9913892d00001593,reg_id=8018(rauhtd)) bad parameter (inside er_rauhtd_write_query(), num_rec=32 is over the maximum number of records supported)
Patch #1 parses the offsets of the different TLVs in incoming EMADs and
stores them in the skb's control block. This makes it easier to later
add new TLVs.
Patches #2-#3 remove deprecated TLVs and add string TLV definition.
Patches #4-#7 gradually add support for the new string TLV.
v2:
* Use existing devlink hwerr tracepoint to report the error string,
instead of printing it to kernel log
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure to enable EMAD string TLV only after using the required firmware
version.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case the firmware had an error while processing EMADs, it can send back
an ASCII string with the reason using EMAD string TLV.
This patch adds the support for using EMAD string TLV. In case of an error,
reports the reason using devlink hwerr tracepoint.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend EMAD information reported to devlink hwerr tracepoint with
transaction id and reg id (both, hex and string).
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During parsing of incoming EMADs, fill the string TLV's offset when it is
used.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add EMAD string TLV, an ASCII string the driver can receive from the
firmware in case of an error.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Until now the code assumes a fixed structure which makes it difficult to
support EMADs with and without new TLVs.
Make it more generic by parsing the TLVs when the EMADs are received and
store the offset to the different TLVs in the control block. Using these
offsets to extract information from the EMADs without relying on a specific
structure.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull x86 TSX Async Abort and iTLB Multihit mitigations from Thomas Gleixner:
"The performance deterioration departement is not proud at all of
presenting the seventh installment of speculation mitigations and
hardware misfeature workarounds:
1) TSX Async Abort (TAA) - 'The Annoying Affair'
TAA is a hardware vulnerability that allows unprivileged
speculative access to data which is available in various CPU
internal buffers by using asynchronous aborts within an Intel TSX
transactional region.
The mitigation depends on a microcode update providing a new MSR
which allows to disable TSX in the CPU. CPUs which have no
microcode update can be mitigated by disabling TSX in the BIOS if
the BIOS provides a tunable.
Newer CPUs will have a bit set which indicates that the CPU is not
vulnerable, but the MSR to disable TSX will be available
nevertheless as it is an architected MSR. That means the kernel
provides the ability to disable TSX on the kernel command line,
which is useful as TSX is a truly useful mechanism to accelerate
side channel attacks of all sorts.
2) iITLB Multihit (NX) - 'No eXcuses'
iTLB Multihit is an erratum where some Intel processors may incur
a machine check error, possibly resulting in an unrecoverable CPU
lockup, when an instruction fetch hits multiple entries in the
instruction TLB. This can occur when the page size is changed
along with either the physical address or cache type. A malicious
guest running on a virtualized system can exploit this erratum to
perform a denial of service attack.
The workaround is that KVM marks huge pages in the extended page
tables as not executable (NX). If the guest attempts to execute in
such a page, the page is broken down into 4k pages which are
marked executable. The workaround comes with a mechanism to
recover these shattered huge pages over time.
Both issues come with full documentation in the hardware
vulnerabilities section of the Linux kernel user's and administrator's
guide.
Thanks to all patch authors and reviewers who had the extraordinary
priviledge to be exposed to this nuisance.
Special thanks to Borislav Petkov for polishing the final TAA patch
set and to Paolo Bonzini for shepherding the KVM iTLB workarounds and
providing also the backports to stable kernels for those!"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation/taa: Fix printing of TAA_MSG_SMT on IBRS_ALL CPUs
Documentation: Add ITLB_MULTIHIT documentation
kvm: x86: mmu: Recovery of shattered NX large pages
kvm: Add helper function for creating VM worker threads
kvm: mmu: ITLB_MULTIHIT mitigation
cpu/speculation: Uninline and export CPU mitigations helpers
x86/cpu: Add Tremont to the cpu vulnerability whitelist
x86/bugs: Add ITLB_MULTIHIT bug infrastructure
x86/tsx: Add config options to set tsx=on|off|auto
x86/speculation/taa: Add documentation for TSX Async Abort
x86/tsx: Add "auto" option to the tsx= cmdline parameter
kvm/x86: Export MDS_NO=0 to guests when TSX is enabled
x86/speculation/taa: Add sysfs reporting for TSX Async Abort
x86/speculation/taa: Add mitigation for TSX Async Abort
x86/cpu: Add a "tsx=" cmdline option with TSX disabled by default
x86/cpu: Add a helper function x86_read_arch_cap_msr()
x86/msr: Add the IA32_TSX_CTRL MSR
If TI_DAVINCI_EMAC=y and GENERIC_ALLOCATOR is not set,
below erros can be seen:
drivers/net/ethernet/ti/davinci_cpdma.o: In function `cpdma_desc_pool_destroy.isra.14':
davinci_cpdma.c:(.text+0x359): undefined reference to `gen_pool_size'
davinci_cpdma.c:(.text+0x365): undefined reference to `gen_pool_avail'
davinci_cpdma.c:(.text+0x373): undefined reference to `gen_pool_avail'
davinci_cpdma.c:(.text+0x37f): undefined reference to `gen_pool_size'
drivers/net/ethernet/ti/davinci_cpdma.o: In function `__cpdma_chan_free':
davinci_cpdma.c:(.text+0x4a2): undefined reference to `gen_pool_free_owner'
drivers/net/ethernet/ti/davinci_cpdma.o: In function `cpdma_chan_submit_si':
davinci_cpdma.c:(.text+0x66c): undefined reference to `gen_pool_alloc_algo_owner'
davinci_cpdma.c:(.text+0x805): undefined reference to `gen_pool_free_owner'
drivers/net/ethernet/ti/davinci_cpdma.o: In function `cpdma_ctlr_create':
davinci_cpdma.c:(.text+0xabd): undefined reference to `devm_gen_pool_create'
davinci_cpdma.c:(.text+0xb79): undefined reference to `gen_pool_add_owner'
drivers/net/ethernet/ti/davinci_cpdma.o: In function `cpdma_check_free_tx_desc':
davinci_cpdma.c:(.text+0x16c6): undefined reference to `gen_pool_avail'
This patch mades TI_DAVINCI_EMAC select GENERIC_ALLOCATOR.
Fixes: 99f6297182 ("net: ethernet: ti: cpsw: drop TI_DAVINCI_CPDMA config option")
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some Coffee Lake platforms have a skewed HPET timer once the SoCs entered
PC10, which in consequence marks TSC as unstable because HPET is used as
watchdog clocksource for TSC.
Harry Pan tried to work around it in the clocksource watchdog code [1]
thereby creating a circular dependency between HPET and TSC. This also
ignores the fact, that HPET is not only unsuitable as watchdog clocksource
on these systems, it becomes unusable in general.
Disable HPET on affected platforms.
Suggested-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203183
Link: https://lore.kernel.org/lkml/20190516090651.1396-1-harry.pan@intel.com/ [1]
Link: https://lkml.kernel.org/r/20191016103816.30650-1-kai.heng.feng@canonical.com
Explicitly exempt ZONE_DEVICE pages from kvm_is_reserved_pfn() and
instead manually handle ZONE_DEVICE on a case-by-case basis. For things
like page refcounts, KVM needs to treat ZONE_DEVICE pages like normal
pages, e.g. put pages grabbed via gup(). But for flows such as setting
A/D bits or shifting refcounts for transparent huge pages, KVM needs to
to avoid processing ZONE_DEVICE pages as the flows in question lack the
underlying machinery for proper handling of ZONE_DEVICE pages.
This fixes a hang reported by Adam Borowski[*] in dev_pagemap_cleanup()
when running a KVM guest backed with /dev/dax memory, as KVM straight up
doesn't put any references to ZONE_DEVICE pages acquired by gup().
Note, Dan Williams proposed an alternative solution of doing put_page()
on ZONE_DEVICE pages immediately after gup() in order to simplify the
auditing needed to ensure is_zone_device_page() is called if and only if
the backing device is pinned (via gup()). But that approach would break
kvm_vcpu_{un}map() as KVM requires the page to be pinned from map() 'til
unmap() when accessing guest memory, unlike KVM's secondary MMU, which
coordinates with mmu_notifier invalidations to avoid creating stale
page references, i.e. doesn't rely on pages being pinned.
[*] http://lkml.kernel.org/r/20190919115547.GA17963@angband.pl
Reported-by: Adam Borowski <kilobyte@angband.pl>
Analyzed-by: David Hildenbrand <david@redhat.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: stable@vger.kernel.org
Fixes: 3565fce3a6 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Streamline the PID.PIR check and change its call sites to use
the newly added helper.
Suggested-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When vCPU enters block phase, pi_pre_block() inserts vCPU to a per pCPU
linked list of all vCPUs that are blocked on this pCPU. Afterwards, it
changes PID.NV to POSTED_INTR_WAKEUP_VECTOR which its handler
(wakeup_handler()) is responsible to kick (unblock) any vCPU on that
linked list that now has pending posted interrupts.
While vCPU is blocked (in kvm_vcpu_block()), it may be preempted which
will cause vmx_vcpu_pi_put() to set PID.SN. If later the vCPU will be
scheduled to run on a different pCPU, vmx_vcpu_pi_load() will clear
PID.SN but will also *overwrite PID.NDST to this different pCPU*.
Instead of keeping it with original pCPU which vCPU had entered block
phase on.
This results in an issue because when a posted interrupt is delivered, as
the wakeup_handler() will be executed and fail to find blocked vCPU on
its per pCPU linked list of all vCPUs that are blocked on this pCPU.
Which is due to the vCPU being placed on a *different* per pCPU
linked list i.e. the original pCPU in which it entered block phase.
The regression is introduced by commit c112b5f502 ("KVM: x86:
Recompute PID.ON when clearing PID.SN"). Therefore, partially revert
it and reintroduce the condition in vmx_vcpu_pi_load() responsible for
avoiding changing PID.NDST when loading a blocked vCPU.
Fixes: c112b5f502 ("KVM: x86: Recompute PID.ON when clearing PID.SN")
Tested-by: Nathan Ni <nathan.ni@oracle.com>
Co-developed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 17e433b543 ("KVM: Fix leak vCPU's VMCS value into other pCPU")
introduced vmx_dy_apicv_has_pending_interrupt() in order to determine
if a vCPU have a pending posted interrupt. This routine is used by
kvm_vcpu_on_spin() when searching for a a new runnable vCPU to schedule
on pCPU instead of a vCPU doing busy loop.
vmx_dy_apicv_has_pending_interrupt() determines if a
vCPU has a pending posted interrupt solely based on PID.ON. However,
when a vCPU is preempted, vmx_vcpu_pi_put() sets PID.SN which cause
raised posted interrupts to only set bit in PID.PIR without setting
PID.ON (and without sending notification vector), as depicted in VT-d
manual section 5.2.3 "Interrupt-Posting Hardware Operation".
Therefore, checking PID.ON is insufficient to determine if a vCPU has
pending posted interrupts and instead we should also check if there is
some bit set on PID.PIR if PID.SN=1.
Fixes: 17e433b543 ("KVM: Fix leak vCPU's VMCS value into other pCPU")
Reviewed-by: Jagannathan Raman <jag.raman@oracle.com>
Co-developed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The Outstanding Notification (ON) bit is part of the Posted Interrupt
Descriptor (PID) as opposed to the Posted Interrupts Register (PIR).
The latter is a bitmap for pending vectors.
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The three MSR lists(msrs_to_save[], emulated_msrs[] and
msr_based_features[]) are global arrays of kvm.ko, which are
adjusted (copy supported MSRs forward to override the unsupported MSRs)
when insmod kvm-{intel,amd}.ko, but it doesn't reset these three arrays
to their initial value when rmmod kvm-{intel,amd}.ko. Thus, at the next
installation, kvm-{intel,amd}.ko will do operations on the modified
arrays with some MSRs lost and some MSRs duplicated.
So define three constant arrays to hold the initial MSR lists and
initialize msrs_to_save[], emulated_msrs[] and msr_based_features[]
based on the constant arrays.
Cc: stable@vger.kernel.org
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
[Remove now useless conditionals. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
An ESP packet could be decrypted in async mode if the input handler for
this packet returns -EINPROGRESS in xfrm_input(). At this moment the device
reference in skb is held. Later xfrm_input() will be invoked again to
resume the processing.
If the transform state is still valid it would continue to release the
device reference and there won't be a problem; however if the transform
state is not valid when async resumption happens, the packet will be
dropped while the device reference is still being held.
When the device is deleted for some reason and the reference to this
device is not properly released, the kernel will keep logging like:
unregister_netdevice: waiting for ppp2 to become free. Usage count = 1
The issue is observed when running IPsec traffic over a PPPoE device based
on a bridge interface. By terminating the PPPoE connection on the server
end for multiple times, the PPPoE device on the client side will eventually
get stuck on the above warning message.
This patch will check the async mode first and continue to release device
reference in async resumption, before it is dropped due to invalid state.
v2: Do not assign address family from outer_mode in the transform if the
state is invalid
v3: Release device reference in the error path instead of jumping to resume
Fixes: 4ce3dbe397 ("xfrm: Fix xfrm_input() to verify state is valid when (encap_type < 0)")
Signed-off-by: Xiaodong Xu <stid.smth@gmail.com>
Reported-by: Bo Chen <chenborfc@163.com>
Tested-by: Bo Chen <chenborfc@163.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Currently we make sequence == 0 be the same as sequence == 1, but that's
not super useful if the intent is really to have a timeout that's just
a pure timeout.
If the user passes in sqe->off == 0, then don't apply any sequence logic
to the request, let it purely be driven by the timeout specified.
Reported-by: 李通洲 <carter.li@eoitek.com>
Reviewed-by: 李通洲 <carter.li@eoitek.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A cast to 'time_t' was accidentally left in place during the
conversion of __do_adjtimex() to 64-bit timestamps, so the
resulting value is incorrectly truncated.
Remove the cast so the 64-bit time gets propagated correctly.
Fixes: ead25417f8 ("timex: use __kernel_timex internally")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20191108203435.112759-2-arnd@arndb.de
Jose Abreu says:
====================
net: stmmac: Improvements for -next
Misc improvements for stmmac.
Patch 1/6, fixes a sparse warning that was introduced in recent commit in
-next.
Patch 2/6, adds the Split Header support which is also available in XGMAC
cores and now in GMAC4+ with this patch.
Patch 3/6, adds the C45 support for MDIO transactions when using XGMAC cores.
Patch 4/6, removes the speed dependency on CBS callbacks so that it can be used
in XGMAC cores.
Patch 5/6, reworks the over-engineered stmmac_rx() function so that its easier
to read.
Patch 6/6, implements the UDP Segmentation Offload feature in GMAC4+ cores.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement the UDP Segmentation Offload feature in stmmac. This is only
available in GMAC4+ cores.
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This looks over-engineered. Let's use some helpers to get the buffer
length and hereby simplify the stmmac_rx() function. No performance drop
was seen with the new implementation.
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
XGMAC3 supports full CBS features with speeds that can go up to 10G so
we can now remove the maximum speed check of CBS.
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the support for C45 PHYs in the MDIO callbacks for XGMAC. This was
tested using Synopsys DesignWare XPCS.
v2:
- Pull out the readl_poll_timeout() calls into common code (Andrew)
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
GMAC4+ cores also support the Split Header feature.
Add the support for Split Header feature in the RX path following the
same implementation logic that XGMAC followed.
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The VID is converted to le16 so the variable must be __le16 type.
Reported-by: kbuild test robot <lkp@intel.com>
Fixes: c7ab0b8088 ("net: stmmac: Fallback to VLAN Perfect filtering if HASH is not available")
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Variable hdr_len is being assigned a value that is never read.
The assignment is redundant and hence can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Variable err is not uninitialized and hence can potentially contain
any garbage value. This may cause an error when logical or'ing the
return values from the calls to functions crypto_aead_setauthsize or
crypto_aead_setkey. Fix this by setting err to the return of
crypto_aead_setauthsize rather than or'ing in the return into the
uninitialized variable
Addresses-Coverity: ("Uninitialized scalar variable")
Fixes: fc1b6d6de2 ("tipc: introduce TIPC encryption & authentication")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sysfs paths changed, updating to the current ones.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>