Commit Graph

916355 Commits

Author SHA1 Message Date
Xiyu Yang
59e1947ca0 ALSA: usb-audio: Fix usb audio refcnt leak when getting spdif
snd_microii_spdif_default_get() invokes snd_usb_lock_shutdown(), which
increases the refcount of the snd_usb_audio object "chip".

When snd_microii_spdif_default_get() returns, local variable "chip"
becomes invalid, so the refcount should be decreased to keep refcount
balanced.

The reference counting issue happens in several exception handling paths
of snd_microii_spdif_default_get(). When those error scenarios occur
such as usb_ifnum_to_if() returns NULL, the function forgets to decrease
the refcnt increased by snd_usb_lock_shutdown(), causing a refcnt leak.

Fix this issue by jumping to "end" label when those error scenarios
occur.

Fixes: 447d6275f0 ("ALSA: usb-audio: Add sanity checks for endpoint accesses")
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/1587617711-13200-1-git-send-email-xiyuyang19@fudan.edu.cn
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-04-23 09:10:05 +02:00
David Ahern
257d7d4f0e libbpf: Only check mode flags in get_xdp_id
The commit in the Fixes tag changed get_xdp_id to only return prog_id
if flags is 0, but there are other XDP flags than the modes - e.g.,
XDP_FLAGS_UPDATE_IF_NOEXIST. Since the intention was only to look at
MODE flags, clear other ones before checking if flags is 0.

Fixes: f07cbad297 ("libbpf: Fix bpf_get_link_xdp_id flags handling")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrey Ignatov <rdna@fb.com>
2020-04-22 22:07:22 -07:00
David S. Miller
30685b2a43 Merge branch 'dpaa2-eth-add-support-for-xdp-bulk-enqueue'
Ioana Ciornei says:

====================
dpaa2-eth: add support for xdp bulk enqueue

The first patch moves the DEV_MAP_BULK_SIZE macro into the xdp.h header
file so that drivers can take advantage of it and use it.

The following 3 patches are there to setup the scene for using the bulk
enqueue feature.  First of all, the prototype of the enqueue function is
changed so that it returns the number of enqueued frames. Second, the
bulk enqueue interface is used but without any functional changes, still
one frame at a time is enqueued.  Third, the .ndo_xdp_xmit callback is
split into two stages, create all FDs for the xdp_frames received and
then enqueue them.

The last patch of the series builds on top of the others and instead of
issuing an enqueue operation for each FD it issues a bulk enqueue call
for as many frames as possible. This is repeated until all frames are
enqueued or the maximum number of retries is hit. We do not use the
XDP_XMIT_FLUSH flag since the architecture is not capable to store all
frames dequeued in a NAPI cycle, instead we send out right away all
frames received in a .ndo_xdp_xmit call.

Changes in v2:
 - statically allocate an array of dpaa2_fd by frame queue
 - use the DEV_MAP_BULK_SIZE as the maximum number of xdp_frames
   received in .ndo_xdp_xmit()
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 20:11:29 -07:00
Ioana Ciornei
8665d9780e dpaa2-eth: use bulk enqueue in .ndo_xdp_xmit
Take advantage of the bulk enqueue feature in .ndo_xdp_xmit.
We cannot use the XDP_XMIT_FLUSH since the architecture is not capable
to store all the frames dequeued in a NAPI cycle so we instead are
enqueueing all the frames received in a ndo_xdp_xmit call right away.

After setting up all FDs for the xdp_frames received, enqueue multiple
frames at a time until all are sent or the maximum number of retries is
hit.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 20:11:29 -07:00
Ioana Ciornei
6aa40b9e5b dpaa2-eth: split the .ndo_xdp_xmit callback into two stages
Instead of having a function that both creates a frame descriptor from
an xdp_frame and enqueues it, split this into two stages.
Add the dpaa2_eth_xdp_create_fd that just transforms an xdp_frame into a
FD while the actual enqueue callback is called directly from the ndo for
each frame.
This is particulary useful in conjunction with bulk enqueue.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 20:11:29 -07:00
Ioana Ciornei
6ff8044751 dpaa2-eth: use the bulk ring mode enqueue interface
Update the dpaa2-eth driver to use the bulk enqueue function introduced
with the change to QBMAN ring mode. At the moment, no functional changes
are made but rather the driver just transitions to the new interface
while still enqueuing just one frame at a time.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 20:11:29 -07:00
Ioana Ciornei
48c0481e5a dpaa2-eth: return num_enqueued frames from enqueue callback
The enqueue dpaa2-eth callback now returns the number of successfully
enqueued frames. This is a preliminary patch necessary for adding
support for bulk ring mode enqueue.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 20:11:29 -07:00
Ioana Ciornei
788f87ac60 xdp: export the DEV_MAP_BULK_SIZE macro
Export the DEV_MAP_BULK_SIZE macro to the header file so that drivers
can directly use it as the maximum number of xdp_frames received in the
.ndo_xdp_xmit() callback.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 20:11:29 -07:00
David Ahern
493f3cc7ee selftests: A few improvements to fib_nexthops.sh
Add nodad when adding IPv6 addresses and remove the sleep.

A recent change to iproute2 moved the 'pref medium' to the prefix
(where it belongs). Change the expected route check to strip
'pref medium' to be compatible with old and new iproute2.

Add IPv4 runtime test with an IPv6 address as the gateway in
the default route.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:59:57 -07:00
David Ahern
7c74b0bec9 ipv4: Update fib_select_default to handle nexthop objects
A user reported [0] hitting the WARN_ON in fib_info_nh:

    [ 8633.839816] ------------[ cut here ]------------
    [ 8633.839819] WARNING: CPU: 0 PID: 1719 at include/net/nexthop.h:251 fib_select_path+0x303/0x381
    ...
    [ 8633.839846] RIP: 0010:fib_select_path+0x303/0x381
    ...
    [ 8633.839848] RSP: 0018:ffffb04d407f7d00 EFLAGS: 00010286
    [ 8633.839850] RAX: 0000000000000000 RBX: ffff9460b9897ee8 RCX: 00000000000000fe
    [ 8633.839851] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 0000000000000000
    [ 8633.839852] RBP: ffff946076049850 R08: 0000000059263a83 R09: ffff9460840e4000
    [ 8633.839853] R10: 0000000000000014 R11: 0000000000000000 R12: ffffb04d407f7dc0
    [ 8633.839854] R13: ffffffffa4ce3240 R14: 0000000000000000 R15: ffff9460b7681f60
    [ 8633.839857] FS:  00007fcac2e02700(0000) GS:ffff9460bdc00000(0000) knlGS:0000000000000000
    [ 8633.839858] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 8633.839859] CR2: 00007f27beb77e28 CR3: 0000000077734000 CR4: 00000000000006f0
    [ 8633.839867] Call Trace:
    [ 8633.839871]  ip_route_output_key_hash_rcu+0x421/0x890
    [ 8633.839873]  ip_route_output_key_hash+0x5e/0x80
    [ 8633.839876]  ip_route_output_flow+0x1a/0x50
    [ 8633.839878]  __ip4_datagram_connect+0x154/0x310
    [ 8633.839880]  ip4_datagram_connect+0x28/0x40
    [ 8633.839882]  __sys_connect+0xd6/0x100
    ...

The WARN_ON is triggered in fib_select_default which is invoked when
there are multiple default routes. Update the function to use
fib_info_nhc and convert the nexthop checks to use fib_nh_common.

Add test case that covers the affected code path.

[0] https://github.com/FRRouting/frr/issues/6089

Fixes: 493ced1ac4 ("ipv4: Allow routes to use nexthop objects")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:57:39 -07:00
Salvatore Bonaccorso
c0259664c6 netlabel: Kconfig: Update reference for NetLabel Tools project
The NetLabel Tools project has moved from http://netlabel.sf.net to a
GitHub project. Update to directly refer to the new home for the tools.

Signed-off-by: Salvatore Bonaccorso <carnil@debian.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:55:01 -07:00
Ioana Ciornei
31fa51ad7c MAINTAINERS: update dpaa2-eth maintainer list
Add myself as another maintainer of dpaa2-eth.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:54:00 -07:00
David S. Miller
250562aff7 Merge branch 'Add-selftests-for-pedit-ex-munge-ip6-dsfield'
Petr Machata says:

====================
Add selftests for pedit ex munge ip6 dsfield

Patch #1 extends the existing generic forwarding selftests to cover pedit
ex munge ip6 traffic_class as well. Patch #2 adds TDC test coverage.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:48:57 -07:00
Petr Machata
f132ccc56e selftests: tc-testing: Add a TDC test for pedit munge ip6 dsfield
Add a self-test for the IPv6 dsfield munge that iproute2 will support.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:48:57 -07:00
Petr Machata
93e106da6a selftests: forwarding: pedit_dsfield: Add pedit munge ip6 dsfield
Extend the pedit_dsfield forwarding selftest with coverage of "pedit ex
munge ip6 dsfield set".

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:48:57 -07:00
Paolo Abeni
9a19371bf0 mptcp: fix data_fin handing in RX path
The data fin flag is set only via a DSS option, but
mptcp_incoming_options() copies it unconditionally from the
provided RX options.

Since we do not clear all the mptcp sock RX options in a
socket free/alloc cycle, we can end-up with a stray data_fin
value while parsing e.g. MPC packets.

That would lead to mapping data corruption and will trigger
a few WARN_ON() in the RX path.

Instead of adding a costly memset(), fetch the data_fin flag
only for DSS packets - when we always explicitly initialize
such bit at option parsing time.

Fixes: 648ef4b886 ("mptcp: Implement MPTCP receive path")
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:46:01 -07:00
David S. Miller
ec403d880b Merge branch 'add-TJA1102-support'
Oleksij Rempel says:

====================
add TJA1102 support

changes v5:
- rename __of_mdiobus_register_phy() to of_mdiobus_phy_device_register()

changes v4:
- remove unused phy_id variable

changes v3:
- export part of of_mdiobus_register_phy() and reuse it in tja11xx
  driver
- coding style fixes

changes v2:
- use .match_phy_device
- add irq support
- add add delayed registration for PHY1
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:41:26 -07:00
Oleksij Rempel
6a64d3cdc5 net: phy: tja11xx: add delayed registration of TJA1102 PHY1
TJA1102 is a dual PHY package with PHY0 having proper PHYID and PHY1
having no ID. On one hand it is possible to for PHY detection by
compatible, on other hand we should be able to reset complete chip
before PHY1 configured it, and we need to define dependencies for proper
power management.

We can solve it by defining PHY1 as child of PHY0:
	tja1102_phy0: ethernet-phy@4 {
		reg = <0x4>;

		interrupts-extended = <&gpio5 8 IRQ_TYPE_LEVEL_LOW>;

		reset-gpios = <&gpio5 9 GPIO_ACTIVE_LOW>;
		reset-assert-us = <20>;
		reset-deassert-us = <2000>;

		tja1102_phy1: ethernet-phy@5 {
			reg = <0x5>;

			interrupts-extended = <&gpio5 8 IRQ_TYPE_LEVEL_LOW>;
		};
	};

The PHY1 should be a subnode of PHY0 and registered only after PHY0 was
completely reset and initialized.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:41:26 -07:00
Oleksij Rempel
5972157c2d net: mdio: of: export part of of_mdiobus_register_phy()
This function will be needed in tja11xx driver for secondary PHY
support.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:41:26 -07:00
Oleksij Rempel
8f469506de net: phy: tja11xx: add initial TJA1102 support
TJA1102 is an dual T1 PHY chip. Both PHYs are separately addressable.
Both PHYs are similar but have different amount of functionality. For
example PHY 1 has no PHY ID and no health monitor.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:41:26 -07:00
Oleksij Rempel
d9cc193cf0 dt-bindings: net: phy: Add support for NXP TJA11xx
Document the NXP TJA11xx PHY bindings.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:41:26 -07:00
Tang Bin
d0f0c55e7c net: phy: Use IS_ERR() to check and simplify code
Use IS_ERR() and PTR_ERR() instead of PTR_ZRR_OR_ZERO()
to simplify code, avoid redundant paramenter definitions
and judgements.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:40:07 -07:00
Oleksij Rempel
bcf3440c6d net: phy: micrel: add phy-mode support for the KSZ9031 PHY
Add support for following phy-modes: rgmii, rgmii-id, rgmii-txid, rgmii-rxid.

This PHY has an internal RX delay of 1.2ns and no delay for TX.

The pad skew registers allow to set the total TX delay to max 1.38ns and
the total RX delay to max of 2.58ns (configurable 1.38ns + build in
1.2ns) and a minimal delay of 0ns.

According to the RGMII v1.3 specification the delay provided by PCB traces
should be between 1.5ns and 2.0ns. The RGMII v2.0 allows to provide this
delay by MAC or PHY. So, we configure this PHY to the best values we can
get by this HW: TX delay to 1.38ns (max supported value) and RX delay to
1.80ns (best calculated delay)

The phy-modes can still be fine tuned/overwritten by *-skew-ps
device tree properties described in:
Documentation/devicetree/bindings/net/micrel-ksz90x1.txt

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Philippe Schenker <philippe.schenker@toradex.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:38:46 -07:00
Jason Yan
8518307dc2 net: caif: use true,false for bool variables
Fix the following coccicheck warning:

net/caif/caif_dev.c:410:2-13: WARNING: Assignment of 0/1 to bool
variable
net/caif/caif_dev.c:445:2-13: WARNING: Assignment of 0/1 to bool
variable
net/caif/caif_dev.c:145:1-12: WARNING: Assignment of 0/1 to bool
variable
net/caif/caif_dev.c:223:1-12: WARNING: Assignment of 0/1 to bool
variable

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:37:48 -07:00
Chuah, Kim Tatt
c89f44ff10 net: stmmac: Add support for VLAN promiscuous mode
For dwmac4, enable VLAN promiscuity when MAC controller is requested to
enter promiscuous mode.

Signed-off-by: Chuah, Kim Tatt <kim.tatt.chuah@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Tan, Tee Min <tee.min.tan@intel.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:35:48 -07:00
Wei Yongjun
58e64a312c macvlan: silence RCU list debugging warning
macvlan_hash_lookup() uses list_for_each_entry_rcu() for traversing
should either under RCU in fast path or the protection of rtnl_mutex.

In the case of holding RTNL, we should add the corresponding lockdep
expression to silence the following false-positive warning:

=============================
WARNING: suspicious RCU usage
5.7.0-rc1-next-20200416-00003-ga3b8d28bc #1 Not tainted
-----------------------------
drivers/net/macvlan.c:126 RCU-list traversed in non-reader section!!

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:33:19 -07:00
David Ahern
3f251d7411 selftests: Add tests for vrf and xfrms
Add tests for vrf and xfrms with a second round after adding a
qdisc. There are a few known problems documented with the test
cases that fail. The fix is non-trivial; will come back to it
when time allows.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:31:40 -07:00
David Ahern
a53c102872 vrf: Fix IPv6 with qdisc and xfrm
When a qdisc is attached to the VRF device, the packet goes down the ndo
xmit function which is setup to send the packet back to the VRF driver
which does a lookup to send the packet out. The lookup in the VRF driver
is not considering xfrm policies. Change it to use ip6_dst_lookup_flow
rather than ip6_route_output.

Fixes: 35402e3136 ("net: Add IPv6 support to VRF device")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:30:38 -07:00
Stephen Hemminger
5cc4adbcfc Documentation: add documentation of ping_group_range
Support for non-root users to send ICMP ECHO requests was added
back in Linux 3.0 kernel, but the documentation for the sysctl
to enable it has been missing.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:29:26 -07:00
David S. Miller
609120c52d Merge branch 'sctp-fixes'
Jere Leppänen says:

====================
sctp: Fix problems with peer restart when in SHUTDOWN-PENDING state and socket is closed

These patches are related to the scenario described in commit
bdf6fa52f0 ("sctp: handle association restarts when the socket is
closed."). To recap, when our association is in SHUTDOWN-PENDING state
and we've closed our one-to-one socket, while the peer crashes without
being detected, restarts and reconnects using the same addresses and
ports, we start association shutdown.

In this case, Cumulative TSN Ack in the SHUTDOWN that we send has
always been incorrect. Additionally, bundling of the SHUTDOWN with the
COOKIE-ACK was broken by a later commit. This series fixes both of
these issues.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:27:40 -07:00
Jere Leppänen
12dfd78e3a sctp: Fix SHUTDOWN CTSN Ack in the peer restart case
When starting shutdown in sctp_sf_do_dupcook_a(), get the value for
SHUTDOWN Cumulative TSN Ack from the new association, which is
reconstructed from the cookie, instead of the old association, which
the peer doesn't have anymore.

Otherwise the SHUTDOWN is either ignored or replied to with an ABORT
by the peer because CTSN Ack doesn't match the peer's Initial TSN.

Fixes: bdf6fa52f0 ("sctp: handle association restarts when the socket is closed.")
Signed-off-by: Jere Leppänen <jere.leppanen@nokia.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:27:40 -07:00
Jere Leppänen
145cb2f717 sctp: Fix bundling of SHUTDOWN with COOKIE-ACK
When we start shutdown in sctp_sf_do_dupcook_a(), we want to bundle
the SHUTDOWN with the COOKIE-ACK to ensure that the peer receives them
at the same time and in the correct order. This bundling was broken by
commit 4ff40b8626 ("sctp: set chunk transport correctly when it's a
new asoc"), which assigns a transport for the COOKIE-ACK, but not for
the SHUTDOWN.

Fix this by passing a reference to the COOKIE-ACK chunk as an argument
to sctp_sf_do_9_2_start_shutdown() and onward to
sctp_make_shutdown(). This way the SHUTDOWN chunk is assigned the same
transport as the COOKIE-ACK chunk, which allows them to be bundled.

In sctp_sf_do_9_2_start_shutdown(), the void *arg parameter was
previously unused. Now that we're taking it into use, it must be a
valid pointer to a chunk, or NULL. There is only one call site where
it's not, in sctp_sf_autoclose_timer_expire(). Fix that too.

Fixes: 4ff40b8626 ("sctp: set chunk transport correctly when it's a new asoc")
Signed-off-by: Jere Leppänen <jere.leppanen@nokia.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:27:40 -07:00
Xiaoliang Yang
3c7b51bd39 net: dsa: felix: allow flooding for all traffic classes
Right now it can be seen that the VSC9959 (Felix) switch will not flood
frames if they have a VLAN tag with a PCP of 1-7 (nonzero).

It turns out that Felix is quite different from its cousin, Ocelot, in
that frame flooding can be allowed/denied per traffic class. Where
Ocelot has 1 instance of the ANA_FLOODING register, Felix has 8.

The approach that this driver is going to take is "thanks, but no
thanks". We have no use case of limiting the flooding domain based on
traffic class, so we just want to allow packets to be flooded, no matter
what traffic class they have.

So we copy the line of code from ocelot.c which does the one-shot
initialization of the flooding PGIDs, and we add it to felix.c as well -
except replicated 8 times.

Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:26:33 -07:00
Vladimir Oltean
72579e14a1 net: dsa: don't fail to probe if we couldn't set the MTU
There is no reason to fail the probing of the switch if the MTU couldn't
be configured correctly (either the switch port itself, or the host
port) for whatever reason. MTU-sized traffic probably won't work, sure,
but we can still probably limp on and support some form of communication
anyway, which the users would probably appreciate more.

Fixes: bfcb813203 ("net: dsa: configure the MTU for switch ports")
Reported-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:22:59 -07:00
Eric Dumazet
a1211bf9a7 sched: etf: do not assume all sockets are full blown
skb->sk does not always point to a full blown socket,
we need to use sk_fullsock() before accessing fields which
only make sense on full socket.

BUG: KASAN: use-after-free in report_sock_error+0x286/0x300 net/sched/sch_etf.c:141
Read of size 1 at addr ffff88805eb9b245 by task syz-executor.5/9630

CPU: 1 PID: 9630 Comm: syz-executor.5 Not tainted 5.7.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x188/0x20d lib/dump_stack.c:118
 print_address_description.constprop.0.cold+0xd3/0x315 mm/kasan/report.c:382
 __kasan_report.cold+0x35/0x4d mm/kasan/report.c:511
 kasan_report+0x33/0x50 mm/kasan/common.c:625
 report_sock_error+0x286/0x300 net/sched/sch_etf.c:141
 etf_enqueue_timesortedlist+0x389/0x740 net/sched/sch_etf.c:170
 __dev_xmit_skb net/core/dev.c:3710 [inline]
 __dev_queue_xmit+0x154a/0x30a0 net/core/dev.c:4021
 neigh_hh_output include/net/neighbour.h:499 [inline]
 neigh_output include/net/neighbour.h:508 [inline]
 ip6_finish_output2+0xfb5/0x25b0 net/ipv6/ip6_output.c:117
 __ip6_finish_output+0x442/0xab0 net/ipv6/ip6_output.c:143
 ip6_finish_output+0x34/0x1f0 net/ipv6/ip6_output.c:153
 NF_HOOK_COND include/linux/netfilter.h:296 [inline]
 ip6_output+0x239/0x810 net/ipv6/ip6_output.c:176
 dst_output include/net/dst.h:435 [inline]
 NF_HOOK include/linux/netfilter.h:307 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 ip6_xmit+0xe1a/0x2090 net/ipv6/ip6_output.c:280
 tcp_v6_send_synack+0x4e7/0x960 net/ipv6/tcp_ipv6.c:521
 tcp_rtx_synack+0x10d/0x1a0 net/ipv4/tcp_output.c:3916
 inet_rtx_syn_ack net/ipv4/inet_connection_sock.c:669 [inline]
 reqsk_timer_handler+0x4c2/0xb40 net/ipv4/inet_connection_sock.c:763
 call_timer_fn+0x1ac/0x780 kernel/time/timer.c:1405
 expire_timers kernel/time/timer.c:1450 [inline]
 __run_timers kernel/time/timer.c:1774 [inline]
 __run_timers kernel/time/timer.c:1741 [inline]
 run_timer_softirq+0x623/0x1600 kernel/time/timer.c:1787
 __do_softirq+0x26c/0x9f7 kernel/softirq.c:292
 invoke_softirq kernel/softirq.c:373 [inline]
 irq_exit+0x192/0x1d0 kernel/softirq.c:413
 exiting_irq arch/x86/include/asm/apic.h:546 [inline]
 smp_apic_timer_interrupt+0x19e/0x600 arch/x86/kernel/apic/apic.c:1140
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829
 </IRQ>
RIP: 0010:des_encrypt+0x157/0x9c0 lib/crypto/des.c:792
Code: 85 22 06 00 00 41 31 dc 41 8b 4d 04 44 89 e2 41 83 e4 3f 4a 8d 3c a5 60 72 72 88 81 e2 3f 3f 3f 3f 48 89 f8 48 c1 e8 03 31 d9 <0f> b6 34 28 48 89 f8 c1 c9 04 83 e0 07 83 c0 03 40 38 f0 7c 09 40
RSP: 0018:ffffc90003b5f6c0 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
RAX: 1ffffffff10e4e55 RBX: 00000000d2f846d0 RCX: 00000000d2f846d0
RDX: 0000000012380612 RSI: ffffffff839863ca RDI: ffffffff887272a8
RBP: dffffc0000000000 R08: ffff888091d0a380 R09: 0000000000800081
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000012
R13: ffff8880a8ae8078 R14: 00000000c545c93e R15: 0000000000000006
 cipher_crypt_one crypto/cipher.c:75 [inline]
 crypto_cipher_encrypt_one+0x124/0x210 crypto/cipher.c:82
 crypto_cbcmac_digest_update+0x1b5/0x250 crypto/ccm.c:830
 crypto_shash_update+0xc4/0x120 crypto/shash.c:119
 shash_ahash_update+0xa3/0x110 crypto/shash.c:246
 crypto_ahash_update include/crypto/hash.h:547 [inline]
 hash_sendmsg+0x518/0xad0 crypto/algif_hash.c:102
 sock_sendmsg_nosec net/socket.c:652 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:672
 ____sys_sendmsg+0x308/0x7e0 net/socket.c:2362
 ___sys_sendmsg+0x100/0x170 net/socket.c:2416
 __sys_sendmmsg+0x195/0x480 net/socket.c:2506
 __do_sys_sendmmsg net/socket.c:2535 [inline]
 __se_sys_sendmmsg net/socket.c:2532 [inline]
 __x64_sys_sendmmsg+0x99/0x100 net/socket.c:2532
 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
 entry_SYSCALL_64_after_hwframe+0x49/0xb3
RIP: 0033:0x45c829
Code: 0d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f6d9528ec78 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 00000000004fc080 RCX: 000000000045c829
RDX: 0000000000000001 RSI: 0000000020002640 RDI: 0000000000000004
RBP: 000000000078bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000000008d7 R14: 00000000004cb7aa R15: 00007f6d9528f6d4

Fixes: 4b15c70753 ("net/sched: Make etf report drops on error_queue")
Fixes: 25db26a913 ("net/sched: Introduce the ETF Qdisc")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:20:28 -07:00
Jason Yan
d013496f99 tracing: Convert local functions in tracing_map.c to static
Fix the following sparse warning:

kernel/trace/tracing_map.c:286:6: warning: symbol
'tracing_map_array_clear' was not declared. Should it be static?
kernel/trace/tracing_map.c:297:6: warning: symbol
'tracing_map_array_free' was not declared. Should it be static?
kernel/trace/tracing_map.c:319:26: warning: symbol
'tracing_map_array_alloc' was not declared. Should it be static?

Link: http://lkml.kernel.org/r/20200410073312.38855-1-yanaijie@huawei.com

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-04-22 22:07:26 -04:00
Nikolay Borisov
a2806ef77f tracing: Remove DECLARE_TRACE_NOARGS
This macro was intentionally broken so that the kernel code is not
poluted with such noargs macro used simply as markers. This use case
can be satisfied by using dummy no inline functions. Just remove it.

Link: http://lkml.kernel.org/r/20200413153246.8511-1-nborisov@suse.com

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-04-22 22:06:35 -04:00
Steven Rostedt (VMware)
353da87921 ftrace: Fix memory leak caused by not freeing entry in unregister_ftrace_direct()
kmemleak reported the following:

unreferenced object 0xffff90d47127a920 (size 32):
  comm "modprobe", pid 1766, jiffies 4294792031 (age 162.568s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 22 01 00 00 00 00 ad de  ........".......
    00 78 12 a7 ff ff ff ff 00 00 b6 c0 ff ff ff ff  .x..............
  backtrace:
    [<00000000bb79e72e>] register_ftrace_direct+0xcb/0x3a0
    [<00000000295e4f79>] do_one_initcall+0x72/0x340
    [<00000000873ead18>] do_init_module+0x5a/0x220
    [<00000000974d9de5>] load_module+0x2235/0x2550
    [<0000000059c3d6ce>] __do_sys_finit_module+0xc0/0x120
    [<000000005a8611b4>] do_syscall_64+0x60/0x230
    [<00000000a0cdc49e>] entry_SYSCALL_64_after_hwframe+0x49/0xb3

The entry used to save the direct descriptor needs to be freed
when unregistering.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-04-22 22:04:27 -04:00
Masahiro Yamada
ff4634eeed Documentation: kbuild: fix the section title format
Make it consistent with the other sections.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-04-23 10:53:19 +09:00
Vitor Massaru Iha
63ec90f182 um: ensure make ARCH=um mrproper removes arch/$(SUBARCH)/include/generated/
In this workflow:

$ make ARCH=um defconfig && make ARCH=um -j8
  [snip]
$ make ARCH=um mrproper
  [snip]
$ make ARCH=um defconfig O=./build_um && make ARCH=um -j8 O=./build_um
  [snip]
  CC      scripts/mod/empty.o
In file included from ../include/linux/types.h:6,
                 from ../include/linux/mod_devicetable.h:12,
                 from ../scripts/mod/devicetable-offsets.c:3:
../include/uapi/linux/types.h:5:10: fatal error: asm/types.h: No such file or directory
    5 | #include <asm/types.h>
      |          ^~~~~~~~~~~~~
compilation terminated.
make[2]: *** [../scripts/Makefile.build💯 scripts/mod/devicetable-offsets.s] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [/home/iha/sdb/opensource/lkmp/linux-kselftest.git/Makefile:1140: prepare0] Error 2
make[1]: Leaving directory '/home/iha/sdb/opensource/lkmp/linux-kselftest.git/build_um'
make: *** [Makefile:180: sub-make] Error 2

The cause of the error was because arch/$(SUBARCH)/include/generated files
weren't properly cleaned by `make ARCH=um mrproper`.

Fixes: a788b2ed81 ("kbuild: check arch/$(SRCARCH)/include/generated before out-of-tree build")
Reported-by: Theodore Ts'o <tytso@mit.edu>
Suggested-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Vitor Massaru Iha <vitor@massaru.org>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Tested-by: Brendan Higgins <brendanhiggins@google.com>
Link: https://groups.google.com/forum/#!msg/kunit-dev/QmA27YEgEgI/hvS1kiz2CwAJ
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-04-23 10:50:26 +09:00
Masahiro Yamada
62d0fd591d arch: split MODULE_ARCH_VERMAGIC definitions out to <asm/vermagic.h>
As the bug report [1] pointed out, <linux/vermagic.h> must be included
after <linux/module.h>.

I believe we should not impose any include order restriction. We often
sort include directives alphabetically, but it is just coding style
convention. Technically, we can include header files in any order by
making every header self-contained.

Currently, arch-specific MODULE_ARCH_VERMAGIC is defined in
<asm/module.h>, which is not included from <linux/vermagic.h>.

Hence, the straight-forward fix-up would be as follows:

|--- a/include/linux/vermagic.h
|+++ b/include/linux/vermagic.h
|@@ -1,5 +1,6 @@
| /* SPDX-License-Identifier: GPL-2.0 */
| #include <generated/utsrelease.h>
|+#include <linux/module.h>
|
| /* Simply sanity version stamp for modules. */
| #ifdef CONFIG_SMP

This works enough, but for further cleanups, I split MODULE_ARCH_VERMAGIC
definitions into <asm/vermagic.h>.

With this, <linux/module.h> and <linux/vermagic.h> will be orthogonal,
and the location of MODULE_ARCH_VERMAGIC definitions will be consistent.

For arc and ia64, MODULE_PROC_FAMILY is only used for defining
MODULE_ARCH_VERMAGIC. I squashed it.

For hexagon, nds32, and xtensa, I removed <asm/modules.h> entirely
because they contained nothing but MODULE_ARCH_VERMAGIC definition.
Kbuild will automatically generate <asm/modules.h> at build-time,
wrapping <asm-generic/module.h>.

[1] https://lore.kernel.org/lkml/20200411155623.GA22175@zn.tnic

Reported-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Jessica Yu <jeyu@kernel.org>
2020-04-23 10:50:26 +09:00
Masahiro Yamada
3d4b223868 kbuild: fix DT binding schema rule again to avoid needless rebuilds
Since commit 7a04960560 ("kbuild: fix DT binding schema rule to detect
command line changes"), this rule is every time re-run even if you change
nothing.

cmd_dtc takes one additional parameter to pass to the -O option of dtc.

We need to pass 'yaml' to if_changed_rule. Otherwise, cmd-check invoked
from if_changed_rule is false positive.

Fixes: 7a04960560 ("kbuild: fix DT binding schema rule to detect command line changes")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-04-23 10:50:26 +09:00
Vamshi K Sthambamkadi
9da73974eb tracing: Fix memory leaks in trace_events_hist.c
kmemleak report 1:
    [<9092c50b>] kmem_cache_alloc_trace+0x138/0x270
    [<05a2c9ed>] create_field_var+0xcf/0x180
    [<528a2d68>] action_create+0xe2/0xc80
    [<63f50b61>] event_hist_trigger_func+0x15b5/0x1920
    [<28ea5d3d>] trigger_process_regex+0x7b/0xc0
    [<3138e86f>] event_trigger_write+0x4d/0xb0
    [<ffd66c19>] __vfs_write+0x30/0x200
    [<4f424a0d>] vfs_write+0x96/0x1b0
    [<da59a290>] ksys_write+0x53/0xc0
    [<3717101a>] __ia32_sys_write+0x15/0x20
    [<c5f23497>] do_fast_syscall_32+0x70/0x250
    [<46e2629c>] entry_SYSENTER_32+0xaf/0x102

This is because save_vars[] of struct hist_trigger_data are
not destroyed

kmemleak report 2:
    [<9092c50b>] kmem_cache_alloc_trace+0x138/0x270
    [<6e5e97c5>] create_var+0x3c/0x110
    [<de82f1b9>] create_field_var+0xaf/0x180
    [<528a2d68>] action_create+0xe2/0xc80
    [<63f50b61>] event_hist_trigger_func+0x15b5/0x1920
    [<28ea5d3d>] trigger_process_regex+0x7b/0xc0
    [<3138e86f>] event_trigger_write+0x4d/0xb0
    [<ffd66c19>] __vfs_write+0x30/0x200
    [<4f424a0d>] vfs_write+0x96/0x1b0
    [<da59a290>] ksys_write+0x53/0xc0
    [<3717101a>] __ia32_sys_write+0x15/0x20
    [<c5f23497>] do_fast_syscall_32+0x70/0x250
    [<46e2629c>] entry_SYSENTER_32+0xaf/0x102

struct hist_field allocated through create_var() do not initialize
"ref" field to 1. The code in __destroy_hist_field() does not destroy
object if "ref" is initialized to zero, the condition
if (--hist_field->ref > 1) always passes since unsigned int wraps.

kmemleak report 3:
    [<f8666fcc>] __kmalloc_track_caller+0x139/0x2b0
    [<bb7f80a5>] kstrdup+0x27/0x50
    [<39d70006>] init_var_ref+0x58/0xd0
    [<8ca76370>] create_var_ref+0x89/0xe0
    [<f045fc39>] action_create+0x38f/0xc80
    [<7c146821>] event_hist_trigger_func+0x15b5/0x1920
    [<07de3f61>] trigger_process_regex+0x7b/0xc0
    [<e87daf8f>] event_trigger_write+0x4d/0xb0
    [<19bf1512>] __vfs_write+0x30/0x200
    [<64ce4d27>] vfs_write+0x96/0x1b0
    [<a6f34170>] ksys_write+0x53/0xc0
    [<7d4230cd>] __ia32_sys_write+0x15/0x20
    [<8eadca00>] do_fast_syscall_32+0x70/0x250
    [<235cf985>] entry_SYSENTER_32+0xaf/0x102

hist_fields (system & event_name) are not freed

Link: http://lkml.kernel.org/r/20200422061503.GA5151@cosmos

Signed-off-by: Vamshi K Sthambamkadi <vamshi.k.sthambamkadi@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-04-22 21:24:00 -04:00
Julien Thierry
7f9b34f36c objtool: Fix off-by-one in symbol_by_offset()
Sometimes, WARN_FUNC() and other users of symbol_by_offset() will
associate the first instruction of a symbol with the symbol preceding
it.  This is because symbol->offset + symbol->len is already outside of
the symbol's range.

Fixes: 2a362ecc3e ("objtool: Optimize find_symbol_*() and read_symbols()")
Signed-off-by: Julien Thierry <jthierry@redhat.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
2020-04-22 23:14:46 +02:00
Ian Rogers
f3bed55e85 perf/core: fix parent pid/tid in task exit events
Current logic yields the child task as the parent.

Before:
$ perf record bash -c "perf list > /dev/null"
$ perf script -D |grep 'FORK\|EXIT'
4387036190981094 0x5a70 [0x30]: PERF_RECORD_FORK(10472:10472):(10470:10470)
4387036606207580 0xf050 [0x30]: PERF_RECORD_EXIT(10472:10472):(10472:10472)
4387036607103839 0x17150 [0x30]: PERF_RECORD_EXIT(10470:10470):(10470:10470)
                                                   ^
  Note the repeated values here -------------------/

After:
383281514043 0x9d8 [0x30]: PERF_RECORD_FORK(2268:2268):(2266:2266)
383442003996 0x2180 [0x30]: PERF_RECORD_EXIT(2268:2268):(2266:2266)
383451297778 0xb70 [0x30]: PERF_RECORD_EXIT(2266:2266):(2265:2265)

Fixes: 94d5d1b2d8 ("perf_counter: Report the cloning task as parent on perf_counter_fork()")
Reported-by: KP Singh <kpsingh@google.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200417182842.12522-1-irogers@google.com
2020-04-22 23:10:14 +02:00
Quentin Perret
eaf5a92ebd sched/core: Fix reset-on-fork from RT with uclamp
uclamp_fork() resets the uclamp values to their default when the
reset-on-fork flag is set. It also checks whether the task has a RT
policy, and sets its uclamp.min to 1024 accordingly. However, during
reset-on-fork, the task's policy is lowered to SCHED_NORMAL right after,
hence leading to an erroneous uclamp.min setting for the new task if it
was forked from RT.

Fix this by removing the unnecessary check on rt_task() in
uclamp_fork() as this doesn't make sense if the reset-on-fork flag is
set.

Fixes: 1a00d99997 ("sched/uclamp: Set default clamps for RT tasks")
Reported-by: Chitti Babu Theegala <ctheegal@codeaurora.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Patrick Bellasi <patrick.bellasi@matbug.net>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lkml.kernel.org/r/20200416085956.217587-1-qperret@google.com
2020-04-22 23:10:13 +02:00
Giovanni Gherdovich
db441bd9f6 x86, sched: Move check for CPU type to caller function
Improve readability of the function intel_set_max_freq_ratio() by moving
the check for KNL CPUs there, together with checks for GLM and SKX.

Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lkml.kernel.org/r/20200416054745.740-5-ggherdovich@suse.cz
2020-04-22 23:10:13 +02:00
Peter Zijlstra (Intel)
b56e7d45e8 x86, sched: Don't enable static key when starting secondary CPUs
The static key arch_scale_freq_key only needs to be enabled once (at
boot). This change fixes a bug by which the key was enabled every time cpu0
is started, even as a secondary CPU during cpu hotplug. Secondary CPUs are
started from the idle thread: setting a static key from there means
acquiring a lock and may result in sleeping in the idle task, causing CPU
lockup.

Another consequence of this change is that init_counter_refs() is now
called on each CPU correctly; previously the function on_each_cpu() was
used, but it was called at boot when the only online cpu is cpu0.

[ggherdovich@suse.cz: Tested and wrote changelog]
Fixes: 1567c3e346 ("x86, sched: Add support for frequency invariance")
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lkml.kernel.org/r/20200416054745.740-4-ggherdovich@suse.cz
2020-04-22 23:10:13 +02:00
Giovanni Gherdovich
23ccee22e8 x86, sched: Account for CPUs with less than 4 cores in freq. invariance
If a CPU has less than 4 physical cores, MSR_TURBO_RATIO_LIMIT will
rightfully report that the 4C turbo ratio is zero. In such cases, use the
1C turbo ratio instead for frequency invariance calculations.

Fixes: 1567c3e346 ("x86, sched: Add support for frequency invariance")
Reported-by: Like Xu <like.xu@linux.intel.com>
Reported-by: Neil Rickert <nwr10cst-oslnx@yahoo.com>
Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Link: https://lkml.kernel.org/r/20200416054745.740-3-ggherdovich@suse.cz
2020-04-22 23:10:13 +02:00
Giovanni Gherdovich
9a6c2c3c7a x86, sched: Bail out of frequency invariance if base frequency is unknown
Some hypervisors such as VMWare ESXi 5.5 advertise support for
X86_FEATURE_APERFMPERF but then fill all MSR's with zeroes. In particular,
MSR_PLATFORM_INFO set to zero tricks the code that wants to know the base
clock frequency of the CPU (highest non-turbo frequency), producing a
division by zero when computing the ratio turbo_freq/base_freq necessary
for frequency invariant accounting.

It is to be noted that even if MSR_PLATFORM_INFO contained the appropriate
data, APERF and MPERF are constantly zero on ESXi 5.5, thus freq-invariance
couldn't be done in principle (not that it would make a lot of sense in a
VM anyway). The real problem is advertising X86_FEATURE_APERFMPERF. This
appears to be fixed in more recent versions: ESXi 6.7 doesn't advertise
that feature.

Fixes: 1567c3e346 ("x86, sched: Add support for frequency invariance")
Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lkml.kernel.org/r/20200416054745.740-2-ggherdovich@suse.cz
2020-04-22 23:10:13 +02:00