Commit Graph

813892 Commits

Author SHA1 Message Date
Peter Oskolkov
9d6b3584a7 selftests: bpf: test_lwt_ip_encap: add negative tests.
As requested by David Ahern:

- add negative tests (no routes, explicitly unreachable destinations)
  to exercise error handling code paths;
- do not exit on test failures, but instead print a summary of
  passed/failed tests at the end.

Future patches will add TSO and VRF tests.

Signed-off-by: Peter Oskolkov <posk@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-16 18:41:44 -08:00
Alexandre Torgue
f186a82b10 net: stmmac: use correct define to get rx timestamp on GMAC4
In dwmac4_wrback_get_rx_timestamp_status we are looking for an RX timestamp.
Receive descriptors are handled there, so we should use the defines
related to receive descriptors. This does not change the functional behavior,
as RDES3_RDES1_VALID=TDES3_RS1V=BIT(26), but it makes the code easier to read.
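
For illustration only (not part of the patch), a minimal user-space sketch of
why the change is behaviour-neutral. The two bit values are the ones stated
above; BIT() is redefined locally and rdes3_has_rdes1_status() is a
hypothetical helper name, not the driver's function.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n)			(1U << (n))

/* Values as stated in the commit message: both defines are bit 26. */
#define TDES3_RS1V		BIT(26)	/* transmit descriptor define */
#define RDES3_RDES1_VALID	BIT(26)	/* receive descriptor define */

/* Hypothetical helper: test the status bit on a receive descriptor word. */
static bool rdes3_has_rdes1_status(uint32_t rdes3)
{
	return (rdes3 & RDES3_RDES1_VALID) != 0;
}
```

Since both macros expand to the same mask, swapping one for the other only
changes what the reader sees, not what the compiler emits.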

Signed-off-by: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-16 18:13:58 -08:00
Dan Carpenter
d0edde8d29 atm: clean up vcc_seq_next()
It's confusing to call PTR_ERR(v).  The PTR_ERR() function is basically
a fancy cast to long so it makes you wonder, was IS_ERR() intended?  But
that doesn't make sense because vcc_walk() doesn't return error
pointers.

This patch doesn't affect runtime, it's just a cleanup.
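
For illustration only, a simplified user-space approximation of the error
pointer helpers discussed above. The names mirror the kernel's, but these are
re-implementations written for this sketch, and MAX_ERRNO = 4095 is assumed.

```c
#include <stdbool.h>

/* Assumed upper bound on errno values encoded in pointers. */
#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

/* PTR_ERR() checks nothing: it only converts the pointer to long. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* IS_ERR() is the actual test: true only for the top MAX_ERRNO addresses. */
static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}
```

Calling PTR_ERR() on a pointer that can never be an error pointer therefore
just yields the address as a number, which is why it reads as a mistake.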

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-16 18:12:22 -08:00
Guillaume Nault
4057765f2d sock: consistent handling of extreme SO_SNDBUF/SO_RCVBUF values
SO_SNDBUF and SO_RCVBUF (and their *BUFFORCE version) may overflow or
underflow their input value. This patch aims at providing explicit
handling of these extreme cases, to get a clear behaviour even with
values bigger than INT_MAX / 2 or lower than INT_MIN / 2.

For simplicity, only SO_SNDBUF and SO_SNDBUFFORCE are described here,
but the same explanation and fix apply to SO_RCVBUF and SO_RCVBUFFORCE
(with 'SNDBUF' replaced by 'RCVBUF' and 'wmem_max' by 'rmem_max').

Overflow of positive values
===========================

When handling SO_SNDBUF or SO_SNDBUFFORCE, if 'val' exceeds
INT_MAX / 2, the buffer size is set to its minimum value because
'val * 2' overflows, and max_t() considers that it's smaller than
SOCK_MIN_SNDBUF. For SO_SNDBUF, this can only happen with
net.core.wmem_max > INT_MAX / 2.

SO_SNDBUF and SO_SNDBUFFORCE are actually designed to let users probe
for the maximum buffer size by setting an arbitrarily large number that
gets capped to the maximum allowed/possible size. Having the upper
half of the positive integer space potentially reduce the buffer
size to its minimum value defeats this purpose.

This patch caps the base value to INT_MAX / 2, so that bigger values
don't overflow and keep setting the buffer size to its maximum.

Underflow of negative values
============================

For negative numbers, SO_SNDBUF always considers them bigger than
net.core.wmem_max, which is bounded by [SOCK_MIN_SNDBUF, INT_MAX].
Therefore such values are set to net.core.wmem_max and we're back to
the behaviour of positive integers described above (return maximum
buffer size if wmem_max <= INT_MAX / 2, return SOCK_MIN_SNDBUF
otherwise).

However, SO_SNDBUFFORCE behaves differently. The user value is
directly multiplied by two and compared with SOCK_MIN_SNDBUF. If
'val * 2' doesn't underflow or if it underflows to a value smaller
than SOCK_MIN_SNDBUF then buffer size is set to its minimum value.
Otherwise the buffer size is set to the underflowed value.

This patch treats negative values passed to SO_SNDBUFFORCE as zero, to
prevent underflows. Therefore negative values now always set the buffer
size to its minimum value.

Even though SO_SNDBUF behaves inconsistently by setting buffer size to
the maximum value when passed a negative number, no attempt is made to
modify this behaviour. There may exist some programs that rely on using
negative numbers to set the maximum buffer size. Avoiding overflows
because of extreme net.core.wmem_max values is the most we can do here.

Summary of altered behaviours
=============================

val      : user-space value passed to setsockopt()
val_uf   : the underflowed value resulting from doubling val when
           val < INT_MIN / 2
wmem_max : short for net.core.wmem_max
val_cap  : min(val, wmem_max)
min_len  : minimal buffer length (that is, SOCK_MIN_SNDBUF)
max_len  : maximal possible buffer length, regardless of wmem_max (that
           is, INT_MAX - 1)
^^^^     : altered behaviour

SO_SNDBUF:
+-------------------------+-------------+------------+----------------+
|       CONDITION         | OLD RESULT  | NEW RESULT |    COMMENT     |
+-------------------------+-------------+------------+----------------+
| val < 0 &&              |             |            | No overflow,   |
| wmem_max <= INT_MAX/2   | wmem_max*2  | wmem_max*2 | keep original  |
|                         |             |            | behaviour      |
+-------------------------+-------------+------------+----------------+
| val < 0 &&              |             |            | Cap wmem_max   |
| INT_MAX/2 < wmem_max    | min_len     | max_len    | to prevent     |
|                         |             | ^^^^^^^    | overflow       |
+-------------------------+-------------+------------+----------------+
| 0 <= val <= min_len/2   | min_len     | min_len    | Ordinary case  |
+-------------------------+-------------+------------+----------------+
| min_len/2 < val &&      | val_cap*2   | val_cap*2  | Ordinary case  |
| val_cap <= INT_MAX/2    |             |            |                |
+-------------------------+-------------+------------+----------------+
| min_len < val &&        |             |            | Cap val_cap    |
| INT_MAX/2 < val_cap     | min_len     | max_len    | again to       |
| (implies that           |             | ^^^^^^^    | prevent        |
| INT_MAX/2 < wmem_max)   |             |            | overflow       |
+-------------------------+-------------+------------+----------------+

SO_SNDBUFFORCE:
+------------------------------+---------+---------+------------------+
|          CONDITION           | BEFORE  | AFTER   |     COMMENT      |
|                              | PATCH   | PATCH   |                  |
+------------------------------+---------+---------+------------------+
| val < INT_MIN/2 &&           | min_len | min_len | Underflow with   |
| val_uf <= min_len            |         |         | no consequence   |
+------------------------------+---------+---------+------------------+
| val < INT_MIN/2 &&           | val_uf  | min_len | Set val to 0 to  |
| val_uf > min_len             |         | ^^^^^^^ | avoid underflow  |
+------------------------------+---------+---------+------------------+
| INT_MIN/2 <= val < 0         | min_len | min_len | No underflow     |
+------------------------------+---------+---------+------------------+
| 0 <= val <= min_len/2        | min_len | min_len | Ordinary case    |
+------------------------------+---------+---------+------------------+
| min_len/2 < val <= INT_MAX/2 | val*2   | val*2   | Ordinary case    |
+------------------------------+---------+---------+------------------+
| INT_MAX/2 < val              | min_len | max_len | Cap val to       |
|                              |         | ^^^^^^^ | prevent overflow |
+------------------------------+---------+---------+------------------+
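
The "NEW RESULT" and "AFTER PATCH" columns above can be sketched in user space
as follows. This is an illustration under stated assumptions, not the kernel
code: the function names are hypothetical, min_len stands in for
SOCK_MIN_SNDBUF, and wmem_max is assumed to lie in [min_len, INT_MAX].

```c
#include <limits.h>

/*
 * Shared tail of both options: cap the base value so that doubling it
 * cannot overflow, then enforce the minimum. Callers guarantee val >= 0.
 */
static int sndbuf_commit(int val, int min_len)
{
	if (val > INT_MAX / 2)
		val = INT_MAX / 2;
	val *= 2;
	return val > min_len ? val : min_len;
}

/*
 * SO_SNDBUF: the unsigned comparison makes negative values look huge,
 * so they are capped to wmem_max like any other oversized request.
 */
static int sndbuf_new(int val, int wmem_max, int min_len)
{
	if ((unsigned int)val > (unsigned int)wmem_max)
		val = wmem_max;
	return sndbuf_commit(val, min_len);
}

/* SO_SNDBUFFORCE: no wmem_max cap, negative values are treated as 0. */
static int sndbufforce_new(int val, int min_len)
{
	if (val < 0)
		val = 0;
	return sndbuf_commit(val, min_len);
}
```

With these definitions the largest possible result is (INT_MAX / 2) * 2, which
equals INT_MAX - 1, matching max_len in the legend.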

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-16 18:09:54 -08:00
Linus Torvalds
64c0133eb8 ARM: SoC fixes for 5.0

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC fixes from Arnd Bergmann:
 "This week is a much smaller update, containing fixes only for TI OMAP,
  NXP i.MX and Rockchips platforms:

  omap:
   - omap4 had problems with lost timer interrupts
   - another IRQ handling issue with OMAP5
   - A workaround for a regression in the pwm-omap-dmtimer driver

  NXP i.MX:
   - eMMC was broken on the new imx8mq-evk board

  Rockchip:
   - a fix for new dtc graph warnings and a regulator fix for rock64
   - USB support broke on rk3328-rock64"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  ARM: OMAP2+: fix lack of timer interrupts on CPU1 after hotplug
  arm64: dts: imx8mq: Fix boot from eMMC
  ARM: OMAP2+: Variable "reg" in function omap4_dsi_mux_pads() could be uninitialized
  ARM: dts: Configure clock parent for pwm vibra
  bus: ti-sysc: Fix timer handling with drop pm_runtime_irq_safe()
  arm64: dts: rockchip: enable usb-host regulators at boot on rk3328-rock64
  arm64: dts: rockchip: fix graph_port warning on rk3399 bob kevin and excavator
  ARM: OMAP5+: Fix inverted nirq pin interrupts with irq_set_type
  clocksource: timer-ti-dm: Fix pwm dmtimer usage of fck reparenting
  ARM: dts: rockchip: remove qos_cif1 from rk3188 power-domain
2019-02-16 17:44:12 -08:00
Linus Torvalds
88fe73cb80 Two small fixes, one for crashes using nfs/krb5 with older enctypes, one
that could prevent clients from reclaiming state after a kernel upgrade.

Merge tag 'nfsd-5.0-2' of git://linux-nfs.org/~bfields/linux

Pull more nfsd fixes from Bruce Fields:
 "Two small fixes, one for crashes using nfs/krb5 with older enctypes,
  one that could prevent clients from reclaiming state after a kernel
  upgrade"

* tag 'nfsd-5.0-2' of git://linux-nfs.org/~bfields/linux:
  sunrpc: fix 4 more call sites that were using stack memory with a scatterlist
  Revert "nfsd4: return default lease period"
2019-02-16 17:38:01 -08:00
Linus Torvalds
55638c520b More NFS client fixes for Linux 5.0

Merge tag 'nfs-for-5.0-4' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull more NFS client fixes from Anna Schumaker:
 "Three fixes this time.

  Nicolas's is for xprtrdma completion vector allocation on single-core
  systems. Greg's adds an error check when allocating a debugfs dentry.
  And Ben's is an additional fix for nfs_page_async_flush() to prevent
  pages from accidentally getting truncated.

  Summary:

   - Make sure Send CQ is allocated on an existing compvec

   - Properly check debugfs dentry before using it

   - Don't use page_file_mapping() after removing a page"

* tag 'nfs-for-5.0-4' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFS: Don't use page_file_mapping after removing the page
  rpc: properly check debugfs dentry before using it
  xprtrdma: Make sure Send CQ is allocated on an existing compvec
2019-02-16 17:33:39 -08:00
Linus Torvalds
9a7dcde4a6 auxdisplay:
- ht16k33: fix potential user-after-free on module unload
     Reported by Sven Van Asbroeck

Merge tag 'auxdisplay-for-linus-v5.0-rc7' of git://github.com/ojeda/linux

Pull auxdisplay fix from Miguel Ojeda:
 "Fix potential user-after-free on ht16k33 module unload. Reported by
  Sven Van Asbroeck"

* tag 'auxdisplay-for-linus-v5.0-rc7' of git://github.com/ojeda/linux:
  auxdisplay: ht16k33: fix potential user-after-free on module unload
2019-02-16 17:31:36 -08:00
David S. Miller
8681ef1f3d net: Add header for usage of fls64()
Fixes: 3b89ea9c59 ("net: Fix for_each_netdev_feature on Big endian")
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-16 13:45:01 -08:00
David S. Miller
f2281c245d Support Mellanox BlueField SmartNIC (mlx5-updates-2019-02-15)

Merge tag 'mlx5-updates-2019-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Support Mellanox BlueField SmartNIC (mlx5-updates-2019-02-15)

Bodong Wang says,

The BlueField device is a multi-core ARM processor in a highly integrated
system on chip coupled with the ConnectX interconnect controller.
The BlueField device can be presented in one of two modes:

- SEPARATED_HOST: ARM processors as a separated and orthogonal host
  like any other external host in the multi-host virtualization model.
- EMBEDDED_CPU: ARM processors as Embedded CPU (EC) and part of the
  external hosts virtualization model.

While the existing driver already supports the device in separated_host
mode, this patch series focuses on the functionality of embedded_cpu
mode.

In embedded_cpu mode, the BlueField device exposes a regular network
controller PCI function to the BlueField host (e.g., x86). However, a
separate PCI function called Embedded CPU Physical Function (ECPF) is
also added on the ARM host side, where standard Linux distributions are
able to run on the ARM cores. Depending on the NV configuration from
firmware, the ECPF can be the e-switch manager and firmware page supplier.
If the ECPF is configured as e-switch manager and page supplier, it
takes over the following responsibilities from the PF on the BlueField host:
- Owns, controls and manages all e-switch parts, and takes e-switch
  traffic by default. It should also perform ENABLE_HCA for the host
  PF, just like a PF does for its VFs.
- Provides and manages the ICM host memory required for the HCA to
  store various contexts for itself, the PF and the VFs belonging to the
  e-switch it manages.

The PF on the BlueField host side is still responsible for:
- Controlling its own permanent MAC.
- PCI and SRIOV configuration, and performing ENABLE_HCA for its VFs.

The ECPF can also retrieve information about the external host it
controls, such as the host identifier, PCI BDF and number of virtual
functions. As these parameters may change dynamically, an event is
triggered to the driver on the ECPF side.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-16 12:11:17 -08:00
Linus Torvalds
0b999ae361 Compiler Attributes: Clean the new GCC 9 -Wmissing-attributes warnings

Merge tag 'compiler-attributes-for-linus-v5.0-rc7' of git://github.com/ojeda/linux

Pull compiler attributes fixes from Miguel Ojeda:
 "Clean the new GCC 9 -Wmissing-attributes warnings

  The upcoming GCC 9 release extends the -Wmissing-attributes warnings
  (enabled by -Wall) to C and aliases: it warns when particular function
  attributes are missing in the aliases but not in their target, e.g.:

    void __cold f(void) {}
    void __alias("f") g(void);

  diagnoses:

    warning: 'g' specifies less restrictive attribute than
    its target 'f': 'cold' [-Wmissing-attributes]

  This patch series cleans up these new warnings. Most of them are caused
  by the module_init/exit macros"
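
For illustration only, a minimal sketch of how the copy attribute silences the
warning quoted above. The macros are simplified stand-ins written for this
example rather than the kernel's own definitions, f() returns a value here
only so the alias can be checked, and an ELF target (where alias is
supported) is assumed.

```c
/* Simplified stand-ins for the kernel's attribute macros. */
#define __cold		__attribute__((__cold__))
#define __alias(sym)	__attribute__((__alias__(sym)))

/* The copy attribute exists in GCC >= 9; expand to nothing elsewhere. */
#if defined(__has_attribute)
# if __has_attribute(__copy__)
#  define __copy(sym)	__attribute__((__copy__(sym)))
# endif
#endif
#ifndef __copy
# define __copy(sym)
#endif

/* The alias target carries the "cold" attribute. */
int __cold f(void)
{
	return 42;
}

/*
 * Without __copy(f), GCC 9 would report that 'g' specifies a less
 * restrictive attribute than its target. __copy(f) makes the alias
 * inherit the target's attributes.
 */
int __alias("f") __copy(f) g(void);
```

Because g is an alias, calling g() runs the body of f(); the attribute only
affects diagnostics and code placement.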

Link: https://lore.kernel.org/lkml/20190125104353.2791-1-labbott@redhat.com/

* tag 'compiler-attributes-for-linus-v5.0-rc7' of git://github.com/ojeda/linux:
  include/linux/module.h: copy __init/__exit attrs to init/cleanup_module
  Compiler Attributes: add support for __copy (gcc >= 9)
  lib/crc32.c: mark crc32_le_base/__crc32c_le_base aliases as __pure
2019-02-16 10:28:05 -08:00
Ard Biesheuvel
582a32e708 efi/arm: Revert "Defer persistent reservations until after paging_init()"
This reverts commit eff8962888, which
deferred the processing of persistent memory reservations to a point
where the memory may have already been allocated and overwritten,
defeating the purpose.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20190215123333.21209-3-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-16 15:02:03 +01:00
Ard Biesheuvel
8a5b403d71 arm64, mm, efi: Account for GICv3 LPI tables in static memblock reserve table
In the irqchip and EFI code, we have what basically amounts to a quirk
to work around a peculiarity in the GICv3 architecture, which permits
the system memory address of LPI tables to be programmable only once
after a CPU reset. This means kexec kernels must use the same memory
as the first kernel, and thus ensure that this memory has not been
given out for other purposes by the time the ITS init code runs, which
is not very early for secondary CPUs.

On systems with many CPUs, these reservations could overflow the
memblock reservation table, and this was addressed in commit:

  eff8962888 ("efi/arm: Defer persistent reservations until after paging_init()")

However, this turns out to have made things worse, since the allocation
of page tables and heap space for the resized memblock reservation table
itself may overwrite the regions we are attempting to reserve, which may
cause all kinds of corruption, also considering that the ITS will still
be poking bits into that memory in response to incoming MSIs.

So instead, let's grow the static memblock reservation table on such
systems so it can accommodate these reservations at an earlier time.
This will permit us to revert the above commit in a subsequent patch.

[ mingo: Minor cleanups. ]

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20190215123333.21209-2-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-16 15:02:03 +01:00
Alin Nastac
a3419ce335 netfilter: nf_conntrack_sip: add sip_external_media logic
When enabled, the sip_external_media logic leaves the SDP payload
untouched when it detects that the interface towards the INVITEd
party is the same as the one towards the media endpoint.

The typical scenario for this logic is when a LAN SIP agent has more
than one IP address (it uses a different address for media streams than
the one used for the signalling stream) and it also forwards calls to a
voice mailbox located on the WAN side. In such a case sip_direct_media
must be disabled (so that normal calls can be handled by the SIP
helper), but media streams that do not traverse this router must
also be excluded from address translation (e.g. call forwards).

Signed-off-by: Alin Nastac <alin.nastac@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-16 10:49:12 +01:00
Wei Yongjun
dddaf89e2f netfilter: ipt_CLUSTERIP: make symbol 'cip_netdev_notifier' static
Fixes the following sparse warnings:

net/ipv4/netfilter/ipt_CLUSTERIP.c:867:23: warning:
 symbol 'cip_netdev_notifier' was not declared. Should it be static?

Fixes: 5a86d68bcf ("netfilter: ipt_CLUSTERIP: fix deadlock in netns exit routine")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-16 10:43:56 +01:00
Andrea Claudi
c93a49b976 ipvs: fix warning on unused variable
When CONFIG_IP_VS_IPV6 is not defined, build produced this warning:

net/netfilter/ipvs/ip_vs_ctl.c:899:6: warning: unused variable ‘ret’ [-Wunused-variable]
  int ret = 0;
      ^~~

Fix this by moving the declaration of 'ret' into the CONFIG_IP_VS_IPV6
section in the same function.

While at it, drop its unneeded initialisation.
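
For illustration only, a reduced sketch of the pattern. add_dest_sketch() and
register_v6_sketch() are hypothetical names, not the functions in
ip_vs_ctl.c; only the placement of 'ret' reflects the fix described above.

```c
#ifdef CONFIG_IP_VS_IPV6
/* Hypothetical stand-in for the IPv6-only setup step. */
static int register_v6_sketch(void)
{
	return 0;
}
#endif

int add_dest_sketch(int is_ipv6)
{
#ifdef CONFIG_IP_VS_IPV6
	/*
	 * Declared inside the conditional section and without an
	 * initialiser: it no longer exists when the option is off,
	 * so it cannot trigger -Wunused-variable.
	 */
	int ret;

	if (is_ipv6) {
		ret = register_v6_sketch();
		if (ret)
			return ret;
	}
#else
	(void)is_ipv6;	/* parameter is unused in this configuration */
#endif
	return 0;
}
```

The sketch builds without the warning both with and without
-DCONFIG_IP_VS_IPV6.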

Fixes: 098e13f5b2 ("ipvs: fix dependency on nf_defrag_ipv6")
Reported-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-16 10:41:42 +01:00
David S. Miller
46f3766638 Merge branch 'net-dsa-b53-VLAN-and-L2-fixes'
Florian Fainelli says:

====================
net: dsa: b53: VLAN and L2 fixes

This patch series contains a collection of fixes to the b53 driver in
order to:

- consistently program the same default VLAN ID when a port is bridged
  or not
- properly account for VLAN filtering being turned on/off and turning
  on ingress VID checking accordingly
- have SYSTEMPORT properly forward BPDU frames to the network stack
  (which it did not)
- do not assume that WoL is supported by the DSA master network device
  we are connected to
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-15 20:37:54 -08:00
Florian Fainelli
10163aaee9 net: dsa: b53: Do not program CPU port's PVID
The CPU port is special and does not need to obey VLAN restrictions as
far as untagged traffic goes. Also, having the CPU port be part of a
particular PVID goes against the idea of keeping it tagged in all VLANs.

Fixes: ca89319483 ("net: dsa: b53: Keep CPU port as tagged in all VLANs")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-15 20:37:54 -08:00
Florian Fainelli
c3152ec4c0 net: dsa: bcm_sf2: Do not assume DSA master supports WoL
We assume in the bcm_sf2 driver that the DSA master network device
supports ethtool_ops::{get,set}_wol operations, which is not a given.
Avoid de-referencing potentially non-existent function pointers and
check them as we should.
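
For illustration only, a user-space sketch of the guard described above. All
struct and function names here are hypothetical reductions of the real
ethtool types; the point is the NULL check before each indirect call.

```c
#include <stddef.h>

#define DEMO_WAKE_MAGIC 0x20	/* arbitrary demo capability bit */

struct wolinfo_sketch {
	unsigned int supported;
	unsigned int wolopts;
};

struct netdev_sketch;

struct ethtool_ops_sketch {
	void (*get_wol)(struct netdev_sketch *dev, struct wolinfo_sketch *wol);
	int (*set_wol)(struct netdev_sketch *dev, struct wolinfo_sketch *wol);
};

struct netdev_sketch {
	const struct ethtool_ops_sketch *ethtool_ops;
};

/* Demo callback for a master device that does implement get_wol. */
static void demo_get_wol(struct netdev_sketch *dev, struct wolinfo_sketch *wol)
{
	(void)dev;
	wol->supported = DEMO_WAKE_MAGIC;
}

/* Query the master only if it provides the operation at all. */
static void master_get_wol(struct netdev_sketch *master,
			   struct wolinfo_sketch *wol)
{
	wol->supported = 0;
	wol->wolopts = 0;

	if (master->ethtool_ops && master->ethtool_ops->get_wol)
		master->ethtool_ops->get_wol(master, wol);
}
```

A master without the callback simply reports no Wake-on-LAN support instead
of causing a NULL pointer dereference.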

Fixes: 96e65d7f3f ("net: dsa: bcm_sf2: add support for Wake-on-LAN")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-15 20:37:54 -08:00
Florian Fainelli
a40061ea2e net: systemport: Fix reception of BPDUs
SYSTEMPORT has an RXCHK parser block that attempts to validate the
packet structures. Unfortunately, setting the L2 header check bit
causes Bridge PDUs (BPDUs) to be incorrectly rejected because they look
like LLC/SNAP packets with a non-IPv4 or non-IPv6 Ethernet Type.
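
For illustration only, a sketch of what such a frame looks like on the wire,
assuming the standard spanning tree encoding (destination 01:80:C2:00:00:00,
an 802.3 length field instead of an EtherType, LLC header 0x42 0x42 0x03).
The function name is hypothetical and unrelated to the RXCHK hardware.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns true if the frame has the shape of a spanning tree BPDU. */
static bool looks_like_stp_bpdu(const uint8_t *frame, size_t len)
{
	static const uint8_t stp_mac[6] = { 0x01, 0x80, 0xc2, 0x00, 0x00, 0x00 };
	uint16_t type_or_len;

	/* 14-byte Ethernet header plus 3-byte LLC header. */
	if (len < 17)
		return false;
	if (memcmp(frame, stp_mac, sizeof(stp_mac)) != 0)
		return false;

	/* Values up to 1500 are an 802.3 length, not an EtherType. */
	type_or_len = (uint16_t)(frame[12] << 8 | frame[13]);
	if (type_or_len > 1500)
		return false;

	/* LLC: DSAP 0x42, SSAP 0x42, control 0x03 (unnumbered information). */
	return frame[14] == 0x42 && frame[15] == 0x42 && frame[16] == 0x03;
}
```

Such a frame carries neither the IPv4 nor the IPv6 EtherType, which is
consistent with a strict L2 header check discarding it.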

Fixes: 4e8aedfe78c7 ("net: systemport: Turn on offloads by default")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-15 20:37:54 -08:00
Florian Fainelli
dad8d7c645 net: dsa: b53: Properly account for VLAN filtering
VLAN filtering can be built into the kernel, and also dynamically turned
on/off through the bridge master device. Allow re-configuring the switch
appropriately to account for that by deciding whether VLAN table
(v_table) misses should lead to a drop or forward.

Fixes: a2482d2ce3 ("net: dsa: b53: Plug in VLAN support")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-15 20:37:54 -08:00
Florian Fainelli
fea8335317 net: dsa: b53: Fix default VLAN ID
We were not consistent in how the default VID of a given port was
defined: b53_br_leave() would make sure the VLAN ID would be either 0 or 1
depending on the switch generation, but b53_configure_vlan(), which is
the default configuration, would unconditionally set it to 1. The correct
value is 1 for the 5325/5365 series and 0 otherwise. To avoid repeating that
mistake ever again, introduce a helper function, b53_default_pvid(), to
factor that out.
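
For illustration only, a stand-alone sketch of the rule such a helper
encodes. The enum and function name are hypothetical; the real
b53_default_pvid() operates on the driver's device structure, which is not
reproduced here.

```c
#include <stdint.h>

/* Hypothetical chip classification used only by this sketch. */
enum b53_chip_sketch {
	CHIP_SKETCH_5325,
	CHIP_SKETCH_5365,
	CHIP_SKETCH_OTHER,
};

/* Default PVID: 1 for the 5325/5365 series, 0 for everything else. */
static uint16_t default_pvid_sketch(enum b53_chip_sketch chip)
{
	if (chip == CHIP_SKETCH_5325 || chip == CHIP_SKETCH_5365)
		return 1;
	return 0;
}
```

Routing both the bridge-leave path and the default configuration through one
function like this keeps the two from drifting apart again.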

Fixes: 967dd82ffc ("net: dsa: b53: Add support for Broadcom RoboSwitch")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-15 20:37:54 -08:00
David S. Miller
bb015f2216 Merge branch 's390-next'
Julian Wiedmann says:

====================
s390/qeth: updates 2019-02-15

Please apply a few more qeth patches to net-next. Along with some smaller
improvements, this revamps our code for the SW statistics that are exposed
through ETHTOOL_GSTATS.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-15 20:35:30 -08:00
Julian Wiedmann
8024cc9e85 s390/qeth: split out OSN netdev ops
Rather than special-casing OSN in a number of places, just give this
device type its own netdev_ops structure.

When setting up the OSN net_device, also skip the handling of the
various HW offloads (e.g. TSO). The device shouldn't be advertising any of
them, and the OSN code paths in qeth don't have support for them.
In particular RX VLAN filtering is not supported, so don't hook up those
callbacks in the netdev_ops.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-15 20:35:30 -08:00
Julian Wiedmann
1b4d5e1c61 s390/qeth: add support for ETHTOOL_GRINGPARAM
Implement a trivial callback that exposes the queue sizes.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-15 20:35:30 -08:00
Julian Wiedmann
b0abc4f5df s390/qeth: overhaul ethtool statistics
Accumulate per-TX queue statistics, and increase their size to 64 bit.
Don't bother with enabling/disabling the statistics, the overhead is
negligible.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-15 20:35:29 -08:00
Julian Wiedmann
d896ac62d0 s390/qeth: move ethtool code into its own file
Most of this is self-contained code.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-15 20:35:29 -08:00
Julian Wiedmann
4326b5b461 s390/qeth: reduce ethtool statistics
Counting the number of function calls and the time spent in functions
is best left to proper tracing facilities.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-15 20:35:29 -08:00
Julian Wiedmann
bb92d3f866 s390/qeth: use a static Output Queue array
qeth dynamically allocates an array for storing pointers to its
Output Queue structures. Switch this to a static array - we are
currently limited to 4 Output Queues, so shrinking the qeth_qdio_info
struct by just a few bytes doesn't justify the additional complexity.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-15 20:35:29 -08:00
Julian Wiedmann
0aa35a3689 s390/qeth: allow manual recovery when device is SOFTSETUP
Once a qeth ccwgroup device is set online, it's also armed for internal
recovery. So allow for testing that code path via sysfs, regardless of
whether the interface is up or down.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-15 20:35:29 -08:00
Florian Fainelli
ff326d3cdf selftests: forwarding: Add some missing configuration symbols
For the forwarding selftests to work, we need network namespaces when
using veth/vrf otherwise ping/ping6 commands like these:

ip vrf exec vveth0 /bin/ping 192.0.2.2 -c 10 -i 0.1 -w 5

will fail because network namespaces may not be enabled.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-15 20:32:22 -08:00
Willem de Bruijn
d5be7f632b net: validate untrusted gso packets without csum offload
Syzkaller again found a path to a kernel crash through bad gso input,
by building an excessively large packet to cause an skb field to wrap.

If VIRTIO_NET_HDR_F_NEEDS_CSUM was set this would have been dropped in
skb_partial_csum_set.

GSO packets that do not set checksum offload are suspicious and rare.
Most callers of virtio_net_hdr_to_skb already pass them to
skb_probe_transport_header.

Move that test forward, change it to detect parse failure and drop
packets on failure as those clearly are not one of the legitimate
VIRTIO_NET_HDR_GSO types.

Fixes: bfd5f4a3d6 ("packet: Add GSO/csum offload support.")
Fixes: f43798c276 ("tun: Allow GSO using virtio_net_hdr")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-15 20:30:37 -08:00
Paolo Abeni
1490ed2abc net/ipv6: prefer rcu_access_pointer() over rcu_dereference()
rt6_cache_allowed_for_pmtu() checks for rt->from presence, but
it does not access the RCU protected pointer. We can use
rcu_access_pointer() and clean-up the code a bit. No functional
changes intended.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-15 20:25:26 -08:00
Hauke Mehrtens
3b89ea9c59 net: Fix for_each_netdev_feature on Big endian
The features attribute is of type u64 and stored in the native endianness of
the system. The for_each_set_bit() macro takes a pointer to a 32 bit array
and goes over the bits in this area. On little endian systems this also
works with a u64 as the most significant bit is on the highest address,
but on big endian the words are swapped. When we expect bit 15 here we get
bit 47 (15 + 32).

This patch converts it more or less to its own for_each_set_bit()
implementation which works on 64 bit integers directly. This is then
completely in host endianness and should work like expected.
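
The idea of walking the set bits of a 64 bit value directly can be
sketched in plain C as follows. This is an illustration only, not the
macro from the patch; the sketch_* function names are hypothetical.
Because the shifts operate on the integer value rather than on its
in-memory layout, the result does not depend on host byte order.

```c
#include <stdint.h>

/* Index of the lowest set bit at or above 'start', or 64 if there is none. */
int sketch_next_feature_bit(uint64_t mask, int start)
{
	for (int bit = start; bit < 64; bit++)
		if (mask & (UINT64_C(1) << bit))
			return bit;
	return 64;
}

/* Store the positions of all set bits in out[] and return how many there are. */
int sketch_collect_feature_bits(uint64_t mask, int out[64])
{
	int n = 0;

	for (int bit = sketch_next_feature_bit(mask, 0); bit < 64;
	     bit = sketch_next_feature_bit(mask, bit + 1))
		out[n++] = bit;
	return n;
}
```

With bits 15 and 47 set, the walk reports 15 first and 47 second on any
host, instead of swapping them on big endian.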

Fixes: fd867d51f ("net/core: generic support for disabling netdev features down stack")
Signed-off-by: Hauke Mehrtens <hauke.mehrtens@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-15 20:23:50 -08:00
Paul Kocialkowski
197f9ab7f0 net: phy: xgmiitorgmii: Support generic PHY status read
Some PHY drivers like the generic one do not provide a read_status
callback on their own but rely on genphy_read_status being called
directly.

With the current code, this results in a NULL function pointer call.
Call genphy_read_status instead when there is no specific callback.
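
The fallback pattern described above, sketched as standalone C. The
sketch_* types and functions are hypothetical stand-ins for the PHY
device, its driver and the generic status read; they are not the
kernel's definitions.

```c
#include <stddef.h>

struct sketch_phy;

/* Hypothetical driver ops: read_status may be left NULL. */
struct sketch_phy_driver {
	int (*read_status)(struct sketch_phy *phy);
};

struct sketch_phy {
	const struct sketch_phy_driver *drv;
	int link;
};

/* Stand-in for the generic status read used when no callback is set. */
int sketch_generic_read_status(struct sketch_phy *phy)
{
	phy->link = 1;
	return 0;
}

/* Use the driver's callback if it has one, else the generic helper. */
int sketch_read_status(struct sketch_phy *phy)
{
	if (phy->drv && phy->drv->read_status)
		return phy->drv->read_status(phy);
	return sketch_generic_read_status(phy);
}
```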

Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-15 20:20:10 -08:00
Colin Ian King
59e6158aca mlxsw: core: fix spelling mistake "temprature" -> "temperature"
There is a spelling mistake in several dev_err messages, fix these.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-15 20:16:52 -08:00
Lorenzo Bianconi
4974d5f678 net: ip6_gre: initialize erspan_ver just for erspan tunnels
After commit c706863bc8 ("net: ip6_gre: always reports o_key to
userspace"), ip6gre and ip6gretap tunnels started reporting TUNNEL_KEY
output flag even if it is not configured.
ip6gre_fill_info checks erspan_ver value to add TUNNEL_KEY for
erspan tunnels, however in commit 84581bdae9 ("erspan: set
erspan_ver to 1 by default when adding an erspan dev")
erspan_ver is initialized to 1 even for ip6gre or ip6gretap.
Fix the issue by moving erspan_ver initialization to a dedicated routine.

Fixes: c706863bc8 ("net: ip6_gre: always reports o_key to userspace")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-15 20:14:25 -08:00
David S. Miller
a31687e85a Just a few fixes this time:
* mesh rhashtable fixes from Herbert
  * a small error path fix when starting AP interfaces

Merge tag 'mac80211-for-davem-2019-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Just a few fixes this time:
 * mesh rhashtable fixes from Herbert
 * a small error path fix when starting AP interfaces
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-15 19:42:24 -08:00
Bodong Wang
c96692fb8f net/mlx5: E-Switch, Allow transition to offloads mode for ECPF
Currently, the e-switch driver requires going to legacy mode before
changing to the offloads mode. This makes sense for regular case as
the legacy mode is done by creating VFs.

However, it's problematic when ECPF is the eswitch manager. In such
case, ECPF will control the vports on peer host including the peer
PF and VFs. But ECPF doesn't need and shall not create VFs as the
VFs are created in the peer PF host.

Grant ECPF the ability to change from none to the offloads mode. Note
that currently the only way to go back to none mode is by unloading
the ECPF driver.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-15 17:25:58 -08:00
Bodong Wang
a3888f33db net/mlx5: E-Switch, Load/unload VF reps according to event from host PF
When host PF changes the number of VFs, the ECPF esw driver will get
a FW event. It should query the number of VFs enabled by host PF and
update the VF reps accordingly. Note that host PF can't change the
number of VFs dynamically; it has to reset the number of VFs to 0
before changing to a new positive number.

The host event is registered when driver is moving to switchdev mode,
and it's the last step to do in esw_offloads_init. It's unregistered
and the work queue is flushed when driver quits from switchdev mode.
In this way, the host event and devlink command are serialized.

When driver is enabling switchdev mode, pay attention to the following
two facts:
1. Host PF must not have VF initialized as the flow table in ECPF has
   ENCAP enabled as default. Such flow table can't be created with
   existing initialized VFs.
2. ECPF doesn't know how many VFs the host PF will enable, ECPF
   offloads flow steering shall create the flow table/groups based on
   the max number of VFs possibly supported by host PF.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-15 17:25:58 -08:00
Bodong Wang
81cd229c29 net/mlx5: E-Switch, Consider ECPF vport depends on eswitch ownership
ECPF connects to the eswitch through vport 0xfffe. ECPF may or may
not be the eswitch manager depending on firmware configuration.

1. If ECPF is eswitch manager: ECPF will take over the eswitch manager
   responsibility. A rep of the host PF shall be created at the ECPF
   side for the eswitch manager to control.

2. If ECPF is not eswitch manager: host PF will be the eswitch manager,
   ECPF acts similarly to a VF of the host PF. Host PF will be aware
   of the ECPF vport presence and control its rep.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-15 17:25:58 -08:00
Bodong Wang
5ae5162066 net/mlx5: E-Switch, Assign a different position for uplink rep and vport
In offloads mode, the current implementation puts the uplink
representor at index zero of the vport reps array. It is not "natural"
to place it at index 0 since we want to put the representor for vport
0 at index 0 with the introduction of SmartNIC. A separate patch will
handle the case whether a rep is needed for vport 0 (PF vport).

So, we want to have a different placeholder for uplink vport and
representor. It was placed at the end of vport and rep array. Since
vport number can no longer act as an index into the vport or
representors arrays, use functions to map vport numbers to indices
when accessing the vports or representors arrays, and vice versa.
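
A rough sketch of such a mapping in standalone C, assuming the uplink
takes the last array slot as described above. The function names and
the uplink vport number used here (0xffff) are assumptions made for the
example, not taken from this patch.

```c
#include <stdint.h>

/* Assumed vport number for the uplink; illustrative only. */
#define SKETCH_VPORT_UPLINK 0xffff

/* Map a vport number to an array index, or -1 if it has no slot. */
int sketch_vport_to_index(uint16_t vport, int total)
{
	if (vport == SKETCH_VPORT_UPLINK)
		return total - 1;	/* uplink lives in the last slot */
	return vport < total - 1 ? vport : -1;
}

/* Map an array index back to a vport number, or -1 if out of range. */
int sketch_index_to_vport(int index, int total)
{
	if (index < 0 || index >= total)
		return -1;
	if (index == total - 1)
		return SKETCH_VPORT_UPLINK;
	return index;
}
```

With an array of 4 entries, vports 0..2 keep their own number as index
and the uplink maps to index 3.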

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-15 17:25:58 -08:00
Bodong Wang
f8e8fa0262 net/mlx5: E-Switch, Centralize repersentor reg/unreg to eswitch driver
Eswitch has two users: IB and ETH. They both register representors
when mlx5 interface is added, and unregister the representors when
mlx5 interface is removed. Ideally, each driver should only deal with
the entities which are unique to itself. However, current IB and ETH
drivers have to perform the following eswitch operations:

1. When registering, specify how many vports to register. This number
   is the same for both drivers which is the total available vport
   numbers.
2. When unregistering, specify the number of registered vports to do
   unregister. Also, unload the representors which are already loaded.

It's unnecessary for the eswitch driver to hand out control of the above
operations to individual driver users, as they're not unique to each
driver. Instead, such operations should be centralized in the eswitch
driver. This consolidates the eswitch control flow and simplifies the IB and
ETH drivers.

This patch doesn't change any functionality.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-15 17:25:58 -08:00
Bodong Wang
29d9fd7d5a net/mlx5: E-Switch, Support load/unload reps of specific vport types
Currently the driver loads and unloads all reps in an unbreakable
group. However, with ECPF, the reps of special vports such as uplink
and host PF should always be loaded in switchdev mode where the reps
for VFs will be loaded on-demand and unloaded on no-demand. This is
a pre-step for that change.

This patch doesn't change any functionality.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-15 17:25:57 -08:00
Bodong Wang
f121e0ea95 net/mlx5: E-Switch, Add state to eswitch vport representors
Currently the eswitch vport reps have a valid indicator, which is
set on register and unset on unregister. However, a rep can be loaded
or not loaded when doing unregister; the current driver checks if the
vport of that rep is enabled as a flag to imply the rep is loaded.
However, for ECPF, this is not valid as the host PF will enable the
vports for its VFs instead.

Add three states: {unregistered, registered, loaded}, with the
following state changes across different operations:

	create: (none)       -> unregistered
	reg:    unregistered -> registered
	load:   registered   -> loaded
	unload: loaded       -> registered
	unreg:  registered   -> unregistered

Note that the state shall only be updated inside eswitch driver rather
than individual drivers such as ETH or IB.
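
The transition table above can be sketched as a small state machine in
standalone C. The enum and function names are hypothetical; only the
three states and the allowed transitions come from the table.

```c
/* The three rep states named in the table above. */
enum sketch_rep_state {
	SKETCH_REP_UNREGISTERED,
	SKETCH_REP_REGISTERED,
	SKETCH_REP_LOADED,
};

enum sketch_rep_op {
	SKETCH_OP_REG,
	SKETCH_OP_LOAD,
	SKETCH_OP_UNLOAD,
	SKETCH_OP_UNREG,
};

/* Apply op to *state. Returns 0 on success, -1 if the transition is not allowed. */
int sketch_rep_apply(enum sketch_rep_state *state, enum sketch_rep_op op)
{
	switch (op) {
	case SKETCH_OP_REG:
		if (*state != SKETCH_REP_UNREGISTERED)
			return -1;
		*state = SKETCH_REP_REGISTERED;
		return 0;
	case SKETCH_OP_LOAD:
		if (*state != SKETCH_REP_REGISTERED)
			return -1;
		*state = SKETCH_REP_LOADED;
		return 0;
	case SKETCH_OP_UNLOAD:
		if (*state != SKETCH_REP_LOADED)
			return -1;
		*state = SKETCH_REP_REGISTERED;
		return 0;
	case SKETCH_OP_UNREG:
		if (*state != SKETCH_REP_REGISTERED)
			return -1;
		*state = SKETCH_REP_UNREGISTERED;
		return 0;
	}
	return -1;
}
```

In this sketch a loaded rep has to be unloaded before it can be
unregistered, which mirrors the table; a rep starts out unregistered
after create.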

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Suggested-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-15 17:25:57 -08:00
Bodong Wang
879c8f84e3 net/mlx5: E-Switch, Use getter and iterator to access vport/rep
With only PF and VF, it is sufficient to have the vport/rep array
index as the vport number. This is because PF and VF vports numbers
are consecutive serial numbers. In downstream patches with the
introduction of ECPF and UPLINK vports, they are not consecutive any more.

Use a getter to get a specific vport/rep, and use an iterator to traverse
a list of vport/rep. This hides the translation between array index
and vport number, and provides flexibility of using different
translation mechanism in the future.

This patch doesn't change any functionality.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Suggested-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-15 17:25:57 -08:00
Bodong Wang
c9b99abcf2 net/mlx5: E-Switch, Split VF and special vports for offloads mode
When driver is entering offloads mode, there are two major tasks to
do: initialize flow steering and create representors. Flow steering
should make sure enough flow table/group spaces are reserved for all
reps. Representors will be created in a group, all or none.

With the introduction of ECPF, flow steering should still reserve the
same spaces. But, the representors are not always loaded/unloaded in a
single piece. Once ECPF is in offloads mode, it will get the number
of VF changing event from host PF. In such scenario, only the VF reps
should be loaded/unloaded, not the reps for special vports (such as
the uplink vport).

Thus, when entering offloads mode, driver should specify the total
number of reps, and the number of VF reps separately. When leaving
offloads mode, the cleanup should use the information self-contained
in eswitch such as number of VFs.

This patch doesn't change any functionality.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-15 17:25:57 -08:00
Bodong Wang
eca8cc3895 net/mlx5: E-Switch, Refactor offloads flow steering init/cleanup
E-switch offloads mode initializes and cleans up multiple steering-related
entities (flow table/group). Refactor these operations to internal
helper functions for better block design.

This patch doesn't change any functionality.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-15 17:25:57 -08:00
Bodong Wang
cbc44e76bf net/mlx5: E-Switch, Properly refer to host PF vport as other vport
Commands referring to vports use the following scheme:

1. When referring to my own vport, put 0 in vport and 0 in other_vport.
2. When referring to another vport, put the vport number of the
   referred vport and put 1 in other_vport. It was assumed that driver
   is accessing other vport when vport number is greater than 0.

With the above scheme, the case where the ECPF eswitch manager is trying
to access the host PF vport will fall under scheme 1 as the vport
number is 0. This is wrong, as the driver is trying to refer to
another vport.

As such usage can only happen in the eswitch context, change relevant
functions to provide other vport input properly.
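
The difference between inferring other_vport from the vport number and
passing it explicitly can be sketched as follows. The struct and
function names are hypothetical and do not reflect the real command
layout; only the vport/other_vport scheme comes from the text above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of a vport command: just the two fields discussed. */
struct sketch_vport_cmd {
	uint16_t vport_number;
	uint8_t other_vport;
};

/* Old assumption: any non-zero vport number means "other vport". */
struct sketch_vport_cmd sketch_fill_inferred(uint16_t vport)
{
	struct sketch_vport_cmd cmd = { vport, vport > 0 };
	return cmd;
}

/* Explicit form: the caller states whether it refers to another vport. */
struct sketch_vport_cmd sketch_fill_explicit(uint16_t vport, bool other)
{
	struct sketch_vport_cmd cmd = { vport, other };
	return cmd;
}
```

For an eswitch manager that is not vport 0, addressing vport 0 with the
inferred form yields other_vport = 0, while the explicit form lets the
caller set other_vport = 1 with vport number 0.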

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-15 17:25:56 -08:00
Bodong Wang
a1b3839ac4 net/mlx5: E-Switch, Properly refer to the esw manager vport
In SmartNIC mode, the eswitch manager is not necessarily the PF
(vport 0). Use a helper function to get the correct eswitch manager
vport number and cache on the eswitch instance for fast reference.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-15 17:25:56 -08:00