Commit Graph

767635 Commits

Author SHA1 Message Date
John Fastabend
54fedb42c6 bpf: sockmap, fix smap_list_map_remove when psock is in many maps
If a hashmap is free'd with open socks it removes the reference to
the hash entry from the psock. If that is the last reference to the
psock then it will also be free'd by the reference counting logic.
However the current logic that removes the hash reference from the
list of references is broken. In smap_list_remove() we first check
if the sockmap entry matches and then check if the hashmap entry
matches. But, the sockmap entry sill always match because its NULL in
this case which causes the first entry to be removed from the list.
If this is always the "right" entry (because the user adds/removes
entries in order) then everything is OK but otherwise a subsequent
bpf_tcp_close() may reference a free'd object.

To fix this create two list handlers one for sockmap and one for
sockhash.

Reported-by: syzbot+0ce137753c78f7b6acc1@syzkaller.appspotmail.com
Fixes: 8111038444 ("bpf: sockmap, add hash map support")
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01 01:21:31 +02:00
John Fastabend
9901c5d77e bpf: sockmap, fix crash when ipv6 sock is added
This fixes a crash where we assign tcp_prot to IPv6 sockets instead
of tcpv6_prot.

Previously we overwrote the sk->prot field with tcp_prot even in the
AF_INET6 case. This patch ensures the correct tcp_prot and tcpv6_prot
are used.

Tested with 'netserver -6' and 'netperf -H [IPv6]' as well as
'netperf -H [IPv4]'. The ESTABLISHED check resolves the previously
crashing case here.

Fixes: 174a79ff95 ("bpf: sockmap with sk redirect support")
Reported-by: syzbot+5c063698bdbfac19f363@syzkaller.appspotmail.com
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01 01:21:31 +02:00
Daniel Borkmann
0b9e3d543f Merge branch 'bpf-bpftool-libbpf-improvements'
Jakub Kicinski says:

====================
Set of random updates to bpftool and libbpf.  I'm preparing for
extending bpftool prog load, but there is a good number of
improvements that can be made before bpf -> bpf-next merge
helping to keep the later patch set to a manageable size as well.

First patch is a bpftool build speed improvement.  Next missing
program types are added to libbpf program type detection by section
name.  The ability to load programs from '.text' section is restored
when ELF file doesn't contain any pseudo calls.

In bpftool I remove my Author comments as unnecessary sign of vanity.
Last but not least missing option is added to bash completions and
processing of options in bash completions is improved.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01 01:01:52 +02:00
Jakub Kicinski
121c58bed0 tools: bpftool: deal with options upfront
Remove options (in getopt() sense, i.e. starting with a dash like
-n or --NAME) while parsing arguments for bash completions.  This
allows us to refer to position-dependent parameters better, and
complete options at any point.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01 01:01:50 +02:00
Jakub Kicinski
ef347a340b tools: bpftool: add missing --bpffs to completions
--bpffs is not suggested by bash completions.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01 01:01:50 +02:00
Jakub Kicinski
71e07ddcdc tools: bpftool: drop unnecessary Author comments
Drop my author comments, those are from the early days of
bpftool and make little sense in tree, where we have quite
a few people contributing and git to attribute the work.

While at it bump some copyrights.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01 01:01:50 +02:00
Jakub Kicinski
eac7d84519 tools: libbpf: don't return '.text' as a program for multi-function programs
Make bpf_program__next() skip over '.text' section if object file
has pseudo calls.  The '.text' section is hardly a program in that
case, it's more of a storage for code of functions other than main.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01 01:01:50 +02:00
Jakub Kicinski
9a94f277c4 tools: libbpf: restore the ability to load programs from .text section
libbpf used to be able to load programs from the default section
called '.text'.  It's not very common to leave sections unnamed,
but if it happens libbpf will fail to load the programs reporting
-EINVAL from the kernel.  The -EINVAL comes from bpf_obj_name_cpy()
because since 48cca7e44f ("libbpf: add support for bpf_call")
libbpf does not resolve program names for programs in '.text',
defaulting to '.text'.  '.text', however, does not pass the
(isalnum(*src) || *src == '_') check in bpf_obj_name_cpy().

With few extra lines of code we can limit the pseudo call
assumptions only to objects which actually contain code relocations.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01 01:01:50 +02:00
Jakub Kicinski
9aba36139a tools: libbpf: allow setting ifindex for programs and maps
Users of bpf_object__open()/bpf_object__load() APIs may want to
load the programs and maps onto a device for offload.  Allow
setting ifindex on those sub-objects.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01 01:01:50 +02:00
Jakub Kicinski
d9b683d746 tools: libbpf: add section names for missing program types
Specify default section names for BPF_PROG_TYPE_LIRC_MODE2
and BPF_PROG_TYPE_LWT_SEG6LOCAL, these are the only two
missing right now.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01 01:01:50 +02:00
Jakub Kicinski
c256429fbd tools: bpftool: use correct make variable type to improve compilation time
Commit 4bfe3bd3cc ("tools/bpftool: use version from the kernel
source tree") added version to bpftool.  The version used is
equal to the kernel version and obtained by running make kernelversion
against kernel source tree.  Version is then communicated
to the sources with a command line define set in CFLAGS.

Use a simply expanded variable for the version, otherwise the
recursive make will run every time CFLAGS are used.

This brings the single-job compilation time for me from almost
16 sec down to less than 4 sec.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01 01:01:50 +02:00
Linus Torvalds
883c9ab9eb Merge branch 'parisc-4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes and cleanups from Helge Deller:
 "Nothing exiting in this patchset, just

   - small cleanups of header files

   - default to 4 CPUs when building a SMP kernel

   - mark 16kB and 64kB page sizes broken

   - addition of the new io_pgetevents syscall"

* 'parisc-4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Build kernel without -ffunction-sections
  parisc: Reduce debug output in unwind code
  parisc: Wire up io_pgetevents syscall
  parisc: Default to 4 SMP CPUs
  parisc: Convert printk(KERN_LEVEL) to pr_lvl()
  parisc: Mark 16kB and 64kB page sizes BROKEN
  parisc: Drop struct sigaction from not exported header file
2018-06-30 14:16:30 -07:00
Linus Torvalds
08af78d7a5 ARM: SoC fixes for 4.18-rc
A smaller batch for the end of the week (let's see if I can keep the
 weekly cadence going for once).
 
 All medium-grade fixes here, nothing worrisome:
  - Fixes for some fairly old bugs around SD card write-protect detection
    and GPIO interrupt assignments on Davinci.
  - Wifi module suspend fix for Hikey.
  - Minor DT tweaks to fix inaccuracies for Amlogic platforms, on of
    which solves booting with third-party u-boot.
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCAAtFiEElf+HevZ4QCAJmMQ+jBrnPN6EHHcFAls32nkPHG9sb2ZAbGl4
 b20ubmV0AAoJEIwa5zzehBx381gP/ihYGEiM1iSp1+WJaR3YaVHIt4VbnZV76A/T
 oCeX/9X11o1tundMbyX5iBY30SlHA+GrGEEETQyGDJ+an2hBxfJVJzG+u0AFVtkr
 orf0v5UbUJZqxsU3cnzB508wuIgpdouZ60cqXT0HJOfC6NV9oL5yV1ZWKguWWuWL
 KgRqavz/9QyTaUiphwdhG+n1Ey+EVH1uPUqRxh3Md8jMKscMWcd36D2OsMmu3AbZ
 O73KRoIr4SgXwnk6V2q/xoAHyshURhnVDHmEuyO1fJh9b7OZMEJiMcFmr8RC6SLr
 /ooc0nAtJyCdyJl2h9+XGONLB+pxDVL9O9dWU21YrCdGMPAjBY1e9Ppeus+u+Zzt
 H1bk2bDTZe5Oybx1M5xCgMtc7Snar+F1kUySFS7JXwEWHUwbEVpiSz9s0IRnpRgD
 yQJn3ybxMHHFpJba3VFZeg7+cmNMq5n+XilZDmTp+mCcdRlnX+3HMt2tgf9WZJgq
 MwkVNdHykHzs7Uw0IaLFDfdvUbMnjn/4iHoBdfWpQPjoDBpXcSmo6rhpi1WUbKnW
 LF4zTywaaCifwfuvb4p2K6ByRg2zUwrqrlYtx6og5D0ARhI6Izqv6YEjoY/d5+nl
 NeC/whEFFG0O5lFH32Oy8XuhPwLLOTW5wXd0vYlFWTy9YuO5GZ3nlqb73v4cPvsC
 +34hp/7x
 =55JJ
 -----END PGP SIGNATURE-----

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
 "A smaller batch for the end of the week (let's see if I can keep the
  weekly cadence going for once).

  All medium-grade fixes here, nothing worrisome:

   - Fixes for some fairly old bugs around SD card write-protect
     detection and GPIO interrupt assignments on Davinci.

   - Wifi module suspend fix for Hikey.

   - Minor DT tweaks to fix inaccuracies for Amlogic platforms, one
     of which solves booting with third-party u-boot"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  arm64: dts: hikey960: Define wl1837 power capabilities
  arm64: dts: hikey: Define wl1835 power capabilities
  ARM64: dts: meson-gxl: fix Mali GPU compatible string
  ARM64: dts: meson-axg: fix ethernet stability issue
  ARM64: dts: meson-gx: fix ATF reserved memory region
  ARM64: dts: meson-gxl-s905x-p212: Add phy-supply for usb0
  ARM64: dts: meson: fix register ranges for SD/eMMC
  ARM64: dts: meson: disable sd-uhs modes on the libretech-cc
  ARM: dts: da850: Fix interrups property for gpio
  ARM: davinci: board-da850-evm: fix WP pin polarity for MMC/SD
2018-06-30 14:08:06 -07:00
Linus Torvalds
22d3e0c36e Kbuild fixes for v4.18
- introduce __diag_* macros and suppress -Wattribute-alias warnings from GCC 8
 
 - fix stack protector test script for x86_64
 
 - fix line number handling in Kconfig
 
 - document that '#' starts a comment in Kconfig
 
 - handle P_SYMBOL property in dump debugging of Kconfig
 
 - correct help message of LD_DEAD_CODE_DATA_ELIMINATION
 
 - fix occasional segmentation faults in Kconfig
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJbN450AAoJED2LAQed4NsGFdAP/0fc2NhkzQMvz1EBEc2n93LC
 FUXew75tsX2ZewssoLzb4Iepkb/mHU+fjhBaE65S+Xu2/6mNfId9a7HAtywvFyO2
 ZUQPXHjMHnLEPRKuzQy34uCy9/wWCiqi8rpWUsOEohmNIcLaF0vMZf5Ifod7wIr7
 pnix3b9Q+dY+l49TSsSv4MX7F9qs5fXRhEarcQ3jYEb3yRUEXgmli3hV1wRita/n
 tJhFDiIdJDeISDkgmHUuOhjFnv5Yf3WJTXi/ILZ2zvpGjjqNDAwxtyzGnPMShQEc
 fxk3/1nkg9h/ScVAaGavrYYmiiH8XsqWY2q6p52jTK3kD+yTXaVakPSmxw8UHImh
 aNWQutzMF8GYEsb+ld1ncsNrwfgd40mA25mEyb/ZPSw2IdNBrXtIVbw7XiBLi8eH
 recAlRN0MouzD7+sXafgtoKopqanQbB/rMqDO4ULfnVvZLWDmZVbfreCc+qrJtiJ
 mqydBMUVxrvB+qf5SHQ7WlDmXWHY1xQuxXzS0gRVGT14EsyD6yhC2D62pEHnB7uG
 zE1pGemOCzOlGY6nDAbtQVR1n5AAWEZYveZXUuFn+vuqR7ZtYxCFUFOS0u621zFI
 HMI9B81ifdNV2efT2VTVi6Tnnvn44sAXOYjaULX6566EyX0/mOL5CWZlTqn5SKOn
 PwNxc7ZeCylTbkZww2c4
 =oABr
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-fixes-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - introduce __diag_* macros and suppress -Wattribute-alias warnings
   from GCC 8

 - fix stack protector test script for x86_64

 - fix line number handling in Kconfig

 - document that '#' starts a comment in Kconfig

 - handle P_SYMBOL property in dump debugging of Kconfig

 - correct help message of LD_DEAD_CODE_DATA_ELIMINATION

 - fix occasional segmentation faults in Kconfig

* tag 'kbuild-fixes-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kconfig: loop boundary condition fix
  kbuild: reword help of LD_DEAD_CODE_DATA_ELIMINATION
  kconfig: handle P_SYMBOL in print_symbol()
  kconfig: document Kconfig source file comments
  kconfig: fix line numbers for if-entries in menu tree
  stack-protector: Fix test with 32-bit userland and CONFIG_64BIT=y
  powerpc: Remove -Wattribute-alias pragmas
  disable -Wattribute-alias warning for SYSCALL_DEFINEx()
  kbuild: add macro for controlling warnings to linux/compiler.h
2018-06-30 13:05:30 -07:00
Linus Torvalds
0fbc4aeabc Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "The biggest diffstat comes from self-test updates, plus there's entry
  code fixes, 5-level paging related fixes, console debug output fixes,
  and misc fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Clean up the printk()s in show_fault_oops()
  x86/mm: Drop unneeded __always_inline for p4d page table helpers
  x86/efi: Fix efi_call_phys_epilog() with CONFIG_X86_5LEVEL=y
  selftests/x86/sigreturn: Do minor cleanups
  selftests/x86/sigreturn/64: Fix spurious failures on AMD CPUs
  x86/entry/64/compat: Fix "x86/entry/64/compat: Preserve r8-r11 in int $0x80"
  x86/mm: Don't free P4D table when it is folded at runtime
  x86/entry/32: Add explicit 'l' instruction suffix
  x86/mm: Get rid of KERN_CONT in show_fault_oops()
2018-06-30 11:42:14 -07:00
Linus Torvalds
d7d5388679 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Tooling fixes mostly, plus a build warning fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  perf/core: Move inline keyword at the beginning of declaration
  tools/headers: Pick up latest kernel ABIs
  perf tools: Fix crash caused by accessing feat_ops[HEADER_LAST_FEATURE]
  perf script: Fix crash because of missing evsel->priv
  perf script: Add missing output fields in a hint
  perf bench: Fix numa report output code
  perf stat: Remove duplicate event counting
  perf alias: Rebuild alias expression string to make it comparable
  perf alias: Remove trailing newline when reading sysfs files
  perf tools: Fix a clang 7.0 compilation error
  tools include uapi: Synchronize bpf.h with the kernel
  tools include uapi: Update if_link.h to pick IFLA_{BRPORT_ISOLATED,VXLAN_TTL_INHERIT}
  tools include powerpc: Update arch/powerpc/include/uapi/asm/unistd.h copy to get 'rseq' syscall
  perf tools: Update x86's syscall_64.tbl, adding 'io_pgetevents' and 'rseq'
  tools headers uapi: Synchronize drm/drm.h
  perf intel-pt: Fix packet decoding of CYC packets
  perf tests: Add valid callback for parse-events test
  perf tests: Add event parsing error handling to parse events test
  perf report powerpc: Fix crash if callchain is empty
  perf test session topology: Fix test on s390
  ...
2018-06-30 11:26:25 -07:00
Linus Torvalds
34a484d58c selinux/stable-4.18 PR 20180629
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEcQCq365ubpQNLgrWVeRaWujKfIoFAls2gk0UHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQVeRaWujKfIqEvRAAgjtXjU7cN9Vj7GZpwgSjJKyXSruR
 RDM19CRDKIo27UoxaqD92hFnmDreoysdzLi9cunDxshsUbGdHeyXvaOkY2apqpkn
 msPIKO3pmoiq0Umze1/smgl6g0ruPxc+ZSslfuHLUjephogfuDGKgfTeLN60z6up
 KaiVPoaSni+DZSsiFOkzUHwQFfhTRqONBx/tfPo2H1K0bP2dy9YuOCdvpZEqpc7P
 8tx/uF3pYrUBuq9ufEgWt2VC0fU1uZJEI21BQqKcrXLYlmVr73pWYLXnV1NWck0A
 rY5DABxbDf1sXCAIhYRJJDiM51uPltmFfntGF3sS1OKYOgyxIxf51xwgl9dXsGOA
 jOFFwUuXeHJpWICTm1PQCdpA/mzLVgrPzt8ULPE6zYnP+LbBId6RSrPz6Irhswal
 /wiq1mlhdAdeBf2r/tY3VXUy8dMNLjfeHLD3scx07hfFuswXCPTC6xwlyiKPsgKD
 1hGCQazZZTNAcMVI4A00SpPZVGx1yU/Yu/+8vMKyV5BmT55TPcQElcQy+ZZxuX1a
 711B9U7P6w7k8UwL+hVSncgwCLI1vb4MKnmrGZRH9wXiUdaKcgivaDDoeoLTtqBF
 dcIP84OtigAHbJHxWXoTmuRX9KGmtgF5sUBgFv2kg8R8EgnmVKaWJxyyttRg4awo
 RZXmTh2eC3p3IGw=
 =fzCd
 -----END PGP SIGNATURE-----

Merge tag 'selinux-pr-20180629' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux fix from Paul Moore:
 "One fairly straightforward patch to fix a longstanding issue where a
  process could stall while accessing files in selinuxfs and block
  everyone else due to a held mutex.

  The patch passes all our tests and looks to apply cleanly to your
  current tree"

* tag 'selinux-pr-20180629' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: move user accesses in selinuxfs out of locked regions
2018-06-30 11:15:12 -07:00
Linus Torvalds
e6e5bec43c for-linus-20180629
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAls2oVcQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpr89D/9PaJCtLpCEU1HdaohIDXDyJKBdtCllIsqJ
 YAJTlUkFRkMvsbdEA3U43rVRVlu7zxP3aRg79VRI+kdZ8y8XSiisAF/QUbG/Bwd3
 NuFqKj4Hm5xFTtLMKIUlumR2/exD4ZpPHKnWmD4idp1PZRoTUwxeIjqVv77L6Nfy
 c4/GVlz2t9buWxH/5PSa//rsT49xM6CLpyIwO2lFsNEk2HYE/uAWya5KZVJ1YKVw
 mSjUdyFtJ//lFK9lVU4YkdNF0d5w2PcEoCfNyPe0n9F43iITACo2om7rkSkTgciS
 izPAs1DkqNdWPta2twULt656WlUoGlwnWyFchU34N/I/pDiww4kdCQFfNIOi6qBW
 7rPQ4xpywiw/U93C2GLEeE09++T8+yO4KuELhIcN5+D3D2VGRIuaGcf4xeY0CX3A
 a335EZIxBGOfxyejknBDg3BsoUcLvejbANKD8ltis3zciGf/QyLt+FFqx25WCzkN
 N028ppml8WPSiqhSlFDMyfscO1dx/WSYh1q7ANrBiPPaEjFYmakzEDT0cZW4JsUh
 av4jqYt3cqrFIXyhRHJM2wjFymQv6aVnmLa4zOKCFbwm9KzMWUUFjc4R962T/sQK
 0g+eCSFGidii4JZ4ghOAQk2dqCTIZqV72pmLlpJBGu+SAIQxkh+29h0LOi5U3v1Y
 FnTtJ1JhCg==
 =0uqV
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20180629' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "Small set of fixes for this series. Mostly just minor fixes, the only
  oddball in here is the sg change.

  The sg change came out of the stall fix for NVMe, where we added a
  mempool and limited us to a single page allocation. CONFIG_SG_DEBUG
  sort-of ruins that, since we'd need to account for that. That's
  actually a generic problem, since lots of drivers need to allocate SG
  lists. So this just removes support for CONFIG_SG_DEBUG, which I added
  back in 2007 and to my knowledge it was never useful.

  Anyway, outside of that, this pull contains:

   - clone of request with special payload fix (Bart)

   - drbd discard handling fix (Bart)

   - SATA blk-mq stall fix (me)

   - chunk size fix (Keith)

   - double free nvme rdma fix (Sagi)"

* tag 'for-linus-20180629' of git://git.kernel.dk/linux-block:
  sg: remove ->sg_magic member
  drbd: Fix drbd_request_prepare() discard handling
  blk-mq: don't queue more if we get a busy return
  block: Fix cloning of requests with a special payload
  nvme-rdma: fix possible double free of controller async event buffer
  block: Fix transfer when chunk sectors exceeds max
2018-06-30 10:47:46 -07:00
Roopa Prabhu
35e8c7ba08 net: fib_rules: bring back rule_exists to match rule during add
After commit f9d4b0c1e9 ("fib_rules: move common handling of newrule
delrule msgs into fib_nl2rule"), rule_exists got replaced by rule_find
for existing rule lookup in both the add and del paths. While this
is good for the delete path, it solves a few problems but opens up
a few invalid key matches in the add path.

$ip -4 rule add table main tos 10 fwmark 1
$ip -4 rule add table main tos 10
RTNETLINK answers: File exists

The problem here is rule_find does not check if the key masks in
the new and old rule are the same and hence ends up matching a more
secific rule. Rule key masks cannot be easily compared today without
an elaborate if-else block. Its best to introduce key masks for easier
and accurate rule comparison in the future. Until then, due to fear of
regressions this patch re-introduces older loose rule_exists during add.
Also fixes both rule_exists and rule_find to cover missing attributes.

Fixes: f9d4b0c1e9 ("fib_rules: move common handling of newrule delrule msgs into fib_nl2rule")
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 22:11:13 +09:00
David S. Miller
1a84d7fdb5 Merge branch 'mlxsw-Add-resource-scale-tests'
Petr Machata says:

====================
mlxsw: Add resource scale tests

There are a number of tests that check features of the Linux networking
stack. By running them on suitable interfaces, one can exercise the
mlxsw offloading code. However none of these tests attempts to push
mlxsw to the limits supported by the ASIC.

As an additional wrinkle, the "limits supported by the ASIC" themselves
may not be a set of fixed numbers, but rather depend on a profile that
determines how the ASIC resources are allocated for different purposes.

This patchset introduces several tests that verify capability of mlxsw
to offload amounts of routes, flower rules, and mirroring sessions that
match predicted ASIC capacity, at different configuration profiles.
Additionally they verify that amounts exceeding the predicted capacity
can *not* be offloaded.

These are not generic tests, but ones that are tailored for mlxsw
specifically. For that reason they are not added to net/forwarding
selftests subdirectory, but rather to a newly-added drivers/net/mlxsw.

Patches #1, #2 and #3 tweak the generic forwarding/lib.sh to support the
new additions.

In patches #4 and #5, new libraries for interfacing with devlink are
introduced, first a generic one, then a Spectrum-specific one.

In patch #6, a devlink resource test is introduced.

Patches #7 and #8, #9 and #10, and #11 and #12 introduce three scale
tests: router, flower and mirror-to-gretap. The first of each pair of
patches introduces a generic portion of the test (mlxsw-specific), the
second introduces a Spectrum-specific wrapper.

Patch #13 then introduces a scale test driver that runs (possibly a
subset of) the tests introduced by patches from previous paragraph.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 22:06:16 +09:00
Yuval Mintz
1b6130df62 selftests: mlxsw: Add scale test for resources
Add a scale test capable of validating that offloaded network
functionality is indeed functional at scale when configured to
the different KVD profiles available.

Start by testing offloaded routes are functional at scale by
passing traffic on each one of them in turn.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 22:06:16 +09:00
Petr Machata
9136074d56 selftests: mlxsw: Add target for mirror-to-gretap test on spectrum
Add a wrapper around mlxsw/mirror_gre_scale.sh that parameterized number
of offloadable mirrors on Spectrum machines.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 22:06:16 +09:00
Petr Machata
b973b78aae selftests: mlxsw: Add scale test for mirror-to-gretap
Test that it's possible to offload a given number of mirrors.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 22:06:16 +09:00
Petr Machata
741a7661f0 selftests: mlxsw: Add target for tc flower test on spectrum
Add a wrapper around mlxsw/tc_flower_scale.sh that parameterizes the
generic tc flower scale test template with Spectrum-specific target
values.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 22:06:15 +09:00
Petr Machata
d67a94e81f selftests: mlxsw: Add tc flower scale test
Add test of capacity to offload flower.

This is a generic portion of the test that is meant to be called from a
driver that supplies a particular number of rules to be tested with.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 22:06:15 +09:00
Yuval Mintz
c51a744a28 selftests: mlxsw: Add target for router test on spectrum
IPv4 routes in Spectrum are based on the kvd single-hash, but as it's
a hash we need to assume we cannot reach 100% of its capacity.

Add a wrapper that provides us with good/bad target numbers for the
Spectrum ASIC.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
[petrm@mellanox.com: Drop shebang.]
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 22:06:15 +09:00
Arkadi Sharshevsky
d98307c52b selftests: mlxsw: Add router test
This test aims for both stand alone and internal usage by the resource
infra. The test receives the number routes to offload and checks:
- The routes were offloaded correctly
- Traffic for each route.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 22:06:15 +09:00
Yuval Mintz
b030c33811 selftests: mlxsw: Add devlink KVD resource test
Add a selftest that can be used to perform basic sanity of the devlink
resource API as well as test the behavior of KVD manipulation in the
driver.

This is the first case of a HW-only test - in order to test the devlink
resource a driver capable of exposing resources has to be provided
first.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
[petrm@mellanox.com: Extracted two patches out of this patch. Tweaked
commit message.]
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 22:06:15 +09:00
Petr Machata
5aeba3e89b selftests: mlxsw: Add devlink_lib_spectrum.sh
This library builds on top of devlink_lib.sh and contains functionality
specific to Spectrum ASICs, e.g., re-partitioning the various KVD
sub-parts.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
[petrm@mellanox.com: Split this out from another patch. Fix line length
in devlink_sp_read_kvd_defaults().]
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 22:06:15 +09:00
Petr Machata
bc7cbb1e9f selftests: forwarding: Add devlink_lib.sh
This helper library contains wrappers to devlink functionality agnostic
to the underlying device.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
[petrm@mellanox.com: Split this out from another patch.]
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 22:06:15 +09:00
Petr Machata
68d9cea594 selftests: forwarding: lib: Parameterize NUM_NETIFS in two functions
setup_wait() and tc_offload_check() both assume that all NUM_NETIFS
interfaces are relevant for a given test. However, the scale test script
acts as an umbrella for a number of sub-tests, some of which may not
require all the interfaces.

Thus it's suboptimal for tc_offload_check() to query all the interfaces.
In case of setup_wait() it's incorrect, because the sub-test in question
of course doesn't configure any interfaces beyond what it needs, and
setup_wait() then ends up waiting indefinitely for the extraneous
interfaces to come up.

For that reason, give setup_wait() and tc_offload_check() an optional
parameter with a number of interfaces to probe. Fall back to global
NUM_NETIFS if the parameter is not given.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 22:06:15 +09:00
Petr Machata
96fa91d281 selftests: forwarding: lib: Add check_err_fail()
In the scale testing scenarios, one usually has a condition that is
expected to either fail, or pass, depending on which side of the scale
is being tested.

To capture this logic, add a function check_err_fail(), which dispatches
either to check_err() or check_fail(), depending on the value of the
first argument, should_fail.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 22:06:15 +09:00
Yuval Mintz
87d8fb18cb selftests: forwarding: Allow lib.sh sourcing from other directories
The devlink related scripts are mlxsw-specific. As a result, they'll
reside in a different directory - but would still need the common logic
implemented in lib.sh.
So as a preliminary step, allow lib.sh to be sourced from other
directories as well.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 22:06:15 +09:00
David S. Miller
a847726159 Merge branch 'nfp-flower-updates-and-netconsole'
Jakub Kicinski says:

====================
nfp: flower updates and netconsole

This set contains assorted updates to driver base and flower.
First patch is a follow up to a fix to calculating counters which
went into net.  For ethtool counters we should also make sure
they are visible even after ring reconfiguration.  Next patch
is a safety measure in case we are running on a machine with a
broken BIOS we should fail the probe when risk of incorrect
operation is too high.  The next two patches add netpoll support
and make use of napi_consume_skb().  Last we populate bus info
on all representors.

Pieter adds support for offload of the checksum action in flower.

John follows up to another fix he's done in net, we set TTL
values on tunnels to stack's default, now Johns does a full
route lookup to get a more precise information, he populates
ToS field as well.  Last but not least he follows up on Jiri's
request to enable LAG offload in case the team driver is used
and then hash function is unknown.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 21:31:57 +09:00
John Hurley
635cf43dbd nfp: flower: enabled offloading of Team LAG
Currently the NFP fw only supports L3/L4 hashing so rejects the offload of
filters that output to LAG ports implementing other hash algorithms. Team,
however, uses a BPF function for the hash that is not defined. To support
Team offload, accept hashes that are defined as 'unknown' (only Team
defines such hash types). In this case, use the NFP default of L3/L4
hashing for egress port selection.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 21:31:56 +09:00
John Hurley
51a8cefc6e nfp: flower: offload tos and tunnel flags for ipv4 udp tunnels
Extract the tos and the tunnel flags from the tunnel key and offload these
action fields. Only the checksum and tunnel key flags are implemented in
fw so reject offloads of other flags. The tunnel key flag is always
considered set in the fw so enforce that it is set in the rule. Note that
the compulsory setting of the tunnel key flag and optional setting of
checksum is inline with how tc currently generates ipv4 udp tunnel
actions.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 21:31:56 +09:00
John Hurley
ed21b637e9 nfp: flower: extract ipv4 udp tunnel ttl from route
Previously the ttl for ipv4 udp tunnels was set to the namespace default.
Modify this to attempt to extract the ttl from a full route lookup on the
tunnel destination. If this is not possible then resort to the default.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 21:31:56 +09:00
Pieter Jansen van Vuuren
ed8f2b52b6 nfp: flower: ignore checksum actions when performing pedit actions
Hardware will automatically update csum in headers when a set action has
been performed. This means we could in the driver ignore the explicit
checksum action when performing a set action.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 21:31:56 +09:00
Jakub Kicinski
5d4b0b4068 nfp: populate bus-info on representors
We used to leave bus-info in ethtool driver info empty for
representors in case multi-PCIe-to-single-host cards make
the association between PCIe device and NFP many to one.
It seems these attempts are futile, we need to link the
representors to one PCIe device in sysfs to get consistent
naming, plus devlink uses one PCIe as a handle, anyway.
The multi-PCIe-to-single-system support won't be clean,
if it ever comes.

Turns out some user space (RHEL tests) likes to read bus-info
so just populate it.

While at it remove unnecessary app NULL-check, representors
are spawned by an app, so it must exist.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 21:31:56 +09:00
Jakub Kicinski
d387b8a19a nfp: make use of napi_consume_skb()
Use napi_consume_skb() in nfp_net_tx_complete() to get bulk free.
Pass 0 as budget for ctrl queue completion since it runs out of
a tasklet.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 21:31:56 +09:00
Jakub Kicinski
670b5274ff nfp: implement netpoll ndo (thus enabling netconsole)
NFP NAPI handling will only complete the TXed packets when called
with budget of 0, implement ndo_poll_controller by scheduling NAPI
on all TX queues.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 21:31:56 +09:00
Jakub Kicinski
18aa5b180f nfp: fail probe if serial or interface id is missing
On some platforms with broken ACPI tables we may not have access
to the Serial Number PCIe capability.  This capability is crucial
for us for switchdev operation as we use serial number as switch ID,
and for communication with management FW where interface ID is used.

If we can't determine the Serial Number we have to fail device probe.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 21:31:56 +09:00
Jakub Kicinski
f055a9dfee nfp: expose ring stats of inactive rings via ethtool
After user changes the ring count statistics for deactivated
rings disappear from ethtool -S output.  This causes loss of
information to the user and means that ethtool stats may not
add up to interface stats.  Always expose counters from all
the rings.  Note that we allocate at most num_possible_cpus()
rings so number of rings should be reasonable.

The alternative of only listing stats for rings which were
ever in use could be confusing.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 21:31:56 +09:00
Stephen Hemminger
3ffe64f1a6 hv_netvsc: split sub-channel setup into async and sync
When doing device hotplug the sub channel must be async to avoid
deadlock issues because device is discovered in softirq context.

When doing changes to MTU and number of channels, the setup
must be synchronous to avoid races such as when MTU and device
settings are done in a single ip command.

Reported-by: Thomas Walker <Thomas.Walker@twosigma.com>
Fixes: 8195b1396e ("hv_netvsc: fix deadlock on hotplug")
Fixes: 732e49850c ("netvsc: fix race on sub channel creation")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 21:28:36 +09:00
Cong Wang
3f76df1982 net: use dev_change_tx_queue_len() for SIOCSIFTXQLEN
As noticed by Eric, we need to switch to the helper
dev_change_tx_queue_len() for SIOCSIFTXQLEN call path too,
otheriwse still miss dev_qdisc_change_tx_queue_len().

Fixes: 6a643ddb56 ("net: introduce helper dev_change_tx_queue_len()")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 21:26:52 +09:00
Vakul Garg
4e485d06bb strparser: Call skb_unclone conditionally
Calling skb_unclone() is expensive as it triggers a memcpy operation.
Instead of calling skb_unclone() unconditionally, call it only when skb
has a shared frag_list. This improves tls rx throughout significantly.

Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Suggested-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 21:25:39 +09:00
Gustavo A. R. Silva
ced9e19150 atm: zatm: Fix potential Spectre v1
pool can be indirectly controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.

This issue was detected with the help of Smatch:

drivers/atm/zatm.c:1491 zatm_ioctl() warn: potential spectre issue
'zatm_dev->pool_info' (local cap)

Fix this by sanitizing pool before using it to index
zatm_dev->pool_info

Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].

[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 21:24:18 +09:00
David S. Miller
c7f653e0a8 Merge branch 's390-qeth-fixes'
Julian Wiedmann says:

====================
s390/qeth: fixes 2018-06-29

please apply a few qeth fixes for -net and your 4.17 stable queue.

Patches 1-3 fix several issues wrt to MAC address management that were
introduced during the 4.17 cycle.
Patch 4 tackles a long-standing issue with busy multi-connection workloads
on devices in af_iucv mode.
Patch 5 makes sure to re-enable all active HW offloads, after a card was
previously set offline and thus lost its HW context.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 21:19:55 +09:00
Julian Wiedmann
d025da9eb1 s390/qeth: consistently re-enable device features
commit e830baa9c3 ("qeth: restore device features after recovery") and
commit ce34435641 ("s390/qeth: rely on kernel for feature recovery")
made sure that the HW functions for device features get re-programmed
after recovery.

But we missed that the same handling is also required when a card is
first set offline (destroying all HW context), and then online again.
Fix this by moving the re-enable action out of the recovery-only path.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 21:19:48 +09:00
Julian Wiedmann
ce28867fd2 s390/qeth: don't clobber buffer on async TX completion
If qeth_qdio_output_handler() detects that a transmit requires async
completion, it replaces the pending buffer's metadata object
(qeth_qdio_out_buffer) so that this queue buffer can be re-used while
the data is pending completion.

Later when the CQ indicates async completion of such a metadata object,
qeth_qdio_cq_handler() tries to free any data associated with this
object (since HW has now completed the transfer). By calling
qeth_clear_output_buffer(), it erronously operates on the queue buffer
that _previously_ belonged to this transfer ... but which has been
potentially re-used several times by now.
This results in double-free's of the buffer's data, and failing
transmits as the buffer descriptor is scrubbed in mid-air.

The correct way of handling this situation is to
1. scrub the queue buffer when it is prepared for re-use, and
2. later obtain the data addresses from the async-completion notifier
   (ie. the AOB), instead of the queue buffer.

All this only affects qeth devices used for af_iucv HiperTransport.

Fixes: 0da9581ddb ("qeth: exploit asynchronous delivery of storage blocks")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 21:19:48 +09:00