Commit Graph

5104 Commits

Author SHA1 Message Date
Linus Torvalds
2324d50d05 It's been a busy cycle for documentation - hopefully the busiest for a
while to come.  Changes include:
 
  - Some new Chinese translations
 
  - Progress on the battle against double words words and non-HTTPS URLs
 
  - Some block-mq documentation
 
  - More RST conversions from Mauro.  At this point, that task is
    essentially complete, so we shouldn't see this kind of churn again for a
    while.  Unless we decide to switch to asciidoc or something...:)
 
  - Lots of typo fixes, warning fixes, and more.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAl8oVkwPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5YoW8H/jJ/xnXFn7tkgVPQAlL3k5HCnK7A5nDP9RVR
 cg1pTx1cEFdjzxPlJyExU6/v+AImOvtweHXC+JDK7YcJ6XFUNYXJI3LxL5KwUXbY
 BL/xRFszDSXH2C7SJF5GECcFYp01e/FWSLN3yWAh+g+XwsKiTJ8q9+CoIDkHfPGO
 7oQsHKFu6s36Af0LfSgxk4sVB7EJbo8e4psuPsP5SUrl+oXRO43Put0rXkR4yJoH
 9oOaB51Do5fZp8I4JVAqGXvpXoExyLMO4yw0mASm6YSZ3KyjR8Fae+HD9Cq4ZuwY
 0uzb9K+9NEhqbfwtyBsi99S64/6Zo/MonwKwevZuhtsDTK4l4iU=
 =JQLZ
 -----END PGP SIGNATURE-----

Merge tag 'docs-5.9' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "It's been a busy cycle for documentation - hopefully the busiest for a
  while to come. Changes include:

   - Some new Chinese translations

   - Progress on the battle against double words words and non-HTTPS
     URLs

   - Some block-mq documentation

   - More RST conversions from Mauro. At this point, that task is
     essentially complete, so we shouldn't see this kind of churn again
     for a while. Unless we decide to switch to asciidoc or
     something...:)

   - Lots of typo fixes, warning fixes, and more"

* tag 'docs-5.9' of git://git.lwn.net/linux: (195 commits)
  scripts/kernel-doc: optionally treat warnings as errors
  docs: ia64: correct typo
  mailmap: add entry for <alobakin@marvell.com>
  doc/zh_CN: add cpu-load Chinese version
  Documentation/admin-guide: tainted-kernels: fix spelling mistake
  MAINTAINERS: adjust kprobes.rst entry to new location
  devices.txt: document rfkill allocation
  PCI: correct flag name
  docs: filesystems: vfs: correct flag name
  docs: filesystems: vfs: correct sync_mode flag names
  docs: path-lookup: markup fixes for emphasis
  docs: path-lookup: more markup fixes
  docs: path-lookup: fix HTML entity mojibake
  CREDITS: Replace HTTP links with HTTPS ones
  docs: process: Add an example for creating a fixes tag
  doc/zh_CN: add Chinese translation prefer section
  doc/zh_CN: add clearing-warn-once Chinese version
  doc/zh_CN: add admin-guide index
  doc:it_IT: process: coding-style.rst: Correct __maybe_unused compiler label
  futex: MAINTAINERS: Re-add selftests directory
  ...
2020-08-04 22:47:54 -07:00
Linus Torvalds
4da9f33026 Support for FSGSBASE. Almost 5 years after the first RFC to support it,
this has been brought into a shape which is maintainable and actually
 works.
 
 This final version was done by Sasha Levin who took it up after Intel
 dropped the ball. Sasha discovered that the SGX (sic!) offerings out there
 ship rogue kernel modules enabling FSGSBASE behind the kernels back which
 opens an instantanious unpriviledged root hole.
 
 The FSGSBASE instructions provide a considerable speedup of the context
 switch path and enable user space to write GSBASE without kernel
 interaction. This enablement requires careful handling of the exception
 entries which go through the paranoid entry path as they cannot longer rely
 on the assumption that user GSBASE is positive (as enforced via prctl() on
 non FSGSBASE enabled systemn). All other entries (syscalls, interrupts and
 exceptions) can still just utilize SWAPGS unconditionally when the entry
 comes from user space. Converting these entries to use FSGSBASE has no
 benefit as SWAPGS is only marginally slower than WRGSBASE and locating and
 retrieving the kernel GSBASE value is not a free operation either. The real
 benefit of RD/WRGSBASE is the avoidance of the MSR reads and writes.
 
 The changes come with appropriate selftests and have held up in field
 testing against the (sanitized) Graphene-SGX driver.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl8pGnoTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoTYJD/9873GkwvGcc/Vq/dJH1szGTgFftPyZ
 c/Y9gzx7EGBPLo25BS820L+ZlynzXHDxExKfCEaD10TZfe5XIc1vYNR0J74M2NmK
 IBgEDstJeW93ai+rHCFRXIevhpzU4GgGYJ1MeeOgbVMN3aGU1g6HfzMvtF0fPn8Y
 n6fsLZa43wgnoTdjwjjikpDTrzoZbaL1mbODBzBVPAaTbim7IKKTge6r/iCKrOjz
 Uixvm3g9lVzx52zidJ9kWa8esmbOM1j0EPe7/hy3qH9DFo87KxEzjHNH3T6gY5t6
 NJhRAIfY+YyTHpPCUCshj6IkRudE6w/qjEAmKP9kWZxoJrvPCTWOhCzelwsFS9b9
 gxEYfsnaKhsfNhB6fi0PtWlMzPINmEA7SuPza33u5WtQUK7s1iNlgHfvMbjstbwg
 MSETn4SG2/ZyzUrSC06lVwV8kh0RgM3cENc/jpFfIHD0vKGI3qfka/1RY94kcOCG
 AeJd0YRSU2RqL7lmxhHyG8tdb8eexns41IzbPCLXX2sF00eKNkVvMRYT2mKfKLFF
 q8v1x7yuwmODdXfFR6NdCkGm9IU7wtL6wuQ8Nhu9UraFmcXo6X6FLJC18FqcvSb9
 jvcRP4XY/8pNjjf44JB8yWfah0xGQsaMIKQGP4yLv4j6Xk1xAQKH1MqcC7l1D2HN
 5Z24GibFqSK/vA==
 =QaAN
 -----END PGP SIGNATURE-----

Merge tag 'x86-fsgsbase-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fsgsbase from Thomas Gleixner:
 "Support for FSGSBASE. Almost 5 years after the first RFC to support
  it, this has been brought into a shape which is maintainable and
  actually works.

  This final version was done by Sasha Levin who took it up after Intel
  dropped the ball. Sasha discovered that the SGX (sic!) offerings out
  there ship rogue kernel modules enabling FSGSBASE behind the kernels
  back which opens an instantanious unpriviledged root hole.

  The FSGSBASE instructions provide a considerable speedup of the
  context switch path and enable user space to write GSBASE without
  kernel interaction. This enablement requires careful handling of the
  exception entries which go through the paranoid entry path as they
  can no longer rely on the assumption that user GSBASE is positive (as
  enforced via prctl() on non FSGSBASE enabled systemn).

  All other entries (syscalls, interrupts and exceptions) can still just
  utilize SWAPGS unconditionally when the entry comes from user space.
  Converting these entries to use FSGSBASE has no benefit as SWAPGS is
  only marginally slower than WRGSBASE and locating and retrieving the
  kernel GSBASE value is not a free operation either. The real benefit
  of RD/WRGSBASE is the avoidance of the MSR reads and writes.

  The changes come with appropriate selftests and have held up in field
  testing against the (sanitized) Graphene-SGX driver"

* tag 'x86-fsgsbase-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  x86/fsgsbase: Fix Xen PV support
  x86/ptrace: Fix 32-bit PTRACE_SETREGS vs fsbase and gsbase
  selftests/x86/fsgsbase: Add a missing memory constraint
  selftests/x86/fsgsbase: Fix a comment in the ptrace_write_gsbase test
  selftests/x86: Add a syscall_arg_fault_64 test for negative GSBASE
  selftests/x86/fsgsbase: Test ptracer-induced GS base write with FSGSBASE
  selftests/x86/fsgsbase: Test GS selector on ptracer-induced GS base write
  Documentation/x86/64: Add documentation for GS/FS addressing mode
  x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2
  x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit
  x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit
  x86/entry/64: Introduce the FIND_PERCPU_BASE macro
  x86/entry/64: Switch CR3 before SWAPGS in paranoid entry
  x86/speculation/swapgs: Check FSGSBASE in enabling SWAPGS mitigation
  x86/process/64: Use FSGSBASE instructions on thread copy and ptrace
  x86/process/64: Use FSBSBASE in switch_to() if available
  x86/process/64: Make save_fsgs_for_kvm() ready for FSGSBASE
  x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions
  x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions
  x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE
  ...
2020-08-04 21:16:22 -07:00
Linus Torvalds
4f30a60aa7 close-range-v5.9
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXygcpgAKCRCRxhvAZXjc
 ogPeAQDv1ncqtNroFAC4pJ4tQhH7JSjW0OltiMk/AocY/J2SdQD9GJ15luYJ0/om
 697q/Z68sndRynhdoZlMuf3oYuBlHQw=
 =3ZhE
 -----END PGP SIGNATURE-----

Merge tag 'close-range-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull close_range() implementation from Christian Brauner:
 "This adds the close_range() syscall. It allows to efficiently close a
  range of file descriptors up to all file descriptors of a calling
  task.

  This is coordinated with the FreeBSD folks which have copied our
  version of this syscall and in the meantime have already merged it in
  April 2019:

    https://reviews.freebsd.org/D21627
    https://svnweb.freebsd.org/base?view=revision&revision=359836

  The syscall originally came up in a discussion around the new mount
  API and making new file descriptor types cloexec by default. During
  this discussion, Al suggested the close_range() syscall.

  First, it helps to close all file descriptors of an exec()ing task.
  This can be done safely via (quoting Al's example from [1] verbatim):

        /* that exec is sensitive */
        unshare(CLONE_FILES);
        /* we don't want anything past stderr here */
        close_range(3, ~0U);
        execve(....);

  The code snippet above is one way of working around the problem that
  file descriptors are not cloexec by default. This is aggravated by the
  fact that we can't just switch them over without massively regressing
  userspace. For a whole class of programs having an in-kernel method of
  closing all file descriptors is very helpful (e.g. demons, service
  managers, programming language standard libraries, container managers
  etc.).

  Second, it allows userspace to avoid implementing closing all file
  descriptors by parsing through /proc/<pid>/fd/* and calling close() on
  each file descriptor and other hacks. From looking at various
  large(ish) userspace code bases this or similar patterns are very
  common in service managers, container runtimes, and programming
  language runtimes/standard libraries such as Python or Rust.

  In addition, the syscall will also work for tasks that do not have
  procfs mounted and on kernels that do not have procfs support compiled
  in. In such situations the only way to make sure that all file
  descriptors are closed is to call close() on each file descriptor up
  to UINT_MAX or RLIMIT_NOFILE, OPEN_MAX trickery.

  Based on Linus' suggestion close_range() also comes with a new flag
  CLOSE_RANGE_UNSHARE to more elegantly handle file descriptor dropping
  right before exec. This would usually be expressed in the sequence:

        unshare(CLONE_FILES);
        close_range(3, ~0U);

  as pointed out by Linus it might be desirable to have this be a part
  of close_range() itself under a new flag CLOSE_RANGE_UNSHARE which
  gets especially handy when we're closing all file descriptors above a
  certain threshold.

  Test-suite as always included"

* tag 'close-range-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  tests: add CLOSE_RANGE_UNSHARE tests
  close_range: add CLOSE_RANGE_UNSHARE
  tests: add close_range() tests
  arch: wire-up close_range()
  open: add close_range()
2020-08-04 15:12:02 -07:00
Linus Torvalds
74858abbb1 cap-checkpoint-restore-v5.9
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXygegQAKCRCRxhvAZXjc
 olWZAQCMPbhI/20LA3OYJ6s+BgBEnm89PymvlHcym6Z4AvTungD+KqZonIYuxWgi
 6Ttlv/fzgFFbXgJgbuass5mwFVoN5wM=
 =oK7d
 -----END PGP SIGNATURE-----

Merge tag 'cap-checkpoint-restore-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull checkpoint-restore updates from Christian Brauner:
 "This enables unprivileged checkpoint/restore of processes.

  Given that this work has been going on for quite some time the first
  sentence in this summary is hopefully more exciting than the actual
  final code changes required. Unprivileged checkpoint/restore has seen
  a frequent increase in interest over the last two years and has thus
  been one of the main topics for the combined containers &
  checkpoint/restore microconference since at least 2018 (cf. [1]).

  Here are just the three most frequent use-cases that were brought forward:

   - The JVM developers are integrating checkpoint/restore into a Java
     VM to significantly decrease the startup time.

   - In high-performance computing environment a resource manager will
     typically be distributing jobs where users are always running as
     non-root. Long-running and "large" processes with significant
     startup times are supposed to be checkpointed and restored with
     CRIU.

   - Container migration as a non-root user.

  In all of these scenarios it is either desirable or required to run
  without CAP_SYS_ADMIN. The userspace implementation of
  checkpoint/restore CRIU already has the pull request for supporting
  unprivileged checkpoint/restore up (cf. [2]).

  To enable unprivileged checkpoint/restore a new dedicated capability
  CAP_CHECKPOINT_RESTORE is introduced. This solution has last been
  discussed in 2019 in a talk by Google at Linux Plumbers (cf. [1]
  "Update on Task Migration at Google Using CRIU") with Adrian and
  Nicolas providing the implementation now over the last months. In
  essence, this allows the CRIU binary to be installed with the
  CAP_CHECKPOINT_RESTORE vfs capability set thereby enabling
  unprivileged users to restore processes.

  To make this possible the following permissions are altered:

   - Selecting a specific PID via clone3() set_tid relaxed from userns
     CAP_SYS_ADMIN to CAP_CHECKPOINT_RESTORE.

   - Selecting a specific PID via /proc/sys/kernel/ns_last_pid relaxed
     from userns CAP_SYS_ADMIN to CAP_CHECKPOINT_RESTORE.

   - Accessing /proc/pid/map_files relaxed from init userns
     CAP_SYS_ADMIN to init userns CAP_CHECKPOINT_RESTORE.

   - Changing /proc/self/exe from userns CAP_SYS_ADMIN to userns
     CAP_CHECKPOINT_RESTORE.

  Of these four changes the /proc/self/exe change deserves a few words
  because the reasoning behind even restricting /proc/self/exe changes
  in the first place is just full of historical quirks and tracking this
  down was a questionable version of fun that I'd like to spare others.

  In short, it is trivial to change /proc/self/exe as an unprivileged
  user, i.e. without userns CAP_SYS_ADMIN right now. Either via ptrace()
  or by simply intercepting the elf loader in userspace during exec.
  Nicolas was nice enough to even provide a POC for the latter (cf. [3])
  to illustrate this fact.

  The original patchset which introduced PR_SET_MM_MAP had no
  permissions around changing the exe link. They too argued that it is
  trivial to spoof the exe link already which is true. The argument
  brought up against this was that the Tomoyo LSM uses the exe link in
  tomoyo_manager() to detect whether the calling process is a policy
  manager. This caused changing the exe links to be guarded by userns
  CAP_SYS_ADMIN.

  All in all this rather seems like a "better guard it with something
  rather than nothing" argument which imho doesn't qualify as a great
  security policy. Again, because spoofing the exe link is possible for
  the calling process so even if this were security relevant it was
  broken back then and would be broken today. So technically, dropping
  all permissions around changing the exe link would probably be
  possible and would send a clearer message to any userspace that relies
  on /proc/self/exe for security reasons that they should stop doing
  this but for now we're only relaxing the exe link permissions from
  userns CAP_SYS_ADMIN to userns CAP_CHECKPOINT_RESTORE.

  There's a final uapi change in here. Changing the exe link used to
  accidently return EINVAL when the caller lacked the necessary
  permissions instead of the more correct EPERM. This pr contains a
  commit fixing this. I assume that userspace won't notice or care and
  if they do I will revert this commit. But since we are changing the
  permissions anyway it seems like a good opportunity to try this fix.

  With these changes merged unprivileged checkpoint/restore will be
  possible and has already been tested by various users"

[1] LPC 2018
     1. "Task Migration at Google Using CRIU"
        https://www.youtube.com/watch?v=yI_1cuhoDgA&t=12095
     2. "Securely Migrating Untrusted Workloads with CRIU"
        https://www.youtube.com/watch?v=yI_1cuhoDgA&t=14400
     LPC 2019
     1. "CRIU and the PID dance"
         https://www.youtube.com/watch?v=LN2CUgp8deo&list=PLVsQ_xZBEyN30ZA3Pc9MZMFzdjwyz26dO&index=9&t=2m48s
     2. "Update on Task Migration at Google Using CRIU"
        https://www.youtube.com/watch?v=LN2CUgp8deo&list=PLVsQ_xZBEyN30ZA3Pc9MZMFzdjwyz26dO&index=9&t=1h2m8s

[2] https://github.com/checkpoint-restore/criu/pull/1155

[3] https://github.com/nviennot/run_as_exe

* tag 'cap-checkpoint-restore-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  selftests: add clone3() CAP_CHECKPOINT_RESTORE test
  prctl: exe link permission error changed from -EINVAL to -EPERM
  prctl: Allow local CAP_CHECKPOINT_RESTORE to change /proc/self/exe
  proc: allow access in init userns for map_files with CAP_CHECKPOINT_RESTORE
  pid_namespace: use checkpoint_restore_ns_capable() for ns_last_pid
  pid: use checkpoint_restore_ns_capable() for set_tid
  capabilities: Introduce CAP_CHECKPOINT_RESTORE
2020-08-04 15:02:07 -07:00
Linus Torvalds
0a72761b27 threads-v5.9
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXygcLwAKCRCRxhvAZXjc
 ohajAP4n5E3BmN0jpIviXT4eNhP62jzxJtxlVXtgGT3D8b1mpQEA5n8NSOlQLoAh
 yUGsjtwR9xDcHMcrhXD3yN6eYJSK0A8=
 =tn4R
 -----END PGP SIGNATURE-----

Merge tag 'threads-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull thread updates from Christian Brauner:
 "This contains the changes to add the missing support for attaching to
  time namespaces via pidfds.

  Last cycle setns() was changed to support attaching to multiple
  namespaces atomically. This requires all namespaces to have a point of
  no return where they can't fail anymore.

  Specifically, <namespace-type>_install() is allowed to perform
  permission checks and install the namespace into the new struct nsset
  that it has been given but it is not allowed to make visible changes
  to the affected task. Once <namespace-type>_install() returns,
  anything that the given namespace type additionally requires to be
  setup needs to ideally be done in a function that can't fail or if it
  fails the failure must be non-fatal.

  For time namespaces the relevant functions that fell into this
  category were timens_set_vvar_page() and vdso_join_timens(). The
  latter could still fail although it didn't need to. This function is
  only implemented for vdso_join_timens() in current mainline. As
  discussed on-list (cf. [1]), in order to make setns() support time
  namespaces when attaching to multiple namespaces at once properly we
  changed vdso_join_timens() to always succeed. So vdso_join_timens()
  replaces the mmap_write_lock_killable() with mmap_read_lock().

  Please note that arm is about to grow vdso support for time namespaces
  (possibly this merge window). We've synced on this change and arm64
  also uses mmap_read_lock(), i.e. makes vdso_join_timens() a function
  that can't fail. Once the changes here and the arm64 changes have
  landed, vdso_join_timens() should be turned into a void function so
  it's obvious to callers and implementers on other architectures that
  the expectation is that it can't fail.

  We didn't do this right away because it would've introduced
  unnecessary merge conflicts between the two trees for no major gain.

  As always, tests included"

[1]: https://lore.kernel.org/lkml/20200611110221.pgd3r5qkjrjmfqa2@wittgenstein

* tag 'threads-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  tests: add CLONE_NEWTIME setns tests
  nsproxy: support CLONE_NEWTIME with setns()
  timens: add timens_commit() helper
  timens: make vdso_join_timens() always succeed
2020-08-04 14:40:07 -07:00
Linus Torvalds
9ecc6ea491 seccomp updates for v5.9-rc1
- Improved selftest coverage, timeouts, and reporting
 - Add EPOLLHUP support for SECCOMP_RET_USER_NOTIF (Christian Brauner)
 - Refactor __scm_install_fd() into __receive_fd() and fix buggy callers
 - Introduce "addfd" command for SECCOMP_RET_USER_NOTIF (Sargun Dhillon)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl8oZcQWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJomDD/4x3j7eXREcXDsHOmlgEaHWGx4l
 JldHFQhV5GjmD7gOkPcoZSG7NfG7F6VpwAJg7ZoR3qUkem7K8DFucxqgo1RldCot
 nigleeLX6JeMS0Z+iwjAVZd+5t4xG4J/7GGDHIIMiG5qvwJ0Yf64o1bkjaB2Q/Bv
 tluBg0WF32kFMG/ZwyY/V2QDbbue97CFPflybOh1o2nWbVzmUlFEEum3UUvZsxc8
 smMsattJyuAV7kcEKzKrs8b010NdFZqwdbub5Np9W3XEXGBYMdIPoNsOQGmB9wby
 j2ui0lzboXRG997jM7TCd1l/XZAv8aAwvPplw3FJRybzkOGs9NDyLMoz87yJpR1T
 xp511vnMyMbyKIGdungkt7cIyzaictHwaYzznsmuNdCPEjTaIQJr1ctsa4GEgtqf
 pnkktZ9YbMCcHU0CtZ8GlOVqA9wE+FUm0/u0zgikzJQsB+HcNItiARTTTHRyco7p
 VJCqK8o4Zx4ELV7QNkSH4nhFkVgRopvrvBiPAGro/qwGOofBg8W8wM8O1+V/MDmp
 zSU22v4SncT1Xb7dtmdJqDEeHfDikhaCAb4Je2hsGQWzbdAqwHGlpa7vpk9x3Q5r
 L+XyP+Z+rPHlXYyypJwUvvOQhXOmP0zYxcEHxByqIBfXiwy+3dN4tDDfatWbccwl
 uTlTDM8kmQn6QzSztA==
 =yb55
 -----END PGP SIGNATURE-----

Merge tag 'seccomp-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull seccomp updates from Kees Cook:
 "There are a bunch of clean ups and selftest improvements along with
  two major updates to the SECCOMP_RET_USER_NOTIF filter return:
  EPOLLHUP support to more easily detect the death of a monitored
  process, and being able to inject fds when intercepting syscalls that
  expect an fd-opening side-effect (needed by both container folks and
  Chrome). The latter continued the refactoring of __scm_install_fd()
  started by Christoph, and in the process found and fixed a handful of
  bugs in various callers.

   - Improved selftest coverage, timeouts, and reporting

   - Add EPOLLHUP support for SECCOMP_RET_USER_NOTIF (Christian Brauner)

   - Refactor __scm_install_fd() into __receive_fd() and fix buggy
     callers

   - Introduce 'addfd' command for SECCOMP_RET_USER_NOTIF (Sargun
     Dhillon)"

* tag 'seccomp-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (30 commits)
  selftests/seccomp: Test SECCOMP_IOCTL_NOTIF_ADDFD
  seccomp: Introduce addfd ioctl to seccomp user notifier
  fs: Expand __receive_fd() to accept existing fd
  pidfd: Replace open-coded receive_fd()
  fs: Add receive_fd() wrapper for __receive_fd()
  fs: Move __scm_install_fd() to __receive_fd()
  net/scm: Regularize compat handling of scm_detach_fds()
  pidfd: Add missing sock updates for pidfd_getfd()
  net/compat: Add missing sock updates for SCM_RIGHTS
  selftests/seccomp: Check ENOSYS under tracing
  selftests/seccomp: Refactor to use fixture variants
  selftests/harness: Clean up kern-doc for fixtures
  seccomp: Use -1 marker for end of mode 1 syscall list
  seccomp: Fix ioctl number for SECCOMP_IOCTL_NOTIF_ID_VALID
  selftests/seccomp: Rename user_trap_syscall() to user_notif_syscall()
  selftests/seccomp: Make kcmp() less required
  seccomp: Use pr_fmt
  selftests/seccomp: Improve calibration loop
  selftests/seccomp: use 90s as timeout
  selftests/seccomp: Expand benchmark to per-filter measurements
  ...
2020-08-04 14:11:08 -07:00
Linus Torvalds
0a897743ac A single commit that adds the /sys/kernel/debug/selftest_helpers/test_fpu FPU self-test.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl8oUawRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iIxw//dFlYF9W5W5dBOK0keLlUiHr2WG77Emwz
 I1+sGfTIZAkWCrbCYhVBSOr3tfQ+aJ/HHNlVLHYX9USah297z3gLUZJ+pVdvPPO7
 Sb56KwZ/0d6usiuullirSe2btCV+qEtxGJVVqeR9YpcEW6If9Nhp2r1eLzqjo9up
 M6MJKtGeLBuifjPU5zyay7cAE1fW4LxA92fEWtG5GXbMSCndrU0defwge4iQFYD+
 RnbAuDf/L9pe8dbOfvnH6K12mBeoD6Z3MnMXiUTu6zvivp4hQshfKw24BCKBKlRZ
 kkZ16pVKX48sXulhI89ppVUJGUhmhSF/1mrPZSi1PbZltZcS+oCH5GEGTM9KCHfR
 HKsUl1lxNjTKU3cTZLyYMQqniiPj51h53h7DhDyTdh3RW+Dh6wp2DhoaRpZw0Nd+
 8VUpbMSNKlEbPzuHT5z8XjcwPIynoxxLCo2AGRbEuoeuY9Sv337ST/pvXdPbdRX+
 1Y8PPOpB3xgBnFZur3VXHdIFz0CwS7XoX56ZLY7ahWzBHNP+BHhICPY//QhyWfMf
 mVeJSRdSHlF30Sle/xDoy6up5EqlbhclUUwhpQwFaSqPMBo6ygb6Xtya/tLXmDUz
 bl4qJNVs2RFaH+68XTlCh7lUnDaSDjBXlA6Ymo3qF9AE0FJoqvfzzKRPKs68YVu8
 a38VxITW1sI=
 =mdhI
 -----END PGP SIGNATURE-----

Merge tag 'x86-fpu-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 FPU selftest from Ingo Molnar:
 "Add the /sys/kernel/debug/selftest_helpers/test_fpu FPU self-test"

* tag 'x86-fpu-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  selftests/fpu: Add an FPU selftest
2020-08-03 17:21:10 -07:00
Linus Torvalds
8f0cb6660a These are the latest RCU bits for v5.9:
- kfree_rcu updates
   - RCU tasks updates
   - Read-side scalability tests
   - SRCU updates
   - Torture-test updates
   - Documentation updates
   - Miscellaneous fixes
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl8n80ERHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gauA/+NtuExW9V9cPDZ8AAp6x6QfoEIgqN4VEk
 pYuyP0+ZbmwH+h8z7qPqMrwxUHQnhef7gqtlWa7wj9MawbEbmqnA/3uivjX/3Aao
 bGMMXkqXppc6hgwktgLNk8vfq3LRVEH2P0i0I+Tymgxu3DCHSGRep4LWfdAS/q3z
 4pe5JXqdMx+Qnfy/bsVxJTaJAncMq1LQNAtWY1TIwK8L8RmpXrj5dvuLKUr7q+zl
 P+BfXyrdX+x05TpmHHnI/bR3w9yASL32E0S3IaQYRRqH8TsUIGHWe13Ib6hKXXG5
 j7W5KrsOgr0fQBxi+JW2fgGQkrua4o7yk4H2Ygj+Fi5RvP2uqNZdvXFAlP2cUMu/
 7Pg8+7kC6jKIrwpD03s9ZZzm0QN3jsCxFs2PEkkHMzjXbe1CI4tIkTH6ex1uvjR2
 v3OhCIp6ypxpEIJbFQucia0iQ4NF+evKjqCvRkbepqQ096jg+CNFh0VG0Tp8XR+y
 Gk9B9oXvLLPMd6ah5CI9nLJKiMWVRV8mvvqspoblGo//+39ksh4mzxm865tFXYg4
 C+DPJvKlY15Ib5eJ/xr8EZ/oS0K2sUF9sMYnK4P8QMhyTBMbpAZiljHYK+Wujt8I
 g/JCWxrEMv3LHPY9/guB5Nod/Qb4Jqqm9iE9qEX3MQxtt2O2nmmWd91pzFcUXlFU
 RDBWYJ63Okg=
 =rNhf
 -----END PGP SIGNATURE-----

Merge tag 'core-rcu-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU updates from Ingo Molnar:

 - kfree_rcu updates

 - RCU tasks updates

 - Read-side scalability tests

 - SRCU updates

 - Torture-test updates

 - Documentation updates

 - Miscellaneous fixes

* tag 'core-rcu-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (109 commits)
  torture: Remove obsolete "cd $KVM"
  torture: Avoid duplicate specification of qemu command
  torture: Dump ftrace at shutdown only if requested
  torture: Add kvm-tranform.sh script for qemu-cmd files
  torture: Add more tracing crib notes to kvm.sh
  torture: Improve diagnostic for KCSAN-incapable compilers
  torture: Correctly summarize build-only runs
  torture: Pass --kmake-arg to all make invocations
  rcutorture: Check for unwatched readers
  torture: Abstract out console-log error detection
  torture: Add a stop-run capability
  torture: Create qemu-cmd in --buildonly runs
  rcu/rcutorture: Replace 0 with false
  torture: Add --allcpus argument to the kvm.sh script
  torture: Remove whitespace from identify_qemu_vcpus output
  rcutorture: NULL rcu_torture_current earlier in cleanup code
  rcutorture: Handle non-statistic bang-string error messages
  torture: Set configfile variable to current scenario
  rcutorture: Add races with task-exit processing
  locktorture: Use true and false to assign to bool variables
  ...
2020-08-03 14:31:33 -07:00
Linus Torvalds
628e04dfeb Bugfixes and strengthening the validity checks on inputs from new userspace APIs.
-----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl8mdWQUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroM48Qf/eUYXVdGQWErQ/BQR/pLPfTUkjkIj
 1C/IGogWijbWPQrtIsnXVG53eULgHocbfzExZ8PzUSfjuZ/g9V8aHGbrGKbg/y6g
 SVArpCTu+BDidmLMbjsAKh3f5SEbzZZuFKnoxoEEKiA2TeYqDQ05nxcv9+T4udrs
 DWqXnk27y0vR9RtJkf5sqDnyH36cDT9TsbRFPaCt/hFE0UA65UTtWgl1ZaagzmuL
 I70z5Ap+/6fsEMoKoys9AKSCpLRGUBu/42fQhZkEq9no0w5PBkzn/YMfapHwT9I+
 1i9WUv3gn+FQOXpqAg0+aaPtWwIBIEONbm0qmw4yNNUI24Btq2QjGWxRqg==
 =VQPu
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Bugfixes and strengthening the validity checks on inputs from new
  userspace APIs.

  Now I know why I shouldn't prepare pull requests on the weekend, it's
  hard to concentrate if your son is shouting about his latest Minecraft
  builds in your ear. Fortunately all the patches were ready and I just
  had to check the test results..."

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: SVM: Fix disable pause loop exit/pause filtering capability on SVM
  KVM: LAPIC: Prevent setting the tscdeadline timer if the lapic is hw disabled
  KVM: arm64: Don't inherit exec permission across page-table levels
  KVM: arm64: Prevent vcpu_has_ptrauth from generating OOL functions
  KVM: nVMX: check for invalid hdr.vmx.flags
  KVM: nVMX: check for required but missing VMCS12 in KVM_SET_NESTED_STATE
  selftests: kvm: do not set guest mode flag
2020-08-02 10:41:00 -07:00
David S. Miller
69138b34a7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2020-07-31

The following pull-request contains BPF updates for your *net* tree.

We've added 5 non-merge commits during the last 21 day(s) which contain
a total of 5 files changed, 126 insertions(+), 18 deletions(-).

The main changes are:

1) Fix a map element leak in HASH_OF_MAPS map type, from Andrii Nakryiko.

2) Fix a NULL pointer dereference in __btf_resolve_helper_id() when no
   btf_vmlinux is available, from Peilin Ye.

3) Init pos variable in __bpfilter_process_sockopt(), from Christoph Hellwig.

4) Fix a cgroup sockopt verifier test by specifying expected attach type,
   from Jean-Philippe Brucker.

Note that when net gets merged into net-next later on, there is a small
merge conflict in kernel/bpf/btf.c between commit 5b801dfb7f ("bpf: Fix
NULL pointer dereference in __btf_resolve_helper_id()") from the bpf tree
and commit 138b9a0511 ("bpf: Remove btf_id helpers resolving") from the
net-next tree.

Resolve as follows: remove the old hunk with the __btf_resolve_helper_id()
function. Change the btf_resolve_helper_id() so it actually tests for a
NULL btf_vmlinux and bails out:

int btf_resolve_helper_id(struct bpf_verifier_log *log,
                          const struct bpf_func_proto *fn, int arg)
{
        int id;

        if (fn->arg_type[arg] != ARG_PTR_TO_BTF_ID || !btf_vmlinux)
                return -EINVAL;
        id = fn->btf_id[arg];
        if (!id || id > btf_vmlinux->nr_types)
                return -EINVAL;
        return id;
}

Let me know if you run into any others issues (CC'ing Jiri Olsa so he's in
the loop with regards to merge conflict resolution).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-31 17:19:47 -07:00
Hangbin Liu
4bbca662df selftests/bpf: fix netdevsim trap_flow_action_cookie read
When read netdevsim trap_flow_action_cookie, we need to init it first,
or we will get "Invalid argument" error.

Fixes: d3cbb907ae ("netdevsim: add ACL trap reporting cookie as a metadata")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-30 16:33:07 -07:00
Ingo Molnar
c1cc4784ce Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull the v5.9 RCU bits from Paul E. McKenney:

 - Documentation updates
 - Miscellaneous fixes
 - kfree_rcu updates
 - RCU tasks updates
 - Read-side scalability tests
 - SRCU updates
 - Torture-test updates

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-07-31 00:15:53 +02:00
Andrii Nakryiko
0ba5834841 selftests/bpf: Extend map-in-map selftest to detect memory leaks
Add test validating that all inner maps are released properly after skeleton
is destroyed. To ensure determinism, trigger kernel-side synchronize_rcu()
before checking map existence by their IDs.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200729040913.2815687-2-andriin@fb.com
2020-07-30 01:30:26 +02:00
Amit Cohen
10fef9ca6a selftests: ethtool: Fix test when only two speeds are supported
The test case check_highest_speed_is_chosen() configures $h1 to
advertise a subset of its supported speeds and checks that $h2 chooses
the highest speed from the subset.

To find the common advertised speeds between $h1 and $h2,
common_speeds_get() is called.

Currently, the first speed returned from common_speeds_get() is removed
claiming "h1 does not advertise this speed". The claim is wrong because
the function is called after $h1 already advertised a subset of speeds.

In case $h1 supports only two speeds, it will advertise a single speed
which will be later removed because of previously mentioned bug. This
results in the test needlessly failing. When more than two speeds are
supported this is not an issue because the first advertised speed
is the lowest one.

Fix this by not removing any speed from the list of commonly advertised
speeds.

Fixes: 64916b57c0 ("selftests: forwarding: Add speed and auto-negotiation test")
Reported-by: Danielle Ratson <danieller@mellanox.com>
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-29 12:16:21 -07:00
Tanner Love
94b6c13be5 selftests/net: tcp_mmap: fix clang warning for target arch PowerPC
When size_t maps to unsigned int (e.g. on 32-bit powerpc), then the
comparison with 1<<35 is always true. Clang 9 threw:
warning: result of comparison of constant 34359738368 with \
expression of type 'size_t' (aka 'unsigned int') is always true \
[-Wtautological-constant-out-of-range-compare]
        while (total < FILE_SZ) {

Tested: make -C tools/testing/selftests TARGETS="net" run_tests

Fixes: 192dc405f3 ("selftests: net: add tcp_mmap program")
Signed-off-by: Tanner Love <tannerlove@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-27 12:56:59 -07:00
Tanner Love
b4da96ffd3 selftests/net: so_txtime: fix clang issues for target arch PowerPC
On powerpcle, int64_t maps to long long. Clang 9 threw:
warning: absolute value function 'labs' given an argument of type \
'long long' but has parameter of type 'long' which may cause \
truncation of value [-Wabsolute-value]
        if (labs(tstop - texpect) > cfg_variance_us)

Tested: make -C tools/testing/selftests TARGETS="net" run_tests

Fixes: af5136f950 ("selftests/net: SO_TXTIME with ETF and FQ")
Signed-off-by: Tanner Love <tannerlove@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-27 12:56:58 -07:00
Tanner Love
64f9ede227 selftests/net: psock_fanout: fix clang issues for target arch PowerPC
Clang 9 threw:
warning: format specifies type 'unsigned short' but the argument has \
type 'int' [-Wformat]
                typeflags, PORT_BASE, PORT_BASE + port_off);

Tested: make -C tools/testing/selftests TARGETS="net" run_tests

Fixes: 77f65ebdca ("packet: packet fanout rollover during socket overload")
Signed-off-by: Tanner Love <tannerlove@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-27 12:56:58 -07:00
Tanner Love
955cbe91bc selftests/net: rxtimestamp: fix clang issues for target arch PowerPC
The signedness of char is implementation-dependent. Some systems
(including PowerPC and ARM) use unsigned char. Clang 9 threw:
warning: result of comparison of constant -1 with expression of type \
'char' is always true [-Wtautological-constant-out-of-range-compare]
                                  &arg_index)) != -1) {

Tested: make -C tools/testing/selftests TARGETS="net" run_tests

Fixes: 16e7812241 ("selftests/net: Add a test to validate behavior of rx timestamps")
Signed-off-by: Tanner Love <tannerlove@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-27 12:56:58 -07:00
Paolo Bonzini
5e105c88ab KVM: nVMX: check for invalid hdr.vmx.flags
hdr.vmx.flags is meant for future extensions to the ABI, rejecting
invalid flags is necessary to avoid broken half-loads of the
nVMX state.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-27 09:04:50 -04:00
Paolo Bonzini
0f02bd0ade KVM: nVMX: check for required but missing VMCS12 in KVM_SET_NESTED_STATE
A missing VMCS12 was not causing -EINVAL (it was just read with
copy_from_user, so it is not a security issue, but it is still
wrong).  Test for VMCS12 validity and reject the nested state
if a VMCS12 is required but not present.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-27 09:04:49 -04:00
Paolo Bonzini
9319676595 selftests: kvm: do not set guest mode flag
Setting KVM_STATE_NESTED_GUEST_MODE enables various consistency checks
on VMCS12 and therefore causes KVM_SET_NESTED_STATE to fail spuriously
with -EINVAL.  Do not set the flag so that we're sure to cover the
conditions included by the test, and cover the case where VMCS12 is
set and KVM_SET_NESTED_STATE is called with invalid VMCS12 contents.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-27 09:04:48 -04:00
Linus Torvalds
1b64b2e244 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net into master
Pull networking fixes from David Miller:

 1) Fix RCU locaking in iwlwifi, from Johannes Berg.

 2) mt76 can access uninitialized NAPI struct, from Felix Fietkau.

 3) Fix race in updating pause settings in bnxt_en, from Vasundhara
    Volam.

 4) Propagate error return properly during unbind failures in ax88172a,
    from George Kennedy.

 5) Fix memleak in adf7242_probe, from Liu Jian.

 6) smc_drv_probe() can leak, from Wang Hai.

 7) Don't muck with the carrier state if register_netdevice() fails in
    the bonding driver, from Taehee Yoo.

 8) Fix memleak in dpaa_eth_probe, from Liu Jian.

 9) Need to check skb_put_padto() return value in hsr_fill_tag(), from
    Murali Karicheri.

10) Don't lose ionic RSS hash settings across FW update, from Shannon
    Nelson.

11) Fix clobbered SKB control block in act_ct, from Wen Xu.

12) Missing newlink in "tx_timeout" sysfs output, from Xiongfeng Wang.

13) IS_UDPLITE cleanup a long time ago, incorrectly handled
    transformations involving UDPLITE_RECV_CC. From Miaohe Lin.

14) Unbalanced locking in netdevsim, from Taehee Yoo.

15) Suppress false-positive error messages in qed driver, from Alexander
    Lobakin.

16) Out of bounds read in ax25_connect and ax25_sendmsg, from Peilin Ye.

17) Missing SKB release in cxgb4's uld_send(), from Navid Emamdoost.

18) Uninitialized value in geneve_changelink(), from Cong Wang.

19) Fix deadlock in xen-netfront, from Andera Righi.

19) flush_backlog() frees skbs with IRQs disabled, so should use
    dev_kfree_skb_irq() instead of kfree_skb(). From Subash Abhinov
    Kasiviswanathan.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits)
  drivers/net/wan: lapb: Corrected the usage of skb_cow
  dev: Defer free of skbs in flush_backlog
  qrtr: orphan socket in qrtr_release()
  xen-netfront: fix potential deadlock in xennet_remove()
  flow_offload: Move rhashtable inclusion to the source file
  geneve: fix an uninitialized value in geneve_changelink()
  bonding: check return value of register_netdevice() in bond_newlink()
  tcp: allow at most one TLP probe per flight
  AX.25: Prevent integer overflows in connect and sendmsg
  cxgb4: add missing release on skb in uld_send()
  net: atlantic: fix PTP on AQC10X
  AX.25: Prevent out-of-bounds read in ax25_sendmsg()
  sctp: shrink stream outq when fails to do addstream reconf
  sctp: shrink stream outq only when new outcnt < old outcnt
  AX.25: Fix out-of-bounds read in ax25_connect()
  enetc: Remove the mdio bus on PF probe bailout
  net: ethernet: ti: add NETIF_F_HW_TC hw feature flag for taprio offload
  net: ethernet: ave: Fix error returns in ave_init
  drivers/net/wan/x25_asy: Fix to make it work
  ipvs: fix the connection sync failed in some cases
  ...
2020-07-25 11:50:59 -07:00
Paolo Pisati
b346c0c858 selftest: txtimestamp: fix net ns entry logic
According to 'man 8 ip-netns', if `ip netns identify` returns an empty string,
there's no net namespace associated with current PID: fix the net ns entrance
logic.

Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 16:11:07 -07:00
Adrian Reber
1d27a0be16
selftests: add clone3() CAP_CHECKPOINT_RESTORE test
This adds a test that changes its UID, uses capabilities to
get CAP_CHECKPOINT_RESTORE and uses clone3() with set_tid to
create a process with a given PID as non-root.

Signed-off-by: Adrian Reber <areber@redhat.com>
Link: https://lore.kernel.org/r/20200719100418.2112740-8-areber@redhat.com
[christian.brauner@ubuntu.com: use TH_LOG() instead of ksft_print_msg()]
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-07-20 15:47:42 +02:00
Linus Torvalds
721db9dfb1 powerpc fixes for 5.8 #7
A fix to the VAS code we merged this cycle, to report the proper error code to
 userspace for address translation failures. And a selftest update to match.
 
 Another fix for our pkey handling of PROT_EXEC mappings.
 
 A fix for a crash when booting a "secure VM" under an ultravisor with certain
 numbers of CPUs.
 
 Thanks to:
   Aneesh Kumar K.V, Haren Myneni, Laurent Dufour, Sandipan Das, Satheesh
   Rajendran, Thiago Jung Bauermann.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl8S7Q8THG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgA9WEAC580vTBCte54XEpRPfKLs0g/piM93z
 +rDDKFFkHrxI+EpySeg20jlDMc/3EoevLN6UNT5I2hrZ5QcNF17RPVmjRoHP0w2l
 ixy9B6Y+auFwos1yEec14ZLJ7iWsnko0SlgtWIAsn9r35hJYFtC3+ho3/XWO0lnF
 0jb31uZim4nFQvGSNwe3oZ3/rJsKwWtPZw0WznFr9GB4pMOnrspV/zx796RI9OKY
 khwm4y6pas5Duk9GUJPJjOIk4Mag4yLTXuhzJ5G5UeuUchZRxCTVcdnXEdGXeyte
 9ZJnRjbvbDjTM9qyk2lPSHv5zFHfEbglDkp2zoKX2Ie083LIcKlkwgeFvlBjhdxQ
 qwko27DXIZdmKTsSiFDODI0VlyK3ZHumCX/m2Ctg9/VmeSiYacebQjcS7DmAwQeE
 6h2bRL2TiTLRkgWiD4HOvHZZTy22pVgRcYe/pwGzMMXJW6SLQ9GUOhhar4XEYMgj
 pzn86uZRVJLf90qdUkI9sl8p/PthGlfehqHivfwgKYk/0H1AyGkChO3NB1mLiCfS
 WC+7J/lDIvtAMnC+536LqZT5l46ntt5RQ5tUcHfvn4bFoh5ndeav0Y9hXEXblyYI
 32lYj/paAmzR2kuHOzOQAa4hwy9rnKEQiGYsF1RcpMO5zdNupXl/EPY5WaKYyEx7
 p+eGalBNTf8zuw==
 =eEXz
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.8-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux into master

Pull powerpc fixes from Michael Ellerman:
 "Some more powerpc fixes for 5.8:

   - A fix to the VAS code we merged this cycle, to report the proper
     error code to userspace for address translation failures. And a
     selftest update to match.

   - Another fix for our pkey handling of PROT_EXEC mappings.

   - A fix for a crash when booting a "secure VM" under an ultravisor
     with certain numbers of CPUs.

  Thanks to: Aneesh Kumar K.V, Haren Myneni, Laurent Dufour, Sandipan
  Das, Satheesh Rajendran, Thiago Jung Bauermann"

* tag 'powerpc-5.8-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  selftests/powerpc: Use proper error code to check fault address
  powerpc/vas: Report proper error code for address translation failure
  powerpc/pseries/svm: Fix incorrect check for shared_lppaca_size
  powerpc/book3s64/pkeys: Fix pkey_access_permitted() for execute disable pkey
2020-07-18 10:45:17 -07:00
Paolo Pisati
aba69d49fb selftests: net: ip_defrag: modprobe missing nf_defrag_ipv6 support
Fix ip_defrag.sh when CONFIG_NF_DEFRAG_IPV6=m:

$ sudo ./ip_defrag.sh
+ set -e
+ mktemp -u XXXXXX
+ readonly NETNS=ns-rGlXcw
+ trap cleanup EXIT
+ setup
+ ip netns add ns-rGlXcw
+ ip -netns ns-rGlXcw link set lo up
+ ip netns exec ns-rGlXcw sysctl -w net.ipv4.ipfrag_high_thresh=9000000
+ ip netns exec ns-rGlXcw sysctl -w net.ipv4.ipfrag_low_thresh=7000000
+ ip netns exec ns-rGlXcw sysctl -w net.ipv4.ipfrag_time=1
+ ip netns exec ns-rGlXcw sysctl -w net.ipv6.ip6frag_high_thresh=9000000
+ ip netns exec ns-rGlXcw sysctl -w net.ipv6.ip6frag_low_thresh=7000000
+ ip netns exec ns-rGlXcw sysctl -w net.ipv6.ip6frag_time=1
+ ip netns exec ns-rGlXcw sysctl -w net.netfilter.nf_conntrack_frag6_high_thresh=9000000
+ cleanup
+ ip netns del ns-rGlXcw

$ ls -la /proc/sys/net/netfilter/nf_conntrack_frag6_high_thresh
ls: cannot access '/proc/sys/net/netfilter/nf_conntrack_frag6_high_thresh': No such file or directory

$ sudo modprobe nf_defrag_ipv6
$ ls -la /proc/sys/net/netfilter/nf_conntrack_frag6_high_thresh
-rw-r--r-- 1 root root 0 Jul 14 12:34 /proc/sys/net/netfilter/nf_conntrack_frag6_high_thresh

Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-17 12:49:18 -07:00
Haren Myneni
f0479c4bcb selftests/powerpc: Use proper error code to check fault address
ERR_NX_TRANSLATION(CSB.CC=5) is for internal to VAS for fault handling
and should not used by OS. ERR_NX_AT_FAULT(CSB.CC=250) is the proper
error code should be reported by OS when NX encounters address
translation failure.

This patch uses CC=250 to determine the fault address when the request
is not successful.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0315251705baff94f678c33178491b5008723511.camel@linux.ibm.com
2020-07-15 23:10:17 +10:00
Sargun Dhillon
c97aedc52d selftests/seccomp: Test SECCOMP_IOCTL_NOTIF_ADDFD
Test whether we can add file descriptors in response to notifications.
This injects the file descriptors via notifications, and then uses kcmp
to determine whether or not it has been successful.

It also includes some basic sanity checking for arguments.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Chris Palmer <palmer@google.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Robert Sesek <rsesek@google.com>
Cc: Tycho Andersen <tycho@tycho.ws>
Cc: Matt Denton <mpdenton@google.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/20200603011044.7972-5-sargun@sargun.me
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-14 16:30:22 -07:00
Paolo Pisati
651149f603 selftests: fib_nexthop_multiprefix: fix cleanup() netns deletion
During setup():
...
        for ns in h0 r1 h1 h2 h3
        do
                create_ns ${ns}
        done
...

while in cleanup():
...
        for n in h1 r1 h2 h3 h4
        do
                ip netns del ${n} 2>/dev/null
        done
...

and after removing the stderr redirection in cleanup():

$ sudo ./fib_nexthop_multiprefix.sh
...
TEST: IPv4: host 0 to host 3, mtu 1400                              [ OK ]
TEST: IPv6: host 0 to host 3, mtu 1400                              [ OK ]
Cannot remove namespace file "/run/netns/h4": No such file or directory
$ echo $?
1

and a non-zero return code, make kselftests fail (even if the test
itself is fine):

...
not ok 34 selftests: net: fib_nexthop_multiprefix.sh # exit=1
...

Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-14 15:06:12 -07:00
Linus Torvalds
5a764898af Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Restore previous behavior of CAP_SYS_ADMIN wrt loading networking
    BPF programs, from Maciej Żenczykowski.

 2) Fix dropped broadcasts in mac80211 code, from Seevalamuthu
    Mariappan.

 3) Slay memory leak in nl80211 bss color attribute parsing code, from
    Luca Coelho.

 4) Get route from skb properly in ip_route_use_hint(), from Miaohe Lin.

 5) Don't allow anything other than ARPHRD_ETHER in llc code, from Eric
    Dumazet.

 6) xsk code dips too deeply into DMA mapping implementation internals.
    Add dma_need_sync and use it. From Christoph Hellwig

 7) Enforce power-of-2 for BPF ringbuf sizes. From Andrii Nakryiko.

 8) Check for disallowed attributes when loading flow dissector BPF
    programs. From Lorenz Bauer.

 9) Correct packet injection to L3 tunnel devices via AF_PACKET, from
    Jason A. Donenfeld.

10) Don't advertise checksum offload on ipa devices that don't support
    it. From Alex Elder.

11) Resolve several issues in TCP MD5 signature support. Missing memory
    barriers, bogus options emitted when using syncookies, and failure
    to allow md5 key changes in established states. All from Eric
    Dumazet.

12) Fix interface leak in hsr code, from Taehee Yoo.

13) VF reset fixes in hns3 driver, from Huazhong Tan.

14) Make loopback work again with ipv6 anycast, from David Ahern.

15) Fix TX starvation under high load in fec driver, from Tobias
    Waldekranz.

16) MLD2 payload lengths not checked properly in bridge multicast code,
    from Linus Lüssing.

17) Packet scheduler code that wants to find the inner protocol
    currently only works for one level of VLAN encapsulation. Allow
    Q-in-Q situations to work properly here, from Toke
    Høiland-Jørgensen.

18) Fix route leak in l2tp, from Xin Long.

19) Resolve conflict between the sk->sk_user_data usage of bpf reuseport
    support and various protocols. From Martin KaFai Lau.

20) Fix socket cgroup v2 reference counting in some situations, from
    Cong Wang.

21) Cure memory leak in mlx5 connection tracking offload support, from
    Eli Britstein.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (146 commits)
  mlxsw: pci: Fix use-after-free in case of failed devlink reload
  mlxsw: spectrum_router: Remove inappropriate usage of WARN_ON()
  net: macb: fix call to pm_runtime in the suspend/resume functions
  net: macb: fix macb_suspend() by removing call to netif_carrier_off()
  net: macb: fix macb_get/set_wol() when moving to phylink
  net: macb: mark device wake capable when "magic-packet" property present
  net: macb: fix wakeup test in runtime suspend/resume routines
  bnxt_en: fix NULL dereference in case SR-IOV configuration fails
  libbpf: Fix libbpf hashmap on (I)LP32 architectures
  net/mlx5e: CT: Fix memory leak in cleanup
  net/mlx5e: Fix port buffers cell size value
  net/mlx5e: Fix 50G per lane indication
  net/mlx5e: Fix CPU mapping after function reload to avoid aRFS RX crash
  net/mlx5e: Fix VXLAN configuration restore after function reload
  net/mlx5e: Fix usage of rcu-protected pointer
  net/mxl5e: Verify that rpriv is not NULL
  net/mlx5: E-Switch, Fix vlan or qos setting in legacy mode
  net/mlx5: Fix eeprom support for SFP module
  cgroup: Fix sock_cgroup_data on big-endian.
  selftests: bpf: Fix detach from sockmap tests
  ...
2020-07-10 18:16:22 -07:00
Jean-Philippe Brucker
55b244221c selftests/bpf: Fix cgroup sockopt verifier test
Since the BPF_PROG_TYPE_CGROUP_SOCKOPT verifier test does not set an
attach type, bpf_prog_load_check_attach() disallows loading the program
and the test is always skipped:

 #434/p perfevent for cgroup sockopt SKIP (unsupported program type 25)

Fix the issue by setting a valid attach type.

Fixes: 0456ea170c ("bpf: Enable more helpers for BPF_PROG_TYPE_CGROUP_{DEVICE,SYSCTL,SOCKOPT}")
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20200710150439.126627-1-jean-philippe@linaro.org
2020-07-11 01:32:15 +02:00
Kees Cook
11eb004ef7 selftests/seccomp: Check ENOSYS under tracing
There should be no difference between -1 and other negative syscalls
while tracing.

Cc: Keno Fischer <keno@juliacomputing.com>
Tested-by: Will Deacon <will@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-10 16:01:52 -07:00
Kees Cook
adeeec8472 selftests/seccomp: Refactor to use fixture variants
Now that the selftest harness has variants, use them to eliminate a
bunch of copy/paste duplication.

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Will Deacon <will@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-10 16:01:52 -07:00
Kees Cook
9d1587adcc selftests/harness: Clean up kern-doc for fixtures
The FIXTURE*() macro kern-doc examples had the wrong names for the C code
examples associated with them. Fix those and clarify that FIXTURE_DATA()
usage should be avoided.

Cc: Shuah Khan <shuah@kernel.org>
Fixes: 74bc7c97fa ("kselftest: add fixture variants")
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-10 16:01:52 -07:00
Kees Cook
47e33c05f9 seccomp: Fix ioctl number for SECCOMP_IOCTL_NOTIF_ID_VALID
When SECCOMP_IOCTL_NOTIF_ID_VALID was first introduced it had the wrong
direction flag set. While this isn't a big deal as nothing currently
enforces these bits in the kernel, it should be defined correctly. Fix
the define and provide support for the old command until it is no longer
needed for backward compatibility.

Fixes: 6a21cc50f0 ("seccomp: add a return code to trap to userspace")
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-10 16:01:52 -07:00
Kees Cook
279ed89000 selftests/seccomp: Rename user_trap_syscall() to user_notif_syscall()
The user_trap_syscall() helper creates a filter with
SECCOMP_RET_USER_NOTIF. To avoid confusion with SECCOMP_RET_TRAP, rename
the helper to user_notif_syscall().

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@chromium.org>
Cc: linux-kselftest@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-10 16:01:52 -07:00
Kees Cook
cf8918dba2 selftests/seccomp: Make kcmp() less required
The seccomp tests are a bit noisy without CONFIG_CHECKPOINT_RESTORE (due
to missing the kcmp() syscall). The seccomp tests are more accurate with
kcmp(), but it's not strictly required. Refactor the tests to use
alternatives (comparing fd numbers), and provide a central test for
kcmp() so there is a single SKIP instead of many. Continue to produce
warnings for the other tests, though.

Additionally adds some more bad flag EINVAL tests to the addfd selftest.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@chromium.org>
Cc: linux-kselftest@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-10 16:01:52 -07:00
Kees Cook
81a0c8bc82 selftests/seccomp: Improve calibration loop
The seccomp benchmark calibration loop did not need to take so long.
Instead, use a simple 1 second timeout and multiply up to target. It
does not need to be accurate.

Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-10 16:01:52 -07:00
Thadeu Lima de Souza Cascardo
bc32c9c865 selftests/seccomp: use 90s as timeout
As seccomp_benchmark tries to calibrate how many samples will take more
than 5 seconds to execute, it may end up picking up a number of samples
that take 10 (but up to 12) seconds. As the calibration will take double
that time, it takes around 20 seconds. Then, it executes the whole thing
again, and then once more, with some added overhead. So, the thing might
take more than 40 seconds, which is too close to the 45s timeout.

That is very dependent on the system where it's executed, so may not be
observed always, but it has been observed on x86 VMs. Using a 90s timeout
seems safe enough.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Link: https://lore.kernel.org/r/20200601123202.1183526-1-cascardo@canonical.com
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-10 16:01:52 -07:00
Kees Cook
d3a37ea9f6 selftests/seccomp: Expand benchmark to per-filter measurements
It's useful to see how much (at a minimum) each filter adds to the
syscall overhead. Add additional calculations.

Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-10 16:01:52 -07:00
Christian Brauner
ad5682184a selftests/seccomp: Check for EPOLLHUP for user_notif
This verifies we're correctly notified when a seccomp filter becomes
unused when a notifier is in use.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/20200531115031.391515-4-christian.brauner@ubuntu.com
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-10 16:01:51 -07:00
Kees Cook
e4d05028a0 selftests/seccomp: Set NNP for TSYNC ESRCH flag test
The TSYNC ESRCH flag test will fail for regular users because NNP was
not set yet. Add NNP setting.

Fixes: 51891498f2 ("seccomp: allow TSYNC and USER_NOTIF together")
Cc: stable@vger.kernel.org
Reviewed-by: Tycho Andersen <tycho@tycho.ws>
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-10 16:01:46 -07:00
Kees Cook
d7d2e5bb9f selftests/seccomp: Add SKIPs for failed unshare()
Running the seccomp tests as a regular user shouldn't just fail tests
that require CAP_SYS_ADMIN (for getting a PID namespace). Instead,
detect those cases and SKIP them. Additionally, gracefully SKIP missing
CONFIG_USER_NS (and add to "config" since we'd prefer to actually test
this case).

Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-10 16:01:45 -07:00
Kees Cook
8b1bc88c3c selftests/seccomp: Rename XFAIL to SKIP
The kselftests will be renaming XFAIL to SKIP in the test harness, and
to avoid painful conflicts, rename XFAIL to SKIP now in a future-proofed
way.

Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-10 16:01:42 -07:00
Linus Torvalds
0f318cba1e linux-kselftest-fixes-5.8-rc5
This Kselftest fixes update for Linux 5.8-rc5 consists of tmp2 test
 changes to run on python3 and kselftest framework fix to incorrect
 return type.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAl8ImAcACgkQCwJExA0N
 QxwEPg//fYS/ilgwYRn+Zc9fV/JPGYBfJKN1DdAua0fRgXjcz2z3dMON7Tt/HBNY
 KZDCDJMY4SSPBAdHRt/2duQx21C0i8zVcCBsPgZPhJdPF5tUIbD4oM0qCxrto1hB
 Wss1UHWmobB+BPI7mJSPFirYXyVqVXcpdYJmKkRYbpJTFRS2t1g5jFczZVgEbeEv
 7lEY4yRi2hlGsCR4J+VqIUzyY+enc0vSYNaSaH8d4P1cC8h5Dl7MoqSIJshrNcDe
 kHz5l5R7cQSOQhsSIwvRuEolkmjyB0ejzuUsAQ6fwRKGMTVWlY+2fWcu5dNelYMm
 C3BGyVcAPXsrMfX6Vw4AFRd1a/7XiuNOFwH4PkQeomfAh2Ml9m1pO3fdqWQlbnHu
 VpkIbPaiTyarEaIYvF6UnxUTB+hNzHxw8Ac6TvwP3xz26k7U4X4ue75s03DVA7qx
 63EsxY+gtAhD0ATUSpHq65ZgUSnQh5OAk0UB24h7kzxaE5EZ8gU8ifYqBPGiEot2
 3KQ1QLZdCTmuTxsjI+OdXdOAnFuhqNtzk69CCS43YwDJitBLG6kz+aFgVWDZxkx1
 MY5msDi/ROyymiWEUCSZhE70qIhIU/tDKQ9+QeL4VqEhzfcGrgePtV+dsp6seKCl
 Swi2PY2zoCdq6gZshiiIQln/M5XaW8dPBvz+wKaRo/sMzwts18M=
 =pKoZ
 -----END PGP SIGNATURE-----

Merge tag 'linux-kselftest-fixes-5.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fixes from Shuah Khan:
 "TPM2 test changes to run on python3 and kselftest framework fix to
  incorrect return type"

* tag 'linux-kselftest-fixes-5.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  kselftest: ksft_test_num return type should be unsigned
  selftests: tpm: upgrade TPM2 tests from Python 2 to Python 3
2020-07-10 10:15:37 -07:00
Lorenz Bauer
f43cb0d672 selftests: bpf: Fix detach from sockmap tests
Fix sockmap tests which rely on old bpf_prog_dispatch behaviour.
In the first case, the tests check that detaching without giving
a program succeeds. Since these are not the desired semantics,
invert the condition. In the second case, the clean up code doesn't
supply the necessary program fds.

Fixes: bb0de3131f ("bpf: sockmap: Require attach_bpf_fd when detaching a program")
Reported-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20200709115151.75829-1-lmb@cloudflare.com
2020-07-09 23:41:37 +02:00
Linus Torvalds
ce69fb3b39 Refactor kallsyms_show_value() users for correct cred
Several users of kallsyms_show_value() were performing checks not
 during "open". Refactor everything needed to gain proper checks against
 file->f_cred for modules, kprobes, and bpf.
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl8GUbMWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJohnD/9VsPsAMV+8lhsPvkcuW/DkTcAY
 qUEzsXU3v06gJ0Z/1lBKtisJ6XmD93wWcZCTFvJ0S8vR3yLZvOVfToVjCMO32Trc
 4ZkWTPwpvfeLug6T6CcI2ukQdZ/opI1cSabqGl79arSBgE/tsghwrHuJ8Exkz4uq
 0b7i8nZa+RiTezwx4EVeGcg6Dv1tG5UTG2VQvD/+QGGKneBlrlaKlI885N/6jsHa
 KxvB7+8ES1pnfGYZenx+RxMdljNrtyptbQEU8gyvoV5YR7635gjZsVsPwWANJo+4
 EGcFFpwWOAcVQaC3dareLTM8nVngU6Wl3Rd7JjZtjvtZba8DdCn669R34zDGXbiP
 +1n1dYYMSMBeqVUbAQfQyLD0pqMIHdwQj2TN8thSGccr2o3gNk6AXgYq0aYm8IBf
 xDCvAansJw9WqmxErIIsD4BFkMqF7MjH3eYZxwCPWSrKGDvKxQSPV5FarnpDC9U7
 dYCWVxNPmtn+unC/53yXjEcBepKaYgNR7j5G7uOfkHvU43Bd5demzLiVJ10D8abJ
 ezyErxxEqX2Gr7JR2fWv7iBbULJViqcAnYjdl0y0NgK/hftt98iuge6cZmt1z6ai
 24vI3X4VhvvVN5/f64cFDAdYtMRUtOo2dmxdXMid1NI07Mj2qFU1MUwb8RHHlxbK
 8UegV2zcrBghnVuMkw==
 =ib5Q
 -----END PGP SIGNATURE-----

Merge tag 'kallsyms_show_value-v5.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull kallsyms fix from Kees Cook:
 "Refactor kallsyms_show_value() users for correct cred.

  I'm not delighted by the timing of getting these changes to you, but
  it does fix a handful of kernel address exposures, and no one has
  screamed yet at the patches.

  Several users of kallsyms_show_value() were performing checks not
  during "open". Refactor everything needed to gain proper checks
  against file->f_cred for modules, kprobes, and bpf"

* tag 'kallsyms_show_value-v5.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  selftests: kmod: Add module address visibility test
  bpf: Check correct cred for CAP_SYSLOG in bpf_dump_raw_ok()
  kprobes: Do not expose probe addresses to non-CAP_SYSLOG
  module: Do not expose section addresses to non-CAP_SYSLOG
  module: Refactor section attr into bin attribute
  kallsyms: Refactor kallsyms_show_value() to take cred
2020-07-09 13:09:30 -07:00
Kees Cook
2c79583927 selftests: kmod: Add module address visibility test
Make sure we don't regress the CAP_SYSLOG behavior of the module address
visibility via /proc/modules nor /sys/module/*/sections/*.

Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-08 16:01:36 -07:00
Christian Brauner
55d9ad97e4
tests: add CLONE_NEWTIME setns tests
Now that pidfds support CLONE_NEWTIME as well enable testing them in the
setns() testuite.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Dmitry Safonov <dima@arista.com>
Cc: Andrei Vagin <avagin@gmail.com>
Link: https://lore.kernel.org/r/20200706154912.3248030-5-christian.brauner@ubuntu.com
2020-07-08 11:14:22 +02:00
Paolo Bonzini
3c01655ac8 kselftest: ksft_test_num return type should be unsigned
Fixes a compiler warning:

In file included from sync_test.c:37:
../kselftest.h: In function ‘ksft_print_cnts’:
../kselftest.h:78:16: warning: comparison of integer expressions of different signedness: ‘unsigned int’ and ‘int’ [-Wsign-compare]
  if (ksft_plan != ksft_test_num())
                ^~

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-07-06 15:07:47 -06:00