Include <linux/limits.h> like some of uapi/linux/netfilter/xt_*.h
headers do to fix the following linux/netfilter/xt_hashlimit.h
userspace compilation error:
/usr/include/linux/netfilter/xt_hashlimit.h:90:12: error: 'NAME_MAX' undeclared here (not in a function)
char name[NAME_MAX];
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Commit 63159f5dcc ("uapi: Use __kernel_long_t in struct mq_attr")
changed the types from long to __kernel_long_t, but didn't add a
linux/types.h include. Code that tries to include this header directly
breaks:
/usr/include/linux/mqueue.h:26:2: error: unknown type name '__kernel_long_t'
__kernel_long_t mq_flags; /* message queue flags */
This also upsets configure tests for this header:
checking linux/mqueue.h usability... no
checking linux/mqueue.h presence... yes
configure: WARNING: linux/mqueue.h: present but cannot be compiled
configure: WARNING: linux/mqueue.h: check for missing prerequisite headers?
configure: WARNING: linux/mqueue.h: see the Autoconf documentation
configure: WARNING: linux/mqueue.h: section "Present But Cannot Be Compiled"
configure: WARNING: linux/mqueue.h: proceeding with the compiler's result
checking for linux/mqueue.h... no
Link: http://lkml.kernel.org/r/20170119194644.4403-1-vapier@gentoo.org
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Allow userfaultfd monitor track termination of the processes that have
memory backed by the uffd.
[rppt@linux.vnet.ibm.com: add comment]
Link: http://lkml.kernel.org/r/20170202135448.GB19804@rapoport-lnxLink: http://lkml.kernel.org/r/1485542673-24387-4-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a non-cooperative userfaultfd monitor copies pages in the
background, it may encounter regions that were already unmapped.
Addition of UFFD_EVENT_UNMAP allows the uffd monitor to track precisely
changes in the virtual memory layout.
Since there might be different uffd contexts for the affected VMAs, we
first should create a temporary representation for the unmap event for
each uffd context and then notify them one by one to the appropriate
userfault file descriptors.
The event notification occurs after the mmap_sem has been released.
[arnd@arndb.de: fix nommu build]
Link: http://lkml.kernel.org/r/20170203165141.3665284-1-arnd@arndb.de
[mhocko@suse.com: fix nommu build]
Link: http://lkml.kernel.org/r/20170202091503.GA22823@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/1485542673-24387-3-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "userfaultfd: non-cooperative: add madvise() event for
MADV_REMOVE request".
These patches add notification of madvise(MADV_REMOVE) event to
non-cooperative userfaultfd monitor.
The first pacth renames EVENT_MADVDONTNEED to EVENT_REMOVE along with
relevant functions and structures. Using _REMOVE instead of
_MADVDONTNEED describes the event semantics more clearly and I hope it's
not too late for such change in the ABI.
This patch (of 3):
The UFFD_EVENT_MADVDONTNEED purpose is to notify uffd monitor about
removal of certain range from address space tracked by userfaultfd.
Hence, UFFD_EVENT_REMOVE seems to better reflect the operation
semantics. Respectively, 'madv_dn' field of uffd_msg is renamed to
'remove' and the madvise_userfault_dontneed callback is renamed to
userfaultfd_remove.
Link: http://lkml.kernel.org/r/1484814154-1557-2-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull namespace updates from Eric Biederman:
"There is a lot here. A lot of these changes result in subtle user
visible differences in kernel behavior. I don't expect anything will
care but I will revert/fix things immediately if any regressions show
up.
From Seth Forshee there is a continuation of the work to make the vfs
ready for unpriviled mounts. We had thought the previous changes
prevented the creation of files outside of s_user_ns of a filesystem,
but it turns we missed the O_CREAT path. Ooops.
Pavel Tikhomirov and Oleg Nesterov worked together to fix a long
standing bug in the implemenation of PR_SET_CHILD_SUBREAPER where only
children that are forked after the prctl are considered and not
children forked before the prctl. The only known user of this prctl
systemd forks all children after the prctl. So no userspace
regressions will occur. Holding earlier forked children to the same
rules as later forked children creates a semantic that is sane enough
to allow checkpoing of processes that use this feature.
There is a long delayed change by Nikolay Borisov to limit inotify
instances inside a user namespace.
Michael Kerrisk extends the API for files used to maniuplate
namespaces with two new trivial ioctls to allow discovery of the
hierachy and properties of namespaces.
Konstantin Khlebnikov with the help of Al Viro adds code that when a
network namespace exits purges it's sysctl entries from the dcache. As
in some circumstances this could use a lot of memory.
Vivek Goyal fixed a bug with stacked filesystems where the permissions
on the wrong inode were being checked.
I continue previous work on ptracing across exec. Allowing a file to
be setuid across exec while being ptraced if the tracer has enough
credentials in the user namespace, and if the process has CAP_SETUID
in it's own namespace. Proc files for setuid or otherwise undumpable
executables are now owned by the root in the user namespace of their
mm. Allowing debugging of setuid applications in containers to work
better.
A bug I introduced with permission checking and automount is now
fixed. The big change is to mark the mounts that the kernel initiates
as a result of an automount. This allows the permission checks in sget
to be safely suppressed for this kind of mount. As the permission
check happened when the original filesystem was mounted.
Finally a special case in the mount namespace is removed preventing
unbounded chains in the mount hash table, and making the semantics
simpler which benefits CRIU.
The vfs fix along with related work in ima and evm I believe makes us
ready to finish developing and merge fully unprivileged mounts of the
fuse filesystem. The cleanups of the mount namespace makes discussing
how to fix the worst case complexity of umount. The stacked filesystem
fixes pave the way for adding multiple mappings for the filesystem
uids so that efficient and safer containers can be implemented"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
proc/sysctl: Don't grab i_lock under sysctl_lock.
vfs: Use upper filesystem inode in bprm_fill_uid()
proc/sysctl: prune stale dentries during unregistering
mnt: Tuck mounts under others instead of creating shadow/side mounts.
prctl: propagate has_child_subreaper flag to every descendant
introduce the walk_process_tree() helper
nsfs: Add an ioctl() to return owner UID of a userns
fs: Better permission checking for submounts
exit: fix the setns() && PR_SET_CHILD_SUBREAPER interaction
vfs: open() with O_CREAT should not create inodes with unknown ids
nsfs: Add an ioctl() to return the namespace type
proc: Better ownership of files for non-dumpable tasks in user namespaces
exec: Remove LSM_UNSAFE_PTRACE_CAP
exec: Test the ptracer's saved cred to see if the tracee can gain caps
exec: Don't reset euid and egid when the tracee has CAP_SETUID
inotify: Convert to using per-namespace limits
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYr5aeAAoJEAx081l5xIa+ZK4P/RD3XUsduYqziVFCRQ2n0X8r
+D92F4peTnSeSq7ZcZvprv+fezUGAHbfsWFs8feYCI5quUO6pEQSPwN+wyGazUi0
4hUVB/K9Iq7U/Bj7Z/SmsU3NuWJnkNqbmvSFvUdqYK9D/kl+Tnllzap2N4cTzjwu
GZOObz4n85cx94NqC3qw+7/ptL1X2MhXa+z0MzbkKyas84Bko1LwCSHRHsDKUnJc
IcSpOcYZ6pSRMIsKH4Kd79Go4vWm7djXT9XL3PwDk2NcXXUOuR+cfdHqYchYaM/O
iD2hvaSywBcflxSAml5x6vlXraoRd91ZZulgOObXtFfnUXdZB81TVq4uv6LU4Bx3
jLFixUZuk/TJT+W/8N10l7M6yMIFaTpNoNMc5n4IF5RNNyWba4BKnrI+f+lQiOpY
mmjIaidb0t5BICnJzCD264RhCEXmP0HaDV+iQQV6y6jJRXfd1bgnOXLKP73JekzB
TsbDshCoE7UO0dJ7n0LFpXSTQDTYzlazoEp14f2kFBxir5/l7r67nUlnDTvUQfuN
tSRvpN/s0wqvH3o7zhmpHxyJ/ZasPMQjNCFAuUEbx8L5SKXsua0FubIzN4aVpilb
XvfdFRWM/lkOT/q+8cGI/TcE3YTqEmALmGxdV/akbdNCiCg6aClyCLRE/DZhgmSQ
UMFjr9wlHl5Qo/OqLKj0
=Yjfg
-----END PGP SIGNATURE-----
Merge tag 'drm-for-v4.11-less-shouty' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
"This is the main drm pull request for v4.11.
Nothing too major, the tinydrm and mmu-less support should make
writing smaller drivers easier for some of the simpler platforms, and
there are a bunch of documentation updates.
Intel grew displayport MST audio support which is hopefully useful to
people, and FBC is on by default for GEN9+ (so people know where to
look for regressions). AMDGPU has a lot of fixes that would like new
firmware files installed for some GPUs.
Other than that it's pretty scattered all over.
I may have a follow up pull request as I know BenH has a bunch of AST
rework and fixes and I'd like to get those in once they've been tested
by AST, and I've got at least one pull request I'm just trying to get
the author to fix up.
Core:
- drm_mm reworked
- Connector list locking and iterators
- Documentation updates
- Format handling rework
- MMU-less support for fbdev helpers
- drm_crtc_from_index helper
- Core CRC API
- Remove drm_framebuffer_unregister_private
- Debugfs cleanup
- EDID/Infoframe fixes
- Release callback
- Tinydrm support (smaller drivers for simple hw)
panel:
- Add support for some new simple panels
i915:
- FBC by default for gen9+
- Shared dpll cleanups and docs
- GEN8 powerdomain cleanup
- DMC support on GLK
- DP MST audio support
- HuC loading support
- GVT init ordering fixes
- GVT IOMMU workaround fix
amdgpu/radeon:
- Power/clockgating improvements
- Preliminary SR-IOV support
- TTM buffer priority and eviction fixes
- SI DPM quirks removed due to firmware fixes
- Powerplay improvements
- VCE/UVD powergating fixes
- Cleanup SI GFX code to match CI/VI
- Support for > 2 displays on 3/5 crtc asics
- SI headless fixes
nouveau:
- Rework securre boot code in prep for GP10x secure boot
- Channel recovery improvements
- Initial power budget code
- MMU rework preperation
vmwgfx:
- Bunch of fixes and cleanups
exynos:
- Runtime PM support for MIC driver
- Cleanups to use atomic helpers
- UHD Support for TM2/TM2E boards
- Trigger mode fix for Rinato board
etnaviv:
- Shader performance fix
- Command stream validator fixes
- Command buffer suballocator
rockchip:
- CDN DisplayPort support
- IOMMU support for arm64 platform
imx-drm:
- Fix i.MX5 TV encoder probing
- Remove lower fb size limits
msm:
- Support for HW cursor on MDP5 devices
- DSI encoder cleanup
- GPU DT bindings cleanup
sti:
- stih410 cleanups
- Create fbdev at binding
- HQVDP fixes
- Remove stih416 chip functionality
- DVI/HDMI mode selection fixes
- FPS statistic reporting
omapdrm:
- IRQ code cleanup
dwi-hdmi bridge:
- Cleanups and fixes
adv-bridge:
- Updates for nexus
sii8520 bridge:
- Add interlace mode support
- Rework HDMI and lots of fixes
qxl:
- probing/teardown cleanups
ZTE drm:
- HDMI audio via SPDIF interface
- Video Layer overlay plane support
- Add TV encoder output device
atmel-hlcdc:
- Rework fbdev creation logic
tegra:
- OF node fix
fsl-dcu:
- Minor fixes
mali-dp:
- Assorted fixes
sunxi:
- Minor fix"
[ This was the "fixed" pull, that still had build warnings due to people
not even having build tested the result. I'm not a happy camper
I've fixed the things I noticed up in this merge. - Linus ]
* tag 'drm-for-v4.11-less-shouty' of git://people.freedesktop.org/~airlied/linux: (1177 commits)
lib/Kconfig: make PRIME_NUMBERS not user selectable
drm/tinydrm: helpers: Properly fix backlight dependency
drm/tinydrm: mipi-dbi: Fix field width specifier warning
drm/tinydrm: mipi-dbi: Silence: ‘cmd’ may be used uninitialized
drm/sti: fix build warnings in sti_drv.c and sti_vtg.c files
drm/amd/powerplay: fix PSI feature on Polars12
drm/amdgpu: refuse to reserve io mem for split VRAM buffers
drm/ttm: fix use-after-free races in vm fault handling
drm/tinydrm: Add support for Multi-Inno MI0283QT display
dt-bindings: Add Multi-Inno MI0283QT binding
dt-bindings: display/panel: Add common rotation property
of: Add vendor prefix for Multi-Inno
drm/tinydrm: Add MIPI DBI support
drm/tinydrm: Add helper functions
drm: Add DRM support for tiny LCD displays
drm/amd/amdgpu: post card if there is real hw resetting performed
drm/nouveau/tmr: provide backtrace when a timeout is hit
drm/nouveau/pci/g92: Fix rearm
drm/nouveau/drm/therm/fan: add a fallback if no fan control is specified in the vbios
drm/nouveau/hwmon: expose power_max and power_crit
..
linux/netfilter.h is the last uapi header file that includes
linux/sysctl.h but it does not depend on definitions provided
by this essentially dead header file.
Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJYrvYlAAoJEFmIoMA60/r8FuQQAMDpia3kacyCAJpa+zjmyMNF
1slytaoIvP37dFq9XF1em031lwGNr5sahZ7nP1EKgALz4odZUzait7BUABcfviIn
Uesz2E1s/miMo4/0X1j9DqY9xV649DmmSIgk1yn3kvCkH/+Ix27dexu47auGzPEb
H/sEfd1RZidjZ5EWaG0ww5FrHcuge+JHtcH6vFQtWsTOspcx++IhaVIGjC0JCpqK
DnlQKilsJ38KUkvuDcxWjtFKxAc8De9jvCR4kX96OvbHahfAWwBO4AtUv7U3JpJN
2nyQk+I5kRagbfBucaXZISUtWM7h4peLiL+TGkvKg8eOVlOCedjYlrZW4SWkbAN+
0qwcHRQ8lwhNmgp3VYq7pmnugIvW4P2Fh3uqaplCAIwlpODxWPDQP7HLM2kyzmvq
gPGi0R4Yo2PdIXqfbilrzbFVeyqkIFECr287a6+5PekC0DxsqZvOG0uA1mWKLIaH
pRQMT0FO2SCCSOpcxRExeIj+XxhXlDVOrIBP6eMiFXAMgzUAyU8fLSZVMtXAvsTS
02hVDOc/Fq2jKlCSoJRIiRp5aj1QDFS/DjBhOnW7pXuvUTCrfYBXY5NCdT9UV3Q7
W6qHWkizRmRDGxUzqSODRt5aU7VOKbWvZnp10eJyKt5s2Iawe6We5V1NX+u18UIS
Scc1nbuPTL6u1n8PsaBG
=4Owc
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
- add ASPM L1 substate support
- enable PCIe Extended Tags when supported
- configure PCIe MPS settings on iProc, Versatile, X-Gene, and Xilinx
- increase VPD access timeout
- add ACS quirks for Intel Union Point, Qualcomm QDF2400 and QDF2432
- use new pci_irq_alloc_vectors() in more drivers
- fix MSI affinity memory leak
- remove unused MSI interfaces and update documentation
- remove unused AER .link_reset() callback
- avoid pci_lock / p->pi_lock deadlock seen with perf
- serialize sysfs enable/disable num_vfs operations
- move DesignWare IP from drivers/pci/host/ to drivers/pci/dwc/ and
refactor so we can support both hosts and endpoints
- add DT ECAM-like support for HiSilicon Hip06/Hip07 controllers
- add Rockchip system power management support
- add Thunder-X cn81xx and cn83xx support
- add Exynos 5440 PCIe PHY support
* tag 'pci-v4.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (93 commits)
PCI: dwc: Remove dependency of designware on CONFIG_PCI
PCI: dwc: Add CONFIG_PCIE_DW_HOST to enable PCI dwc host
PCI: dwc: Split pcie-designware.c into host and core files
PCI: dwc: designware: Fix style errors in pcie-designware.c
PCI: dwc: designware: Parse "num-lanes" property in dw_pcie_setup_rc()
PCI: dwc: all: Split struct pcie_port into host-only and core structures
PCI: dwc: designware: Get device pointer at the start of dw_pcie_host_init()
PCI: dwc: all: Rename cfg_read/cfg_write to read/write
PCI: dwc: all: Use platform_set_drvdata() to save private data
PCI: dwc: designware: Move register defines to designware header file
PCI: dwc: Use PTR_ERR_OR_ZERO to simplify code
PCI: dra7xx: Group PHY API invocations
PCI: dra7xx: Enable MSI and legacy interrupts simultaneously
PCI: dra7xx: Add support to force RC to work in GEN1 mode
PCI: dra7xx: Simplify probe code with devm_gpiod_get_optional()
PCI: Move DesignWare IP support to new drivers/pci/dwc/ directory
PCI: exynos: Support the PHY generic framework
Documentation: binding: Modify the exynos5440 PCIe binding
phy: phy-exynos-pcie: Add support for Exynos PCIe PHY
Documentation: samsung-phy: Add exynos-pcie-phy binding
...
Pull networking fixes from David Miller:
1) Some 'const'ing in qlogic networking drivers, from Bhumika Goyal.
2) Fix scheduling while atomic in l2tp network namespace exit by
deferring the work to the workqueue. From Ridge Kennedy.
3) Fix use after free in dccp timewait handling, from Andrey Ryabinin.
4) mlx5e CQE compression engine not initialized properly, from Tariq
Toukan.
5) Some UAPI header fixes from Dmitry V. Levin.
6) Don't overwrite module parameter value in mlx4 driver, from Majd
Dibbiny.
7) Fix divide by zero in xt_hashlimit netfilter module, from Alban
Browaeys.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits)
bpf: Fix bpf_xdp_event_output
net/mlx4_en: Use __skb_fill_page_desc()
net/mlx4_core: Use cq quota in SRIOV when creating completion EQs
net/mlx4_core: Fix VF overwrite of module param which disables DMFS on new probed PFs
net/mlx4: Spoofcheck and zero MAC can't coexist
net/mlx4: Change ENOTSUPP to EOPNOTSUPP
uapi: fix linux/rds.h userspace compilation errors
uapi: fix linux/seg6.h and linux/seg6_iptunnel.h userspace compilation errors
lib: Remove string from parman config selection
forcedeth: Remove return from a void function
bpf: fix spelling mistake: "proccessed" -> "processed"
uapi: fix linux/llc.h userspace compilation error
uapi: fix linux/ip6_tunnel.h userspace compilation errors
net/mlx5e: Fix wrong CQE decompression
net/mlx5e: Update MPWQE stride size when modifying CQE compress state
net/mlx5e: Fix broken CQE compression initialization
net/mlx5e: Do not reduce LRO WQE size when not using build_skb
net/mlx5e: Register/unregister vport representors on interface attach/detach
net/mlx5e: s390 system compilation fix
tcp: account for ts offset only if tsecr not zero
...
Because the Mellanox code required being based on a net-next tree,
I keept it separate from the remainder of the RDMA stack submission
that is based on 4.10-rc3.
This branch contains:
- Various mlx4 and mlx5 fixes and minor changes
- Support for adding a tag match rule to flow specs
- Support for cvlan offload operation for raw ethernet QPs
- A change to the core IB code to recognize raw eth capabilities and
enumerate them (touches non-Mellanox code)
- Implicit On-Demand Paging memory registration support
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYrx+WAAoJELgmozMOVy/du70P/1kpW2xY9Le04c3K7na2XOYl
AUVIDrW/8Go63tpOaM7jBT3k4GlwVFr3IOmBpS24KbW/THxjhyUeP5L5+z2x+go+
jkQOgtPWWEHr5zP3MzsNyB8fDx1YQOnJwEXxybQRW/cbw4CLjnhP+ezd6FdV/3Yy
pPEqDVlAErzvNweG+n2r1pjcUbR8uneC3inyMLnyzUBz4CHKmC8fgD3/qJIM+DNb
gtFT5xHFIXKCigWdQ/EwsTDcHub43V8OXlI5sO7loG6vToOUATMkjI4oOUNhDmYS
X7XLN3yRK9QHEfb5kutXIZEWzTGh7LiFtUYGaNNYqqzDfSiMRc9NC5kTOfplEXDV
Uo+AGb6Fh1zYIOzNk7o+tazIv3LaLv6+Fcm+9bbe0VUIqasaylsePqaTwMuIzx/I
xP5nitmd5lbYo8WdlasVdG6mH1DlJEUbU30v4DpmTpxCP6jGpog7lexyGyF3TgzS
NhnG0IiIClWh3WQ2/GdsFK/obIdFkpLeASli1hwD81vzPfly9zc2YpgqydZI3WCr
q6hTXYnANcP6+eciCpQPO7giRdXdiKey08Uoq/2jxb7Qbm4daG6UwopjvH9/lm1F
m6UDaDvzNYm+Rx+bL/+KSx9JO9+fJB1L51yCmvLGpWi6yJI4ZTfanHNMBsCua46N
Kev/DSpIAzX1WOBkte+a
=rspQ
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull Mellanox rdma updates from Doug Ledford:
"Mellanox specific updates for 4.11 merge window
Because the Mellanox code required being based on a net-next tree, I
keept it separate from the remainder of the RDMA stack submission that
is based on 4.10-rc3.
This branch contains:
- Various mlx4 and mlx5 fixes and minor changes
- Support for adding a tag match rule to flow specs
- Support for cvlan offload operation for raw ethernet QPs
- A change to the core IB code to recognize raw eth capabilities and
enumerate them (touches non-Mellanox code)
- Implicit On-Demand Paging memory registration support"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (40 commits)
IB/mlx5: Fix configuration of port capabilities
IB/mlx4: Take source GID by index from HW GID table
IB/mlx5: Fix blue flame buffer size calculation
IB/mlx4: Remove unused variable from function declaration
IB: Query ports via the core instead of direct into the driver
IB: Add protocol for USNIC
IB/mlx4: Support raw packet protocol
IB/mlx5: Support raw packet protocol
IB/core: Add raw packet protocol
IB/mlx5: Add implicit MR support
IB/mlx5: Expose MR cache for mlx5_ib
IB/mlx5: Add null_mkey access
IB/umem: Indicate that process is being terminated
IB/umem: Update on demand page (ODP) support
IB/core: Add implicit MR flag
IB/mlx5: Support creation of a WQ with scatter FCS offload
IB/mlx5: Enable QP creation with cvlan offload
IB/mlx5: Enable WQ creation and modification with cvlan offload
IB/mlx5: Expose vlan offloads capabilities
IB/uverbs: Enable QP creation with cvlan offload
...
This introduces an interface for user space to directly communicate on
rpmsg endpoints without the implementation of specific kernel drivers,
which is useful for e.g. debug channels.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJYre9DAAoJEAsfOT8Nma3FGdIP/1PJ9ghIZxuW68jsA/BQKY/d
RLFDqJ8sPXiE/+TimcvPtnibEc/h6UGbUFcQGaTv1hlYI7WMCbsdIIAZap2kt5Bt
kaXcT5/A7O8nG0Je2/v17g7GLLkHRzNDp9F3ZrTo+PGROtLNS7AkIJ6eQo8+QKfw
ZoV0t1YS6v78iVKgUuqhtcbh/KpTYewVosBBzyAd2YSr243qr8btWMXh8MPjA1rd
hVVbkk8L5LPg5syEOOA2SAHyRl6m5f6JycEZiLpup6D6kwKKRkd+RTR1DrSKl6cL
Yus304YD/kKBaZkj0nluCPHof3WWb9SLggHIP7BKLPjIgHn3tW5TB45JeZarpTkX
T22uRt8zy0T9BsZy5d3uA5s5C2gUXw30fO/GAOtZ0kZ9aohTP+FnGe70t7oQ1oxr
bCU8kqM2jhlj26r0NZSKeiGt48LZM9xKKdluI4Bqtb/meTSm4FtNPAZ87grPjaEv
aa4qTNovkLuhBzx/tWfMEYvfSy4+17wuQJWWJ3RvZZhUpl+/djMf6ajFfVbofC/d
HlGtQi7qI8ultCtANp8r9ieKhTyPtC+ayDW8t0yWYJX10JlTLFMdhmvVyXIvV9sN
scuLLiJ1rY+nLcY/PDwbQwerOB5uZz0WdmsLdveIjG9At6U5+aWLjFqBKo1mCnK8
x2OHMikziXwiWMo5L7C0
=Lcxr
-----END PGP SIGNATURE-----
Merge tag 'rpmsg-v4.11' of git://github.com/andersson/remoteproc
Pull rpmsg updates from Bjorn Andersson:
"This introduces an interface for user space to directly communicate on
rpmsg endpoints without the implementation of specific kernel drivers,
which is useful for e.g. debug channels:
* tag 'rpmsg-v4.11' of git://github.com/andersson/remoteproc:
rpmsg: rpmsg_create_ept() returns NULL on error
rpmsg: qcom: smd: Return positively when not enabled
rpmsg: unlock on error in rpmsg_eptdev_read()
rpmsg: char: add CONFIG_NET dependency
rpmsg: smd: Register rpmsg user space interface for edges
rpmsg: Driver for user space endpoint interface
rpmsg: qcom_smd: Implement endpoint "poll"
rpmsg: Introduce "poll" to endpoint ops
rpmsg: qcom_smd: Add support for "label" property
Pull input updates from Dmitry:
- a new driver for Zeitech touchscreen controller
- a new driver for Samsung "touchkeys"
- touchscreen driver for Moorestown platform has been removed because
platform support is gone
- MPU3050 accelerometer driver was removed in favor of IIO driver
- miscellaneous driver cleanup and fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (88 commits)
Input: zet6223 - export OF device ID as module aliases
Input: tsc2004/5 - switch to using generic device properties
Input: tsc2004/5 - fix regulator handling
Input: tsc2005 - add OF device table
Input: add driver for Zeitec ZET6223
Input: joydev - do not report stale values on first open
Input: synaptics-rmi4 - forward upper mechanical buttons to PS/2 guest
Input: synaptics-rmi4 - clean up F30 implementation
Input: synaptics - use SERIO_OOB_DATA to handle trackstick buttons
Input: psmouse - add a custom serio protocol to send extra information
Input: synaptics-rmi4 - fix error return code in rmi_probe_interrupts()
Input: xpad - restore LED state after device resume
Input: synaptics-rmi4 - add rmi_find_function()
Input: xpad - fix stuck mode button on Xbox One S pad
Input: joydev - use clamp() macro
Input: refuse to register absolute devices without absinfo
Input: synaptics-rmi4 - add sysfs interfaces for hardware IDs
Input: synaptics-rmi4 - add sysfs attribute update_fw_status
Input: mousedev - stop offering PS/2 to userspace by default
Input: tca8418 - switch to using generic device properties
...
- Add new Broadcom bnxt_re RoCE driver
- rxe driver updates
- ioctl cleanups
- ETH_P_IBOE declaration cleanup
- IPoIB changes
- Add port state cache
- Allow srpt driver to accept guids as port names in config
- Update to hfi1 driver
- Update to srp driver
- Lots of misc. minor changes all over
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYrfewAAoJELgmozMOVy/dFnEP/2Qe7NqXRqxLS0ZqsQseFHgQ
jd236E7R/XtQQTE3PTcrWL0mq0DRF6tMEjfhUASKTbZVfCBTniJAoXYrvWhN/STq
LxAdigdV/0SPbxO3r9B1Xvk2v5BySaIBkaUDvcEXzT4e7UVQwZgxDkhhsYeY0Z/r
9bNB5760PzW8uO5cctXccNcWztZnW0IUZuAHVfQCPjZ7svoGwLnNDW6YQx+FsEkW
tbPdzMXX8VKHlC5RcKbfOOBjdNyrUpWl+uvWEc/7mazKscp4yKVFZL7PcxqPJSfd
aKdfqXYawhjZZpyws8Kn0rhkfT7xWKD/y9G5STykRJPj9/n1BDScFkmyDQhtP5bJ
GANzdgH0z7Dt9LkcAs86A8EVBbIdbdT2cpPVu7t0uWEIsJw/O5ThKpgjnrrTm6m+
89tgqLZooifTEsdj4UkZoyktrD3J9LSNZkgVmWtRn01W3oYFOPbdM4TmBZtg+/Yl
VGmOJEHMEsNuJBcJcOuRJ1MVz2LebXmPUcB0RXzgmHHgulZ/DqoOtlpg5JNmJcr5
wpw/yppkBop4V4+etJBlzDsZNmZZlX+AY0ZLqQJsDHNszDjwXgAy5Rn5FYIdMyk4
ff0FKb5dzASSxHRDxAsu2uoGaREM0NkpA0UYiIZbepGLSO8PuFG2ScQ6qzU47vqu
9SEzOaaQY2S2uqFFFnYp
=ugNm
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma updates from Doug Ledford:
"First set of updates for 4.11 kernel merge window
- Add new Broadcom bnxt_re RoCE driver
- rxe driver updates
- ioctl cleanups
- ETH_P_IBOE declaration cleanup
- IPoIB changes
- Add port state cache
- Allow srpt driver to accept guids as port names in config
- Update to hfi1 driver
- Update to srp driver
- Lots of misc minor changes all over"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (114 commits)
RDMA/bnxt_re: fix for "bnxt_en: Update to firmware interface spec 1.7.0."
rdma_cm: fail iwarp accepts w/o connection params
IB/srp: Drain the send queue before destroying a QP
IB/core: Add support for draining IB_POLL_DIRECT completion queues
IB/srp: Improve an error path
IB/srp: Make a diagnostic message more informative
IB/srp: Document locking conventions
IB/srp: Fix race conditions related to task management
IB/srp: Avoid that duplicate responses trigger a kernel bug
IB/SRP: Avoid using IB_MR_TYPE_SG_GAPS
RDMA/qedr: Fix some error handling
RDMA/bnxt_re: add DCB dependency
IB/hns: include linux/module.h
IB/vmw_pvrdma: Expose vendor error to ULPs
vmw_pvrdma: switch to pci_alloc_irq_vectors
IB/hfi1: use size_t for passing array length
IB/ipoib: Remove redudant label
IB/ipoib: remove the unnecessary memory free
IB/mthca: switch to pci_alloc_irq_vectors
IB/hfi1: Code reuse with memdup_copy
...
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for your net tree,
they are:
1) Revisit warning logic when not applying default helper assignment.
Jiri Kosina considers we are breaking existing setups and not warning
our users accordinly now that automatic helper assignment has been
turned off by default. So let's make him happy by spotting the warning
by when we find a helper but we cannot attach, instead of warning on the
former deprecated behaviour. Patch from Jiri Kosina.
2) Two patches to fix regression in ctnetlink interfaces with
nfnetlink_queue. Specifically, perform more relaxed in CTA_STATUS
and do not bail out if CTA_HELP indicates the same helper that we
already have. Patches from Kevin Cernekee.
3) A couple of bugfixes for ipset via Jozsef Kadlecsik. Due to wrong
index logic in hash set types and null pointer exception in the
list:set type.
4) hashlimit bails out with correct userspace parameters due to wrong
arithmetics in the code that avoids "divide by zero" when
transforming the userspace timing in milliseconds to token credits.
Patch from Alban Browaeys.
5) Fix incorrect NFQA_VLAN_MAX definition, patch from
Ken-ichirou MATSUZAWA.
6) Don't not declare nfnetlink batch error list as static, since this
may be used by several subsystems at the same time. Patch from
Liping Zhang.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Consistently use types from linux/types.h to fix the following
linux/rds.h userspace compilation errors:
/usr/include/linux/rds.h:198:2: error: unknown type name 'u8'
u8 rx_traces;
/usr/include/linux/rds.h:199:2: error: unknown type name 'u8'
u8 rx_trace_pos[RDS_MSG_RX_DGRAM_TRACE_MAX];
/usr/include/linux/rds.h:203:2: error: unknown type name 'u8'
u8 rx_traces;
/usr/include/linux/rds.h:204:2: error: unknown type name 'u8'
u8 rx_trace_pos[RDS_MSG_RX_DGRAM_TRACE_MAX];
/usr/include/linux/rds.h:205:2: error: unknown type name 'u64'
u64 rx_trace[RDS_MSG_RX_DGRAM_TRACE_MAX];
Fixes: 3289025aed ("RDS: add receive message trace used by application")
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Include <linux/in6.h> in uapi/linux/seg6.h to fix the following
linux/seg6.h userspace compilation error:
/usr/include/linux/seg6.h:31:18: error: array type has incomplete element type 'struct in6_addr'
struct in6_addr segments[0];
Include <linux/seg6.h> in uapi/linux/seg6_iptunnel.h to fix
the following linux/seg6_iptunnel.h userspace compilation error:
/usr/include/linux/seg6_iptunnel.h:26:21: error: array type has incomplete element type 'struct ipv6_sr_hdr'
struct ipv6_sr_hdr srh[0];
Fixes: a50a05f497 ("ipv6: sr: add missing Kbuild export for header files")
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Include <linux/if.h> to fix the following linux/llc.h userspace
compilation error:
/usr/include/linux/llc.h:26:27: error: 'IFHWADDRLEN' undeclared here (not in a function)
unsigned char sllc_mac[IFHWADDRLEN];
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Include <linux/if.h> and <linux/in6.h> to fix the following
linux/ip6_tunnel.h userspace compilation errors:
/usr/include/linux/ip6_tunnel.h:23:12: error: 'IFNAMSIZ' undeclared here (not in a function)
char name[IFNAMSIZ]; /* name of tunnel device */
/usr/include/linux/ip6_tunnel.h:30:18: error: field 'laddr' has incomplete type
struct in6_addr laddr; /* local tunnel end-point address */
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge updates from Andrew Morton:
"142 patches:
- DAX updates
- various misc bits
- OCFS2 updates
- most of MM"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (142 commits)
mm/z3fold.c: limit first_num to the actual range of possible buddy indexes
mm: fix <linux/pagemap.h> stray kernel-doc notation
zram: remove obsolete sysfs attrs
mm/memblock.c: remove unnecessary log and clean up
oom-reaper: use madvise_dontneed() logic to decide if unmap the VMA
mm: drop unused argument of zap_page_range()
mm: drop zap_details::check_swap_entries
mm: drop zap_details::ignore_dirty
mm, page_alloc: warn_alloc nodemask is NULL when cpusets are disabled
mm: help __GFP_NOFAIL allocations which do not trigger OOM killer
mm, oom: do not enforce OOM killer for __GFP_NOFAIL automatically
mm: consolidate GFP_NOFAIL checks in the allocator slowpath
lib/show_mem.c: teach show_mem to work with the given nodemask
arch, mm: remove arch specific show_mem
mm, page_alloc: warn_alloc print nodemask
mm, page_alloc: do not report all nodes in show_mem
Revert "mm: bail out in shrink_inactive_list()"
mm, vmscan: consider eligible zones in get_scan_count
mm, vmscan: cleanup lru size claculations
mm, vmscan: do not count freed pages as PGDEACTIVATE
...
200 commits and noteworthy changes for most architectures.
* ARM:
- GICv3 save/restore
- cache flushing fixes
- working MSI injection for GICv3 ITS
- physical timer emulation
* MIPS:
- various improvements under the hood
- support for SMP guests
- a large rewrite of MMU emulation. KVM MIPS can now use MMU notifiers
to support copy-on-write, KSM, idle page tracking, swapping, ballooning
and everything else. KVM_CAP_READONLY_MEM is also supported, so that
writes to some memory regions can be treated as MMIO. The new MMU also
paves the way for hardware virtualization support.
* PPC:
- support for POWER9 using the radix-tree MMU for host and guest
- resizable hashed page table
- bugfixes.
* s390: expose more features to the guest
- more SIMD extensions
- instruction execution protection
- ESOP2
* x86:
- improved hashing in the MMU
- faster PageLRU tracking for Intel CPUs without EPT A/D bits
- some refactoring of nested VMX entry/exit code, preparing for live
migration support of nested hypervisors
- expose yet another AVX512 CPUID bit
- host-to-guest PTP support
- refactoring of interrupt injection, with some optimizations thrown in
and some duct tape removed.
- remove lazy FPU handling
- optimizations of user-mode exits
- optimizations of vcpu_is_preempted() for KVM guests
* generic:
- alternative signaling mechanism that doesn't pound on tsk->sighand->siglock
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJYral1AAoJEL/70l94x66DbNgH/Rx8YXuidFq2fe3RWOvld3RK
85OM/D5g38cTLpBE0/sJpcvX34iYN8U/l5foCZwpxB+83GHEk2Cr57JyfTogdaAJ
x8dBhHKQCA/HxSQUQLN6nFqRV+yT8WUR92Fhqx82+80BSen5Yzcfee/TDoW6T1IW
g8CYgX9FrRaGOX066ImAuUfdAdUVjyssfs9VttDTX+HiusPeuBPx/wsRe1ZEEPlH
vnltIJQb1ETV2GOZLUojKjzH6aZkjIl29XxjkYii9JTUornClG0DfW+5QT3uLrB5
gJ+G+Zmpsq8ZBx9jNDtAi7sFsoPY1Mzf+JPNCGXBra2sP2GrBAuXcxmgznRYltQ=
=8IIp
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"4.11 is going to be a relatively large release for KVM, with a little
over 200 commits and noteworthy changes for most architectures.
ARM:
- GICv3 save/restore
- cache flushing fixes
- working MSI injection for GICv3 ITS
- physical timer emulation
MIPS:
- various improvements under the hood
- support for SMP guests
- a large rewrite of MMU emulation. KVM MIPS can now use MMU
notifiers to support copy-on-write, KSM, idle page tracking,
swapping, ballooning and everything else. KVM_CAP_READONLY_MEM is
also supported, so that writes to some memory regions can be
treated as MMIO. The new MMU also paves the way for hardware
virtualization support.
PPC:
- support for POWER9 using the radix-tree MMU for host and guest
- resizable hashed page table
- bugfixes.
s390:
- expose more features to the guest
- more SIMD extensions
- instruction execution protection
- ESOP2
x86:
- improved hashing in the MMU
- faster PageLRU tracking for Intel CPUs without EPT A/D bits
- some refactoring of nested VMX entry/exit code, preparing for live
migration support of nested hypervisors
- expose yet another AVX512 CPUID bit
- host-to-guest PTP support
- refactoring of interrupt injection, with some optimizations thrown
in and some duct tape removed.
- remove lazy FPU handling
- optimizations of user-mode exits
- optimizations of vcpu_is_preempted() for KVM guests
generic:
- alternative signaling mechanism that doesn't pound on
tsk->sighand->siglock"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (195 commits)
x86/kvm: Provide optimized version of vcpu_is_preempted() for x86-64
x86/paravirt: Change vcp_is_preempted() arg type to long
KVM: VMX: use correct vmcs_read/write for guest segment selector/base
x86/kvm/vmx: Defer TR reload after VM exit
x86/asm/64: Drop __cacheline_aligned from struct x86_hw_tss
x86/kvm/vmx: Simplify segment_base()
x86/kvm/vmx: Get rid of segment_base() on 64-bit kernels
x86/kvm/vmx: Don't fetch the TSS base from the GDT
x86/asm: Define the kernel TSS limit in a macro
kvm: fix page struct leak in handle_vmon
KVM: PPC: Book3S HV: Disable HPT resizing on POWER9 for now
KVM: Return an error code only as a constant in kvm_get_dirty_log()
KVM: Return an error code only as a constant in kvm_get_dirty_log_protect()
KVM: Return directly after a failed copy_from_user() in kvm_vm_compat_ioctl()
KVM: x86: remove code for lazy FPU handling
KVM: race-free exit from KVM_RUN without POSIX signals
KVM: PPC: Book3S HV: Turn "KVM guest htab" message into a debug message
KVM: PPC: Book3S PR: Ratelimit copy data failure error messages
KVM: Support vCPU-based gfn->hva cache
KVM: use separate generations for each address space
...
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJYoM2fAAoJEHm+PkMAQRiGr9MH/izEAMri7rJ0QMc3ejt+WmD0
8pkZw3+MVn71z6cIEgpzk4QkEWJd5rfhkETCeCp7qQ9V6cDW1FDE9+0OmPjiphDt
nnzKs7t7skEBwH5Mq5xygmIfkv+Z0QGHZ20gfQWY3F56Uxo+ARF88OBHBLKhqx3v
98C7YbMFLKBslKClA78NUEIdx0UfBaRqerlERx0Lfl9aoOrbBS6WI3iuREiylpih
9o7HTrwaGKkU4Kd6NdgJP2EyWPsd1LGalxBBjeDSpm5uokX6ALTdNXDZqcQscHjE
RmTqJTGRdhSThXOpNnvUJvk9L442yuNRrVme/IqLpxMdHPyjaXR3FGSIDb2SfjY=
=VMy8
-----END PGP SIGNATURE-----
Merge tag 'v4.10-rc8' into drm-next
Linux 4.10-rc8
Backmerge Linus rc8 to fix some conflicts, but also
to avoid pulling it in via a fixes pull from someone.
Userland developers asked to be notified immediately by the UFFDIO_API
ioctl if shmem missing mode is supported by userfaultfd in the running
kernel. This avoids the need to run UFFDIO_REGISTER on a shmem virtual
memory range to find out.
Link: http://lkml.kernel.org/r/20161216144821.5183-38-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michael Rapoport <RAPOPORT@il.ibm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Expand the userfaultfd_register/unregister routines to allow shared
memory VMAs.
Currently, there is no UFFDIO_ZEROPAGE and write-protection support for
shared memory VMAs, which is reflected in ioctl methods supported by
uffdio_register.
Link: http://lkml.kernel.org/r/20161216144821.5183-34-aarcange@redhat.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michael Rapoport <RAPOPORT@il.ibm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Userland developers asked to be notified immediately by the UFFDIO_API
ioctl if hugetlbfs missing mode is supported by userfaultfd in the
running kernel. This avoids the need to run UFFDIO_REGISTER on a
hugetlbfs virtual memory range to find out.
Link: http://lkml.kernel.org/r/20161216144821.5183-27-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michael Rapoport <RAPOPORT@il.ibm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Expand the userfaultfd_register/unregister routines to allow VM_HUGETLB
vmas. huge page alignment checking is performed after a VM_HUGETLB vma
is encountered.
Also, since there is no UFFDIO_ZEROPAGE support for huge pages do not
return that as a valid ioctl method for huge page ranges.
Link: http://lkml.kernel.org/r/20161216144821.5183-22-aarcange@redhat.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michael Rapoport <RAPOPORT@il.ibm.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the page is punched out of the address space the uffd reader should
know this and zeromap the respective area in case of the #PF event.
Link: http://lkml.kernel.org/r/20161216144821.5183-14-aarcange@redhat.com
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michael Rapoport <RAPOPORT@il.ibm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The event denotes that an area [start:end] moves to different location.
Length change isn't reported as "new" addresses, if they appear on the
uffd reader side they will not contain any data and the latter can just
zeromap them.
Waiting for the event ACK is also done outside of mmap sem, as for fork
event.
Link: http://lkml.kernel.org/r/20161216144821.5183-12-aarcange@redhat.com
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michael Rapoport <RAPOPORT@il.ibm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When the mm with uffd-ed vmas fork()-s the respective vmas notify their
uffds with the event which contains a descriptor with new uffd. This
new descriptor can then be used to get events from the child and
populate its mm with data. Note, that there can be different uffd-s
controlling different vmas within one mm, so first we should collect all
those uffds (and ctx-s) in a list and then notify them all one by one
but only once per fork().
The context is created at fork() time but the descriptor, file struct
and anon inode object is created at event read time. So some trickery
is added to the userfaultfd_ctx_read() to handle the ctx queues' locking
vs file creation.
Another thing worth noticing is that the task that fork()-s waits for
the uffd event to get processed WITHOUT the mmap sem.
[aarcange@redhat.com: build warning fix]
Link: http://lkml.kernel.org/r/20161216144821.5183-10-aarcange@redhat.com
Link: http://lkml.kernel.org/r/20161216144821.5183-9-aarcange@redhat.com
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michael Rapoport <RAPOPORT@il.ibm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "userfaultfd tmpfs/hugetlbfs/non-cooperative", v2
These userfaultfd features are finished and are ready for larger
exposure in -mm and upstream merging.
1) tmpfs non present userfault
2) hugetlbfs non present userfault
3) non cooperative userfault for fork/madvise/mremap
qemu development code is already exercising 2) and container postcopy
live migration needs 3).
1) is not currently used but there's a self test and we know some qemu
user for various reasons uses tmpfs as backing for KVM so it'll need it
too to use postcopy live migration with tmpfs memory.
All review feedback from the previous submit has been handled and the
fixes are included. There's no outstanding issue AFIK.
Upstream code just did a s/fe/vmf/ conversion in the page faults and
this has been converted as well incrementally.
In addition to the previous submits, this also wakes up stuck userfaults
during UFFDIO_UNREGISTER. The non cooperative testcase actually
reproduced this problem by getting stuck instead of quitting clean in
some rare case as it could call UFFDIO_UNREGISTER while some userfault
could be still in flight. The other option would have been to keep
leaving it up to userland to serialize itself and to patch the testcase
instead but the wakeup during unregister I think is preferable.
David also asked the UFFD_FEATURE_MISSING_HUGETLBFS and
UFFD_FEATURE_MISSING_SHMEM feature flags to be added so QEMU can avoid
to probe if the hugetlbfs/shmem missing support is available by calling
UFFDIO_REGISTER. QEMU already checks HUGETLBFS_MAGIC with fstatfs so if
UFFD_FEATURE_MISSING_HUGETLBFS is also set, it knows UFFDIO_REGISTER
will succeed (or if it fails, it's for some other more concerning
reason). There's no reason to worry about adding too many feature
flags. There are 64 available and worst case we've to bump the API if
someday we're really going to run out of them.
The round-trip network latency of hugetlbfs userfaults during postcopy
live migration is still of the order of dozen milliseconds on 10GBit if
at 2MB hugepage granularity so it's working perfectly and it should
provide for higher bandwidth or lower CPU usage (which makes it
interesting to add an option in the future to support THP granularity
too for anonymous memory, UFFDIO_COPY would then have to create THP if
alignment/len allows for it). 1GB hugetlbfs granularity will require
big changes in hugetlbfs to work so it's deferred for later.
This patch (of 42):
This adds proper documentation (inline) to avoid the risk of further
misunderstandings about the semantics of _IOW/_IOR and it also reminds
whoever will bump the UFFDIO_API in the future, to change the two ioctl
to _IOW.
This was found while implementing strace support for those ioctl,
otherwise we could have never found it by just reviewing kernel code and
testing it.
_IOC_READ or _IOC_WRITE alters nothing but the ioctl number itself, so
it's only worth fixing if the UFFDIO_API is bumped someday.
Link: http://lkml.kernel.org/r/20161216144821.5183-2-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: "Dmitry V. Levin" <ldv@altlinux.org>
Cc: Michael Rapoport <RAPOPORT@il.ibm.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Include <sys/socket.h> (guarded by ifndef __KERNEL__) to fix
the following linux/if.h userspace compilation errors:
/usr/include/linux/if.h:234:19: error: field 'ifru_addr' has incomplete type
struct sockaddr ifru_addr;
/usr/include/linux/if.h:235:19: error: field 'ifru_dstaddr' has incomplete type
struct sockaddr ifru_dstaddr;
/usr/include/linux/if.h:236:19: error: field 'ifru_broadaddr' has incomplete type
struct sockaddr ifru_broadaddr;
/usr/include/linux/if.h:237:19: error: field 'ifru_netmask' has incomplete type
struct sockaddr ifru_netmask;
/usr/include/linux/if.h:238:20: error: field 'ifru_hwaddr' has incomplete type
struct sockaddr ifru_hwaddr;
This also fixes userspace compilation of the following uapi headers:
linux/atmbr2684.h
linux/gsmmux.h
linux/if_arp.h
linux/if_bonding.h
linux/if_frad.h
linux/if_pppox.h
linux/if_tunnel.h
linux/netdevice.h
linux/route.h
linux/wireless.h
As no uapi header provides a definition of struct sockaddr, inclusion
of <sys/socket.h> seems to be the most conservative and the only safe
fix available.
All current users of <linux/if.h> are very likely to be including
<sys/socket.h> already because the latter is the sole provider
of struct sockaddr definition in libc, so adding a uapi header
with a definition of struct sockaddr would create a potential
conflict with <sys/socket.h>.
Replacing struct sockaddr in the definition of struct ifreq with
a different type would create a potential incompatibility with current
users of struct ifreq who might rely on ifru_addr et al members being
of type struct sockaddr.
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here is the big tty/serial driver patchset for 4.11-rc1.
Not much here, but a lot of little fixes and individual serial driver
updates all over the subsystem. Majority are for the sh-sci driver and
platform (the arch-specific changes have acks from the maintainer).
The start of the "serial bus" code is here as well, but nothing is
converted to use it yet. That work is still ongoing, hopefully will
start to show up across different subsystems for 4.12 (bluetooth is one
major place that will be used.)
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWK2lDg8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ylwfwCgyExa4x8Lur2nZyu8cgcaVRU68VAAoNQh0WJt
EZdEhEkRMt4d64j+ApYI
=iv7x
-----END PGP SIGNATURE-----
Merge tag 'tty-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver updates from Greg KH:
"Here is the big tty/serial driver patchset for 4.11-rc1.
Not much here, but a lot of little fixes and individual serial driver
updates all over the subsystem. Majority are for the sh-sci driver and
platform (the arch-specific changes have acks from the maintainer).
The start of the "serial bus" code is here as well, but nothing is
converted to use it yet. That work is still ongoing, hopefully will
start to show up across different subsystems for 4.12 (bluetooth is
one major place that will be used.)
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (109 commits)
tty: pl011: Work around QDF2400 E44 stuck BUSY bit
atmel_serial: Use the fractional divider when possible
tty: Remove extra include in HVC console tty framework
serial: exar: Enable MSI support
serial: exar: Move register defines from uapi header to consumer site
serial: pci: Remove unused pci_boards entries
serial: exar: Move Commtech adapters to 8250_exar as well
serial: exar: Fix feature control register constants
serial: exar: Fix initialization of EXAR registers for ports > 0
serial: exar: Fix mapping of port I/O resources
serial: sh-sci: fix hardware RX trigger level setting
tty/serial: atmel: ensure state is restored after suspending
serial: 8250_dw: Avoid "too much work" from bogus rx timeout interrupt
serdev: ttyport: check whether tty_init_dev() fails
serial: 8250_pci: make pciserial_detach_ports() static
ARM: dts: STiH410-b2260: Enable HW flow-control
ARM: dts: STiH407-family: Use new Pinctrl groups
ARM: dts: STiH407-pinctrl: Add Pinctrl group for HW flow-control
ARM: dts: STiH410-b2260: Identify the UART RTS line
dt-bindings: serial: Update 'uart-has-rtscts' description
...
Here is the big staging and iio driver patchsets for 4.11-rc1.
We almost broke even this time around, with only a few thousand lines
added overall, as we removed the old and obsolete i4l code, but added
some new drivers for the RPi platform, as well as adding some new IIO
drivers.
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWK2j/w8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymZ1ACdFR4o6xYrWEizmao4a/u+lUZE1aIAnRmcGcIc
J+leO1n9bE5iadQvKYUW
=sKVA
-----END PGP SIGNATURE-----
Merge tag 'staging-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/iio driver updates from Greg KH:
"Here is the big staging and iio driver patchsets for 4.11-rc1.
We almost broke even this time around, with only a few thousand lines
added overall, as we removed the old and obsolete i4l code, but added
some new drivers for the RPi platform, as well as adding some new IIO
drivers.
All of these have been in linux-next for a while with no reported
issues"
* tag 'staging-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (669 commits)
Staging: vc04_services: Fix the "space prohibited" code style errors
Staging: vc04_services: Fix the "wrong indent" code style errors
staging: octeon: Use net_device_stats from struct net_device
Staging: rtl8192u: ieee80211: ieee80211.h - style fix
Staging: rtl8192u: ieee80211: ieee80211_tx.c - style fix
Staging: rtl8192u: ieee80211: rtl819x_BAProc.c - style fix
Staging: rtl8192u: ieee80211: ieee80211_module.c - style fix
Staging: rtl8192u: ieee80211: rtl819x_TSProc.c - style fix
Staging: rtl8192u: r8192U.h - style fix
Staging: rtl8192u: r8192U_core.c - style fix
Staging: rtl8192u: r819xU_cmdpkt.c - style fix
staging: rtl8192u: blank lines aren't necessary before a close brace '}'
staging: rtl8192u: Adding space after enum and struct definition
staging: rtl8192u: Adding space after struct definition
Staging: ks7010: Add required and preferred spaces around operators
Staging: ks7010: ks*: Remove redundant blank lines
Staging: ks7010: ks*: Add missing blank lines after declarations
staging: visorbus, replace init_timer with setup_timer
staging: vt6656: rxtx.c Removed multiple dereferencing
staging: vt6656: Alignment match open parenthesis
...
Here is the big char/misc driver patchset for 4.11-rc1.
Lots of different driver subsystems updated here. Rework for the hyperv
subsystem to handle new platforms better, mei and w1 and extcon driver
updates, as well as a number of other "minor" driver updates. Full
details are in the shortlog below.
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWK2iRQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynhFACguVE+/ixj5u5bT5DXQaZNai/6zIAAmgMWwd/t
YTD2cwsJsGbTT1fY3SUe
=CiSI
-----END PGP SIGNATURE-----
Merge tag 'char-misc-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
"Here is the big char/misc driver patchset for 4.11-rc1.
Lots of different driver subsystems updated here: rework for the
hyperv subsystem to handle new platforms better, mei and w1 and extcon
driver updates, as well as a number of other "minor" driver updates.
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (169 commits)
goldfish: Sanitize the broken interrupt handler
x86/platform/goldfish: Prevent unconditional loading
vmbus: replace modulus operation with subtraction
vmbus: constify parameters where possible
vmbus: expose hv_begin/end_read
vmbus: remove conditional locking of vmbus_write
vmbus: add direct isr callback mode
vmbus: change to per channel tasklet
vmbus: put related per-cpu variable together
vmbus: callback is in softirq not workqueue
binder: Add support for file-descriptor arrays
binder: Add support for scatter-gather
binder: Add extra size to allocator
binder: Refactor binder_transact()
binder: Support multiple /dev instances
binder: Deal with contexts in debugfs
binder: Support multiple context managers
binder: Split flat_binder_object
auxdisplay: ht16k33: remove private workqueue
auxdisplay: ht16k33: rework input device initialization
...
Highlights include:
- Support for direct mapped LPC on POWER9, giving Linux direct access to
devices that may be on there such as a UART.
- Memory hotplug support for the Power9 Radix MMU.
- Add new AUX vectors describing the processor's cache geometry, to be used by
glibc.
- The ability for a guest to ask the hypervisor to resize the guest's hash
table, and in addition support for doing so automatically when memory is
hotplugged into/out-of the guest. This allows the hash table to be sized
based on the current memory usage of the guest, rather than the maximum
possible memory usage.
- Implementation of optprobes (kprobe optimisation) for powerpc.
In addition there's the topic branch shared with the KVM tree, which includes
support for guests to use the Radix MMU on Power9.
Thanks to:
Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T, Anton Blanchard,
Benjamin Herrenschmidt, Chris Packham, Daniel Axtens, Daniel Borkmann, David
Gibson, Finn Thain, Gautham R. Shenoy, Gavin Shan, Greg Kurz, Joel Stanley,
John Allen, Madhavan Srinivasan, Mahesh Salgaonkar, Markus Elfring, Michael
Neuling, Nathan Fontenot, Naveen N. Rao, Nicholas Piggin, Paul Mackerras, Ravi
Bangoria, Reza Arbab, Shailendra Singh, Vaibhav Jain, Wei Yongjun.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJYrWj0AAoJEFHr6jzI4aWAsn4P/08Kz3TtOvDuuPGVNoO7fWOn
ag5/zVt8R4FuCALqWpAZbVqMuUU4wLxG0RuWlmBNYYhrjMC6JxHpOSjQXxM2D7YT
CdGTJxG414r6HMOeToL9i/z33o0m+KT07tscer+QMKlXVKCR2z0fEJch+zoPBHYA
5eVBFpLLTtbNiX9UnrcM/vYz61d56kT4YJey9/8qbAkTAc1rMPa8ucU5UiKYJ7yX
mF8cd7WE+7aqif9V8yN59G2rcbz+h3pbMw/gzImiYsYrUj4fLjU+VTKL5PPT+UFy
WWBXD3MAMm1dksMMZi3hgoo2BZhDn3RkymeYi6Jo4kDknNMPZzkMxGyvaJ8Eq5H/
bIYXdS1AbTtvaaEEuWqDFjpnChOEvj/8IeqitU0jjlql8BVjNKg/ESaaKucZr+pO
pk2Mfvw0Tb/lxJT5qj27yq4aRsxJwdFOoPYCN7MquPp/wV2Tg5M6h4nVQ4T6Wl0S
tMFQeCqXflhDWh0Xgr2UXpF66/YTj3Du5LasOTkgGeU30Z8TcNGFEmDWShKP3cEm
e0oQE+OHhIGN4WSBXBAto/Gw8/0v3pXlMs+VEfeHqenwPss2sWtSwXWe8khmiy9e
DJ48sTzj75/Zx1fiqRldw9YEnrL+NK0eOOpvzxeyKpfvdUytc+chFfEqUmO6kl3Z
DW2UmlZxmW+b0SfexCHL
=Icle
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Highlights include:
- Support for direct mapped LPC on POWER9, giving Linux direct access
to devices that may be on there such as a UART.
- Memory hotplug support for the Power9 Radix MMU.
- Add new AUX vectors describing the processor's cache geometry, to
be used by glibc.
- The ability for a guest to ask the hypervisor to resize the guest's
hash table, and in addition support for doing so automatically when
memory is hotplugged into/out-of the guest. This allows the hash
table to be sized based on the current memory usage of the guest,
rather than the maximum possible memory usage.
- Implementation of optprobes (kprobe optimisation) for powerpc.
In addition there's the topic branch shared with the KVM tree, which
includes support for guests to use the Radix MMU on Power9.
Thanks to:
Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T, Anton
Blanchard, Benjamin Herrenschmidt, Chris Packham, Daniel Axtens,
Daniel Borkmann, David Gibson, Finn Thain, Gautham R. Shenoy, Gavin
Shan, Greg Kurz, Joel Stanley, John Allen, Madhavan Srinivasan,
Mahesh Salgaonkar, Markus Elfring, Michael Neuling, Nathan Fontenot,
Naveen N. Rao, Nicholas Piggin, Paul Mackerras, Ravi Bangoria, Reza
Arbab, Shailendra Singh, Vaibhav Jain, Wei Yongjun"
* tag 'powerpc-4.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (129 commits)
powerpc/mm/radix: Skip ptesync in pte update helpers
powerpc/mm/radix: Use ptep_get_and_clear_full when clearing pte for full mm
powerpc/mm/radix: Update pte update sequence for pte clear case
powerpc/mm: Update PROTFAULT handling in the page fault path
powerpc/xmon: Fix data-breakpoint
powerpc/mm: Fix build break with BOOK3S_64=n and MEMORY_HOTPLUG=y
powerpc/mm: Fix build break when CMA=n && SPAPR_TCE_IOMMU=y
powerpc/mm: Fix build break with RADIX=y & HUGETLBFS=n
powerpc/pseries: Fix typo in parameter description
powerpc/kprobes: Remove kprobe_exceptions_notify()
kprobes: Introduce weak variant of kprobe_exceptions_notify()
powerpc/ftrace: Fix confusing help text for DISABLE_MPROFILE_KERNEL
powerpc/powernv: Fix opal_exit tracepoint opcode
powerpc: Add a prototype for mcount() so it can be versioned
powerpc: Drop GPL from of_node_to_nid() export to match other arches
powerpc/kprobes: Optimize kprobe in kretprobe_trampoline()
powerpc/kprobes: Implement Optprobes
powerpc/kprobes: Fixes for kprobe_lookup_name() on BE
powerpc: Add helper to check if offset is within relative branch range
powerpc/bpf: Introduce __PPC_SH64()
...
Pull networking updates from David Miller:
"Highlights:
1) Support TX_RING in AF_PACKET TPACKET_V3 mode, from Sowmini
Varadhan.
2) Simplify classifier state on sk_buff in order to shrink it a bit.
From Willem de Bruijn.
3) Introduce SIPHASH and it's usage for secure sequence numbers and
syncookies. From Jason A. Donenfeld.
4) Reduce CPU usage for ICMP replies we are going to limit or
suppress, from Jesper Dangaard Brouer.
5) Introduce Shared Memory Communications socket layer, from Ursula
Braun.
6) Add RACK loss detection and allow it to actually trigger fast
recovery instead of just assisting after other algorithms have
triggered it. From Yuchung Cheng.
7) Add xmit_more and BQL support to mvneta driver, from Simon Guinot.
8) skb_cow_data avoidance in esp4 and esp6, from Steffen Klassert.
9) Export MPLS packet stats via netlink, from Robert Shearman.
10) Significantly improve inet port bind conflict handling, especially
when an application is restarted and changes it's setting of
reuseport. From Josef Bacik.
11) Implement TX batching in vhost_net, from Jason Wang.
12) Extend the dummy device so that VF (virtual function) features,
such as configuration, can be more easily tested. From Phil
Sutter.
13) Avoid two atomic ops per page on x86 in bnx2x driver, from Eric
Dumazet.
14) Add new bpf MAP, implementing a longest prefix match trie. From
Daniel Mack.
15) Packet sample offloading support in mlxsw driver, from Yotam Gigi.
16) Add new aquantia driver, from David VomLehn.
17) Add bpf tracepoints, from Daniel Borkmann.
18) Add support for port mirroring to b53 and bcm_sf2 drivers, from
Florian Fainelli.
19) Remove custom busy polling in many drivers, it is done in the core
networking since 4.5 times. From Eric Dumazet.
20) Support XDP adjust_head in virtio_net, from John Fastabend.
21) Fix several major holes in neighbour entry confirmation, from
Julian Anastasov.
22) Add XDP support to bnxt_en driver, from Michael Chan.
23) VXLAN offloads for enic driver, from Govindarajulu Varadarajan.
24) Add IPVTAP driver (IP-VLAN based tap driver) from Sainath Grandhi.
25) Support GRO in IPSEC protocols, from Steffen Klassert"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1764 commits)
Revert "ath10k: Search SMBIOS for OEM board file extension"
net: socket: fix recvmmsg not returning error from sock_error
bnxt_en: use eth_hw_addr_random()
bpf: fix unlocking of jited image when module ronx not set
arch: add ARCH_HAS_SET_MEMORY config
net: napi_watchdog() can use napi_schedule_irqoff()
tcp: Revert "tcp: tcp_probe: use spin_lock_bh()"
net/hsr: use eth_hw_addr_random()
net: mvpp2: enable building on 64-bit platforms
net: mvpp2: switch to build_skb() in the RX path
net: mvpp2: simplify MVPP2_PRS_RI_* definitions
net: mvpp2: fix indentation of MVPP2_EXT_GLOBAL_CTRL_DEFAULT
net: mvpp2: remove unused register definitions
net: mvpp2: simplify mvpp2_bm_bufs_add()
net: mvpp2: drop useless fields in mvpp2_bm_pool and related code
net: mvpp2: remove unused 'tx_skb' field of 'struct mvpp2_tx_queue'
net: mvpp2: release reference to txq_cpu[] entry after unmapping
net: mvpp2: handle too large value in mvpp2_rx_time_coal_set()
net: mvpp2: handle too large value handling in mvpp2_rx_pkts_coal_set()
net: mvpp2: remove useless arguments in mvpp2_rx_{pkts, time}_coal_set
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABAgAGBQJYrElpAAoJELDendYovxMvNFQH/RJU7lwDSf7rF7ZzFGvdcsfi
T4DDuowkYoJm2+GypoRVzZZ0lxJlxr0mNKPvgGDvuTogMY7pvjAf6B7/xCvTFsNU
UoO2I7ljgXxCXFRiXH50nAjS7PC2PFW3Qx+8XPIWeZmnUPeJi4Q43fiSloUt+a6l
JgS/autOCflGasR5MihCZXkvdVF81K6GuEd3hCh9GKZ/8RiwNPaY50vHnMv/hfqq
SJNRKOTSRXioYlTohLnjuPWHDMayRJEO48IXl3c7aNxDTkHjn78yaoPhJ7m+M0Bq
s2GyaQA4tCwADhP+tilmI5H1vFpt6w9x7O0dgWiSm7TB91lwfZOen2WhTPPp6L8=
=gktV
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.11-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
"Xen features and fixes:
- a series from Boris Ostrovsky adding support for booting Linux as
Xen PVH guest
- a series from Juergen Gross streamlining the xenbus driver
- a series from Paul Durrant adding support for the new device model
hypercall
- several small corrections"
* tag 'for-linus-4.11-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/privcmd: add IOCTL_PRIVCMD_RESTRICT
xen/privcmd: Add IOCTL_PRIVCMD_DM_OP
xen/privcmd: return -ENOTTY for unimplemented IOCTLs
xen: optimize xenbus driver for multiple concurrent xenstore accesses
xen: modify xenstore watch event interface
xen: clean up xenbus internal headers
xenbus: Neaten xenbus_va_dev_error
xen/pvh: Use Xen's emergency_restart op for PVH guests
xen/pvh: Enable CPU hotplug
xen/pvh: PVH guests always have PV devices
xen/pvh: Initialize grant table for PVH guests
xen/pvh: Make sure we don't use ACPI_IRQ_MODEL_PIC for SCI
xen/pvh: Bootstrap PVH guest
xen/pvh: Import PVH-related Xen public interfaces
xen/x86: Remove PVH support
x86/boot/32: Convert the 32-bit pgtable setup code from assembly to C
xen/manage: correct return value check on xenbus_scanf()
x86/xen: Fix APIC id mismatch warning on Intel
xen/netback: set default upper limit of tx/rx queues to 8
xen/netfront: set default upper limit of tx/rx queues to 8
Pull audit updates from Paul Moore:
"The audit changes for v4.11 are relatively small compared to what we
did for v4.10, both in terms of size and impact.
- two patches from Steve tweak the formatting for some of the audit
records to make them more consistent with other audit records.
- three patches from Richard record the name of a module on module
load, fix the logging of sockaddr information when using
socketcall() on 32-bit systems, and add the ability to reset
audit's lost record counter.
- my lone patch just fixes an annoying style nit that I was reminded
about by one of Richard's patches.
All these patches pass our test suite"
* 'stable-4.11' of git://git.infradead.org/users/pcmoore/audit:
audit: remove unnecessary curly braces from switch/case statements
audit: log module name on init_module
audit: log 32-bit socketcalls
audit: add feature audit_lost reset
audit: Make AUDIT_ANOM_ABEND event normalized
audit: Make AUDIT_KERNEL event conform to the specification
This update includes the usual round of major driver updates (ncr5380,
ufs, lpfc, be2iscsi, hisi_sas, storvsc, cxlflash, aacraid,
megaraid_sas, ). There's also an assortment of minor fixes and the
major update of switching a bunch of drivers to pci_alloc_irq_vectors
from Christoph.
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJYq5adAAoJEAVr7HOZEZN4bjUP/Atk7CSZVnC75pcYmncbEGCx
ysOlEHK4uW2HhiAYk3PlYMk+pKrMHet2zsbbM9PHJfopdOHZ7Sq1+UZZVeqE1Zun
8pe0NhON+fZx7XAnevdEvnSSULQZ+AGfjZO72iUwkJiN3ozYaFtCITOyn49l4GpR
ra9emskBh7CQOFW2voGn1AKeDijPYGx3+TO4AUrWjVMiByR06gb1bmImx+ljiUrs
jzRJPfrt90ORcTdpMateyN2EXxudcASMhX03SJ6fRI84hPAhMCROMbTv8RnzOTE4
DPbnvbYUowlHt43iUhJHSwGdkRRaRBnkzQENBp1fNrNzZgF6vB7+kShxbonrYB2p
gC4ewaJr0BNj+HsUnvTpe3WseiPOcfsnBsKilPLKBlm2dCKEXqFox/dj/T1uexxg
HoyFrl3u8fyEqVHrzRS4M9t/njWh0NFmXxb0wBdj+lkVFTRErGSKQ8SfOqshuSGs
P8NN88jy8vC7uqgzKBJ+UH3ehzn3qfBxasFHIC/e2awY9FqKjHGTxKMmSVpjXVxy
wCvE2FQ3k/qEj2XSM6f7/NGytlSOlju5q1rFtHPW2M+TFSh0LJWCnmVjR/Zle9em
pBWmtIgCv8W5b41zL2H94nLWAZbfdrrNU/XnX88l47LKnmorte/PGhpxu36NEsMS
VCgreQmFMdMRY+WzDWl1
=cBQx
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This update includes the usual round of major driver updates (ncr5380,
ufs, lpfc, be2iscsi, hisi_sas, storvsc, cxlflash, aacraid,
megaraid_sas, ...).
There's also an assortment of minor fixes and the major update of
switching a bunch of drivers to pci_alloc_irq_vectors from Christoph"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (188 commits)
scsi: megaraid_sas: handle dma_addr_t right on 32-bit
scsi: megaraid_sas: array overflow in megasas_dump_frame()
scsi: snic: switch to pci_irq_alloc_vectors
scsi: megaraid_sas: driver version upgrade
scsi: megaraid_sas: Change RAID_1_10_RMW_CMDS to RAID_1_PEER_CMDS and set value to 2
scsi: megaraid_sas: Indentation and smatch warning fixes
scsi: megaraid_sas: Cleanup VD_EXT_DEBUG and SPAN_DEBUG related debug prints
scsi: megaraid_sas: Increase internal command pool
scsi: megaraid_sas: Use synchronize_irq to wait for IRQs to complete
scsi: megaraid_sas: Bail out the driver load if ld_list_query fails
scsi: megaraid_sas: Change build_mpt_mfi_pass_thru to return void
scsi: megaraid_sas: During OCR, if get_ctrl_info fails do not continue with OCR
scsi: megaraid_sas: Do not set fp_possible if TM capable for non-RW syspdIO, change fp_possible to bool
scsi: megaraid_sas: Remove unused pd_index from megasas_build_ld_nonrw_fusion
scsi: megaraid_sas: megasas_return_cmd does not memset IO frame to zero
scsi: megaraid_sas: max_fw_cmds are decremented twice, remove duplicate
scsi: megaraid_sas: update can_queue only if the new value is less
scsi: megaraid_sas: Change max_cmd from u32 to u16 in all functions
scsi: megaraid_sas: set pd_after_lb from MR_BuildRaidContext and initialize pDevHandle to MR_DEVHANDLE_INVALID
scsi: megaraid_sas: latest controller OCR capability from FW before sending shutdown DCMD
...
Should be - 1 as in other _MAX definitions.
Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add netconf support to MPLS. Allows userpsace to learn and be notified
of changes to 'input' enable setting per interface.
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to add Stream Reset Event described in rfc6525
section 6.1.1.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On the kernel side, sockaddr_storage is #define'd to
__kernel_sockaddr_storage. Replacing struct sockaddr_storage with
struct __kernel_sockaddr_storage defined by <linux/socket.h> fixes
the following linux/rds.h userspace compilation error:
/usr/include/linux/rds.h:226:26: error: field 'dest_addr' has incomplete type
struct sockaddr_storage dest_addr;
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Consistently use types from linux/types.h to fix the following
linux/rds.h userspace compilation errors:
/usr/include/linux/rds.h:106:2: error: unknown type name 'uint8_t'
uint8_t name[32];
/usr/include/linux/rds.h:107:2: error: unknown type name 'uint64_t'
uint64_t value;
/usr/include/linux/rds.h:117:2: error: unknown type name 'uint64_t'
uint64_t next_tx_seq;
/usr/include/linux/rds.h:118:2: error: unknown type name 'uint64_t'
uint64_t next_rx_seq;
/usr/include/linux/rds.h:121:2: error: unknown type name 'uint8_t'
uint8_t transport[TRANSNAMSIZ]; /* null term ascii */
/usr/include/linux/rds.h:122:2: error: unknown type name 'uint8_t'
uint8_t flags;
/usr/include/linux/rds.h:129:2: error: unknown type name 'uint64_t'
uint64_t seq;
/usr/include/linux/rds.h:130:2: error: unknown type name 'uint32_t'
uint32_t len;
/usr/include/linux/rds.h:135:2: error: unknown type name 'uint8_t'
uint8_t flags;
/usr/include/linux/rds.h:139:2: error: unknown type name 'uint32_t'
uint32_t sndbuf;
/usr/include/linux/rds.h:144:2: error: unknown type name 'uint32_t'
uint32_t rcvbuf;
/usr/include/linux/rds.h:145:2: error: unknown type name 'uint64_t'
uint64_t inum;
/usr/include/linux/rds.h:153:2: error: unknown type name 'uint64_t'
uint64_t hdr_rem;
/usr/include/linux/rds.h:154:2: error: unknown type name 'uint64_t'
uint64_t data_rem;
/usr/include/linux/rds.h:155:2: error: unknown type name 'uint32_t'
uint32_t last_sent_nxt;
/usr/include/linux/rds.h:156:2: error: unknown type name 'uint32_t'
uint32_t last_expected_una;
/usr/include/linux/rds.h:157:2: error: unknown type name 'uint32_t'
uint32_t last_seen_una;
/usr/include/linux/rds.h:164:2: error: unknown type name 'uint8_t'
uint8_t src_gid[RDS_IB_GID_LEN];
/usr/include/linux/rds.h:165:2: error: unknown type name 'uint8_t'
uint8_t dst_gid[RDS_IB_GID_LEN];
/usr/include/linux/rds.h:167:2: error: unknown type name 'uint32_t'
uint32_t max_send_wr;
/usr/include/linux/rds.h:168:2: error: unknown type name 'uint32_t'
uint32_t max_recv_wr;
/usr/include/linux/rds.h:169:2: error: unknown type name 'uint32_t'
uint32_t max_send_sge;
/usr/include/linux/rds.h:170:2: error: unknown type name 'uint32_t'
uint32_t rdma_mr_max;
/usr/include/linux/rds.h:171:2: error: unknown type name 'uint32_t'
uint32_t rdma_mr_size;
/usr/include/linux/rds.h:212:9: error: unknown type name 'uint64_t'
typedef uint64_t rds_rdma_cookie_t;
/usr/include/linux/rds.h:215:2: error: unknown type name 'uint64_t'
uint64_t addr;
/usr/include/linux/rds.h:216:2: error: unknown type name 'uint64_t'
uint64_t bytes;
/usr/include/linux/rds.h:221:2: error: unknown type name 'uint64_t'
uint64_t cookie_addr;
/usr/include/linux/rds.h:222:2: error: unknown type name 'uint64_t'
uint64_t flags;
/usr/include/linux/rds.h:228:2: error: unknown type name 'uint64_t'
uint64_t cookie_addr;
/usr/include/linux/rds.h:229:2: error: unknown type name 'uint64_t'
uint64_t flags;
/usr/include/linux/rds.h:234:2: error: unknown type name 'uint64_t'
uint64_t flags;
/usr/include/linux/rds.h:240:2: error: unknown type name 'uint64_t'
uint64_t local_vec_addr;
/usr/include/linux/rds.h:241:2: error: unknown type name 'uint64_t'
uint64_t nr_local;
/usr/include/linux/rds.h:242:2: error: unknown type name 'uint64_t'
uint64_t flags;
/usr/include/linux/rds.h:243:2: error: unknown type name 'uint64_t'
uint64_t user_token;
/usr/include/linux/rds.h:248:2: error: unknown type name 'uint64_t'
uint64_t local_addr;
/usr/include/linux/rds.h:249:2: error: unknown type name 'uint64_t'
uint64_t remote_addr;
/usr/include/linux/rds.h:252:4: error: unknown type name 'uint64_t'
uint64_t compare;
/usr/include/linux/rds.h:253:4: error: unknown type name 'uint64_t'
uint64_t swap;
/usr/include/linux/rds.h:256:4: error: unknown type name 'uint64_t'
uint64_t add;
/usr/include/linux/rds.h:259:4: error: unknown type name 'uint64_t'
uint64_t compare;
/usr/include/linux/rds.h:260:4: error: unknown type name 'uint64_t'
uint64_t swap;
/usr/include/linux/rds.h:261:4: error: unknown type name 'uint64_t'
uint64_t compare_mask;
/usr/include/linux/rds.h:262:4: error: unknown type name 'uint64_t'
uint64_t swap_mask;
/usr/include/linux/rds.h:265:4: error: unknown type name 'uint64_t'
uint64_t add;
/usr/include/linux/rds.h:266:4: error: unknown type name 'uint64_t'
uint64_t nocarry_mask;
/usr/include/linux/rds.h:269:2: error: unknown type name 'uint64_t'
uint64_t flags;
/usr/include/linux/rds.h:270:2: error: unknown type name 'uint64_t'
uint64_t user_token;
/usr/include/linux/rds.h:274:2: error: unknown type name 'uint64_t'
uint64_t user_token;
/usr/include/linux/rds.h:275:2: error: unknown type name 'int32_t'
int32_t status;
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Include <linux/in.h> to fix the following linux/mroute.h userspace
compilation errors:
/usr/include/linux/mroute.h:58:18: error: field 'vifc_lcl_addr' has incomplete type
struct in_addr vifc_lcl_addr; /* Local interface address */
/usr/include/linux/mroute.h:61:17: error: field 'vifc_rmt_addr' has incomplete type
struct in_addr vifc_rmt_addr; /* IPIP tunnel addr */
/usr/include/linux/mroute.h:72:17: error: field 'mfcc_origin' has incomplete type
struct in_addr mfcc_origin; /* Origin of mcast */
/usr/include/linux/mroute.h:73:17: error: field 'mfcc_mcastgrp' has incomplete type
struct in_addr mfcc_mcastgrp; /* Group in question */
/usr/include/linux/mroute.h:84:17: error: field 'src' has incomplete type
struct in_addr src;
/usr/include/linux/mroute.h:85:17: error: field 'grp' has incomplete type
struct in_addr grp;
/usr/include/linux/mroute.h:109:17: error: field 'im_src' has incomplete type
struct in_addr im_src,im_dst;
/usr/include/linux/mroute.h:109:24: error: field 'im_dst' has incomplete type
struct in_addr im_src,im_dst;
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Include <linux/in6.h> to fix the following linux/mroute6.h userspace
compilation errors:
/usr/include/linux/mroute6.h:80:22: error: field 'mf6cc_origin' has incomplete type
struct sockaddr_in6 mf6cc_origin; /* Origin of mcast */
/usr/include/linux/mroute6.h:81:22: error: field 'mf6cc_mcastgrp' has incomplete type
struct sockaddr_in6 mf6cc_mcastgrp; /* Group in question */
/usr/include/linux/mroute6.h:91:22: error: field 'src' has incomplete type
struct sockaddr_in6 src;
/usr/include/linux/mroute6.h:92:22: error: field 'grp' has incomplete type
struct sockaddr_in6 grp;
/usr/include/linux/mroute6.h:132:18: error: field 'im6_src' has incomplete type
struct in6_addr im6_src, im6_dst;
/usr/include/linux/mroute6.h:132:27: error: field 'im6_dst' has incomplete type
struct in6_addr im6_src, im6_dst;
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Include <linux/in6.h> to fix the following linux/ipv6_route.h userspace
compilation errors:
/usr/include/linux/ipv6_route.h:42:19: error: field 'rtmsg_dst' has incomplete type
struct in6_addr rtmsg_dst;
/usr/include/linux/ipv6_route.h:43:19: error: field 'rtmsg_src' has incomplete type
struct in6_addr rtmsg_src;
/ust/include/linux/ipv6_route.h:44:19: error: field 'rtmsg_gateway' has incomplete type
struct in6_addr rtmsg_gateway;
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Consistently use types from linux/types.h to fix the following
linux/target_core_user.h userspace compilation errors:
/usr/include/linux/target_core_user.h:108:4: error: unknown type name 'uint32_t'
uint32_t iov_cnt;
/usr/include/linux/target_core_user.h:109:4: error: unknown type name 'uint32_t'
uint32_t iov_bidi_cnt;
/usr/include/linux/target_core_user.h:110:4: error: unknown type name 'uint32_t'
uint32_t iov_dif_cnt;
/usr/include/linux/target_core_user.h:111:4: error: unknown type name 'uint64_t'
uint64_t cdb_off;
/usr/include/linux/target_core_user.h:112:4: error: unknown type name 'uint64_t'
uint64_t __pad1;
/usr/include/linux/target_core_user.h:113:4: error: unknown type name 'uint64_t'
uint64_t __pad2;
/usr/include/linux/target_core_user.h:117:4: error: unknown type name 'uint8_t'
uint8_t scsi_status;
/usr/include/linux/target_core_user.h:118:4: error: unknown type name 'uint8_t'
uint8_t __pad1;
/usr/include/linux/target_core_user.h:119:4: error: unknown type name 'uint16_t'
uint16_t __pad2;
/usr/include/linux/target_core_user.h:120:4: error: unknown type name 'uint32_t'
uint32_t __pad3;
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Currently there is no way of querying whether a filter is
offloaded to HW or not when using "both" policy (where none
of skip_sw or skip_hw flags are set by user-space).
Add two new flags, "in hw" and "not in hw" such that user
space can determine if a filter is actually offloaded to
hw or not. The "in hw" UAPI semantics was chosen so it's
similar to the "skip hw" flag logic.
If none of these two flags are set, this signals running
over older kernel.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Amir Vadai <amir@vadai.me>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The purpose of the KVM_SET_SIGNAL_MASK API is to let userspace "kick"
a VCPU out of KVM_RUN through a POSIX signal. A signal is attached
to a dummy signal handler; by blocking the signal outside KVM_RUN and
unblocking it inside, this possible race is closed:
VCPU thread service thread
--------------------------------------------------------------
check flag
set flag
raise signal
(signal handler does nothing)
KVM_RUN
However, one issue with KVM_SET_SIGNAL_MASK is that it has to take
tsk->sighand->siglock on every KVM_RUN. This lock is often on a
remote NUMA node, because it is on the node of a thread's creator.
Taking this lock can be very expensive if there are many userspace
exits (as is the case for SMP Windows VMs without Hyper-V reference
time counter).
As an alternative, we can put the flag directly in kvm_run so that
KVM can see it:
VCPU thread service thread
--------------------------------------------------------------
raise signal
signal handler
set run->immediate_exit
KVM_RUN
check run->immediate_exit
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull networking fixes from David Miller:
1) In order to avoid problems in the future, make cgroup bpf overriding
explicit using BPF_F_ALLOW_OVERRIDE. From Alexei Staovoitov.
2) LLC sets skb->sk without proper skb->destructor and this explodes,
fix from Eric Dumazet.
3) Make sure when we have an ipv4 mapped source address, the
destination is either also an ipv4 mapped address or
ipv6_addr_any(). Fix from Jonathan T. Leighton.
4) Avoid packet loss in fec driver by programming the multicast filter
more intelligently. From Rui Sousa.
5) Handle multiple threads invoking fanout_add(), fix from Eric
Dumazet.
6) Since we can invoke the TCP input path in process context, without
BH being disabled, we have to accomodate that in the locking of the
TCP probe. Also from Eric Dumazet.
7) Fix erroneous emission of NETEVENT_DELAY_PROBE_TIME_UPDATE when we
aren't even updating that sysctl value. From Marcus Huewe.
8) Fix endian bugs in ibmvnic driver, from Thomas Falcon.
[ This is the second version of the pull that reverts the nested
rhashtable changes that looked a bit too scary for this late in the
release - Linus ]
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
rhashtable: Revert nested table changes.
ibmvnic: Fix endian errors in error reporting output
ibmvnic: Fix endian error when requesting device capabilities
net: neigh: Fix netevent NETEVENT_DELAY_PROBE_TIME_UPDATE notification
net: xilinx_emaclite: fix freezes due to unordered I/O
net: xilinx_emaclite: fix receive buffer overflow
bpf: kernel header files need to be copied into the tools directory
tcp: tcp_probe: use spin_lock_bh()
uapi: fix linux/if_pppol2tp.h userspace compilation errors
packet: fix races in fanout_add()
ibmvnic: Fix initial MTU settings
net: ethernet: ti: cpsw: fix cpsw assignment in resume
kcm: fix a null pointer dereference in kcm_sendmsg()
net: fec: fix multicast filtering hardware setup
ipv6: Handle IPv4-mapped src to in6addr_any dst.
ipv6: Inhibit IPv4-mapped src address on the wire.
net/mlx5e: Disable preemption when doing TC statistics upcall
rhashtable: Add nested tables
tipc: Fix tipc_sk_reinit race conditions
gfs2: Use rhashtable walk interface in glock_hash_walk
...
Because of <linux/libc-compat.h> interface limitations, <netinet/in.h>
provided by libc cannot be included after <linux/in.h>, therefore any
header that includes <netinet/in.h> cannot be included after <linux/in.h>.
Change uapi/linux/l2tp.h, the last uapi header that includes
<netinet/in.h>, to include <linux/in.h> and <linux/in6.h> instead of
<netinet/in.h> and use __SOCK_SIZE__ instead of sizeof(struct sockaddr)
the same way as uapi/linux/in.h does, to fix linux/if_pppol2tp.h userspace
compilation errors like this:
In file included from /usr/include/linux/l2tp.h:12:0,
from /usr/include/linux/if_pppol2tp.h:21,
/usr/include/netinet/in.h:31:8: error: redefinition of 'struct in_addr'
Fixes: 47c3e7783b ("net: l2tp: deprecate PPPOL2TP_MSG_* in favour of L2TP_MSG_*")
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The IOC_OPAL_ACTIVATE_LSP took the wrong strcure which would
give us the wrong size when using _IOC_SIZE, switch it to the
right structure.
Fixes: 058f8a2 ("Include: Uapi: Add user ABI for Sed/Opal")
Signed-off-by: Scott Bauer <scott.bauer@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
As part of c11e391da2 "dma-buf: Add ioctls to
allow userspace to flush" a new uapi header file dma-buf.h was added, but an
entry was not added on Kbuild to install it. This patch resolves this omission
so that "make headers_install" installs this header.
Signed-off-by: Denys Dmytriyenko <denys@ti.com>
Reviewed-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: David Herrmann <dh.herrmann@gmail.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Tiago Vignatti <tiago.vignatti@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1487102447-59265-1-git-send-email-denis@denix.org
The purpose if this ioctl is to allow a user of privcmd to restrict its
operation such that it will no longer service arbitrary hypercalls via
IOCTL_PRIVCMD_HYPERCALL, and will check for a matching domid when
servicing IOCTL_PRIVCMD_DM_OP or IOCTL_PRIVCMD_MMAP*. The aim of this
is to limit the attack surface for a compromised device model.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Recently a new dm_op[1] hypercall was added to Xen to provide a mechanism
for restricting device emulators (such as QEMU) to a limited set of
hypervisor operations, and being able to audit those operations in the
kernel of the domain in which they run.
This patch adds IOCTL_PRIVCMD_DM_OP as gateway for __HYPERVISOR_dm_op.
NOTE: There is no requirement for user-space code to bounce data through
locked memory buffers (as with IOCTL_PRIVCMD_HYPERCALL) since
privcmd has enough information to lock the original buffers
directly.
[1] http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=524a98c2
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Enable user space application via WQ creation and modification to
turn on and off cvlan offload.
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Expose raw packet capabilities to user space as part of query device.
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The struct ib_uverbs_flow_spec_action_tag associates a tag_id with the
flow defined by any number of other flow_spec entries which can reference
L2, L3, and L4 packet contents.
Use of ib_uverbs_flow_spec_action_tag allows the consumer to identify
the set of rules which where matched by
the packet by examining the tag_id in the CQE.
Signed-off-by: Moses Reuben <mosesr@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch introduces the RoCE driver for the Broadcom
NetXtreme-E 10/25/40/50G RoCE HCAs.
The RoCE driver is a two part driver that relies on the parent
bnxt_en NIC driver to operate. The changes needed in the bnxt_en
driver have already been incorporated via Dave Miller's net tree
into the mainline kernel.
The vendor official git repository for this driver is available
on github as:
https://github.com/Broadcom/linux-rdma-nxt/
Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYosRpAAoJEAhfPr2O5OEVAU4P/RB/A9v422J1aFixQ1fPp89P
xRp6m6xj2ln/r/ydl5j3LgSzA6nCSQT4p1jRalFRdpk/FyS4v6wE4RhaIDGW/Q1P
WDwRfcyrUdWIPZR6T0289m8eTHG0w4ewbEHPm5iG5UZQHZsObEpmNJD6mOSm00N4
0JhAmJIccWxIjObIajZizQ9BEUUE+T1PQgDf7QiTHFb3d1UrNXYu/cka8Ys9lafu
kR//BPRxLsCXWWHf45ZHgYH+V214haGcVhtlf1ehALlZQ3wNYZaC7XvH8G795ykR
ayrc77qLj2NROhTG4rvljyOH4L0I1IH0k2bC633FrPpgzOCCqNLBMLwn5YhFN3Z+
QNO2ChxDmZlWcqbepkOmK6IK2HmIulM4AOK0gQa+J4inYM3w2aZD3egaFFsx2I0+
1y7KSKAULNjEZanx80t3rxGw+bnIh80eD3n2ODgAU0spqEsm5cb3C3JIN3qeS8KQ
j6vVawB2+gTRCRIHWDBHnYUzkn2SPINp1bRajuxryH1u5TKgvs02NQB84i9mdNR+
iRpH7Yn8BIuA2w0sGGjbyBYyb6IVFrdx9yS7xsr4CbKYR8tWvEsqRxlN7V16J30M
crxj2OM+yAIXLTSB1bW892WL51JqJRGF3lFnUGS4R4PH9tMvR4YN/PT8yv0wZJC/
SfwwJ38/yZBkZuRSNEpt
=OTHY
-----END PGP SIGNATURE-----
Merge tag 'media/v4.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"A colorspace regression fix in V4L2 core and a CEC core bug that makes
it discard valid messages"
* tag 'media/v4.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] cec: initiator should be the same as the destination for, poll
[media] videodev2.h: go back to limited range Y'CbCr for SRGB and, ADOBERGB
This reverts 'commit 7e0739cd9c ("[media] videodev2.h: fix
sYCC/AdobeYCC default quantization range").
The problem is that many drivers can convert R'G'B' content (often
from sensors) to Y'CbCr, but they all produce limited range Y'CbCr.
To stay backwards compatible the default quantization range for
sRGB and AdobeRGB Y'CbCr encoding should be limited range, not full
range, even though the corresponding standards specify full range.
Update the V4L2_MAP_QUANTIZATION_DEFAULT define accordingly and
also update the documentation.
Fixes: 7e0739cd9c ("[media] videodev2.h: fix sYCC/AdobeYCC default quantization range")
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Cc: <stable@vger.kernel.org> # for v4.9 and up
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for your net-next
tree, most relevantly they are:
1) Extend nft_exthdr to allow to match TCP options bitfields, from
Manuel Messner.
2) Allow to check if IPv6 extension header is present in nf_tables,
from Phil Sutter.
3) Allow to set and match conntrack zone in nf_tables, patches from
Florian Westphal.
4) Several patches for the nf_tables set infrastructure, this includes
cleanup and preparatory patches to add the new bitmap set type.
5) Add optional ruleset generation ID check to nf_tables and allow to
delete rules that got no public handle yet via NFTA_RULE_ID. These
patches add the missing kernel infrastructure to support rule
deletion by description from userspace.
6) Missing NFT_SET_OBJECT flag to select the right backend when sets
stores an object map.
7) A couple of cleanups for the expectation and SIP helper, from Gao
feng.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If BPF_F_ALLOW_OVERRIDE flag is used in BPF_PROG_ATTACH command
to the given cgroup the descendent cgroup will be able to override
effective bpf program that was inherited from this cgroup.
By default it's not passed, therefore override is disallowed.
Examples:
1.
prog X attached to /A with default
prog Y fails to attach to /A/B and /A/B/C
Everything under /A runs prog X
2.
prog X attached to /A with allow_override.
prog Y fails to attach to /A/B with default (non-override)
prog M attached to /A/B with allow_override.
Everything under /A/B runs prog M only.
3.
prog X attached to /A with allow_override.
prog Y fails to attach to /A with default.
The user has to detach first to switch the mode.
In the future this behavior may be extended with a chain of
non-overridable programs.
Also fix the bug where detach from cgroup where nothing is attached
was not throwing error. Return ENOENT in such case.
Add several testcases and adjust libbpf.
Fixes: 3007098494 ("cgroup: add support for eBPF programs")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This new attribute allows us to uniquely identify a rule in transaction.
Robots may trigger an insertion followed by deletion in a batch, in that
scenario we still don't have a public rule handle that we can use to
delete the rule. This is similar to the NFTA_SET_ID attribute that
allows us to refer to an anonymous set from a batch.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch allows userspace to specify the generation ID that has been
used to build an incremental batch update.
If userspace specifies the generation ID in the batch message as
attribute, then nfnetlink compares it to the current generation ID so
you make sure that you work against the right baseline. Otherwise, bail
out with ERESTART so userspace knows that its changeset is stale and
needs to respin. Userspace can do this transparently at the cost of
taking slightly more time to refresh caches and rework the changeset.
This check is optional, if there is no NFNL_BATCH_GENID attribute in the
batch begin message, then no check is performed.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
- Two security related issues in the rxe driver
- One compile issue in the RDMA uapi header
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYm1pNAAoJELgmozMOVy/dszkQAKJN8JedNNuKLyF2BbjXR7Dv
KsxVuXmdVpF8FPxrPlVUA76o26yEWfxVtJwEHi4hEVLmTXKHcd8EkL2o7geR7hTB
+j2J6HuH7e4y6ATX9H2o78fg1SRIWZJij4JdXilZRQ3pKj5DcmYynklBqqpMJ8Su
c2ceE5nbJdpbIhPD3yulmGVF/89zntaBVbh07D1O6rdbaSNwuc+wv0XdfJ+KUXju
ZfylACJBnMsIjxyuqZPV4djs91CI/tbgArZmh5tvgF+V9Gx6Vocbv90kS+BtbKH0
srX9MyBrSycY8eELhbFAg7XBJXsNk4Rk0yMuMEhF2IWjwzaa7plIgeCv7NA1NWqq
EKW0lzC0e7VV5ttjKHqe6iO+8JIC9QEi/36IqTgBBNqw0Cphocazq/mVY5fAu1uo
qWdxbeYz3owWu47NNJ15TvaEkMbMX8ACEu4KhaT0FA+jit4czJ3PeyCLqe8aD5Pa
AK4e2Lnj+CZb2aJN2Knh4Wu6tK9M2P0vuzHElrf0D3qe37HxaRQuZWLC9kOKplWZ
DGrYoM94EaeTZScZ4Lo7BSol7yuYXXFkE42/TarIZuT67GNM+qss9HRDtWzDnSuD
zX4EdJ/0kjX3SU2Em4g+7MelA4TMX74sLlEmU6iSUEWm1/pX+X9SYwT0iEmy2tR+
SXEti5uW/K/P7e2RmzQO
=zGBQ
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma fixes from Doug Ledford:
"Third round of -rc fixes for 4.10 kernel:
- two security related issues in the rxe driver
- one compile issue in the RDMA uapi header"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
RDMA: Don't reference kernel private header from UAPI header
IB/rxe: Fix mem_check_range integer overflow
IB/rxe: Fix resid update
Per PCIe r3.1, sec 6.2.10 and sec 7.13.4, on Root Ports that support "RP
Extensions for DPC",
When the DPC Trigger Status bit is Set and the DPC RP Busy bit is Set,
software must leave the Root Port in DPC until the DPC RP Busy bit reads
0b.
Wait up to 1 second for the Root Port to become non-busy.
[bhelgaas: changelog, spec references]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The eswitch_[gs]et command is supposed to be similar to port_[gs]et
command - for multiple eswitch attributes. However, when it was introduced
by 08f4b5918b ("net/devlink: Add E-Switch mode control") it was wrongly
named with the word "mode" in it. So fix this now, make the oririnal
enum value existing but obsolete.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* use shash in mac80211 crypto code where applicable
* some documentation fixes
* pass RSSI levels up in change notifications
* remove unused rfkill-regulator
* various other cleanups
-----BEGIN PGP SIGNATURE-----
iQIcBAABCgAGBQJYnHuZAAoJEGt7eEactAAdorYP/iIfEZjLblLZNui+93OHpmsq
QgXeQ9Vl/Xgq8g6vEb6jpHE6pvT/xoW/jt0s5rpYpPDzP7WRgkFQI144Jflx9gE+
7KRHYc7k9XqugbFGFakjT+DyduxrGkEhZ7Vpd39D+zlPaf+p/31Jk6C4MNwk2oG3
AA7ARUJfOKGpct3+l+cpnWQPUYQQKrSnjnMIwHL3Tu5pvGPDapdDWXbfn6T75Aof
3oVQSNtbEtdzW/Ty+HgDn/boOCRXzXlOCxFlE7OiH2AXfe2mqGU/hkcm/wZ0QxOf
9m3CxVjKH10tY6nsjDiXTOzFn+Zkzuum9/gcKtsgx+6BZ4qod2XuhtzmTeXggI8F
b7nGeadSfSS6/unUu5Fibdn+X4Cw8+Yu5Qyiwo+jLL6yyTLUg6aKN6N9HWddhBfa
pIjWAZVA1iB2m2XqUqRB/asEhslndSSfDmZK8nruYJSZWtQBNkNUdHXqqbqbBSHv
KtKbHMOKoLU4zKmV2vMWGy0qypZCxtZkNF6GURhMh2m89qBcIAApFwQMzK09MwmP
d5TcMwfi8YcKRb1Gw6n7gnJsC8e+tFDGXMi9w6z0FDGZvMbOlnO3ctSd/BY3B01H
DWimkX8Ev3kt6KKTgbJ/n0lR/vEmDGdKo2ahH1uHOTPudrjpOHUg0cdDzS/VPZqi
Qw4+FbVArISNrI2skTrA
=sqkq
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-davem-2017-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
Some more updates:
* use shash in mac80211 crypto code where applicable
* some documentation fixes
* pass RSSI levels up in change notifications
* remove unused rfkill-regulator
* various other cleanups
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This command could be useful to inc/dec fields.
For example, to forward any TCP packet and decrease its TTL:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
flower ip_proto tcp \
action pedit munge ip ttl add 0xff pipe \
action mirred egress redirect dev veth0
In the example above, adding 0xff to this u8 field is actually
decreasing it by one, since the operation is masked.
Signed-off-by: Amir Vadai <amir@vadai.me>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend pedit to enable the user setting offset relative to network
headers. This change would enable to work with more complex header
schemes (vs the simple IPv4 case) where setting a fixed offset relative
to the network header is not enough.
After this patch, the action has information about the exact header type
and field inside this header. This information could be used later on
for hardware offloading of pedit.
Backward compatibility was being kept:
1. Old kernel <-> new userspace
2. New kernel <-> old userspace
3. add rule using new userspace <-> dump using old userspace
4. add rule using old userspace <-> dump using new userspace
When using the extended api, new netlink attributes are being used. This
way, operation will fail in (1) and (3) - and no malformed rule be added
or dumped. Of course, new user space that doesn't need the new
functionality can use the old netlink attributes and operation will
succeed.
Since action can support both api's, (2) should work, and it is easy to
write the new user space to have (4) work.
The action is having a strict check that only header types and commands
it can handle are accepted. This way future additions will be much
easier.
Usage example:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
flower \
ip_proto tcp \
dst_port 80 \
action pedit munge tcp dport set 8080 pipe \
action mirred egress redirect dev veth0
Will forward tcp port whose original dest port is 80, while modifying
the destination port to 8080.
Signed-off-by: Amir Vadai <amir@vadai.me>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces a new binder_fd_array object,
that allows us to support one or more file descriptors
embedded in a buffer that is scatter-gathered.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Martijn Coenen <maco@google.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Serban Constantinescu <serban.constantinescu@arm.com>
Cc: Dmitry Shmidt <dimitrysh@google.com>
Cc: Rom Lemarchand <romlem@google.com>
Cc: Android Kernel Team <kernel-team@android.com>
Signed-off-by: Martijn Coenen <maco@google.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Previously all data passed over binder needed
to be serialized, with the exception of Binder
objects and file descriptors.
This patchs adds support for scatter-gathering raw
memory buffers into a binder transaction, avoiding
the need to first serialize them into a Parcel.
To remain backwards compatibile with existing
binder clients, it introduces two new command
ioctls for this purpose - BC_TRANSACTION_SG and
BC_REPLY_SG. These commands may only be used with
the new binder_transaction_data_sg structure,
which adds a field for the total size of the
buffers we are scatter-gathering.
Because memory buffers may contain pointers to
other buffers, we allow callers to specify
a parent buffer and an offset into it, to indicate
this is a location pointing to the buffer that
we are fixing up. The kernel will then take care
of fixing up the pointer to that buffer as well.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Martijn Coenen <maco@google.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Serban Constantinescu <serban.constantinescu@arm.com>
Cc: Dmitry Shmidt <dimitrysh@google.com>
Cc: Rom Lemarchand <romlem@google.com>
Cc: Android Kernel Team <kernel-team@android.com>
Signed-off-by: Martijn Coenen <maco@google.com>
[jstultz: Fold in small fix from Amit Pundir <amit.pundir@linaro.org>]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
flat_binder_object is used for both handling
binder objects and file descriptors, even though
the two are mostly independent. Since we'll
have more fixup objects in binder in the future,
instead of extending flat_binder_object again,
split out file descriptors to their own object
while retaining backwards compatibility to
existing user-space clients. All binder objects
just share a header.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Martijn Coenen <maco@google.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Serban Constantinescu <serban.constantinescu@arm.com>
Cc: Dmitry Shmidt <dimitrysh@google.com>
Cc: Rom Lemarchand <romlem@google.com>
Cc: Android Kernel Team <kernel-team@android.com>
Signed-off-by: Martijn Coenen <maco@google.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
None of these registers is relevant for the userspace API.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
According to the XR17V352 manual, bit 4 is IrDA control and bit 5 for
485. Fortunately, no driver used them so far.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stateful network admission policy may allow connections to one
direction and reject connections initiated in the other direction.
After policy change it is possible that for a new connection an
overlapping conntrack entry already exists, where the original
direction of the existing connection is opposed to the new
connection's initial packet.
Most importantly, conntrack state relating to the current packet gets
the "reply" designation based on whether the original direction tuple
or the reply direction tuple matched. If this "directionality" is
wrong w.r.t. to the stateful network admission policy it may happen
that packets in neither direction are correctly admitted.
This patch adds a new "force commit" option to the OVS conntrack
action that checks the original direction of an existing conntrack
entry. If that direction is opposed to the current packet, the
existing conntrack entry is deleted and a new one is subsequently
created in the correct direction.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the fields of the conntrack original direction 5-tuple to struct
sw_flow_key. The new fields are initially marked as non-existent, and
are populated whenever a conntrack action is executed and either finds
or generates a conntrack entry. This means that these fields exist
for all packets that were not rejected by conntrack as untrackable.
The original tuple fields in the sw_flow_key are filled from the
original direction tuple of the conntrack entry relating to the
current packet, or from the original direction tuple of the master
conntrack entry, if the current conntrack entry has a master.
Generally, expected connections of connections having an assigned
helper (e.g., FTP), have a master conntrack entry.
The main purpose of the new conntrack original tuple fields is to
allow matching on them for policy decision purposes, with the premise
that the admissibility of tracked connections reply packets (as well
as original direction packets), and both direction packets of any
related connections may be based on ACL rules applying to the master
connection's original direction 5-tuple. This also makes it easier to
make policy decisions when the actual packet headers might have been
transformed by NAT, as the original direction 5-tuple represents the
packet headers before any such transformation.
When using the original direction 5-tuple the admissibility of return
and/or related packets need not be based on the mere existence of a
conntrack entry, allowing separation of admission policy from the
established conntrack state. While existence of a conntrack entry is
required for admission of the return or related packets, policy
changes can render connections that were initially admitted to be
rejected or dropped afterwards. If the admission of the return and
related packets was based on mere conntrack state (e.g., connection
being in an established state), a policy change that would make the
connection rejected or dropped would need to find and delete all
conntrack entries affected by such a change. When using the original
direction 5-tuple matching the affected conntrack entries can be
allowed to time out instead, as the established state of the
connection would not need to be the basis for packet admission any
more.
It should be noted that the directionality of related connections may
be the same or different than that of the master connection, and
neither the original direction 5-tuple nor the conntrack state bits
carry this information. If needed, the directionality of the master
connection can be stored in master's conntrack mark or labels, which
are automatically inherited by the expected related connections.
The fact that neither ARP nor ND packets are trackable by conntrack
allows mutual exclusion between ARP/ND and the new conntrack original
tuple fields. Hence, the IP addresses are overlaid in union with ARP
and ND fields. This allows the sw_flow_key to not grow much due to
this patch, but it also means that we must be careful to never use the
new key fields with ARP or ND packets. ARP is easy to distinguish and
keep mutually exclusive based on the ethernet type, but ND being an
ICMPv6 protocol requires a bit more attention.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the array of labels in struct ovs_key_ct_label an union, adding a
u32 array of the same byte size as the existing u8 array. It is
faster to loop through the labels 32 bits at the time, which is also
the alignment of netlink attributes.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to implement Sender-Side Procedures for the Add
Outgoing and Incoming Streams Request Parameter described in
rfc6525 section 5.1.5-5.1.6.
It is also to add sockopt SCTP_ADD_STREAMS in rfc6525 section
6.3.4 for users.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to implement Sender-Side Procedures for the SSN/TSN
Reset Request Parameter descibed in rfc6525 section 5.1.4.
It is also to add sockopt SCTP_RESET_ASSOC in rfc6525 section 6.3.3
for users.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tracksticks on the Lenovo thinkpads have their buttons connected
through the touchpad device. We already fixed that in synaptics.c, but
when we switch the device into RMI4 mode to have proper support, the
pass-through functionality can't deal with them easily.
We add a new PS/2 flag and protocol designed for psmouse. The RMI4 F03
pass-through can then emit a special set of commands to notify psmouse the
state of the buttons.
This patch implements the protocol in psmouse, while an other will
do the same for rmi4-f03.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
The nl80211_nan_dual_band_conf enumeration doesn't make much sense.
The default value is assigned to a bit, which makes it weird if the
default bit and other bits are set at the same time.
To improve this, get rid of NL80211_NAN_BAND_DEFAULT and add a wiphy
configuration to let the drivers define which bands are supported.
This is exposed to the userspace, which then can make a decision on
which band(s) to use. Additionally, rename all "dual_band" elements
to "bands", to make things clearer.
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Remove references to private kernel header and defines from exported
ib_user_verb.h file.
The code snippet below is used to reproduce the issue:
#include <stdio.h>
#include <rdma/ib_user_verb.h>
int main(void)
{
printf("IB_USER_VERBS_ABI_VERSION = %d\n", IB_USER_VERBS_ABI_VERSION);
return 0;
}
It fails during compilation phase with an error:
➜ /tmp gcc main.c
main.c:2:31: fatal error: rdma/ib_user_verb.h: No such file or directory
#include <rdma/ib_user_verb.h>
^
compilation terminated.
Fixes: 189aba99e7 ("IB/uverbs: Extend modify_qp and support packet pacing")
CC: Bodong Wang <bodong@mellanox.com>
CC: Matan Barak <matanb@mellanox.com>
CC: Christoph Hellwig <hch@infradead.org>
Tested-by: Slava Shwartsman <slavash@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch implements the kernel side of the TCP option patch.
Signed-off-by: Manuel Messner <mm@skelett.io>
Reviewed-by: Florian Westphal <fw@strlen.de>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Just like with counters the direction attribute is optional.
We set priv->dir to MAX unconditionally to avoid duplicating the assignment
for all keys with optional direction.
For keys where direction is mandatory, existing code already returns
an error.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If NFT_EXTHDR_F_PRESENT is set, exthdr will not copy any header field
data into *dest, but instead set it to 1 if the header is found and 0
otherwise.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Update the drivers to pass the RSSI level as a cfg80211_cqm_rssi_notify
parameter and pass this value to userspace in a new nl80211 attribute.
This helps both userspace and also helps in the implementation of the
multiple RSSI thresholds CQM mechanism.
Note for marvell/mwifiex I pass 0 for the RSSI value because the new
RSSI value is not available to the driver at the time of the
cfg80211_cqm_rssi_notify call, but the driver queries the new value
immediately after that, so it is actually available just a moment later
if we wanted to defer caling cfg80211_cqm_rssi_notify until that moment.
Without this, the new cfg80211 code (patch 3) will call .get_station
which will send a duplicate HostCmd_CMD_RSSI_INFO command to the hardware.
Signed-off-by: Andrew Zaborowski <andrew.zaborowski@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The conflict was an interaction between a bug fix in the
netvsc driver in 'net' and an optimization of the RX path
in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
The big feature this time is support for POWER9 using the radix-tree
MMU for host and guest. This required some changes to arch/powerpc
code, so I talked with Michael Ellerman and he created a topic branch
with this patchset, which I merged into kvm-ppc-next and which Michael
will pull into his tree. Michael also put in some patches from Nick
Piggin which fix bugs in the interrupt vector code in relocatable
kernels when coming from a KVM guest.
Other notable changes include:
* Add the ability to change the size of the hashed page table,
from David Gibson.
* XICS (interrupt controller) emulation fixes and improvements,
from Li Zhong.
* Bug fixes from myself and Thomas Huth.
These patches define some new KVM capabilities and ioctls, but there
should be no conflicts with anything else currently upstream, as far
as I am aware.
Add a hypercall to retrieve the host realtime clock and the TSC value
used to calculate that clock read.
Used to implement clock synchronization between host and guest.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch is a quick fixup of the user structures that will prevent
the structures from being different sizes on 32 and 64 bit archs.
Taking this fix will allow us to *NOT* have to do compat ioctls for
the sed code.
Signed-off-by: Scott Bauer <scott.bauer@intel.com>
Fixes: 19641f2d76 ("Include: Uapi: Add user ABI for Sed/Opal")
Signed-off-by: Jens Axboe <axboe@fb.com>
The libnetfilter_conntrack userland library always sets IPS_CONFIRMED
when building a CTA_STATUS attribute. If this toggles the bit from
0->1, the parser will return an error. On Linux 4.4+ this will cause any
NFQA_EXP attribute in the packet to be ignored. This breaks conntrackd's
userland helpers because they operate on unconfirmed connections.
Instead of returning -EBUSY if the user program asks to modify an
unchangeable bit, simply ignore the change.
Also, fix the logic so that user programs are allowed to clear
the bits that they are allowed to change.
Signed-off-by: Kevin Cernekee <cernekee@chromium.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This resolves the merge errors that were reported in linux-next and it
picks up the staging and IIO fixes that we need/want in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for your net-next
tree, they are:
1) Stash ctinfo 3-bit field into pointer to nf_conntrack object from
sk_buff so we only access one single cacheline in the conntrack
hotpath. Patchset from Florian Westphal.
2) Don't leak pointer to internal structures when exporting x_tables
ruleset back to userspace, from Willem DeBruijn. This includes new
helper functions to copy data to userspace such as xt_data_to_user()
as well as conversions of our ip_tables, ip6_tables and arp_tables
clients to use it. Not surprinsingly, ebtables requires an ad-hoc
update. There is also a new field in x_tables extensions to indicate
the amount of bytes that we copy to userspace.
3) Add nf_log_all_netns sysctl: This new knob allows you to enable
logging via nf_log infrastructure for all existing netnamespaces.
Given the effort to provide pernet syslog has been discontinued,
let's provide a way to restore logging using netfilter kernel logging
facilities in trusted environments. Patch from Michal Kubecek.
4) Validate SCTP checksum from conntrack helper, from Davide Caratti.
5) Merge UDPlite conntrack and NAT helpers into UDP, this was mostly
a copy&paste from the original helper, from Florian Westphal.
6) Reset netfilter state when duplicating packets, also from Florian.
7) Remove unnecessary check for broadcast in IPv6 in pkttype match and
nft_meta, from Liping Zhang.
8) Add missing code to deal with loopback packets from nft_meta when
used by the netdev family, also from Liping.
9) Several cleanups on nf_tables, one to remove unnecessary check from
the netlink control plane path to add table, set and stateful objects
and code consolidation when unregister chain hooks, from Gao Feng.
10) Fix harmless reference counter underflow in IPVS that, however,
results in problems with the introduction of the new refcount_t
type, from David Windsor.
11) Enable LIBCRC32C from nf_ct_sctp instead of nf_nat_sctp,
from Davide Caratti.
12) Missing documentation on nf_tables uapi header, from Liping Zhang.
13) Use rb_entry() helper in xt_connlimit, from Geliang Tang.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
New nested netlink attribute to associate tunnel info per vlan.
This is used by bridge driver to send tunnel metadata to
bridge ports in vlan tunnel mode. This patch also adds new per
port flag IFLA_BRPORT_VLAN_TUNNEL to enable vlan tunnel mode.
off by default.
One example use for this is a vxlan bridging gateway or vtep
which maps vlans to vn-segments (or vnis). User can configure
per-vlan tunnel information which the bridge driver can use
to bridge vlan into the corresponding vn-segment.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vxlan COLLECT_METADATA mode today solves the per-vni netdev
scalability problem in l3 networks. It expects all forwarding
information to be present in dst_metadata. This patch series
enhances collect metadata mode to include the case where only
vni is present in dst_metadata, and the vxlan driver can then use
the rest of the forwarding information datbase to make forwarding
decisions. There is no change to default COLLECT_METADATA
behaviour. These changes only apply to COLLECT_METADATA when
used with the bridging use-case with a special dst_metadata
tunnel info flag (eg: where vxlan device is part of a bridge).
For all this to work, the vxlan driver will need to now support a
single fdb table hashed by mac + vni. This series essentially makes
this happen.
use-case and workflow:
vxlan collect metadata device participates in bridging vlan
to vn-segments. Bridge driver above the vxlan device,
sends the vni corresponding to the vlan in the dst_metadata.
vxlan driver will lookup forwarding database with (mac + vni)
for the required remote destination information to forward the
packet.
Changes introduced by this patch:
- allow learning and forwarding database state in vxlan netdev in
COLLECT_METADATA mode. Current behaviour is not changed
by default. tunnel info flag IP_TUNNEL_INFO_BRIDGE is used
to support the new bridge friendly mode.
- A single fdb table hashed by (mac, vni) to allow fdb entries with
multiple vnis in the same fdb table
- rx path already has the vni
- tx path expects a vni in the packet with dst_metadata
- prior to this series, fdb remote_dsts carried remote vni and
the vxlan device carrying the fdb table represented the
source vni. With the vxlan device now representing multiple vnis,
this patch adds a src vni attribute to the fdb entry. The remote
vni already uses NDA_VNI attribute. This patch introduces
NDA_SRC_VNI netlink attribute to represent the src vni in a multi
vni fdb table.
iproute2 example (patched and pruned iproute2 output to just show
relevant fdb entries):
example shows same host mac learnt on two vni's.
before (netdev per vni):
$bridge fdb show | grep "00:02:00:00:00:03"
00:02:00:00:00:03 dev vxlan1001 dst 12.0.0.8 self
00:02:00:00:00:03 dev vxlan1000 dst 12.0.0.8 self
after this patch with collect metadata in bridged mode (single netdev):
$bridge fdb show | grep "00:02:00:00:00:03"
00:02:00:00:00:03 dev vxlan0 src_vni 1001 dst 12.0.0.8 self
00:02:00:00:00:03 dev vxlan0 src_vni 1000 dst 12.0.0.8 self
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the encode/decode functionality from the ife module instead of using
implementation inside the act_ife.
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This module is responsible for the ife encapsulation protocol
encode/decode logics. That module can:
- ife_encode: encode skb and reserve space for the ife meta header
- ife_decode: decode skb and extract the meta header size
- ife_tlv_meta_encode - encodes one tlv entry into the reserved ife
header space.
- ife_tlv_meta_decode - decodes one tlv entry from the packet
- ife_tlv_meta_next - advance to the next tlv
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the latest version of the IPv6 Segment Routing IETF draft [1] the
cleanup flag is removed and the flags field length is shrunk from 16 bits
to 8 bits. As a consequence, the input of the HMAC computation is modified
in a non-backward compatible way by covering the whole octet of flags
instead of only the cleanup bit. As such, if an implementation compatible
with the latest draft computes the HMAC of an SRH who has other flags set
to 1, then the HMAC result would differ from the current implementation.
This patch carries those modifications to prevent conflict with other
implementations of IPv6 SR.
[1] https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-05
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Debugging issues caused by pfmemalloc is often tedious.
Add a new SNMP counter to more easily diagnose these problems.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Josef Bacik <jbacik@fb.com>
Acked-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This ioctl opens a file to which a socket is bound and
returns a file descriptor. The caller has to have CAP_NET_ADMIN
in the socket network namespace.
Currently it is impossible to get a path and a mount point
for a socket file. socket_diag reports address, device ID and inode
number for unix sockets. An address can contain a relative path or
a file may be moved somewhere. And these properties say nothing about
a mount namespace and a mount point of a socket file.
With the introduced ioctl, we can get a path by reading
/proc/self/fd/X and get mnt_id from /proc/self/fdinfo/X.
In CRIU we are going to use this ioctl to dump and restore unix socket.
Here is an example how it can be used:
$ strace -e socket,bind,ioctl ./test /tmp/test_sock
socket(AF_UNIX, SOCK_STREAM, 0) = 3
bind(3, {sa_family=AF_UNIX, sun_path="test_sock"}, 11) = 0
ioctl(3, SIOCUNIXFILE, 0) = 4
^Z
$ ss -a | grep test_sock
u_str LISTEN 0 1 test_sock 17798 * 0
$ ls -l /proc/760/fd/{3,4}
lrwx------ 1 root root 64 Feb 1 09:41 3 -> 'socket:[17798]'
l--------- 1 root root 64 Feb 1 09:41 4 -> /tmp/test_sock
$ cat /proc/760/fdinfo/4
pos: 0
flags: 012000000
mnt_id: 40
$ cat /proc/self/mountinfo | grep "^40\s"
40 19 0:37 / /tmp rw shared:23 - tmpfs tmpfs rw
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Kerrisk <<mtk.manpages@gmail.com> writes:
I would like to write code that discovers the namespace setup on a live
system. The NS_GET_PARENT and NS_GET_USERNS ioctl() operations added in
Linux 4.9 provide much of what I want, but there are still a couple of
small pieces missing. Those pieces are added with this patch series.
Here's an example program that makes use of the new ioctl() operations.
8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---
/* ns_capable.c
(C) 2016 Michael Kerrisk, <mtk.manpages@gmail.com>
Licensed under the GNU General Public License v2 or later.
Test whether a process (identified by PID) might (subject to LSM checks)
have capabilities in a namespace (identified by a /proc/PID/ns/xxx file).
*/
} while (0)
exit(EXIT_FAILURE); } while (0)
/* Display capabilities sets of process with specified PID */
static void
show_cap(pid_t pid)
{
cap_t caps;
char *cap_string;
caps = cap_get_pid(pid);
if (caps == NULL)
errExit("cap_get_proc");
cap_string = cap_to_text(caps, NULL);
if (cap_string == NULL)
errExit("cap_to_text");
printf("Capabilities: %s\n", cap_string);
}
/* Obtain the effective UID pf the process 'pid' by
scanning its /proc/PID/file */
static uid_t
get_euid_of_process(pid_t pid)
{
char path[PATH_MAX];
char line[1024];
int uid;
snprintf(path, sizeof(path), "/proc/%ld/status", (long) pid);
FILE *fp;
fp = fopen(path, "r");
if (fp == NULL)
errExit("fopen-/proc/PID/status");
for (;;) {
if (fgets(line, sizeof(line), fp) == NULL) {
/* Should never happen... */
fprintf(stderr, "Failure scanning %s\n", path);
exit(EXIT_FAILURE);
}
if (strstr(line, "Uid:") == line) {
sscanf(line, "Uid: %*d %d %*d %*d", &uid);
return uid;
}
}
}
int
main(int argc, char *argv[])
{
int ns_fd, userns_fd, pid_userns_fd;
int nstype;
int next_fd;
struct stat pid_stat;
struct stat target_stat;
char *pid_str;
pid_t pid;
char path[PATH_MAX];
if (argc < 2) {
fprintf(stderr, "Usage: %s PID [ns-file]\n", argv[0]);
fprintf(stderr, "\t'ns-file' is a /proc/PID/ns/xxxx file; "
"if omitted, use the namespace\n"
"\treferred to by standard input "
"(file descriptor 0)\n");
exit(EXIT_FAILURE);
}
pid_str = argv[1];
pid = atoi(pid_str);
if (argc <= 2) {
ns_fd = STDIN_FILENO;
} else {
ns_fd = open(argv[2], O_RDONLY);
if (ns_fd == -1)
errExit("open-ns-file");
}
/* Get the relevant user namespace FD, which is 'ns_fd' if 'ns_fd' refers
to a user namespace, otherwise the user namespace that owns 'ns_fd' */
nstype = ioctl(ns_fd, NS_GET_NSTYPE);
if (nstype == -1)
errExit("ioctl-NS_GET_NSTYPE");
if (nstype == CLONE_NEWUSER) {
userns_fd = ns_fd;
} else {
userns_fd = ioctl(ns_fd, NS_GET_USERNS);
if (userns_fd == -1)
errExit("ioctl-NS_GET_USERNS");
}
/* Obtain 'stat' info for the user namespace of the specified PID */
snprintf(path, sizeof(path), "/proc/%s/ns/user", pid_str);
pid_userns_fd = open(path, O_RDONLY);
if (pid_userns_fd == -1)
errExit("open-PID");
if (fstat(pid_userns_fd, &pid_stat) == -1)
errExit("fstat-PID");
/* Get 'stat' info for the target user namesapce */
if (fstat(userns_fd, &target_stat) == -1)
errExit("fstat-PID");
/* If the PID is in the target user namespace, then it has
whatever capabilities are in its sets. */
if (pid_stat.st_dev == target_stat.st_dev &&
pid_stat.st_ino == target_stat.st_ino) {
printf("PID is in target namespace\n");
printf("Subject to LSM checks, it has the following capabilities\n");
show_cap(pid);
exit(EXIT_SUCCESS);
}
/* Otherwise, we need to walk through the ancestors of the target
user namespace to see if PID is in an ancestor namespace */
for (;;) {
int f;
next_fd = ioctl(userns_fd, NS_GET_PARENT);
if (next_fd == -1) {
/* The error here should be EPERM... */
if (errno != EPERM)
errExit("ioctl-NS_GET_PARENT");
printf("PID is not in an ancestor namespace\n");
printf("It has no capabilities in the target namespace\n");
exit(EXIT_SUCCESS);
}
if (fstat(next_fd, &target_stat) == -1)
errExit("fstat-PID");
/* If the 'stat' info for this user namespace matches the 'stat'
* info for 'next_fd', then the PID is in an ancestor namespace */
if (pid_stat.st_dev == target_stat.st_dev &&
pid_stat.st_ino == target_stat.st_ino)
break;
/* Next time round, get the next parent */
f = userns_fd;
userns_fd = next_fd;
close(f);
}
/* At this point, we found that PID is in an ancestor of the target
user namespace, and 'userns_fd' refers to the immediate descendant
user namespace of PID in the chain of user namespaces from PID to
the target user namespace. If the effective UID of PID matches the
owner UID of descendant user namespace, then PID has all
capabilities in the descendant namespace(s); otherwise, it just has
the capabilities that are in its sets. */
uid_t owner_uid, uid;
if (ioctl(userns_fd, NS_GET_OWNER_UID, &owner_uid) == -1) {
perror("ioctl-NS_GET_OWNER_UID");
exit(EXIT_FAILURE);
}
uid = get_euid_of_process(pid);
printf("PID is in an ancestor namespace\n");
if (owner_uid == uid) {
printf("And its effective UID matches the owner "
"of the namespace\n");
printf("Subject to LSM checks, PID has all capabilities in "
"that namespace!\n");
} else {
printf("But its effective UID does not match the owner "
"of the namespace\n");
printf("Subject to LSM checks, it has the following capabilities\n");
show_cap(pid);
}
exit(EXIT_SUCCESS);
}
8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---
Michael Kerrisk (2):
nsfs: Add an ioctl() to return the namespace type
nsfs: Add an ioctl() to return owner UID of a userns
fs/nsfs.c | 13 +++++++++++++
include/uapi/linux/nsfs.h | 9 +++++++--
2 files changed, 20 insertions(+), 2 deletions(-)
I'd like to write code that discovers the user namespace hierarchy on a
running system, and also shows who owns the various user namespaces.
Currently, there is no way of getting the owner UID of a user namespace.
Therefore, this patch adds a new NS_GET_CREATOR_UID ioctl() that fetches
the UID (as seen in the user namespace of the caller) of the creator of
the user namespace referred to by the specified file descriptor.
If the supplied file descriptor does not refer to a user namespace,
the operation fails with the error EINVAL. If the owner UID does
not have a mapping in the caller's user namespace return the
overflow UID as that appears easier to deal with in practice
in user-space applications.
-- EWB Changed the handling of unmapped UIDs from -EOVERFLOW
back to the overflow uid. Per conversation with
Michael Kerrisk after examining his test code.
Acked-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Michael Kerrisk <mtk-manpages@gmail.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
To support unprivileged users mounting filesystems two permission
checks have to be performed: a test to see if the user allowed to
create a mount in the mount namespace, and a test to see if
the user is allowed to access the specified filesystem.
The automount case is special in that mounting the original filesystem
grants permission to mount the sub-filesystems, to any user who
happens to stumble across the their mountpoint and satisfies the
ordinary filesystem permission checks.
Attempting to handle the automount case by using override_creds
almost works. It preserves the idea that permission to mount
the original filesystem is permission to mount the sub-filesystem.
Unfortunately using override_creds messes up the filesystems
ordinary permission checks.
Solve this by being explicit that a mount is a submount by introducing
vfs_submount, and using it where appropriate.
vfs_submount uses a new mount internal mount flags MS_SUBMOUNT, to let
sget and friends know that a mount is a submount so they can take appropriate
action.
sget and sget_userns are modified to not perform any permission checks
on submounts.
follow_automount is modified to stop using override_creds as that
has proven problemantic.
do_mount is modified to always remove the new MS_SUBMOUNT flag so
that we know userspace will never by able to specify it.
autofs4 is modified to stop using current_real_cred that was put in
there to handle the previous version of submount permission checking.
cifs is modified to pass the mountpoint all of the way down to vfs_submount.
debugfs is modified to pass the mountpoint all of the way down to
trace_automount by adding a new parameter. To make this change easier
a new typedef debugfs_automount_t is introduced to capture the type of
the debugfs automount function.
Cc: stable@vger.kernel.org
Fixes: 069d5ac9ae ("autofs: Fix automounts by using current_real_cred()->uid")
Fixes: aeaa4a79ff ("fs: Call d_automount with the filesystems creds")
Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
This is the main feature pull for radeon and amdgpu for 4.11. Highlights:
- Power and clockgating improvements
- Preliminary SR-IOV support
- ttm buffer priority support
- ttm eviction fixes
- Removal of the ttm lru callbacks
- Remove SI DPM quirks due to MC firmware issues
- Handle VFCT with multiple vbioses
- Powerplay improvements
- Lots of driver cleanups
* 'drm-next-4.11' of git://people.freedesktop.org/~agd5f/linux: (120 commits)
drm/amdgpu: fix amdgpu_bo_va_mapping flags
drm/amdgpu: access stolen VRAM directly on CZ (v2)
drm/amdgpu: access stolen VRAM directly on KV/KB (v2)
drm/amdgpu: fix kernel panic when dpm disabled on Kv.
drm/amdgpu: fix dpm bug on Kv.
drm/amd/powerplay: fix regresstion issue can't set manual dpm mode.
drm/amdgpu: handle vfct with multiple vbios images
drm/radeon: handle vfct with multiple vbios images
drm/amdgpu: move misc si headers into amdgpu
drm/amdgpu: remove unused header si_reg.h
drm/radeon: drop pitcairn dpm quirks
drm/amdgpu: drop pitcairn dpm quirks
drm: radeon: radeon_ttm: Handle return NULL error from ioremap_nocache
drm/amd/amdgpu/amdgpu_ttm: Handle return NULL error from ioremap_nocache
drm/amdgpu: add new virtual display ID
drm/amd/amdgpu: remove the uncessary parameter for ib scheduler
drm/amdgpu: Bring bo creation in line with radeon driver (v2)
drm/amd/powerplay: fix misspelling in header guard
drm/ttm: revert "add optional LRU removal callback v2"
drm/ttm: revert "implement LRU add callbacks v2"
...
Another round of -misc stuff:
- Noralf debugfs cleanup cleanup (not yet everything, some more driver
patches awaiting acks).
- More doc work.
- edid/infoframe fixes from Ville.
- misc 1-patch fixes all over, as usual
Noralf needs this for his tinydrm pull request.
* tag 'drm-misc-next-2017-01-30' of git://anongit.freedesktop.org/git/drm-misc: (48 commits)
drm/vc4: Remove vc4_debugfs_cleanup()
dma/fence: Export enable-signaling tracepoint for emission by drivers
drm/tilcdc: Remove tilcdc_debugfs_cleanup()
drm/tegra: Remove tegra_debugfs_cleanup()
drm/sti: Remove drm_debugfs_remove_files() calls
drm/radeon: Remove drm_debugfs_remove_files() call
drm/omap: Remove omap_debugfs_cleanup()
drm/hdlcd: Remove hdlcd_debugfs_cleanup()
drm/etnaviv: Remove etnaviv_debugfs_cleanup()
drm/etnaviv: allow build with COMPILE_TEST
drm/amd/amdgpu: Remove drm_debugfs_remove_files() call
drm/prime: Clarify DMA-BUF/GEM Object lifetime
drm/ttm: Make sure BOs being swapped out are cacheable
drm/atomic: Remove drm_atomic_debugfs_cleanup()
drm: drm_minor_register(): Clean up debugfs on failure
drm: debugfs: Remove all files automatically on cleanup
drm/fourcc: add vivante tiled layout format modifiers
drm/edid: Set YQ bits in the AVI infoframe according to CEA-861-F
drm/edid: Set AVI infoframe Q even when QS=0
drm/edid: Introduce drm_hdmi_avi_infoframe_quant_range()
...
Currently turning on NFSv4.2 results in 4.2 clients suddenly seeing the
individual file labels as they're set on the server. This is not what
they've previously seen, and not appropriate in may cases. (In
particular, if clients have heterogenous security policies then one
client's labels may not even make sense to another.) Labeled NFS should
be opted in only in those cases when the administrator knows it makes
sense.
It's helpful to be able to turn 4.2 on by default, and otherwise the
protocol upgrade seems free of regressions. So, default labeled NFS to
off and provide an export flag to reenable it.
Users wanting labeled NFS support on an export will henceforth need to:
- make sure 4.2 support is enabled on client and server (as
before), and
- upgrade the server nfs-utils to a version supporting the new
"security_label" export flag.
- set that "security_label" flag on the export.
This is commit may be seen as a regression to anyone currently depending
on security labels. We believe those cases are currently rare.
Reported-by: tibbs@math.uh.edu
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Enable user-space to issue vector I/O commands through ioctls. To issue
a vector I/O, the ppa list with addresses is also required and must be
mapped for the controller to access.
For each ioctl, the result and status bits are returned as well, such
that user-space can retrieve the open-channel SSD completion bits.
The implementation covers the traditional use-cases of bad block
management, and vectored read/write/erase.
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Metadata implementation, test, and fixes.
Signed-off-by: Simon A.F. Lund <slund@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
This adds a new powerpc-specific KVM_CAP_SPAPR_RESIZE_HPT capability to
advertise whether KVM is capable of handling the PAPR extensions for
resizing the hashed page table during guest runtime. It also adds
definitions for two new VM ioctl()s to implement this extension, and
documentation of the same.
Note that, HPT resizing is already possible with KVM PR without kernel
modification, since the HPT is managed within userspace (qemu). The
capability defined here will only be set where an in-kernel implementation
of resizing is necessary, i.e. for KVM HV. To determine if the userspace
resize implementation can be used, it's necessary to check
KVM_CAP_PPC_ALLOC_HTAB. Unfortunately older kernels incorrectly set
KVM_CAP_PPC_ALLOC_HTAB even with KVM PR. If userspace it want to support
resizing with KVM PR on such kernels, it will need a workaround.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This adds two capabilities and two ioctls to allow userspace to
find out about and configure the POWER9 MMU in a guest. The two
capabilities tell userspace whether KVM can support a guest using
the radix MMU, or using the hashed page table (HPT) MMU with a
process table and segment tables. (Note that the MMUs in the
POWER9 processor cores do not use the process and segment tables
when in HPT mode, but the nest MMU does).
The KVM_PPC_CONFIGURE_V3_MMU ioctl allows userspace to specify
whether a guest will use the radix MMU or the HPT MMU, and to
specify the size and location (in guest space) of the process
table.
The KVM_PPC_GET_RMMU_INFO ioctl gives userspace information about
the radix MMU. It returns a list of supported radix tree geometries
(base page size and number of bits indexed at each level of the
radix tree) and the encoding used to specify the various page
sizes for the TLB invalidate entry instruction.
Initially, both capabilities return 0 and the ioctls return -EINVAL,
until the necessary infrastructure for them to operate correctly
is added.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch introduce support for 2500BaseT and 5000BaseT link modes.
These modes are included in the new IEEE 802.3bz standard.
Signed-off-by: Pavel Belous <pavel.s.belous@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add two stats in SCM_TIMESTAMPING_OPT_STATS:
TCP_NLA_DATA_SEGS_OUT: total data packets sent including retransmission
TCP_NLA_TOTAL_RETRANS: total data packets retransmitted
The names are picked to be consistent with corresponding fields in
TCP_INFO. This allows applications that are using the timestamping
API to measure latency stats to also retrive retransmission rate
of application write.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) GTP fixes from Andreas Schultz (missing genl module alias, clear IP
DF on transmit).
2) Netfilter needs to reflect the fwmark when sending resets, from Pau
Espin Pedrol.
3) nftable dump OOPS fix from Liping Zhang.
4) Fix erroneous setting of VIRTIO_NET_HDR_F_DATA_VALID on transmit,
from Rolf Neugebauer.
5) Fix build error of ipt_CLUSTERIP when procfs is disabled, from Arnd
Bergmann.
6) Fix regression in handling of NETIF_F_SG in harmonize_features(),
from Eric Dumazet.
7) Fix RTNL deadlock wrt. lwtunnel module loading, from David Ahern.
8) tcp_fastopen_create_child() needs to setup tp->max_window, from
Alexey Kodanev.
9) Missing kmemdup() failure check in ipv6 segment routing code, from
Eric Dumazet.
10) Don't execute unix_bind() under the bindlock, otherwise we deadlock
with splice. From WANG Cong.
11) ip6_tnl_parse_tlv_enc_lim() potentially reallocates the skb buffer,
therefore callers must reload cached header pointers into that skb.
Fix from Eric Dumazet.
12) Fix various bugs in legacy IRQ fallback handling in alx driver, from
Tobias Regnery.
13) Do not allow lwtunnel drivers to be unloaded while they are
referenced by active instances, from Robert Shearman.
14) Fix truncated PHY LED trigger names, from Geert Uytterhoeven.
15) Fix a few regressions from virtio_net XDP support, from John
Fastabend and Jakub Kicinski.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (102 commits)
ISDN: eicon: silence misleading array-bounds warning
net: phy: micrel: add support for KSZ8795
gtp: fix cross netns recv on gtp socket
gtp: clear DF bit on GTP packet tx
gtp: add genl family modules alias
tcp: don't annotate mark on control socket from tcp_v6_send_response()
ravb: unmap descriptors when freeing rings
virtio_net: reject XDP programs using header adjustment
virtio_net: use dev_kfree_skb for small buffer XDP receive
r8152: check rx after napi is enabled
r8152: re-schedule napi for tx
r8152: avoid start_xmit to schedule napi when napi is disabled
r8152: avoid start_xmit to call napi_schedule during autosuspend
net: dsa: Bring back device detaching in dsa_slave_suspend()
net: phy: leds: Fix truncated LED trigger names
net: phy: leds: Break dependency of phy.h on phy_led_triggers.h
net: phy: leds: Clear phy_num_led_triggers on failure to avoid crash
net-next: ethernet: mediatek: change the compatible string
Documentation: devicetree: change the mediatek ethernet compatible string
bnxt_en: Fix RTNL lock usage on bnxt_get_port_module_status().
...
- Series of iw_cxgb4 fixes to make it work with the drain cq API
- One or two patches each to: srp, iser, cxgb3, vmw_pvrdma, umem, rxe,
and ipoib
- One big series (13 patches) for the new qedr driver
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYi6KyAAoJELgmozMOVy/d9wgP+gN1i3pEvXN+yy8JmQb/kYDT
9k74tbrPY/E3FnxqfWhqIWWSU427/+wXq7phqFxc/w9BNB3by7Ivud1FuHpBWfx5
PoEhQj+TA3gQEHk259+Jdz8OHLMrKEME3zg1tuDUM9vq+e9hfZWyRVXcTFuOZTKg
YoKBpAE4AmSyKCcogn0FNHkm7UxlSZGlIfiVA61I1vPYjV0Z/ePkiXg51Y3RNOUy
x8StD+RVfWHWJTZ7ZKASC26Mm9tLzQxENQ0kJXg24oG0CaAO90iO3oFUM86z5NP8
gFyvdCpsgDSByRtoLhPnUvSmoTTzcLPa65YneINu17MeKiKjZaVmZipwr54pNTw0
++mqnfuEYhOUzjN1naDaK3DihEAj7CoW9Gdx84s9hOggOyURFUrV7DqwJSEBAz02
9VxBWpKsR7NKF7xVdy/oTgoWpO8Nr7DuYyHklaxS6eORci7CD3lqs71uf1XJg48t
b9SfncPxHReC4L6Y9sAUZsseaLsSCy/3bestB5wTqL6iRxjp25fJ7EvK9VLW/gXl
FcRU5pzph1pKDAasb/M8CjUnS4f4LveenALKAcNViuZDh8WEimM/Jlfg5vHbUAr8
rc0z9s/0cgU9AL/ZPaWQb57ZivaS0pXXa1z+ykKvSvj7nYRAX2tsYLzaX8Im2ang
8Rqqy93AdiO4Zq8ypnCq
=d5rm
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma fixes from Doug Ledford:
"Second round of -rc fixes for 4.10.
This -rc cycle has been slow for the rdma subsystem. I had already
sent you the first batch before the Holiday break. After that, we kept
only getting a few here or there. Up until this week, when I got a
drop of 13 to one driver (qedr). So, here's the -rc patches I have. I
currently have none held in reserve, so unless something new comes in,
this is it until the next merge window opens.
Summary:
- series of iw_cxgb4 fixes to make it work with the drain cq API
- one or two patches each to: srp, iser, cxgb3, vmw_pvrdma, umem,
rxe, and ipoib
- one big series (13 patches) for the new qedr driver"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (27 commits)
RDMA/cma: Fix unknown symbol when CONFIG_IPV6 is not enabled
IB/rxe: Prevent from completer to operate on non valid QP
IB/rxe: Fix rxe dev insertion to rxe_dev_list
IB/umem: Release pid in error and ODP flow
RDMA/qedr: Dispatch port active event from qedr_add
RDMA/qedr: Fix and simplify memory leak in PD alloc
RDMA/qedr: Fix RDMA CM loopback
RDMA/qedr: Fix formatting
RDMA/qedr: Mark three functions as static
RDMA/qedr: Don't reset QP when queues aren't flushed
RDMA/qedr: Don't spam dmesg if QP is in error state
RDMA/qedr: Remove CQ spinlock from CM completion handlers
RDMA/qedr: Return max inline data in QP query result
RDMA/qedr: Return success when not changing QP state
RDMA/qedr: Add uapi header qedr-abi.h
RDMA/qedr: Fix MTU returned from QP query
RDMA/core: Add the function ib_mtu_int_to_enum
IB/vmw_pvrdma: Fix incorrect cleanup on pvrdma_pci_probe error path
IB/vmw_pvrdma: Don't leak info from alloc_ucontext
IB/cxgb3: fix misspelling in header guard
...
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYixa5AAoJEAhfPr2O5OEVxEIP/RdOWmeLxBW0AT8sRlvlw43K
ecFIq0WTIGr6w7VJPm2uHIcw9sdDnwQgozbp5di4KkDLH13aNoJYjMknpl13oPLM
KIBUvIAz7WG+/k3Iq45RLZ7eovrWzGSOTRGHOAitidDBmuFofkLslfzlJKhnNacB
PhSgzQsjK0GrBj0z8Ud/dCSWSR1EU67Uq+pdr+eFGJpMtEicQipnj36BkDMECW02
12wGht0NW7eDMILrNXCwfBYqsFLKfgJqhypoJLci1rnnrGtuMVmlphZJSfPC6Riz
6D6mbgNaYcSAQq8VsHW8L/T5EMio/TNCkapY3LjeUOp/qxTk486/sYWg22fN0qvg
Y1lCU1s5jY0Wu9TBdb6ty/dQ52f7gG9SGjRupRNYCYIVJI0V7UQdWxIrr+uqAkH/
Ha/n7q7oTT6RxjmhHB+qAOspeWI31sGzRBhpDve9R+6MQa366aek14SpuG+4OdUN
7As7Lih9sagi90tB+J2B9PTeuZnr4XaZB3Z+t2vaBb19NG4/vpHYSuC0ymp0mzLh
9hnxtfZCsEOYLSRTWRz5PJmAK2t5j26C5Pb3qeRA40rNbKdSyTj4ntJ/VLUaiFg5
m7kJ222GvdUTeuBLhCvzkjk1sCt1YRvaqQkmLrzpynJnzojuS4SXeWWmdygQIZwq
cQC5lcwMFM/x1MmzBAWI
=r51U
-----END PGP SIGNATURE-----
Merge tag 'media/v4.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- fix a regression on tvp5150 causing failures at input selection and
image glitches
- CEC was moved out of staging for v4.10. Fix some bugs on it while not
too late
- fix a regression on pctv452e caused by VM stack changes
- fix suspend issued with smiapp
- fix a regression on cobalt driver
- fix some warnings and Kconfig issues with some random configs.
* tag 'media/v4.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] s5k4ecgx: select CRC32 helper
[media] dvb: avoid warning in dvb_net
[media] v4l: tvp5150: Don't override output pinmuxing at stream on/off time
[media] v4l: tvp5150: Fix comment regarding output pin muxing
[media] v4l: tvp5150: Reset device at probe time, not in get/set format handlers
[media] pctv452e: move buffer to heap, no mutex
[media] media/cobalt: use pci_irq_allocate_vectors
[media] cec: fix race between configuring and unconfiguring
[media] cec: move cec_report_phys_addr into cec_config_thread_func
[media] cec: replace cec_report_features by cec_fill_msg_report_features
[media] cec: update log_addr[] before finishing configuration
[media] cec: CEC_MSG_GIVE_FEATURES should abort for CEC version < 2
[media] cec: when canceling a message, don't overwrite old status info
[media] cec: fix report_current_latency
[media] smiapp: Make suspend and resume functions __maybe_unused
[media] smiapp: Implement power-on and power-off sequences without runtime PM
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEES2FAuYbJvAGobdVQPTuqJaypJWoFAliHTm8THG1rbEBwZW5n
dXRyb25peC5kZQAKCRA9O6olrKklajYkB/wOJxby2DFi4ukCPyrTpXtkQWmXa6+M
Wr/nY5PBg88JipQ+u3rl6bW9NMF3U3xtPO6I/ezfn1jfWOTg0jVIecEfqlW8JNAU
275IOaiQ7jYgGEo+ALUAv2kSe7Fl8HQyaq+9ugMkWUTdv5Vjay1GZgCzyKZzjM8q
8UOoB1YsXlB9f230HPMLeHcbGHqHCOpfNk9yWNcbd0Up6U7EweXtK4Pf/L8g+bL1
0LR1QHRoUtmivuIi96KhB7O/+w8WuigWRy7qkfGpFescZQWOHzbwCVHlZ9ZYCOnJ
eGEP8koksJuMg4OoEUCv0k2v3Txnfr8YNC5MvNdQ06YmoBWwQeamCozL
=fBeU
-----END PGP SIGNATURE-----
Merge tag 'linux-can-next-for-4.11-20170124' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2017-01-24
this is a pull request of 4 patches for net-next/master.
The first patch by Oliver Hartkopp adds a netlink API to configure the
interface termination of a CAN card. The next two patches are by me and
add a netlink API to query and configure CAN interfaces that only
support fixed bitrates. The last patch by Colin Ian King simplifies the
return path in the softing_cs driver's softingcs_probe() function.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Change History
--------------
v4: Changes suggested by Emil, Christian
- return -ENODATA for asics with unlimited sessions
v3: changes suggested by Christian
- Add a check for UVD IP block using AMDGPU_HW_IP_UVD
query type.
- Add a check for asic_type to be less than
CHIP_POLARIS10 since starting Polaris, we support
unlimited UVD instances.
- Add kerneldoc style comment for
amdgpu_uvd_used_handles().
v2: as suggested by Christian
- Add a new query AMDGPU_INFO_NUM_HANDLES
- Create a helper function to return the number
of currently used UVD handles.
- Modify the logic to count the number of used
UVD handles since handles can be freed in
non-linear fashion.
v1:
- User might want to query the maximum number of UVD
instances supported by firmware. In addition to that,
if there are multiple applications using UVD handles
at the same time, he might also want to query the
currently used number of handles.
For this we add two variables max_handles and
used_handles inside drm_amdgpu_info_hw_ip. So now
an application (or libdrm) can use AMDGPU_INFO IOCTL
with AMDGPU_INFO_HW_IP_INFO query type to get these
values.
Signed-off-by: Arindam Nath <arindam.nath@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The address generation mode for IPv6 link-local can only be configured
by netlink messages. This patch adds the ability to change the address
generation mode via sysctl.
v1 -> v2
Removed the rtnl lock and switch to use RCU lock to iterate through
the netdev list.
v2 -> v3
Removed the addrgenmode variable from the idev structure and use the
systcl storage for the flag.
Simplifed the logic for sysctl handling by removing the supported
for all operation.
Added support for more types of tunnel interfaces for link-local
address generation.
Based the patches from net-next.
v3 -> v4
Removed unnecessary whitespace changes.
Signed-off-by: Felix Jia <felix.jia@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivante GC hardware uses simple 4x4 tiled and nested 64x64 supertiled
formats as well as so-called split-tiled variants for dual-pipe
hardware, where even and odd tiles start at different base addresses.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170126153217.26916-1-p.zabel@pengutronix.de
- cleanups&fixes for dw-hdmi bride driver (Laurent)
- updates for adv bridge driver (John Stultz) for nexus
- drm_crtc_from_index helper rollout (Shawn Guo)
- removing drm_framebuffer_unregister_private from drivers&core
- target_vblank (Andrey Grodzovsky)
- misc tiny stuff
* tag 'drm-misc-next-2017-01-23' of git://anongit.freedesktop.org/git/drm-misc: (49 commits)
drm: qxl: Open code teardown function for qxl
drm: qxl: Open code probing sequence for qxl
drm/bridge: adv7511: Re-write the i2c address before EDID probing
drm/bridge: adv7511: Reuse __adv7511_power_on/off() when probing EDID
drm/bridge: adv7511: Rework adv7511_power_on/off() so they can be reused internally
drm/bridge: adv7511: Enable HPD interrupts to support hotplug and improve monitor detection
drm/bridge: adv7511: Switch to using drm_kms_helper_hotplug_event()
drm/bridge: adv7511: Use work_struct to defer hotplug handing to out of irq context
drm: vc4: use crtc helper drm_crtc_from_index()
drm: tegra: use crtc helper drm_crtc_from_index()
drm: nouveau: use crtc helper drm_crtc_from_index()
drm: mediatek: use crtc helper drm_crtc_from_index()
drm: kirin: use crtc helper drm_crtc_from_index()
drm: exynos: use crtc helper drm_crtc_from_index()
dt-bindings: display: dw-hdmi: Clean up DT bindings documentation
drm: bridge: dw-hdmi: Assert SVSRET before resetting the PHY
drm: bridge: dw-hdmi: Fix the name of the PHY reset macros
drm: bridge: dw-hdmi: Define and use macros for PHY register addresses
drm: bridge: dw-hdmi: Detect PHY type at runtime
drm: bridge: dw-hdmi: Handle overflow workaround based on device version
...
Final block of feature work for 4.11:
- gen8 pd cleanup from Matthew Auld
- more cleanups for view/vma (Chris)
- dmc support on glk (Anusha Srivatsa)
- use core crc api (Tomue)
- track wedged requests using fence.error (Chris)
- lots of psr fixes (Nagaraju, Vathsala)
- dp mst support, acked for merging through drm-intel by Takashi
(Libin)
- huc loading support, including uapi for libva to use it (Anusha
Srivatsa)
* tag 'drm-intel-next-2017-01-23' of git://anongit.freedesktop.org/git/drm-intel: (111 commits)
drm/i915: Update DRIVER_DATE to 20170123
drm/i915: reinstate call to trace_i915_vma_bind
drm/i915: Assert that created vma has a whole number of pages
drm/i915: Assert the drm_mm_node is allocated when on the VM lists
drm/i915: Treat an error from i915_vma_instance() as unlikely
drm/i915: Reject vma creation larger than address space
drm/i915: Use common LRU inactive vma bumping for unpin_from_display
drm/i915: Do an unlocked wait before set-cache-level ioctl
drm/i915/huc: Assert that HuC vma is placed in GuC accessible range
drm/i915/huc: Avoid attempting to authenticate non-existent fw
drm/i915: Set adjustment to zero on Up/Down interrupts if freq is already max/min
drm/i915: Remove the double handling of 'flags from intel_mode_from_pipe_config()
drm/i915: Remove crtc->config usage from intel_modeset_readout_hw_state()
drm/i915: Release temporary load-detect state upon switching
drm/i915: Remove i915_gem_object_to_ggtt()
drm/i915: Remove i915_vma_create from VMA API
drm/i915: Add a check that the VMA instance we lookup matches the request
drm/i915: Rename some warts in the VMA API
drm/i915: Track pinned vma in intel_plane_state
drm/i915/get_params: Add HuC status to getparams
...
Backmerge Linus master to get the connector locking revert.
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux: (645 commits)
sysctl: fix proc_doulongvec_ms_jiffies_minmax()
Revert "drm/probe-helpers: Drop locking from poll_enable"
MAINTAINERS: add Dan Streetman to zbud maintainers
MAINTAINERS: add Dan Streetman to zswap maintainers
mm: do not export ioremap_page_range symbol for external module
mn10300: fix build error of missing fpu_save()
romfs: use different way to generate fsid for BLOCK or MTD
frv: add missing atomic64 operations
mm, page_alloc: fix premature OOM when racing with cpuset mems update
mm, page_alloc: move cpuset seqcount checking to slowpath
mm, page_alloc: fix fast-path race with cpuset update or removal
mm, page_alloc: fix check for NULL preferred_zone
kernel/panic.c: add missing \n
fbdev: color map copying bounds checking
frv: add atomic64_add_unless()
mm/mempolicy.c: do not put mempolicy before using its nodemask
radix-tree: fix private list warnings
Documentation/filesystems/proc.txt: add VmPin
mm, memcg: do not retry precharge charges
proc: add a schedule point in proc_pid_readdir()
...
- bump version strings, by Simon Wunderlich
- ignore self-generated loop detect MAC addresses in translation table,
by Simon Wunderlich
- install uapi batman_adv.h header, by Sven Eckelmann
- bump copyright years, by Sven Eckelmann
- Remove an unused variable in translation table code, by Sven Eckelmann
- Handle NET_XMIT_CN like NET_XMIT_SUCCESS (revised according to Davids
suggestion), and a follow up code clean up, by Gao Feng (2 patches)
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAliKJcgWHHN3QHNpbW9u
d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoYzFEACGNOZcW2bFFpuE5CZWZpTW1n24
YAY/3C+THc89oUyn7ZbWpZ05HcG+JZXOWCWHaRTnd4UeA+sbki53Ioo52ZGgo79C
7LsyXMMH3F2NrXVEfCK/MUGZcqBAxMd4dcDPsYz5q5osxydjG3hEJLikqOluop3P
JKK9FTEeP2JeoWz9eEf7Io2te6EwcIjo3TRa9f53uyyhPnk5eS2NeSle5axZqS7c
l+NSZVlrrG7oUB0UdzABt5NWOvjDc+Lqp1dkoJo17PHOgXialIfjZOKUlKtjVx6E
06ow2HZaEYCtdlb0awjyPxtIpwiy0szTzJa/h4XtzeQHgbOKIqJWAEb80X7imHVE
aljP7A1uuGm0bcVQ+pq21PX8yLu4RCDPDE5Khu9atkiQP5+sVEdGiJ8Soaw4PmoD
yhDmXshrPGR8u5txN8gaHWG4MHt19645s8dHqHQ7tf5h+mf2QXQ2v/jJQsCV2UfY
vLd/JOD4ZVYWDCspcDdEwGc8KB9r6P31wXuSVjYfkqTFXocdzDr87V7C69CS0U+b
Lzel8oa/eVa2ppR+OhpELxhL2ahO7p1jZI2ix4NHftx5MAV4WJ3RfR2ev2Sf9Ukc
aGjzjuml3JVo2i+11lqiAtOaBK9wDNv1CaC+D2NmC6wpWzWQryNryw3fm2SoH3Xm
S+UoBDZpik+SIEBv3Q==
=cd1Y
-----END PGP SIGNATURE-----
Merge tag 'batadv-next-for-davem-20170126' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
This feature/cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
- ignore self-generated loop detect MAC addresses in translation table,
by Simon Wunderlich
- install uapi batman_adv.h header, by Sven Eckelmann
- bump copyright years, by Sven Eckelmann
- Remove an unused variable in translation table code, by Sven Eckelmann
- Handle NET_XMIT_CN like NET_XMIT_SUCCESS (revised according to Davids
suggestion), and a follow up code clean up, by Gao Feng (2 patches)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains a large batch with Netfilter fixes for
your net tree, they are:
1) Two patches to solve conntrack garbage collector cpu hogging, one to
remove GC_MAX_EVICTS and another to look at the ratio (scanned entries
vs. evicted entries) to make a decision on whether to reduce or not
the scanning interval. From Florian Westphal.
2) Two patches to fix incorrect set element counting if NLM_F_EXCL is
is not set. Moreover, don't decrenent set->nelems from abort patch
if -ENFILE which leaks a spare slot in the set. This includes a
patch to deconstify the set walk callback to update set->ndeact.
3) Two fixes for the fwmark_reflect sysctl feature: Propagate mark to
reply packets both from nf_reject and local stack, from Pau Espin Pedrol.
4) Fix incorrect handling of loopback traffic in rpfilter and nf_tables
fib expression, from Liping Zhang.
5) Fix oops on stateful objects netlink dump, when no filter is specified.
Also from Liping Zhang.
6) Fix a build error if proc is not available in ipt_CLUSTERIP, related
to fix that was applied in the previous batch for net. From Arnd Bergmann.
7) Fix lack of string validation in table, chain, set and stateful
object names in nf_tables, from Liping Zhang. Moreover, restrict
maximum log prefix length to 127 bytes, otherwise explicitly bail
out.
8) Two patches to fix spelling and typos in nf_tables uapi header file
and Kconfig, patches from Alexander Alemayhu and William Breathitt Gray.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
09748a22f4 ("batman-adv: add generic netlink family for batman-adv")
introduced the new batman_adv.h which describes the netlink attributes and
commands of batman-adv. But the Kbuild entry to install the header was not
added.
All currently known tools ship their own copy of batman_adv.h but it should
be installed anyway to later be able to migrate to the system batman_adv.h.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
This patch adds a new socket option, TCP_FASTOPEN_CONNECT, as an
alternative way to perform Fast Open on the active side (client). Prior
to this patch, a client needs to replace the connect() call with
sendto(MSG_FASTOPEN). This can be cumbersome for applications who want
to use Fast Open: these socket operations are often done in lower layer
libraries used by many other applications. Changing these libraries
and/or the socket call sequences are not trivial. A more convenient
approach is to perform Fast Open by simply enabling a socket option when
the socket is created w/o changing other socket calls sequence:
s = socket()
create a new socket
setsockopt(s, IPPROTO_TCP, TCP_FASTOPEN_CONNECT …);
newly introduced sockopt
If set, new functionality described below will be used.
Return ENOTSUPP if TFO is not supported or not enabled in the
kernel.
connect()
With cookie present, return 0 immediately.
With no cookie, initiate 3WHS with TFO cookie-request option and
return -1 with errno = EINPROGRESS.
write()/sendmsg()
With cookie present, send out SYN with data and return the number of
bytes buffered.
With no cookie, and 3WHS not yet completed, return -1 with errno =
EINPROGRESS.
No MSG_FASTOPEN flag is needed.
read()
Return -1 with errno = EWOULDBLOCK/EAGAIN if connect() is called but
write() is not called yet.
Return -1 with errno = EWOULDBLOCK/EAGAIN if connection is
established but no msg is received yet.
Return number of bytes read if socket is established and there is
msg received.
The new API simplifies life for applications that always perform a write()
immediately after a successful connect(). Such applications can now take
advantage of Fast Open by merely making one new setsockopt() call at the time
of creating the socket. Nothing else about the application's socket call
sequence needs to change.
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The first seven patches from Or Gerlitz in this series further enhances
the mlx5 SRIOV switchdev mode to support offloading IPv6 tunnels using the
TC tunnel key set (encap) and unset (decap) actions.
Or Gerlitz says:
========================
As part of doing this change, few cleanups are done in the IPv4 code,
later we move to use the full tunnel key info provided to the driver as
the key for our internal hashing which is used to identify cases where
the same tunnel is used for encapsulating multiple flows. As done in the
IPv4 case, the control path for offloading IPv6 tunnels uses route/neigh
lookups and construction of the IPv6 tunnel headers on the encap path and
matching on the outer hears in the decap path.
The last patch of the series enlarges the HW FDB size for the switchdev mode,
so it has now room to contain offloaded flows as many as min(max number
of HW flow counters supported, max HW table size supported).
========================
Next to Or's series you can find several patches handling several topics.
From Mohamad, add support for SRIOV VF min rate guarantee by using the
TSAR BW share weights mechanism.
From Or, Two patches to enable Eth VFs to query their min-inline value for
user-space.
for that we move a mlx5 low level min inline helper function from mlx5
ethernet driver into the core driver and then use it in mlx5_ib to expose
the inline mode to rdma applications through libmlx5.
From Kamal Heib, Reduce memory consumption on kdump kernel.
From Shaker Daibes, code reuse in CQE compression control logic
Thanks,
Saeed.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJYh7FNAAoJEEg/ir3gV/o+TjsIAL1e92+5eutBS9ZvhMARi+Tc
c2V9V8bG8W1RWWTvx1G0aU4nNjWsr5L8Q8gzqpwhrQITBfgpWd+hlnxQCucyhxC3
AC1qQ+AKREe/C+25D+WJRq34/61ZHEH2rbKZvpZ1O8SuicVPbcvJ9eM+wOEDxwwX
u5C5kWQ0HRtCcnFiiOYkB+0CQPH7m3+ZzZek+jDowrexHMSE+yl8ZNtaSTX9c9QN
bE2cPiCVZd7ufKPIwY8LWHBryyl7sh5P+NqzD633OeiqP/pkZsW9A+czyt+d330f
6XTKOS1PCD+TfHE0sZJT4VMCjICMHrOFbNRZuwcxJQ6NfmwIJZfskX4NLbyGQTI=
=vF7U
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2017-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2017-24-01
The first seven patches from Or Gerlitz in this series further enhances
the mlx5 SRIOV switchdev mode to support offloading IPv6 tunnels using the
TC tunnel key set (encap) and unset (decap) actions.
Or Gerlitz says:
========================
As part of doing this change, few cleanups are done in the IPv4 code,
later we move to use the full tunnel key info provided to the driver as
the key for our internal hashing which is used to identify cases where
the same tunnel is used for encapsulating multiple flows. As done in the
IPv4 case, the control path for offloading IPv6 tunnels uses route/neigh
lookups and construction of the IPv6 tunnel headers on the encap path and
matching on the outer hears in the decap path.
The last patch of the series enlarges the HW FDB size for the switchdev mode,
so it has now room to contain offloaded flows as many as min(max number
of HW flow counters supported, max HW table size supported).
========================
Next to Or's series you can find several patches handling several topics.
From Mohamad, add support for SRIOV VF min rate guarantee by using the
TSAR BW share weights mechanism.
From Or, Two patches to enable Eth VFs to query their min-inline value for
user-space.
for that we move a mlx5 low level min inline helper function from mlx5
ethernet driver into the core driver and then use it in mlx5_ib to expose
the inline mode to rdma applications through libmlx5.
From Kamal Heib, Reduce memory consumption on kdump kernel.
From Shaker Daibes, code reuse in CQE compression control logic
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce optional 128-bit action cookie.
Like all other cookie schemes in the networking world (eg in protocols
like http or existing kernel fib protocol field, etc) the idea is to save
user state that when retrieved serves as a correlator. The kernel
_should not_ intepret it. The user can store whatever they wish in the
128 bits.
Sample exercise(showing variable length use of cookie)
.. create an accept action with cookie a1b2c3d4
sudo $TC actions add action ok index 1 cookie a1b2c3d4
.. dump all gact actions..
sudo $TC -s actions ls action gact
action order 0: gact action pass
random type none pass val 0
index 1 ref 1 bind 0 installed 5 sec used 5 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
cookie a1b2c3d4
.. bind the accept action to a filter..
sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1
... send some traffic..
$ ping 127.0.0.1 -c 3
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms
Signed-off-by: David S. Miller <davem@davemloft.net>
Linux 4.9 added two ioctl() operations that can be used to discover:
* the parental relationships for hierarchical namespaces (user and PID)
[NS_GET_PARENT]
* the user namespaces that owns a specified non-user-namespace
[NS_GET_USERNS]
For no good reason that I can glean, NS_GET_USERNS was made synonymous
with NS_GET_PARENT for user namespaces. It might have been better if
NS_GET_USERNS had returned an error if the supplied file descriptor
referred to a user namespace, since it suggests that the caller may be
confused. More particularly, if it had generated an error, then I wouldn't
need the new ioctl() operation proposed here. (On the other hand, what
I propose here may be more generally useful.)
I would like to write code that discovers namespace relationships for
the purpose of understanding the namespace setup on a running system.
In particular, given a file descriptor (or pathname) for a namespace,
N, I'd like to obtain the corresponding user namespace. Namespace N
might be a user namespace (in which case my code would just use N) or
a non-user namespace (in which case my code will use NS_GET_USERNS to
get the user namespace associated with N). The problem is that there
is no way to tell the difference by looking at the file descriptor
(and if I try to use NS_GET_USERNS on an N that is a user namespace, I
get the parent user namespace of N, which is not what I want).
This patch therefore adds a new ioctl(), NS_GET_NSTYPE, which, given
a file descriptor that refers to a user namespace, returns the
namespace type (one of the CLONE_NEW* constants).
Signed-off-by: Michael Kerrisk <mtk-manpages@gmail.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
First, log prefix will be truncated to NF_LOG_PREFIXLEN-1, i.e. 127,
at nf_log_packet(), so the extra part is useless.
Second, after adding a log rule with a very very long prefix, we will
fail to dump the nft rules after this _special_ one, but acctually,
they do exist. For example:
# name_65000=$(printf "%0.sQ" {1..65000})
# nft add rule filter output log prefix "$name_65000"
# nft add rule filter output counter
# nft add rule filter output counter
# nft list chain filter output
table ip filter {
chain output {
type filter hook output priority 0; policy accept;
}
}
So now, restrict the log prefix length to NF_LOG_PREFIXLEN-1.
Fixes: 96518518cc ("netfilter: add nftables")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When programs need to calculate the csum from scratch for small UDP
packets and use bpf_l4_csum_replace() to feed the result from helpers
like bpf_csum_diff(), then we need a flag besides BPF_F_MARK_MANGLED_0
that would ignore the case of current csum being 0, and which would
still allow for the helper to set the csum and transform when needed
to CSUM_MANGLED_0.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
For some mlx5 HW models (CX4, CX4Lx), the VF driver needs to put part
of the packet headers on the TX descriptor so the e-switch can do proper
matching and steering. This is called "min-inline", it's advertized to
the VF by the FW and also enforced on them by the HW, such that if they
don't obey, their packets are dropped.
SRIOV VF libmlx5 instances should take into account the min-inline
value of their vports. For that end, we provide this value through
the vendor response part of init_ucontext command.
The min inline value is reported in a way which will let newer libmlx5
instances realize that they are running over an older kernel and act
accordingly (e.g apply some educated guess).
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This action allows the user to sample traffic matched by tc classifier.
The sampling consists of choosing packets randomly and sampling them using
the psample module. The user can configure the psample group number, the
sampling rate and the packet's truncation (to save kernel-user traffic).
Example:
To sample ingress traffic from interface eth1, one may use the commands:
tc qdisc add dev eth1 handle ffff: ingress
tc filter add dev eth1 parent ffff: \
matchall action sample rate 12 group 4
Where the first command adds an ingress qdisc and the second starts
sampling randomly with an average of one sampled packet per 12 packets on
dev eth1 to psample group 4.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>