If f2fs manages multiple devices, in checkpoint, we need to issue flush
in those devices which contain dirty data/node in their cache before
we write checkpoint region, otherwise, filesystem metadata could be
corrupted if hitting SPO after checkpoint.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
When multiple device feature is enabled, during ->fsync we will issue
flush in all devices to make sure node/data of the file being persisted
into storage. But some flushes of device could be unneeded as file's
data may be not writebacked into those devices. So this patch adds and
manage bitmap per inode in global cache to indicate which device is
dirty and it needs to issue flush during ->fsync, hence, we could improve
performance of fsync in scenario of multiple device.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
It needs to stat size of ino management cache with all type instead of
orphan ino type.
Fixes: 652be55162 ("f2fs: show # of orphan inodes")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If we failed to issue flush in ->fsync, we need to keep FI_UPDATE_WRITE
flag to make sure triggering flush in next ->fsync.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As Fan Li reported, there is no user traversing nid_list[ALLOC_NID_LIST]
which is used for tracking preallocated nids. Let's drop it, and only
track preallocated nids in free_nid_root radix-tree.
Reported-by: Fan Li <fanofcode.li@samsung.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In FI_NO_PREALLOC cases, direct I/O path may allocate blocks for an
inode but keep its inline data flag. This inconsistency may trigger
vfs clear_inode nrpages bug_on when evicting the inode. We should
convert inline data first in this case.
Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Keep in line with the other Linux file system implementations
since page_cache_sync_readahead supports NULL file pointer,
and thus we can readahead data by f2fs itself without file opening
(something like the btrfs behavior).
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Commit ba38c27eb9 ("f2fs: enhance lookup xattr") introduces
lookup_all_xattrs duplicating from read_all_xattrs, which leaves
lots of similar codes in between them, so introduce new help
read_xattr_block to clean up redundant codes.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Commit ba38c27eb9 ("f2fs: enhance lookup xattr") introduces
lookup_all_xattrs duplicating from read_all_xattrs, which leaves
lots of similar codes in between them, so introduce new help
read_inline_xattr to clean up redundant codes.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Commit 2683446646 ("f2fs: reuse nids more aggressively") tries to
reuse nids as many as possilbe, in order to mitigate producing obsolete
node pages in page cache.
But acutally, before we reuse the nids and related node page cache,
we will always invalidate that node page, so there will be not any
obsolete node pages in cache.
Let's just revert previous commit, so that nm_i::next_scan_nid can be
increased ascendingly, making __build_free_nids traverses all NAT pages
more easily, finally, free nid bitmap cache can be enabled as soon as
possible.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This reverts commit b9cd20619e.
That patch causes much fewer node segments (which can be used for SSR)
than before, and in the corner case (e.g. create and delete *.txt files in
one same directory, there will be very few node segments but many data
segments), if the reserved free segments are all used up during gc, then
the write_checkpoint can still flush dentry pages to data ssr segments,
but will probably fail to flush node pages to node ssr segments, since
there are not enough node ssr segments left (the left ones are all
full).
So revert this patch to give a fair chance to let node segments remain
for SSR, which provides more robustness for corner cases.
Conflicts:
fs/f2fs/gc.c
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This contains one bug fix which causes a kernel panic during fstrim introduced
in 4.14-rc1.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAlnc+awACgkQQBSofoJI
UNIQLg/9HB/NikmBxVtkDtwrTKpVEPK5AYRHOvoa9k6twGkU6pB8FE0cd2PstwlZ
tAwRstyt8W9nGzF5BPY+WAyVs9ybc26wIqNo13cnzwXbc0/cc4pTy8lzeiFQdQrK
JIzz2lHNt0b5euCsEEAsnwK+rTb5DPUMKm8JkBUQ8f94oxIHLWvg7Um9FBppTw7s
JNOJ8/ymzQVNlWu7VxFaVwfUPbEhK7gtpSWjO65fiprQ0JjwXLEr65356XU2XW8x
lhQkByPMfMv1ZyGSNr3m4Hih0M6250slNHzwrZDxTdH7NDJmy1DfcPiM+epMWZMa
4uT+2hsxhTCqDQbIEvP9jv+KVHV7AG9ldCD04a0RD+XoNKDVLKlzSMFWVcWE/d0H
jSaDrMZj+taseF72x/efP8P/RrTbzqYsqBoAkoByibOXvBf7U8vsLK4NuG7agoL4
EUXDMuVJDB5d8LJRSYt0lPI5R+lhRVlVuint7a9T09yiLyCeR0wGf+eoH9C9Y4V8
t/mEM9azBi9l7T0yraVfqnh+SPzwwlxYOLQeZTi0bf3uqmBOeKb0OvfOiwboOnaZ
5Rl6jYD/hgZAowXpbohRjqPJhMoLMabsTJ4kHj6uJcQDhvTqDpamm9g9Afsiyr6z
xPYo09iHHlWA/iSiV7VSnbZu8hr59bchVt86r77fy/4YH3DXOcM=
=fAsG
-----END PGP SIGNATURE-----
Merge tag 'f2fs-for-4.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs fix from Jaegeuk Kim:
"This contains one bug fix which causes a kernel panic during fstrim
introduced in 4.14-rc1"
* tag 'f2fs-for-4.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
f2fs: fix potential panic during fstrim
This update consists of:
- fix for x86: sysret_ss_attrs test build failure preventing the x86
tests from running.
- fix mqueue: fix regression in silencing test run output.
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJZ3PdwAAoJEAsCRMQNDUMcjgQP/2gxIs5i6XXedtpK+IWDi/74
ht6Afi4MIxMlQIJnH515bwxvhzJ1PfAIJERZp3w1gjvQac/Ew7xj+p6z53JtzZr3
OKLDR9Y5qH2wlTq72vxMGHZsjL2k4hGTIQshRgSbUwfbmZsJCm7YANIWX1AuMCYK
IO8Ke5x01N5s0+lNk5J3ZMWfOWnODiy9mFleIMf1YWqfsgpkvbhwJQLJMIPEng4x
OCX6EsrBOb+vE4EycYDrqK7urim8+tcS0nTpj2UH6SyYIpYXNiglIYNiuSgreAcE
o/3zGVuwP1DvtC0ASQw00C3vQ73UrXVkqnCHrTWBMF/aiHNM029ueX5sYrlKijPu
P8bynYADGjgBCwyviHz/yYKgw7qaBj/4nK3J1S8jhc3Cpv1adZv55FUn30kF/Wyk
+4OZRAdQZADKFGhXOS1eYXVBZG+Ss7HGOEoQ3yrq75rOETONTD4xD/DAKDjlBO6r
iec7nmSK6nA/z8oyVIfdNivpoRXA+ncwokkPp5Ypx75gZ+0m1FAylGOf8TVkVL32
QaVBdk9jqIYkIvrlHtP436o/ec0r3jl+y7tWLG3D4P2S5fhGsdYC+kXUehV06S5v
xVD1p581zmOQji/O6Ixu6uijVCmM2ZGEBsP9Kz/ABWVbGXRBAl1DV3YDOVXT4VM1
UDf63RcAQd1dS0uTbuqV
=V+1w
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-4.14-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
- fix for x86: sysret_ss_attrs test build failure preventing the x86
tests from running
- fix mqueue: fix regression in silencing test run output
* tag 'linux-kselftest-4.14-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: mqueue: fix regression in silencing output from RUN_TESTS
selftests: x86: sysret_ss_attrs doesn't build on a PIE build
Merge powerpc transactional memory fixes from Michael Ellerman:
"I figured I'd still send you the commits using a bundle to make sure
it works in case I need to do it again in future"
This fixes transactional memory state restore for powerpc.
* bundle'd patches from Michael Ellerman:
powerpc/tm: Fix illegal TM state in signal handler
powerpc/64s: Use emergency stack for kernel TM Bad Thing program checks
Pull networking fixes from David Miller:
1) Fix object leak on IPSEC offload failure, from Steffen Klassert.
2) Fix range checks in ipset address range addition operations, from
Jozsef Kadlecsik.
3) Fix pernet ops unregistration order in ipset, from Florian Westphal.
4) Add missing netlink attribute policy for nl80211 packet pattern
attrs, from Peng Xu.
5) Fix PPP device destruction race, from Guillaume Nault.
6) Write marks get lost when BPF verifier processes R1=R2 register
assignments, causing incorrect liveness information and less state
pruning. Fix from Alexei Starovoitov.
7) Fix blockhole routes so that they are marked dead and therefore not
cached in sockets, otherwise IPSEC stops working. From Steffen
Klassert.
8) Fix broadcast handling of UDP socket early demux, from Paolo Abeni.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (37 commits)
cdc_ether: flag the u-blox TOBY-L2 and SARA-U2 as wwan
net: thunderx: mark expected switch fall-throughs in nicvf_main()
udp: fix bcast packet reception
netlink: do not set cb_running if dump's start() errs
ipv4: Fix traffic triggered IPsec connections.
ipv6: Fix traffic triggered IPsec connections.
ixgbe: incorrect XDP ring accounting in ethtool tx_frame param
net: ixgbe: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
Revert commit 1a8b6d76dc ("net:add one common config...")
ixgbe: fix masking of bits read from IXGBE_VXLANCTRL register
ixgbe: Return error when getting PHY address if PHY access is not supported
netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'
netfilter: SYNPROXY: skip non-tcp packet in {ipv4, ipv6}_synproxy_hook
tipc: Unclone message at secondary destination lookup
tipc: correct initialization of skb list
gso: fix payload length when gso_size is zero
mlxsw: spectrum_router: Avoid expensive lookup during route removal
bpf: fix liveness marking
doc: Fix typo "8023.ad" in bonding documentation
ipv6: fix net.ipv6.conf.all.accept_dad behaviour for real
...
The u-blox TOBY-L2 is a LTE Cat 4 module with HSPA+ and 2G fallback.
This module allows switching to different USB profiles with the
'AT+UUSBCONF' command, and provides a ECM network interface when the
'AT+UUSBCONF=2' profile is selected.
The u-blox SARA-U2 is a HSPA module with 2G fallback. The default USB
configuration includes a ECM network interface.
Both these modules are controlled via AT commands through one of the
TTYs exposed. Connecting these modules may be done just by activating
the desired PDP context with 'AT+CGACT=1,<cid>' and then running DHCP
on the ECM interface.
Signed-off-by: Aleksander Morgado <aleksander@aleksander.es>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Cc: Sunil Goutham <sgoutham@cavium.com>
Cc: Robert Richter <rric@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: netdev@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains Netfilter/IPVS fixes for your net tree,
they are:
1) Fix packet drops due to incorrect ECN handling in IPVS, from Vadim
Fedorenko.
2) Fix splat with mark restoration in xt_socket with non-full-sock,
patch from Subash Abhinov Kasiviswanathan.
3) ipset bogusly bails out when adding IPv4 range containing more than
2^31 addresses, from Jozsef Kadlecsik.
4) Incorrect pernet unregistration order in ipset, from Florian Westphal.
5) Races between dump and swap in ipset results in BUG_ON splats, from
Ross Lagerwall.
6) Fix chain renames in nf_tables, from JingPiao Chen.
7) Fix race in pernet codepath with ebtables table registration, from
Artem Savkov.
8) Memory leak in error path in set name allocation in nf_tables, patch
from Arvind Yadav.
9) Don't dump chain counters if they are not available, this fixes a
crash when listing the ruleset.
10) Fix out of bound memory read in strlcpy() in x_tables compat code,
from Eric Dumazet.
11) Make sure we only process TCP packets in SYNPROXY hooks, patch from
Lin Zhang.
12) Cannot load rules incrementally anymore after xt_bpf with pinned
objects, added in revision 1. From Shmulik Ladkani.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2017-10-09
This series contains updates to ixgbe and arch/Kconfig.
Mark fixes a case where PHY register access is not supported and we were
returning a PHY address, when we should have been returning -EOPNOTSUPP.
Sabrina Dubroca fixes the use of a logical "and" when it should have been
the bitwise "and" operator.
Ding Tianhong reverts the commit that added the Kconfig bool option
ARCH_WANT_RELAX_ORDER, since there is now a new flag
PCI_DEV_FLAGS_NO_RELAXED_ORDERING that has been added to indicate that
Relaxed Ordering Attributes should not be used for Transaction Layer
Packets. Then follows up with making the needed changes to ixgbe to
use the new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag.
John Fastabend fixes an issue in the ring accounting when the transmit
ring parameters are changed via ethtool when an XDP program is attached.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit bc044e8db7 ("udp: perform source validation for
mcast early demux") does not take into account that broadcast packets
lands in the same code path and they need different checks for the
source address - notably, zero source address are valid for bcast
and invalid for mcast.
As a result, 2nd and later broadcast packets with 0 source address
landing to the same socket are dropped. This breaks dhcp servers.
Since we don't have stringent performance requirements for ingress
broadcast traffic, fix it by disabling UDP early demux such traffic.
Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Fixes: bc044e8db7 ("udp: perform source validation for mcast early demux")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It turns out that multiple places can call netlink_dump(), which means
it's still possible to dereference partially initialized values in
dump() that were the result of a faulty returned start().
This fixes the issue by calling start() _before_ setting cb_running to
true, so that there's no chance at all of hitting the dump() function
through any indirect paths.
It also moves the call to start() to be when the mutex is held. This has
the nice side effect of serializing invocations to start(), which is
likely desirable anyway. It also prevents any possible other races that
might come out of this logic.
In testing this with several different pieces of tricky code to trigger
these issues, this commit fixes all avenues that I'm aware of.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEExu3sM/nZ1eRSfR9Ha3t4Rpy0AB0FAlnbJz4ACgkQa3t4Rpy0
AB2R+Q//UgCNRjosPLsEgLNR9zBP/Kys7cxy2ZtazBhAqYF7bil2QTh9o+Q0PW1d
d9B/Dwo1lQhYe2D4qh6YoNimakdN0SfGViqLoXl4s28vC6ZQLFWfHgKP845VXQbC
6ihGsOG9TC2Xe5MIKXHf4VUPLCEQHBv7yWyRFOjVd+IJ3dfz2STi3tQTfApv6O2/
LXpERzgb9m3gj0DeGpU50dN7wpO+uUNX87cKLrByBwzS9qHQECcMB/d4eRsirljF
EOtmMBWg/KnBfT3jwjmjLBEFLDDrPEa1aQn1C4WdhowK6Fg65XeIeO1czLqm0wRL
NnWXeS7h1fywQ3+e8HJ3qDkAlBGvO3+uMORVQf5HNgETtQ8BpDvfDLJEU31D4UA9
vdPIy6L01fL2MMQw3H0j9YQHPIdKTKZdHhI7aX2Pd+UoihQwuooS+g/Pyrf18qrc
8FmVxo4Uflmm9/pqZ7YiNVOFTptwz81XHJBaTMfrjgTHdS2N6EyjCc2ucSwjXbXU
ma7nNlYgMloOXOncN5JraFEhtQCkQvtw9mPWcIdpmi97+sj7VT4kP+5KOeVD9vjl
VSyji5WMAn6bBwwHSnon3yGFJUXmW1NYO0H786iHs7QqmWwD4BjpP6GAfjwPVPbm
kCmfcVb1YWkSEKgmdImn1SUExvkjxdhIwY++Wt5rksbxa9JMczQ=
=WEgb
-----END PGP SIGNATURE-----
Merge tag 'mac80211-for-davem-2017-10-09' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
pull-request: mac80211 2017-10-09
The QCA folks found another netlink problem - we were missing validation
of some attributes. It's not super problematic since one can only read a
few bytes beyond the message (and that memory must exist), but here's the
fix for it.
I thought perhaps we can make nla_parse_nested() require a policy, but
given the two-stage validation/parsing in regular netlink that won't work.
Please pull and let me know if there's any problem.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net): ipsec 2017-10-09
1) Fix some error paths of the IPsec offloading API.
2) Fix a NULL pointer dereference when IPsec is used
with vti. From Alexey Kodanev.
3) Don't call xfrm_policy_cache_flush under xfrm_state_lock,
it triggers several locking warnings. From Artem Savkov.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
A recent patch removed the dst_free() on the allocated
dst_entry in ipv4_blackhole_route(). The dst_free() marked the
dst_entry as dead and added it to the gc list. I.e. it was setup
for a one time usage. As a result we may now have a blackhole
route cached at a socket on some IPsec scenarios. This makes the
connection unusable.
Fix this by marking the dst_entry directly at allocation time
as 'dead', so it is used only once.
Fixes: b838d5e1c5 ("ipv4: mark DST_NOGC and remove the operation of dst_free()")
Reported-by: Tobias Brunner <tobias@strongswan.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A recent patch removed the dst_free() on the allocated
dst_entry in ipv6_blackhole_route(). The dst_free() marked
the dst_entry as dead and added it to the gc list. I.e. it
was setup for a one time usage. As a result we may now have
a blackhole route cached at a socket on some IPsec scenarios.
This makes the connection unusable.
Fix this by marking the dst_entry directly at allocation time
as 'dead', so it is used only once.
Fixes: 587fea7411 ("ipv6: mark DST_NOGC and remove the operation of dst_free()")
Reported-by: Tobias Brunner <tobias@strongswan.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Changing the TX ring parameters with an XDP program attached may
cause the XDP queues to be cleared and the TX rings to be incorrectly
configured.
Fix by doing correct ring accounting in setup call.
Fixes: 33fdc82f08 ("ixgbe: add support for XDP_TX action")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The ixgbe driver use the compile check to determine if it can
send TLPs to Root Port with the Relaxed Ordering Attribute set,
this is too inconvenient, now the new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING
has been added to the kernel and we could check the bit4 in the PCIe
Device Control register to determine whether we should use the Relaxed
Ordering Attributes or not, so use this new way in the ixgbe driver.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Acked-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING has been added
to indicate that Relaxed Ordering Attributes (RO) should not
be used for Transaction Layer Packets (TLP) targeted toward
these affected Root Port, it will clear the bit4 in the PCIe
Device Control register, so the PCIe device drivers could
query PCIe configuration space to determine if it can send
TLPs to Root Port with the Relaxed Ordering Attributes set.
With this new flag we don't need the config ARCH_WANT_RELAX_ORDER
to control the Relaxed Ordering Attributes for the ixgbe drivers
just like the commit 1a8b6d76dc ("net:add one common config...") did,
so revert this commit.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In ixgbe_clear_udp_tunnel_port(), we read the IXGBE_VXLANCTRL register
and then try to mask some bits out of the value, using the logical
instead of bitwise and operator.
Fixes: a21d0822ff ("ixgbe: add support for geneve Rx offload")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In cases where PHY register access is not supported, don't mislead
a caller into thinking that it is supported by returning a PHY
address. Instead, return -EOPNOTSUPP when PHY access is not
supported.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Commit 2c16d60332 ("netfilter: xt_bpf: support ebpf") introduced
support for attaching an eBPF object by an fd, with the
'bpf_mt_check_v1' ABI expecting the '.fd' to be specified upon each
IPT_SO_SET_REPLACE call.
However this breaks subsequent iptables calls:
# iptables -A INPUT -m bpf --object-pinned /sys/fs/bpf/xxx -j ACCEPT
# iptables -A INPUT -s 5.6.7.8 -j ACCEPT
iptables: Invalid argument. Run `dmesg' for more information.
That's because iptables works by loading existing rules using
IPT_SO_GET_ENTRIES to userspace, then issuing IPT_SO_SET_REPLACE with
the replacement set.
However, the loaded 'xt_bpf_info_v1' has an arbitrary '.fd' number
(from the initial "iptables -m bpf" invocation) - so when 2nd invocation
occurs, userspace passes a bogus fd number, which leads to
'bpf_mt_check_v1' to fail.
One suggested solution [1] was to hack iptables userspace, to perform a
"entries fixup" immediatley after IPT_SO_GET_ENTRIES, by opening a new,
process-local fd per every 'xt_bpf_info_v1' entry seen.
However, in [2] both Pablo Neira Ayuso and Willem de Bruijn suggested to
depricate the xt_bpf_info_v1 ABI dealing with pinned ebpf objects.
This fix changes the XT_BPF_MODE_FD_PINNED behavior to ignore the given
'.fd' and instead perform an in-kernel lookup for the bpf object given
the provided '.path'.
It also defines an alias for the XT_BPF_MODE_FD_PINNED mode, named
XT_BPF_MODE_PATH_PINNED, to better reflect the fact that the user is
expected to provide the path of the pinned object.
Existing XT_BPF_MODE_FD_ELF behavior (non-pinned fd mode) is preserved.
References: [1] https://marc.info/?l=netfilter-devel&m=150564724607440&w=2
[2] https://marc.info/?l=netfilter-devel&m=150575727129880&w=2
Reported-by: Rafael Buchbinder <rafi@rbk.ms>
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In function {ipv4,ipv6}_synproxy_hook we expect a normal tcp packet, but
the real server maybe reply an icmp error packet related to the exist
tcp conntrack, so we will access wrong tcp data.
Fix it by checking for the protocol field and only process tcp traffic.
Signed-off-by: Lin Zhang <xiaolou4617@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When a bundling message is received, the function tipc_link_input()
calls function tipc_msg_extract() to unbundle all inner messages of
the bundling message before adding them to input queue.
The function tipc_msg_extract() just clones all inner skb for all
inner messagges from the bundling skb. This means that the skb
headroom of an inner message overlaps with the data part of the
preceding message in the bundle.
If the message in question is a name addressed message, it may be
subject to a secondary destination lookup, and eventually be sent out
on one of the interfaces again. But, since what is perceived as headroom
by the device driver in reality is the last bytes of the preceding
message in the bundle, the latter will be overwritten by the MAC
addresses of the L2 header. If the preceding message has not yet been
consumed by the user, it will evenually be delivered with corrupted
contents.
This commit fixes this by uncloning all messages passing through the
function tipc_msg_lookup_dest(), hence ensuring that the headroom
is always valid when the message is passed on.
Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We change the initialization of the skb transmit buffer queues
in the functions tipc_bcast_xmit() and tipc_rcast_xmit() to also
initialize their spinlocks. This is needed because we may, during
error conditions, need to call skb_queue_purge() on those queues
further down the stack.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When gso_size reset to zero for the tail segment in skb_segment(), later
in ipv6_gso_segment(), __skb_udp_tunnel_segment() and gre_gso_segment()
we will get incorrect results (payload length, pcsum) for that segment.
inet_gso_segment() already has a check for gso_size before calculating
payload.
The issue was found with LTP vxlan & gre tests over ixgbe NIC.
Fixes: 07b26c9454 ("gso: Support partial splitting at the frag_list pointer")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit fc922bb0dd ("mlxsw: spectrum_router: Use one LPM tree for
all virtual routers") I increased the scale of supported VRFs by having
all of them share the same LPM tree.
In order to avoid look-ups for prefix lengths that don't exist, each
route removal would trigger an aggregation across all the active virtual
routers to see which prefix lengths are in use and which aren't and
structure the tree accordingly.
With the way the data structures are currently laid out, this is a very
expensive operation. When preformed repeatedly - due to the invocation
of the abort mechanism - and with enough VRFs, this can result in a hung
task.
For now, avoid this optimization until it can be properly re-added in
net-next.
Fixes: fc922bb0dd ("mlxsw: spectrum_router: Use one LPM tree for all virtual routers")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Tested-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
while processing Rx = Ry instruction the verifier does
regs[insn->dst_reg] = regs[insn->src_reg]
which often clears write mark (when Ry doesn't have it)
that was just set by check_reg_arg(Rx) prior to the assignment.
That causes mark_reg_read() to keep marking Rx in this block as
REG_LIVE_READ (since the logic incorrectly misses that it's
screened by the write) and in many of its parents (until lucky
write into the same Rx or beginning of the program).
That causes is_state_visited() logic to miss many pruning opportunities.
Furthermore mark_reg_read() logic propagates the read mark
for BPF_REG_FP as well (though it's readonly) which causes
harmless but unnecssary work during is_state_visited().
Note that do_propagate_liveness() skips FP correctly,
so do the same in mark_reg_read() as well.
It saves 0.2 seconds for the test below
program before after
bpf_lb-DLB_L3.o 2604 2304
bpf_lb-DLB_L4.o 11159 3723
bpf_lb-DUNKNOWN.o 1116 1110
bpf_lxc-DDROP_ALL.o 34566 28004
bpf_lxc-DUNKNOWN.o 53267 39026
bpf_netdev.o 17843 16943
bpf_overlay.o 8672 7929
time ~11 sec ~4 sec
Fixes: dc503a8ad9 ("bpf/verifier: track liveness for pruning")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Edward Cree <ecree@solarflare.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Should be "802.3ad" like everywhere else in the document.
Signed-off-by: Axel Beckert <abe@deuxchevaux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 35e015e1f5 ("ipv6: fix net.ipv6.conf.all interface DAD handlers")
was intended to affect accept_dad flag handling in such a way that
DAD operation and mode on a given interface would be selected
according to the maximum value of conf/{all,interface}/accept_dad.
However, addrconf_dad_begin() checks for particular cases in which we
need to skip DAD, and this check was modified in the wrong way.
Namely, it was modified so that, if the accept_dad flag is 0 for the
given interface *or* for all interfaces, DAD would be skipped.
We have instead to skip DAD if accept_dad is 0 for the given interface
*and* for all interfaces.
Fixes: 35e015e1f5 ("ipv6: fix net.ipv6.conf.all interface DAD handlers")
Acked-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Reported-by: Erik Kline <ek@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A couple of serious fixes (use after free and blacklist for WRITE
SAME). One error leg fix (write_pending failure) and one user
experience problem (do not override max_sectors_kb) and one minor
unused function removal.
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJZ2P68AAoJEAVr7HOZEZN48hEP/0d7RH77AjV1smqQHJpel7b8
WFh7zWHfhyHEmDMf1xtepqw1RAsrkXfRy12wOOc3ppnaozBIh1GIvjHRWXtaEIaA
qsTJkROWro3XskxfKL8n0CqeATuk6EjfE+ehRFyXQ9F9yhIB4FxaYROzGUfM9rON
ScD2vHH0sE4PIUiavizjxSk6G4KNvGyqM/xtgUIymH7Dcd7MorOq1WBvGXp7etkG
QSCV1tvB63yg2jcqatANTLO0LuI9N023VGA/QrTLzpu6M54QZwMGAmVwZfkQ1/nO
RLGWTj6jrB4RSF690NN1QLRnf58GYEyIEa37Dlwp/bLyHv4Y9NxO3KB/M0MIf/x2
PJ5FmUw7IwPVzAk6WmGoUIvscDnrDplzVL0fMZKlnW3+8mQav+IIev7sBvPjMkMw
HA7PLNXrEpR1tOBhr9je2V2Jz9KxARZFRUqm238Rq03W6kjYQQbSG+dC06A7o2DQ
UYuXCWp+CZhbBSG29qPf8wzNbdndpkmXatwLrwVmmRn+/eo3BGF5/SfOKu0M8PTu
M4apqkZTZdMAzPckIe0lg6RIJ+F5lWPPX454CZqivM8MFNyjHbf2VJAvQbU9dNhM
dfrPsLogZNgCop13H06xAFS6m3dIP5YqUEo/yWXciC6hnvCP7z6ZkiqHTHJIIJMm
vwQJLkkB2Ex4NscJcZfw
=X/MK
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
- a couple of serious fixes: use after free and blacklist for WRITE
SAME
- one error leg fix: write_pending failure
- one user experience problem: do not override max_sectors_kb
- one minor unused function removal
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ibmvscsis: Fix write_pending failure path
scsi: libiscsi: Remove iscsi_destroy_session
scsi: libiscsi: Fix use-after-free race during iscsi_session_teardown
scsi: sd: Do not override max_sectors_kb sysfs setting
scsi: sd: Implement blacklist option for WRITE SAME w/ UNMAP
Pull i2c fixes from Wolfram Sang:
"I2C has three driver fixes for the newly introduced drivers and one ID
addition for the i801 driver"
* 'i2c/for-current-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: i2c-stm32f7: make structure stm32f7_setup static const
i2c: ensure termination of *_device_id tables
i2c: i801: Add support for Intel Cedar Fork
i2c: stm32f7: fix setup structure
- Fix driver strength selection when selecting hs400es
- Delete bounce buffer handling:
This change fixes a problem related to how bounce buffers are being
allocated. However, instead of trying to fix that, let's just remove
the mmc bounce buffer code altogether, as it has practically no use.
MMC host:
- meson-gx: A couple of fixes related to clock/phase/tuning
- sdhci-xenon: Fix clock resource by adding an optional bus clock
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZ2IGlAAoJEP4mhCVzWIwpi5gP/2dyF4FOK02BS24iV5LjElVV
TkwpfNWDYguU9eBwCSrDZnCBsfeKgRobpTlre5qtNqGK2YqAwEO9n5lsZQL8HG/6
8LyVKSy9a7DWbuiK58dvswHKabjeZXZo1JdBx1JONTSfezNGGL5ZvyiDS7mHWv10
MsX+VukX31kFmS1HOd85Ayhdz9NlYZjnNSfmUN8UMuGMMysV007icwp8QX1uwW1s
2uPS0DFdtOlCoSs1ln6cyQMSoRZjRJ5Dm/SUFvRec4X7LC3ORxVbBXtbxtDGO8dS
6TIZHPILMpTHJcah/ONAk1LTXHO5Wt+x5o6vkca6uaEQEnvyUhKqK2NpwXCINRit
OW+eJkAPv4J1a6/geZ99C7V+SDCCMsHeRrzfWxO1wkj/2ptu2OKtLPPJst2lLYgU
QEXTgW920SxSWvWQaXcBmgkGZ67cyw3h2pI09QsmZJ9M4jmQpyZVIGWHgfsMFRkj
iwtzuRL15qqnQqrs62eWJN383/b+BYfKzTnilVWExs+ozcpjYMYxYTGFTOKC1YV1
yeV+qL60gaK41HobWMJnbv0ckaPTLGZ1oOgB6F9OX6fGZz0LBh1yiOjYG1j66fwb
KgDzN5sX4Ab/gDOT8zH1G8fGLYGBLDZMciuXrZfYf+mzJF8rpTI0he8BUwJm6KGC
YJ+kW1MnnWATW+U63hRs
=0WJ+
-----END PGP SIGNATURE-----
Merge tag 'mmc-v4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"MMC core:
- Fix driver strength selection when selecting hs400es
- Delete bounce buffer handling:
This change fixes a problem related to how bounce buffers are being
allocated. However, instead of trying to fix that, let's just
remove the mmc bounce buffer code altogether, as it has practically
no use.
MMC host:
- meson-gx: A couple of fixes related to clock/phase/tuning
- sdhci-xenon: Fix clock resource by adding an optional bus clock"
* tag 'mmc-v4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-xenon: Fix clock resource by adding an optional bus clock
mmc: meson-gx: include tx phase in the tuning process
mmc: meson-gx: fix rx phase reset
mmc: meson-gx: make sure the clock is rounded down
mmc: Delete bounce buffer handling
mmc: core: add driver strength selection when selecting hs400es
- Suspend fix for Samsung Exynos SoCs where we need to keep clks on
across suspend
- Two critical clk markings for clks that shouldn't ever turn off on
Rockchip SoCs
- A fix for a copy-paste mistake on Rockchip rk3128 causing some clks to
touch the same bit and trample over one another
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJZ2A7QAAoJEK0CiJfG5JUlbeMQAJZ6rty/oz0mMSt0V621pQ+0
FJo9Kv+mmWfdZ/WMUgfmqhvty5S5BYW69dQFZuwB4eWVZ8rFa8ZLDi85sAxmd8Q2
mTnhwz1QhCiXbllTHTAx024h2CvTe/fScw0+SEXoTTCDnHVnnNwLcElb8YdBNUKF
dm1fd26hmZpqrW9vRKYuxa96+aSfRzS1DnrRyn9a+KSmA4XTTJkfU8W2qxCEn2Em
rkIBVdWlRGA4Xk/e2pct9Ov/CMiMMNWE3pHGdzS3FtLUd+c0ocs9XO/2NVvBJQ7R
AIiWmkdUTLEkRwos/u1JqvtPxXx+qouEV3hsdWfJQL65Hz0clOcbyh6DtytOu7vi
I7QxF92fkOl1kQhUSWkzdKCVnZklUSEJQkkibyDksiVENgk+UASWyVTE+INR12tT
7jO+aj/u1nMRlZ0lgTjvh0uWioZyL/+6DWSphuh0W6xcRsG4T2kTQDt463WHBkxM
YjZQurtUZN2SKnCzTNInHAjJ0agqkD3rR67yZTxSOFJM6Coeu2faoe/Jke7in7lV
HAMWpxonvRHt9P+wu8CNwuyR1z9ZdlUTrOLkB2tBoXF8WpEUCdK4yIbTD940uQvG
Od3ltW4HzyCRXl4tlcklDyGj08McxL4Cv8OYf4PjVq2XSWaxlVDhoXbZJG8OfFnN
CJgJVOw4JWB7tEMhpL9U
=jV1S
-----END PGP SIGNATURE-----
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
- build fix to export the clk_bulk_prepare() symbol
- suspend fix for Samsung Exynos SoCs where we need to keep clks on
across suspend
- two critical clk markings for clks that shouldn't ever turn off on
Rockchip SoCs
- a fix for a copy-paste mistake on Rockchip rk3128 causing some clks
to touch the same bit and trample over one another
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: samsung: exynos4: Enable VPLL and EPLL clocks for suspend/resume cycle
clk: Export clk_bulk_prepare()
clk: rockchip: add sclk_timer5 as critical clock on rk3128
clk: rockchip: fix up rk3128 pvtm and mipi_24m gate regs error
clk: rockchip: add pclk_pmu as critical clock on rk3128
- Updates for various platforms
- boot log updates for upcoming HS48 family of cores (dual issue)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZ1/PLAAoJEGnX8d3iisJehiAP/jBOk2hPMZHQrD9j2m1oCihb
LrV/gSPIRHQAKeCCcCSaIO1SSnpCkcoeL6w6LXMtH+og4wFS47KxCe+l7KKp9L3y
Btkc4JZL7GKa9Sk99KlllBMB7ysC+CzCCGpcuQC7AxCdFEBmkYvP8se7cVWVOMAV
ZcCI0K498T5N/3kFkpQEQJ1XcN5V+jNtEcvFUyZzHGyW17pBUaYc44/lQra+fSDP
iWsDD6a/lWZ6TntLR/JlCxKUWXo18ZgQUxe0c9mFO0cv27vvuWGLonts9PY6U9v0
M7Tc3AtxkUc4tHwzkOPrJDiLEtFHhYkpD2P2CIwwi/16ysA2XCYXsdkChTbTSQFs
+kjOs7QtG5NXT7LUp4lSLnOgVtkH88pAcfeujHNDwqJ5bOQRtxtb4XJP2zsYnVr8
ec1BLf1BRrq4W06/v5J1VmNP0CBVB7bZkJU0d+Q4OJMn11nFJmg1/7VT3EpB6T87
heQkXnTU8OuVYE/KYN7EIhqcrR7+rQL95BghJmevdtPkkQkuR+yoJCIJEsG/WDu9
OzS+gmGgeuAgIRaewGKlZsNN+TCAELdK8ZiKjaDDsyxrExQcEYgGRYh/IOR4ny0P
VDUwr3FrEr+jrt8mtaUrG9DalLXPxfFBrQO8QNJUfHTF197EIyuZiAZF9++pkyxb
QEk7uPIYOPujUXc25vkY
=Lcdb
-----END PGP SIGNATURE-----
Merge tag 'arc-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC udpates from Vineet Gupta:
- updates for various platforms
- boot log updates for upcoming HS48 family of cores (dual issue)
* tag 'arc-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: [plat-hsdk]: Add reset controller node to manage ethernet reset
ARC: [plat-hsdk]: Temporary fix to set CPU frequency to 1GHz
ARC: fix allnoconfig build warning
ARCv2: boot log: identify HS48 cores (dual issue)
ARC: boot log: decontaminate ARCv2 ISA_CONFIG register
arc: remove redundant UTS_MACHINE define in arch/arc/Makefile
ARC: [plat-eznps] Update platform maintainer as Noam left
ARC: [plat-hsdk] use actual clk driver to manage cpu clk
ARC: [*defconfig] Reenable soft lock-up detector
ARC: [plat-axs10x] sdio: Temporary fix of sdio ciu frequency
ARC: [plat-hsdk] sdio: Temporary fix of sdio ciu frequency
ARC: [plat-axs103] Add temporary quirk to reset ethernet IP
Pull block fixes from Jens Axboe:
"A collection of fixes for this series. This contains:
- NVMe pull request from Christoph, one uuid attribute fix, and one
fix for the controller memory buffer address for remapped BARs.
- use-after-free fix for bsg, from Benjamin Block.
- bcache race/use-after-free fix for a list traversal, fixing a
regression in this merge window. From Coly Li.
- null_blk change configfs dependency change from a 'depends' to a
'select'. This is a change from this merge window as well. From me.
- nbd signal fix from Josef, fixing a regression introduced with the
status code changes.
- nbd MAINTAINERS mailing list entry update.
- blk-throttle stall fix from Joseph Qi.
- blk-mq-debugfs fix from Omar, fixing an issue where we don't
register the IO scheduler debugfs directory, if the driver is
loaded with it. Only shows up if you switch through the sysfs
interface"
* 'for-linus' of git://git.kernel.dk/linux-block:
bsg-lib: fix use-after-free under memory-pressure
nvme-pci: Use PCI bus address for data/queues in CMB
blk-mq-debugfs: fix device sched directory for default scheduler
null_blk: change configfs dependency to select
blk-throttle: fix possible io stall when upgrade to max
MAINTAINERS: update list for NBD
nbd: fix -ERESTARTSYS handling
nvme: fix visibility of "uuid" ns attribute
bcache: use llist_for_each_entry_safe() in __closure_wake_up()