Commit Graph

685577 Commits

Author SHA1 Message Date
Christos Gkekas
5614fd84eb netxen_nic: Remove unused pointer hdr in netxen_setup_minidump()
Pointer hdr in netxen_setup_minidump() is set but never used, thus
should be removed.

Signed-off-by: Christos Gkekas <chris.gekas@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:38:02 -07:00
David S. Miller
4b821cb200 Merge branch 'vxlan-geneve-fix-hlist-corruption'
Jiri Benc says:

====================
vxlan, geneve: fix hlist corruption

Fix memory corruption introduced with the support of both IPv4 and IPv6
sockets in a single device. The same bug is present in VXLAN and Geneve.
====================

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:36:42 -07:00
Jiri Benc
4b4c21fad6 geneve: fix hlist corruption
It's not a good idea to add the same hlist_node to two different hash lists.
This leads to various hard to debug memory corruptions.

Fixes: 8ed66f0e82 ("geneve: implement support for IPv6-based tunnels")
Cc: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:36:27 -07:00
Jiri Benc
69e766612c vxlan: fix hlist corruption
It's not a good idea to add the same hlist_node to two different hash lists.
This leads to various hard to debug memory corruptions.

Fixes: b1be00a6c3 ("vxlan: support both IPv4 and IPv6 sockets in a single vxlan device")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:36:27 -07:00
Or Gerlitz
c1c1d86bde net/mlxfw: Properly handle dependancy with non-loadable mlx5
If mlx5 is set to be built-in and mlxfw as a module, we
get a link error:

drivers/built-in.o: In function `mlx5_firmware_flash':
(.text+0x5aed72): undefined reference to `mlxfw_firmware_flash'

Since we don't want to mandate selecting mlxfw for mlx5 users, we
use the IS_REACHABLE macro to make sure that a stub is exposed
to the caller.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Jakub Kicinski <kubakici@wp.pl>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:32:25 -07:00
David S. Miller
b2c9c5df66 iucv: Convert sk_wmem_alloc accesses to refcount_t.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:31:22 -07:00
David S. Miller
bba5850c8b ctcm_fsms: Convert skb->user accesses to refcount_t
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:29:57 -07:00
David S. Miller
63d7c880c5 Merge branch 'bpf-misc-helper-verifier-improvements'
Daniel Borkmann says:

====================
Misc BPF helper/verifier improvements

Miscellanous improvements I still had in my queue, it adds a new
bpf_skb_adjust_room() helper for cls_bpf, exports to fdinfo whether
tail call array owner is JITed, so iproute2 error reporting can be
improved on that regard, a small cleanup and extension to trace
printk, two verifier patches, one to make the code around narrower
ctx access a bit more straight forward and one to allow for imm += x
operations, that we've seen LLVM generating and the verifier currently
rejecting. We've included the patch 6 given it's rather small and
we ran into it from LLVM side, it would be great if it could be
queued for stable as well after the merge window. Last but not least,
test cases are added also related to imm alu improvement.

Thanks a lot!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:22:53 -07:00
Daniel Borkmann
6d191ed40d bpf: add various test cases for verifier selftest
Add couple of verifier test cases for x|imm += pkt_ptr, including the
imm += x extension.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:22:52 -07:00
John Fastabend
43188702b3 bpf, verifier: add additional patterns to evaluate_reg_imm_alu
Currently the verifier does not track imm across alu operations when
the source register is of unknown type. This adds additional pattern
matching to catch this and track imm. We've seen LLVM generating this
pattern while working on cilium.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:22:52 -07:00
John Fastabend
7bda4b40c5 bpf: extend bpf_trace_printk to support %i
Currently, bpf_trace_printk does not support common formatting
symbol '%i' however vsprintf does and is what eventually gets
called by bpf helper. If users are used to '%i' and currently
make use of it, then bpf_trace_printk will just return with
error without dumping anything to the trace pipe, so just add
support for '%i' to the helper.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:22:52 -07:00
Daniel Borkmann
9780c0ab1a bpf: export whether tail call has jited owner
We do export through fdinfo already whether a prog is JITed or not,
given a program load can fail in case of either prog or tail call map
has JITed property, but neither both are JITed or not JITed, we can
facilitate error reporting in loaders like iproute2 through exporting
owner_jited of tail call map. We already do export owner_prog_type
through this facility, so parser can pick up both for comparison.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:22:52 -07:00
Daniel Borkmann
f96da09473 bpf: simplify narrower ctx access
This work tries to make the semantics and code around the
narrower ctx access a bit easier to follow. Right now
everything is done inside the .is_valid_access(). Offset
matching is done differently for read/write types, meaning
writes don't support narrower access and thus matching only
on offsetof(struct foo, bar) is enough whereas for read
case that supports narrower access we must check for
offsetof(struct foo, bar) + offsetof(struct foo, bar) +
sizeof(<bar>) - 1 for each of the cases. For read cases of
individual members that don't support narrower access (like
packet pointers or skb->cb[] case which has its own narrow
access logic), we check as usual only offsetof(struct foo,
bar) like in write case. Then, for the case where narrower
access is allowed, we also need to set the aux info for the
access. Meaning, ctx_field_size and converted_op_size have
to be set. First is the original field size e.g. sizeof(<bar>)
as in above example from the user facing ctx, and latter
one is the target size after actual rewrite happened, thus
for the kernel facing ctx. Also here we need the range match
and we need to keep track changing convert_ctx_access() and
converted_op_size from is_valid_access() as both are not at
the same location.

We can simplify the code a bit: check_ctx_access() becomes
simpler in that we only store ctx_field_size as a meta data
and later in convert_ctx_accesses() we fetch the target_size
right from the location where we do convert. Should the verifier
be misconfigured we do reject for BPF_WRITE cases or target_size
that are not provided. For the subsystems, we always work on
ranges in is_valid_access() and add small helpers for ranges
and narrow access, convert_ctx_accesses() sets target_size
for the relevant instruction.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:22:52 -07:00
Daniel Borkmann
2be7e212d5 bpf: add bpf_skb_adjust_room helper
This work adds a helper that can be used to adjust net room of an
skb. The helper is generic and can be further extended in future.
Main use case is for having a programmatic way to add/remove room to
v4/v6 header options along with cls_bpf on egress and ingress hook
of the data path. It reuses most of the infrastructure that we added
for the bpf_skb_change_type() helper which can be used in nat64
translations. Similarly, the helper only takes care of adjusting the
room so that related data is populated and csum adapted out of the
BPF program using it.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:22:52 -07:00
Daniel Borkmann
0daf434940 bpf, net: add skb_mac_header_len helper
Add a small skb_mac_header_len() helper similarly as the
skb_network_header_len() we have and replace open coded
places in BPF's bpf_skb_change_proto() helper. Will also
be used in upcoming work.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:22:51 -07:00
Tore Anderson
a68491f895 net: cdc_mbim: apply "NDP to end" quirk to HP lt4132
The HP lt4132 LTE/HSPA+ 4G Module (03f0:a31d) is a rebranded Huawei
ME906s-158 device. It, like the ME906s-158, requires the "NDP to end"
quirk for correct operation.

Signed-off-by: Tore Anderson <tore@fud.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:19:36 -07:00
Matteo Croce
75674c4cbb Documentation: fix wrong example command
In the IPVLAN documentation there is an example command line where the
master and slave interface names are inverted.
Fix the command line and also add the optional `name' keyword to better
describe what the command is doing.

v2: added commit message

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:08:34 -07:00
Sabrina Dubroca
889ce937c9 vxlan: correctly set vxlan->net when creating the device in a netns
Commit a985343ba9 ("vxlan: refactor verification and application of
configuration") modified vxlan device creation, and replaced the
assignment of vxlan->net to src_net with dev_net(netdev) in ->setup().

But dev_net(netdev) is not the same as src_net. At the time ->setup()
is called, dev_net hasn't been set yet, so we end up creating the
socket for the vxlan device in init_net.

Fix this by bringing back the assignment of vxlan->net during device
creation.

Fixes: a985343ba9 ("vxlan: refactor verification and application of configuration")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:04:10 -07:00
David S. Miller
c3b99db809 Merge branch 'hns-phy-loopback'
Lin Yun Sheng says:

====================
Add loopback support in phy_driver and hns ethtool fix

This Patch Set add set_loopback in phy_driver and use it to setup loopback
when doing ethtool phy self_test.

Patch V8:
	Respin the Patch based on net-next

Patch V7:
	1. Add comment why resume the phy in hns_nic_config_phy_loopback.
	2. Fix a typo error in patch description.

Patch V6:
	Fix Or'ing error code in __lb_setup.

Patch V5:
	Removing non loopback related code change.

Patch V4:
	1. Remove c45 checking
	2. Add -ENOTSUPP when function pointer is null,
	   take mutex in phy_loopback.

Patch V3:
	Calling phy_loopback enable and disable in pair in hns mac driver.

Patch V2:
	1. Add phy_loopback in phy_device.c.
	2. Do error checking and do the read and write once in
	   genphy_loopback.
	3. Remove gen10g_loopback in phy_device.c.

Patch V1:
	Initial Submit
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:01:16 -07:00
Lin Yun Sheng
67cd9a997f net: hns: Use phy_driver to setup Phy loopback
Use function set_loopback in phy_driver to setup phy loopback
when doing ethtool self test.

Signed-off-by: Lin Yun Sheng <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:01:15 -07:00
Lin Yun Sheng
f0f9b4ed23 net: phy: Add phy loopback support in net phy framework
This patch add set_loopback in phy_driver, which is used by MAC
driver to enable or disable phy loopback. it also add a generic
genphy_loopback function, which use BMCR loopback bit to enable
or disable loopback.

Signed-off-by: Lin Yun Sheng <linyunsheng@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:01:15 -07:00
Stephen Rothwell
6992c6c5dd net/mlx5: fix memcpy limit?
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 01:57:27 -07:00
Sabrina Dubroca
ec8add2a4c ipv6: dad: don't remove dynamic addresses if link is down
Currently, when the link for $DEV is down, this command succeeds but the
address is removed immediately by DAD (1):

    ip addr add 1111::12/64 dev $DEV valid_lft 3600 preferred_lft 1800

In the same situation, this will succeed and not remove the address (2):

    ip addr add 1111::12/64 dev $DEV
    ip addr change 1111::12/64 dev $DEV valid_lft 3600 preferred_lft 1800

The comment in addrconf_dad_begin() when !IF_READY makes it look like
this is the intended behavior, but doesn't explain why:

     * If the device is not ready:
     * - keep it tentative if it is a permanent address.
     * - otherwise, kill it.

We clearly cannot prevent userspace from doing (2), but we can make (1)
work consistently with (2).

addrconf_dad_stop() is only called in two cases: if DAD failed, or to
skip DAD when the link is down. In that second case, the fix is to avoid
deleting the address, like we already do for permanent addresses.

Fixes: 3c21edbd11 ("[IPV6]: Defer IPv6 device initialization until the link becomes ready.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 01:53:51 -07:00
Jim Baxter
e1069bbfcf net: cdc_ncm: Reduce memory use when kernel memory low
The CDC-NCM driver can require large amounts of memory to create
skb's and this can be a problem when the memory becomes fragmented.

This especially affects embedded systems that have constrained
resources but wish to maximise the throughput of CDC-NCM with 16KiB
NTB's.

The issue is after running for a while the kernel memory can become
fragmented and it needs compacting.
If the NTB allocation is needed before the memory has been compacted
the atomic allocation can fail which can cause increased latency,
large re-transmissions or disconnections depending upon the data
being transmitted at the time.
This situation occurs for less than a second until the kernel has
compacted the memory but the failed devices can take a lot longer to
recover from the failed TX packets.

To ease this temporary situation I modified the CDC-NCM TX path to
temporarily switch into a reduced memory mode which allocates an NTB
that will fit into a USB_CDC_NCM_NTB_MIN_OUT_SIZE (default 2048 Bytes)
sized memory block and only transmit NTB's with a single network frame
until the memory situation is resolved.
Each time this issue occurs we wait for an increasing number of
reduced size allocations before requesting a full size one to not
put additional pressure on a low memory system.

Once the memory is compacted the CDC-NCM data can resume transmitting
at the normal tx_max rate once again.

Signed-off-by: Jim Baxter <jim_baxter@mentor.com>
Reviewed-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 01:50:49 -07:00
David S. Miller
2da95be940 Merge branch 'qed-Add-iWARP-support-for-QL4xxxx'
Michal Kalderon says:

====================
qed: Add iWARP support for QL4xxxx

This patch series adds iWARP support to our QL4xxxx networking adapters.
The code changes span across qed and qedr drivers, but this series contains
changes to qed only. Once the series is accepted, the qedr series will
be submitted to the rdma tree.
There is one additional qed patch which enables the iWARP, this patch is
delayed until the qedr series will be accepted.

The patches were previously sent as an RFC, and these are the first 12
patches in the RFC series:
https://www.spinics.net/lists/linux-rdma/msg51416.html

This series was tested and built against net-next.

MAINTAINERS file is not updated in this PATCH as there is a pending patch
for qedr driver update https://patchwork.kernel.org/patch/9752761.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 01:43:46 -07:00
Kalderon, Michal
93c45984d3 qed: Add iWARP support for physical queue allocation
iWARP has different physical queue requirements than RoCE

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 01:43:45 -07:00
Kalderon, Michal
5d7dc9620d qed: Add iWARP protocol support in context allocation
When computing how much memory is required for the different hw clients
iWARP protocol should be taken into account

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 01:43:45 -07:00
Kalderon, Michal
9816b61434 qed: iWARP CM add error handling
This patch introduces error handling for errors that occurred during
connection establishment.

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 01:43:45 -07:00
Kalderon, Michal
fc4c6065e6 qed: iWARP implement disconnect flows
This patch takes care of active/passive disconnect flows.
Disconnect flows can be initiated remotely, in which case a async event
will arrive from peer and indicated to qedr driver. These
are referred to as exceptions. When a QP is destroyed, it needs to check
that it's associated ep has been closed.

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 01:43:45 -07:00
Kalderon, Michal
4b0fdd7c8b qed: iWARP CM add active side connect
This patch implements the active side connect.
Offload a connection, process MPA reply and send RTR.
In some of the common passive/active functions, the active side
will work in blocking mode.

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 01:43:45 -07:00
Kalderon, Michal
456a584947 qed: iWARP CM add passive side connect
This patch implements the passive side connect.
It addresses pre-allocating resources, creating a connection
element upon valid SYN packet received. Calling upper layer and
implementation of the accept/reject calls.

Error handling is not part of this patch.

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 01:43:45 -07:00
Kalderon, Michal
65a91a6cdb qed: iWARP CM add listener functions and initial SYN processing
This patch adds the ability to add and remove listeners and identify
whether the SYN packet received is intended for iWARP or not. If
a listener is not found the SYN packet is posted back to the chip.

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 01:43:45 -07:00
Kalderon, Michal
b5c29ca7da qed: iWARP CM - setup a ll2 connection for handling SYN packets
iWARP handles incoming SYN packets using the ll2 interface. This patch
implements ll2 setup and teardown. Additional ll2 connections will
be used in the future which are not part of this patch series.

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 01:43:45 -07:00
Kalderon, Michal
cc4ad324e7 qed: Add iWARP support in ll2 connections
Add a new connection type for iWARP ll2 connections for setting
correct ll2 filters and connection type to FW.

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 01:43:44 -07:00
Kalderon, Michal
526d1d05e4 qed: Rename some ll2 related defines
Make some names more generic as they will be used by iWARP too.

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 01:43:44 -07:00
Kalderon, Michal
67b40dccc4 qed: Implement iWARP initialization, teardown and qp operations
This patch adds iWARP support for flows that have common code
between RoCE and iWARP, such as initialization, teardown and
qp setup verbs: create, destroy, modify, query.
It introduces the iWARP specific files qed_iwarp.[ch] and
iwarp_common.h

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 01:43:44 -07:00
Kalderon, Michal
c851a9dc43 qed: Introduce iWARP personality
iWARP personality introduced the need for differentiating in several
places in the code whether we are RoCE, iWARP or either. This
leads to introducing new macros for querying the personality.

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 01:43:44 -07:00
Linus Torvalds
6f7da29041 Linux 4.12 2017-07-02 16:07:02 -07:00
Sylvain 'ythier' Hitier
401e000ab9 moduleparam: fix doc: hwparam_irq configures an IRQ
Signed-off-by: Sylvain 'ythier' Hitier <sylvain.hitier@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-02 15:37:23 -07:00
Lawrence Brakmo
a5192c5237 bpf: fix to bpf_setsockops
Fixed build error due to misplaced "#ifdef CONFIG_INET" (moved 1
statement up).

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-02 15:16:52 -07:00
Helge Deller
247462316f parisc: Report SIGSEGV instead of SIGBUS when running out of stack
When a process runs out of stack the parisc kernel wrongly faults with SIGBUS
instead of the expected SIGSEGV signal.

This example shows how the kernel faults:
do_page_fault() command='a.out' type=15 address=0xfaac2000 in libc-2.24.so[f8308000+16c000]
trap #15: Data TLB miss fault, vm_start = 0xfa2c2000, vm_end = 0xfaac2000

The vma->vm_end value is the first address which does not belong to the vma, so
adjust the check to include vma->vm_end to the range for which to send the
SIGSEGV signal.

This patch unbreaks building the debian libsigsegv package.

Cc: stable@vger.kernel.org
Signed-off-by: Helge Deller <deller@gmx.de>
2017-07-02 22:27:08 +02:00
Eric Biggers
b0f94efd5a parisc: use compat_sys_keyctl()
Architectures with a compat syscall table must put compat_sys_keyctl()
in it, not sys_keyctl().  The parisc architecture was not doing this;
fix it.

Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Helge Deller <deller@gmx.de>
2017-07-02 22:10:47 +02:00
Linus Torvalds
79c4968169 Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS fixes from Ralf Baechle:
 "Here's a final round of fixes for 4.12:

   - Fix misordered instructions in assembly code making kenel startup
     via UHB unreliable.

   - Fix special case of MADDF and MADDF emulation.

   - Fix alignment issue in address calculation in pm-cps on 64 bit.

   - Fix IRQ tracing & lockdep when rescheduling

   - Systems with MAARs require post-DMA cache flushes.

  The reordering fix and the MADDF/MSUBF fix have sat in linux-next for
  a number of days. The others haven't propagated from my pull tree to
  linux-next yet but all have survived manual testing and Imagination's
  automated test system and there are no pending bug reports"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: Avoid accidental raw backtrace
  MIPS: Perform post-DMA cache flushes on systems with MAARs
  MIPS: Fix IRQ tracing & lockdep when rescheduling
  MIPS: pm-cps: Drop manual cache-line alignment of ready_count
  MIPS: math-emu: Handle zero accumulator case in MADDF and MSUBF separately
  MIPS: head: Reorder instructions missing a delay slot
2017-07-02 11:53:44 -07:00
Linus Torvalds
3a61a54cd7 Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fix from Russell King:
 "One final fix for 4.12 - Doug found a boot failure case triggered by
  requesting a non-even MB vmalloc size"

* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 8685/1: ensure memblock-limit is pmd-aligned
2017-07-02 10:09:40 -07:00
Kees Cook
5d6dec6fba locking/refcount: Remove the half-implemented refcount_sub() API
CONFIG_REFCOUNT_FULL=y (correctly) does not provide a refcount_sub(),
which should not be part of proper refcount design patterns.

Remove the erroneous extern and the later !CONFIG_REFCOUNT_FULL
accidental implementation.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 29dee3c03a ("locking/refcounts: Out-of-line everything")
Link: http://lkml.kernel.org/r/20170701180129.GA17405@beast
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-02 11:24:36 +02:00
David S. Miller
bcdb239b45 Merge branch 'bpf-Add-support-for-sock_ops'
Lawrence Brakmo says:

====================
bpf: Add support for sock_ops

Created a new BPF program type, BPF_PROG_TYPE_SOCK_OPS, and a corresponding
struct that allows BPF programs of this type to access some of the
socket's fields (such as IP addresses, ports, etc.) and setting
connection parameters such as buffer sizes, initial window, SYN/SYN-ACK
RTOs, etc.

Unlike current BPF program types that expect to be called at a particular
place in the network stack code, SOCK_OPS program can be called at
different places and use an "op" field to indicate the context. There
are currently two types of operations, those whose effect is through
their return value and those whose effect is through the new
bpf_setsocketop BPF helper function.

Example operands of the first type are:
  BPF_SOCK_OPS_TIMEOUT_INIT
  BPF_SOCK_OPS_RWND_INIT
  BPF_SOCK_OPS_NEEDS_ECN

Example operands of the secont type are:
  BPF_SOCK_OPS_TCP_CONNECT_CB
  BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB
  BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB

Current operands are only called during connection establishment so
there should not be any BPF overheads after connection establishment. The
main idea is to use connection information form both hosts, such as IP
addresses and ports to allow setting of per connection parameters to
optimize the connection's peformance.

Alghough there are already 3 mechanisms to set parameters (sysctls,
route metrics and setsockopts), this new mechanism provides some
disticnt advantages. Unlike sysctls, it can set parameters per
connection. In contrast to route metrics, it can also use port numbers
and information provided by a user level program. In addition, it could
set parameters probabilistically for evaluation purposes (i.e. do
something different on 10% of the flows and compare results with the
other 90% of the flows). Also, in cases where IPv6 addresses contain
geographic information, the rules to make changes based on the distance
(or RTT) between the hosts are much easier than route metric rules and
can be global. Finally, unlike setsockopt, it does not require
application changes and it can be updated easily at any time.

It uses the existing bpf cgroups infrastructure so the programs can be
attached per cgroup with full inheritance support. Although the bpf cgroup
framework already contains a sock related program type (BPF_PROG_TYPE_CGROUP_SOCK),
I created the new type (BPF_PROG_TYPE_SOCK_OPS) beccause the existing type
expects to be called only once during the connections's lifetime. In contrast,
the new program type will be called multiple times from different places in the
network stack code.  For example, before sending SYN and SYN-ACKs to set
an appropriate timeout, when the connection is established to set congestion
control, etc. As a result it has "op" field to specify the type of operation
requested.

This patch set also includes sample BPF programs to demostrate the differnet
features.

v2: Formatting changes, rebased to latest net-next

v3: Fixed build issues, changed socket_ops to sock_ops throught,
    fixed formatting issues, removed the syscall to load sock_ops
    program and added functionality to use existing bpf attach and
    bpf detach system calls, removed reader/writer locks in
    sock_bpfops.c (used when saving sock_ops global program)
    and fixed missing module refcount increment.

v4: Removed global sock_ops program and instead used existing cgroup bpf
    infrastructure to support a new BPF_CGROUP_ATTCH type.

v5: fixed kbuild warning happening in bpf-cgroup.h
    removed automatic converstion to host byte order from some sock_ops
      fields (ipv4 and ipv6 addresses, remote port)
    Added conversion to host byte order in some of the sample programs
    Added to sample BPF program comments about using load_sock_ops to load
    Removed is_req_sock field from bpf_sock_ops_kern and related places,
      using sk_fullsock() instead.

v6: fixes to BPF helper function setsockopt (possible NULL deferencing, etc.)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01 16:15:15 -07:00
Lawrence Brakmo
04df41e343 bpf: update tools/include/uapi/linux/bpf.h
Update tools/include/uapi/linux/bpf.h to include changes related to new
bpf sock_ops program type.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01 16:15:14 -07:00
Lawrence Brakmo
6c4a01b278 bpf: Sample bpf program to set sndcwnd clamp
Sample BPF program, tcp_clamp_kern.c, to demostrate the use
of setting the sndcwnd clamp. This program assumes that if the
first 5.5 bytes of the host's IPv6 addresses are the same, then
the hosts are in the same datacenter and sets sndcwnd clamp to
100 packets, SYN and SYN-ACK RTOs to 10ms and send/receive buffer
sizes to 150KB.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01 16:15:14 -07:00
Lawrence Brakmo
13bf96411a bpf: Adds support for setting sndcwnd clamp
Adds a new bpf_setsockopt for TCP sockets, TCP_BPF_SNDCWND_CLAMP, which
sets the initial congestion window. It is useful to limit the sndcwnd
when the host are close to each other (small RTT).

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01 16:15:14 -07:00
Lawrence Brakmo
7bc62e2854 bpf: Sample BPF program to set initial cwnd
Sample BPF program that assumes hosts are far away (i.e. large RTTs)
and sets initial cwnd and initial receive window to 40 packets,
send and receive buffers to 1.5MB.

In practice there would be a test to insure the hosts are actually
far enough away.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01 16:15:14 -07:00