This is a revert of the EMAC bindings. The discussion has not settled down
yet on a proper representation of the PHY, and therefore we cannot commit
to a binding yet
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZo+FRAAoJEBx+YmzsjxAgarAP/3UNFp1qce9IorGf9YOYFRcf
YVFf/laHpEsbjGC2iDDCMSrW34hx1Jsmnci8Pp9QM6d5vzDg4F41MW1T6Vx6R72v
DioMLctVi6nHUmJa0nJuiwhMQUmu+9gn7qFYk1N/bBFmKNrrettNv6kb929a+U29
5xiBNzaV7lVOPLJleOcAjRhxndsCcoSQ7r5yUQ04L0D2VItSw+tkQg9RVCeW/wxZ
mGgXUSSvz0IMQ9bAyczG8RRlT2tLXRyXqJ5ktPJbxigV9OehMQOsShZZvzTiisbp
pQxw5NoWuy4Bs1dnSW9+FmkQc+ikhD+jl1l8e/dFN8OS6YmBZYSWsJ7usdxMmxx/
iiVPFFg0cheDlnu3njokBkOOUiEwYcM662bXDhXe8nZlVlZHfomq50JW89PVdM9F
6hXPO38uYGf8qjD35N55mU97iLRZawHTeBHvH1uhOCRxxAVYpgEFMyWSVXjDVzZv
SncMlMqx0wDtc11G72aF3TL6Mvfcp88X52l0otvAiQbjLltOSxUuDLXQd/ydVE+p
jRJXjYPxQa+TKg/36WGe2gdLIONiV39NSpy951gJR96OI4/CcuSkk8n1oMmMYBbB
2JMOC3q3KypuHOUXyw1fyZdZvgvQXWOwiETXqp7VnCIHP0RRXEKJC/41vLBwhlb0
0YNQmFIbjzIFGFN6PjsX
=0xy6
-----END PGP SIGNATURE-----
Merge tag 'sunxi-fixes-for-4.13-3' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into fixes
Allwinner fixes for 4.13, take 3
This is a revert of the EMAC bindings. The discussion has not settled down
yet on a proper representation of the PHY, and therefore we cannot commit
to a binding yet
* tag 'sunxi-fixes-for-4.13-3' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
arm: dts: sunxi: Revert EMAC changes
arm64: dts: allwinner: Revert EMAC changes
dt-bindings: net: Revert sun8i dwmac binding
Signed-off-by: Olof Johansson <olof@lixom.net>
The phy is connected at early stage of probe but not properly
disconnected if error occurs. This patch fixes the issue.
Also changing the return type of xgene_enet_check_phy_handle(),
since this function always returns success.
Signed-off-by: Quan Nguyen <qnguyen@apm.com>
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both the nfp_net_pf_app_start() and the nfp_net_pci_probe() functions
call nfp_net_pf_app_stop_ctrl(pf) so there is a double free. The free
should be done from the probe function because it's allocated there so
I have removed the call from nfp_net_pf_app_start().
Fixes: 02082701b9 ("nfp: create control vNICs and wire up rx/tx")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Belous says:
====================
net:ethernet:aquantia: Atlantic driver Update 2017-08-23
This series contains updates for aQuantia Atlantic driver.
It has bugfixes and some improvements.
Changes in v2:
- "MCP state change" fix removed (will be sent as
a separate fix after further investigation.)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We should inform user about wrong firmware version
by printing message in dmesg.
Fixes: 3d2ff7eebe26 ("net: ethernet: aquantia: Atlantic hardware abstraction layer")
Signed-off-by: Pavel Belous <Pavel.Belous@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the HW supports up to 32 multicast filters we should
track count of multicast filters to avoid overflow.
If we attempt to add >32 multicast filter - just set NETIF_ALLMULTI flag
instead.
Fixes: 94f6c9e4cdf6 ("net: ethernet: aquantia: Support for NIC-specific code")
Signed-off-by: Igor Russkikh <Igor.Russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver choose the optimal interrupt throttling settings depends
of current link speed.
Due this bug link_status field from aq_hw is never updated and as result
always used same interrupt throttling values.
Fixes: 3d2ff7eebe26 ("net: ethernet: aquantia: Atlantic hardware abstraction layer")
Signed-off-by: Pavel Belous <Pavel.Belous@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The hardware has the HW Checksum Offload bug when small
TCP patckets (with length <= 60 bytes) has wrong "checksum valid" bit.
The solution is - ignore checksum valid bit for small packets
(with length <= 60 bytes) and mark this as CHECKSUM_NONE to allow
network stack recalculate checksum itself.
Fixes: ccf9a5ed14be ("net: ethernet: aquantia: Atlantic A0 and B0 specific functions.")
Signed-off-by: Pavel Belous <Pavel.Belous@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The number of RSS queues should be not more than numbers of CPU.
Its does not make sense to increase perfomance, and also cause problems on
some motherboards.
Fixes: 94f6c9e4cdf6 ("net: ethernet: aquantia: Support for NIC-specific code")
Signed-off-by: Pavel Belous <Pavel.Belous@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes datapath spinlocks which does not perform any
useful work.
Fixes: 6e70637f9f1e ("net: ethernet: aquantia: Add ring support code")
Signed-off-by: Pavel Belous <Pavel.Belous@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
... which may happen with certain values of tp_reserve and maclen.
Fixes: 58d19b19cd ("packet: vnet_hdr support for tpacket_rcv")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For a bond slave device as a tipc bearer, the dev represents the bond
interface and orig_dev represents the slave in tipc_l2_rcv_msg().
Since we decode the tipc_ptr from bonding device (dev), we fail to
find the bearer and thus tipc links are not established.
In this commit, we register the tipc protocol callback per device and
look for tipc bearer from both the devices.
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
vmwgfx currently cannot support non-blocking commit because when
vmw_*_crtc_page_flip is called, drm_atomic_nonblocking_commit()
schedules the update on a thread. This means vmw_*_crtc_page_flip
cannot rely on the new surface being bound before the subsequent
dirty and flush operations happen.
Cc: <stable@vger.kernel.org> # 4.12.x
Signed-off-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Compare the number of bytes actually seen on the wire to the byte
count field returned by the slave device.
Previously we just overwrote the byte count returned by the slave
with the real byte count and let the caller figure out if the
message was sane.
Signed-off-by: Stephen Douthit <stephend@adiengineering.com>
Tested-by: Dan Priamo <danp@adiengineering.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
According to Table 15-14 of the C2000 EDS (Intel doc #510524) the
rx data pointed to by the descriptor dptr contains the byte count.
desc->rxbytes reports all bytes read on the wire, including the
"byte count" byte. So if a device sends 4 bytes in response to a
block read, on the wire and in the DMA buffer we see:
count data1 data2 data3 data4
0x04 0xde 0xad 0xbe 0xef
That's what we want to return in data->block to the next level.
Instead we were actually prefixing that with desc->rxbytes:
bad
count count data1 data2 data3 data4
0x05 0x04 0xde 0xad 0xbe 0xef
This was discovered while developing a BMC solution relying on the
ipmi_ssif.c driver which was trying to interpret the bogus length
field as part of the IPMI response.
Signed-off-by: Stephen Douthit <stephend@adiengineering.com>
Tested-by: Dan Priamo <danp@adiengineering.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
This fixes compiler errors in perf such as:
tests/attr.c: In function 'store_event':
tests/attr.c:66:27: error: format '%llu' expects argument of type 'long long unsigned int', but argument 6 has type '__u64 {aka long unsigned int}' [-Werror=format=]
snprintf(path, PATH_MAX, "%s/event-%d-%llu-%d", dir,
^
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Tested-by: Michael Cree <mcree@orcon.net.nz>
Cc: stable@vger.kernel.org
Signed-off-by: Matt Turner <mattst88@gmail.com>
Commit 3cc2dac5be ("drivers/video/fbdev/atyfb: Replace MTRR UC hole
with strong UC") introduces calls to ioremap_wc and ioremap_uc. This
causes build failures with alpha:allmodconfig. Map the missing functions
to ioremap_nocache.
Fixes: 3cc2dac5be ("drivers/video/fbdev/atyfb:
Replace MTRR UC hole with strong UC")
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Since commit 71810db27c (modversions: treat symbol CRCs
as 32 bit quantities) R_ALPHA_REFLONG relocations can be required
to load modules. This implements it.
Tested-by: Bob Tracy <rct@gherkin.frus.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Michael Cree <mcree@orcon.net.nz>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Patch 8525023121 introduced a typo.
That said, the identity AND insns added by that patch are more
clearly written as MOV. At the same time, re-schedule the ev6
version so that the first dispatch can execute in parallel.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Matt Turner <mattst88@gmail.com>
There are direct branches between {str*cpy,str*cat} and stx*cpy.
Ensure the branches are within range by merging these objects.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Pull cgroup fix from Tejun Heo:
"A late but obvious fix for cgroup.
I broke the 'cpuset.memory_pressure' file a long time ago (v4.4) by
accidentally deleting its file index, which made it a duplicate of the
'cpuset.memory_migrate' file. Spotted and fixed by Waiman"
* 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cpuset: Fix incorrect memory_pressure control file mapping
Pull libata fixes from Tejun Heo:
"Late fixes for libata. There's a minor platform driver fix but the
important one is READ LOG PAGE.
This is a new ATA command which is used to test some optional features
but it broke probing of some devices - they locked up instead of
failing the unknown command.
Christoph tried blacklisting, but, after finding out there are
multiple devices which fail this way, backed off to testing feature
bit in IDENTIFY data first, which is a bit lossy (we can miss features
on some devices) but should be a lot safer"
* 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
Revert "libata: quirk read log on no-name M.2 SSD"
libata: check for trusted computing in IDENTIFY DEVICE data
libata: quirk read log on no-name M.2 SSD
sata: ahci-da850: Fix some error handling paths in 'ahci_da850_probe()'
ChunYu found a kernel warn_on during syzkaller fuzzing:
[40226.038539] WARNING: CPU: 5 PID: 23720 at net/ipv4/af_inet.c:152 inet_sock_destruct+0x78d/0x9a0
[40226.144849] Call Trace:
[40226.147590] <IRQ>
[40226.149859] dump_stack+0xe2/0x186
[40226.176546] __warn+0x1a4/0x1e0
[40226.180066] warn_slowpath_null+0x31/0x40
[40226.184555] inet_sock_destruct+0x78d/0x9a0
[40226.246355] __sk_destruct+0xfa/0x8c0
[40226.290612] rcu_process_callbacks+0xaa0/0x18a0
[40226.336816] __do_softirq+0x241/0x75e
[40226.367758] irq_exit+0x1f6/0x220
[40226.371458] smp_apic_timer_interrupt+0x7b/0xa0
[40226.376507] apic_timer_interrupt+0x93/0xa0
The warn_on happned when sk->sk_rmem_alloc wasn't 0 in inet_sock_destruct.
As after commit f970bd9e3a ("udp: implement memory accounting helpers"),
udp has changed to use udp_destruct_sock as sk_destruct where it would
udp_rmem_release all rmem.
But IPV6_ADDRFORM sockopt sets sk_destruct with inet_sock_destruct after
changing family to PF_INET. If rmem is not 0 at that time, and there is
no place to release rmem before calling inet_sock_destruct, the warn_on
will be triggered.
This patch is to fix it by not setting sk_destruct in IPV6_ADDRFORM sockopt
any more. As IPV6_ADDRFORM sockopt only works for tcp and udp. TCP sock has
already set it's sk_destruct with inet_sock_destruct and UDP has set with
udp_destruct_sock since they're created.
Fixes: f970bd9e3a ("udp: implement memory accounting helpers")
Reported-by: ChunYu Wang <chunwang@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net): ipsec 2017-08-29
1) Fix dst_entry refcount imbalance when using socket policies.
From Lorenzo Colitti.
2) Fix locking when adding the ESP trailers.
3) Fix tailroom calculation for the ESP trailer by using
skb_tailroom instead of skb_availroom.
4) Fix some info leaks in xfrm_user.
From Mathias Krause.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit aac2fea94f.
It turns out that that patch was complete and utter garbage, and broke
KVM, resulting in odd oopses.
Quoting Andrea Arcangeli:
"The aforementioned commit has 3 bugs.
1) mmu_notifier_invalidate_range cannot be used in replacement of
mmu_notifier_invalidate_range_start/end.
For KVM mmu_notifier_invalidate_range is a noop and rightfully so.
A MMU notifier implementation has to implement either
->invalidate_range method or the invalidate_range_start/end
methods, not both. And if you implement invalidate_range_start/end
like KVM is forced to do, calling mmu_notifier_invalidate_range in
common code is a noop for KVM.
For those MMU notifiers that can get away only implementing
->invalidate_range, the ->invalidate_range is implicitly called by
mmu_notifier_invalidate_range_end(). And only those secondary MMUs
that share the same pagetable with the primary MMU (like AMD
iommuv2) can get away only implementing ->invalidate_range.
So all cases (THP on/off) are broken right now.
To fix this is enough to replace mmu_notifier_invalidate_range with
mmu_notifier_invalidate_range_start;mmu_notifier_invalidate_range_end.
Either that or call multiple mmu_notifier_invalidate_page like
before.
2) address + (1UL << compound_order(page) is buggy, it should be
PAGE_SIZE << compound_order(page), it's bytes not pages, 2M not
512.
3) The whole invalidate_range thing was an attempt to call a single
invalidate while walking multiple 4k ptes that maps the same THP
(after a pmd virtual split without physical compound page THP
split).
It's unclear if the rmap_walk will always provide an address that
is 2M aligned as parameter to try_to_unmap_one, in presence of THP.
I think it needs also an address &= (PAGE_SIZE <<
compound_order(page)) - 1 to be safe"
In general, we should stop making excuses for horrible MMU notifier
users. It's much more important that the core VM is sane and safe, than
letting MMU notifiers sleep.
So if some MMU notifier is sleeping under a spinlock, we need to fix the
notifier, not try to make excuses for that garbage in the core VM.
Reported-and-tested-by: Bernhard Held <berny156@gmx.de>
Reported-and-tested-by: Adam Borowski <kilobyte@angband.pl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: axie <axie@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 35f0b6a779.
We now conditionalize issuing of READ LOG PAGE on the TRUSTED
COMPUTING SUPPORTED bit in the identity data and this shouldn't be
necessary.
Signed-off-by: Tejun Heo <tj@kernel.org>
ATA-8 and later mirrors the TRUSTED COMPUTING SUPPORTED bit in word 48 of
the IDENTIFY DEVICE data. Check this before issuing a READ LOG PAGE
command to avoid issues with buggy devices. The only downside is that
we can't support Security Send / Receive for a device with an older
revision due to the conflicting use of this field in earlier
specifications.
tj: The reason we need this is because some devices which don't
support READ LOG PAGE lock up after getting issued that command.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Pull xen-blkback fix from Konrad:
"[...] A bug-fix when shutting down xen block backend driver with
multiple queues and the driver not clearing all of them."
If a restartable syscall is called using the indirect o32 syscall
handler - eg: syscall(__NR_waitid, ...), then it is possible for the
incorrect arguments to be passed to the syscall after it has been
restarted. This is because the syscall handler tries to shift all the
registers down one place in pt_regs so that when the syscall is restarted,
the "real" syscall is called instead. Unfortunately it only shifts the
arguments passed in registers, not the arguments on the user stack. This
causes the 4th argument to be duplicated when the syscall is restarted.
Fix by removing all the pt_regs shifting so that the indirect syscall
handler is called again when the syscall is restarted. The comment "some
syscalls like execve get their arguments from struct pt_regs" is long
out of date so this should now be safe.
Signed-off-by: James Cowgill <James.Cowgill@imgtec.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Tested-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/15856/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Since commit 669c409222 ("MIPS: Give __secure_computing() access to
syscall arguments."), upon syscall entry when seccomp is enabled,
syscall_trace_enter() passes a carefully prepared struct seccomp_data
containing syscall arguments to __secure_computing(). Unfortunately it
directly uses mips_get_syscall_arg() and fails to take into account the
indirect O32 system calls (i.e. syscall(2)) which put the system call
number in a0 and have the arguments shifted up by one entry.
We can't just revert that commit as samples/bpf/tracex5 would break
again, so use syscall_get_arguments() which already takes indirect
syscalls into account instead of directly using mips_get_syscall_arg(),
similar to what populate_seccomp_data() does.
This also removes the redundant error checking of the
mips_get_syscall_arg() return value (get_user() already zeroes the
result if an argument from the stack can't be loaded).
Reported-by: James Cowgill <James.Cowgill@imgtec.com>
Fixes: 669c409222 ("MIPS: Give __secure_computing() access to syscall arguments.")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: David Daney <david.daney@cavium.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/16994/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
If a zero for the number of lines manages to slip through, scroll()
may underflow some offset calculations, causing accesses outside the
video memory.
Make the check in __putstr() more pessimistic to prevent that.
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1503858223-14983-1-git-send-email-jschoenh@amazon.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The current slack space is not enough for LZ4, which has a worst case
overhead of 0.4% for data that cannot be further compressed. With
an LZ4 compressed kernel with an embedded initrd, the output is likely
to overwrite the input.
Increase the slack space to avoid that.
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1503842124-29718-1-git-send-email-jschoenh@amazon.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When running perf on the ftrace:function tracepoint, there is a bug
which can be reproduced by:
perf record -e ftrace:function -a sleep 20 &
perf record -e ftrace:function ls
perf script
ls 10304 [005] 171.853235: ftrace:function:
perf_output_begin
ls 10304 [005] 171.853237: ftrace:function:
perf_output_begin
ls 10304 [005] 171.853239: ftrace:function:
task_tgid_nr_ns
ls 10304 [005] 171.853240: ftrace:function:
task_tgid_nr_ns
ls 10304 [005] 171.853242: ftrace:function:
__task_pid_nr_ns
ls 10304 [005] 171.853244: ftrace:function:
__task_pid_nr_ns
We can see that all the function traces are doubled.
The problem is caused by the inconsistency of the register
function perf_ftrace_event_register() with the probe function
perf_ftrace_function_call(). The former registers one probe
for every perf_event. And the latter handles all perf_events
on the current cpu. So when two perf_events on the current cpu,
the traces of them will be doubled.
So this patch adds an extra parameter "event" for perf_tp_event,
only send sample data to this event when it's not NULL.
Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: alexander.shishkin@linux.intel.com
Cc: huawei.libin@huawei.com
Link: http://lkml.kernel.org/r/1503668977-12526-1-git-send-email-zhouchengming1@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
While examining the kernel source code, I found a dangerous operation that
could turn into a double-fetch situation (a race condition bug) where the same
userspace memory region are fetched twice into kernel with sanity checks after
the first fetch while missing checks after the second fetch.
1. The first fetch happens in line 9573 get_user(size, &uattr->size).
2. Subsequently the 'size' variable undergoes a few sanity checks and
transformations (line 9577 to 9584).
3. The second fetch happens in line 9610 copy_from_user(attr, uattr, size)
4. Given that 'uattr' can be fully controlled in userspace, an attacker can
race condition to override 'uattr->size' to arbitrary value (say, 0xFFFFFFFF)
after the first fetch but before the second fetch. The changed value will be
copied to 'attr->size'.
5. There is no further checks on 'attr->size' until the end of this function,
and once the function returns, we lose the context to verify that 'attr->size'
conforms to the sanity checks performed in step 2 (line 9577 to 9584).
6. My manual analysis shows that 'attr->size' is not used elsewhere later,
so, there is no working exploit against it right now. However, this could
easily turns to an exploitable one if careless developers start to use
'attr->size' later.
To fix this, override 'attr->size' from the second fetch to the one from the
first fetch, regardless of what is actually copied in.
In this way, it is assured that 'attr->size' is consistent with the checks
performed after the first fetch.
Signed-off-by: Meng Xu <mengxu.gatech@gmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: alexander.shishkin@linux.intel.com
Cc: meng.xu@gatech.edu
Cc: sanidhya@gatech.edu
Cc: taesoo@gatech.edu
Link: http://lkml.kernel.org/r/1503522470-35531-1-git-send-email-meng.xu@gatech.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
ldt->entries[] is allocated in alloc_ldt_struct(). It has
ldt->nr_entries elements and ldt->nr_entries is capped at LDT_ENTRIES.
So if "idx" is == ldt->nr_entries then we're reading beyond the end of
the buffer. It seems duplicative to have two limit checks when one
would work just as well so I removed the check against LDT_ENTRIES.
The gdt_page.gdt[] array has GDT_ENTRIES entries.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-janitors@vger.kernel.org
Fixes: d07bdfd322 ("perf/x86: Fix USER/KERNEL tagging of samples properly")
Link: http://lkml.kernel.org/r/20170818102516.gqwm4xdvvuvjw5ho@mwanda
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit 3510ca20ec ("Minor page waitqueue cleanups") made the page
queue code always add new waiters to the back of the queue, which helps
upcoming patches to batch the wakeups for some horrid loads where the
wait queues grow to thousands of entries.
However, I forgot about the nasrt add_page_wait_queue() special case
code that is only used by the cachefiles code. That one still continued
to add the new wait queue entries at the beginning of the list.
Fix it, because any sane batched wakeup will require that we don't
suddenly start getting new entries at the beginning of the list that we
already handled in a previous batch.
[ The current code always does the whole list while holding the lock, so
wait queue ordering doesn't matter for correctness, but even then it's
better to add later entries at the end from a fairness standpoint ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When !NUMA, cpumask_of_node(@node) equals cpu_online_mask regardless of
@node. The assumption seems that if !NUMA, there shouldn't be more than
one node and thus reporting cpu_online_mask regardless of @node is
correct. However, that assumption was broken years ago to support
DISCONTIGMEM and whether a system has multiple nodes or not is
separately controlled by NEED_MULTIPLE_NODES.
This means that, on a system with !NUMA && NEED_MULTIPLE_NODES,
cpumask_of_node() will report cpu_online_mask for all possible nodes,
indicating that the CPUs are associated with multiple nodes which is an
impossible configuration.
This bug has been around forever but doesn't look like it has caused any
noticeable symptoms. However, it triggers a WARN recently added to
workqueue to verify NUMA affinity configuration.
Fix it by reporting empty cpumask on non-zero nodes if !NUMA.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Recent commit a8ec3ee861 "arc: Mask individual IRQ lines during core
INTC init" breaks interrupt handling on ARCv2 SMP systems.
That commit masked all interrupts at onset, as some controllers on some
boards (customer as well as internal), would assert interrutps early
before any handlers were installed. For SMP systems, the masking was
done at each cpu's core-intc. Later, when the IRQ was actually
requested, it was unmasked, but only on the requesting cpu.
For "common" interrupts, which were wired up from the 2nd level IDU
intc, this was as issue as they needed to be enabled on ALL the cpus
(given that IDU IRQs are by default served Round Robin across cpus)
So fix that by NOT masking "common" interrupts at core-intc, but instead
at the 2nd level IDU intc (latter already being done in idu_of_init())
Fixes: a8ec3ee861 ("arc: Mask individual IRQ lines during core INTC init")
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
[vgupta: reworked changelog, removed the extraneous idu_irq_mask_raw()]
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 464d62421c ("select: switch compat_{get,put}_fd_set() to
compat_{get,put}_bitmap()") changed the calculation on how many bytes
need to be zeroed when userspace handed over a NULL pointer for a fdset
array in the select syscall.
The calculation was changed in compat_get_fd_set() wrongly from
memset(fdset, 0, ((nr + 1) & ~1)*sizeof(compat_ulong_t));
to
memset(fdset, 0, ALIGN(nr, BITS_PER_LONG));
The ALIGN(nr, BITS_PER_LONG) calculates the number of _bits_ which need
to be zeroed in the target fdset array (rounded up to the next full bits
for an unsigned long).
But the memset() call expects the number of _bytes_ to be zeroed.
This leads to clearing more memory than wanted (on the stack area or
even at kmalloc()ed memory areas) and to random kernel crashes as we
have seen them on the parisc platform.
The correct change should have been
memset(fdset, 0, (ALIGN(nr, BITS_PER_LONG) / BITS_PER_LONG) * BYTES_PER_LONG);
which is the same as can be archieved with a call to
zero_fd_set(nr, fdset).
Fixes: 464d62421c ("select: switch compat_{get,put}_fd_set() to compat_{get,put}_bitmap()"
Acked-by:: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now it doesn't check for the cached route expiration in ipv6's
dst_ops->check(), because it trusts dst_gc that would clean the
cached route up when it's expired.
The problem is in dst_gc, it would clean the cached route only
when it's refcount is 1. If some other module (like xfrm) keeps
holding it and the module only release it when dst_ops->check()
fails.
But without checking for the cached route expiration, .check()
may always return true. Meanwhile, without releasing the cached
route, dst_gc couldn't del it. It will cause this cached route
never to expire.
This patch is to set dst.obsolete with DST_OBSOLETE_KILL in .gc
when it's expired, and check obsolete != DST_OBSOLETE_FORCE_CHK
in .check.
Note that this is even needed when ipv6 dst_gc timer is removed
one day. It would set dst.obsolete in .redirect and .update_pmtu
instead, and check for cached route expiration when getting it,
just like what ipv4 route does.
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit c5cff8561d adds rcu grace period before freeing fib6_node. This
generates a new sparse warning on rt->rt6i_node related code:
net/ipv6/route.c:1394:30: error: incompatible types in comparison
expression (different address spaces)
./include/net/ip6_fib.h:187:14: error: incompatible types in comparison
expression (different address spaces)
This commit adds "__rcu" tag for rt6i_node and makes sure corresponding
rcu API is used for it.
After this fix, sparse no longer generates the above warning.
Fixes: c5cff8561d ("ipv6: add rcu grace period before freeing fib6_node")
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Passing commands for logging to t4_record_mbox() with size
MBOX_LEN, when the actual command size is actually smaller,
causes out-of-bounds stack accesses in t4_record_mbox() while
copying command words here:
for (i = 0; i < size / 8; i++)
entry->cmd[i] = be64_to_cpu(cmd[i]);
Up to 48 bytes from the stack are then leaked to debugfs.
This happens whenever we send (and log) commands described by
structs fw_sched_cmd (32 bytes leaked), fw_vi_rxmode_cmd (48),
fw_hello_cmd (48), fw_bye_cmd (48), fw_initialize_cmd (48),
fw_reset_cmd (48), fw_pfvf_cmd (32), fw_eq_eth_cmd (16),
fw_eq_ctrl_cmd (32), fw_eq_ofld_cmd (32), fw_acl_mac_cmd(16),
fw_rss_glb_config_cmd(32), fw_rss_vi_config_cmd(32),
fw_devlog_cmd(32), fw_vi_enable_cmd(48), fw_port_cmd(32),
fw_sched_cmd(32), fw_devlog_cmd(32).
The cxgb4vf driver got this right instead.
When we call t4_record_mbox() to log a command reply, a MBOX_LEN
size can be used though, as get_mbox_rpl() will fill cmd_rpl up
completely.
Fixes: 7f080c3f2f ("cxgb4: Add support to enable logging of firmware mailbox commands")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the bindings have been controversial, and we follow the DT stable ABI
rule, we shouldn't let a driver with a DT binding that might change slip
through in a stable release.
Remove the compatibles to make sure the driver will not probe and no-one
will start using the binding currently implemented. This commit will
obviously need to be reverted in due time.
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>