linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2025-01-18 06:16:24 +07:00

Author	SHA1	Message	Date
Randy Dunlap	f5f67cc0e0	scripts/faddr2line: fix location of start_kernel in comment Fix a source file reference location to the correct path name. Link: http://lkml.kernel.org/r/1d50bd3d-178e-dcd8-779f-9711887440eb@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-11-18 10:15:09 -08:00
Roman Gushchin	a76cf1a474	mm: don't reclaim inodes with many attached pages Spock reported that commit `172b06c32b` ("mm: slowly shrink slabs with a relatively small number of objects") leads to a regression on his setup: periodically the majority of the pagecache is evicted without an obvious reason, while before the change the amount of free memory was balancing around the watermark. The reason behind is that the mentioned above change created some minimal background pressure on the inode cache. The problem is that if an inode is considered to be reclaimed, all belonging pagecache page are stripped, no matter how many of them are there. So, if a huge multi-gigabyte file is cached in the memory, and the goal is to reclaim only few slab objects (unused inodes), we still can eventually evict all gigabytes of the pagecache at once. The workload described by Spock has few large non-mapped files in the pagecache, so it's especially noticeable. To solve the problem let's postpone the reclaim of inodes, which have more than 1 attached page. Let's wait until the pagecache pages will be evicted naturally by scanning the corresponding LRU lists, and only then reclaim the inode structure. Link: http://lkml.kernel.org/r/20181023164302.20436-1-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Reported-by: Spock <dairinin@gmail.com> Tested-by: Spock <dairinin@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: <stable@vger.kernel.org> [4.19.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-11-18 10:15:09 -08:00
Michal Hocko	9d7899999c	mm, memory_hotplug: check zone_movable in has_unmovable_pages Page state checks are racy. Under a heavy memory workload (e.g. stress -m 200 -t 2h) it is quite easy to hit a race window when the page is allocated but its state is not fully populated yet. A debugging patch to dump the struct page state shows has_unmovable_pages: pfn:0x10dfec00, found:0x1, count:0x0 page:ffffea0437fb0000 count:1 mapcount:1 mapping:ffff880e05239841 index:0x7f26e5000 compound_mapcount: 1 flags: 0x5fffffc0090034(uptodate\|lru\|active\|head\|swapbacked) Note that the state has been checked for both PageLRU and PageSwapBacked already. Closing this race completely would require some sort of retry logic. This can be tricky and error prone (think of potential endless or long taking loops). Workaround this problem for movable zones at least. Such a zone should only contain movable pages. Commit `15c30bc090` ("mm, memory_hotplug: make has_unmovable_pages more robust") has told us that this is not strictly true though. Bootmem pages should be marked reserved though so we can move the original check after the PageReserved check. Pages from other zones are still prone to races but we even do not pretend that memory hotremove works for those so pre-mature failure doesn't hurt that much. Link: http://lkml.kernel.org/r/20181106095524.14629-1-mhocko@kernel.org Fixes: `15c30bc090` ("mm, memory_hotplug: make has_unmovable_pages more robust") Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Baoquan He <bhe@redhat.com> Tested-by: Baoquan He <bhe@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-11-18 10:15:09 -08:00
Vasily Averin	873d7bcfd0	mm/swapfile.c: use kvzalloc for swap_info_struct allocation Commit `a2468cc9bf` ("swap: choose swap device according to numa node") changed 'avail_lists' field of 'struct swap_info_struct' to an array. In popular linux distros it increased size of swap_info_struct up to 40 Kbytes and now swap_info_struct allocation requires order-4 page. Switch to kvzmalloc allows to avoid unexpected allocation failures. Link: http://lkml.kernel.org/r/fc23172d-3c75-21e2-d551-8b1808cbe593@virtuozzo.com Fixes: `a2468cc9bf` ("swap: choose swap device according to numa node") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Acked-by: Aaron Lu <aaron.lu@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Huang Ying <ying.huang@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-11-18 10:15:09 -08:00
Aaro Koskinen	f341e16fb6	MAINTAINERS: update OMAP MMC entry Jarkko's e-mail address hasn't worked for a long time. We still want to keep this driver working as it is critical for some of the OMAP boards. I use and test this driver frequently, so change myself as a maintainer with "Odd Fixes" status. Link: http://lkml.kernel.org/r/20181106222750.12939-1-aaro.koskinen@iki.fi Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-11-18 10:15:09 -08:00
Mike Kravetz	5e41540c8a	hugetlbfs: fix kernel BUG at fs/hugetlbfs/inode.c:444! This bug has been experienced several times by the Oracle DB team. The BUG is in remove_inode_hugepages() as follows: /* * If page is mapped, it was faulted in after being * unmapped in caller. Unmap (again) now after taking * the fault mutex. The mutex will prevent faults * until we finish removing the page. * * This race can only happen in the hole punch case. * Getting here in a truncate operation is a bug. / if (unlikely(page_mapped(page))) { BUG_ON(truncate_op); In this case, the elevated map count is not the result of a race. Rather it was incorrectly incremented as the result of a bug in the huge pmd sharing code. Consider the following: - Process A maps a hugetlbfs file of sufficient size and alignment (PUD_SIZE) that a pmd page could be shared. - Process B maps the same hugetlbfs file with the same size and alignment such that a pmd page is shared. - Process B then calls mprotect() to change protections for the mapping with the shared pmd. As a result, the pmd is 'unshared'. - Process B then calls mprotect() again to chage protections for the mapping back to their original value. pmd remains unshared. - Process B then forks and process C is created. During the fork process, we do dup_mm -> dup_mmap -> copy_page_range to copy page tables. Copying page tables for hugetlb mappings is done in the routine copy_hugetlb_page_range. In copy_hugetlb_page_range(), the destination pte is obtained by: dst_pte = huge_pte_alloc(dst, addr, sz); If pmd sharing is possible, the returned pointer will be to a pte in an existing page table. In the situation above, process C could share with either process A or process B. Since process A is first in the list, the returned pte is a pointer to a pte in process A's page table. However, the check for pmd sharing in copy_hugetlb_page_range is: / If the pagetables are shared don't copy or take references */ if (dst_pte == src_pte) continue; Since process C is sharing with process A instead of process B, the above test fails. The code in copy_hugetlb_page_range which follows assumes dst_pte points to a huge_pte_none pte. It copies the pte entry from src_pte to dst_pte and increments this map count of the associated page. This is how we end up with an elevated map count. To solve, check the dst_pte entry for huge_pte_none. If !none, this implies PMD sharing so do not copy. Link: http://lkml.kernel.org/r/20181105212315.14125-1-mike.kravetz@oracle.com Fixes: `c5c99429fa` ("fix hugepages leak due to pagetable page sharing") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Prakash Sangappa <prakash.sangappa@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-11-18 10:15:09 -08:00
Olof Johansson	8fcb2312d1	kernel/sched/psi.c: simplify cgroup_move_task() The existing code triggered an invalid warning about 'rq' possibly being used uninitialized. Instead of doing the silly warning suppression by initializa it to NULL, refactor the code to bail out early instead. Warning was: kernel/sched/psi.c: In function `cgroup_move_task': kernel/sched/psi.c:639:13: warning: `rq' may be used uninitialized in this function [-Wmaybe-uninitialized] Link: http://lkml.kernel.org/r/20181103183339.8669-1-olof@lixom.net Fixes: `2ce7135adc` ("psi: cgroup support") Signed-off-by: Olof Johansson <olof@lixom.net> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-11-18 10:15:09 -08:00
Vitaly Wool	ca0246bb97	z3fold: fix possible reclaim races Reclaim and free can race on an object which is basically fine but in order for reclaim to be able to map "freed" object we need to encode object length in the handle. handle_to_chunks() is then introduced to extract object length from a handle and use it during mapping. Moreover, to avoid racing on a z3fold "headless" page release, we should not try to free that page in z3fold_free() if the reclaim bit is set. Also, in the unlikely case of trying to reclaim a page being freed, we should not proceed with that page. While at it, fix the page accounting in reclaim function. This patch supersedes "[PATCH] z3fold: fix reclaim lock-ups". Link: http://lkml.kernel.org/r/20181105162225.74e8837d03583a9b707cf559@gmail.com Signed-off-by: Vitaly Wool <vitaly.vul@sony.com> Signed-off-by: Jongseok Kim <ks77sj@gmail.com> Reported-by-by: Jongseok Kim <ks77sj@gmail.com> Reviewed-by: Snild Dolkow <snild@sony.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-11-18 10:15:09 -08:00
Jon Maloy	1c1274a569	tipc: don't assume linear buffer when reading ancillary data The code for reading ancillary data from a received buffer is assuming the buffer is linear. To make this assumption true we have to linearize the buffer before message data is read. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 22:08:02 -08:00
David S. Miller	17bf1693a6	Merge branch 'bcmgenet-fix-aborted-suspend' Doug Berger says: ==================== net: bcmgenet: fix aborted suspend It is not enough to return an error code from the driver suspend routine. The driver must also restore the device functionality. This commit corrects the issue introduced by commit `0db55093b5` ("net: bcmgenet: return correct value 'ret' from bcmgenet_power_down") by calling the driver resume function if the suspend function returns an error. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 22:04:39 -08:00
Doug Berger	c5a54bbcec	net: bcmgenet: abort suspend on error If an error occurs during suspension of the driver the driver should restore the hardware configuration and return an error to force the system to resume. Fixes: `0db55093b5` ("net: bcmgenet: return correct value 'ret' from bcmgenet_power_down") Signed-off-by: Doug Berger <opendmb@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 22:04:39 -08:00
Doug Berger	a94cbf03eb	net: bcmgenet: code movement This commit switches the order of bcmgenet_suspend and bcmgenet_resume in the file to prevent the need for a forward declaration in the next commit and to make the review of that commit easier. Signed-off-by: Doug Berger <opendmb@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 22:04:38 -08:00
Nathan Chancellor	8a962c4aa1	geneve: Initialize addr6 with memset Clang warns: drivers/net/geneve.c:428:29: error: suggest braces around initialization of subobject [-Werror,-Wmissing-braces] struct in6_addr addr6 = { 0 }; ^ {} Rather than trying to appease the various compilers that support the kernel, use memset, which is unambiguous. Fixes: `a07966447f` ("geneve: ICMP error lookup handler") Suggested-by: David S. Miller <davem@davemloft.net> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 22:03:06 -08:00
Jon Maloy	adba75be0d	tipc: fix lockdep warning when reinitilaizing sockets We get the following warning: [ 47.926140] 32-bit node address hash set to 2010a0a [ 47.927202] [ 47.927433] ================================ [ 47.928050] WARNING: inconsistent lock state [ 47.928661] 4.19.0+ #37 Tainted: G E [ 47.929346] -------------------------------- [ 47.929954] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. [ 47.930116] swapper/3/0 [HC0[0]:SC1[3]:HE1:SE0] takes: [ 47.930116] 00000000af8bc31e (&(&ht->lock)->rlock){+.?.}, at: rhashtable_walk_enter+0x36/0xb0 [ 47.930116] {SOFTIRQ-ON-W} state was registered at: [ 47.930116] _raw_spin_lock+0x29/0x60 [ 47.930116] rht_deferred_worker+0x556/0x810 [ 47.930116] process_one_work+0x1f5/0x540 [ 47.930116] worker_thread+0x64/0x3e0 [ 47.930116] kthread+0x112/0x150 [ 47.930116] ret_from_fork+0x3a/0x50 [ 47.930116] irq event stamp: 14044 [ 47.930116] hardirqs last enabled at (14044): [<ffffffff9a07fbba>] __local_bh_enable_ip+0x7a/0xf0 [ 47.938117] hardirqs last disabled at (14043): [<ffffffff9a07fb81>] __local_bh_enable_ip+0x41/0xf0 [ 47.938117] softirqs last enabled at (14028): [<ffffffff9a0803ee>] irq_enter+0x5e/0x60 [ 47.938117] softirqs last disabled at (14029): [<ffffffff9a0804a5>] irq_exit+0xb5/0xc0 [ 47.938117] [ 47.938117] other info that might help us debug this: [ 47.938117] Possible unsafe locking scenario: [ 47.938117] [ 47.938117] CPU0 [ 47.938117] ---- [ 47.938117] lock(&(&ht->lock)->rlock); [ 47.938117] <Interrupt> [ 47.938117] lock(&(&ht->lock)->rlock); [ 47.938117] [ 47.938117] * DEADLOCK * [ 47.938117] [ 47.938117] 2 locks held by swapper/3/0: [ 47.938117] #0: 0000000062c64f90 ((&d->timer)){+.-.}, at: call_timer_fn+0x5/0x280 [ 47.938117] #1: 00000000ee39619c (&(&d->lock)->rlock){+.-.}, at: tipc_disc_timeout+0xc8/0x540 [tipc] [ 47.938117] [ 47.938117] stack backtrace: [ 47.938117] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G E 4.19.0+ #37 [ 47.938117] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 47.938117] Call Trace: [ 47.938117] <IRQ> [ 47.938117] dump_stack+0x5e/0x8b [ 47.938117] print_usage_bug+0x1ed/0x1ff [ 47.938117] mark_lock+0x5b5/0x630 [ 47.938117] __lock_acquire+0x4c0/0x18f0 [ 47.938117] ? lock_acquire+0xa6/0x180 [ 47.938117] lock_acquire+0xa6/0x180 [ 47.938117] ? rhashtable_walk_enter+0x36/0xb0 [ 47.938117] _raw_spin_lock+0x29/0x60 [ 47.938117] ? rhashtable_walk_enter+0x36/0xb0 [ 47.938117] rhashtable_walk_enter+0x36/0xb0 [ 47.938117] tipc_sk_reinit+0xb0/0x410 [tipc] [ 47.938117] ? mark_held_locks+0x6f/0x90 [ 47.938117] ? __local_bh_enable_ip+0x7a/0xf0 [ 47.938117] ? lockdep_hardirqs_on+0x20/0x1a0 [ 47.938117] tipc_net_finalize+0xbf/0x180 [tipc] [ 47.938117] tipc_disc_timeout+0x509/0x540 [tipc] [ 47.938117] ? call_timer_fn+0x5/0x280 [ 47.938117] ? tipc_disc_msg_xmit.isra.19+0xa0/0xa0 [tipc] [ 47.938117] ? tipc_disc_msg_xmit.isra.19+0xa0/0xa0 [tipc] [ 47.938117] call_timer_fn+0xa1/0x280 [ 47.938117] ? tipc_disc_msg_xmit.isra.19+0xa0/0xa0 [tipc] [ 47.938117] run_timer_softirq+0x1f2/0x4d0 [ 47.938117] __do_softirq+0xfc/0x413 [ 47.938117] irq_exit+0xb5/0xc0 [ 47.938117] smp_apic_timer_interrupt+0xac/0x210 [ 47.938117] apic_timer_interrupt+0xf/0x20 [ 47.938117] </IRQ> [ 47.938117] RIP: 0010:default_idle+0x1c/0x140 [ 47.938117] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 54 55 53 65 8b 2d d8 2b 74 65 0f 1f 44 00 00 e8 c6 2c 8b ff fb f4 <65> 8b 2d c5 2b 74 65 0f 1f 44 00 00 5b 5d 41 5c c3 65 8b 05 b4 2b [ 47.938117] RSP: 0018:ffffaf6ac0207ec8 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13 [ 47.938117] RAX: ffff8f5b3735e200 RBX: 0000000000000003 RCX: 0000000000000001 [ 47.938117] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff8f5b3735e200 [ 47.938117] RBP: 0000000000000003 R08: 0000000000000001 R09: 0000000000000000 [ 47.938117] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ 47.938117] R13: 0000000000000000 R14: ffff8f5b3735e200 R15: ffff8f5b3735e200 [ 47.938117] ? default_idle+0x1a/0x140 [ 47.938117] do_idle+0x1bc/0x280 [ 47.938117] cpu_startup_entry+0x19/0x20 [ 47.938117] start_secondary+0x187/0x1c0 [ 47.938117] secondary_startup_64+0xa4/0xb0 The reason seems to be that tipc_net_finalize()->tipc_sk_reinit() is calling the function rhashtable_walk_enter() within a timer interrupt. We fix this by executing tipc_net_finalize() in work queue context. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 22:01:31 -08:00
Eric Dumazet	33d9a2c72f	net-gro: reset skb->pkt_type in napi_reuse_skb() eth_type_trans() assumes initial value for skb->pkt_type is PACKET_HOST. This is indeed the value right after a fresh skb allocation. However, it is possible that GRO merged a packet with a different value (like PACKET_OTHERHOST in case macvlan is used), so we need to make sure napi->skb will have pkt_type set back to PACKET_HOST. Otherwise, valid packets might be dropped by the stack because their pkt_type is not PACKET_HOST. napi_reuse_skb() was added in commit `96e93eab20` ("gro: Add internal interfaces for VLAN"), but this bug always has been there. Fixes: `96e93eab20` ("gro: Add internal interfaces for VLAN") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 21:59:18 -08:00
David S. Miller	52c951f104	Merge branch 'net-hns3-Add-vf-mtu-support' Salil Mehta says: ==================== net: hns3: Add vf mtu support This patchset adds vf mtu support to HNS3 driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 21:57:30 -08:00
Yunsheng Lin	cdca4c485d	net: hns3: up/down netdev in hclge module when setting mtu Currently netdev is down in enet module, and it is before mtu range checking in hclge module, which may be cause netdev being down unnecessarily. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 21:57:29 -08:00
Yunsheng Lin	818f167587	net: hns3: Add mtu setting support for vf The patch adds mtu setting support for vf, currently vf and pf share the same hardware mtu setting. Mtu set by vf must be less than or equal to pf' mtu, and mtu set by pf must be greater than or equal to vf' mtu. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 21:57:29 -08:00
Yunsheng Lin	a6d818e31d	net: hns3: Add vport alive state checking support Currently there is no way for pf to know if a vf device is alive or not, so PF does not know which vf to notify when reset happens, or which vf's mtu is invalid when vf and pf share the same hardware mtu setting. This patch adds vport alive state checking support, in order to support the above scenario. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 21:57:29 -08:00
Yunsheng Lin	e6d7d79d3e	net: hns3: Refactor mac mtu setting related functions This patch refactors mac mtu setting related functions, normalizes the use of mps and mtu. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 21:57:29 -08:00
Yunsheng Lin	a0b4371751	net: hns3: Support two vlan header when setting mtu This patch adds supports for two vlan header when setting mtu. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 21:57:29 -08:00
David S. Miller	5396527f8c	Merge branch 'tdc-fixes' Lucas Bates says: ==================== Prevent uncaught exceptions in tdc This patch series addresses two potential bugs in tdc that can cause exceptions to be raised in certain circumstances. These exceptions are generally not handled, so instead we will prevent them from being raised. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 21:54:53 -08:00
Brenda J. Butler	c6cecf4ae4	tc-testing: tdc.py: Guard against lack of returncode in executed command Add some defensive coding in case one of the subprocesses created by tdc returns nothing. If no object is returned from exec_cmd, then tdc will halt with an unhandled exception. Signed-off-by: Brenda J. Butler <bjb@mojatatu.com> Signed-off-by: Lucas Bates <lucasb@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 21:54:53 -08:00
Lucas Bates	5aaf642852	tc-testing: tdc.py: ignore errors when decoding stdout/stderr Prevent exceptions from being raised while decoding output from an executed command. There is no impact on tdc's execution and the verify command phase would fail the pattern match. Signed-off-by: Lucas Bates <lucasb@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 21:54:53 -08:00
Rob Herring	d7b4a2f232	net: fsl: Use device_type helpers to access the node type Remove directly accessing device_node.type pointer and use the accessors instead. This will eventually allow removing the type pointer. Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 21:52:58 -08:00
Rob Herring	ee5b60eba7	atm: Convert to using %pOFn instead of device_node.name In preparation to remove the node name pointer from struct device_node, convert printf users to use the %pOFn format specifier. Cc: Chas Williams <3chas3@gmail.com> Cc: linux-atm-general@lists.sourceforge.net Cc: netdev@vger.kernel.org Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 21:52:11 -08:00
Sabrina Dubroca	16f7eb2b77	ip_tunnel: don't force DF when MTU is locked The various types of tunnels running over IPv4 can ask to set the DF bit to do PMTU discovery. However, PMTU discovery is subject to the threshold set by the net.ipv4.route.min_pmtu sysctl, and is also disabled on routes with "mtu lock". In those cases, we shouldn't set the DF bit. This patch makes setting the DF bit conditional on the route's MTU locking state. This issue seems to be older than git history. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 21:50:55 -08:00
Toke Høiland-Jørgensen	8840c3e234	MAINTAINERS: Add entry for CAKE qdisc We would like the existing community to be kept in the loop for any new developments on CAKE; and I certainly plan to keep maintaining it. Reflect this in MAINTAINERS. Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 21:39:53 -08:00
Nikolay Aleksandrov	9d332e69c1	net: bridge: fix vlan stats use-after-free on destruction Syzbot reported a use-after-free of the global vlan context on port vlan destruction. When I added per-port vlan stats I missed the fact that the global vlan context can be freed before the per-port vlan rcu callback. There're a few different ways to deal with this, I've chosen to add a new private flag that is set only when per-port stats are allocated so we can directly check it on destruction without dereferencing the global context at all. The new field in net_bridge_vlan uses a hole. v2: cosmetic change, move the check to br_process_vlan_info where the other checks are done v3: add change log in the patch, add private (in-kernel only) flags in a hole in net_bridge_vlan struct and use that instead of mixing user-space flags with private flags Fixes: `9163a0fc1f` ("net: bridge: add support for per-port vlan stats") Reported-by: syzbot+04681da557a0e49a52e5@syzkaller.appspotmail.com Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 21:38:44 -08:00
Eric Dumazet	001c96db01	net: align gnet_stats_basic_cpu struct This structure is small (12 or 16 bytes depending on 64bit or 32bit kernels), but we do not want it spanning two cache lines. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 21:37:29 -08:00
Eric Dumazet	9a5ee46230	net: align pcpu_sw_netstats and pcpu_lstats structs Do not risk spanning these small structures on two cache lines, it is absolutely not worth it. For 32bit arches, the hint might not be enough, but we do not really care anymore. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 21:37:13 -08:00
Colin Ian King	7c460cf9cd	net: aquantia: fix spelling mistake "specfield" -> "specified" There is a spelling mistake in a netdev_err message. Fix this. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 21:36:09 -08:00
Slavomir Kaslev	95506588d2	socket: do a generic_file_splice_read when proto_ops has no splice_read splice(2) fails with -EINVAL when called reading on a socket with no splice_read set in its proto_ops (such as vsock sockets). Switch this to fallbacks to a generic_file_splice_read instead. Signed-off-by: Slavomir Kaslev <kaslevs@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 21:34:11 -08:00
YueHaibing	098aafaa68	net: aquantia: cleanup err handing in hw_atl_utils_fw_rpc_wait 'err' always be 0 in the two places. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 21:15:04 -08:00
Martin Schiller	df5a8ec64e	net: phy: mdio-gpio: Fix working over slow can_sleep GPIOs Up until commit `7e5fbd1e07` ("net: mdio-gpio: Convert to use gpiod functions where possible"), the _cansleep variants of the gpio_ API was used. After that commit and the change to gpiod_ API, the _cansleep() was dropped. This then results in WARN_ON() when used with GPIO devices which do sleep. Add back the _cansleep() to avoid this. Fixes: `7e5fbd1e07` ("net: mdio-gpio: Convert to use gpiod functions where possible") Signed-off-by: Martin Schiller <ms@dev.tdt.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 21:11:51 -08:00
David S. Miller	1115439f53	Merge branch 'ncsi-Allow-enabling-multiple-packages-and-channels' Samuel Mendoza-Jonas says: ==================== net/ncsi: Allow enabling multiple packages & channels This series extends the NCSI driver to configure multiple packages and/or channels simultaneously. Since the RFC series this includes a few extra changes to fix areas in the driver that either made this harder or were roadblocks due to deviations from the NCSI specification. Patches 1 & 2 fix two issues where the driver made assumptions about the capabilities of the NCSI topology. Patches 3 & 4 change some internal semantics slightly to make multi-mode easier. Patch 5 introduces a cleaner way of reconfiguring the NCSI configuration and keeping track of channel states. Patch 6 implements the main multi-package/multi-channel configuration, configured via the Netlink interface. Readers who have an interesting NCSI setup - especially multi-package with HWA - please test! I think I've covered all permutations but I don't have infinite hardware to test on. Changes in v2: - Updated use of the channel lock in ncsi_reset_dev(), making the channel invisible and leaving the monitor check to ncsi_stop_channel_monitor(). - Fixed ncsi_channel_is_tx() to consider the state of channels in other packages. Changes in v3: - Fixed bisectability bug in patch 1 - Consider channels on all packages in a few places when multi-package is enabled. - Avoid doubling up reset operations, and check the current driver state before reset to let any running operations complete. - Reorganise the LSC handler slightly to avoid enabling Tx twice. Changes in v4: - Fix failover in the single-channel case - Better handle ncsi_reset_dev() entry during a current config/suspend operation. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 21:09:50 -08:00
Samuel Mendoza-Jonas	8d951a75d0	net/ncsi: Configure multi-package, multi-channel modes with failover This patch extends the ncsi-netlink interface with two new commands and three new attributes to configure multiple packages and/or channels at once, and configure specific failover modes. NCSI_CMD_SET_PACKAGE mask and NCSI_CMD_SET_CHANNEL_MASK set a whitelist of packages or channels allowed to be configured with the NCSI_ATTR_PACKAGE_MASK and NCSI_ATTR_CHANNEL_MASK attributes respectively. If one of these whitelists is set only packages or channels matching the whitelist are considered for the channel queue in ncsi_choose_active_channel(). These commands may also use the NCSI_ATTR_MULTI_FLAG to signal that multiple packages or channels may be configured simultaneously. NCSI hardware arbitration (HWA) must be available in order to enable multi-package mode. Multi-channel mode is always available. If the NCSI_ATTR_CHANNEL_ID attribute is present in the NCSI_CMD_SET_CHANNEL_MASK command the it sets the preferred channel as with the NCSI_CMD_SET_INTERFACE command. The combination of preferred channel and channel whitelist defines a primary channel and the allowed failover channels. If the NCSI_ATTR_MULTI_FLAG attribute is also present then the preferred channel is configured for Tx/Rx and the other channels are enabled only for Rx. Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 21:09:49 -08:00
Samuel Mendoza-Jonas	2878a2cfe5	net/ncsi: Reset channel state in ncsi_start_dev() When the NCSI driver is stopped with ncsi_stop_dev() the channel monitors are stopped and the state set to "inactive". However the channels are still configured and active from the perspective of the network controller. We should suspend each active channel but in the context of ncsi_stop_dev() the transmit queue has been or is about to be stopped so we won't have time to do so. Instead when ncsi_start_dev() is called if the NCSI topology has already been probed then call ncsi_reset_dev() to suspend any channels that were previously active. This resets the network controller to a known state, provides an up to date view of channel link state, and makes sure that mode flags such as NCSI_MODE_TX_ENABLE are properly reset. In addition to ncsi_start_dev() use ncsi_reset_dev() in ncsi-netlink.c to update the channel configuration more cleanly. Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 21:09:49 -08:00
Samuel Mendoza-Jonas	0b970e1b04	net/ncsi: Don't mark configured channels inactive The concepts of a channel being 'active' and it having link are slightly muddled in the NCSI driver. Tweak this slightly so that NCSI_CHANNEL_ACTIVE represents a channel that has been configured and enabled, and NCSI_CHANNEL_INACTIVE represents a de-configured channel. This distinction is important because a channel can be 'active' but have its link down; in this case the channel may still need to be configured so that it may receive AEN link-state-change packets. Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 21:09:49 -08:00
Samuel Mendoza-Jonas	cd09ab095c	net/ncsi: Don't deselect package in suspend if active When a package is deselected all channels of that package cease communication. If there are other channels active on the package of the suspended channel this will disable them as well, so only send a deselect-package command if no other channels are active. Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 21:09:49 -08:00
Samuel Mendoza-Jonas	8e13f70be0	net/ncsi: Probe single packages to avoid conflict Currently the NCSI driver sends a select-package command to all possible packages simultaneously to discover what packages are available. However at this stage in the probe process the driver does not know if hardware arbitration is available: if it isn't then this process could cause collisions on the RMII bus when packages try to respond. Update the probe loop to probe each package one by one, and once complete check if HWA is universally supported. Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 21:09:49 -08:00
Samuel Mendoza-Jonas	60ab49bfe4	net/ncsi: Don't enable all channels when HWA available NCSI hardware arbitration allows multiple packages to be enabled at once and share the same wiring. If the NCSI driver recognises that HWA is available it unconditionally enables all packages and channels; but that is a configuration decision rather than something required by HWA. Additionally the current implementation will not failover on link events which can cause connectivity to be lost unless the interface is manually bounced. Retain basic HWA support but remove the separate configuration path to enable all channels, leaving this to be handled by a later implementation. Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 21:09:49 -08:00
Arjun Vynipadath	2391b0030e	cxgb4: Remove SGE_HOST_PAGE_SIZE dependency on page size The SGE Host Page Size has nothing to do with the actual Host Page Size. It's the SGE's BAR2 Doorbell/GTS Page Size for interpreting the SGE Ingress/Egress Queue per Page values. Firmware reads all of these things and makes all the subsequent changes necessary. The Host Driver uses the SGE Host Page Size in order to properly calculate BAR2 Offsets. Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Arjun Vynipadath <arjun@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 20:36:25 -08:00
Yousuk Seung	e8bd8fca67	tcp: add SRTT to SCM_TIMESTAMPING_OPT_STATS Add TCP_NLA_SRTT to SCM_TIMESTAMPING_OPT_STATS that reports the smoothed round trip time in microseconds (tcp_sock.srtt_us >> 3). Signed-off-by: Yousuk Seung <ysseung@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 20:34:36 -08:00
Stephen Hemminger	54e8cb7861	uapi/ethtool: fix spelling errors Trivial spelling errors found by codespell. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 20:33:32 -08:00
David S. Miller	6f0271d929	tun: Adjust on-stack tun_page initialization. Instead of constantly playing with the struct initializer syntax trying to make gcc and CLang both happy, just clear it out using memset(). >> drivers/net/tun.c:2503:42: warning: Using plain integer as NULL pointer Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 16:53:46 -08:00
Jason Wang	f9e06c45cb	tuntap: free XDP dropped packets in a batch Thanks to the batched XDP buffs through msg_control. Instead of calling put_page() for each page which involves a atomic operation, let's batch them by record the last page that needs to be freed and its refcnt count and free them in a batch. Testpmd(virtio-user + vhost_net) + XDP_DROP shows 3.8% improvement. Before: 4.71Mpps After : 4.89Mpps Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 12:00:42 -08:00
Jason Wang	e4dab1e6ea	vhost_net: mitigate page reference counting during page frag refill We do a get_page() which involves a atomic operation. This patch tries to mitigate a per packet atomic operation by maintaining a reference bias which is initially USHRT_MAX. Each time a page is got, instead of calling get_page() we decrease the bias and when we find it's time to use a new page we will decrease the bias at one time through __page_cache_drain_cache(). Testpmd(virtio_user + vhost_net) + XDP_DROP on TAP shows about 1.6% improvement. Before: 4.63Mpps After: 4.71Mpps Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-17 12:00:42 -08:00
David S. Miller	b8b9618a4f	Merge branch 'net-sched-gred-introduce-per-virtual-queue-attributes' Jakub Kicinski says: ==================== net: sched: gred: introduce per-virtual queue attributes This series updates the GRED Qdisc. The Qdisc matches nfp offload very well, but before we can offload it there are a number of improvements to make. First few patches add extack messages to the Qdisc and pass extack to netlink validation. Next a new netlink attribute group is added, to allow GRED to be extended more easily. Currently GRED passes C structures as attributes, and even an array of C structs for virtual queue configuration. User space has hard coded the expected length of that array, so adding new fields is not possible. New two-level attribute group is added: [TCA_GRED_VQ_LIST] [TCA_GRED_VQ_ENTRY] [TCA_GRED_VQ_DP] [TCA_GRED_VQ_FLAGS] [TCA_GRED_VQ_STAT_] [TCA_GRED_VQ_ENTRY] [TCA_GRED_VQ_DP] [TCA_GRED_VQ_FLAGS] [TCA_GRED_VQ_STAT_] [TCA_GRED_VQ_ENTRY] ... Statistics are dump only. Patch 4 switches the byte counts to be 64 bit, and patch 5 introduces the new stats attributes for dump. Patch 6 switches RED flags to be per-virtual queue, and patch 7 allows them to be dumped and set at virtual queue granularity. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-16 23:08:52 -08:00
Jakub Kicinski	7211101502	net: sched: gred: allow manipulating per-DP RED flags Allow users to set and dump RED flags (ECN enabled and harddrop) on per-virtual queue basis. Validation of attributes is split from changes to make sure we won't have to undo previous operations when we find out configuration is invalid. The objective is to allow changing per-Qdisc parameters without overwriting the per-vq configured flags. Old user space will not pass the TCA_GRED_VQ_FLAGS attribute and per-Qdisc flags will always get propagated to the virtual queues. New user space which wants to make use of per-vq flags should set per-Qdisc flags to 0 and then configure per-vq flags as it sees fit. Once per-vq flags are set per-Qdisc flags can't be changed to non-zero. Vice versa - if the per-Qdisc flags are non-zero the TCA_GRED_VQ_FLAGS attribute has to either be omitted or set to the same value as per-Qdisc flags. Update per-Qdisc parameters: per-Qdisc \| per-VQ \| result 0 \| 0 \| all vq flags updated 0 \| non-0 \| error (vq flags in use) non-0 \| 0 \| -- impossible -- non-0 \| non-0 \| all vq flags updated Update per-VQ state (flags parameter not specified): no change to flags Update per-VQ state (flags parameter set): per-Qdisc \| per-VQ \| result 0 \| any \| per-vq flags updated non-0 \| 0 \| -- impossible -- non-0 \| non-0 \| error (per-Qdisc flags in use) Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-16 23:08:51 -08:00

... 3 4 5 6 7 ...

797800 Commits