Go to file
Johannes Weiner 869712fd3d mm: memcontrol: fix network errors from failing __GFP_ATOMIC charges
While upgrading from 4.16 to 5.2, we noticed these allocation errors in
the log of the new kernel:

  SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
    cache: tw_sock_TCPv6(960:helper-logs), object size: 232, buffer size: 240, default order: 1, min order: 0
    node 0: slabs: 5, objs: 170, free: 0

        slab_out_of_memory+1
        ___slab_alloc+969
        __slab_alloc+14
        kmem_cache_alloc+346
        inet_twsk_alloc+60
        tcp_time_wait+46
        tcp_fin+206
        tcp_data_queue+2034
        tcp_rcv_state_process+784
        tcp_v6_do_rcv+405
        __release_sock+118
        tcp_close+385
        inet_release+46
        __sock_release+55
        sock_close+17
        __fput+170
        task_work_run+127
        exit_to_usermode_loop+191
        do_syscall_64+212
        entry_SYSCALL_64_after_hwframe+68

accompanied by an increase in machines going completely radio silent
under memory pressure.

One thing that changed since 4.16 is e699e2c6a6 ("net, mm: account
sock objects to kmemcg"), which made these slab caches subject to cgroup
memory accounting and control.

The problem with that is that cgroups, unlike the page allocator, do not
maintain dedicated atomic reserves.  As a cgroup's usage hovers at its
limit, atomic allocations - such as done during network rx - can fail
consistently for extended periods of time.  The kernel is not able to
operate under these conditions.

We don't want to revert the culprit patch, because it indeed tracks a
potentially substantial amount of memory used by a cgroup.

We also don't want to implement dedicated atomic reserves for cgroups.
There is no point in keeping a fixed margin of unused bytes in the
cgroup's memory budget to accomodate a consumer that is impossible to
predict - we'd be wasting memory and get into configuration headaches,
not unlike what we have going with min_free_kbytes.  We do this for
physical mem because we have to, but cgroups are an accounting game.

Instead, account these privileged allocations to the cgroup, but let
them bypass the configured limit if they have to.  This way, we get the
benefits of accounting the consumed memory and have it exert pressure on
the rest of the cgroup, but like with the page allocator, we shift the
burden of reclaimining on behalf of atomic allocations onto the regular
allocations that can block.

Link: http://lkml.kernel.org/r/20191022233708.365764-1-hannes@cmpxchg.org
Fixes: e699e2c6a6 ("net, mm: account sock objects to kmemcg")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>	[4.18+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-11-06 08:47:50 -08:00
arch powerpc fixes for 5.4 #4 2019-11-02 11:08:19 -07:00
block iocost: don't nest spin_lock_irq in ioc_weight_write() 2019-10-31 11:40:57 -06:00
certs PKCS#7: Refactor verify_pkcs7_signature() 2019-08-05 18:40:18 -04:00
crypto Merge branch 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2019-09-28 08:14:15 -07:00
Documentation Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-11-01 17:48:11 -07:00
drivers USB fixes for 5.4-rc6 2019-11-03 08:25:25 -08:00
fs ocfs2: protect extent tree in ocfs2_prepare_inode_for_write() 2019-11-06 08:47:08 -08:00
include mm: thp: handle page cache THP correctly in PageTransCompoundMap 2019-11-06 08:28:58 -08:00
init Merge branch 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2019-09-28 08:14:15 -07:00
ipc ipc/sem.c: convert to use built-in RCU list checking 2019-09-25 17:51:41 -07:00
kernel Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-11-01 17:48:11 -07:00
lib dump_stack: avoid the livelock of the dump_lock 2019-11-06 08:47:50 -08:00
LICENSES LICENSES: Rename other to deprecated 2019-05-03 06:34:32 -06:00
mm mm: memcontrol: fix network errors from failing __GFP_ATOMIC charges 2019-11-06 08:47:50 -08:00
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-11-01 17:48:11 -07:00
samples samples/bpf: Add a workaround for asm_inline 2019-10-03 17:37:11 +02:00
scripts scripts/gdb: fix debugging modules compiled with hot/cold partitioning 2019-11-06 08:47:50 -08:00
security efi/efi_test: Lock down /dev/efi_test and require CAP_SYS_ADMIN 2019-10-31 09:40:21 +01:00
sound ALSA: timer: Fix mutex deadlock at releasing card 2019-10-30 22:54:56 +01:00
tools mm/gup_benchmark: fix MAP_HUGETLB case 2019-11-06 08:28:58 -08:00
usr kbuild: update compile-test header list for v5.4-rc2 2019-10-05 15:29:49 +09:00
virt kvm: call kvm_arch_destroy_vm if vm creation fails 2019-10-31 12:13:16 +01:00
.clang-format clang-format: Update with the latest for_each macro list 2019-08-31 10:00:51 +02:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore Modules updates for v5.4 2019-09-22 10:34:46 -07:00
.mailmap A few MIPS fixes: 2019-10-26 19:43:12 -04:00
COPYING COPYING: use the new text with points to the license files 2018-03-23 12:41:45 -06:00
CREDITS MAINTAINERS: Remove Simon as Renesas SoC Co-Maintainer 2019-10-10 08:12:51 -07:00
Kbuild kbuild: do not descend to ./Kbuild when cleaning 2019-08-21 21:03:58 +09:00
Kconfig docs: kbuild: convert docs to ReST and rename to *.rst 2019-06-14 14:21:21 -06:00
MAINTAINERS MAINTAINERS: update information for "MEMORY MANAGEMENT" 2019-11-06 08:47:50 -08:00
Makefile Linux 5.4-rc6 2019-11-03 14:07:26 -08:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.